1
Kennedy EN, Foster CA, Barr SA, Bourret RB. General strategies for using amino acid sequence data to guide biochemical investigation of protein function. Biochem Soc Trans 2022; 50:1847-1858. [PMID: 36416676 PMCID: PMC10257402 DOI: 10.1042/bst20220849] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2022] [Revised: 11/04/2022] [Accepted: 11/09/2022] [Indexed: 11/24/2022]
Abstract
The rapid increase of '-omics' data warrants the reconsideration of experimental strategies to investigate general protein function. Studying individual members of a protein family is likely insufficient to provide a complete mechanistic understanding of family functions, especially for diverse families with thousands of known members. Strategies that exploit large amounts of available amino acid sequence data can inspire and guide biochemical experiments, generating broadly applicable insights into a given family. Here we review several methods that utilize abundant sequence data to focus experimental efforts and identify features truly representative of a protein family or domain. First, coevolutionary relationships between residues within primary sequences can be successfully exploited to identify structurally and/or functionally important positions for experimental investigation. Second, functionally important variable residue positions typically occupy a limited sequence space, a property useful for guiding biochemical characterization of the effects of the most physiologically and evolutionarily relevant amino acids. Third, amino acid sequence variation within domains shared between different protein families can be used to sort a particular domain into multiple subtypes, inspiring further experimental designs. Although generally applicable to any kind of protein domain because they depend solely on amino acid sequences, the second and third approaches are reviewed in detail because they appear to have been used infrequently and offer immediate opportunities for new advances. Finally, we speculate that future technologies capable of analyzing and manipulating conserved and variable aspects of the three-dimensional structures of a protein family could lead to broad insights not attainable by current methods.
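The first strategy in this abstract, exploiting coevolution between residue positions, is often introduced through the mutual information between two columns of a multiple sequence alignment. The following is a minimal sketch of that idea only; the function name and the four-sequence toy alignment are assumptions made for illustration and do not come from the review, and practical coevolution analyses typically use much larger alignments and additional corrections.

```python
from collections import Counter
from math import log2


def column_mutual_information(alignment, i, j):
    """Mutual information (in bits) between columns i and j of an alignment.

    `alignment` is a list of equal-length aligned sequences (strings).
    """
    n = len(alignment)
    pair_counts = Counter((seq[i], seq[j]) for seq in alignment)
    counts_i = Counter(seq[i] for seq in alignment)
    counts_j = Counter(seq[j] for seq in alignment)
    mi = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        p_a = counts_i[a] / n
        p_b = counts_j[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


# Hypothetical toy alignment: columns 0 and 1 vary together,
# column 2 is fully conserved, column 3 varies independently of column 0.
toy_alignment = ["AKLE", "AKLD", "GRLE", "GRLD"]

for j in (1, 2, 3):
    print(j, column_mutual_information(toy_alignment, 0, j))
```

Column pairs that change together score high, whereas a conserved column or an independently varying column scores zero, which is why conserved positions need a separate analysis.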
Affiliation(s)
- Emily N. Kennedy
  - Department of Microbiology & Immunology, University of North Carolina, Chapel Hill, NC, United States of America
- Clay A. Foster
  - Department of Pediatrics, Section Hematology/Oncology, University of Oklahoma Health Sciences Center, Oklahoma City, Oklahoma, United States of America
- Sarah A. Barr
  - Department of Microbiology & Immunology, University of North Carolina, Chapel Hill, NC, United States of America
- Robert B. Bourret
  - Department of Microbiology & Immunology, University of North Carolina, Chapel Hill, NC, United States of America
2
Zong N, Li N, Wen A, Ngo V, Yu Y, Huang M, Chowdhury S, Jiang C, Fu S, Weinshilboum R, Jiang G, Hunter L, Liu H. BETA: a comprehensive benchmark for computational drug-target prediction. Brief Bioinform 2022; 23:6596989. [PMID: 35649342 PMCID: PMC9294420 DOI: 10.1093/bib/bbac199] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/14/2022] [Revised: 04/10/2022] [Accepted: 04/29/2022] [Indexed: 11/14/2022] Open
Abstract
Internal validation is the most popular evaluation strategy used for drug-target predictive models. The simple random shuffling in cross-validation, however, is not always ideal for handling large and diverse datasets, as it can introduce bias. Hence, these predictive models cannot be comprehensively evaluated to provide insight into their general performance on a variety of use-cases (e.g. permutations of different levels of connectivity and categories in drug and target space, as well as validations based on different data sources). In this work, we introduce a benchmark, BETA, that aims to address this gap by (i) providing an extensive multipartite network consisting of 0.97 million biomedical concepts and 8.5 million associations, in addition to 62 million drug-drug and protein-protein similarities and (ii) presenting evaluation strategies that reflect seven cases (i.e. general, screening with different connectivity, target and drug screening based on categories, searching for specific drugs and targets and drug repurposing for specific diseases), a total of seven Tests (consisting of 344 Tasks in total) across multiple sampling and validation strategies. Six state-of-the-art methods covering two broad input data types (chemical structure- and gene sequence-based and network-based) were tested across all the developed Tasks. The best- and worst-performing cases were analyzed to demonstrate that the proposed benchmark can identify limitations of the tested methods. The results highlight BETA as a benchmark in the selection of computational strategies for drug repurposing and target discovery.
Affiliation(s)
- Nansu Zong
  - Department of Artificial Intelligence and Informatics Research, Mayo Clinic, Rochester, MN
- Ning Li
  - Center for Structure Biology, Center for Cancer Research, National Cancer Institute, Frederick, MD
- Andrew Wen
  - Department of Artificial Intelligence and Informatics Research, Mayo Clinic, Rochester, MN
- Victoria Ngo
  - Betty Irene Moore School of Nursing, University of California Davis Health, Sacramento, CA
  - Stanford Health Policy, Stanford School of Medicine and Freeman Spogli Institute for International Studies, Palo Alto, CA
- Yue Yu
  - Department of Artificial Intelligence and Informatics Research, Mayo Clinic, Rochester, MN
- Ming Huang
  - Department of Artificial Intelligence and Informatics Research, Mayo Clinic, Rochester, MN
- Shaika Chowdhury
  - Department of Artificial Intelligence and Informatics Research, Mayo Clinic, Rochester, MN
- Chao Jiang
  - Department of Computer Science and Software Engineering, Auburn University, Auburn, AL
- Sunyang Fu
  - Department of Artificial Intelligence and Informatics Research, Mayo Clinic, Rochester, MN
- Richard Weinshilboum
  - Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Rochester, MN
- Guoqian Jiang
  - Department of Artificial Intelligence and Informatics Research, Mayo Clinic, Rochester, MN
- Lawrence Hunter
  - Department of Pharmacology, University of Colorado Denver, Aurora, CO
- Hongfang Liu
  - Department of Artificial Intelligence and Informatics Research, Mayo Clinic, Rochester, MN
3
González JM. Visualizing the superfamily of metallo-β-lactamases through sequence similarity network neighborhood connectivity analysis. Heliyon 2021; 7:e05867. [PMID: 33426353 PMCID: PMC7785958 DOI: 10.1016/j.heliyon.2020.e05867] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 10/11/2020] [Revised: 11/19/2020] [Accepted: 12/23/2020] [Indexed: 12/13/2022] Open
Abstract
Protein sequence similarity networks (SSNs) constitute a convenient approach to analyze large polypeptide sequence datasets, and have been successfully applied to study a number of protein families over the past decade. SSN analysis is herein combined with traditional cladistic and phenetic phylogenetic analysis (respectively based on multiple sequence alignments and all-against-all three-dimensional protein structure comparisons) in order to assist the ancestral reconstruction and integrative revision of the superfamily of metallo-β-lactamases (MBLs). It is shown that only 198 out of 15,292 representative nodes contain at least one experimentally obtained protein structure in the Protein Data Bank or a manually annotated SwissProt entry, that is to say, only 1.3 % of the superfamily has been functionally and/or structurally characterized. Besides, neighborhood connectivity coloring, which measures local network interconnectivity, is introduced for detection of protein families within SSN clusters. This approach provides a clear picture of how many families remain unexplored in the superfamily, while most MBL research is heavily biased towards a few families. Further research is suggested in order to determine the SSN topological properties, which will be instrumental for the improvement of automated sequence annotation methods.
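Neighborhood connectivity, described above as a measure of local network interconnectivity, is commonly defined for a node as the average degree of its neighbors. The sketch below illustrates that definition on a small hypothetical network and re-derives the 1.3 % figure quoted in the abstract; the function name and the toy graph are assumptions for illustration, not data or code from the paper.

```python
def neighborhood_connectivity(adjacency):
    """Map each node to the mean degree of its neighbors (0.0 if isolated).

    `adjacency` maps a node to the set of nodes it shares an edge with.
    """
    degree = {node: len(neighbors) for node, neighbors in adjacency.items()}
    return {
        node: (sum(degree[m] for m in neighbors) / len(neighbors)) if neighbors else 0.0
        for node, neighbors in adjacency.items()
    }


# Hypothetical toy similarity network (undirected, stored symmetrically).
toy_ssn = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a", "e"},
    "e": {"d"},
}

nc = neighborhood_connectivity(toy_ssn)
print(nc)

# Fraction of representative nodes with a structure or curated entry,
# using the counts stated in the abstract.
characterized_fraction = 198 / 15292
print(f"{characterized_fraction:.1%}")
```

Coloring nodes by this value makes densely interconnected regions stand out from sparsely connected ones, which is the property the abstract uses to delineate families within a cluster.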
4
Rosen MR, Leuthaeuser JB, Parish CA, Fetrow JS. Isofunctional Clustering and Conformational Analysis of the Arsenate Reductase Superfamily Reveals Nine Distinct Clusters. Biochemistry 2020; 59:4262-4284. [PMID: 33135415 DOI: 10.1021/acs.biochem.0c00651] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/28/2022]
Abstract
Arsenate reductase (ArsC) is a superfamily of enzymes that reduce arsenate. Due to active site similarities, some ArsC can function as low-molecular weight protein tyrosine phosphatases (LMW-PTPs). Broad superfamily classifications align with redox partners (Trx- or Grx-linked). To understand this superfamily's mechanistic diversity, the ArsC superfamily is classified on the basis of active site features utilizing the tools TuLIP (two-level iterative clustering process) and autoMISST (automated multilevel iterative sequence searching technique). This approach identified nine functionally relevant (perhaps isofunctional) protein groups. Five groups exhibit distinct ArsC mechanisms. Three are Grx-linked: group 4AA (classical ArsC), group 3AAA (YffB-like), and group 5BAA. Two are Trx-linked: groups 6AAAAA and 7AAAAAAAA. One is an Spx-like transcriptional regulatory group, group 5AAA. Three are potential LMW-PTP groups: groups 7BAAAA and 7AAAABAA, which have not been previously identified, and the well-studied LMW-PTP family, group 8AAA. Molecular dynamics simulations were utilized to explore functional site details. In several families, we confirm and add detail to literature-based mechanistic information. Mechanistic roles are hypothesized for conserved active site residues in several families. In three families, simulations of the unliganded structure sample specific conformational ensembles, which are proposed to represent either a more ligand-binding-competent conformation or a pathway toward a more binding-competent state; these active sites may be designed to traverse high-energy barriers to the lower-energy conformations necessary to more readily bind ligands. This more detailed biochemical understanding of ArsC and ArsC-like PTP mechanisms opens possibilities for further understanding of arsenate bioremediation and the LMW-PTP mechanism.
Affiliation(s)
- Mikaela R Rosen
  - Department of Chemistry, Gottwald Center for the Sciences, University of Richmond, Richmond, Virginia 23173, United States
- Janelle B Leuthaeuser
  - Department of Chemistry, Gottwald Center for the Sciences, University of Richmond, Richmond, Virginia 23173, United States
- Carol A Parish
  - Department of Chemistry, Gottwald Center for the Sciences, University of Richmond, Richmond, Virginia 23173, United States
- Jacquelyn S Fetrow
  - Department of Chemistry, Gottwald Center for the Sciences, University of Richmond, Richmond, Virginia 23173, United States
5
Mazmanian K, Sargsyan K, Lim C. How the Local Environment of Functional Sites Regulates Protein Function. J Am Chem Soc 2020; 142:9861-9871. [PMID: 32407086 DOI: 10.1021/jacs.0c02430] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Indexed: 12/12/2022]
Abstract
Proteins form complex biological machineries whose functions in the cell are highly regulated at both the cellular and molecular levels. Cellular regulation of protein functions involves differential gene expressions, post-translation modifications, and signaling cascades. Molecular regulation, on the other hand, involves tuning an optimal local protein environment for the functional site. Precisely how a protein achieves such an optimal environment around a given functional site is not well understood. Herein, by surveying the literature, we first summarize the various reported strategies used by certain proteins to ensure their correct functioning. We then formulate three key physicochemical factors for regulating a protein's functional site, namely, (i) its immediate interactions, (ii) its solvent accessibility, and (iii) its conformational flexibility. We illustrate how these factors are applied to regulate the functions of free/metal-bound Cys and Zn sites in proteins.
Affiliation(s)
- Karine Mazmanian
  - Institute of Biomedical Sciences, Academia Sinica, Taipei 115, Taiwan
- Karen Sargsyan
  - Institute of Biomedical Sciences, Academia Sinica, Taipei 115, Taiwan
- Carmay Lim
  - Institute of Biomedical Sciences, Academia Sinica, Taipei 115, Taiwan
  - Department of Chemistry, National Tsing Hua University, Hsinchu 300, Taiwan
6
Affiliation(s)
- Jacquelyn S. Fetrow
  - Office of the President, Albright College, Reading, Pennsylvania, United States of America
- Patricia C. Babbitt
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California, United States of America
7
Holliday GL, Brown SD, Akiva E, Mischel D, Hicks MA, Morris JH, Huang CC, Meng EC, Pegg SCH, Ferrin TE, Babbitt PC. Biocuration in the structure-function linkage database: the anatomy of a superfamily. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2017; 2017:3074783. [PMID: 28365730 PMCID: PMC5467563 DOI: 10.1093/database/bax006] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 10/28/2016] [Accepted: 01/23/2017] [Indexed: 12/11/2022]
Abstract
With ever-increasing amounts of sequence data available in both the primary literature and sequence repositories, there is a bottleneck in annotating molecular function to a sequence. This article describes the biocuration process and methods used in the structure-function linkage database (SFLD) to help address some of the challenges. We discuss how the hierarchy within the SFLD allows us to infer detailed functional properties for functionally diverse enzyme superfamilies in which all members are homologous, conserve an aspect of their chemical function and have associated conserved structural features that enable the chemistry. Also presented is the Enzyme Structure-Function Ontology (ESFO), which has been designed to capture the relationships between enzyme sequence, structure and function that underlie the SFLD and is used to guide the biocuration processes within the SFLD. Database URL: http://sfld.rbvi.ucsf.edu/
Affiliation(s)
- Gemma L Holliday
  - Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94143, USA
- Shoshana D Brown
  - Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94143, USA
- Eyal Akiva
  - Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94143, USA
- David Mischel
  - Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94143, USA
- Michael A Hicks
  - Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94143, USA
  - Human Longevity, Inc, San Diego, CA 92121, USA
- John H Morris
  - Department of Pharmaceutical Chemistry, School of Pharmacy, University of California, San Francisco, CA 94143, USA
- Conrad C Huang
  - Department of Pharmaceutical Chemistry, School of Pharmacy, University of California, San Francisco, CA 94143, USA
- Elaine C Meng
  - Department of Pharmaceutical Chemistry, School of Pharmacy, University of California, San Francisco, CA 94143, USA
- Thomas E Ferrin
  - Department of Pharmaceutical Chemistry, School of Pharmacy, University of California, San Francisco, CA 94143, USA
  - California Institute for Quantitative Biosciences, University of California, San Francisco, CA 94158, USA
- Patricia C Babbitt
  - Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94143, USA
  - Department of Pharmaceutical Chemistry, School of Pharmacy, University of California, San Francisco, CA 94143, USA
  - California Institute for Quantitative Biosciences, University of California, San Francisco, CA 94158, USA
8
Knutson ST, Westwood BM, Leuthaeuser JB, Turner BE, Nguyendac D, Shea G, Kumar K, Hayden JD, Harper AF, Brown SD, Morris JH, Ferrin TE, Babbitt PC, Fetrow JS. An approach to functionally relevant clustering of the protein universe: Active site profile-based clustering of protein structures and sequences. Protein Sci 2017; 26:677-699. [PMID: 28054422 PMCID: PMC5368075 DOI: 10.1002/pro.3112] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 12/10/2016] [Accepted: 12/22/2016] [Indexed: 01/11/2023]
Abstract
Protein function identification remains a significant problem. Solving this problem at the molecular functional level would allow identification of mechanistic determinants: amino acids that distinguish details between functional families within a superfamily. Active site profiling was developed to identify mechanistic determinants. DASP and DASP2 were developed as tools to search sequence databases using active site profiling. Here, TuLIP (Two-Level Iterative clustering Process) is introduced as an iterative, divisive clustering process that utilizes active site profiling to separate structurally characterized superfamily members into functionally relevant clusters. Underlying TuLIP is the observation that functionally relevant families (curated by Structure-Function Linkage Database, SFLD) self-identify in DASP2 searches; clusters containing multiple functional families do not. Each TuLIP iteration produces candidate clusters, each evaluated to determine if it self-identifies using DASP2. If so, it is deemed a functionally relevant group. Divisive clustering continues until each structure is either a functionally relevant group member or a singlet. TuLIP is validated on enolase and glutathione transferase structures, superfamilies well-curated by SFLD. Correlation is strong, although the small number of structures prevents statistically significant analysis. TuLIP-identified enolase clusters are used in DASP2 GenBank searches to identify sequences sharing functional site features. Analysis shows a true positive rate of 96%, false negative rate of 4%, and maximum false positive rate of 4%. F-measure and performance analysis on the enolase search results and comparison to GEMMA and SCI-PHY demonstrate that TuLIP avoids the over-division problem of these methods. Mechanistic determinants for enolase families are evaluated and shown to correlate well with literature results.
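The control flow of the divisive process described above (split until every candidate cluster either self-identifies or is a singlet) can be sketched as a short recursion. This is an illustration of the loop structure only, not the TuLIP implementation: `self_identifies` and `split` are hypothetical callables standing in for the DASP2 self-identification test and the active-site-profile-based clustering step, and the toy versions below use a made-up family letter that would not be known in a real run.

```python
def divisive_clustering(cluster, self_identifies, split):
    """Split `cluster` recursively until each part self-identifies or is a singlet.

    `split` must return non-empty proper subsets so the recursion terminates.
    """
    if len(cluster) <= 1 or self_identifies(cluster):
        return [cluster]
    groups = []
    for part in split(cluster):
        groups.extend(divisive_clustering(part, self_identifies, split))
    return groups


def toy_self_identifies(cluster):
    # Toy stand-in: a cluster "self-identifies" when all members share
    # the same (made-up) family letter at the start of their name.
    return len({name[0] for name in cluster}) == 1


def toy_split(cluster):
    # Toy stand-in for profile-based clustering: cut the list in half.
    mid = len(cluster) // 2
    return [cluster[:mid], cluster[mid:]]


structures = ["A1", "A2", "B1", "C1"]
groups = divisive_clustering(structures, toy_self_identifies, toy_split)
print(groups)  # [['A1', 'A2'], ['B1'], ['C1']]
```

Using a pass/fail criterion as the stopping rule, rather than a fixed similarity threshold, is what lets the recursion stop at different depths for different branches.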
Affiliation(s)
- Stacy T. Knutson
  - Department of Physics, Wake Forest University, Winston-Salem, North Carolina 27106
  - Department of Computer Science, Wake Forest University, Winston-Salem, North Carolina 27106
- Brian M. Westwood
  - Department of Physics, Wake Forest University, Winston-Salem, North Carolina 27106
  - Department of Computer Science, Wake Forest University, Winston-Salem, North Carolina 27106
- Janelle B. Leuthaeuser
  - Molecular Genetics and Genomics Program, Wake Forest School of Medicine, Winston-Salem, North Carolina 27157
- Brandon E. Turner
  - Department of Physics, Wake Forest University, Winston-Salem, North Carolina 27106
- Don Nguyendac
  - Department of Physics, Wake Forest University, Winston-Salem, North Carolina 27106
- Gabrielle Shea
  - Department of Physics, Wake Forest University, Winston-Salem, North Carolina 27106
- Kiran Kumar
  - Department of Physics, Wake Forest University, Winston-Salem, North Carolina 27106
- Julia D. Hayden
  - Biochemistry Program, Dickinson College, Carlisle, Pennsylvania 17013
- Angela F. Harper
  - Department of Physics, Wake Forest University, Winston-Salem, North Carolina 27106
- Shoshana D. Brown
  - Department of Pharmaceutical Chemistry, University of California, San Francisco, California 94158
- John H. Morris
  - Department of Pharmaceutical Chemistry, University of California, San Francisco, California 94158
- Thomas E. Ferrin
  - Department of Pharmaceutical Chemistry, University of California, San Francisco, California 94158
- Patricia C. Babbitt
  - Department of Pharmaceutical Chemistry, University of California, San Francisco, California 94158
9
Harper AF, Leuthaeuser JB, Babbitt PC, Morris JH, Ferrin TE, Poole LB, Fetrow JS. An Atlas of Peroxiredoxins Created Using an Active Site Profile-Based Approach to Functionally Relevant Clustering of Proteins. PLoS Comput Biol 2017; 13:e1005284. [PMID: 28187133 PMCID: PMC5302317 DOI: 10.1371/journal.pcbi.1005284] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 06/15/2016] [Accepted: 12/06/2016] [Indexed: 12/15/2022] Open
Abstract
Peroxiredoxins (Prxs or Prdxs) are a large protein superfamily of antioxidant enzymes that rapidly detoxify damaging peroxides and/or affect signal transduction and, thus, have roles in proliferation, differentiation, and apoptosis. Prx superfamily members are widespread across phylogeny and multiple methods have been developed to classify them. Here we present an updated atlas of the Prx superfamily identified using a novel method called MISST (Multi-level Iterative Sequence Searching Technique). MISST is an iterative search process developed to be both agglomerative, to add sequences containing similar functional site features, and divisive, to split groups when functional site features suggest distinct functionally-relevant clusters. Superfamily members need not be identified initially; MISST begins with a minimal representative set of known structures and searches GenBank iteratively. Further, the method's novelty lies in the manner in which isofunctional groups are selected; rather than use a single or shifting threshold to identify clusters, the groups are deemed isofunctional when they pass a self-identification criterion, such that the group identifies itself and nothing else in a search of GenBank. The method was preliminarily validated on the Prxs, as the Prxs presented challenges of both agglomeration and division. For example, previous sequence analysis clustered the Prx functional families Prx1 and Prx6 into one group. Subsequent expert analysis clearly identified Prx6 as a distinct functionally relevant group. The MISST process distinguishes these two closely related, though functionally distinct, families. Through MISST search iterations, over 38,000 Prx sequences were identified, which the method divided into six isofunctional clusters, consistent with previous expert analysis. The results represent the most complete computational functional analysis of proteins comprising the Prx superfamily. The feasibility of this novel method is demonstrated by the Prx superfamily results, laying the foundation for potential functionally relevant clustering of the universe of protein sequences.
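The self-identification criterion stated above (a group "identifies itself and nothing else" in a database search) amounts to a set-equality test between the query group and the search hits. The sketch below is a hedged illustration of that test alone: the toy database, its made-up signature labels, and the `toy_search` function are assumptions that stand in for a real profile-based GenBank search and are not part of MISST.

```python
def self_identifies(group, search):
    """Return True when searching with `group` retrieves exactly its own members."""
    return set(search(group)) == set(group)


# Hypothetical toy database: sequence ID -> made-up functional-site signature label.
toy_database = {"prx1_a": "SIG-A", "prx1_b": "SIG-A", "prx6_a": "SIG-B"}


def toy_search(group):
    # Stand-in for a database search: return every ID whose signature
    # matches the signature of at least one member of the query group.
    signatures = {toy_database[member] for member in group}
    return [seq_id for seq_id, sig in toy_database.items() if sig in signatures]


print(self_identifies(["prx1_a", "prx1_b"], toy_search))  # True: hits equal the group
print(self_identifies(["prx1_a"], toy_search))            # False: a non-member is also hit
print(self_identifies(["prx1_a", "prx6_a"], toy_search))  # False: mixed group pulls in prx1_b
```

A group that fails because it retrieves extra sequences is a candidate for agglomeration, while a mixed group is a candidate for division, matching the two directions described in the abstract.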
Affiliation(s)
- Angela F. Harper
  - Department of Physics, Wake Forest University, Winston-Salem, North Carolina, United States of America
- Janelle B. Leuthaeuser
  - Department of Molecular Genetics and Genomics, Wake Forest School of Medicine, Winston-Salem, North Carolina, United States of America
- Patricia C. Babbitt
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco School of Pharmacy, San Francisco, California, United States of America
- John H. Morris
  - Department of Pharmaceutical Chemistry, University of California San Francisco School of Pharmacy, San Francisco, California, United States of America
- Thomas E. Ferrin
  - Department of Pharmaceutical Chemistry, University of California San Francisco School of Pharmacy, San Francisco, California, United States of America
- Leslie B. Poole
  - Department of Biochemistry, Wake Forest School of Medicine, Winston-Salem, North Carolina, United States of America
- Jacquelyn S. Fetrow
  - Department of Chemistry, University of Richmond, Richmond, Virginia, United States of America
10
Leuthaeuser JB, Morris JH, Harper AF, Ferrin TE, Babbitt PC, Fetrow JS. DASP3: identification of protein sequences belonging to functionally relevant groups. BMC Bioinformatics 2016; 17:458. [PMID: 27835946 PMCID: PMC5106842 DOI: 10.1186/s12859-016-1295-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 04/22/2016] [Accepted: 10/20/2016] [Indexed: 01/26/2023] Open
Abstract
Background: Development of automatable processes for clustering proteins into functionally relevant groups is a critical hurdle as an increasing number of sequences are deposited into databases. Experimental function determination is exceptionally time-consuming and cannot keep pace with the identification of protein sequences. A tool, DASP (Deacon Active Site Profiler), was previously developed to identify protein sequences with active site similarity to a query set. Development of two iterative, automatable methods for clustering proteins into functionally relevant groups exposed algorithmic limitations of DASP. Results: The accuracy and efficiency of DASP were significantly improved through six algorithmic enhancements implemented in two stages: DASP2 and DASP3. Validation demonstrated DASP3 provides greater score separation between true positives and false positives than earlier versions. In addition, DASP3 shows similar performance to previous versions in clustering protein structures into isofunctional groups (validated against manual curation), but DASP3 gathers and clusters protein sequences into isofunctional groups more efficiently than DASP and DASP2. Conclusions: DASP algorithmic enhancements resulted in improved efficiency and accuracy of identifying proteins that contain active site features similar to those of the query set. These enhancements provide incremental improvement in structure database searches and initial sequence database searches; however, the enhancements show significant improvement in iterative sequence searches, suggesting DASP3 is an appropriate tool for the iterative processes required for clustering proteins into isofunctional groups.
Affiliation(s)
- Janelle B Leuthaeuser
  - Molecular Genetics and Genomics Program, Wake Forest University, Winston-Salem, NC, 27106, USA
  - Present address: University of Richmond, Gottwald Hall C302, Richmond, VA, 23173, USA
- John H Morris
  - Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, CA, 94158, USA
- Angela F Harper
  - Department of Physics, Wake Forest University, Winston-Salem, NC, 27106, USA
- Thomas E Ferrin
  - Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, CA, 94158, USA
- Patricia C Babbitt
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, 94158, USA
- Jacquelyn S Fetrow
  - Department of Chemistry, University of Richmond, Richmond, VA, 23173, USA
11
Berezovsky IN, Guarnera E, Zheng Z. Basic units of protein structure, folding, and function. PROGRESS IN BIOPHYSICS AND MOLECULAR BIOLOGY 2016; 128:85-99. [PMID: 27697476 DOI: 10.1016/j.pbiomolbio.2016.09.009] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.0] [Received: 07/29/2016] [Revised: 09/05/2016] [Accepted: 09/26/2016] [Indexed: 10/20/2022]
Abstract
Study of the hierarchy of domain structure with alternative sets of domains and analysis of discontinuous domains, consisting of remote segments of the polypeptide chain, raised a question about the minimal structural unit of the protein domain. The hypothesis on the decisive role of the polypeptide backbone in determining the elementary units of globular proteins has led to the discovery of closed loops. It is reviewed here how closed loops form the loop-n-lock structure of proteins, providing the foundation for stability and designability of protein folds/domains and underlying their co-translational folding. Simplified protein sequences are considered here with the aim of exploring the basic principles that presumably dominated the folding and stability of proteins in the early stages of structural evolution. Elementary functional loops (EFLs), closed loops with one or a few catalytic residues, are, in turn, units of protein function. They are apparent descendants of the prebiotic ring-like peptides, which gave rise to the first functional folds/domains by being fused at the beginning of the evolution of protein structure. It is also shown how evolutionary relations between protein functional superfamilies and folds delineated with the help of EFLs can contribute to establishing the rules for design of desired enzymatic functions. Generalized descriptors of the elementary functions are proposed to be used as basic units in future computational design.
Affiliation(s)
- Igor N Berezovsky
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, 138671, Singapore
  - Department of Biological Sciences (DBS), National University of Singapore (NUS), 8 Medical Drive, 117579, Singapore
- Enrico Guarnera
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, 138671, Singapore
- Zejun Zheng
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, 138671, Singapore
12
Comparing atom-based with residue-based descriptors in predicting binding site similarity: do backbone atoms matter? Future Med Chem 2016; 8:1871-1885. [PMID: 27629811 DOI: 10.4155/fmc-2016-0077] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/30/2023] Open
Abstract
Aim: We question the level of detail required in protein 3D-representation to detect site similarity which is relevant for polypharmacology prediction. Results: We modified the in-house program SiteAlign to replace generic pharmacophoric descriptors of cavity-lining amino acids by descriptors accounting for solvent exposure. Benchmarking the novel, atom-based, method (SiteAlign2) revealed no global improvement of performance. However, in the rare cases of no sequence or global structure similarities between the compared proteins, SiteAlign2 was more successful if backbone atoms are key determinants of ligand binding. Conclusion: SiteAlign suits the comparison of binding sites for close or distant homologs. SiteAlign2 provides a better insight into the physical model of site similarity between nonhomologs, but at the expense of an increased sensitivity to atomic coordinates.
13
Poole LB, Nelson KJ. Distribution and Features of the Six Classes of Peroxiredoxins. Mol Cells 2016; 39:53-9. [PMID: 26810075 PMCID: PMC4749874 DOI: 10.14348/molcells.2016.2330] [Citation(s) in RCA: 54] [Impact Index Per Article: 6.0] [Received: 12/06/2015] [Accepted: 12/09/2015] [Indexed: 12/03/2022] Open
Abstract
Peroxiredoxins are cysteine-dependent peroxide reductases that group into 6 different, structurally discernable classes. In 2011, our research team reported the application of a bioinformatic approach called active site profiling to extract active site-proximal sequence segments from the 29 distinct, structurally-characterized peroxiredoxins available at the time. These extracted sequences were then used to create unique profiles for the six groups which were subsequently used to search GenBank(nr), allowing identification of ∼3500 peroxiredoxin sequences and their respective subgroups. Summarized in this minireview are the features and phylogenetic distributions of each of these peroxiredoxin subgroups; an example is also provided illustrating the use of the web accessible, searchable database known as PREX to identify subfamily-specific peroxiredoxin sequences for the organism Vitis vinifera (grape).
Affiliation(s)
- Leslie B. Poole
  - Department of Biochemistry, Wake Forest School of Medicine, Winston-Salem, NC 27157, USA
- Kimberly J. Nelson
  - Department of Biochemistry, Wake Forest School of Medicine, Winston-Salem, NC 27157, USA
  - Department of Chemistry, Wake Forest University, Winston-Salem, NC 27109, USA