Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Leuthaeuser JB, Knutson ST, Kumar K, Babbitt PC, Fetrow JS. Comparison of topological clustering within protein networks using edge metrics that evaluate full sequence, full structure, and active site microenvironment similarity. Protein Sci 2015;24:1423-39. [PMID: 26073648 PMCID: PMC4570537 DOI: 10.1002/pro.2724] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/10/2015] [Accepted: 06/10/2015] [Indexed: 01/27/2023]

For:	Leuthaeuser JB, Knutson ST, Kumar K, Babbitt PC, Fetrow JS. Comparison of topological clustering within protein networks using edge metrics that evaluate full sequence, full structure, and active site microenvironment similarity. Protein Sci 2015;24:1423-39. [PMID: 26073648 PMCID: PMC4570537 DOI: 10.1002/pro.2724] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/10/2015] [Accepted: 06/10/2015] [Indexed: 01/27/2023]

Number

Cited by Other Article(s)

Kennedy EN, Foster CA, Barr SA, Bourret RB. General strategies for using amino acid sequence data to guide biochemical investigation of protein function. Biochem Soc Trans 2022;50:1847-1858. [PMID: 36416676 PMCID: PMC10257402 DOI: 10.1042/bst20220849] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/18/2022] [Revised: 11/04/2022] [Accepted: 11/09/2022] [Indexed: 11/24/2022]

Abstract

The rapid increase of '-omics' data warrants the reconsideration of experimental strategies to investigate general protein function. Studying individual members of a protein family is likely insufficient to provide a complete mechanistic understanding of family functions, especially for diverse families with thousands of known members. Strategies that exploit large amounts of available amino acid sequence data can inspire and guide biochemical experiments, generating broadly applicable insights into a given family. Here we review several methods that utilize abundant sequence data to focus experimental efforts and identify features truly representative of a protein family or domain. First, coevolutionary relationships between residues within primary sequences can be successfully exploited to identify structurally and/or functionally important positions for experimental investigation. Second, functionally important variable residue positions typically occupy a limited sequence space, a property useful for guiding biochemical characterization of the effects of the most physiologically and evolutionarily relevant amino acids. Third, amino acid sequence variation within domains shared between different protein families can be used to sort a particular domain into multiple subtypes, inspiring further experimental designs. Although generally applicable to any kind of protein domain because they depend solely on amino acid sequences, the second and third approaches are reviewed in detail because they appear to have been used infrequently and offer immediate opportunities for new advances. Finally, we speculate that future technologies capable of analyzing and manipulating conserved and variable aspects of the three-dimensional structures of a protein family could lead to broad insights not attainable by current methods.

Collapse

Zong N, Li N, Wen A, Ngo V, Yu Y, Huang M, Chowdhury S, Jiang C, Fu S, Weinshilboum R, Jiang G, Hunter L, Liu H. BETA: a comprehensive benchmark for computational drug-target prediction. Brief Bioinform 2022;23:6596989. [PMID: 35649342 PMCID: PMC9294420 DOI: 10.1093/bib/bbac199] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/14/2022] [Revised: 04/10/2022] [Accepted: 04/29/2022] [Indexed: 11/14/2022] Open

González JM. Visualizing the superfamily of metallo-β-lactamases through sequence similarity network neighborhood connectivity analysis. Heliyon 2021;7:e05867. [PMID: 33426353 PMCID: PMC7785958 DOI: 10.1016/j.heliyon.2020.e05867] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/11/2020] [Revised: 11/19/2020] [Accepted: 12/23/2020] [Indexed: 12/13/2022] Open

Rosen MR, Leuthaeuser JB, Parish CA, Fetrow JS. Isofunctional Clustering and Conformational Analysis of the Arsenate Reductase Superfamily Reveals Nine Distinct Clusters. Biochemistry 2020;59:4262-4284. [PMID: 33135415 DOI: 10.1021/acs.biochem.0c00651] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]

Abstract

Arsenate reductase (ArsC) is a superfamily of enzymes that reduce arsenate. Due to active site similarities, some ArsC can function as low-molecular weight protein tyrosine phosphatases (LMW-PTPs). Broad superfamily classifications align with redox partners (Trx- or Grx-linked). To understand this superfamily's mechanistic diversity, the ArsC superfamily is classified on the basis of active site features utilizing the tools TuLIP (two-level iterative clustering process) and autoMISST (automated multilevel iterative sequence searching technique). This approach identified nine functionally relevant (perhaps isofunctional) protein groups. Five groups exhibit distinct ArsC mechanisms. Three are Grx-linked: group 4AA (classical ArsC), group 3AAA (YffB-like), and group 5BAA. Two are Trx-linked: groups 6AAAAA and 7AAAAAAAA. One is an Spx-like transcriptional regulatory group, group 5AAA. Three are potential LMW-PTP groups: groups 7BAAAA, and 7AAAABAA, which have not been previously identified, and the well-studied LMW-PTP family group 8AAA. Molecular dynamics simulations were utilized to explore functional site details. In several families, we confirm and add detail to literature-based mechanistic information. Mechanistic roles are hypothesized for conserved active site residues in several families. In three families, simulations of the unliganded structure sample specific conformational ensembles, which are proposed to represent either a more ligand-binding-competent conformation or a pathway toward a more binding-competent state; these active sites may be designed to traverse high-energy barriers to the lower-energy conformations necessary to more readily bind ligands. This more detailed biochemical understanding of ArsC and ArsC-like PTP mechanisms opens possibilities for further understanding of arsenate bioremediation and the LMW-PTP mechanism.

Collapse

Mazmanian K, Sargsyan K, Lim C. How the Local Environment of Functional Sites Regulates Protein Function. J Am Chem Soc 2020;142:9861-9871. [PMID: 32407086 DOI: 10.1021/jacs.0c02430] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]

Fetrow JS, Babbitt PC. New computational approaches to understanding molecular protein function. PLoS Comput Biol 2018;14:e1005756. [PMID: 29621256 PMCID: PMC5886384 DOI: 10.1371/journal.pcbi.1005756] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/11/2023] Open

Holliday GL, Brown SD, Akiva E, Mischel D, Hicks MA, Morris JH, Huang CC, Meng EC, Pegg SCH, Ferrin TE, Babbitt PC. Biocuration in the structure-function linkage database: the anatomy of a superfamily. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2017;2017:3074783. [PMID: 28365730 PMCID: PMC5467563 DOI: 10.1093/database/bax006] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/28/2016] [Accepted: 01/23/2017] [Indexed: 12/11/2022]

Knutson ST, Westwood BM, Leuthaeuser JB, Turner BE, Nguyendac D, Shea G, Kumar K, Hayden JD, Harper AF, Brown SD, Morris JH, Ferrin TE, Babbitt PC, Fetrow JS. An approach to functionally relevant clustering of the protein universe: Active site profile-based clustering of protein structures and sequences. Protein Sci 2017;26:677-699. [PMID: 28054422 PMCID: PMC5368075 DOI: 10.1002/pro.3112] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/10/2016] [Accepted: 12/22/2016] [Indexed: 01/11/2023]

Abstract

Protein function identification remains a significant problem. Solving this problem at the molecular functional level would allow mechanistic determinant identification-amino acids that distinguish details between functional families within a superfamily. Active site profiling was developed to identify mechanistic determinants. DASP and DASP2 were developed as tools to search sequence databases using active site profiling. Here, TuLIP (Two-Level Iterative clustering Process) is introduced as an iterative, divisive clustering process that utilizes active site profiling to separate structurally characterized superfamily members into functionally relevant clusters. Underlying TuLIP is the observation that functionally relevant families (curated by Structure-Function Linkage Database, SFLD) self-identify in DASP2 searches; clusters containing multiple functional families do not. Each TuLIP iteration produces candidate clusters, each evaluated to determine if it self-identifies using DASP2. If so, it is deemed a functionally relevant group. Divisive clustering continues until each structure is either a functionally relevant group member or a singlet. TuLIP is validated on enolase and glutathione transferase structures, superfamilies well-curated by SFLD. Correlation is strong; small numbers of structures prevent statistically significant analysis. TuLIP-identified enolase clusters are used in DASP2 GenBank searches to identify sequences sharing functional site features. Analysis shows a true positive rate of 96%, false negative rate of 4%, and maximum false positive rate of 4%. F-measure and performance analysis on the enolase search results and comparison to GEMMA and SCI-PHY demonstrate that TuLIP avoids the over-division problem of these methods. Mechanistic determinants for enolase families are evaluated and shown to correlate well with literature results.

Collapse

Harper AF, Leuthaeuser JB, Babbitt PC, Morris JH, Ferrin TE, Poole LB, Fetrow JS. An Atlas of Peroxiredoxins Created Using an Active Site Profile-Based Approach to Functionally Relevant Clustering of Proteins. PLoS Comput Biol 2017;13:e1005284. [PMID: 28187133 PMCID: PMC5302317 DOI: 10.1371/journal.pcbi.1005284] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/15/2016] [Accepted: 12/06/2016] [Indexed: 12/15/2022] Open

Abstract

Peroxiredoxins (Prxs or Prdxs) are a large protein superfamily of antioxidant enzymes that rapidly detoxify damaging peroxides and/or affect signal transduction and, thus, have roles in proliferation, differentiation, and apoptosis. Prx superfamily members are widespread across phylogeny and multiple methods have been developed to classify them. Here we present an updated atlas of the Prx superfamily identified using a novel method called MISST (Multi-level Iterative Sequence Searching Technique). MISST is an iterative search process developed to be both agglomerative, to add sequences containing similar functional site features, and divisive, to split groups when functional site features suggest distinct functionally-relevant clusters. Superfamily members need not be identified initially-MISST begins with a minimal representative set of known structures and searches GenBank iteratively. Further, the method's novelty lies in the manner in which isofunctional groups are selected; rather than use a single or shifting threshold to identify clusters, the groups are deemed isofunctional when they pass a self-identification criterion, such that the group identifies itself and nothing else in a search of GenBank. The method was preliminarily validated on the Prxs, as the Prxs presented challenges of both agglomeration and division. For example, previous sequence analysis clustered the Prx functional families Prx1 and Prx6 into one group. Subsequent expert analysis clearly identified Prx6 as a distinct functionally relevant group. The MISST process distinguishes these two closely related, though functionally distinct, families. Through MISST search iterations, over 38,000 Prx sequences were identified, which the method divided into six isofunctional clusters, consistent with previous expert analysis. The results represent the most complete computational functional analysis of proteins comprising the Prx superfamily. The feasibility of this novel method is demonstrated by the Prx superfamily results, laying the foundation for potential functionally relevant clustering of the universe of protein sequences.

Collapse

Leuthaeuser JB, Morris JH, Harper AF, Ferrin TE, Babbitt PC, Fetrow JS. DASP3: identification of protein sequences belonging to functionally relevant groups. BMC Bioinformatics 2016;17:458. [PMID: 27835946 PMCID: PMC5106842 DOI: 10.1186/s12859-016-1295-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/22/2016] [Accepted: 10/20/2016] [Indexed: 01/26/2023] Open

Abstract

Background

Development of automatable processes for clustering proteins into functionally relevant groups is a critical hurdle as an increasing number of sequences are deposited into databases. Experimental function determination is exceptionally time-consuming and can’t keep pace with the identification of protein sequences. A tool, DASP (Deacon Active Site Profiler), was previously developed to identify protein sequences with active site similarity to a query set. Development of two iterative, automatable methods for clustering proteins into functionally relevant groups exposed algorithmic limitations to DASP.

Results

The accuracy and efficiency of DASP was significantly improved through six algorithmic enhancements implemented in two stages: DASP2 and DASP3. Validation demonstrated DASP3 provides greater score separation between true positives and false positives than earlier versions. In addition, DASP3 shows similar performance to previous versions in clustering protein structures into isofunctional groups (validated against manual curation), but DASP3 gathers and clusters protein sequences into isofunctional groups more efficiently than DASP and DASP2.

Conclusions

DASP algorithmic enhancements resulted in improved efficiency and accuracy of identifying proteins that contain active site features similar to those of the query set. These enhancements provide incremental improvement in structure database searches and initial sequence database searches; however, the enhancements show significant improvement in iterative sequence searches, suggesting DASP3 is an appropriate tool for the iterative processes required for clustering proteins into isofunctional groups.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-016-1295-z) contains supplementary material, which is available to authorized users.

Collapse

Berezovsky IN, Guarnera E, Zheng Z. Basic units of protein structure, folding, and function. PROGRESS IN BIOPHYSICS AND MOLECULAR BIOLOGY 2016;128:85-99. [PMID: 27697476 DOI: 10.1016/j.pbiomolbio.2016.09.009] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/29/2016] [Revised: 09/05/2016] [Accepted: 09/26/2016] [Indexed: 10/20/2022]

Comparing atom-based with residue-based descriptors in predicting binding site similarity: do backbone atoms matter? Future Med Chem 2016;8:1871-1885. [PMID: 27629811 DOI: 10.4155/fmc-2016-0077] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/30/2023] Open

Poole LB, Nelson KJ. Distribution and Features of the Six Classes of Peroxiredoxins. Mol Cells 2016;39:53-9. [PMID: 26810075 PMCID: PMC4749874 DOI: 10.14348/molcells.2016.2330] [Citation(s) in RCA: 54] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/06/2015] [Accepted: 12/09/2015] [Indexed: 12/03/2022] Open