1
Etcheverry M, Moulin-Frier C, Oudeyer PY, Levin M. AI-driven automated discovery tools reveal diverse behavioral competencies of biological networks. eLife 2025; 13:RP92683. [PMID: 39804159 PMCID: PMC11729405 DOI: 10.7554/elife.92683] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 01/16/2025]
Abstract
Many applications in biomedicine and synthetic bioengineering rely on understanding, mapping, predicting, and controlling the complex behavior of chemical and genetic networks. The emerging field of diverse intelligence investigates the problem-solving capacities of unconventional agents. However, few quantitative tools exist for exploring the competencies of non-conventional systems. Here, we view gene regulatory networks (GRNs) as agents navigating a problem space and develop automated tools to map the robust goal states GRNs can reach despite perturbations. Our contributions include: (1) Adapting curiosity-driven exploration algorithms from AI to discover the range of reachable goal states of GRNs, and (2) Proposing empirical tests inspired by behaviorist approaches to assess their navigation competencies. Our data shows that models inferred from biological data can reach a wide spectrum of steady states, exhibiting various competencies in physiological network dynamics without requiring structural changes in network properties or connectivity. We also explore the applicability of these 'behavioral catalogs' for comparing evolved competencies across biological networks, for designing drug interventions in biomedical contexts and synthetic gene networks for bioengineering. These tools and the emphasis on behavior-shaping open new paths for efficiently exploring the complex behavior of biological networks. For the interactive version of this paper, please visit https://developmentalsystems.org/curious-exploration-of-grn-competencies.
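The curiosity-driven exploration loop mentioned above can be illustrated with a minimal sketch of goal-exploration under stated assumptions. A hypothetical two-gene mutual-repression system stands in for a GRN model inferred from data. The loop samples a target outcome, reuses the parameters of the closest outcome reached so far and perturbs them. The functions `simulate` and `explore`, the toy dynamics and all parameter values are illustrative and are not the authors' implementation. The sketch also omits the perturbation-robustness tests the abstract describes.

```python
import math
import random


def simulate(params, steps=200, dt=0.05):
    """Hypothetical toy 2-gene system (mutual repression), Euler-integrated.

    params = (u, v): constant inputs added to each gene's production rate.
    Returns the final (x, y) state, used here as the 'outcome'.
    """
    u, v = params
    x, y = 0.0, 0.0
    for _ in range(steps):
        dx = u + 1.0 / (1.0 + y * y) - x  # y represses x
        dy = v + 1.0 / (1.0 + x * x) - y  # x represses y
        x += dt * dx
        y += dt * dy
    return (x, y)


def explore(n_random=20, n_goals=200, sigma=0.1, seed=0):
    """Random-goal exploration: returns a list of (params, outcome) pairs."""
    rng = random.Random(seed)
    history = []
    # 1) Bootstrap phase: random parameter samples.
    for _ in range(n_random):
        p = (rng.uniform(0.0, 2.0), rng.uniform(0.0, 2.0))
        history.append((p, simulate(p)))
    # 2) Goal-directed phase: sample a goal in outcome space, pick the past
    #    attempt whose outcome is nearest to it, and mutate its parameters.
    for _ in range(n_goals):
        goal = (rng.uniform(0.0, 3.0), rng.uniform(0.0, 3.0))
        best_p, _ = min(history, key=lambda h: math.dist(h[1], goal))
        p = tuple(min(2.0, max(0.0, c + rng.gauss(0.0, sigma))) for c in best_p)
        history.append((p, simulate(p)))
    return history
```

Collecting `[outcome for _, outcome in explore()]` gives a crude "catalog" of reachable steady states. Sampling goals in outcome space, rather than sampling parameters directly, is what pushes the search toward outcomes it has not yet reached.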
Affiliation(s)
- Michael Levin
- Allen Discovery Center, Tufts University, Medford, United States
2
Hornung BVH, Terrapon N. An objective criterion to evaluate sequence-similarity networks helps in dividing the protein family sequence space. PLoS Comput Biol 2023; 19:e1010881. [PMID: 37585436 PMCID: PMC10461819 DOI: 10.1371/journal.pcbi.1010881] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/27/2022] [Revised: 08/28/2023] [Accepted: 01/18/2023] [Indexed: 08/18/2023]
Abstract
The deluge of genomic data raises various challenges for computational protein annotation. The definition of superfamilies, based on conserved folds, or of families, showing more recent homology signatures, allows a first categorization of the sequence space. However, for precise functional annotation or the identification of the unexplored parts within a family, a division into subfamilies is essential. As curators of an expert database, the Carbohydrate Active Enzymes database (CAZy), we began, more than 15 years ago, to manually define subfamilies based on phylogeny reconstruction. However, facing the increasing amount of sequence and functional data, we required more scalable and reproducible methods. The recently popularized sequence similarity networks (SSNs) make it possible to handle very large families and to compute many subfamily schemes. Still, the choice of the optimal SSN subfamily scheme has so far relied only on expert knowledge, without any data-driven guidance from within the network. In this study, we therefore investigated several network properties to determine a criterion that curators can use to evaluate the quality of subfamily assignments. The closeness centrality criterion, a network property that indicates connectedness within the network, closely matches the decisions of expert curators for eight distinct protein families. Closeness centrality also suggests that in some cases multiple levels of subfamilies could be possible, depending on the granularity of the research question, and it indicates when no subfamily has emerged during a family's evolution. We finally used closeness centrality to create subfamilies in four families of the CAZy database, providing a finer functional annotation and highlighting subfamilies without biochemically characterized members for potential future discoveries.
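As a rough illustration of the network property involved, the sketch below builds a sequence similarity network by keeping pairs whose score reaches a threshold. It then computes each node's closeness centrality within its connected component, defined as the number of other reachable nodes divided by the sum of shortest-path lengths to them. The input format (a dict of pairwise scores), the function names and the threshold are assumptions. How the authors aggregate closeness values into their subfamily-quality criterion is not reproduced here.

```python
from collections import deque


def build_ssn(pairs, threshold):
    """Undirected SSN as adjacency sets.

    pairs: dict mapping (seq_a, seq_b) -> similarity score (e.g. a bit score).
    Every sequence becomes a node; only pairs scoring >= threshold become edges.
    """
    graph = {}
    for (a, b), score in pairs.items():
        graph.setdefault(a, set())
        graph.setdefault(b, set())
        if score >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def bfs_distances(graph, source):
    """Unweighted shortest-path lengths from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def closeness_centrality(graph):
    """Closeness within each node's component; isolated nodes get 0.0."""
    result = {}
    for node in graph:
        dist = bfs_distances(graph, node)
        total = sum(dist.values())
        result[node] = (len(dist) - 1) / total if total > 0 else 0.0
    return result
```

Raising the threshold splits the network into more components (candidate subfamilies) and changes these values, which is the kind of signal a threshold-selection criterion can track. Libraries such as networkx offer a closeness function, but by default it rescales values by component size, so results differ from this per-component version.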
Affiliation(s)
- Nicolas Terrapon
- Aix Marseille Université, CNRS, UMR 7257 AFMB, Marseille, France
- INRAE, USC 1408 AFMB, Marseille, France
3
Verma R, Raj S, Berry U, Ranjith-Kumar CT, Surjit M. Drug Repurposing for COVID-19 Therapy: Pipeline, Current Status and Challenges. Drug Repurposing for Emerging Infectious Diseases and Cancer 2023:451-478. [DOI: 10.1007/978-981-19-5399-6_19] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/03/2025]
4
Biswas S, Clawson W, Levin M. Learning in Transcriptional Network Models: Computational Discovery of Pathway-Level Memory and Effective Interventions. Int J Mol Sci 2022; 24:285. [PMID: 36613729 PMCID: PMC9820177 DOI: 10.3390/ijms24010285] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 10/01/2022] [Revised: 11/23/2022] [Accepted: 12/20/2022] [Indexed: 12/28/2022]
Abstract
Trainability, in any substrate, refers to the ability to change future behavior based on past experiences. An understanding of such capacity within biological cells and tissues would enable a particularly powerful set of methods for prediction and control of their behavior through specific patterns of stimuli. This top-down mode of control (as an alternative to bottom-up modification of hardware) has been extensively exploited by computer science and the behavioral sciences; in biology, however, it is usually reserved for organism-level behavior in animals with brains, such as training animals towards a desired response. Exciting work in the field of basal cognition has begun to reveal degrees and forms of unconventional memory in non-neural tissues and even in subcellular biochemical dynamics. Here, we characterize biological gene regulatory circuit models and protein pathways and find them capable of several different kinds of memory. We extend prior results on learning in binary transcriptional networks to continuous models and identify specific interventions (regimes of stimulation, as opposed to network rewiring) that abolish undesirable network behavior such as drug pharmacoresistance and drug sensitization. We also explore the stability of created memories by assessing their long-term behavior and find that most memories do not decay over long time periods. Additionally, we find that the memory properties are quite robust to noise; surprisingly, in many cases noise actually increases memory potential. We examine various network properties associated with these behaviors and find that no single network property is indicative of memory. Random networks do not show memory behavior similar to that of models of biological processes, indicating that generic network dynamics are not solely responsible for trainability. Rational control of dynamic pathway function using stimuli derived from computational models opens the door to empirical studies of proto-cognitive capacities in unconventional embodiments and suggests numerous possible applications in biomedicine, where behavior shaping of pathway responses stands as a potential alternative to gene therapy.
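The simplest form of the memory discussed above is a lasting change in a network's state after a transient stimulus, with no rewiring. A hypothetical one-gene positive-feedback model illustrates it: dx/dt = basal + vmax·x²/(1 + x²) − x + s(t), Euler-integrated. The equation, the parameters (basal = 0.02, vmax = 3, which make the system bistable) and the pulse schedule are assumptions chosen for illustration. This is not one of the authors' models, which test richer memory types such as associative conditioning.

```python
def simulate(pulse_start=20.0, pulse_len=5.0, pulse_amp=1.0, t_end=60.0, dt=0.01):
    """Euler integration of a hypothetical self-activating gene.

    dx/dt = basal + vmax * x^2 / (1 + x^2) - x + s(t),
    where s(t) = pulse_amp during [pulse_start, pulse_start + pulse_len), else 0.
    Returns a list of (t, x) samples.
    """
    basal, vmax = 0.02, 3.0  # assumed values giving low (~0.02) and high (~2.6) stable states
    x, t = 0.0, 0.0
    trace = []
    while t < t_end:
        s = pulse_amp if pulse_start <= t < pulse_start + pulse_len else 0.0
        dx = basal + vmax * x * x / (1.0 + x * x) - x + s
        x += dt * dx
        t += dt
        trace.append((t, x))
    return trace
```

With the default pulse, x settles near the low state, is pushed past the unstable threshold (about 0.36), and stays high long after the stimulus is removed. A pulse that is too short (for example `pulse_len=0.1`) does not cross the threshold, so x returns to the low state. In this toy, the outcome depends on the stimulation regime rather than on the circuit's wiring.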
Affiliation(s)
- Surama Biswas
- Allen Discovery Center, Tufts University, Medford, MA 02155, USA
- Department of Computer Science & Engineering and Information Technology, Meghnad Saha Institute of Technology, Kolkata 700150, India
- Wesley Clawson
- Allen Discovery Center, Tufts University, Medford, MA 02155, USA
- Michael Levin
- Allen Discovery Center, Tufts University, Medford, MA 02155, USA
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA
5
Kennedy EN, Foster CA, Barr SA, Bourret RB. General strategies for using amino acid sequence data to guide biochemical investigation of protein function. Biochem Soc Trans 2022; 50:1847-1858. [PMID: 36416676 PMCID: PMC10257402 DOI: 10.1042/bst20220849] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2022] [Revised: 11/04/2022] [Accepted: 11/09/2022] [Indexed: 11/24/2022]
Abstract
The rapid increase of '-omics' data warrants the reconsideration of experimental strategies to investigate general protein function. Studying individual members of a protein family is likely insufficient to provide a complete mechanistic understanding of family functions, especially for diverse families with thousands of known members. Strategies that exploit large amounts of available amino acid sequence data can inspire and guide biochemical experiments, generating broadly applicable insights into a given family. Here we review several methods that utilize abundant sequence data to focus experimental efforts and identify features truly representative of a protein family or domain. First, coevolutionary relationships between residues within primary sequences can be successfully exploited to identify structurally and/or functionally important positions for experimental investigation. Second, functionally important variable residue positions typically occupy a limited sequence space, a property useful for guiding biochemical characterization of the effects of the most physiologically and evolutionarily relevant amino acids. Third, amino acid sequence variation within domains shared between different protein families can be used to sort a particular domain into multiple subtypes, inspiring further experimental designs. Although generally applicable to any kind of protein domain because they depend solely on amino acid sequences, the second and third approaches are reviewed in detail because they appear to have been used infrequently and offer immediate opportunities for new advances. Finally, we speculate that future technologies capable of analyzing and manipulating conserved and variable aspects of the three-dimensional structures of a protein family could lead to broad insights not attainable by current methods.
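The first strategy above, using coevolution between alignment positions, can be sketched with the simplest coevolution score: the mutual information (MI) between two columns of a multiple sequence alignment. Raw MI is used here only as an illustration. Practical methods correct for phylogenetic bias and indirect couplings (for example, average product correction or direct coupling analysis), and the review does not prescribe this particular function.

```python
import math
from collections import Counter


def column_mutual_information(msa, i, j):
    """Mutual information (natural log) between alignment columns i and j.

    msa: list of equal-length aligned sequences (strings).
    """
    n = len(msa)
    pi = Counter(seq[i] for seq in msa)            # residue counts in column i
    pj = Counter(seq[j] for seq in msa)            # residue counts in column j
    pij = Counter((seq[i], seq[j]) for seq in msa) # joint residue-pair counts
    mi = 0.0
    for (a, b), count in pij.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((pi[a] / n) * (pj[b] / n)))
    return mi
```

Two columns that always change together (A with K, G with E) score ln 2, while columns whose residues vary independently score 0. High-scoring pairs are the positions such an analysis would flag for experimental follow-up.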
Affiliation(s)
- Emily N. Kennedy
- Department of Microbiology & Immunology, University of North Carolina, Chapel Hill, NC, United States of America
- Clay A. Foster
- Department of Pediatrics, Section Hematology/Oncology, University of Oklahoma Health Sciences Center, Oklahoma City, Oklahoma, United States of America
- Sarah A. Barr
- Department of Microbiology & Immunology, University of North Carolina, Chapel Hill, NC, United States of America
- Robert B. Bourret
- Department of Microbiology & Immunology, University of North Carolina, Chapel Hill, NC, United States of America
6
Sherill-Rofe D, Raban O, Findlay S, Rahat D, Unterman I, Samiei A, Yasmeen A, Kaiser Z, Kuasne H, Park M, Foulkes WD, Bloch I, Zick A, Gotlieb WH, Tabach Y, Orthwein A. Multi-omics data integration analysis identifies the spliceosome as a key regulator of DNA double-strand break repair. NAR Cancer 2022; 4:zcac013. [PMID: 35399185 PMCID: PMC8991968 DOI: 10.1093/narcan/zcac013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2021] [Revised: 02/25/2022] [Accepted: 03/23/2022] [Indexed: 11/14/2022]
Abstract
DNA repair by homologous recombination (HR) is critical for the maintenance of genome stability. Germline and somatic mutations in HR genes have been associated with an increased risk of developing breast (BC) and ovarian cancers (OvC). However, the extent of factors and pathways that are functionally linked to HR with clinical relevance for BC and OvC remains unclear. To gain a broader understanding of this pathway, we used multi-omics datasets coupled with machine learning to identify genes that are associated with HR and to predict their sub-function. Specifically, we integrated our phylogenetic-based co-evolution approach (CladePP) with 23 distinct genetic and proteomic screens that monitored, directly or indirectly, DNA repair by HR. This omics data integration analysis yielded a new database (HRbase) that contains a list of 464 predictions, including 76 gold standard HR genes. Interestingly, the spliceosome machinery emerged as one major pathway with significant cross-platform interactions with the HR pathway. We functionally validated 6 spliceosome factors, including the RNA helicase SNRNP200 and its co-factor SNW1. Importantly, their RNA expression correlated with BC/OvC patient outcome. Altogether, we identified novel clinically relevant DNA repair factors and delineated their specific sub-function by machine learning. Our results, supported by evolutionary and multi-omics analyses, suggest that the spliceosome machinery plays an important role during the repair of DNA double-strand breaks (DSBs).
Affiliation(s)
- Dana Sherill-Rofe
- Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Hebrew University of Jerusalem-Hadassah Medical School, Jerusalem 91120, Israel
- Oded Raban
- Lady Davis Institute for Medical Research, Segal Cancer Centre, Jewish General Hospital, 3755 Chemin de la Côte-Sainte-Catherine, Montréal, QC H3T 1E2, Canada
- Steven Findlay
- Lady Davis Institute for Medical Research, Segal Cancer Centre, Jewish General Hospital, 3755 Chemin de la Côte-Sainte-Catherine, Montréal, QC H3T 1E2, Canada
- Dolev Rahat
- Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Hebrew University of Jerusalem-Hadassah Medical School, Jerusalem 91120, Israel
- Irene Unterman
- Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Hebrew University of Jerusalem-Hadassah Medical School, Jerusalem 91120, Israel
- Arash Samiei
- Lady Davis Institute for Medical Research, Segal Cancer Centre, Jewish General Hospital, 3755 Chemin de la Côte-Sainte-Catherine, Montréal, QC H3T 1E2, Canada
- Amber Yasmeen
- Lady Davis Institute for Medical Research, Segal Cancer Centre, Jewish General Hospital, 3755 Chemin de la Côte-Sainte-Catherine, Montréal, QC H3T 1E2, Canada
- Zafir Kaiser
- Department of Biochemistry, McGill University, Montreal, QC H3G 1Y6, Canada
- Hellen Kuasne
- Department of Biochemistry, McGill University, Montreal, QC H3G 1Y6, Canada
- Morag Park
- Department of Biochemistry, McGill University, Montreal, QC H3G 1Y6, Canada
- William D Foulkes
- The Research Institute of the McGill University Health Centre, Montreal, QC H4A 3J1, Canada
- Idit Bloch
- Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Hebrew University of Jerusalem-Hadassah Medical School, Jerusalem 91120, Israel
- Aviad Zick
- Department of Oncology, Hadassah Medical Center, Faculty of Medicine, Hebrew University of Jerusalem, Ein-Kerem, Jerusalem 91120, Israel
- Walter H Gotlieb
- Division of Gynecology Oncology, Segal Cancer Center, Jewish General Hospital, McGill University, Montreal, QC H3T 1E2, Canada
- Yuval Tabach
- Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Hebrew University of Jerusalem-Hadassah Medical School, Jerusalem 91120, Israel
- Alexandre Orthwein
- Lady Davis Institute for Medical Research, Segal Cancer Centre, Jewish General Hospital, 3755 Chemin de la Côte-Sainte-Catherine, Montréal, QC H3T 1E2, Canada
7
Janaki C, Gowri VS, Srinivasan N. Master Blaster: an approach to sensitive identification of remotely related proteins. Sci Rep 2021; 11:8746. [PMID: 33888741 PMCID: PMC8062480 DOI: 10.1038/s41598-021-87833-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2019] [Accepted: 04/06/2021] [Indexed: 11/11/2022]
Abstract
Genome sequencing projects reveal the sequences of all the proteins encoded in a genome. As a first step, homology detection is employed to obtain clues to the structure and function of these proteins. However, high evolutionary divergence between homologous proteins challenges our ability to detect distant relationships. In the past, an approach involving multiple Position Specific Scoring Matrices (PSSMs) was found to be more effective than traditional single PSSMs. Cascaded search is another successful approach, in which the hits of a search are used as new queries to detect more homologues. We propose a protocol, ‘Master Blaster’, which combines the principles adopted in these two approaches to enhance our ability to detect remote homologues even further. Assessment of the approach was performed using known relationships available in the SCOP70 database, and the results were compared against those of PSI-BLAST and HHblits, a hidden Markov model-based method. Compared to PSI-BLAST, Master Blaster gave a 10% improvement in the detection of cross-superfamily connections, a nearly 35% improvement in cross-family connections and a more than 80% improvement in intra-family connections. HHblits was observed to be more sensitive than Master Blaster in detecting remote homologues. However, for true hits from 46 folds, Master Blaster reported homologues that HHblits did not report even with optimal parameters, indicating that combining multiple methods that employ different approaches can be more effective in detecting remote homologues. Master Blaster stand-alone code is available for download in the supplementary archive.
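The cascaded-search idea above can be sketched as a breadth-first re-querying loop. Each new hit is searched in turn until no new homologues appear or a round limit is reached. The `search` callable is a hypothetical stand-in for a real search tool such as a PSI-BLAST run. The sketch covers only the cascade, not Master Blaster's multiple-PSSM component or its significance thresholds.

```python
from collections import deque


def cascaded_search(query, search, max_rounds=3):
    """Re-query hits until no new ones appear or max_rounds is reached.

    search: hypothetical callable returning the set of hit IDs for a query ID.
    Returns every homologue found, excluding the original query.
    """
    found = {query}
    frontier = deque([(query, 0)])  # (sequence ID, cascade depth)
    while frontier:
        seq, depth = frontier.popleft()
        if depth >= max_rounds:
            continue  # do not expand hits beyond the round limit
        for hit in search(seq):
            if hit not in found:
                found.add(hit)
                frontier.append((hit, depth + 1))
    found.discard(query)
    return found
```

With a toy hit table where q hits a, a hits b, and b hits c, one round returns only {a}, and three rounds return {a, b, c}. The round limit is one simple guard against drifting into unrelated families. Real cascades also rely on per-round E-value cut-offs to limit this risk.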
Affiliation(s)
- Chintalapati Janaki
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, 560012, India
- Centre for Development of Advanced Computing, Knowledge Park, Byappanahalli, Bangalore, 560038, India
- Venkatraman S Gowri
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, 560012, India
- Department of Chemistry, Auxilium College, Gandhinagar, Vellore, 632006, India
8
Grear T, Avery C, Patterson J, Jacobs DJ. Molecular function recognition by supervised projection pursuit machine learning. Sci Rep 2021; 11:4247. [PMID: 33608593 PMCID: PMC7895977 DOI: 10.1038/s41598-021-83269-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 08/28/2020] [Accepted: 01/28/2021] [Indexed: 01/31/2023]
Abstract
Identifying mechanisms that control molecular function is a significant challenge in pharmaceutical science and molecular engineering. Here, we present a novel projection pursuit recurrent neural network to identify functional mechanisms in the context of iterative supervised machine learning for discovery-based design optimization. Molecular function recognition is achieved by pairing experiments that categorize systems with digital twin molecular dynamics simulations to generate working hypotheses. Feature extraction decomposes emergent properties of a system into a complete set of basis vectors. Feature selection requires signal-to-noise, statistical significance, and clustering quality to concurrently surpass acceptance levels. Formulated as a multivariate description of differences and similarities between systems, the data-driven working hypothesis is refined by analyzing new systems prioritized by a discovery-likelihood. Utility and generality are demonstrated on several benchmarks, including the elucidation of antibiotic resistance in TEM-52 beta-lactamase. The software is freely available, enabling turnkey analysis of massive data streams found in computational biology and material science.
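The feature-selection rule stated above (signal-to-noise, statistical significance and clustering quality must all pass their acceptance levels at the same time) amounts to a conjunctive filter. In the minimal sketch below, the `FeatureStats` fields, function name and threshold values are hypothetical. The paper's software computes these quantities from its own projection-pursuit features, which are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class FeatureStats:
    """Hypothetical per-feature summary statistics."""
    name: str
    snr: float            # signal-to-noise ratio
    p_value: float        # statistical significance of the difference
    cluster_score: float  # clustering quality (higher is better)


def select_features(features, min_snr, max_p, min_cluster):
    """Keep only features that pass all three acceptance levels at once."""
    return [
        f.name
        for f in features
        if f.snr >= min_snr and f.p_value <= max_p and f.cluster_score >= min_cluster
    ]
```

A feature with excellent clustering but poor signal-to-noise is rejected. Requiring all three conditions trades recall for fewer spurious features.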
Affiliation(s)
- Tyler Grear
- Department of Physics and Optical Science, University of North Carolina at Charlotte, Charlotte, NC 28262 USA
- Chris Avery
- Department of Physics and Optical Science, University of North Carolina at Charlotte, Charlotte, NC 28262 USA
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC 28262 USA
- John Patterson
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC 28262 USA
- Donald J. Jacobs
- Department of Physics and Optical Science, University of North Carolina at Charlotte, Charlotte, NC 28262 USA
- Center for Biomedical Engineering and Science, University of North Carolina at Charlotte, Charlotte, NC 28262 USA
9
Rosen MR, Leuthaeuser JB, Parish CA, Fetrow JS. Isofunctional Clustering and Conformational Analysis of the Arsenate Reductase Superfamily Reveals Nine Distinct Clusters. Biochemistry 2020; 59:4262-4284. [PMID: 33135415 DOI: 10.1021/acs.biochem.0c00651] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/28/2022]
Abstract
Arsenate reductase (ArsC) is a superfamily of enzymes that reduce arsenate. Due to active site similarities, some ArsC can function as low-molecular weight protein tyrosine phosphatases (LMW-PTPs). Broad superfamily classifications align with redox partners (Trx- or Grx-linked). To understand this superfamily's mechanistic diversity, the ArsC superfamily is classified on the basis of active site features utilizing the tools TuLIP (two-level iterative clustering process) and autoMISST (automated multilevel iterative sequence searching technique). This approach identified nine functionally relevant (perhaps isofunctional) protein groups. Five groups exhibit distinct ArsC mechanisms. Three are Grx-linked: group 4AA (classical ArsC), group 3AAA (YffB-like), and group 5BAA. Two are Trx-linked: groups 6AAAAA and 7AAAAAAAA. One is an Spx-like transcriptional regulatory group, group 5AAA. Three are potential LMW-PTP groups: groups 7BAAAA and 7AAAABAA, which have not been previously identified, and the well-studied LMW-PTP family group 8AAA. Molecular dynamics simulations were utilized to explore functional site details. In several families, we confirm and add detail to literature-based mechanistic information. Mechanistic roles are hypothesized for conserved active site residues in several families. In three families, simulations of the unliganded structure sample specific conformational ensembles, which are proposed to represent either a more ligand-binding-competent conformation or a pathway toward a more binding-competent state; these active sites may be designed to traverse high-energy barriers to the lower-energy conformations necessary to more readily bind ligands. This more detailed biochemical understanding of ArsC and ArsC-like PTP mechanisms opens possibilities for further understanding of arsenate bioremediation and the LMW-PTP mechanism.
Affiliation(s)
- Mikaela R Rosen
- Department of Chemistry, Gottwald Center for the Sciences, University of Richmond, Richmond, Virginia 23713, United States
- Janelle B Leuthaeuser
- Department of Chemistry, Gottwald Center for the Sciences, University of Richmond, Richmond, Virginia 23713, United States
- Carol A Parish
- Department of Chemistry, Gottwald Center for the Sciences, University of Richmond, Richmond, Virginia 23713, United States
- Jacquelyn S Fetrow
- Department of Chemistry, Gottwald Center for the Sciences, University of Richmond, Richmond, Virginia 23713, United States
10
MacDougall A, Volynkin V, Saidi R, Poggioli D, Zellner H, Hatton-Ellis E, Joshi V, O’Donovan C, Orchard S, Auchincloss AH, Baratin D, Bolleman J, Coudert E, de Castro E, Hulo C, Masson P, Pedruzzi I, Rivoire C, Arighi C, Wang Q, Chen C, Huang H, Garavelli J, Vinayaka CR, Yeh LS, Natale DA, Laiho K, Martin MJ, Renaux A, Pichler K. UniRule: a unified rule resource for automatic annotation in the UniProt Knowledgebase. Bioinformatics 2020; 36:4643-4648. [PMID: 32399560 PMCID: PMC7750954 DOI: 10.1093/bioinformatics/btaa485] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 01/30/2020] [Revised: 04/13/2020] [Accepted: 05/05/2020] [Indexed: 11/13/2022]
Abstract
MOTIVATION The number of protein records in the UniProt Knowledgebase (UniProtKB: https://www.uniprot.org) continues to grow rapidly as a result of genome sequencing and the prediction of protein-coding genes. Providing functional annotation for these proteins presents a significant and continuing challenge. RESULTS In response to this challenge, UniProt has developed a method of annotation, known as UniRule, based on expertly curated rules, which integrates related systems (RuleBase, HAMAP, PIRSR, PIRNR) developed by the members of the UniProt consortium. UniRule uses protein family signatures from InterPro, combined with taxonomic and other constraints, to select sets of reviewed proteins which have common functional properties supported by experimental evidence. This annotation is propagated to unreviewed records in UniProtKB that meet the same selection criteria, most of which do not have (and are never likely to have) experimentally verified functional annotation. Release 2020_01 of UniProtKB contains 6496 UniRule rules which provide annotation for 53 million proteins, accounting for 30% of the 178 million records in UniProtKB. UniRule provides scalable enrichment of annotation in UniProtKB. AVAILABILITY AND IMPLEMENTATION UniRule rules are integrated into UniProtKB and can be viewed at https://www.uniprot.org/unirule/. UniRule rules and the code required to run them are publicly available for researchers who wish to annotate their own sequences. The implementation used to run the rules is known as UniFIRE and is available at https://gitlab.ebi.ac.uk/uniprot-public/unifire.
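The propagation step described above can be sketched as follows. A rule pairs required family signatures with a taxonomic scope, and its annotation is copied to every unreviewed record that carries all the signatures and falls within the scope. The `Rule` and `Record` classes, the placeholder signature IDs and the matching logic are hypothetical simplifications. They do not reflect UniFIRE's actual rule format, and they omit the "other constraints" and the curation of rules from reviewed proteins.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """Hypothetical simplified rule: required signatures plus a taxonomic scope."""
    rule_id: str
    required_signatures: frozenset
    taxon_scope: str  # e.g. "Bacteria"
    annotation: str


@dataclass
class Record:
    """Hypothetical simplified protein record."""
    accession: str
    signatures: set
    lineage: list  # e.g. ["cellular organisms", "Bacteria"]
    reviewed: bool = False
    annotations: list = field(default_factory=list)


def apply_rules(rules, records):
    """Attach (annotation, rule_id) to each matching unreviewed record."""
    for rec in records:
        if rec.reviewed:
            continue  # reviewed entries keep their curated annotation
        for rule in rules:
            has_signatures = rule.required_signatures <= rec.signatures
            in_scope = rule.taxon_scope in rec.lineage
            if has_signatures and in_scope:
                rec.annotations.append((rule.annotation, rule.rule_id))
    return records
```

Recording the rule ID next to each propagated annotation keeps automatic annotations traceable to the rule that produced them. The taxonomic check prevents a rule curated on bacterial proteins from annotating, say, an archaeal record with the same signatures.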
Affiliation(s)
- Alistair MacDougall
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Vladimir Volynkin
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Rabie Saidi
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Diego Poggioli
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Kantar Consulting, Casalecchio Di Reno, 40033 Bologna, Italy
- Hermann Zellner
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Emma Hatton-Ellis
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Vishal Joshi
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Claire O’Donovan
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Sandra Orchard
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Andrea H Auchincloss
- SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211 Geneva 4, Switzerland
- Delphine Baratin
- SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211 Geneva 4, Switzerland
- Jerven Bolleman
- SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211 Geneva 4, Switzerland
- Elisabeth Coudert
- SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211 Geneva 4, Switzerland
- Edouard de Castro
- SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211 Geneva 4, Switzerland
- Chantal Hulo
- SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211 Geneva 4, Switzerland
- Patrick Masson
- SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211 Geneva 4, Switzerland
- Ivo Pedruzzi
- SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211 Geneva 4, Switzerland
- Catherine Rivoire
- SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211 Geneva 4, Switzerland
- Cecilia Arighi
- Protein Information Resource, University of Delaware, Newark, DE 19711, USA
- Qinghua Wang
- Protein Information Resource, University of Delaware, Newark, DE 19711, USA
- Chuming Chen
- Protein Information Resource, University of Delaware, Newark, DE 19711, USA
- Hongzhan Huang
- Protein Information Resource, University of Delaware, Newark, DE 19711, USA
- John Garavelli
- Protein Information Resource, University of Delaware, Newark, DE 19711, USA
- C R Vinayaka
- Protein Information Resource, Georgetown University Medical Center, Washington, DC 20007, USA
- Lai-Su Yeh
- Protein Information Resource, Georgetown University Medical Center, Washington, DC 20007, USA
- Darren A Natale
- Protein Information Resource, Georgetown University Medical Center, Washington, DC 20007, USA
- Kati Laiho
- Protein Information Resource, Georgetown University Medical Center, Washington, DC 20007, USA
- Maria-Jesus Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Alexandre Renaux
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Klemens Pichler
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Collaborators
Alex Bateman, Alan Bridge, Cathy Wu, Cecilia Arighi, Lionel Breuza, Elisabeth Coudert, Hongzhan Huang, Damien Lieberherr, Michele Magrane, Maria J Martin, Peter McGarvey, Darren Natale, Sandra Orchard, Ivo Pedruzzi, Sylvain Poux, Manuela Pruess, Shriya Raj, Nicole Redaschi, Lucila Aimo, Ghislaine Argoud-Puy, Andrea Auchincloss, Kristian Axelsen, Emmanuel Boutet, Emily Bowler, Ramona Britto, Hema Bye-A-Jee, Cristina Casals-Casas, Paul Denny, Anne Estreicher, Maria Livia Famiglietti, Marc Feuermann, John S Garavelli, Penelope Garmiri, Arnaud Gos, Nadine Gruaz, Emma Hatton-Ellis, Chantal Hulo, Nevila Hyka-Nouspikel, Florence Jungo, Kati Laiho, Philippe Le Mercier, Antonia Lock, Yvonne Lussi, Alistair MacDougall, Patrick Masson, Anne Morgat, Sandrine Pilbout, Lucille Pourcel, Catherine Rivoire, Karen Ross, Christian Sigrist, Elena Speretta, Shyamala Sundaram, Nidhi Tyagi, C R Vinayaka, Qinghua Wang, Kate Warner, Lai-Su Yeh, Rossana Zaru, Shadab Ahmed, Emanuele Alpi, Leslie Arminski, Parit Bansal, Delphine Baratin, Teresa Batista Neto, Jerven Bolleman, Chuming Chen, Yongxing Chen, Beatrice Cuche, Austra Cukura, Edouard De Castro, ThankGod Ebenezer, Elisabeth Gasteiger, Sebastien Gehant, Leonardo Gonzales, Abdulrahman Hussein, Alexandr Ignatchenko, Giuseppe Insana, Rizwan Ishtiaq, Vishal Joshi, Dushyanth Jyothi, Arnaud Kerhornou, Thierry Lombardot, Aurelian Luciani, Jie Luo, Mahdi Mahmoudy, Alok Mishra, Katie Moulang, Andrew Nightingale, Joseph Onwubiko, Monica Pozzato, Sangya Pundir, Guoying Qi, Daniel Rice, Rabie Saidi, Edward Turner, Preethi Vasudev, Yuqi Wang, Xavier Watkins, Hermann Zellner, Jian Zhang,
11
Jin S, Chen M, Chen X, Bueno C, Lu W, Schafer NP, Lin X, Onuchic JN, Wolynes PG. Protein Structure Prediction in CASP13 Using AWSEM-Suite. J Chem Theory Comput 2020; 16:3977-3988. [PMID: 32396727 DOI: 10.1021/acs.jctc.0c00188] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 12/21/2022]
Abstract
Recently, several techniques have emerged that significantly enhance the quality of protein tertiary structure predictions. In this study, we describe the performance of AWSEM-Suite, an algorithm that incorporates template-based modeling and coevolutionary restraints with a realistic coarse-grained force field, AWSEM. With its roots in neural networks, AWSEM contains both physical and bioinformatic energies that have been optimized using energy landscape theory. AWSEM-Suite participated in CASP13 as a server predictor and generated reliable predictions for most targets. AWSEM-Suite ranked eighth in both the free-modeling category and the hard-to-model category, and in one case provided the best submitted prediction. Here we critically discuss the prediction performance of AWSEM-Suite using several examples from different categories in CASP13. Structure prediction tests on these selected targets, two of which are hard-to-model targets, show that AWSEM-Suite can achieve high-resolution structure prediction after incorporating both template guidance and coevolutionary restraints, even when homology is weak. For targets with reliable templates (template-easy category), introducing coevolutionary restraints sometimes degrades the overall quality of the predictions. Free energy profile analyses demonstrate, however, that incorporating both of these evolutionarily informed terms effectively increases the funneling of the landscape toward native-like structures while still allowing sufficient flexibility to correct for discrepancies between the correct target structure and the provided guidance. In contrast to other predictors that are oriented exclusively toward structure prediction, AWSEM-Suite's connection to a statistical mechanical basis and to affiliated molecular dynamics and importance sampling simulations makes it suitable for functional explorations.
Affiliation(s)
- Xun Chen
- Department of Chemistry, Rice University, Houston, Texas 77005, United States
- Wei Lu
- Department of Physics, Rice University, Houston, Texas 77005, United States
- Xingcheng Lin
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- José N Onuchic
- Department of Chemistry, Rice University, Houston, Texas 77005, United States; Department of Physics, Rice University, Houston, Texas 77005, United States
- Peter G Wolynes
- Department of Chemistry, Rice University, Houston, Texas 77005, United States; Department of Physics, Rice University, Houston, Texas 77005, United States