1
Kalakoti Y, Sanjeev A, Wallner B. Prediction of structural variation. Curr Opin Struct Biol 2025; 91:103003. [PMID: 39983409 DOI: 10.1016/j.sbi.2025.103003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2024] [Revised: 01/15/2025] [Accepted: 01/26/2025] [Indexed: 02/23/2025]
Abstract
Proteins are dynamic molecules that transition between conformational states to perform their functions, and characterizing the protein ensemble is important for understanding biology and therapeutic applications. While recent breakthroughs in machine learning have enabled the prediction of high-quality static models of individual proteins, generating reliable estimates of their conformational ensembles remains a challenge. Several recent methods have tried to utilize the evolutionary and structural features captured by effective sequence-to-structure models to enhance conformational diversity in generated models. Most of these approaches involve adapting existing inference pipelines, such as AlphaFold 2, combined with sampling techniques to induce the generation of diverse conformational states. Here, we describe the general problem of predicting structural variations in protein systems, explain the methods designed to address this challenge, explore why they are effective, discuss their limitations, and suggest potential future directions.
Affiliation(s)
- Yogesh Kalakoti
  - Linköping University, Division of Bioinformatics, Department of Physics, Chemistry and Biology, Linköping, 58183, Sweden
- Airy Sanjeev
  - Linköping University, Division of Bioinformatics, Department of Physics, Chemistry and Biology, Linköping, 58183, Sweden
- Björn Wallner
  - Linköping University, Division of Bioinformatics, Department of Physics, Chemistry and Biology, Linköping, 58183, Sweden
2
Das A, Cheng H, Wang Y, Kinch LN, Liang G, Hong S, Hobbs HH, Cohen JC. The ubiquitin E3 ligase BFAR promotes degradation of PNPLA3. Proc Natl Acad Sci U S A 2024; 121:e2312291121. [PMID: 38294943 PMCID: PMC10861911 DOI: 10.1073/pnas.2312291121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2023] [Accepted: 12/26/2023] [Indexed: 02/02/2024] Open
Abstract
A missense variant in patatin-like phospholipase domain-containing protein 3 [PNPLA3(I148M)] is the most impactful genetic risk factor for fatty liver disease (FLD). We previously showed that PNPLA3 is ubiquitylated and subsequently degraded by proteasomes and autophagosomes and that the PNPLA3(148M) variant interferes with this process. To define the machinery responsible for PNPLA3 turnover, we used small interfering (si)RNAs to inactivate components of the ubiquitin proteasome system. Inactivation of bifunctional apoptosis regulator (BFAR), a membrane-bound E3 ubiquitin ligase, reproducibly increased PNPLA3 levels in two lines of cultured hepatocytes. Conversely, overexpression of BFAR decreased levels of endogenous PNPLA3 in HuH7 cells. BFAR and PNPLA3 co-immunoprecipitated when co-expressed in cells. BFAR promoted ubiquitylation of PNPLA3 in vitro in a reconstitution assay using purified, epitope-tagged recombinant proteins. To confirm that BFAR targets PNPLA3, we inactivated Bfar in mice. Levels of PNPLA3 protein were increased twofold in hepatic lipid droplets of Bfar-/- mice with no associated increase in PNPLA3 mRNA levels. Taken together these data are consistent with a model in which BFAR plays a role in the post-translational degradation of PNPLA3. The identification of BFAR provides a potential target to enhance PNPLA3 turnover and prevent FLD.
Affiliation(s)
- Avash Das
  - Department of Molecular Genetics, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Haili Cheng
  - Department of Molecular Genetics, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Yang Wang
  - Department of Molecular Genetics, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Lisa N. Kinch
  - HHMI, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Guosheng Liang
  - Department of Molecular Genetics, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Sen Hong
  - Department of Molecular Genetics, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Helen H. Hobbs
  - Department of Molecular Genetics, University of Texas Southwestern Medical Center, Dallas, TX 75390
  - HHMI, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Jonathan C. Cohen
  - Department of Molecular Genetics, University of Texas Southwestern Medical Center, Dallas, TX 75390
  - Center for Human Nutrition, University of Texas Southwestern Medical Center, Dallas, TX 75390
3
Kinch LN, Schaeffer RD, Zhang J, Cong Q, Orth K, Grishin N. Insights into virulence: structure classification of the Vibrio parahaemolyticus RIMD mobilome. mSystems 2023; 8:e0079623. [PMID: 38014954 PMCID: PMC10734457 DOI: 10.1128/msystems.00796-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2023] [Accepted: 10/17/2023] [Indexed: 11/29/2023] Open
Abstract
IMPORTANCE The pandemic Vpar strain RIMD causes seafood-borne illness worldwide. Previous comparative genomic studies have revealed pathogenicity islands in RIMD that contribute to the success of the strain in infection. However, not all virulence determinants have been identified, and many of the proteins encoded in known pathogenicity islands are of unknown function. Based on the ECOD database, we used evolution-based classification of structure models for the RIMD proteome to improve our functional understanding of virulence determinants acquired by the pandemic strain. We further identify and classify previously unknown mobile protein domains as well as fast-evolving residue positions in structure models that contribute to virulence and adaptation with respect to a pre-pandemic strain. Our work highlights key contributions of phage in mediating seafood-borne illness, suggesting this strain balances its avoidance of phage predators with its successful colonization of human hosts.
Affiliation(s)
- Lisa N. Kinch
  - Department of Molecular Biology, University of Texas Southwestern Medical Center, Dallas, Texas, USA
  - Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas, USA
- R. Dustin Schaeffer
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas, USA
- Jing Zhang
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas, USA
  - Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, Texas, USA
- Qian Cong
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas, USA
  - Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, Texas, USA
  - Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, Texas, USA
- Kim Orth
  - Department of Molecular Biology, University of Texas Southwestern Medical Center, Dallas, Texas, USA
  - Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas, USA
  - Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, Texas, USA
- Nick Grishin
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas, USA
  - Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, Texas, USA
4
Das R, Kretsch RC, Simpkin AJ, Mulvaney T, Pham P, Rangan R, Bu F, Keegan RM, Topf M, Rigden DJ, Miao Z, Westhof E. Assessment of three-dimensional RNA structure prediction in CASP15. Proteins 2023; 91:1747-1770. [PMID: 37876231 PMCID: PMC10841292 DOI: 10.1002/prot.26602] [Citation(s) in RCA: 49] [Impact Index Per Article: 24.5] [Received: 04/25/2023] [Revised: 08/21/2023] [Accepted: 09/07/2023] [Indexed: 10/26/2023]
Abstract
The prediction of RNA three-dimensional structures remains an unsolved problem. Here, we report assessments of RNA structure predictions in CASP15, the first CASP exercise that involved RNA structure modeling. Forty-two predictor groups submitted models for at least one of twelve RNA-containing targets. These models were evaluated by the RNA-Puzzles organizers and, separately, by a CASP-recruited team using metrics (GDT, lDDT) and approaches (Z-score rankings) initially developed for assessment of proteins and generalized here for RNA assessment. The two assessments independently ranked the same predictor groups as first (AIchemy_RNA2), second (Chen), and third (RNAPolis and GeneSilico, tied); predictions from deep learning approaches were significantly worse than these top ranked groups, which did not use deep learning. Further analyses based on direct comparison of predicted models to cryogenic electron microscopy (cryo-EM) maps and x-ray diffraction data support these rankings. With the exception of two RNA-protein complexes, models submitted by CASP15 groups correctly predicted the global fold of the RNA targets. Comparisons of CASP15 submissions to designed RNA nanostructures as well as molecular replacement trials highlight the potential utility of current RNA modeling approaches for RNA nanotechnology and structural biology, respectively. Nevertheless, challenges remain in modeling fine details such as noncanonical pairs, in ranking among submitted models, and in prediction of multiple structures resolved by cryo-EM or crystallography.
Affiliation(s)
- Rhiju Das
  - Department of Biochemistry, Stanford University School of Medicine, CA, USA
  - Biophysics Program, Stanford University School of Medicine, CA, USA
  - Howard Hughes Medical Institute, Stanford University, CA, USA
- Adam J. Simpkin
  - Institute of Systems, Molecular & Integrative Biology, The University of Liverpool, UK
- Thomas Mulvaney
  - Centre for Structural Systems Biology (CSSB), Leibniz-Institut für Virologie (LIV), Hamburg, Germany
  - University Medical Center Hamburg-Eppendorf (UKE), Hamburg, Germany
- Phillip Pham
  - Department of Biochemistry, Stanford University School of Medicine, CA, USA
- Ramya Rangan
  - Biophysics Program, Stanford University School of Medicine, CA, USA
- Fan Bu
  - Guangzhou Laboratory, Guangzhou International Bio Island, Guangzhou 510005, China
  - Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei 230036, Anhui, China
- Ronan M. Keegan
  - Institute of Systems, Molecular & Integrative Biology, The University of Liverpool, UK
  - Life Science, Diamond Light Source, Harwell Science, UK
- Maya Topf
  - Centre for Structural Systems Biology (CSSB), Leibniz-Institut für Virologie (LIV), Hamburg, Germany
  - University Medical Center Hamburg-Eppendorf (UKE), Hamburg, Germany
- Daniel J. Rigden
  - Institute of Systems, Molecular & Integrative Biology, The University of Liverpool, UK
- Zhichao Miao
  - GMU-GIBH Joint School of Life Sciences, The Guangdong-Hong Kong-Macau Joint Laboratory for Cell Fate Regulation and Diseases, Guangzhou National Laboratory, Guangzhou Medical University
  - Shanghai Key Laboratory of Anesthesiology and Brain Functional Modulation, Clinical Research Center for Anesthesiology and Perioperative Medicine, Translational Research Institute of Brain and Brain-Like Intelligence, Shanghai Fourth People's Hospital, School of Medicine, Tongji University, Shanghai 200434, China
- Eric Westhof
  - Architecture et Réactivité de l’ARN, Institut de Biologie Moléculaire et Cellulaire du CNRS, Université de Strasbourg, F-67084, Strasbourg, France
5
Simpkin AJ, Mesdaghi S, Sánchez Rodríguez F, Elliott L, Murphy DL, Kryshtafovych A, Keegan RM, Rigden DJ. Tertiary structure assessment at CASP15. Proteins 2023; 91:1616-1635. [PMID: 37746927 PMCID: PMC10792517 DOI: 10.1002/prot.26593] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Received: 05/24/2023] [Revised: 08/25/2023] [Accepted: 09/07/2023] [Indexed: 09/26/2023]
Abstract
The results of tertiary structure assessment at CASP15 are reported. For the first time, recognizing the outstanding performance of AlphaFold 2 (AF2) at CASP14, all single-chain predictions were assessed together, irrespective of whether a template was available. At CASP15, there was no single stand-out group, with most of the best-scoring groups (led by PEZYFoldings, UM-TBM, and Yang Server) employing AF2 in one way or another. Many top groups paid special attention to generating deep Multiple Sequence Alignments (MSAs) and testing variant MSAs, thereby allowing them to successfully address some of the hardest targets. Such difficult targets, as well as lacking templates, were typically proteins with few homologues. Local divergence between prediction and target correlated with localization at crystal lattice or chain interfaces, and with regions exhibiting high B-factors in crystal structure targets, and should not necessarily be considered as representing error in the prediction. However, analysis of exposed and buried side chain accuracy showed room for improvement even in the latter. Nevertheless, a majority of groups produced high-quality predictions for most targets, which are valuable for experimental structure determination, functional analysis, and many other tasks across biology. These include those applying methods similar to those used to generate major resources such as the AlphaFold Protein Structure Database and the ESM Metagenomic Atlas: the confidence estimates of the former were also notably accurate.
Affiliation(s)
- Adam J. Simpkin
  - Department of Biochemistry, Cell and Systems Biology, Institute of Structural, Molecular and Integrative Biology, University of Liverpool, Liverpool, UK
- Shahram Mesdaghi
  - Department of Biochemistry, Cell and Systems Biology, Institute of Structural, Molecular and Integrative Biology, University of Liverpool, Liverpool, UK
  - Computational Biology Facility, MerseyBio, University of Liverpool, Liverpool, UK
- Filomeno Sánchez Rodríguez
  - Department of Biochemistry, Cell and Systems Biology, Institute of Structural, Molecular and Integrative Biology, University of Liverpool, Liverpool, UK
  - Life Science, Diamond Light Source, Harwell Science and Innovation Campus, Oxfordshire, UK
  - Department of Chemistry, York Structural Biology Laboratory, University of York, York, UK
- Luc Elliott
  - Department of Biochemistry, Cell and Systems Biology, Institute of Structural, Molecular and Integrative Biology, University of Liverpool, Liverpool, UK
- David L. Murphy
  - Department of Biochemistry, Cell and Systems Biology, Institute of Structural, Molecular and Integrative Biology, University of Liverpool, Liverpool, UK
- Ronan M. Keegan
  - UKRI-STFC, Rutherford Appleton Laboratory, Research Complex at Harwell, Didcot, UK
- Daniel J. Rigden
  - Department of Biochemistry, Cell and Systems Biology, Institute of Structural, Molecular and Integrative Biology, University of Liverpool, Liverpool, UK
6
Ohmuro-Matsuyama Y, Matsui H, Kanai M, Furuta T. Glow-type conversion and characterization of a minimal luciferase via mutational analyses. FEBS J 2023; 290:5554-5565. [PMID: 37622174 DOI: 10.1111/febs.16937] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2023] [Revised: 07/20/2023] [Accepted: 08/22/2023] [Indexed: 08/26/2023]
Abstract
Luciferases are widely used as reporter proteins in various fields. Recently, we developed a minimal bright luciferase, picALuc, via partial deletion of the artificial luciferase (ALuc) derived from copepod luciferases. However, the structures of copepod luciferases in the substrate-bound state remain unknown. Moreover, as suggested by structural modeling, picALuc has a larger active site cavity than other copepod luciferases. Here, to explore the bioluminescence mechanism of picALuc and its luminescence properties, we conducted multiple mutational analyses and identified residues and regions important for catalysis and bioluminescence. Mutations of residues likely involved in catalysis (S33, H34, and D55) markedly reduced bioluminescence, whereas mutation of residue E50 (near the substrate in the structural model) enhanced luminescence intensity. Furthermore, deletion mutants (Δ70-Δ78) in the loop region (around I73) exhibited longer luminescence lifetimes (~30 min) and were reactivated multiple times upon re-addition of the substrate. Owing to the high thermostability of picALuc, one of its representative mutants (Δ74) could be reused (that is, luminescence recycling) on a timescale of days at room temperature. These findings provide important insights into the bioluminescence mechanism of picALuc and copepod luciferases and may help with sustained observations in a variety of applications.
Affiliation(s)
- Hayato Matsui
  - Technology Research Laboratory, Shimadzu Corporation, Kyoto, Japan
- Masaki Kanai
  - Technology Research Laboratory, Shimadzu Corporation, Kyoto, Japan
- Tadaomi Furuta
  - School of Life Science and Technology, Tokyo Institute of Technology, Yokohama, Japan
7
Das R, Kretsch RC, Simpkin AJ, Mulvaney T, Pham P, Rangan R, Bu F, Keegan RM, Topf M, Rigden DJ, Miao Z, Westhof E. Assessment of three-dimensional RNA structure prediction in CASP15. bioRxiv 2023:2023.04.25.538330. [PMID: 37162955 PMCID: PMC10168427 DOI: 10.1101/2023.04.25.538330] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Indexed: 05/11/2023]
Abstract
The prediction of RNA three-dimensional structures remains an unsolved problem. Here, we report assessments of RNA structure predictions in CASP15, the first CASP exercise that involved RNA structure modeling. Forty two predictor groups submitted models for at least one of twelve RNA-containing targets. These models were evaluated by the RNA-Puzzles organizers and, separately, by a CASP-recruited team using metrics (GDT, lDDT) and approaches (Z-score rankings) initially developed for assessment of proteins and generalized here for RNA assessment. The two assessments independently ranked the same predictor groups as first (AIchemy_RNA2), second (Chen), and third (RNAPolis and GeneSilico, tied); predictions from deep learning approaches were significantly worse than these top ranked groups, which did not use deep learning. Further analyses based on direct comparison of predicted models to cryogenic electron microscopy (cryo-EM) maps and X-ray diffraction data support these rankings. With the exception of two RNA-protein complexes, models submitted by CASP15 groups correctly predicted the global fold of the RNA targets. Comparisons of CASP15 submissions to designed RNA nanostructures as well as molecular replacement trials highlight the potential utility of current RNA modeling approaches for RNA nanotechnology and structural biology, respectively. Nevertheless, challenges remain in modeling fine details such as non-canonical pairs, in ranking among submitted models, and in prediction of multiple structures resolved by cryo-EM or crystallography.
Affiliation(s)
- Rhiju Das
  - Department of Biochemistry, Stanford University School of Medicine, CA, USA
  - Biophysics Program, Stanford University School of Medicine, CA, USA
  - Howard Hughes Medical Institute, Stanford University, CA, USA
- Adam J. Simpkin
  - Institute of Systems, Molecular & Integrative Biology, The University of Liverpool, UK
- Thomas Mulvaney
  - Centre for Structural Systems Biology (CSSB), Leibniz-Institut für Virologie (LIV)
  - University Medical Center Hamburg-Eppendorf (UKE), Hamburg, Germany
- Phillip Pham
  - Department of Biochemistry, Stanford University School of Medicine, CA, USA
- Ramya Rangan
  - Biophysics Program, Stanford University School of Medicine, CA, USA
- Fan Bu
  - Guangzhou Laboratory, Guangzhou International Bio Island, Guangzhou 510005, China
  - Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei 230036, Anhui, China
- Ronan M. Keegan
  - Institute of Systems, Molecular & Integrative Biology, The University of Liverpool, UK
  - Life Science, Diamond Light Source, Harwell Science, UK
- Maya Topf
  - Centre for Structural Systems Biology (CSSB), Leibniz-Institut für Virologie (LIV)
  - University Medical Center Hamburg-Eppendorf (UKE), Hamburg, Germany
- Daniel J. Rigden
  - Institute of Systems, Molecular & Integrative Biology, The University of Liverpool, UK
- Zhichao Miao
  - GMU-GIBH Joint School of Life Sciences, The Guangdong-Hong Kong-Macau Joint Laboratory for Cell Fate Regulation and Diseases, Guangzhou National Laboratory, Guangzhou Medical University
  - Shanghai Key Laboratory of Anesthesiology and Brain Functional Modulation, Clinical Research Center for Anesthesiology and Perioperative Medicine, Translational Research Institute of Brain and Brain-Like Intelligence, Shanghai Fourth People’s Hospital, School of Medicine, Tongji University, Shanghai 200434, China
- Eric Westhof
  - Architecture et Réactivité de l’ARN, Institut de Biologie Moléculaire et Cellulaire du CNRS, Université de Strasbourg, F-67084, Strasbourg, France
8
Chen X, Morehead A, Liu J, Cheng J. A gated graph transformer for protein complex structure quality assessment and its performance in CASP15. Bioinformatics 2023; 39:i308-i317. [PMID: 37387159 PMCID: PMC10311325 DOI: 10.1093/bioinformatics/btad203] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 07/01/2023] Open
Abstract
MOTIVATION Proteins interact to form complexes to carry out essential biological functions. Computational methods such as AlphaFold-multimer have been developed to predict the quaternary structures of protein complexes. An important yet largely unsolved challenge in protein complex structure prediction is to accurately estimate the quality of predicted protein complex structures without any knowledge of the corresponding native structures. Such estimations can then be used to select high-quality predicted complex structures to facilitate biomedical research such as protein function analysis and drug discovery. RESULTS In this work, we introduce a new gated neighborhood-modulating graph transformer to predict the quality of 3D protein complex structures. It incorporates node and edge gates within a graph transformer framework to control information flow during graph message passing. We trained, evaluated and tested the method (called DProQA) on newly-curated protein complex datasets before the 15th Critical Assessment of Techniques for Protein Structure Prediction (CASP15) and then blindly tested it in the 2022 CASP15 experiment. The method was ranked 3rd among the single-model quality assessment methods in CASP15 in terms of the ranking loss of TM-score on 36 complex targets. The rigorous internal and external experiments demonstrate that DProQA is effective in ranking protein complex structures. AVAILABILITY AND IMPLEMENTATION The source code, data, and pre-trained models are available at https://github.com/jianlin-cheng/DProQA.
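The "ranking loss of TM-score" mentioned in this abstract can be illustrated with a minimal sketch. It assumes the commonly used per-target definition: the true TM-score of the best available model minus the true TM-score of the model that the quality-assessment method ranks first. The abstract does not spell out the formula, so the definition, the function name `ranking_loss`, and the example numbers are illustrative assumptions rather than the DProQA implementation.

```python
def ranking_loss(true_tm_scores, predicted_scores):
    """Per-target ranking loss (assumed definition, not taken from the paper).

    true_tm_scores:   true TM-score of each candidate model against the native.
    predicted_scores: quality score assigned to each model by the QA method.
    Returns the gap between the best true TM-score and the true TM-score of
    the model the QA method ranks first; 0.0 means the best model was picked.
    """
    if not true_tm_scores or len(true_tm_scores) != len(predicted_scores):
        raise ValueError("score lists must be non-empty and of equal length")
    # Index of the model the QA method considers best.
    top_index = max(range(len(predicted_scores)), key=predicted_scores.__getitem__)
    return max(true_tm_scores) - true_tm_scores[top_index]


# Hypothetical target with three candidate complex models.
true_tm = [0.62, 0.81, 0.74]
predicted = [0.55, 0.60, 0.70]  # the QA method prefers the third model
loss = ranking_loss(true_tm, predicted)
```

Under this definition a lower value is better, and averaging the per-target losses over all targets gives a single figure for comparing methods.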
Affiliation(s)
- Xiao Chen
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65201, United States
- Alex Morehead
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65201, United States
- Jian Liu
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65201, United States
- Jianlin Cheng
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65201, United States
9
David A, Sternberg MJE. Protein structure-based evaluation of missense variants: Resources, challenges and future directions. Curr Opin Struct Biol 2023; 80:102600. [PMID: 37126977 DOI: 10.1016/j.sbi.2023.102600] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 01/30/2023] [Revised: 03/30/2023] [Accepted: 03/31/2023] [Indexed: 05/03/2023]
Abstract
We provide an overview of the methods that can be used for protein structure-based evaluation of missense variants. The algorithms can be broadly divided into those that calculate the difference in free energy (ΔΔG) between the wild type and variant structures and those that use structural features to predict the damaging effect of a variant without providing a ΔΔG. A wide range of machine learning approaches have been employed to develop those algorithms. We also discuss challenges and opportunities for variant interpretation in view of the recent breakthrough in three-dimensional structural modelling using deep learning.
Affiliation(s)
- Alessia David
  - Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK
- Michael J E Sternberg
  - Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK
10
Schaeffer RD, Zhang J, Kinch LN, Pei J, Cong Q, Grishin NV. Classification of domains in predicted structures of the human proteome. Proc Natl Acad Sci U S A 2023; 120:e2214069120. [PMID: 36917664 PMCID: PMC10041065 DOI: 10.1073/pnas.2214069120] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 08/16/2022] [Accepted: 02/06/2023] [Indexed: 03/16/2023] Open
Abstract
Recent advances in protein structure prediction have generated accurate structures of previously uncharacterized human proteins. Identifying domains in these predicted structures and classifying them into an evolutionary hierarchy can reveal biological insights. Here, we describe the detection and classification of domains from the human proteome. Our classification indicates that only 62% of residues are located in globular domains. We further classify these globular domains and observe that the majority (65%) can be classified among known folds by sequence, with a smaller fraction (33%) requiring structural data to refine the domain boundaries and/or to support their homology. A relatively small number (966 domains) cannot be confidently assigned using our automatic pipelines, thus demanding manual inspection. We classify 47,576 domains, of which only 23% have been included in experimental structures. A portion (6.3%) of these classified globular domains lack sequence-based annotation in InterPro. A quarter (23%) have not been structurally modeled by homology, and they contain 2,540 known disease-causing single amino acid variations whose pathogenesis can now be inferred using AF models. A comparison of classified domains from a series of model organisms revealed expansions of several immune response-related domains in humans and a depletion of olfactory receptors. Finally, we use this classification to expand well-known protein families of biological significance. These classifications are presented on the ECOD website (http://prodata.swmed.edu/ecod/index_human.php).
Affiliation(s)
- R. Dustin Schaeffer
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Jing Zhang
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX 75390
  - Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Lisa N. Kinch
  - Department of Molecular Biology, University of Texas Southwestern Medical Center, Dallas, TX 75390
  - HHMI, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Jimin Pei
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX 75390
  - Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Qian Cong
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX 75390
  - Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Nick V. Grishin
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX 75390
  - Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, TX 75390
11
Jeong KJ, Jeong S, Lee S, Son CY. Predictive Molecular Models for Charged Materials Systems: From Energy Materials to Biomacromolecules. Adv Mater 2023; 35:e2204272. [PMID: 36373701 DOI: 10.1002/adma.202204272] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/11/2022] [Revised: 07/05/2022] [Indexed: 06/16/2023]
Abstract
Electrostatic interactions play a dominant role in charged materials systems. Understanding the complex correlation between macroscopic properties and microscopic structures is of critical importance for developing rational design strategies for advanced materials. But the complexity of this challenging task is augmented by interfaces present in the charged materials systems, such as electrode-electrolyte interfaces or biological membranes. Over the last decades, predictive molecular simulations that are founded in fundamental physics and optimized for charged interfacial systems have proven their value in providing molecular understanding of physicochemical properties and functional mechanisms for diverse materials. Novel design strategies utilizing predictive models have been suggested as a promising route for the rational design of materials with tailored properties. Here, an overview of recent advances in the understanding of charged interfacial systems aided by predictive molecular simulations is presented. Focusing on three types of charged interfaces found in energy materials and biomacromolecules, how the molecular models characterize ion structure, charge transport, morphology relation to the environment, and the thermodynamics/kinetics of molecular binding at the interfaces is discussed. The critical analysis brings the two prominent fields of energy materials and biological science under a common perspective, to stimulate crossover between research fields that have been largely separated.
Collapse
Affiliation(s)
- Kyeong-Jun Jeong
- Department of Chemistry, Pohang University of Science and Technology (POSTECH), Pohang, 790-784, South Korea
| | - Seungwon Jeong
- Department of Chemistry, Pohang University of Science and Technology (POSTECH), Pohang, 790-784, South Korea
| | - Sangmin Lee
- Department of Chemistry, Pohang University of Science and Technology (POSTECH), Pohang, 790-784, South Korea
| | - Chang Yun Son
- Department of Chemistry, Pohang University of Science and Technology (POSTECH), Pohang, 790-784, South Korea
| |
|
12
|
Heo L, Feig M. Multi-state modeling of G-protein coupled receptors at experimental accuracy. Proteins 2022; 90:1873-1885. [PMID: 35510704 PMCID: PMC9561049 DOI: 10.1002/prot.26382] [Citation(s) in RCA: 113] [Impact Index Per Article: 37.7] [Received: 01/06/2022] [Revised: 04/07/2022] [Accepted: 04/26/2022] [Indexed: 12/30/2022]
Abstract
The family of G-protein coupled receptors (GPCRs) is one of the largest protein families in the human genome. GPCRs transduce chemical signals from extracellular to intracellular regions via a conformational switch between active and inactive states upon ligand binding. While experimental structures of GPCRs remain limited, high-accuracy computational predictions are now possible with AlphaFold2. However, AlphaFold2 only predicts one state and is biased toward either the active or inactive conformation depending on the GPCR class. Here, a multi-state prediction protocol is introduced that extends AlphaFold2 to predict either active or inactive states at very high accuracy using state-annotated templated GPCR databases. The predicted models accurately capture the main structural changes upon activation of the GPCR at the atomic level. For most of the benchmarked GPCRs (10 out of 15), models in the active and inactive states were closer to their corresponding activation state structures. Median RMSDs of the transmembrane regions were 1.12 Å and 1.41 Å for the active and inactive state models, respectively. The models were more suitable for protein-ligand docking than the original AlphaFold2 models and template-based models. Finally, our prediction protocol predicted accurate GPCR structures and GPCR-peptide complex structures in GPCR Dock 2021, a blind GPCR-ligand complex modeling competition. We expect that high accuracy GPCR models in both activation states will promote understanding in GPCR activation mechanisms and drug discovery for GPCRs. At the same time, the new protocol paves the way towards capturing the dynamics of proteins at high-accuracy via machine-learning methods.
Affiliation(s)
- Lim Heo
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan, USA
| | - Michael Feig
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan, USA
| |
|
13
|
Qing R, Hao S, Smorodina E, Jin D, Zalevsky A, Zhang S. Protein Design: From the Aspect of Water Solubility and Stability. Chem Rev 2022; 122:14085-14179. [PMID: 35921495 PMCID: PMC9523718 DOI: 10.1021/acs.chemrev.1c00757] [Citation(s) in RCA: 89] [Impact Index Per Article: 29.7] [Received: 08/31/2021] [Indexed: 12/13/2022]
Abstract
Water solubility and structural stability are key merits for proteins defined by the primary sequence and 3D-conformation. Their manipulation represents important aspects of the protein design field that relies on the accurate placement of amino acids and molecular interactions, guided by underlying physiochemical principles. Emulated designer proteins with well-defined properties both fuel the knowledge-base for more precise computational design models and are used in various biomedical and nanotechnological applications. The continuous developments in protein science, increasing computing power, new algorithms, and characterization techniques provide sophisticated toolkits for solubility design beyond guess work. In this review, we summarize recent advances in the protein design field with respect to water solubility and structural stability. After introducing fundamental design rules, we discuss the transmembrane protein solubilization and de novo transmembrane protein design. Traditional strategies to enhance protein solubility and structural stability are introduced. The designs of stable protein complexes and high-order assemblies are covered. Computational methodologies behind these endeavors, including structure prediction programs, machine learning algorithms, and specialty software dedicated to the evaluation of protein solubility and aggregation, are discussed. The findings and opportunities for Cryo-EM are presented. This review provides an overview of significant progress and prospects in accurate protein design for solubility and stability.
Affiliation(s)
- Rui Qing
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Media Lab, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- The David H. Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
| | - Shilei Hao
- Media Lab, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Key Laboratory of Biorheological Science and Technology, Ministry of Education, College of Bioengineering, Chongqing University, Chongqing 400030, China
| | - Eva Smorodina
- Department of Immunology, University of Oslo and Oslo University Hospital, Oslo 0424, Norway
| | - David Jin
- Avalon GloboCare Corp., Freehold, New Jersey 07728, United States
| | - Arthur Zalevsky
- Laboratory of Bioinformatics Approaches in Combinatorial Chemistry and Biology, Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry RAS, Moscow 117997, Russia
| | - Shuguang Zhang
- Media Lab, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
| |
|
14
|
Pei J, Zhang J, Cong Q. Human mitochondrial protein complexes revealed by large-scale coevolution analysis and deep learning-based structure modeling. Bioinformatics 2022; 38:4301-4311. [PMID: 35881696 DOI: 10.1093/bioinformatics/btac527] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 10/26/2021] [Revised: 05/27/2022] [Accepted: 07/22/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION Recent development of deep-learning methods has led to a breakthrough in the prediction accuracy of 3D protein structures. Extending these methods to protein pairs is expected to allow large-scale detection of protein-protein interactions (PPIs) and modeling protein complexes at the proteome level. RESULTS We applied RoseTTAFold and AlphaFold, two of the latest deep-learning methods for structure predictions, to analyze coevolution of human proteins residing in mitochondria, an organelle of vital importance in many cellular processes including energy production, metabolism, cell death and antiviral response. Variations in mitochondrial proteins have been linked to a plethora of human diseases and genetic conditions. RoseTTAFold, with high computational speed, was used to predict the coevolution of about 95% of mitochondrial protein pairs. Top-ranked pairs were further subject to modeling of the complex structures by AlphaFold, which also produced contact probability with high precision and in many cases consistent with RoseTTAFold. Most top-ranked pairs with high contact probability were supported by known PPIs and/or similarities to experimental structural complexes. For high-scoring pairs without experimental complex structures, our coevolution analyses and structural models shed light on the details of their interfaces, including CHCHD4-AIFM1, MTERF3-TRUB2, FMC1-ATPAF2 and ECSIT-NDUFAF1. We also identified novel PPIs (PYURF-NDUFAF5, LYRM1-MTRF1L and COA8-COX10) for several proteins without experimentally characterized interaction partners, leading to predictions of their molecular functions and the biological processes they are involved in. AVAILABILITY AND IMPLEMENTATION Data of mitochondrial proteins and their interactions are available at: http://conglab.swmed.edu/mitochondria. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jimin Pei
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA; Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA; Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
| | - Jing Zhang
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA; Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA; Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
| | - Qian Cong
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA; Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA; Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
| |
|
15
|
Holm L. Dali server: structural unification of protein families. Nucleic Acids Res 2022; 50:W210-W215. [PMID: 35610055 PMCID: PMC9252788 DOI: 10.1093/nar/gkac387] [Citation(s) in RCA: 499] [Impact Index Per Article: 166.3] [Received: 04/11/2022] [Revised: 04/27/2022] [Accepted: 05/02/2022] [Indexed: 12/26/2022] Open
Abstract
Protein structure is key to understanding biological function. Structure comparison deciphers deep phylogenies, providing insight into functional conservation and functional shifts during evolution. Until recently, structural coverage of the protein universe was limited by the cost and labour involved in experimental structure determination. Recent breakthroughs in deep learning revolutionized structural bioinformatics by providing accurate structural models of numerous protein families for which no structural information existed. The Dali server for 3D protein structure comparison is widely used by crystallographers to relate new structures to pre-existing ones. Here, we report two most recent upgrades to the web server: (i) the foldomes of key organisms in the AlphaFold Database (version 1) are searchable by Dali, (ii) structural alignments are annotated with protein families. Using these new features, we discovered a novel functionally diverse subgroup within the WRKY/GCM1 clan. This was accomplished by linking the structurally characterized SWI/SNF and NAM families as well as the structural models of the CG-1 family and uncharacterized proteins to the structure of Gti1/Pac2, a previously known member of the WRKY/GCM1 clan. The Dali server is available at http://ekhidna2.biocenter.helsinki.fi/dali. This website is free and open to all users and there is no login requirement.
Affiliation(s)
- Liisa Holm
- Institute of Biotechnology, Helsinki Institute of Life Sciences, and Organismal and Evolutionary Biology Research Program, Faculty of Biosciences, University of Helsinki, Finland
| |
|
16
|
Kinch LN, Cong Q, Jaishankar J, Orth K. Co-component signal transduction systems: Fast-evolving virulence regulation cassettes discovered in enteric bacteria. Proc Natl Acad Sci U S A 2022; 119:e2203176119. [PMID: 35648808 PMCID: PMC9214523 DOI: 10.1073/pnas.2203176119] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 02/21/2022] [Accepted: 04/08/2022] [Indexed: 01/31/2023] Open
Abstract
Bacterial signal transduction systems sense changes in the environment and transmit these signals to control cellular responses. The simplest one-component signal transduction systems include an input sensor domain and an output response domain encoded in a single protein chain. Alternatively, two-component signal transduction systems transmit signals by phosphorelay between input and output domains from separate proteins. The membrane-tethered periplasmic bile acid sensor that activates the Vibrio parahaemolyticus type III secretion system adopts an obligate heterodimer of two proteins encoded by partially overlapping VtrA and VtrC genes. This co-component signal transduction system binds bile acid using a lipocalin-like domain in VtrC and transmits the signal through the membrane to a cytoplasmic DNA-binding transcription factor in VtrA. Using the domain and operon organization of VtrA/VtrC, we identify a fast-evolving superfamily of co-component systems in enteric bacteria. Accurate machine learning–based fold predictions for the candidate co-components support their homology in the twilight zone of rapidly evolving sequences and provide mechanistic hypotheses about previously unrecognized lipid-sensing functions.
Affiliation(s)
- Lisa N. Kinch
- Department of Molecular Biology, University of Texas Southwestern Medical Center, Dallas, TX 75390
- HHMI, University of Texas Southwestern Medical Center, Dallas, TX 75390
| | - Qian Cong
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX 75390
| | - Jananee Jaishankar
- Department of Molecular Biology, University of Texas Southwestern Medical Center, Dallas, TX 75390
- HHMI, University of Texas Southwestern Medical Center, Dallas, TX 75390
| | - Kim Orth
- Department of Molecular Biology, University of Texas Southwestern Medical Center, Dallas, TX 75390
- HHMI, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, TX 75390
| |
|
17
|
Saldaño T, Escobedo N, Marchetti J, Zea DJ, Mac Donagh J, Velez Rueda AJ, Gonik E, García Melani A, Novomisky Nechcoff J, Salas MN, Peters T, Demitroff N, Fernandez Alberti S, Palopoli N, Fornasari MS, Parisi G. Impact of protein conformational diversity on AlphaFold predictions. Bioinformatics 2022; 38:2742-2748. [PMID: 35561203 DOI: 10.1093/bioinformatics/btac202] [Citation(s) in RCA: 82] [Impact Index Per Article: 27.3] [Received: 11/01/2021] [Revised: 02/10/2022] [Accepted: 03/31/2022] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION After the outstanding breakthrough of AlphaFold in predicting protein 3D models, new questions appeared and remain unanswered. The ensemble nature of proteins, for example, challenges the structural prediction methods because the models should represent a set of conformers instead of single structures. The evolutionary and structural features captured by effective deep learning techniques may unveil the information to generate several diverse conformations from a single sequence. Here, we address the performance of AlphaFold2 predictions obtained through ColabFold under this ensemble paradigm. RESULTS Using a curated collection of apo-holo pairs of conformers, we found that AlphaFold2 predicts the holo form of a protein in ∼70% of the cases, being unable to reproduce the observed conformational diversity with the same error for both conformers. More importantly, we found that AlphaFold2's performance worsens with the increasing conformational diversity of the studied protein. This impairment is related to the heterogeneity in the degree of conformational diversity found between different members of the homologous family of the protein under study. Finally, we found that main-chain flexibility associated with apo-holo pairs of conformers negatively correlates with the predicted local model quality score pLDDT, indicating that pLDDT values in a single 3D model could be used to infer local conformational changes linked to ligand binding transitions. AVAILABILITY AND IMPLEMENTATION Data and code used in this manuscript are publicly available at https://gitlab.com/sbgunq/publications/af2confdiv-oct2021. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Tadeo Saldaño
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Bernal, Argentina
- Consejo Nacional de Investigaciones Científicas y Técnicas, Buenos Aires, Argentina
| | - Nahuel Escobedo
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Bernal, Argentina
- Consejo Nacional de Investigaciones Científicas y Técnicas, Buenos Aires, Argentina
| | - Julia Marchetti
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Bernal, Argentina
- Consejo Nacional de Investigaciones Científicas y Técnicas, Buenos Aires, Argentina
| | | | - Juan Mac Donagh
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Bernal, Argentina
- Consejo Nacional de Investigaciones Científicas y Técnicas, Buenos Aires, Argentina
| | - Ana Julia Velez Rueda
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Bernal, Argentina
- Consejo Nacional de Investigaciones Científicas y Técnicas, Buenos Aires, Argentina
| | - Eduardo Gonik
- Consejo Nacional de Investigaciones Científicas y Técnicas, Buenos Aires, Argentina
- INIFTA (CONICET-UNLP) - Fotoquímica y Nanomateriales para el Ambiente y la Biología (nanoFOT), La Plata, Argentina
| | | | | | - Martín N Salas
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Bernal, Argentina
| | - Tomás Peters
- Fundación Instituto Leloir-Instituto de Investigaciones Bioquímicas de Buenos Aires, Buenos Aires, Argentina
| | - Nicolás Demitroff
- Consejo Nacional de Investigaciones Científicas y Técnicas, Buenos Aires, Argentina
- Fundación Instituto Leloir-Instituto de Investigaciones Bioquímicas de Buenos Aires, Buenos Aires, Argentina
| | - Sebastian Fernandez Alberti
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Bernal, Argentina
- Consejo Nacional de Investigaciones Científicas y Técnicas, Buenos Aires, Argentina
| | - Nicolas Palopoli
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Bernal, Argentina
- Consejo Nacional de Investigaciones Científicas y Técnicas, Buenos Aires, Argentina
| | - Maria Silvina Fornasari
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Bernal, Argentina
- Consejo Nacional de Investigaciones Científicas y Técnicas, Buenos Aires, Argentina
| | - Gustavo Parisi
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Bernal, Argentina
- Consejo Nacional de Investigaciones Científicas y Técnicas, Buenos Aires, Argentina
| |
|
18
|
A topological data analytic approach for discovering biophysical signatures in protein dynamics. PLoS Comput Biol 2022; 18:e1010045. [PMID: 35500014 PMCID: PMC9098046 DOI: 10.1371/journal.pcbi.1010045] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 08/31/2021] [Revised: 05/12/2022] [Accepted: 03/22/2022] [Indexed: 12/02/2022] Open
Abstract
Identifying structural differences among proteins can be a non-trivial task. When contrasting ensembles of protein structures obtained from molecular dynamics simulations, biologically-relevant features can be easily overshadowed by spurious fluctuations. Here, we present SINATRA Pro, a computational pipeline designed to robustly identify topological differences between two sets of protein structures. Algorithmically, SINATRA Pro works by first taking in the 3D atomic coordinates for each protein snapshot and summarizing them according to their underlying topology. Statistically significant topological features are then projected back onto a user-selected representative protein structure, thus facilitating the visual identification of biophysical signatures of different protein ensembles. We assess the ability of SINATRA Pro to detect minute conformational changes in five independent protein systems of varying complexities. In all test cases, SINATRA Pro identifies known structural features that have been validated by previous experimental and computational studies, as well as novel features that are also likely to be biologically-relevant according to the literature. These results highlight SINATRA Pro as a promising method for facilitating the non-trivial task of pattern recognition in trajectories resulting from molecular dynamics simulations, with substantially increased resolution. Structural features of proteins often serve as signatures of their biological function and molecular binding activity. Elucidating these structural features is essential for a full understanding of underlying biophysical mechanisms. While there are existing methods aimed at identifying structural differences between protein variants, such methods do not have the capability to jointly infer both geometric and dynamic changes, simultaneously. In this paper, we propose SINATRA Pro, a computational framework for extracting key structural features between two sets of proteins. 
SINATRA Pro robustly outperforms standard techniques in pinpointing the physical locations of both static and dynamic signatures across various types of protein ensembles, and it does so with improved resolution.
|
19
|
Nag S, Baidya ATK, Mandal A, Mathew AT, Das B, Devi B, Kumar R. Deep learning tools for advancing drug discovery and development. 3 Biotech 2022; 12:110. [PMID: 35433167 PMCID: PMC8994527 DOI: 10.1007/s13205-022-03165-8] [Citation(s) in RCA: 43] [Impact Index Per Article: 14.3] [Received: 11/06/2021] [Accepted: 03/18/2022] [Indexed: 12/26/2022] Open
Abstract
A few decades ago, drug discovery and development were limited to a bunch of medicinal chemists working in a lab with an enormous amount of testing, validations, and synthetic procedures, all contributing to considerable investments in time and wealth to get one drug out into the clinics. The advancements in computational techniques combined with a boom in multi-omics data led to the development of various bioinformatics/pharmacoinformatics/cheminformatics tools that have helped speed up the drug development process. But with the advent of artificial intelligence (AI), machine learning (ML) and deep learning (DL), the conventional drug discovery process has been further rationalized. Extensive biological data in the form of big data present in various databases across the globe acts as the raw material for the ML/DL-based approaches and helps in the accurate identification of patterns and models which can be used to identify therapeutically active molecules with much lower investments of time, workforce and wealth. In this review, we have begun by introducing the general concepts in the drug discovery pipeline, followed by an outline of the fields in the drug discovery process where ML/DL can be utilized. We have also introduced ML and DL along with their applications, various learning methods, and training models used to develop the ML/DL-based algorithms. Furthermore, we have summarized various DL-based tools existing in the public domain with their application in the drug discovery paradigm which includes DL tools for identification of drug targets and drug-target interaction such as DeepCPI, DeepDTA, WideDTA, PADME, DeepAffinity, and DeepPocket.
Additionally, we have discussed various DL-based models used in protein structure prediction, de novo design of new chemical scaffolds, virtual screening of chemical libraries for hit identification, absorption, distribution, metabolism, excretion, and toxicity (ADMET) prediction, metabolite prediction, clinical trial design, and oral bioavailability prediction. In the end, we have tried to shed light on some of the successful ML/DL-based models used in the drug discovery and development pipeline while also discussing the current challenges and prospects of the application of DL tools in drug discovery and development. We believe that this review will be useful for medicinal and computational chemists searching for DL tools for use in their drug discovery projects.
Affiliation(s)
- Sagorika Nag
- Department of Pharmaceutical Engineering and Technology, Indian Institute of Technology (B.H.U.), Varanasi, UP 221005 India
| | - Anurag T. K. Baidya
- Department of Pharmaceutical Engineering and Technology, Indian Institute of Technology (B.H.U.), Varanasi, UP 221005 India
| | - Abhimanyu Mandal
- Department of Pharmaceutical Engineering and Technology, Indian Institute of Technology (B.H.U.), Varanasi, UP 221005 India
| | - Alen T. Mathew
- Department of Pharmaceutical Engineering and Technology, Indian Institute of Technology (B.H.U.), Varanasi, UP 221005 India
| | - Bhanuranjan Das
- Department of Pharmaceutical Engineering and Technology, Indian Institute of Technology (B.H.U.), Varanasi, UP 221005 India
| | - Bharti Devi
- Department of Pharmaceutical Engineering and Technology, Indian Institute of Technology (B.H.U.), Varanasi, UP 221005 India
| | - Rajnish Kumar
- Department of Pharmaceutical Engineering and Technology, Indian Institute of Technology (B.H.U.), Varanasi, UP 221005 India
| |
|
20
|
van Breugel M, Rosa E Silva I, Andreeva A. Structural validation and assessment of AlphaFold2 predictions for centrosomal and centriolar proteins and their complexes. Commun Biol 2022; 5:312. [PMID: 35383272 PMCID: PMC8983713 DOI: 10.1038/s42003-022-03269-0] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 10/12/2021] [Accepted: 02/28/2022] [Indexed: 11/21/2022] Open
Abstract
Obtaining the high-resolution structures of proteins and their complexes is a crucial aspect of understanding the mechanisms of life. Experimental structure determination methods are time-consuming, expensive and cannot keep pace with the growing number of protein sequences available through genomic DNA sequencing. Thus, the ability to accurately predict the structure of proteins from their sequence is a holy grail of structural and computational biology that would remove a bottleneck in our efforts to understand as well as rationally engineer living systems. Recent advances in protein structure prediction, in particular the breakthrough with the AI-based tool AlphaFold2 (AF2), hold promise for achieving this goal, but the practical utility of AF2 remains to be explored. Focusing on proteins with essential roles in centrosome and centriole biogenesis, we demonstrate the quality and usability of the AF2 prediction models and we show that they can provide important insights into the modular organization of two key players in this process, CEP192 and CEP44. Furthermore, we used the AF2 algorithm to elucidate and then experimentally validate previously unknown prime features in the structure of TTBK2 bound to CEP164, as well as the Chibby1-FAM92A complex for which no structural information was available to date. These findings have important implications in understanding the regulation and function of these complexes. Finally, we also discuss some practical limitations of AF2 and anticipate the implications for future research approaches in the centriole/centrosome field.
Affiliation(s)
- Mark van Breugel
- Queen Mary University of London, School of Biological and Behavioural Sciences, 4 Newark Street, London, E1 2AT, UK.
- Medical Research Council-Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge, CB2 0QH, UK.
| | - Ivan Rosa E Silva
- Queen Mary University of London, School of Biological and Behavioural Sciences, 4 Newark Street, London, E1 2AT, UK
- Medical Research Council-Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge, CB2 0QH, UK
- University of Campinas, Faculty of Pharmaceutical Sciences, Cândido Portinari Street, Campinas, 13083-871, Brazil
| | - Antonina Andreeva
- Medical Research Council-Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge, CB2 0QH, UK
| |
|
21
|
Kryshtafovych A, Schwede T, Topf M, Fidelis K, Moult J. Critical assessment of methods of protein structure prediction (CASP)-Round XIV. Proteins 2021; 89:1607-1617. [PMID: 34533838 PMCID: PMC8726744 DOI: 10.1002/prot.26237] [Citation(s) in RCA: 273] [Impact Index Per Article: 68.3] [Received: 07/23/2021] [Accepted: 07/28/2021] [Indexed: 01/14/2023]
Abstract
Critical assessment of structure prediction (CASP) is a community experiment to advance methods of computing three-dimensional protein structure from amino acid sequence. Core components are rigorous blind testing of methods and evaluation of the results by independent assessors. In the most recent experiment (CASP14), deep-learning methods from one research group consistently delivered computed structures rivaling the corresponding experimental ones in accuracy. In this sense, the results represent a solution to the classical protein-folding problem, at least for single proteins. The models have already been shown to be capable of providing solutions for problematic crystal structures, and there are broad implications for the rest of structural biology. Other research groups also substantially improved performance. Here, we describe these results and outline some of the many implications. Other related areas of CASP, including modeling of protein complexes, structure refinement, estimation of model accuracy, and prediction of inter-residue contacts and distances, are also described.
Affiliation(s)
- Andriy Kryshtafovych
- Genome Center, University of California, Davis, 451 Health Sciences Drive, Davis, CA 95616, USA
| | - Torsten Schwede
- University of Basel, Biozentrum & SIB Swiss Institute of Bioinformatics, Basel, Switzerland
| | - Maya Topf
- Centre for Structural Systems Biology, Leibniz-Institut für Experimentelle Virologie and Universitätsklinikum Hamburg-Eppendorf (UKE), Hamburg, Germany
| | - Krzysztof Fidelis
- Genome Center, University of California, Davis, 451 Health Sciences Drive, Davis, CA 95616, USA
| | - John Moult
- Institute for Bioscience and Biotechnology Research, 9600 Gudelsky Drive, Rockville, MD 20850, USA, Department of Cell Biology and Molecular Genetics, University of Maryland
| |
|
22
|
Cragnolini T, Kryshtafovych A, Topf M. Cryo-EM targets in CASP14. Proteins 2021; 89:1949-1958. [PMID: 34398978 PMCID: PMC8630773 DOI: 10.1002/prot.26216] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/07/2021] [Revised: 07/27/2021] [Accepted: 08/06/2021] [Indexed: 11/22/2022]
Abstract
Structures of seven CASP14 targets were determined using cryo-electron microscopy (cryo-EM) technique with resolution between 2.1 and 3.8 Å. We provide an evaluation of the submitted models versus the experimental data (cryo-EM density maps) and experimental reference structures built into the maps. The accuracy of models is measured in terms of coordinate-to-density and coordinate-to-coordinate fit. A-posteriori refinement of the most accurate models in their corresponding cryo-EM density resulted in structures that are close to the reference structure, including some regions with better fit to the density. Regions that were found to be less "refineable" correlate well with regions of high diversity between the CASP models and low goodness-of-fit to density in the reference structure.
Affiliation(s)
- Tristan Cragnolini
- Institute of Structural and Molecular Biology, Birkbeck, University College London, London, UK
| | | | - Maya Topf
- Center for Structural Systems Biology, Leibniz-Institut für Experimentelle Virologie and Universitätsklinikum Hamburg-Eppendorf (UKE), Hamburg, Germany
| |
|
23
|
Alexander LT, Lepore R, Kryshtafovych A, Adamopoulos A, Alahuhta M, Arvin AM, Bomble YJ, Böttcher B, Breyton C, Chiarini V, Chinnam NB, Chiu W, Fidelis K, Grinter R, Gupta GD, Hartmann MD, Hayes CS, Heidebrecht T, Ilari A, Joachimiak A, Kim Y, Linares R, Lovering AL, Lunin VV, Lupas AN, Makbul C, Michalska K, Moult J, Mukherjee PK, Nutt W(S), Oliver SL, Perrakis A, Stols L, Tainer JA, Topf M, Tsutakawa SE, Valdivia‐Delgado M, Schwede T. Target highlights in CASP14: Analysis of models by structure providers. Proteins 2021; 89:1647-1672. [PMID: 34561912 PMCID: PMC8616854 DOI: 10.1002/prot.26247] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 05/08/2021] [Revised: 09/13/2021] [Accepted: 09/16/2021] [Indexed: 12/11/2022]
Abstract
The biological and functional significance of selected Critical Assessment of Techniques for Protein Structure Prediction 14 (CASP14) targets are described by the authors of the structures. The authors highlight the most relevant features of the target proteins and discuss how well these features were reproduced in the respective submitted predictions. The overall ability to predict three-dimensional structures of proteins has improved remarkably in CASP14, and many difficult targets were modeled with impressive accuracy. For the first time in the history of CASP, the experimentalists not only highlighted that computational models can accurately reproduce the most critical structural features observed in their targets, but also envisaged that models could serve as a guidance for further studies of biologically-relevant properties of proteins.
Affiliation(s)
- Leila T. Alexander
- Biozentrum, University of Basel, Basel, Switzerland
- Computational Structural Biology, SIB Swiss Institute of Bioinformatics, Basel, Switzerland
| | | | | | - Athanassios Adamopoulos
- Oncode Institute and Division of Biochemistry, Netherlands Cancer Institute, Amsterdam, The Netherlands
| | - Markus Alahuhta
- Bioscience Center, National Renewable Energy Laboratory, Golden, Colorado, USA
| | - Ann M. Arvin
- Department of Pediatrics, Stanford University School of Medicine, Stanford, California, USA
- Microbiology and Immunology, Stanford University School of Medicine, Stanford, California, USA
| | - Yannick J. Bomble
- Bioscience Center, National Renewable Energy Laboratory, Golden, Colorado, USA
| | - Bettina Böttcher
- Biocenter and Rudolf Virchow Center, Julius‐Maximilians Universität Würzburg, Würzburg, Germany
| | - Cécile Breyton
- Univ. Grenoble Alpes, CNRS, CEA, Institute for Structural Biology, Grenoble, France
| | - Valerio Chiarini
- Program in Structural Biology and Biophysics, Institute of Biotechnology, University of Helsinki, Helsinki, Finland
| | - Naga babu Chinnam
- Department of Molecular and Cellular Oncology, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, USA
| | - Wah Chiu
- Microbiology and Immunology, Stanford University School of Medicine, Stanford, California, USA
- Bioengineering, Stanford University School of Medicine, Stanford, California, USA
- Division of Cryo‐EM and Bioimaging SSRL, SLAC National Accelerator Laboratory, Menlo Park, California, USA
| | | | - Rhys Grinter
- Infection and Immunity Program, Biomedicine Discovery Institute and Department of Microbiology, Monash University, Clayton, Australia
| | - Gagan D. Gupta
- Radiation Biology & Health Sciences Division, Bhabha Atomic Research Centre, Mumbai, India
| | - Marcus D. Hartmann
- Department of Protein Evolution, Max Planck Institute for Developmental Biology, Tübingen, Germany
| | - Christopher S. Hayes
- Department of Molecular, Cellular and Developmental Biology, University of California, Santa Barbara, Santa Barbara, California, USA
- Biomolecular Science and Engineering Program, University of California, Santa Barbara, Santa Barbara, California, USA
| | - Tatjana Heidebrecht
- Oncode Institute and Division of Biochemistry, Netherlands Cancer Institute, Amsterdam, The Netherlands
| | - Andrea Ilari
- Institute of Molecular Biology and Pathology of the National Research Council of Italy (CNR), Rome, Italy
| | - Andrzej Joachimiak
- Center for Structural Genomics of Infectious Diseases, Consortium for Advanced Science and Engineering, University of Chicago, Chicago, Illinois, USA
- X‐ray Science Division, Argonne National Laboratory, Structural Biology Center, Argonne, Illinois, USA
- Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, Illinois, USA
| | - Youngchang Kim
- Center for Structural Genomics of Infectious Diseases, Consortium for Advanced Science and Engineering, University of Chicago, Chicago, Illinois, USA
- X‐ray Science Division, Argonne National Laboratory, Structural Biology Center, Argonne, Illinois, USA
| | - Romain Linares
- Univ. Grenoble Alpes, CNRS, CEA, Institute for Structural Biology, Grenoble, France
| | | | - Vladimir V. Lunin
- Bioscience Center, National Renewable Energy Laboratory, Golden, Colorado, USA
| | - Andrei N. Lupas
- Department of Protein Evolution, Max Planck Institute for Developmental Biology, Tübingen, Germany
| | - Cihan Makbul
- Biocenter and Rudolf Virchow Center, Julius‐Maximilians Universität Würzburg, Würzburg, Germany
| | - Karolina Michalska
- Center for Structural Genomics of Infectious Diseases, Consortium for Advanced Science and Engineering, University of Chicago, Chicago, Illinois, USA
- X‐ray Science Division, Argonne National Laboratory, Structural Biology Center, Argonne, Illinois, USA
| | - John Moult
- Department of Cell Biology and Molecular Genetics, Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, Maryland, USA
| | - Prasun K. Mukherjee
- Nuclear Agriculture & Biotechnology Division, Bhabha Atomic Research Centre, Mumbai, India
| | - William (Sam) Nutt
- Center for Structural Genomics of Infectious Diseases, Consortium for Advanced Science and Engineering, University of Chicago, Chicago, Illinois, USA
- X‐ray Science Division, Argonne National Laboratory, Structural Biology Center, Argonne, Illinois, USA
| | - Stefan L. Oliver
- Department of Pediatrics, Stanford University School of Medicine, Stanford, California, USA
| | - Anastassis Perrakis
- Oncode Institute and Division of Biochemistry, Netherlands Cancer Institute, Amsterdam, The Netherlands
| | - Lucy Stols
- Center for Structural Genomics of Infectious Diseases, Consortium for Advanced Science and Engineering, University of Chicago, Chicago, Illinois, USA
- X‐ray Science Division, Argonne National Laboratory, Structural Biology Center, Argonne, Illinois, USA
| | - John A. Tainer
- Department of Molecular and Cellular Oncology, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, USA
- Department of Cancer Biology, University of Texas MD Anderson Cancer Center, Houston, Texas, USA
| | - Maya Topf
- Institute of Structural and Molecular Biology, Birkbeck, University College London, London, UK
- Centre for Structural Systems Biology, Leibniz‐Institut für Experimentelle Virologie, Hamburg, Germany
| | - Susan E. Tsutakawa
- Molecular Biophysics and Integrated Bioimaging, Lawrence Berkeley National Laboratory, Berkeley, California, USA
| | | | - Torsten Schwede
- Biozentrum, University of Basel, Basel, Switzerland
- Computational Structural Biology, SIB Swiss Institute of Bioinformatics, Basel, Switzerland
| |
|
24
|
Kryshtafovych A, Moult J, Albrecht R, Chang GA, Chao K, Fraser A, Greenfield J, Hartmann MD, Herzberg O, Josts I, Leiman PG, Linden SB, Lupas AN, Nelson DC, Rees SD, Shang X, Sokolova ML, Tidow H. Computational models in the service of X-ray and cryo-electron microscopy structure determination. Proteins 2021; 89:1633-1646. [PMID: 34449113 PMCID: PMC8616789 DOI: 10.1002/prot.26223] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 05/07/2021] [Revised: 08/11/2021] [Accepted: 08/17/2021] [Indexed: 01/20/2023]
Abstract
Critical assessment of structure prediction (CASP) conducts community experiments to determine the state of the art in computing protein structure from amino acid sequence. The process relies on the experimental community providing information about not yet public or about to be solved structures, for use as targets. For some targets, the experimental structure is not solved in time for use in CASP. Calculated structure accuracy improved dramatically in this round, implying that models should now be much more useful for resolving many sorts of experimental difficulties. To test this, selected models for seven unsolved targets were provided to the experimental groups. These models were from the AlphaFold2 group, who overall submitted the most accurate predictions in CASP14. Four targets were solved with the aid of the models, and, additionally, the structure of an already solved target was improved. An a posteriori analysis showed that, in some cases, models from other groups would also be effective. This paper provides accounts of the successful application of models to structure determination, including molecular replacement for X-ray crystallography, backbone tracing and sequence positioning in a cryo-electron microscopy structure, and correction of local features. The results suggest that, in future, there will be greatly increased synergy between computational and experimental approaches to structure determination.
Affiliation(s)
| | - John Moult
- Institute for Bioscience and Biotechnology Research, Department of Cell Biology and Molecular Genetics, University of Maryland, 9600 Gudelsky Drive, Rockville, MD 20850, USA
| | - Reinhard Albrecht
- Department of Protein Evolution, Max Planck Institute for Developmental Biology, 72076 Tübingen, Germany
| | - Geoffrey A. Chang
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California-San Diego, La Jolla, CA, 92093, USA
- Department of Pharmacology, University of California-San Diego, La Jolla, CA, 92093, USA
| | - Kinlin Chao
- Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, MD 20850, USA
| | - Alec Fraser
- Department of Biochemistry and Molecular Biology, Sealy Center for Structural Biology and Molecular Biophysics (SCSB), The University of Texas Medical Branch at Galveston, TX 77555, USA
| | - Julia Greenfield
- Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, MD 20850, USA
| | - Marcus D. Hartmann
- Department of Protein Evolution, Max Planck Institute for Developmental Biology, 72076 Tübingen, Germany
| | - Osnat Herzberg
- Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, MD 20850, USA
- Department of Chemistry and Biochemistry, University of Maryland, College Park, MD 20742, USA
| | - Inokentijs Josts
- The Hamburg Advanced Research Center for Bioorganic Chemistry (HARBOR) & Department of Chemistry, Institute for Biochemistry and Molecular Biology, University of Hamburg, Luruper Chaussee 149, 22761 Hamburg, Germany
| | - Petr G. Leiman
- Department of Biochemistry and Molecular Biology, Sealy Center for Structural Biology and Molecular Biophysics (SCSB), The University of Texas Medical Branch at Galveston, TX 77555, USA
| | - Sara B. Linden
- Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, MD 20850, USA
| | - Andrei N. Lupas
- Department of Protein Evolution, Max Planck Institute for Developmental Biology, 72076 Tübingen, Germany
| | - Daniel C. Nelson
- Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, MD 20850, USA
- Department of Veterinary Medicine, University of Maryland, College Park, MD 20742, USA
| | - Steven D. Rees
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California-San Diego, La Jolla, CA, 92093, USA
| | - Xiaoran Shang
- Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, MD 20850, USA
| | - Maria L. Sokolova
- Center of Life Sciences, Skolkovo Institute of Science and Technology, Moscow, 121205, Russia
| | - Henning Tidow
- The Hamburg Advanced Research Center for Bioorganic Chemistry (HARBOR) & Department of Chemistry, Institute for Biochemistry and Molecular Biology, University of Hamburg, Luruper Chaussee 149, 22761 Hamburg, Germany
| | | |
|
25
|
Ruiz-Serra V, Pontes C, Milanetti E, Kryshtafovych A, Lepore R, Valencia A. Assessing the accuracy of contact and distance predictions in CASP14. Proteins 2021; 89:1888-1900. [PMID: 34595772 DOI: 10.1002/prot.26248] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 06/11/2021] [Revised: 09/06/2021] [Accepted: 09/21/2021] [Indexed: 12/26/2022]
Abstract
We present the results of the assessment of the intramolecular residue-residue contact and distance predictions from groups participating in the 14th round of the CASP experiment. The performance of contact prediction methods was evaluated with the measures used in previous CASPs, while distance predictions were assessed based on a new protocol, which considers individual distance pairs as well as the whole predicted distance matrix, using a graph-based framework. The results of the evaluation indicate that predictions by the tFold framework, TripletRes and DeepPotential were the most accurate in both categories. With regards to progress in method performance, the results of the assessment in contact prediction did not reveal any discernible difference when compared to CASP13. Arguably, this could be due to CASP14 FM targets being more challenging than ever before.
Affiliation(s)
| | - Camila Pontes
- Barcelona Supercomputing Center (BSC), Barcelona, Spain
| | - Edoardo Milanetti
- Department of Physics, Sapienza Università di Roma, Rome, Italy; Center for Life Nano- & Neuro-Science, Fondazione Istituto Italiano di Tecnologia (IIT), Rome, Italy
| | | | | | - Alfonso Valencia
- Barcelona Supercomputing Center (BSC), Barcelona, Spain; ICREA, Pg. Lluís Companys, Barcelona, Spain
| |
|
26
|
Cretin G, Galochkina T, de Brevern AG, Gelly JC. PYTHIA: Deep Learning Approach for Local Protein Conformation Prediction. Int J Mol Sci 2021; 22:8831. [PMID: 34445537 PMCID: PMC8396346 DOI: 10.3390/ijms22168831] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 07/06/2021] [Revised: 08/09/2021] [Accepted: 08/10/2021] [Indexed: 02/07/2023] Open
Abstract
Protein Blocks (PBs) are a widely used structural alphabet describing local protein backbone conformation in terms of 16 possible conformational states, adopted by five consecutive amino acids. The representation of complex protein 3D structures as 1D PB sequences was previously successfully applied to protein structure alignment and protein structure prediction. In the current study, we present a new model, PYTHIA (predicting any conformation at high accuracy), for the prediction of the protein local conformations in terms of PBs directly from the amino acid sequence. PYTHIA is based on a deep residual inception-inside-inception neural network with convolutional block attention modules, predicting 1 of 16 PB classes from evolutionary information combined to physicochemical properties of individual amino acids. PYTHIA clearly outperforms the LOCUSTRA reference method for all PB classes and demonstrates great performance for PB prediction on particularly challenging proteins from the CASP14 free modelling category.
Affiliation(s)
- Gabriel Cretin
- Biologie Intégrée du Globule Rouge, Université de Paris, UMR_S1134, BIGR, INSERM, 75015 Paris, France
- Laboratoire d’Excellence GR-Ex, 75015 Paris, France
| | - Tatiana Galochkina
- Biologie Intégrée du Globule Rouge, Université de Paris, UMR_S1134, BIGR, INSERM, 75015 Paris, France
- Laboratoire d’Excellence GR-Ex, 75015 Paris, France
| | - Alexandre G. de Brevern
- Biologie Intégrée du Globule Rouge, Université de Paris, UMR_S1134, BIGR, INSERM, 75015 Paris, France
- Laboratoire d’Excellence GR-Ex, 75015 Paris, France
| | - Jean-Christophe Gelly
- Biologie Intégrée du Globule Rouge, Université de Paris, UMR_S1134, BIGR, INSERM, 75015 Paris, France
- Laboratoire d’Excellence GR-Ex, 75015 Paris, France
| |
|
27
|
Kinch LN, Schaeffer RD, Kryshtafovych A, Grishin NV. Target classification in the 14th round of the critical assessment of protein structure prediction (CASP14). Proteins 2021; 89:1618-1632. [PMID: 34350630 DOI: 10.1002/prot.26202] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 04/28/2021] [Revised: 06/21/2021] [Accepted: 07/11/2021] [Indexed: 12/14/2022]
Abstract
An evolutionary-based definition and classification of target evaluation units (EUs) is presented for the 14th round of the critical assessment of structure prediction (CASP14). CASP14 targets included 84 experimental models submitted by various structural groups (designated T1024-T1101). Targets were split into EUs based on the domain organization of available templates and performance of server groups. Several targets required splitting (19 out of 25 multidomain targets) due in part to observed conformation changes. All in all, 96 CASP14 EUs were defined and assigned to tertiary structure assessment categories (Topology-based FM or High Accuracy-based TBM-easy and TBM-hard) considering their evolutionary relationship to existing ECOD fold space: 24 family level, 50 distant homologs (H-group), 12 analogs (X-group), and 10 new folds. Principal component analysis and heatmap visualization of sequence and structure similarity to known templates as well as performance of servers highlighted trends in CASP14 target difficulty. The assigned evolutionary levels (i.e., H-groups) and assessment classes (i.e., FM) displayed overlapping clusters of EUs. Many viral targets diverged considerably from their template homologs and thus were more difficult for prediction than other homology-related targets. On the other hand, some targets did not have sequence-identifiable templates, but were predicted better than expected due to relatively simple arrangements of secondary structural elements. An apparent improvement in overall server performance in CASP14 further complicated traditional classification, which ultimately assigned EUs into high-accuracy modeling (27 TBM-easy and 31 TBM-hard), topology (23 FM), or both (15 FM/TBM).
Affiliation(s)
- Lisa N Kinch
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas, USA
| | - R Dustin Schaeffer
- Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas, USA
| | | | - Nick V Grishin
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas, USA; Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas, USA; Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, Texas, USA
| |
|
28
|
Pereira J, Simpkin AJ, Hartmann MD, Rigden DJ, Keegan RM, Lupas AN. High-accuracy protein structure prediction in CASP14. Proteins 2021; 89:1687-1699. [PMID: 34218458 DOI: 10.1002/prot.26171] [Citation(s) in RCA: 206] [Impact Index Per Article: 51.5] [Received: 05/05/2021] [Revised: 06/16/2021] [Accepted: 06/23/2021] [Indexed: 12/25/2022]
Abstract
The application of state-of-the-art deep-learning approaches to the protein modeling problem has expanded the "high-accuracy" category in CASP14 to encompass all targets. Building on the metrics used for high-accuracy assessment in previous CASPs, we evaluated the performance of all groups that submitted models for at least 10 targets across all difficulty classes, and judged the usefulness of those produced by AlphaFold2 (AF2) as molecular replacement search models with AMPLE. Driven by the qualitative diversity of the targets submitted to CASP, we also introduce DipDiff as a new measure for the improvement in backbone geometry provided by a model versus available templates. Although a large leap in high-accuracy is seen due to AF2, the second-best method in CASP14 out-performed the best in CASP13, illustrating the role of community-based benchmarking in the development and evolution of the protein structure prediction field.
Affiliation(s)
- Joana Pereira
- Department of Protein Evolution, Max Planck Institute for Developmental Biology, Tübingen, Germany
| | - Adam J Simpkin
- Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, UK
| | - Marcus D Hartmann
- Department of Protein Evolution, Max Planck Institute for Developmental Biology, Tübingen, Germany
| | - Daniel J Rigden
- Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, UK
| | - Ronan M Keegan
- Department of Scientific Computing, Science and Technologies Facilities Council, UK Research and Innovation, Didcot, Oxfordshire, UK
| | - Andrei N Lupas
- Department of Protein Evolution, Max Planck Institute for Developmental Biology, Tübingen, Germany
| |
|