1
Liang F, Sun M, Xie L, Zhao X, Liu D, Zhao K, Zhang G. Recent advances and challenges in protein complex model accuracy estimation. Comput Struct Biotechnol J 2024; 23:1824-1832. [PMID: 38707538 PMCID: PMC11066466 DOI: 10.1016/j.csbj.2024.04.049] [Received: 01/27/2024] [Revised: 04/18/2024] [Accepted: 04/18/2024] [Indexed: 05/07/2024]
Abstract
Estimation of model accuracy (EMA) plays a crucial role in protein structure prediction, aiming to evaluate the quality of predicted protein structure models accurately and objectively. This process is not only key to screening candidate models that are close to the native structure, but also provides guidance for further optimization of protein structures. With the significant advances made by AlphaFold2 in monomer structure prediction, the problem of single-domain protein structure prediction has been largely solved. Correspondingly, the importance of assessing the quality of single-domain protein models has decreased, and the research focus has shifted to estimating the model accuracy of protein complexes. In this review, our goal is to provide a comprehensive overview of the reference and statistical metrics, representative methods, and current challenges within four distinct facets (Topology Global Score, Interface Total Score, Interface Residue-Wise Score, and Tertiary Residue-Wise Score) in the field of complex EMA.
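To make the idea of a reference residue-wise metric concrete, here is a minimal Python sketch of a simplified, Cα-only lDDT (Local Distance Difference Test) score per residue. It assumes the model and reference list the same residues in the same order and uses the commonly cited 15 Å inclusion radius and 0.5/1/2/4 Å tolerances; it leaves out the stereochemical checks and all-atom details of the full definition and is not code from the review.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]

# Commonly cited lDDT settings (assumed here): a 15 A inclusion radius and
# four distance-difference tolerances whose preserved fractions are averaged.
INCLUSION_RADIUS = 15.0
THRESHOLDS = (0.5, 1.0, 2.0, 4.0)


def per_residue_lddt(model: Sequence[Coord], reference: Sequence[Coord]) -> List[float]:
    """Simplified C-alpha lDDT per residue (no stereochemistry checks, no all-atom terms).

    Assumes model[i] and reference[i] describe the same residue.
    """
    if len(model) != len(reference):
        raise ValueError("model and reference must list the same residues")
    scores: List[float] = []
    for i in range(len(reference)):
        preserved = 0.0
        considered = 0
        for j in range(len(reference)):
            if j == i:
                continue
            d_ref = math.dist(reference[i], reference[j])
            if d_ref >= INCLUSION_RADIUS:
                continue  # only pairs that are local in the reference are scored
            considered += 1
            diff = abs(math.dist(model[i], model[j]) - d_ref)
            # Fraction of the four tolerances this distance difference falls below.
            preserved += sum(diff < t for t in THRESHOLDS) / len(THRESHOLDS)
        # A residue with no reference neighbours gets 0.0 (a choice made for this sketch).
        scores.append(preserved / considered if considered else 0.0)
    return scores


if __name__ == "__main__":
    ref = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    moved = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (10.6, 0.0, 0.0)]
    print(per_residue_lddt(ref, ref))    # identical structures score 1.0 everywhere
    print(per_residue_lddt(moved, ref))  # the displaced third residue scores lowest
```

One simple way to obtain an interface residue-wise variant is to restrict the outer loop to residues at a chain-chain interface while keeping all neighbours within the radius.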
Affiliation(s)
- Lei Xie
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Xuanfeng Zhao
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Dong Liu
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Kailong Zhao
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Guijun Zhang
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
2
Power KM, Nguyen KC, Silva A, Singh S, Hall DH, Rongo C, Barr MM. NEKL-4 regulates microtubule stability and mitochondrial health in ciliated neurons. J Cell Biol 2024; 223:e202402006. [PMID: 38767515 PMCID: PMC11104396 DOI: 10.1083/jcb.202402006] [Received: 02/01/2024] [Revised: 04/10/2024] [Accepted: 05/06/2024] [Indexed: 05/22/2024]
Abstract
Ciliopathies are often caused by defects in the ciliary microtubule core. Glutamylation is abundant in cilia, and its dysregulation may contribute to ciliopathies and neurodegeneration. Mutation of the deglutamylase CCP1 causes infantile-onset neurodegeneration. In C. elegans, ccpp-1 loss causes age-related ciliary degradation that is suppressed by a mutation in the conserved NEK10 homolog nekl-4. NEKL-4 is absent from cilia, yet it negatively regulates ciliary stability via an unknown, glutamylation-independent mechanism. We show that NEKL-4 is mitochondria-associated. Additionally, nekl-4 mutants had longer mitochondria, a higher baseline mitochondrial oxidation state, and suppressed ccpp-1∆ mutant lifespan extension in response to oxidative stress. A kinase-dead nekl-4(KD) mutant ectopically localized to ccpp-1∆ cilia and rescued degenerating microtubule doublet B-tubules. A nondegradable nekl-4(PEST∆) mutant resembled the ccpp-1∆ mutant, with dye-filling defects and B-tubule breaks. The nekl-4(PEST∆) Dyf phenotype was suppressed by mutation in the depolymerizing kinesin-8 KLP-13/KIF19A. We conclude that NEKL-4 influences ciliary stability by activating ciliary kinesins and promoting mitochondrial homeostasis.
Affiliation(s)
- Kaiden M. Power
- Department of Genetics and Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ, USA
- Ken C. Nguyen
- Center for C. elegans Anatomy, Albert Einstein College of Medicine, Bronx, NY, USA
- Andriele Silva
- Department of Biology, Brooklyn College of the City University of New York, Brooklyn, NY, USA
- Shaneen Singh
- Department of Biology, Brooklyn College of the City University of New York, Brooklyn, NY, USA
- David H. Hall
- Center for C. elegans Anatomy, Albert Einstein College of Medicine, Bronx, NY, USA
- Christopher Rongo
- Waksman Institute of Microbiology, Rutgers University, Piscataway, NJ, USA
- Maureen M. Barr
- Department of Genetics and Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ, USA
3
Han Y, Lu Y, Yan X, Cui H, Cheng S, Zheng J, Zhou Y, Wang S, Li Z. Atom-ProteinQA: Atom-level protein model quality assessment through fine-grained joint learning. Comput Methods Programs Biomed 2024; 249:108078. [PMID: 38537495 DOI: 10.1016/j.cmpb.2024.108078] [Received: 08/08/2023] [Revised: 12/26/2023] [Accepted: 02/10/2024] [Indexed: 04/21/2024]
Abstract
MOTIVATION: Protein model quality assessment (ProteinQA) is a fundamental task that is essential for biologically relevant applications such as protein structure refinement and protein design. Previous work performed ProteinQA only at the global-structure or per-residue level, ignoring potentially useful and precise cues from a fine-grained per-atom perspective. In this study, we propose an atom-level ProteinQA model, named Atom-ProteinQA, in which two innovative modules are designed to extract geometric and topological atom-level relationships, respectively. Specifically, a geometric perception module uses 3D sparse convolution to capture the geometric features of the input protein, generating fine-grained atom-level predictions. In parallel, natural chemical bonds are used to construct an atom-level graph, over which a topological perception module applies message passing to output residue-level predictions. Finally, through a cross-model aggregation module, features from the different modules interact, enhancing performance at both the atom and residue levels. RESULTS: Extensive experiments show that Atom-ProteinQA outperforms previous methods by a large margin in both residue-level and atom-level assessment. In particular, we achieved state-of-the-art performance on CATH-2084, Decoy-8000, the public benchmarks CASP13 and CASP14, and CAMEO. AVAILABILITY: The code for this project is available at https://github.com/luyfcandy/Atom_ProteinQA.
Affiliation(s)
- Yatong Han
- Future Network of Intelligence Institute, the Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, China; School of Science and Engineering, the Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, China
- Yingfeng Lu
- Future Network of Intelligence Institute, the Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, China; School of Science and Engineering, the Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, China
- Xu Yan
- Future Network of Intelligence Institute, the Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, China; School of Science and Engineering, the Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, China
- Hannah Cui
- Future Network of Intelligence Institute, the Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, China; School of Science and Engineering, the Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, China
- Jiayou Zheng
- Future Network of Intelligence Institute, the Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, China; School of Science and Engineering, the Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, China
- Yuzhe Zhou
- Future Network of Intelligence Institute, the Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, China; School of Science and Engineering, the Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, China
- Sheng Wang
- Shanghai Zelixir Biotech Company Ltd., Shanghai, 200030, China
- Zhen Li
- Future Network of Intelligence Institute, the Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, China; School of Science and Engineering, the Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, China
4
Da Conceição LMA, Cabral LM, Pereira GRC, De Mesquita JF. An In Silico Analysis of Genetic Variants and Structural Modeling of the Human Frataxin Protein in Friedreich's Ataxia. Int J Mol Sci 2024; 25:5796. [PMID: 38891993 PMCID: PMC11172458 DOI: 10.3390/ijms25115796] [Received: 04/17/2024] [Revised: 05/15/2024] [Accepted: 05/20/2024] [Indexed: 06/21/2024]
Abstract
Friedreich's ataxia (FRDA) is the most prevalent hereditary ataxia, marked by progressive movement ataxia, loss of vibratory sensitivity, and skeletal deformities that severely affect daily functioning. To date, the only medication available for treating FRDA is omaveloxolone (Skyclarys®), recently approved by the FDA. Missense mutations in the human frataxin (FXN) gene, which is responsible for regulating intracellular iron homeostasis, are linked to FRDA development. These mutations induce FXN dysfunction, promoting mitochondrial iron accumulation and increased oxidative stress and ultimately triggering neuronal cell death pathways. This study compiled 226 FXN genetic variants from literature and database searches, only 18 of which had been previously characterized. Predictive analyses revealed a notable prevalence of deleterious and destabilizing predictions for FXN mutations, predominantly affecting conserved residues crucial for protein function. Additionally, an accurate, comprehensive three-dimensional model of human FXN was constructed and used as the basis for generating the I154F and W155R variants. These variants, selected for their severe clinical implications, were subjected to molecular dynamics (MD) simulations, which revealed alterations in flexibility and essential dynamics in their N-terminal segments, encompassing the FXN42, FXN56, and FXN78 domains pivotal for protein maturation. Thus, our findings indicate potential disturbances in the interaction profiles of the FXN42, FXN56, and FXN78 domains induced by the I154F and W155R mutations, consistent with the existing literature.
Affiliation(s)
- Loiane Mendonça Abrantes Da Conceição
- Laboratory of Bioinformatics and Computational Biology, Federal University of the State of Rio de Janeiro (UNIRIO), Avenida Pasteur, 296, Urca, Rio de Janeiro 22290-250, Brazil
- Lucio Mendes Cabral
- Pharmaceutical Industrial Technology Laboratory, Federal University of Rio de Janeiro (UFRJ), Avenida Carlos Chagas Filho, 373, Cidade Universitária, Rio de Janeiro 21941-590, Brazil
- Gabriel Rodrigues Coutinho Pereira
- Pharmaceutical Industrial Technology Laboratory, Federal University of Rio de Janeiro (UFRJ), Avenida Carlos Chagas Filho, 373, Cidade Universitária, Rio de Janeiro 21941-590, Brazil
- Laboratory of Molecular Modeling & QSAR, Federal University of Rio de Janeiro (UFRJ), Avenida Carlos Chagas Filho, 373, Cidade Universitária, Rio de Janeiro 21941-590, Brazil
- Joelma Freire De Mesquita
- Laboratory of Bioinformatics and Computational Biology, Federal University of the State of Rio de Janeiro (UNIRIO), Avenida Pasteur, 296, Urca, Rio de Janeiro 22290-250, Brazil
5
Martin J. AlphaFold2 Predicts Whether Proteins Interact Amidst Confounding Structural Compatibility. J Chem Inf Model 2024; 64:1473-1480. [PMID: 38373070 DOI: 10.1021/acs.jcim.3c01805] [Indexed: 02/21/2024]
Abstract
Predicting whether two proteins physically interact is one of the holy grails of computational biology, a goal galvanized by rapid advances in deep learning. AlphaFold2, although not developed for this purpose, is promising in this respect. Here, I test the prediction capability of AlphaFold2 on a very challenging data set in which proteins are structurally compatible even when they do not interact. AlphaFold2 achieves high discrimination between interacting and non-interacting proteins, and misclassified cases can either be rescued by revisiting the input sequences or point to false positives and false negatives in the data set. AlphaFold2 is thus not impaired by structural compatibility between proteins and has the potential to be applied on a large scale.
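Discrimination between interacting and non-interacting pairs is often summarized as the area under the ROC curve (AUC). The following is a minimal sketch of that calculation in its rank-based (Mann-Whitney) form; which AlphaFold2 confidence score serves as input, and the example values, are assumptions for illustration and are not taken from the study.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the probability that a random interacting pair (label 1)
    scores higher than a random non-interacting pair (label 0); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative pair")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical confidence scores for four protein pairs (two interacting, two not).
    print(roc_auc([0.9, 0.4, 0.5, 0.2], [1, 1, 0, 0]))  # 0.75
```

The all-pairs loop keeps the definition visible; for large data sets a sort-based version computes the same value in O(n log n).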
Affiliation(s)
- Juliette Martin
- Univ Lyon, CNRS, UMR 5086 MMSB, 7 passage du Vercors F-69367, Lyon, France
- Laboratory of Biology and Modeling of the Cell, Ecole Normale Supérieure de Lyon, CNRS UMR 5239, Inserm U1293, University Claude Bernard Lyon 1, 69364, Lyon, France
6
Morehead A, Liu J, Cheng J. Protein structure accuracy estimation using geometry-complete perceptron networks. Protein Sci 2024; 33:e4932. [PMID: 38380738 PMCID: PMC10880424 DOI: 10.1002/pro.4932] [Received: 07/22/2023] [Revised: 01/05/2024] [Accepted: 02/01/2024] [Indexed: 02/22/2024]
Abstract
Estimating the accuracy of protein structural models is a critical task in protein bioinformatics. The need for robust methods for the estimation of protein model accuracy (EMA) is prevalent in the field of protein structure prediction, where computationally predicted structures need to be screened rapidly for the reliability of the positions predicted for each of their amino acid residues and for their overall quality. Current EMA methods are either tightly coupled to existing protein structure prediction methods or evaluate protein structures without sufficiently leveraging the rich geometric information available in those structures to guide accuracy estimation. In this work, we propose a geometric message passing neural network, the geometry-complete perceptron network for protein structure EMA (GCPNet-EMA), and demonstrate through rigorous computational benchmarks that GCPNet-EMA produces accuracy estimates 47% faster and more than 10% (6%) more correlated with ground-truth measures of per-residue (per-target) structural accuracy than baseline state-of-the-art methods for tertiary (multimer) structure EMA, including AlphaFold 2. The source code and data for GCPNet-EMA are available on GitHub, and a public web server implementation is freely available.
Affiliation(s)
- Alex Morehead
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
- Jian Liu
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
- Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
7
Power KM, Nguyen KC, Silva A, Singh S, Hall DH, Rongo C, Barr MM. NEKL-4 regulates microtubule stability and mitochondrial health in C. elegans ciliated neurons. bioRxiv 2024:2024.02.14.580304. [PMID: 38405845 PMCID: PMC10888866 DOI: 10.1101/2024.02.14.580304] [Indexed: 02/27/2024]
Abstract
Ciliopathies are often caused by defects in the ciliary microtubule core. Glutamylation is abundant in cilia, and its dysregulation may contribute to ciliopathies and neurodegeneration. Mutation of the deglutamylase CCP1 causes infantile-onset neurodegeneration. In C. elegans, ccpp-1 loss causes age-related ciliary degradation that is suppressed by mutation in the conserved NEK10 homolog nekl-4. NEKL-4 is absent from cilia, yet it negatively regulates ciliary stability via an unknown, glutamylation-independent mechanism. We show that NEKL-4 is mitochondria-associated. nekl-4 mutants had longer mitochondria, a higher baseline mitochondrial oxidation state, and suppressed ccpp-1 mutant lifespan extension in response to oxidative stress. A kinase-dead nekl-4(KD) mutant ectopically localized to ccpp-1 cilia and rescued degenerating microtubule doublet B-tubules. A nondegradable nekl-4(PESTΔ) mutant resembled the ccpp-1 mutant, with dye-filling defects and B-tubule breaks. The nekl-4(PESTΔ) Dyf phenotype was suppressed by mutation in the depolymerizing kinesin-8 KLP-13/KIF19A. We conclude that NEKL-4 influences ciliary stability by activating ciliary kinesins and promoting mitochondrial homeostasis.
Affiliation(s)
- Kaiden M Power
- Department of Genetics and Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ, United States of America
- Ken C Nguyen
- Center for C. elegans Anatomy, Albert Einstein College of Medicine, Bronx, NY, United States of America
- Andriele Silva
- Department of Biology, Brooklyn College of the City University of New York, Brooklyn, NY, United States of America
- Shaneen Singh
- Department of Biology, Brooklyn College of the City University of New York, Brooklyn, NY, United States of America
- David H Hall
- Center for C. elegans Anatomy, Albert Einstein College of Medicine, Bronx, NY, United States of America
- Christopher Rongo
- Waksman Institute of Microbiology, Rutgers University, Piscataway, NJ, United States of America
- Maureen M Barr
- Department of Genetics and Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ, United States of America
8
Breindl M, Spitzer D, Gerasimaitė R, Kairys V, Schubert T, Henfling R, Schwartz U, Lukinavičius G, Manelytė L. Biochemical and cellular insights into the Baz2B protein, a non-catalytic subunit of the chromatin remodeling complex. Nucleic Acids Res 2024; 52:337-354. [PMID: 38000389 PMCID: PMC10783490 DOI: 10.1093/nar/gkad1096] [Received: 02/20/2023] [Revised: 09/21/2023] [Accepted: 11/06/2023] [Indexed: 11/26/2023]
Abstract
Baz2B is a regulatory subunit of the ATP-dependent chromatin remodeling complexes BRF1 and BRF5, which control access to DNA during DNA-templated processes. Baz2B has been implicated in several diseases and in unhealthy ageing; however, limited information is available on its domains and cellular roles. To gain more insight into Baz2B function, we biochemically characterized its TAM (Tip5/ARBP/MBD) domain with the auxiliary AT-hook motifs and its bromodomain (BRD). We observed altered histone code recognition by bromodomains carrying cancer-associated point mutations, suggesting their potential involvement in disease. Furthermore, depletion of Baz2B in the Hap1 cell line resulted in altered cell morphology, reduced colony formation and perturbed transcriptional profiles. However, super-resolution microscopy images revealed no changes in overall chromatin structure in the absence of Baz2B. These findings provide insights into the biological function of Baz2B.
Affiliation(s)
- Matthias Breindl
- Biochemistry III, University of Regensburg, Regensburg DE-93053, Germany
- Dominika Spitzer
- Biochemistry III, University of Regensburg, Regensburg DE-93053, Germany
- Rūta Gerasimaitė
- Chromatin Labeling and Imaging Group, Department of NanoBiophotonics, Max Planck Institute for Multidisciplinary Sciences, Am Fassberg 11, DE-37077 Göttingen, Germany
- Visvaldas Kairys
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius LT-10257, Lithuania
- Ramona Henfling
- Biochemistry III, University of Regensburg, Regensburg DE-93053, Germany
- Uwe Schwartz
- NGS Analysis Center, University of Regensburg, Regensburg DE-93053, Germany
- Gražvydas Lukinavičius
- Chromatin Labeling and Imaging Group, Department of NanoBiophotonics, Max Planck Institute for Multidisciplinary Sciences, Am Fassberg 11, DE-37077 Göttingen, Germany
- Laura Manelytė
- Biochemistry III, University of Regensburg, Regensburg DE-93053, Germany
9
Olechnovič K, Valančauskas L, Dapkūnas J, Venclovas Č. Prediction of protein assemblies by structure sampling followed by interface-focused scoring. Proteins 2023; 91:1724-1733. [PMID: 37578163 DOI: 10.1002/prot.26569] [Received: 03/08/2023] [Revised: 07/12/2023] [Accepted: 07/31/2023] [Indexed: 08/15/2023]
Abstract
Proteins often function as part of permanent or transient multimeric complexes, and understanding the function of these assemblies requires knowledge of their three-dimensional structures. While AlphaFold's ability to predict structures of individual proteins with unprecedented accuracy has revolutionized structural biology, modeling the structures of protein assemblies remains challenging. To address this challenge, we developed a protocol for predicting structures of protein complexes that involves model sampling followed by scoring focused on the subunit-subunit interaction interface. In this protocol, we diversified AlphaFold models by varying the construction and pairing of multiple sequence alignments and by increasing the number of recycles. When AlphaFold failed to assemble a full protein complex or produced unreliable results, additional diverse models were constructed by docking monomers or subcomplexes. All models were then scored using a newly developed method, VoroIF-jury, which relies only on structural information. Notably, VoroIF-jury is independent of AlphaFold self-assessment scores and can therefore be used to rank models originating from different structure prediction methods. We tested our protocol in CASP15 and obtained top results, significantly outperforming the standard AlphaFold-Multimer pipeline. Analysis of our results showed that the accuracy of our assembly models was limited mainly by structure sampling rather than by model scoring. This observation suggests that better sampling, especially for antibody-antigen complexes, may lead to further improvement. Our protocol is expected to be useful for modeling and/or scoring protein assemblies.
Affiliation(s)
- Kliment Olechnovič
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Lukas Valančauskas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Justas Dapkūnas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Česlovas Venclovas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
10
Liu J, Liu D, He G, Zhang G. Estimating protein complex model accuracy based on ultrafast shape recognition and deep learning in CASP15. Proteins 2023; 91:1861-1870. [PMID: 37553848 DOI: 10.1002/prot.26564] [Received: 04/15/2023] [Revised: 07/05/2023] [Accepted: 07/11/2023] [Indexed: 08/10/2023]
Abstract
This article reports and analyzes the results of protein complex model accuracy estimation by our methods (DeepUMQA3 and GraphGPSM) in the 15th Critical Assessment of Techniques for Protein Structure Prediction (CASP15). These new deep learning-based methods for estimating multimeric complex model accuracy combine three levels of features with deep residual/graph neural networks. Each input multimeric complex model is described at three levels: overall complex features, intra-monomer features, and inter-monomer features. We designed an overall ultrafast shape recognition (USR) feature to characterize the relationship between local residues and the overall complex topology, and an inter-monomer USR feature to characterize the relationship between the residues of one monomer and the topology of the other monomers. DeepUMQA3 (group name: GuijunLab-RocketX) ranked first in interface residue accuracy estimation in CASP15. The Pearson correlation between the interface residue Local Distance Difference Test (lDDT) predicted by DeepUMQA3 and the true lDDT was 0.570, making it the only method to exceed 0.5. Among the top 5 methods, DeepUMQA3 achieved the highest Pearson correlation of lDDT on 25 of 39 targets. GraphGPSM (group name: GuijunLab-PAthreader) achieved TM-score Pearson correlations greater than 0.9 on 14 targets, showing a good ability to estimate overall fold accuracy. The DeepUMQA3 server is available at http://zhanglab-bioinf.com/DeepUMQA/ and the GraphGPSM server at http://zhanglab-bioinf.com/GraphGPSM/.
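The USR features above build on the classic ultrafast shape recognition descriptor. Below is a minimal Python sketch of that descriptor, assuming one point per residue: four reference points (centroid, the point closest to it, the point farthest from it, and the point farthest from that one), three moments of each distance distribution (mean, variance and skewness, one common choice), and the similarity 1/(1 + mean absolute descriptor difference). How DeepUMQA3 turns USR into its overall and inter-monomer features is not specified here, so `residue_reference_distances` is only a hypothetical per-residue feature, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def _centroid(points: Sequence[Coord]) -> Coord:
    n = len(points)
    return (sum(p[0] for p in points) / n,
            sum(p[1] for p in points) / n,
            sum(p[2] for p in points) / n)


def usr_reference_points(points: Sequence[Coord]) -> List[Coord]:
    """The four classic USR reference points."""
    ctd = _centroid(points)
    cst = min(points, key=lambda p: math.dist(p, ctd))  # closest to centroid
    fct = max(points, key=lambda p: math.dist(p, ctd))  # farthest from centroid
    ftf = max(points, key=lambda p: math.dist(p, fct))  # farthest from fct
    return [ctd, cst, fct, ftf]


def _moments(values: Sequence[float]) -> List[float]:
    """Mean, variance and skewness of one distance distribution."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    third = sum((v - mean) ** 3 for v in values) / n
    skew = third / var ** 1.5 if var > 0 else 0.0
    return [mean, var, skew]


def usr_descriptor(points: Sequence[Coord]) -> List[float]:
    """12-number shape descriptor: 3 moments for each of 4 reference points."""
    desc: List[float] = []
    for ref in usr_reference_points(points):
        desc.extend(_moments([math.dist(p, ref) for p in points]))
    return desc


def usr_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic USR score: 1 / (1 + mean absolute descriptor difference)."""
    return 1.0 / (1.0 + sum(abs(x - y) for x, y in zip(a, b)) / len(a))


def residue_reference_distances(points: Sequence[Coord]) -> List[List[float]]:
    """Hypothetical per-residue feature: distances to the four reference points."""
    refs = usr_reference_points(points)
    return [[math.dist(p, r) for r in refs] for p in points]


if __name__ == "__main__":
    pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0), (1.0, 1.0, 1.0)]
    shifted = [(x + 5.0, y - 2.0, z + 7.0) for x, y, z in pts]
    # Distances are unchanged by translation, so the similarity is ~1.
    print(usr_similarity(usr_descriptor(pts), usr_descriptor(shifted)))
```

Because every feature is a distance, the descriptor needs no structural superposition, which is what makes USR fast enough to compute for every model.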
Affiliation(s)
- Jun Liu
- College of Information Engineering, Zhejiang University of Technology, Hangzhou, China
- Dong Liu
- College of Information Engineering, Zhejiang University of Technology, Hangzhou, China
- Guangxing He
- College of Information Engineering, Zhejiang University of Technology, Hangzhou, China
- Guijun Zhang
- College of Information Engineering, Zhejiang University of Technology, Hangzhou, China
11
Roy RS, Liu J, Giri N, Guo Z, Cheng J. Combining pairwise structural similarity and deep learning interface contact prediction to estimate protein complex model accuracy in CASP15. Proteins 2023; 91:1889-1902. [PMID: 37357816 PMCID: PMC10749984 DOI: 10.1002/prot.26542] [Received: 03/11/2023] [Revised: 06/07/2023] [Accepted: 06/08/2023] [Indexed: 06/27/2023]
Abstract
Estimating the accuracy of quaternary structural models of protein complexes and assemblies (EMA) is important for predicting quaternary structures and applying them to study protein function and interaction. Pairwise similarity between structural models has proven useful for estimating the quality of protein tertiary structural models, but it has rarely been applied to predicting the quality of quaternary structural models. Moreover, the pairwise similarity approach often fails when many structural models are of low quality and similar to each other. To address this gap, we developed a hybrid method (MULTICOM_qa) that combines a pairwise similarity score (PSS) with an interface contact probability score (ICPS) based on deep learning inter-chain contact prediction to estimate protein complex model accuracy. It was blindly tested in the 15th Critical Assessment of Techniques for Protein Structure Prediction (CASP15) in 2022 and performed very well in estimating the global structure accuracy of assembly models. The average per-target correlation coefficient between the model quality scores predicted by MULTICOM_qa and the true quality scores of the models of CASP15 assembly targets is 0.66. The average per-target ranking loss when using the predicted quality scores to rank the models is 0.14. It was able to select good models for most targets. Moreover, several key factors for EMA (i.e., target difficulty, model sampling difficulty, skewness of model quality, and similarity between good and bad models) are identified and analyzed. The results demonstrate that combining the multi-model method (PSS) with the complementary single-model method (ICPS) is a promising approach to EMA.
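The multi-model score and the two evaluation measures quoted in this abstract can be sketched in a few lines of Python. This assumes a precomputed square matrix of pairwise model similarities (for example TM-scores from a complex structure-alignment tool) and defines ranking loss as the true quality of the best model minus that of the model ranked first by predicted score, the usual definition in CASP EMA assessment. The abstract does not say how MULTICOM_qa weights PSS against ICPS, so that step is omitted; function names and numbers are hypothetical.

```python
import math
from typing import List, Sequence


def pairwise_similarity_scores(sim: Sequence[Sequence[float]]) -> List[float]:
    """PSS: mean similarity of each model to every other model of the same target."""
    n = len(sim)
    if n < 2:
        raise ValueError("need at least two models")
    return [sum(sim[i][j] for j in range(n) if j != i) / (n - 1) for i in range(n)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between predicted and true scores for one target."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (sx * sy)


def ranking_loss(predicted: Sequence[float], true: Sequence[float]) -> float:
    """True quality of the best model minus that of the top-predicted model."""
    top = max(range(len(predicted)), key=lambda i: predicted[i])
    return max(true) - true[top]


if __name__ == "__main__":
    # Hypothetical target with three models.
    sim = [[1.0, 0.8, 0.6],
           [0.8, 1.0, 0.4],
           [0.6, 0.4, 1.0]]
    true_quality = [0.9, 0.7, 0.6]
    pss = pairwise_similarity_scores(sim)  # approximately [0.7, 0.6, 0.5]
    print(pss, pearson(pss, true_quality), ranking_loss(pss, true_quality))
```

Averaging `pearson` and `ranking_loss` over all targets yields per-target averages of the kind reported in the abstract.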
Affiliation(s)
- Raj S. Roy
- Department of Electrical Engineering and Computer Science, NextGen Precision Health, University of Missouri, Columbia, MO 65211, USA
- Jian Liu
- Department of Electrical Engineering and Computer Science, NextGen Precision Health, University of Missouri, Columbia, MO 65211, USA
- Nabin Giri
- Department of Electrical Engineering and Computer Science, NextGen Precision Health, University of Missouri, Columbia, MO 65211, USA
- Zhiye Guo
- Department of Electrical Engineering and Computer Science, NextGen Precision Health, University of Missouri, Columbia, MO 65211, USA
- Jianlin Cheng
- Department of Electrical Engineering and Computer Science, NextGen Precision Health, University of Missouri, Columbia, MO 65211, USA
12
Olechnovič K, Venclovas Č. VoroIF-GNN: Voronoi tessellation-derived protein-protein interface assessment using a graph neural network. Proteins 2023; 91:1879-1888. [PMID: 37482904 DOI: 10.1002/prot.26554] [Received: 04/21/2023] [Revised: 06/19/2023] [Accepted: 07/01/2023] [Indexed: 07/25/2023]
Abstract
We present VoroIF-GNN (Voronoi InterFace Graph Neural Network), a novel method for assessing inter-subunit interfaces in a structural model of a protein-protein complex, relying solely on the input structure without any additional information. Given a multimeric protein structural model, we derive interface contacts from the Voronoi tessellation of atomic balls, construct a graph of those contacts, and predict the accuracy of every contact using an attention-based GNN. The contact-level predictions are then summarized to produce whole interface-level scores. VoroIF-GNN was blindly tested for its ability to estimate the accuracy of protein complexes during CASP15 and showed strong performance in selecting the best multimeric model out of many. The method implementation is freely available at https://kliment-olechnovic.github.io/voronota/expansion_js/.
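The abstract does not specify how contact-level predictions are summarized into interface-level scores. One plausible scheme, shown here purely as a hypothetical sketch and not as VoroIF-GNN's actual scoring, weights each contact's predicted accuracy by its Voronoi contact area and reports, per chain pair, the total contact area, the predicted "good" area, and their ratio. The `InterfaceContact` record is an assumed input format.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class InterfaceContact:
    """Hypothetical record for one inter-chain contact."""
    chain_a: str
    chain_b: str
    area: float                # contact area from the Voronoi tessellation (A^2)
    predicted_accuracy: float  # network output in [0, 1] for this contact


def summarize_interfaces(
    contacts: Iterable[InterfaceContact],
) -> Dict[Tuple[str, str], Dict[str, float]]:
    """Area-weighted summary of contact-level predictions for each chain pair."""
    totals: Dict[Tuple[str, str], Tuple[float, float]] = {}
    for c in contacts:
        a, b = sorted((c.chain_a, c.chain_b))  # A-B and B-A are the same interface
        area, good = totals.get((a, b), (0.0, 0.0))
        totals[(a, b)] = (area + c.area, good + c.area * c.predicted_accuracy)
    return {
        pair: {
            "total_area": area,
            "predicted_good_area": good,
            "area_weighted_accuracy": good / area if area > 0 else 0.0,
        }
        for pair, (area, good) in totals.items()
    }


if __name__ == "__main__":
    contacts = [
        InterfaceContact("A", "B", 10.0, 0.8),
        InterfaceContact("B", "A", 30.0, 0.4),
        InterfaceContact("A", "C", 5.0, 1.0),
    ]
    for pair, summary in summarize_interfaces(contacts).items():
        print(pair, summary)
```

Area weighting keeps a few tiny, confidently predicted contacts from dominating the score of a large interface; a plain mean over contacts would be an equally simple alternative.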
Affiliation(s)
- Kliment Olechnovič
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Česlovas Venclovas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
13
Liu D, Zhang B, Liu J, Li H, Song L, Zhang G. Assessing protein model quality based on deep graph coupled networks using protein language model. Brief Bioinform 2023; 25:bbad420. [PMID: 38018909 PMCID: PMC10685403 DOI: 10.1093/bib/bbad420] [Received: 08/28/2023] [Revised: 10/19/2023] [Accepted: 10/31/2023] [Indexed: 11/30/2023]
Abstract
Model quality evaluation is a crucial part of protein structural biology. Distinguishing high-quality models from low-quality ones, and identifying the relatively incorrect regions of high-quality models that could be improved, remain challenges. More importantly, quality assessment of multimer models is a hot topic in structure prediction. In this study, we propose GraphCPLMQA, a novel approach for evaluating residue-level model quality that combines graph coupled networks with embeddings from protein language models. GraphCPLMQA consists of a graph encoding module and a transformer-based convolutional decoding module. In the encoding module, the underlying relational representations of sequence and high-dimensional geometric structure are extracted by protein language models with Evolutionary Scale Modeling. In the decoding module, the mapping between structure and quality is inferred from these representations and low-dimensional features. Specifically, triangular location and residue-level contact order features are designed to enhance the association between the local structure and the overall topology. Experimental results demonstrate that GraphCPLMQA using single-sequence embeddings achieves the best performance among CASP15 residue-level interface evaluation methods on 9108 models in the local residue interface test set of CASP15 multimers. In the CAMEO blind test (20 May 2022 to 13 August 2022), GraphCPLMQA ranked first compared with other servers (https://www.cameo3d.org/quality-estimation). GraphCPLMQA also outperforms state-of-the-art methods on 19,035 models in the CASP13 and CASP14 monomer test set.
Affiliation(s)
- Dong Liu
- College of Information Engineering, Zhejiang University of Technology
- Biao Zhang
- College of Information Engineering, Zhejiang University of Technology
- Jun Liu
- College of Information Engineering, Zhejiang University of Technology
- Hui Li
- AI researcher, BioMap
- Le Song
- Chief AI Scientist, BioMap and MBZUAI
- Guijun Zhang
- College of Information Engineering, Zhejiang University of Technology
14
Pereira GP, Jiménez-García B, Pellarin R, Launay G, Wu S, Martin J, Souza PCT. Rational Prediction of PROTAC-Compatible Protein-Protein Interfaces by Molecular Docking. J Chem Inf Model 2023; 63:6823-6833. [PMID: 37877240 DOI: 10.1021/acs.jcim.3c01154] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 10/26/2023]
Abstract
Proteolysis targeting chimeras (PROTACs) are heterobifunctional ligands that mediate the interaction between a target protein and an E3 ligase, forming a ternary complex whose interaction with the ubiquitination machinery leads to target degradation. This technology is emerging as an exciting new avenue for therapeutic development, with several PROTACs targeting cancer currently in clinical trials. Here, we describe a general and computationally efficient methodology that combines restraint-based docking, energy-based rescoring, and a filter based on the minimal solvent-accessible surface distance to produce PROTAC-compatible PPIs in cases where no PROTAC ligand is known a priori. In a benchmark on a manually curated data set of 13 ternary complex crystal structures, we achieved an accuracy of 92% when starting from bound structures and 77% when starting from unbound structures. Our method requires only the ligand-bound structures of the monomeric E3 ligase and target protein as input, making it general, accurate, and highly efficient, with the ability to impact early-stage PROTAC-based drug design campaigns in which no structural information about the ternary complex is available.
Affiliation(s)
- Gilberto P Pereira
- Molecular Microbiology and Structural Biochemistry, CNRS UMR 5086 and Université Claude Bernard Lyon 1, 7 Passage du Vercors, 69007 Lyon, France
- Laboratory of Biology and Modeling of the Cell, École Normale Supérieure de Lyon, Université Claude Bernard Lyon 1, CNRS UMR 5239 and Inserm U1293, 46 Allée d'Italie, 69007 Lyon, France
- Riccardo Pellarin
- Molecular Microbiology and Structural Biochemistry, CNRS UMR 5086 and Université Claude Bernard Lyon 1, 7 Passage du Vercors, 69007 Lyon, France
- Laboratory of Biology and Modeling of the Cell, École Normale Supérieure de Lyon, Université Claude Bernard Lyon 1, CNRS UMR 5239 and Inserm U1293, 46 Allée d'Italie, 69007 Lyon, France
- Guillaume Launay
- Molecular Microbiology and Structural Biochemistry, CNRS UMR 5086 and Université Claude Bernard Lyon 1, 7 Passage du Vercors, 69007 Lyon, France
- Laboratory of Biology and Modeling of the Cell, École Normale Supérieure de Lyon, Université Claude Bernard Lyon 1, CNRS UMR 5239 and Inserm U1293, 46 Allée d'Italie, 69007 Lyon, France
- Sangwook Wu
- PharmCADD, Busan 48792, Republic of Korea
- Department of Physics, Pukyong National University, Busan 48513, Republic of Korea
- Juliette Martin
- Molecular Microbiology and Structural Biochemistry, CNRS UMR 5086 and Université Claude Bernard Lyon 1, 7 Passage du Vercors, 69007 Lyon, France
- Laboratory of Biology and Modeling of the Cell, École Normale Supérieure de Lyon, Université Claude Bernard Lyon 1, CNRS UMR 5239 and Inserm U1293, 46 Allée d'Italie, 69007 Lyon, France
- Paulo C T Souza
- Molecular Microbiology and Structural Biochemistry, CNRS UMR 5086 and Université Claude Bernard Lyon 1, 7 Passage du Vercors, 69007 Lyon, France
- Laboratory of Biology and Modeling of the Cell, École Normale Supérieure de Lyon, Université Claude Bernard Lyon 1, CNRS UMR 5239 and Inserm U1293, 46 Allée d'Italie, 69007 Lyon, France
15
Roy S, Ben-Hur A. Protein quality assessment with a loss function designed for high-quality decoys. Front Bioinform 2023; 3:1198218. [PMID: 37915563 PMCID: PMC10616882 DOI: 10.3389/fbinf.2023.1198218] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2023] [Accepted: 09/29/2023] [Indexed: 11/03/2023]
Abstract
Motivation: Predicting the 3D structure of a protein is essential for understanding protein function, drug discovery, and disease mechanisms. With the advent of methods such as AlphaFold that can produce very high-quality decoys, assessing the quality of those decoys can provide further confidence in the accuracy of their predictions. Results: In this work, we describe Qϵ, a graph convolutional network (GCN) that uses a minimal set of atom and residue features as input to predict the global distance test total score (GDT_TS) and the local distance difference test (lDDT) score of a decoy. To improve the model's performance, we introduce a novel loss function based on the ϵ-insensitive loss function used for SVM regression. This loss function is tailored to the characteristics of the quality assessment problem and yields more accurate predictions than the standard loss functions used for this task. Despite using only a minimal set of features, Qϵ matches the performance of recent state-of-the-art methods such as DeepUMQA. Availability: The code for Qϵ is available at https://github.com/soumyadip1997/qepsilon.
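For reference, the textbook ϵ-insensitive loss from support vector regression ignores errors smaller than ϵ and penalizes larger errors linearly by the amount they exceed ϵ. The sketch below implements only that standard form; the abstract does not spell out how Qϵ modifies it, and the default `epsilon = 0.05` is an arbitrary illustrative value.

```python
from typing import Sequence


def epsilon_insensitive_loss(pred: Sequence[float], target: Sequence[float],
                             epsilon: float = 0.05) -> float:
    """Mean epsilon-insensitive loss as used in SVM regression (standard form only).

    Each sample contributes max(0, |pred - target| - epsilon), so predictions
    within epsilon of the target cost nothing.
    """
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and of equal length")
    total = sum(max(0.0, abs(p - t) - epsilon) for p, t in zip(pred, target))
    return total / len(pred)
```

For quality scores such as GDT_TS or lDDT on a 0 to 1 scale, the dead zone means small disagreements on already-close predictions do not dominate the gradient, which is one plausible reason such a loss suits high-quality decoys.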
Affiliation(s)
- Asa Ben-Hur
- Department of Computer Science, Colorado State University, Fort Collins, CO, United States
16
Liu J, Liu D, Zhang GJ. DeepUMQA3: a web server for accurate assessment of interface residue accuracy in protein complexes. Bioinformatics 2023; 39:btad591. [PMID: 37740296 PMCID: PMC10560100 DOI: 10.1093/bioinformatics/btad591] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2023] [Revised: 08/21/2023] [Accepted: 09/21/2023] [Indexed: 09/24/2023]
Abstract
MOTIVATION Model quality assessment is a crucial part of protein structure prediction and a gateway to the proper use of models in biomedical applications. Many methods have been proposed for assessing the quality of structural models of protein monomers, but few exist for evaluating protein complex models. As protein complex structure prediction becomes a new challenge, there is an urgent need for model quality assessment methods that can accurately estimate the accuracy of interface residues in complex structures. RESULTS Here, we present DeepUMQA3, a web server that uses deep neural networks to evaluate the accuracy of interface residues in protein complex structures. For an input complex structure, features are extracted at three levels (overall complex, intra-monomer, and inter-monomer), and an improved deep residual neural network predicts per-residue lDDT and interface residue accuracy. DeepUMQA3 ranked first in the blind test of interface residue accuracy estimation in CASP15, with a Pearson correlation, Spearman correlation, and AUC of 0.564, 0.535, and 0.755 under the lDDT measure, which are 17.6%, 23.6%, and 10.9% higher, respectively, than those of the second-best method. DeepUMQA3 can also assess the accuracy of all residues in the entire complex and distinguish high- and low-accuracy residues. AVAILABILITY AND IMPLEMENTATION The DeepUMQA3 web server is freely available at http://zhanglab-bioinf.com/DeepUMQA_server/.
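The Pearson and Spearman figures above compare predicted per-residue scores with the true per-residue lDDT. The sketch below shows the standard definitions of both correlations in plain Python, with tied values sharing their average rank for Spearman. It does not reproduce the CASP15 protocol itself (how interface residues are selected and how results are averaged over targets), which is outside what the abstract states.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks in which tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Usage: pass the predicted and true lDDT values for one model's residues, in the same residue order, to `pearson` and `spearman`.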
Affiliation(s)
- Jun Liu
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Dong Liu
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Gui-Jun Zhang
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
17
Zhang L, Wang S, Hou J, Si D, Zhu J, Cao R. ComplexQA: a deep graph learning approach for protein complex structure assessment. Brief Bioinform 2023; 24:bbad287. [PMID: 37930021 DOI: 10.1093/bib/bbad287] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2023] [Revised: 05/09/2023] [Accepted: 07/24/2023] [Indexed: 11/07/2023]
Abstract
MOTIVATION In recent years, end-to-end deep learning methods for single-chain protein structure prediction have achieved high accuracy. For example, the state-of-the-art method AlphaFold, developed by Google DeepMind, has raised the accuracy of protein structure prediction to near-experimental levels in some cases. At the same time, few methods can evaluate the quality of protein complexes at the residue level. In particular, evaluating the quality of residues at the interfaces of protein complexes has a wide range of applications, such as protein function analysis and drug design. In this paper, we introduce ComplexQA, a new method based on deep graph neural networks that evaluates the local quality of protein complex interfaces by using residue-level structural information in 3D space together with sequence-level constraints. RESULTS We benchmark our method against other state-of-the-art quality assessment approaches on the HAF2 and DBM55-AF2 datasets (high-quality structural models predicted by AlphaFold-Multimer) and on the BM5 docking dataset. The experimental results show that our method achieves better or similar performance compared with other state-of-the-art methods, especially on difficult targets that contain only a few acceptable models. Our method assigns a score to each interface residue, making it a powerful assessment tool for the ever-increasing number of protein complexes. AVAILABILITY https://github.com/Cao-Labs/ComplexQA.git. Contact: caora@plu.edu.
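Interface-level assessment of this kind first needs a definition of which residues form the interface. A common approach is a distance cutoff between chains; the sketch below assumes one representative atom per residue (for example C-beta) and an 8 Å cutoff. Both choices, the function name, and the input layout are illustrative assumptions rather than ComplexQA's published definition.

```python
import math
from typing import Dict, List, Set, Tuple

Coord = Tuple[float, float, float]
ResidueKey = Tuple[str, int]  # (chain ID, 0-based residue index within the chain)


def interface_residues(chains: Dict[str, List[Coord]],
                       cutoff: float = 8.0) -> Set[ResidueKey]:
    """Hypothetical interface definition: a residue is at the interface if its
    representative atom lies within `cutoff` angstroms of a representative atom
    on any other chain."""
    found: Set[ResidueKey] = set()
    chain_ids = sorted(chains)
    for pos, a in enumerate(chain_ids):
        for b in chain_ids[pos + 1:]:  # each chain pair is visited once
            for i, ci in enumerate(chains[a]):
                for j, cj in enumerate(chains[b]):
                    if math.dist(ci, cj) <= cutoff:
                        found.add((a, i))
                        found.add((b, j))
    return found
```

The all-pairs loop is quadratic and is kept simple for readability; for large assemblies a spatial index would be the usual replacement.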
Affiliation(s)
- Lei Zhang
- Department of Computer Science and Technology, AnHui University, Hefei, 230601, Anhui, China
- Sheng Wang
- Department of Computer Science and Technology, AnHui University, Hefei, 230601, Anhui, China
- Jie Hou
- Department of Computer Science, Saint Louis University, Saint Louis, 63103, MO, USA
- Dong Si
- Division of Computing and Software Systems, University of Washington Bothell, Bothell, 98011, WA, USA
- Junyong Zhu
- Department of Computer Science and Technology, AnHui University, Hefei, 230601, Anhui, China
- Renzhi Cao
- Department of Humanities, Pacific Lutheran University, Tacoma, 98447, WA, USA
18
Schweke H, Xu Q, Tauriello G, Pantolini L, Schwede T, Cazals F, Lhéritier A, Fernandez-Recio J, Rodríguez-Lumbreras LÁ, Schueler-Furman O, Varga JK, Jiménez-García B, Réau MF, Bonvin A, Savojardo C, Martelli PL, Casadio R, Tubiana J, Wolfson H, Oliva R, Barradas-Bautista D, Ricciardelli T, Cavallo L, Venclovas Č, Olechnovič K, Guerois R, Andreani J, Martin J, Wang X, Kihara D, Marchand A, Correia B, Zou X, Dey S, Dunbrack R, Levy E, Wodak S. Discriminating physiological from non-physiological interfaces in structures of protein complexes: A community-wide study. Proteomics 2023; 23:e2200323. [PMID: 37365936 PMCID: PMC10937251 DOI: 10.1002/pmic.202200323] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 02/04/2023] [Revised: 05/11/2023] [Accepted: 05/11/2023] [Indexed: 06/28/2023]
Abstract
Reliably scoring and ranking candidate models of protein complexes and assigning their oligomeric state from the structure of the crystal lattice represent outstanding challenges. A community-wide effort was launched to tackle these challenges. The latest resources on protein complexes and interfaces were exploited to derive a benchmark dataset consisting of 1677 homodimer protein crystal structures, including a balanced mix of physiological and non-physiological complexes. The non-physiological complexes in the benchmark were selected to bury a similar or larger interface area than their physiological counterparts, making it more difficult for scoring functions to differentiate between them. Next, 252 functions for scoring protein-protein interfaces previously developed by 13 groups were collected and evaluated for their ability to discriminate between physiological and non-physiological complexes. A simple consensus score generated using the best performing score of each of the 13 groups, and a cross-validated Random Forest (RF) classifier were created. Both approaches showed excellent performance, with an area under the Receiver Operating Characteristic (ROC) curve of 0.93 and 0.94, respectively, outperforming individual scores developed by different groups. Additionally, AlphaFold2 engines recalled the physiological dimers with significantly higher accuracy than the non-physiological set, lending support to the reliability of our benchmark dataset annotations. Optimizing the combined power of interface scoring functions and evaluating it on challenging benchmark datasets appears to be a promising strategy.
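The abstract describes a simple consensus of the best score from each group, evaluated by the area under the ROC curve, but does not say how the scores were combined. The sketch below assumes one plausible rule: z-score each group's scores across complexes, then average them. It computes ROC AUC in its rank form, as the probability that a randomly chosen physiological dimer outscores a randomly chosen non-physiological one. The combination rule and the function names are assumptions.

```python
import statistics
from typing import Dict, List, Sequence


def consensus_score(scores_by_group: Dict[str, Sequence[float]]) -> List[float]:
    """Assumed consensus rule: average of per-group z-scores.

    Each group's scores must be oriented so that higher means 'more likely
    physiological', and every group must score the same complexes in the same order.
    """
    normalized = []
    for values in scores_by_group.values():
        mean = statistics.fmean(values)
        sd = statistics.pstdev(values)
        normalized.append([(v - mean) / sd if sd > 0 else 0.0 for v in values])
    return [statistics.fmean(column) for column in zip(*normalized)]


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as P(random positive scores higher than random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Z-scoring puts scoring functions with very different units on a common scale before averaging, which is what makes a naive mean across groups meaningful.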
Affiliation(s)
- Julia K. Varga
- Hebrew University of Jerusalem Institute for Medical Research Israel-Canada
- Jérôme Tubiana
- Tel Aviv University Blavatnik School of Computer Science
- Haim Wolfson
- Tel Aviv University Blavatnik School of Computer Science
- Xiaoqin Zou
- Dalton Cardiovascular Research Center, Institute for Data Science and Informatics, University of Missouri
19
Wu F, Wu L, Radev D, Xu J, Li SZ. Integration of pre-trained protein language models into geometric deep learning networks. Commun Biol 2023; 6:876. [PMID: 37626165 PMCID: PMC10457366 DOI: 10.1038/s42003-023-05133-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 03/23/2023] [Accepted: 07/11/2023] [Indexed: 08/27/2023]
Abstract
Geometric deep learning has recently achieved great success in non-Euclidean domains, and learning on the 3D structures of large biomolecules is emerging as a distinct research area. However, its efficacy is largely constrained by the limited quantity of structural data. Meanwhile, protein language models trained on large numbers of 1D sequences have shown capabilities that grow with scale across a broad range of applications. Several previous studies have combined these different protein modalities to improve the representational power of geometric neural networks, but they do not provide a comprehensive understanding of the benefits. In this work, we integrate the knowledge learned by well-trained protein language models into several state-of-the-art geometric networks and evaluate them on a variety of protein representation learning benchmarks, including protein-protein interface prediction, model quality assessment, protein-protein rigid-body docking, and binding affinity prediction. Our findings show an overall improvement of 20% over baselines. Strong evidence indicates that incorporating the knowledge of protein language models enhances the capacity of geometric networks by a significant margin and generalizes to complex tasks.
Affiliation(s)
- Fang Wu
- AI Research and Innovation Laboratory, Westlake University, 310030, Hangzhou, China
- Lirong Wu
- AI Research and Innovation Laboratory, Westlake University, 310030, Hangzhou, China
- Dragomir Radev
- Department of Computer Science, Yale University, New Haven, CT, 06511, USA
- Jinbo Xu
- Institute of AI Industry Research, Tsinghua University, Haidian Street, 100084, Beijing, China
- Toyota Technological Institute at Chicago, Chicago, IL, 60637, USA
- Stan Z Li
- AI Research and Innovation Laboratory, Westlake University, 310030, Hangzhou, China
20
Mahtha SK, Kumari K, Gaur V, Yadav G. Cavity architecture based modulation of ligand binding tunnels in plant START domains. Comput Struct Biotechnol J 2023; 21:3946-3963. [PMID: 37635766 PMCID: PMC10448341 DOI: 10.1016/j.csbj.2023.07.039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/29/2023]
Abstract
The Steroidogenic acute regulatory protein (StAR)-related lipid transfer (START) domain represents an evolutionarily conserved superfamily of lipid transfer proteins widely distributed across the tree of life. Despite the domain's significant expansion in plants, knowledge of plant START domains remains limited. In this work, we explore the role of modulations in cavity architecture in START protein evolution and functional diversity. We use deep-learning approaches to generate plant START domain models, followed by surface accessibility studies and a comprehensive structural investigation of the rice START family. We validate 28 rice START domain models, delineate binding cavities, measure pocket volumes, and compare these with their mammalian counterparts to understand the evolution of binding preferences. Overall, plant START domains retain the ancestral α/β helix-grip signature, but we find subtle variation in cavity architectures, resulting in significantly smaller ligand-binding tunnels in the plant kingdom. We identify cavity lining residues (CLRs) responsible for the reduction in ancestral tunnel space; these appear to be class-specific and unique to plants, providing a mechanism for the observed shift in domain function. For instance, the mammalian cavity lining residues A135, G181 and A192 have been replaced by larger CLRs across the plant kingdom, contributing to smaller cavities. Among plant STARTs, minimal STARTs have the largest cavities, whereas members of the type-IV HD-Zip family show almost complete obliteration of lipid binding cavities, consistent with their present-day DNA binding functions. In summary, this work quantifies the structural and functional divergence of plant START domains, bridging current knowledge gaps.
Affiliation(s)
- Kamlesh Kumari
- National Institute of Plant Genome Research, New Delhi 110067, India
- Vineet Gaur
- National Institute of Plant Genome Research, New Delhi 110067, India
- Gitanjali Yadav
- National Institute of Plant Genome Research, New Delhi 110067, India
21
Dirvelyte E, Bujanauskiene D, Jankaityte E, Daugelaviciene N, Kisieliute U, Nagula I, Budvytyte R, Neniskyte U. Genetically encoded phosphatidylserine biosensor for in vitro, ex vivo and in vivo labelling. Cell Mol Biol Lett 2023; 28:59. [PMID: 37501184 PMCID: PMC10373266 DOI: 10.1186/s11658-023-00472-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2023] [Accepted: 06/27/2023] [Indexed: 07/29/2023]
Abstract
BACKGROUND The dynamics of phosphatidylserine in the plasma membrane are a tightly regulated feature of eukaryotic cells. Phosphatidylserine (PS) is found preferentially in the inner leaflet of the plasma membrane. Disruption of this asymmetry leads to the exposure of phosphatidylserine on the cell surface and is associated with cell death, synaptic pruning, blood clotting and other cellular processes. Because phosphatidylserine is involved in such widespread cellular functions, an efficient phosphatidylserine probe is needed to study them. Currently, a few different phosphatidylserine labelling tools are available; however, these labels have unfavourable signal-to-noise ratios and are difficult to use in tissues due to limited permeability. Their application in living tissue requires injection procedures that damage the tissue and release damage-associated molecular patterns, which in turn stimulate phosphatidylserine exposure. METHODS We therefore developed a novel genetically encoded phosphatidylserine probe based on the C2 domain of the lactadherin (MFG-E8) protein, suitable for labelling exposed phosphatidylserine in various research models. We tested the specificity of the C2 probe for phosphatidylserine on hybrid bilayer lipid membranes by observing the surface plasmon resonance angle shift. We then analysed purified C2 fusion proteins on different cell lines, and engineered AAVs encoding C2 probes on tissue cultures, after apoptosis induction. For in vivo experiments, neurotropic AAVs were injected intravenously into perinatal mice, and after 2 weeks, brain slices were collected to observe C2-SNAP expression. RESULTS The biophysical analysis revealed the high specificity of the C2 probe for phosphatidylserine. The recombinant C2 fusion proteins were suitable for labelling phosphatidylserine on the surface of apoptotic cells in various cell lines.
We engineered AAVs and validated them in organotypic brain tissue cultures for non-invasive delivery of the genetically encoded C2 probe and showed that these probes were expressed in the brain in vivo after intravenous AAV delivery to mice. CONCLUSIONS We have demonstrated that the developed genetically encoded PS biosensor can be utilised in a variety of assays as a two-component system of C2 and C2m2 fusion proteins. This system allows for precise quantification and PS visualisation at directly specified threshold levels, enabling the evaluation of PS exposure in both physiological and cell death processes.
Affiliation(s)
- Eimina Dirvelyte
- VU LSC-EMBL Partnership Institute for Genome Editing Technologies, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Daina Bujanauskiene
- VU LSC-EMBL Partnership Institute for Genome Editing Technologies, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Institute of Bioscience, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Evelina Jankaityte
- VU LSC-EMBL Partnership Institute for Genome Editing Technologies, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Institute of Biochemistry, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Neringa Daugelaviciene
- VU LSC-EMBL Partnership Institute for Genome Editing Technologies, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Ugne Kisieliute
- Institute of Bioscience, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Igor Nagula
- VU LSC-EMBL Partnership Institute for Genome Editing Technologies, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Rima Budvytyte
- VU LSC-EMBL Partnership Institute for Genome Editing Technologies, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Institute of Biochemistry, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Urte Neniskyte
- VU LSC-EMBL Partnership Institute for Genome Editing Technologies, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Institute of Bioscience, Life Sciences Center, Vilnius University, Vilnius, Lithuania
22
He G, Liu J, Liu D, Zhang G. GraphGPSM: a global scoring model for protein structure using graph neural networks. Brief Bioinform 2023:bbad219. [PMID: 37317619 DOI: 10.1093/bib/bbad219] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2023] [Revised: 04/14/2023] [Accepted: 05/22/2023] [Indexed: 06/16/2023]
Abstract
The scoring models used for protein structure modeling and ranking fall mainly into two categories: unified-field and protein-specific scoring functions. Although protein structure prediction has made tremendous progress since CASP14, modeling accuracy still falls short of requirements in some cases. In particular, accurate modeling of multi-domain and orphan proteins remains a challenge. An accurate and efficient deep-learning-based protein scoring model is therefore urgently needed to guide protein structure folding and ranking. In this work, we propose GraphGPSM, a global scoring model for protein structure based on an equivariant graph neural network (EGNN), to guide protein structure modeling and ranking. We construct an EGNN architecture and design a message-passing mechanism to update and transmit information between the nodes and edges of the graph. The global score of the protein model is then output by a multilayer perceptron. Residue-level ultrafast shape recognition is used to describe the relationship between residues and the overall structure topology, and distance and direction encoded by Gaussian radial basis functions are designed to represent the overall topology of the protein backbone. These two features are combined with Rosetta energy terms, backbone dihedral angles, and inter-residue distances and orientations to represent the protein model, and are embedded into the nodes and edges of the graph neural network. Experimental results on the CASP13, CASP14 and CAMEO test sets show that the scores of GraphGPSM correlate strongly with the TM-scores of the models, significantly better than the unified-field score function REF2015 and state-of-the-art local lDDT-based scoring models such as ModFOLD8, ProQ3D and DeepAccNet. Modeling experiments on 484 test proteins demonstrate that GraphGPSM can greatly improve modeling accuracy.
GraphGPSM was further used to model 35 orphan proteins and 57 multi-domain proteins; the average TM-scores of the resulting models are 13.2% and 7.1% higher, respectively, than those of the models predicted by AlphaFold2. GraphGPSM also participated in CASP15 and achieved competitive performance in global accuracy estimation.
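Encoding a distance with Gaussian radial basis functions turns one scalar into a smooth vector of kernel activations, which neural network layers handle more easily than a raw number. The sketch below shows the generic expansion exp(-(d - mu_k)^2 / (2 sigma^2)) with evenly spaced centres. The range of 0 to 20 Å, the 16 kernels, and the default width equal to the centre spacing are illustrative assumptions, not GraphGPSM's published settings. Direction encoding, which the abstract also mentions, is not covered here.

```python
import math
from typing import List, Optional


def gaussian_rbf(distance: float, d_min: float = 0.0, d_max: float = 20.0,
                 num_kernels: int = 16, sigma: Optional[float] = None) -> List[float]:
    """Expand a scalar distance into `num_kernels` Gaussian basis values.

    Centres mu_k are evenly spaced in [d_min, d_max]; when sigma is None it
    defaults to the spacing between neighbouring centres (an assumption).
    """
    if num_kernels < 2:
        raise ValueError("num_kernels must be at least 2")
    step = (d_max - d_min) / (num_kernels - 1)
    width = step if sigma is None else sigma
    centres = [d_min + k * step for k in range(num_kernels)]
    return [math.exp(-((distance - mu) ** 2) / (2 * width ** 2)) for mu in centres]
```

Usage: `gaussian_rbf(10.0, num_kernels=5)` places centres at 0, 5, 10, 15 and 20 Å, so the largest activation falls on the middle kernel. Each inter-residue distance expanded this way could serve as part of an edge feature vector.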
Affiliation(s)
- Guangxing He
- College of Information Engineering, Zhejiang University of Technology
- Jun Liu
- College of Information Engineering, Zhejiang University of Technology
- Dong Liu
- College of Information Engineering, Zhejiang University of Technology
- Guijun Zhang
- College of Information Engineering, Zhejiang University of Technology
23
Wozny MR, Di Luca A, Morado DR, Picco A, Khaddaj R, Campomanes P, Ivanović L, Hoffmann PC, Miller EA, Vanni S, Kukulski W. In situ architecture of the ER-mitochondria encounter structure. Nature 2023:10.1038/s41586-023-06050-3. [PMID: 37165187 DOI: 10.1038/s41586-023-06050-3] [Citation(s) in RCA: 23] [Impact Index Per Article: 23.0] [Received: 04/23/2022] [Accepted: 04/04/2023] [Indexed: 05/12/2023]
Abstract
The endoplasmic reticulum and mitochondria are the main hubs of eukaryotic membrane biogenesis and rely on lipid exchange via membrane contact sites, but the underlying mechanisms remain poorly understood. In yeast, tethering and lipid transfer between the two organelles are mediated by the endoplasmic reticulum-mitochondria encounter structure (ERMES), a four-subunit complex of unresolved stoichiometry and architecture. Here we determined the molecular organization of ERMES within Saccharomyces cerevisiae cells using integrative structural biology, combining quantitative live imaging, cryo-correlative microscopy, subtomogram averaging and molecular modelling. We found that ERMES assembles into approximately 25 discrete bridge-like complexes distributed irregularly across a contact site. Each bridge consists of three synaptotagmin-like mitochondrial lipid binding protein domains oriented in a zig-zag arrangement. Our molecular model of ERMES reveals a pathway for lipids. These findings resolve the in situ supramolecular architecture of a major inter-organelle lipid transfer machinery and provide a basis for the mechanistic understanding of lipid fluxes in eukaryotic cells.
Affiliation(s)
- Michael R Wozny
- MRC Laboratory of Molecular Biology, Cambridge, UK
- Department of Anatomy and Cell Biology, McGill University, Montreal, Quebec, Canada
- Andrea Di Luca
- Department of Biology, University of Fribourg, Fribourg, Switzerland
- Dustin R Morado
- MRC Laboratory of Molecular Biology, Cambridge, UK
- SciLifeLab, Solna, Sweden
- Max Planck Institute of Biochemistry, Martinsried, Germany
- Andrea Picco
- Department of Biochemistry, University of Geneva, Geneva, Switzerland
- Rasha Khaddaj
- Institute of Biochemistry and Molecular Medicine, University of Bern, Bern, Switzerland
- Pablo Campomanes
- Department of Biology, University of Fribourg, Fribourg, Switzerland
- Lazar Ivanović
- Institute of Biochemistry and Molecular Medicine, University of Bern, Bern, Switzerland
- Graduate School for Cellular and Biomedical Sciences, University of Bern, Bern, Switzerland
- Patrick C Hoffmann
- MRC Laboratory of Molecular Biology, Cambridge, UK
- Max Planck Institute of Biophysics, Frankfurt am Main, Germany
- Stefano Vanni
- Department of Biology, University of Fribourg, Fribourg, Switzerland
- Wanda Kukulski
- MRC Laboratory of Molecular Biology, Cambridge, UK
- Institute of Biochemistry and Molecular Medicine, University of Bern, Bern, Switzerland
24
Wodak SJ, Vajda S, Lensink MF, Kozakov D, Bates PA. Critical Assessment of Methods for Predicting the 3D Structure of Proteins and Protein Complexes. Annu Rev Biophys 2023; 52:183-206. [PMID: 36626764 PMCID: PMC10885158 DOI: 10.1146/annurev-biophys-102622-084607] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Indexed: 01/11/2023]
Abstract
Advances in a scientific discipline are often measured in small, incremental steps. In this review, we report on two intertwined disciplines in the protein structure prediction field, modeling of single chains and modeling of complexes, which have followed this pattern over decades, as monitored by the community-wide blind prediction experiments CASP and CAPRI. However, over the past few years, dramatic advances have been observed in the accurate prediction of single protein chains, driven by a surge of deep learning methodologies entering the prediction field. We review the main scientific developments that enabled these recent breakthroughs and underscore the important role of blind prediction experiments in building up and nurturing the structure prediction field. We discuss how the new wave of artificial intelligence-based methods is impacting the fields of computational and experimental structural biology and highlight areas in which deep learning methods are likely to lead to future developments, provided that major challenges are overcome.
Affiliation(s)
- Shoshana J Wodak
- VIB-VUB Center for Structural Biology, Vrije Universiteit Brussel, Brussels, Belgium
- Sandor Vajda
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, USA
- Department of Chemistry, Boston University, Boston, Massachusetts, USA
- Marc F Lensink
- Univ. Lille, CNRS, UMR 8576-UGSF-Unité de Glycobiologie Structurale et Fonctionnelle, Lille, France
- Dima Kozakov
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York, USA
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York, USA
- Paul A Bates
- Biomolecular Modelling Laboratory, The Francis Crick Institute, London, United Kingdom
25
Roy RS, Liu J, Giri N, Guo Z, Cheng J. Combining pairwise structural similarity and deep learning interface contact prediction to estimate protein complex model accuracy in CASP15. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.03.08.531814. [PMID: 36945536 PMCID: PMC10028888 DOI: 10.1101/2023.03.08.531814] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/18/2023]
Abstract
Estimating the accuracy of quaternary structural models of protein complexes and assemblies (EMA) is important for predicting quaternary structures and applying them to studying protein function and interaction. The pairwise similarity between structural models has proven useful for estimating the quality of protein tertiary structural models, but it has rarely been applied to predicting the quality of quaternary structural models. Moreover, the pairwise similarity approach often fails when many structural models are of low quality and similar to each other. To address this gap, we developed a hybrid method (MULTICOM_qa) combining a pairwise similarity score (PSS) and an interface contact probability score (ICPS) based on deep learning inter-chain contact prediction for estimating protein complex model accuracy. It blindly participated in the 15th Critical Assessment of Techniques for Protein Structure Prediction (CASP15) in 2022 and ranked first out of 24 predictors in estimating the global accuracy of assembly models. The average per-target correlation coefficient between the model quality scores predicted by MULTICOM_qa and the true quality scores of the models of CASP15 assembly targets is 0.66. The average per-target ranking loss in using the predicted quality scores to rank the models is 0.14. It was able to select good models for most targets. Moreover, several key factors for EMA (i.e., target difficulty, model sampling difficulty, skewness of model quality, and similarity between good and bad models) are identified and analyzed. The results demonstrate that combining the multi-model method (PSS) with the complementary single-model method (ICPS) is a promising approach to EMA. The source code of MULTICOM_qa is available at https://github.com/BioinfoMachineLearning/MULTICOM_qa .
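To make the two ingredients named above concrete, the following Python sketch computes a pairwise similarity score (each model's mean similarity to the other models of the same target), blends it with a per-model interface contact probability score, and evaluates the result with a per-target Pearson correlation and a ranking loss (true quality of the best model minus true quality of the top-ranked model). It is an illustration under stated assumptions, not MULTICOM_qa's code: the pairwise similarity matrix is taken as given (in practice it would come from a structural comparison tool), and the 50/50 linear blend in the hypothetical `hybrid_scores` helper is an assumption, since the weighting is not stated here. All numbers are invented.

```python
from statistics import mean


def pairwise_similarity_scores(sim):
    """Pairwise similarity score (PSS): mean similarity of each model to every other model.

    `sim` is a square, symmetric matrix (list of lists) of structural similarities
    between the models of one target, assumed to lie in [0, 1].
    """
    n = len(sim)
    return [mean(sim[i][j] for j in range(n) if j != i) for i in range(n)]


def hybrid_scores(pss, icps, weight=0.5):
    """Hypothetical linear blend of PSS with an interface contact probability score (ICPS)."""
    return [weight * p + (1.0 - weight) * c for p, c in zip(pss, icps)]


def pearson(xs, ys):
    """Per-target Pearson correlation between predicted and true quality scores."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def ranking_loss(predicted, true):
    """True quality of the best model minus true quality of the model ranked first."""
    top = max(range(len(predicted)), key=lambda i: predicted[i])
    return max(true) - true[top]


# Toy target with four models: models 0-2 form a cluster, model 3 is an outlier.
sim = [
    [1.0, 0.8, 0.7, 0.2],
    [0.8, 1.0, 0.75, 0.25],
    [0.7, 0.75, 1.0, 0.3],
    [0.2, 0.25, 0.3, 1.0],
]
icps = [0.6, 0.5, 0.7, 0.4]      # made-up interface contact probability scores
true = [0.70, 0.65, 0.80, 0.30]  # made-up true quality scores

pss = pairwise_similarity_scores(sim)
hybrid = hybrid_scores(pss, icps)
print(round(ranking_loss(pss, true), 2), round(ranking_loss(hybrid, true), 2))  # prints 0.15 0.0
```

In this toy case the consensus score alone ranks model 1 first, while the blend ranks model 2 first, which is the kind of complementarity between a multi-model and a single-model signal that the abstract describes; the example does not reproduce any reported result.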
Affiliation(s)
- Raj S. Roy
- Department of Electrical Engineering and Computer Science, NextGen Precision Health, University of Missouri, Columbia, MO 65211, USA
- Jian Liu
- Department of Electrical Engineering and Computer Science, NextGen Precision Health, University of Missouri, Columbia, MO 65211, USA
- Nabin Giri
- Department of Electrical Engineering and Computer Science, NextGen Precision Health, University of Missouri, Columbia, MO 65211, USA
- Zhiye Guo
- Department of Electrical Engineering and Computer Science, NextGen Precision Health, University of Missouri, Columbia, MO 65211, USA
- Jianlin Cheng
- Department of Electrical Engineering and Computer Science, NextGen Precision Health, University of Missouri, Columbia, MO 65211, USA

26
Korlyukov AA, Stash AI, Romanenko AR, Trzybiński D, Woźniak K, Vologzhanina AV. Ligand-Receptor Interactions of Lamivudine: A View from Charge Density Study and QM/MM Calculations. Biomedicines 2023; 11:biomedicines11030743. [PMID: 36979722 PMCID: PMC10045540 DOI: 10.3390/biomedicines11030743] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2023] [Revised: 02/21/2023] [Accepted: 02/23/2023] [Indexed: 03/06/2023] [Open access]
Abstract
The nature and strength of interactions for an anti-HIV drug, Lamivudine, were studied in a pure crystal form of the drug and the ligand–receptor complexes. High-resolution single-crystal X-ray diffraction studies of the tetragonal polymorph allowed the drug’s experimental charge density distribution in the solid state to be obtained. The QM/MM calculations were performed for a simplified model of the Lamivudine complex with deoxycytidine kinase (two complexes with different binding modes) to reconstruct the theoretical charge density distribution. The peculiarities of intramolecular interactions were compared with previously reported data for an isolated molecule. Intermolecular interactions were revealed within the quantum theory of ‘Atoms in Molecules’, and their contributions to the total crystal energy or ligand–receptor binding energy were evaluated. It was demonstrated that the crystal field effect weakened the intramolecular interactions. Overall, the energies of intermolecular interactions in ligand–receptor complexes (320.1–394.8 kJ/mol) were higher than the energies of interactions in the crystal (276.9 kJ/mol) due to the larger number of hydrophilic interactions. In contrast, the sum of the energies of hydrophobic interactions was found to be unchanged. It was demonstrated by means of the Voronoi tessellation that molecular volume remained constant for different molecular conformations (250(13) Å³) and increased up to 399 Å³ and 521(30) Å³ for the Lamivudine phosphate and triphosphate.
Affiliation(s)
- Alexander A. Korlyukov
- A. N. Nesmeyanov Institute of Organoelement Compounds, Russian Academy of Sciences, 28 Vavilov St., Moscow 19334, Russia
- Adam I. Stash
- A. N. Nesmeyanov Institute of Organoelement Compounds, Russian Academy of Sciences, 28 Vavilov St., Moscow 19334, Russia
- Alexander R. Romanenko
- A. N. Nesmeyanov Institute of Organoelement Compounds, Russian Academy of Sciences, 28 Vavilov St., Moscow 19334, Russia
- Damian Trzybiński
- Biological and Chemical Research Centre, Department of Chemistry, University of Warsaw, Żwirki i Wigury 101, 02-089 Warszawa, Poland
- Krzysztof Woźniak
- Biological and Chemical Research Centre, Department of Chemistry, University of Warsaw, Żwirki i Wigury 101, 02-089 Warszawa, Poland
- Anna V. Vologzhanina
- A. N. Nesmeyanov Institute of Organoelement Compounds, Russian Academy of Sciences, 28 Vavilov St., Moscow 19334, Russia

27
Bouqdayr M, Abbad A, Baba H, Saih A, Wakrim L, Kettani A. Computational analysis of structural and functional evaluation of the deleterious missense variants in the human CTLA4 gene. J Biomol Struct Dyn 2023; 41:14179-14196. [PMID: 36764830 DOI: 10.1080/07391102.2023.2178509] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2022] [Accepted: 02/04/2023] [Indexed: 02/12/2023]
Abstract
CTLA-4 is an immune checkpoint receptor, expressed after T-cell activation, that negatively regulates T-cell function to restrain the immune response. The current study used genomic analysis to explore the functional effects of missense SNPs in the human CTLA4 gene using PolyPhen2, SIFT, PANTHER, PROVEAN, Fathmm, Mutation Assessor, PhD-SNP, SNPs&GO, SNAP2, and MutPred2. Protein phylogenetic conservation was predicted by ConSurf. Protein structural analysis was carried out with the I-Mutant3, MUpro, iStable2, PremPS, and ERIS servers. Molecular dynamics trajectory analysis (RMSD, RMSF, Rg, SASA, H-bonds, and PCA) was performed to analyze the dynamic behavior of native and mutant CTLA-4 at the atomic level. Our in silico analysis suggested that the C58S, G118R, P137Q, P137R, P137L, P138T, and G146L variants are the most deleterious missense variants and occur at highly conserved residues. Moreover, the molecular dynamics analysis indicated decreased protein stability and compactness for P137R and P138T, highlighting the impact of these variants on the function of the CTLA-4 protein. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Meryem Bouqdayr
- Laboratory of Biology and Health, Faculty of Sciences Ben M'sick, Hassan II University of Casablanca, Casablanca, Morocco
- Virology Unit, Immunovirology Laboratory, Institut Pasteur du Maroc, Casablanca, Morocco
- Anass Abbad
- Medical Virology and BSL-3 Laboratory, Institut Pasteur du Maroc, Casablanca, Morocco
- Hanâ Baba
- Laboratory of Biology and Health, Faculty of Sciences Ben M'sick, Hassan II University of Casablanca, Casablanca, Morocco
- Virology Unit, Immunovirology Laboratory, Institut Pasteur du Maroc, Casablanca, Morocco
- Asmae Saih
- Laboratory of Biology and Health, Faculty of Sciences Ben M'sick, Hassan II University of Casablanca, Casablanca, Morocco
- Virology Unit, Immunovirology Laboratory, Institut Pasteur du Maroc, Casablanca, Morocco
- Lahcen Wakrim
- Virology Unit, Immunovirology Laboratory, Institut Pasteur du Maroc, Casablanca, Morocco
- Anass Kettani
- Laboratory of Biology and Health, Faculty of Sciences Ben M'sick, Hassan II University of Casablanca, Casablanca, Morocco

28
Urbelienė N, Tiškus M, Tamulaitienė G, Gasparavičiūtė R, Lapinskaitė R, Jauniškis V, Sūdžius J, Meškienė R, Tauraitė D, Skrodenytė E, Urbelis G, Vaitekūnas J, Meškys R. Cytidine deaminases catalyze the conversion of N(S,O)4-substituted pyrimidine nucleosides. SCIENCE ADVANCES 2023; 9:eade4361. [PMID: 36735785 PMCID: PMC9897663 DOI: 10.1126/sciadv.ade4361] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2022] [Accepted: 01/03/2023] [Indexed: 06/18/2023]
Abstract
Cytidine deaminases (CDAs) catalyze the hydrolytic deamination of cytidine and 2'-deoxycytidine to uridine and 2'-deoxyuridine. Here, we report that prokaryotic homo-tetrameric CDAs catalyze the nucleophilic substitution at the fourth position of N4-acyl-cytidines, N4-alkyl-cytidines, and N4-alkyloxycarbonyl-cytidines, and S4-alkylthio-uridines and O4-alkyl-uridines, converting them to uridine and corresponding amide, amine, carbamate, thiol, or alcohol as leaving groups. The x-ray structure of a metagenomic CDA_F14 and the molecular modeling of the CDAs used in this study show a relationship between the bulkiness of a leaving group and the volume of the binding pocket, which is partly determined by the flexible β3α3 loop of CDAs. We propose that CDAs that are active toward a wide range of substrates participate in salvage and/or catabolism of variously modified pyrimidine nucleosides. This identified promiscuity of CDAs expands the knowledge about the cellular turnover of cytidine derivatives, including the pharmacokinetics of pyrimidine-based prodrugs.
Affiliation(s)
- Nina Urbelienė
- Department of Molecular Microbiology and Biotechnology, Institute of Biochemistry, Life Sciences Center, Vilnius University, Saulėtekio av., 10257 Vilnius, Lithuania
- Matas Tiškus
- Department of Molecular Microbiology and Biotechnology, Institute of Biochemistry, Life Sciences Center, Vilnius University, Saulėtekio av., 10257 Vilnius, Lithuania
- Giedrė Tamulaitienė
- Department of Protein–DNA Interactions, Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio av. 7, 10257 Vilnius, Lithuania
- Renata Gasparavičiūtė
- Department of Molecular Microbiology and Biotechnology, Institute of Biochemistry, Life Sciences Center, Vilnius University, Saulėtekio av., 10257 Vilnius, Lithuania
- Ringailė Lapinskaitė
- Department of Molecular Microbiology and Biotechnology, Institute of Biochemistry, Life Sciences Center, Vilnius University, Saulėtekio av., 10257 Vilnius, Lithuania
- Department of Organic Chemistry, Center for Physical Sciences and Technology, Akademijos 7, LT-08412 Vilnius, Lithuania
- Vykintas Jauniškis
- UAB Biomatter Designs (Biomatter), Žirmūnų st. 139A, 09120 Vilnius, Lithuania
- Jurgis Sūdžius
- Department of Organic Chemistry, Center for Physical Sciences and Technology, Akademijos 7, LT-08412 Vilnius, Lithuania
- Rita Meškienė
- Department of Molecular Microbiology and Biotechnology, Institute of Biochemistry, Life Sciences Center, Vilnius University, Saulėtekio av., 10257 Vilnius, Lithuania
- Daiva Tauraitė
- Department of Molecular Microbiology and Biotechnology, Institute of Biochemistry, Life Sciences Center, Vilnius University, Saulėtekio av., 10257 Vilnius, Lithuania
- Emilija Skrodenytė
- Department of Molecular Microbiology and Biotechnology, Institute of Biochemistry, Life Sciences Center, Vilnius University, Saulėtekio av., 10257 Vilnius, Lithuania
- Gintaras Urbelis
- Department of Molecular Microbiology and Biotechnology, Institute of Biochemistry, Life Sciences Center, Vilnius University, Saulėtekio av., 10257 Vilnius, Lithuania
- Department of Organic Chemistry, Center for Physical Sciences and Technology, Akademijos 7, LT-08412 Vilnius, Lithuania
- Justas Vaitekūnas
- Department of Molecular Microbiology and Biotechnology, Institute of Biochemistry, Life Sciences Center, Vilnius University, Saulėtekio av., 10257 Vilnius, Lithuania
- Rolandas Meškys
- Department of Molecular Microbiology and Biotechnology, Institute of Biochemistry, Life Sciences Center, Vilnius University, Saulėtekio av., 10257 Vilnius, Lithuania

29
Liu J, Zhao K, Zhang G. Improved model quality assessment using sequence and structural information by enhanced deep neural networks. Brief Bioinform 2023; 24:6865134. [PMID: 36460624 DOI: 10.1093/bib/bbac507] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2022] [Revised: 10/02/2022] [Accepted: 10/24/2022] [Indexed: 12/04/2022] [Open access]
Abstract
Protein model quality assessment plays an important role in protein structure prediction, protein design and drug discovery. In this work, DeepUMQA2, a substantially improved version of DeepUMQA for protein model quality assessment, is proposed. First, sequence features containing protein co-evolution information and structural features reflecting family information are extracted to complement model-dependent features. Second, a novel backbone network based on the triangular multiplication update and an axial attention mechanism is designed to enhance information exchange between inter-residue pairs. On the CASP13 and CASP14 datasets, the performance of DeepUMQA2 improves by 20.5% and 20.4%, respectively, compared with DeepUMQA (measured by top-1 loss). Moreover, on the three-month CAMEO dataset (11 March to 04 June 2022), DeepUMQA2 outperforms DeepUMQA by 15.5% (measured by local AUC(0, 0.2)) and ranks first among all competing server methods in the CAMEO blind test. Experimental results show that DeepUMQA2 outperforms state-of-the-art model quality assessment methods such as ProQ3D-LDDT, ModFOLD8 and DeepAccNet, and that it selects better models than the self-assessments provided by state-of-the-art protein structure prediction methods such as AlphaFold2, RoseTTAFold and I-TASSER.
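The "triangular multiplication update" mentioned above originates in AlphaFold2's pair stack. The pure-Python sketch below shows the core of its outgoing-edge variant, in which the pair feature for residues (i, j) is updated from the elementwise products a_ik * b_jk summed over a third residue k, so that information flows around the triangle (i, j, k). It deliberately omits the layer normalization, sigmoid gating and output projection of the full AlphaFold2 module, and it is not DeepUMQA2's implementation; the helper names and weight matrices are illustrative assumptions.

```python
def project(z, w):
    """Apply a channel-mixing matrix w (c_in x c_out) to every pair feature vector z[i][j]."""
    c_in, c_out = len(w), len(w[0])
    return [
        [[sum(zij[a] * w[a][b] for a in range(c_in)) for b in range(c_out)] for zij in row]
        for row in z
    ]


def triangle_multiplication_outgoing(z, w_a, w_b):
    """Simplified outgoing-edge triangle update: out[i][j][c] = sum_k a[i][k][c] * b[j][k][c].

    z is an L x L grid of pair feature vectors (nested lists). Layer norm, gating and
    the final linear projection used in AlphaFold2 are intentionally left out.
    """
    a = project(z, w_a)
    b = project(z, w_b)
    n, c = len(z), len(w_a[0])
    return [
        [[sum(a[i][k][ch] * b[j][k][ch] for k in range(n)) for ch in range(c)] for j in range(n)]
        for i in range(n)
    ]


# Two residues, one channel; identity projections reduce the update to Z @ Z^T.
z = [[[1.0], [2.0]], [[3.0], [4.0]]]
identity = [[1.0]]
out = triangle_multiplication_outgoing(z, identity, identity)
print(out)  # [[[5.0], [11.0]], [[11.0], [25.0]]]
```

With one channel and identity projections the update is just the matrix product of the pair map with its transpose, which makes it easy to check by hand; in a trained network the projections, normalization and gating are learned.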
Affiliation(s)
- Jun Liu
- College of Information Engineering, Zhejiang University of Technology
- Kailong Zhao
- College of Information Engineering, Zhejiang University of Technology
- Guijun Zhang
- College of Information Engineering, Zhejiang University of Technology

30
Arguelles J, Lee J, Cardenas LV, Govind S, Singh S. In Silico Analysis of a Drosophila Parasitoid Venom Peptide Reveals Prevalence of the Cation-Polar-Cation Clip Motif in Knottin Proteins. Pathogens 2023; 12:pathogens12010143. [PMID: 36678491 PMCID: PMC9865768 DOI: 10.3390/pathogens12010143] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/15/2022] [Revised: 01/10/2023] [Accepted: 01/11/2023] [Indexed: 01/18/2023] [Open access]
Abstract
As generalist parasitoid wasps, Leptopilina heterotoma are highly successful on many species of fruit flies of the genus Drosophila. The parasitoids produce specialized multi-strategy extracellular vesicle (EV)-like structures in their venom. Proteomic analysis identified several immunity-associated proteins, including the knottin peptide, LhKNOT, containing the structurally conserved inhibitor cysteine knot (ICK) fold, which is present in proteins from diverse taxa. Our structural and docking analysis of LhKNOT's 36-residue core knottin fold revealed that in addition to the knottin motif itself, it also possesses a Cation-Polar-Cation (CPC) clip. The CPC clip motif is thought to facilitate antimicrobial activity in heparin-binding proteins. Surprisingly, a majority of ICKs tested also possess the CPC clip motif, including 75 bona fide plant and arthropod knottin proteins that share high sequence and/or structural similarity with LhKNOT. Like LhKNOT and these other 75 knottin proteins, even the Drosophila Drosomycin antifungal peptide, a canonical target gene of the fly's Toll-NF-kappa B immune pathway, contains this CPC clip motif. Together, our results suggest a possible defensive function for the parasitoid LhKNOT. The prevalence of the CPC clip motif, intrinsic to the cysteine knot within the knottin proteins examined here, suggests that the resultant 3D topology is important for their biochemical functions. The CPC clip is likely a highly conserved structural motif found in many diverse proteins with reported heparin binding capacity, including amyloid proteins. Knottins are targets for therapeutic drug development, and insights into their structure-function relationships will advance novel drug design.
Affiliation(s)
- Joseph Arguelles
- Department of Biology, Brooklyn College, Brooklyn, NY 11210, USA
- Jenny Lee
- Department of Biology, Brooklyn College, Brooklyn, NY 11210, USA
- Lady V. Cardenas
- Department of Biology, The City College of New York, New York, NY 10031, USA
- Shubha Govind
- Department of Biology, The City College of New York, New York, NY 10031, USA
- PhD Program in Biochemistry, The Graduate Center of the City University of New York, New York, NY 10016, USA
- PhD Program in Biology, The Graduate Center of the City University of New York, New York, NY 10016, USA
- Shaneen Singh
- Department of Biology, Brooklyn College, Brooklyn, NY 11210, USA
- PhD Program in Biochemistry, The Graduate Center of the City University of New York, New York, NY 10016, USA
- PhD Program in Biology, The Graduate Center of the City University of New York, New York, NY 10016, USA

31
Bhattacharya S, Roche R, Shuvo MH, Moussad B, Bhattacharya D. Contact-Assisted Threading in Low-Homology Protein Modeling. Methods Mol Biol 2023; 2627:41-59. [PMID: 36959441 DOI: 10.1007/978-1-0716-2974-1_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/25/2023]
Abstract
Predicting the three-dimensional structure of a protein from its amino acid sequence has progressed considerably in the recent past. This progress is propelled by the improved accuracy of deep learning-based inter-residue contact map predictors, coupled with the rapid growth of protein sequence databases. A contact map encodes interatomic interaction information that can be exploited for highly accurate prediction of protein structures via contact map threading, even for query proteins that are not amenable to direct homology modeling. As such, contact-assisted threading has attracted considerable research effort. In this chapter, we provide an overview of existing contact-assisted threading methods, highlighting recent advances and discussing current limitations and future prospects in the application of contact-assisted threading for improving the accuracy of low-homology protein modeling.
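The contact-map idea above can be illustrated with a short Python sketch. It derives a binary contact map from one representative atom per residue using a common convention (distance below 8 Å and a sequence separation of at least 6 residues; both are assumed parameters here), and then scores a query-to-template alignment by counting query contacts that map onto template contacts, the kind of contact-overlap quantity that contact-map threading methods aim to maximize. The dictionary-based alignment format and the helper names are hypothetical, and real threading methods search over many alignments rather than scoring one fixed alignment.

```python
import math


def contact_map(coords, cutoff=8.0, min_sep=6):
    """Return the set of residue pairs (i, j), i < j, that are in contact.

    coords holds one (x, y, z) point per residue, e.g. the C-beta atom (C-alpha for glycine).
    A pair counts as a contact if its distance is below `cutoff` angstroms and the residues
    are at least `min_sep` positions apart in sequence.
    """
    n = len(coords)
    return {
        (i, j)
        for i in range(n)
        for j in range(i + min_sep, n)
        if math.dist(coords[i], coords[j]) < cutoff
    }


def contact_overlap(query_contacts, template_contacts, alignment):
    """Count query contacts whose aligned template residues are also in contact.

    alignment maps query residue index -> template residue index (unaligned residues omitted).
    """
    score = 0
    for i, j in query_contacts:
        if i in alignment and j in alignment:
            pair = tuple(sorted((alignment[i], alignment[j])))
            if pair in template_contacts:
                score += 1
    return score


# Toy hairpin-like geometry: two strands of four points, 3.8 angstroms apart along x, 5 apart in y.
hairpin = [(0, 0, 0), (3.8, 0, 0), (7.6, 0, 0), (11.4, 0, 0),
           (11.4, 5, 0), (7.6, 5, 0), (3.8, 5, 0), (0, 5, 0)]
contacts = contact_map(hairpin)
print(sorted(contacts))  # [(0, 6), (0, 7), (1, 7)]
print(contact_overlap(contacts, contacts, {i: i for i in range(8)}))  # 3
```

Aligning a structure to itself recovers every contact; dropping residue 7 from the alignment leaves only the (0, 6) contact, so the overlap falls to 1.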
Affiliation(s)
- Sutanu Bhattacharya
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, USA
- Md Hossain Shuvo
- Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
- Bernard Moussad
- Department of Computer Science, Virginia Tech, Blacksburg, VA, USA

32
Chen C, Chen X, Morehead A, Wu T, Cheng J. 3D-equivariant graph neural networks for protein model quality assessment. BIOINFORMATICS (OXFORD, ENGLAND) 2023; 39:6986970. [PMID: 36637199 PMCID: PMC10089647 DOI: 10.1093/bioinformatics/btad030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2022] [Revised: 11/28/2022] [Accepted: 01/12/2023] [Indexed: 01/14/2023]
Abstract
MOTIVATION: Quality assessment (QA) of predicted protein tertiary structure models plays an important role in ranking and using them. With the recent development of end-to-end deep learning protein structure prediction techniques that generate highly confident tertiary structures for most proteins, it is important to explore corresponding QA strategies to evaluate and select the structural models they predict, since these models have better quality and different properties than the models predicted by traditional tertiary structure prediction methods. RESULTS: We develop EnQA, a novel graph-based 3D-equivariant neural network method that is equivariant to rotation and translation of 3D objects, to estimate the accuracy of protein structural models by leveraging the structural features acquired from the state-of-the-art tertiary structure prediction method, AlphaFold2. We train and test the method on both traditional model datasets (e.g. the datasets of the Critical Assessment of Techniques for Protein Structure Prediction) and a new dataset of high-quality structural models predicted only by AlphaFold2 for proteins whose experimental structures were released recently. Our approach achieves state-of-the-art performance on protein structural models predicted by both traditional protein structure prediction methods and the latest end-to-end deep learning method, AlphaFold2. It performs even better than the model QA scores provided by AlphaFold2 itself. The results illustrate that the 3D-equivariant graph neural network is a promising approach to the evaluation of protein structural models. Integrating AlphaFold2 features with other complementary sequence and structural features is important for improving protein model QA. AVAILABILITY AND IMPLEMENTATION: The source code is available at https://github.com/BioinfoMachineLearning/EnQA. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
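Equivariance to rotation and translation, the property EnQA is built around, can be checked numerically. The sketch below uses a coordinate update in the style of E(n)-equivariant graph neural networks (EGNNs), x_i' = x_i + (1/(n-1)) * sum over j != i of (x_i - x_j) * phi(|x_i - x_j|^2), as a stand-in; it is not EnQA's actual architecture, and the fixed weighting function phi(d2) = exp(-d2) is an assumption replacing what would normally be a learned network. Applying a rigid motion before or after the update gives the same coordinates, which is what equivariance means.

```python
import math


def egnn_coordinate_update(coords, phi=lambda d2: math.exp(-d2)):
    """EGNN-style update: x_i' = x_i + mean_{j != i} (x_i - x_j) * phi(|x_i - x_j|^2).

    phi stands in for a learned network; because it only sees squared distances,
    the update commutes with rotations and translations.
    """
    n = len(coords)
    updated = []
    for i in range(n):
        shift = [0.0, 0.0, 0.0]
        for j in range(n):
            if i == j:
                continue
            diff = [coords[i][k] - coords[j][k] for k in range(3)]
            weight = phi(sum(d * d for d in diff))
            for k in range(3):
                shift[k] += diff[k] * weight
        updated.append([coords[i][k] + shift[k] / (n - 1) for k in range(3)])
    return updated


def rigid_motion(coords, theta, t):
    """Rotate by theta radians about the z axis, then translate by the vector t."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c * x - s * y + t[0], s * x + c * y + t[1], z + t[2]] for x, y, z in coords]


coords = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.5, 0.5], [2.0, 1.0, -1.0]]
a = egnn_coordinate_update(rigid_motion(coords, 0.7, [3.0, -2.0, 1.0]))
b = rigid_motion(egnn_coordinate_update(coords), 0.7, [3.0, -2.0, 1.0])
max_error = max(abs(u - v) for p, q in zip(a, b) for u, v in zip(p, q))
print(max_error < 1e-9)  # True
```

The check uses a single rotation axis for brevity; the same argument holds for any rotation, because the update depends on the input only through relative position vectors and their lengths.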
Affiliation(s)
- Chen Chen
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
- Xiao Chen
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
- Alex Morehead
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
- Tianqi Wu
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
- Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA

33
Guedeney N, Cornu M, Schwalen F, Kieffer C, Voisin-Chiret AS. PROTAC technology: A new drug design for chemical biology with many challenges in drug discovery. Drug Discov Today 2023; 28:103395. [PMID: 36228895 DOI: 10.1016/j.drudis.2022.103395] [Citation(s) in RCA: 24] [Impact Index Per Article: 24.0] [Received: 05/30/2022] [Revised: 09/06/2022] [Accepted: 10/05/2022] [Indexed: 11/06/2022]
Abstract
Targeted protein degradation (TPD) is a new and revolutionary avenue for therapeutics because it redefines the principles of classical drug discovery: it is guided by event-based target activity rather than occupancy-driven activity. Since the discovery of the first PROTAC in 2001, TPD has become a rapidly growing technology, with applications in both drug discovery and chemical biology. Over the last decade, many questions have been raised; the knowledge gained by each team has since answered a number of them, although there is still a long way to go. The objective of this work is to present the challenges that the PROTAC strategy has very recently addressed in drug design and discovery, drawing on the most recent results from the literature, and to provide medicinal chemists with guidelines for designing new PROTACs as a successful therapeutic modality.
Affiliation(s)
- Marie Cornu
- Normandie Univ, UNICAEN, CERMN, 14000 Caen, France
- Florian Schwalen
- Normandie Univ, UNICAEN, CERMN, 14000 Caen, France
- Department of Pharmacy, Caen University Hospital, Caen 14000, France

34
San A, Palmieri D, Saxena A, Singh S. In silico study predicts a key role of RNA-binding domains 3 and 4 in nucleolin-miRNA interactions. Proteins 2022; 90:1837-1850. [PMID: 35514080 DOI: 10.1002/prot.26355] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/23/2021] [Revised: 04/07/2022] [Accepted: 04/26/2022] [Indexed: 12/17/2023]
Abstract
RNA binding proteins (RBPs) regulate many important cellular processes through their interactions with RNA molecules. RBPs are critical for posttranscriptional mechanisms keeping gene regulation in a fine equilibrium. Conversely, dysregulation of RBPs and RNA metabolism pathways is an established hallmark of tumorigenesis. Human nucleolin (NCL) is a multifunctional RBP that interacts with different types of RNA molecules, in part through its four RNA binding domains (RBDs). Particularly, NCL interacts directly with microRNAs (miRNAs) and is involved in their aberrant processing linked with many cancers, including breast cancer. Nonetheless, molecular details of the NCL-miRNA interaction remain obscure. In this study, we used an in silico approach to characterize how NCL targets miRNAs and whether this specificity is imposed by a definite RBD-interface. Here, we present structural models of NCL-RBDs and miRNAs, as well as predict scenarios of NCL-miRNA interactions generated using docking algorithms. Our study suggests a predominant role of NCL RBDs 3 and 4 (RBD3-4) in miRNA binding. We provide detailed analyses of specific motifs/residues at the NCL-substrate interface in both these RBDs and miRNAs. Finally, we propose that the evolutionary emergence of more than two RBDs in NCL in higher organisms coincides with its additional role/s in miRNA processing. Our study shows that RBD3-4 display sequence/structural determinants to specifically recognize miRNA precursor molecules. Moreover, the insights from this study can ultimately support the design of novel antineoplastic drugs aimed at regulating NCL-dependent biological pathways with a causal role in tumorigenesis.
Affiliation(s)
- Avdar San
- Department of Biology, Brooklyn College, The City University of New York, Brooklyn, New York, USA
- The Biochemistry PhD Program, The Graduate Center of the City University of New York, New York, New York, USA
- Dario Palmieri
- Department of Cancer Biology and Genetics, The Ohio State University Wexner Medical Center, Columbus, Ohio, USA
- Anjana Saxena
- Department of Biology, Brooklyn College, The City University of New York, Brooklyn, New York, USA
- The Biochemistry PhD Program, The Graduate Center of the City University of New York, New York, New York, USA
- Shaneen Singh
- Department of Biology, Brooklyn College, The City University of New York, Brooklyn, New York, USA
- The Biochemistry PhD Program, The Graduate Center of the City University of New York, New York, New York, USA

35
Kaushik R, Zhang KY. An Integrated Protein Structure Fitness Scoring Approach for Identifying Native-Like Model Structures. Comput Struct Biotechnol J 2022; 20:6467-6472. [DOI: 10.1016/j.csbj.2022.11.032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2022] [Revised: 11/14/2022] [Accepted: 11/14/2022] [Indexed: 11/18/2022] [Open access]

36
Weng G, Cai X, Cao D, Du H, Shen C, Deng Y, He Q, Yang B, Li D, Hou T. PROTAC-DB 2.0: an updated database of PROTACs. Nucleic Acids Res 2022; 51:D1367-D1372. [PMID: 36300631 PMCID: PMC9825472 DOI: 10.1093/nar/gkac946] [Citation(s) in RCA: 32] [Impact Index Per Article: 16.0] [Received: 08/29/2022] [Revised: 09/23/2022] [Accepted: 10/11/2022] [Indexed: 01/30/2023] [Open access]
Abstract
Proteolysis targeting chimeras (PROTACs), which harness the ubiquitin-proteasome system to selectively induce targeted protein degradation, represent an emerging therapeutic technology with the potential to modulate traditionally undruggable targets. Over the past few years, this technology has moved from academia to industry, and more than 10 PROTACs have been advanced into clinical trials. However, designing potent PROTACs with desirable drug-like properties remains a great challenge. Here, we report an updated online database, PROTAC-DB 2.0, which is a repository of structural and experimental data about PROTACs. In this second release, we expanded the number of PROTACs to 3270, a 96% expansion over the first version. Meanwhile, the numbers of warheads (small molecules targeting the proteins of interest), linkers, and E3 ligands (small molecules recruiting E3 ligases) have increased to over 360, 1500 and 80, respectively. In addition, given the importance and limited number of crystal structures of target-PROTAC-E3 ternary complexes, we provide predicted ternary complex structures for PROTACs with good degradation capability using our PROTAC-Model method. To further facilitate the analysis of PROTAC data, a new filtering strategy based on the E3 ligases has also been added. PROTAC-DB 2.0 is available online at http://cadd.zju.edu.cn/protacdb/.
Collapse
Affiliation(s)
- Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410004, Hunan, China
- Hongyan Du
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Chao Shen
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Yafeng Deng
- CarbonSilicon AI Technology Co., Ltd, Hangzhou 310018, Zhejiang, China
- Qiaojun He
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China; Center for Drug Safety Evaluation and Research, Zhejiang Province Key Laboratory of Anti-Cancer Drug Research, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Bo Yang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Dan Li
- Correspondence may also be addressed to Dan Li.
- Tingjun Hou
- To whom correspondence should be addressed. Tel: +86 517 8820 8412;
37
Protein Function Analysis through Machine Learning. Biomolecules 2022; 12:biom12091246. [PMID: 36139085 PMCID: PMC9496392 DOI: 10.3390/biom12091246] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 07/16/2022] [Revised: 08/22/2022] [Accepted: 08/31/2022] [Indexed: 11/16/2022]
Abstract
Machine learning (ML) has been an important arsenal in computational biology used to elucidate protein function for decades. With the recent burgeoning of novel ML methods and applications, new ML approaches have been incorporated into many areas of computational biology dealing with protein function. We examine how ML has been integrated into a wide range of computational models to improve prediction accuracy and gain a better understanding of protein function. The applications discussed are protein structure prediction, protein engineering using sequence modifications to achieve stability and druggability characteristics, molecular docking in terms of protein–ligand binding, including allosteric effects, protein–protein interactions and protein-centric drug discovery. To quantify the mechanisms underlying protein function, a holistic approach that takes structure, flexibility, stability, and dynamics into account is required, as these aspects become inseparable through their interdependence. Another key component of protein function is conformational dynamics, which often manifest as protein kinetics. Computational methods that use ML to generate representative conformational ensembles and quantify differences in conformational ensembles important for function are included in this review. Future opportunities are highlighted for each of these topics.
38
Aderinwale T, Christoffer C, Kihara D. RL-MLZerD: Multimeric protein docking using reinforcement learning. Front Mol Biosci 2022; 9:969394. [PMID: 36090027 PMCID: PMC9459051 DOI: 10.3389/fmolb.2022.969394] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2022] [Accepted: 08/08/2022] [Indexed: 11/24/2022]
Abstract
Numerous biological processes in a cell are carried out by protein complexes. To understand the molecular mechanisms of such processes, it is crucial to know the quaternary structures of the complexes. Although the structures of protein complexes have been determined by biophysical experiments at a rapid pace, there are still many important complex structures that are yet to be determined. To supplement experimental structure determination of complexes, many computational protein docking methods have been developed; however, most of these docking methods are designed only for docking with two chains. Here, we introduce a novel method, RL-MLZerD, which builds multiple protein complexes using reinforcement learning (RL). In RL-MLZerD a multi-chain assembly process is considered as a series of episodes of selecting and integrating pre-computed pairwise docking models in a RL framework. RL is effective in correctly selecting plausible pairwise models that fit well with other subunits in a complex. When tested on a benchmark dataset of protein complexes with three to five chains, RL-MLZerD showed better modeling performance than other existing multiple docking methods under different evaluation criteria, except against AlphaFold-Multimer in unbound docking. Also, it emerged that the docking order of multi-chain complexes can be naturally predicted by examining preferred paths of episodes in the RL computation.
Affiliation(s)
- Tunde Aderinwale
- Department of Computer Science, Purdue University, West Lafayette, IN, United States
- Charles Christoffer
- Department of Computer Science, Purdue University, West Lafayette, IN, United States
- Daisuke Kihara
- Department of Computer Science, Purdue University, West Lafayette, IN, United States
- Department of Biological Sciences, Purdue University, West Lafayette, IN, United States
- *Correspondence: Daisuke Kihara,
39
Bitton M, Keasar C. Estimation of model accuracy by a unique set of features and tree-based regressor. Sci Rep 2022; 12:14074. [PMID: 35982086 PMCID: PMC9388490 DOI: 10.1038/s41598-022-17097-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2022] [Accepted: 07/20/2022] [Indexed: 11/26/2022]
Abstract
Computationally generated models of protein structures bridge the gap between the practically negligible price tag of sequencing and the high cost of experimental structure determination. By providing a low-cost (and often free) partial alternative to experimentally determined structures, these models help biologists design and interpret their experiments. Obviously, the more accurate the models the more useful they are. However, methods for protein structure prediction generate many structural models of various qualities, necessitating means for the estimation of their accuracy. In this work we present MESHI_consensus, a new method for the estimation of model accuracy. The method uses a tree-based regressor and a set of structural, target-based, and consensus-based features. The new method achieved high performance in the EMA (Estimation of Model Accuracy) track of the recent CASP14 community-wide experiment (https://predictioncenter.org/casp14/index.cgi). The tertiary structure prediction track of that experiment revealed an unprecedented leap in prediction performance by a single prediction group/method, namely AlphaFold2. This achievement would inevitably have a profound impact on the field of protein structure prediction, including the accuracy estimation sub-task. We conclude this manuscript with some speculations regarding the future role of accuracy estimation in a new era of accurate protein structure prediction.
Affiliation(s)
- Mor Bitton
- Department of Computer Science, Ben Gurion University, Be'er Sheva, Israel.
- Chen Keasar
- Department of Computer Science, Ben Gurion University, Be'er Sheva, Israel.
40
Turzo SMBA, Seffernick JT, Rolland AD, Donor MT, Heinze S, Prell JS, Wysocki VH, Lindert S. Protein shape sampled by ion mobility mass spectrometry consistently improves protein structure prediction. Nat Commun 2022; 13:4377. [PMID: 35902583 PMCID: PMC9334640 DOI: 10.1038/s41467-022-32075-9] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 07/27/2021] [Accepted: 07/14/2022] [Indexed: 11/09/2022]
Abstract
Ion mobility (IM) mass spectrometry provides structural information about protein shape and size in the form of an orientationally-averaged collision cross-section (CCSIM). While IM data have been used with various computational methods, they have not yet been utilized to predict monomeric protein structure from sequence. Here, we show that IM data can significantly improve protein structure determination using the modelling suite Rosetta. We develop the Rosetta Projection Approximation using Rough Circular Shapes (PARCS) algorithm that allows for fast and accurate prediction of CCSIM from structure. Following successful testing of the PARCS algorithm, we use an integrative modelling approach to utilize IM data for protein structure prediction. Additionally, we propose a confidence metric that identifies near native models in the absence of a known structure. The results of this study demonstrate the ability of IM data to consistently improve protein structure prediction. Collision cross sections (CCS) from ion mobility mass spectrometry provide information about protein shape and size. Here, the authors develop an algorithm to predict CCS and integrate experimental ion mobility data into Rosetta-based molecular modelling to predict protein structures from sequence.
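PARCS is not described here in enough detail to reproduce. As a rough illustration of the general projection-approximation idea its name refers to, the sketch below averages the projected area of a set of spheres over random orientations, estimating each projected area on a 2D grid. The atom radii (assumed to already include the probe-gas radius), grid spacing, number of orientations, and all function names are hypothetical; this is not the Rosetta PARCS implementation.

```python
import math
import random


def _unit(v):
    """Scale a 3-vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def random_projection_axes(rng):
    """Pick a viewing direction uniformly on the sphere and return two orthonormal
    axes spanning the plane perpendicular to it."""
    n = _unit([rng.gauss(0.0, 1.0) for _ in range(3)])
    helper = (1.0, 0.0, 0.0) if abs(n[0]) < 0.9 else (0.0, 1.0, 0.0)
    u = _unit(_cross(n, helper))
    v = _cross(n, u)  # already unit length, since n and u are orthonormal
    return u, v


def union_area(circles, grid):
    """Area of the union of 2D circles (x, y, r), estimated by counting grid-cell centres."""
    xmin = min(x - r for x, _, r in circles)
    xmax = max(x + r for x, _, r in circles)
    ymin = min(y - r for _, y, r in circles)
    ymax = max(y + r for _, y, r in circles)
    hits = 0
    for i in range(math.ceil((xmax - xmin) / grid)):
        cx = xmin + (i + 0.5) * grid
        for j in range(math.ceil((ymax - ymin) / grid)):
            cy = ymin + (j + 0.5) * grid
            if any((cx - x) ** 2 + (cy - y) ** 2 <= r * r for x, y, r in circles):
                hits += 1
    return hits * grid * grid


def projection_ccs(atoms, n_orientations=50, grid=0.1, seed=0):
    """Orientation-averaged projected area for atoms given as ((x, y, z), radius)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_orientations):
        u, v = random_projection_axes(rng)
        circles = [(_dot(p, u), _dot(p, v), r) for p, r in atoms]
        total += union_area(circles, grid)
    return total / n_orientations


single = projection_ccs([((0.0, 0.0, 0.0), 1.0)], n_orientations=5, grid=0.05)
pair = projection_ccs([((0.0, 0.0, 0.0), 1.0), ((5.0, 0.0, 0.0), 1.0)],
                      n_orientations=5, grid=0.05)
```

For a single sphere the projected area is the same in every orientation (close to pi times the squared radius), which makes it a convenient sanity check; for real proteins, the orientation average is what approximates the measured cross section.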
Affiliation(s)
- S M Bargeen Alam Turzo
- Department of Chemistry and Biochemistry and Resource for Native Mass Spectrometry Guided Structural Biology, Ohio State University, Columbus, OH, 43210, USA
- Justin T Seffernick
- Department of Chemistry and Biochemistry and Resource for Native Mass Spectrometry Guided Structural Biology, Ohio State University, Columbus, OH, 43210, USA
- Amber D Rolland
- Department of Chemistry and Biochemistry and Materials Science Institute, University of Oregon, Eugene, OR, 97403, USA
- Micah T Donor
- Department of Chemistry and Biochemistry and Materials Science Institute, University of Oregon, Eugene, OR, 97403, USA
- Sten Heinze
- Department of Chemistry and Biochemistry and Resource for Native Mass Spectrometry Guided Structural Biology, Ohio State University, Columbus, OH, 43210, USA
- James S Prell
- Department of Chemistry and Biochemistry and Materials Science Institute, University of Oregon, Eugene, OR, 97403, USA
- Vicki H Wysocki
- Department of Chemistry and Biochemistry and Resource for Native Mass Spectrometry Guided Structural Biology, Ohio State University, Columbus, OH, 43210, USA
- Steffen Lindert
- Department of Chemistry and Biochemistry and Resource for Native Mass Spectrometry Guided Structural Biology, Ohio State University, Columbus, OH, 43210, USA.
41
Chen X, Cheng J. DISTEMA: distance map-based estimation of single protein model accuracy with attentive 2D convolutional neural network. BMC Bioinformatics 2022; 23:141. [PMID: 35439931 PMCID: PMC9019949 DOI: 10.1186/s12859-022-04683-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2022] [Accepted: 04/11/2022] [Indexed: 01/25/2023]
Abstract
BACKGROUND Estimation of the accuracy (quality) of protein structural models is important for both prediction and use of protein structural models. Deep learning methods have been used to integrate protein structure features to predict the quality of protein models. Inter-residue distances are key information for predicting protein tertiary structures and therefore have good potential to predict the quality of protein structural models. However, few methods have been developed to fully take advantage of predicted inter-residue distance maps to estimate the accuracy of a single protein structural model. RESULT We developed an attentive 2D convolutional neural network (CNN) with channel-wise attention that takes as its only input a raw difference map between the inter-residue distance map calculated from a single protein model and the distance map predicted from the protein sequence, and predicts the quality of the model. The network comprises multiple convolutional layers, batch normalization layers, dense layers, and Squeeze-and-Excitation blocks with attention to automatically extract features relevant to protein model quality from the raw input without using any expert-curated features. We evaluated DISTEMA's capability of selecting the best models for CASP13 targets in terms of ranking loss of GDT-TS score. The ranking loss of DISTEMA is 0.079, lower than several state-of-the-art single-model quality assessment methods. CONCLUSION This work demonstrates that using raw inter-residue distance information with deep learning can predict the quality of protein structural models reasonably well. DISTEMA is freely available at https://github.com/jianlin-cheng/DISTEMA.
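The DISTEMA input and the ranking-loss metric named above can be sketched as follows. This is a minimal sketch, assuming one coordinate per residue (for example C-alpha) and a predicted distance map already available as a plain list of lists; the function names are hypothetical, and the attentive CNN itself is not reproduced.

```python
import math


def distance_map(coords):
    """Pairwise Euclidean distances between residue coordinates (e.g. C-alpha atoms)."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def difference_map(model_coords, predicted_map):
    """Raw difference map: distances measured in the model minus distances predicted from sequence."""
    model_map = distance_map(model_coords)
    return [[m - p for m, p in zip(m_row, p_row)]
            for m_row, p_row in zip(model_map, predicted_map)]


def ranking_loss(true_gdt_ts, predicted_scores):
    """GDT-TS of the truly best model minus GDT-TS of the model the predictor ranks first."""
    top_pick = max(range(len(predicted_scores)), key=lambda i: predicted_scores[i])
    return max(true_gdt_ts) - true_gdt_ts[top_pick]


coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
predicted = [[0.0, 3.8, 7.0], [3.8, 0.0, 3.8], [7.0, 3.8, 0.0]]
diff = difference_map(coords, predicted)
loss = ranking_loss([0.62, 0.71, 0.55], [0.8, 0.6, 0.3])
```

A ranking loss of zero means the predictor picked the best available model; here it picks model 0 while model 1 is better, so the loss is the GDT-TS gap between them.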
Affiliation(s)
- Xiao Chen
- Department of Electrical Engineering and Computer Science, University of Missouri Columbia, Columbia, MO 65211 USA
- Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri Columbia, Columbia, MO 65211 USA
42
Guo SS, Liu J, Zhou XG, Zhang GJ. DeepUMQA: ultrafast shape recognition-based protein model quality assessment using deep learning. Bioinformatics 2022; 38:1895-1903. [PMID: 35134108 DOI: 10.1093/bioinformatics/btac056] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/02/2021] [Revised: 12/26/2021] [Accepted: 01/27/2022] [Indexed: 02/03/2023]
Abstract
MOTIVATION Protein model quality assessment is a key component of protein structure prediction. In recent research, the voxelization feature was used to characterize the local structural information of residues, but it may be insufficient for describing residue-level topological information. Designing features that further reflect residue-level topology when combined with deep learning methods is therefore crucial to improve the performance of model quality assessment. RESULTS We developed a deep-learning method, DeepUMQA, based on Ultrafast Shape Recognition (USR) for residue-level single-model quality assessment. In the framework of the deep residual neural network, the residue-level USR feature was introduced to describe the topological relationship between the residue and the overall structure by calculating the first moment of a set of residue distance sets, and was then combined with 1D, 2D and voxelization features to assess the quality of the model. Experimental results on the CASP13 and CASP14 test datasets and the CAMEO blind test show that USR could supplement the voxelization features to comprehensively characterize residue structure information and significantly improve model assessment accuracy. DeepUMQA ranks among the top of the state-of-the-art single-model quality assessment methods, including ProQ2, ProQ3, ProQ3D, Ornate, VoroMQA, ProteinGCN, ResNetQA, QDeep, GraphQA, ModFOLD6, ModFOLD7, ModFOLD8, QMEAN3, QMEANDisCo3 and DeepAccNet. AVAILABILITY AND IMPLEMENTATION The DeepUMQA server is freely available at http://zhanglab-bioinf.com/DeepUMQA/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
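The abstract describes the residue-level USR feature only briefly. The sketch below rests on our own assumptions: it works on one coordinate per residue, uses the classic USR reference points (centroid, the point closest to it, the point farthest from it, and the point farthest from that one), and takes the "first moment" to be the mean distance from a residue to all other residues. The feature layout and function names are hypothetical and are not DeepUMQA's implementation.

```python
import math


def centroid(coords):
    n = len(coords)
    return tuple(sum(c[k] for c in coords) / n for k in range(3))


def usr_reference_points(coords):
    """Classic USR reference points: centroid (ctd), closest point to ctd (cst),
    farthest point from ctd (fct), and farthest point from fct (ftf)."""
    ctd = centroid(coords)
    cst = min(coords, key=lambda c: math.dist(c, ctd))
    fct = max(coords, key=lambda c: math.dist(c, ctd))
    ftf = max(coords, key=lambda c: math.dist(c, fct))
    return [ctd, cst, fct, ftf]


def residue_usr_features(coords):
    """Per-residue vector (hypothetical layout): mean distance to all other residues,
    followed by the distances to the four USR reference points. Needs >= 2 residues."""
    refs = usr_reference_points(coords)
    features = []
    for i, ci in enumerate(coords):
        others = [math.dist(ci, cj) for j, cj in enumerate(coords) if j != i]
        first_moment = sum(others) / len(others)
        features.append([first_moment] + [math.dist(ci, r) for r in refs])
    return features


coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
features = residue_usr_features(coords)
```

In a quality-assessment network, such per-residue vectors would be concatenated with the other 1D features before entering the model; that wiring is not shown.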
Affiliation(s)
- Sai-Sai Guo
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Jun Liu
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Xiao-Gen Zhou
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Gui-Jun Zhang
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
43
Donor Splice Site Variant in SLC9A6 Causes Christianson Syndrome in a Lithuanian Family: A Case Report. MEDICINA (KAUNAS, LITHUANIA) 2022; 58:medicina58030351. [PMID: 35334527 PMCID: PMC8949093 DOI: 10.3390/medicina58030351] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2022] [Revised: 02/17/2022] [Accepted: 02/24/2022] [Indexed: 11/16/2022]
Abstract
Background and Objectives: The pathogenic variants of SLC9A6 are a known cause of a rare, X-linked neurological disorder called Christianson syndrome (CS). The main characteristics of CS are developmental delay, intellectual disability, and neurological findings. This study investigated the genetic basis and explored the molecular changes that led to CS in two male siblings presenting with intellectual disability, epilepsy, behavioural problems, gastrointestinal dysfunction, and poor height and weight gain. Materials and Methods: Next-generation sequencing of a tetrad was applied to identify the DNA changes, and Sanger sequencing of the proband’s cDNA was used to evaluate the impact of a splice site variant on mRNA structure. Bioinformatical tools were used to investigate SLC9A6 protein structure changes. Results: Sequencing and bioinformatical analysis revealed a novel donor splice site variant (NC_000023.11(NM_001042537.1):c.899 + 1G > A) that leads to a frameshift and a premature stop codon. Protein structure modelling showed that the truncated protein is unlikely to form any functionally relevant SLC9A6 dimers. Conclusions: Molecular and bioinformatical analysis revealed the impact of a novel donor splice site variant in the SLC9A6 gene that leads to truncated and functionally disrupted protein causing the phenotype of CS in the affected individuals.
44
Philip J, Örd M, Silva A, Singh S, Diffley JFX, Remus D, Loog M, Ikui AE. Cdc6 is sequentially regulated by PP2A-Cdc55, Cdc14, and Sic1 for origin licensing in S. cerevisiae. eLife 2022; 11:e74437. [PMID: 35142288 PMCID: PMC8830886 DOI: 10.7554/elife.74437] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/04/2021] [Accepted: 12/15/2021] [Indexed: 01/31/2023]
Abstract
Cdc6, a subunit of the pre-replicative complex (pre-RC), contains multiple regulatory cyclin-dependent kinase (Cdk1) consensus sites, SP or TP motifs. In Saccharomyces cerevisiae, Cdk1 phosphorylates Cdc6-T7 to recruit Cks1, the Cdk1 phospho-adaptor in S phase, for subsequent multisite phosphorylation and protein degradation. Cdc6 accumulates in mitosis and is tightly bound by Clb2 through N-terminal phosphorylation in order to prevent premature origin licensing and degradation. It has been extensively studied how Cdc6 phosphorylation is regulated by the cyclin-Cdk1 complex. However, a detailed mechanism on how Cdc6 phosphorylation is reversed by phosphatases has not been elucidated. Here, we show that PP2A-Cdc55 dephosphorylates Cdc6 N-terminal sites to release Clb2. Cdc14 dephosphorylates the C-terminal phospho-degron, leading to Cdc6 stabilization in mitosis. In addition, Cdk1 inhibitor Sic1 releases Clb2·Cdk1·Cks1 from Cdc6 to load Mcm2-7 on the chromatin upon mitotic exit. Thus, pre-RC assembly and origin licensing are promoted by phosphatases through the attenuation of distinct Cdk1-dependent Cdc6 inhibitory mechanisms.
Affiliation(s)
- Jasmin Philip
- The PhD Program in Biochemistry, The Graduate Center, CUNY, Brooklyn, United States
- Brooklyn College, Brooklyn, United States
- Andriele Silva
- The PhD Program in Biochemistry, The Graduate Center, CUNY, Brooklyn, United States
- Brooklyn College, Brooklyn, United States
- Shaneen Singh
- The PhD Program in Biochemistry, The Graduate Center, CUNY, Brooklyn, United States
- Brooklyn College, Brooklyn, United States
- Dirk Remus
- Memorial Sloan-Kettering Cancer Center, New York, United States
- Amy E Ikui
- The PhD Program in Biochemistry, The Graduate Center, CUNY, Brooklyn, United States
- Brooklyn College, Brooklyn, United States
45
|
Prediction and Modeling of Protein–Protein Interactions Using “Spotted” Peptides with a Template-Based Approach. Biomolecules 2022; 12:biom12020201. [PMID: 35204702 PMCID: PMC8961654 DOI: 10.3390/biom12020201] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2021] [Revised: 01/20/2022] [Accepted: 01/22/2022] [Indexed: 12/10/2022]
Abstract
Protein–peptide interactions (PpIs) are a subset of the overall protein–protein interaction (PPI) network in the living cell and are pivotal for the majority of cell processes and functions. High-throughput methods to detect PpIs and PPIs usually require time and costs that are not always affordable. Therefore, reliable in silico predictions represent a valid and effective alternative. In this work, a new algorithm is described, implemented in a freely available tool, i.e., “PepThreader”, to carry out PPIs and PpIs prediction and analysis. PepThreader threads multiple fragments derived from a full-length protein sequence (or from a peptide library) onto a second template peptide, in complex with a protein target, “spotting” the potential binding peptides and ranking them according to a sequence-based and structure-based threading score. The threading algorithm first makes use of a scoring function that is based on peptides sequence similarity. Then, a rerank of the initial hits is performed, according to structure-based scoring functions. PepThreader has been benchmarked on a dataset of 292 protein–peptide complexes that were collected from existing databases of experimentally determined protein–peptide interactions. An accuracy of 80%, when considering the top predicted 25 hits, was achieved, which performs in a comparable way with the other state-of-art tools in PPIs and PpIs modeling. Nonetheless, PepThreader is unique in that it is able at the same time to spot a binding peptide within a full-length sequence involved in PPI and model its structure within the receptor. Therefore, PepThreader adds to the already-available tools supporting the experimental PPIs and PpIs identification and characterization.
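The first, sequence-based stage of the threading idea described above can be sketched as a sliding window: every fragment of the full-length sequence with the template peptide's length is scored against the template and the hits are ranked. This is a minimal sketch that uses plain sequence identity as a stand-in for whatever similarity score PepThreader actually uses; the structure-based re-ranking stage is not shown, and the function names are hypothetical.

```python
def fragments(sequence, length):
    """All contiguous windows of the template-peptide length, with 1-based start positions."""
    return [(i + 1, sequence[i:i + length]) for i in range(len(sequence) - length + 1)]


def identity_score(fragment, template):
    """Fraction of aligned positions with identical residues (ungapped threading)."""
    return sum(a == b for a, b in zip(fragment, template)) / len(template)


def spot_peptides(sequence, template, top_n=25):
    """Rank windows of `sequence` by identity to `template`, best first; ties keep sequence order."""
    scored = [(identity_score(frag, template), start, frag)
              for start, frag in fragments(sequence, len(template))]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return scored[:top_n]


hits = spot_peptides("MKTAYIAKQR", "YIAK", top_n=3)
```

The `top_n=25` default mirrors the top-25 cut-off mentioned in the abstract; in a full pipeline, those hits would then be re-scored after being modelled onto the template complex.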
46
rsRNASP: A residue-separation-based statistical potential for RNA 3D structure evaluation. Biophys J 2022; 121:142-156. [PMID: 34798137 PMCID: PMC8758408 DOI: 10.1016/j.bpj.2021.11.016] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 09/20/2021] [Revised: 10/23/2021] [Accepted: 11/10/2021] [Indexed: 01/07/2023]
Abstract
Knowledge-based statistical potentials have been shown to be rather effective in protein 3-dimensional (3D) structure evaluation and prediction. Recently, several statistical potentials have been developed for RNA 3D structure evaluation, while their performances are either still at a low level for the test datasets from structure prediction models or dependent on the "black-box" process through neural networks. In this work, we have developed an all-atom distance-dependent statistical potential based on residue separation for RNA 3D structure evaluation, namely rsRNASP, which is composed of short- and long-ranged potentials distinguished by residue separation. The extensive examinations against available RNA test datasets show that rsRNASP has apparently higher performance than the existing statistical potentials for the realistic test datasets with large RNAs from structure prediction models, including the newly released RNA-Puzzles dataset, and is comparable to the existing top statistical potentials for the test datasets with small RNAs or near-native decoys. In addition, rsRNASP is superior to RNA3DCNN, a recently developed scoring function through 3D convolutional neural networks. rsRNASP and the relevant databases are available to the public.
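The short-/long-range split described above can be illustrated with a generic distance-dependent statistical potential built by inverse Boltzmann inversion, with the table chosen by residue separation. This is a minimal sketch, not rsRNASP: it ignores atom types, uses a simple observed/reference ratio rather than rsRNASP's actual reference state, and the separation cutoff of 4, the 0.5 Å bin width, kT = 1 and all function names are hypothetical.

```python
import math
from collections import Counter


def distance_bin(distance, bin_width=0.5):
    """Index of the distance bin a pair falls into."""
    return int(distance // bin_width)


def inverse_boltzmann(observed, reference, kT=1.0, pseudo=1e-6):
    """Per-bin energy E = -kT * ln(P_obs / P_ref) from two Counters of bin -> pair count.
    A small pseudo-probability keeps empty observed bins finite."""
    n_obs = sum(observed.values())
    n_ref = sum(reference.values())
    energy = {}
    for b, count in reference.items():
        p_obs = observed.get(b, 0) / n_obs + pseudo
        p_ref = count / n_ref + pseudo
        energy[b] = -kT * math.log(p_obs / p_ref)
    return energy


def score_structure(pairs, short_pot, long_pot, separation_cutoff=4, bin_width=0.5):
    """Total energy of (i, j, distance) pairs; the short- or long-range table is chosen
    by residue separation |i - j|. Bins missing from a table contribute zero."""
    total = 0.0
    for i, j, d in pairs:
        table = short_pot if abs(i - j) < separation_cutoff else long_pot
        total += table.get(distance_bin(d, bin_width), 0.0)
    return total


# Toy counts: bin 0 is enriched in the observed set (favourable), bin 1 depleted.
short_pot = inverse_boltzmann(Counter({0: 3, 1: 1}), Counter({0: 2, 1: 2}))
long_pot = {1: 0.5}
total = score_structure([(1, 2, 0.2), (1, 10, 0.7)], short_pot, long_pot)
```

Lower totals indicate more native-like structures under this convention, so decoys would be ranked in ascending order of score.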
47
Kaushik R, Zhang KYJ. ProFitFun: a protein tertiary structure fitness function for quantifying the accuracies of model structures. Bioinformatics 2022; 38:369-376. [PMID: 34542606 DOI: 10.1093/bioinformatics/btab666] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 04/23/2021] [Revised: 09/06/2021] [Accepted: 09/16/2021] [Indexed: 02/03/2023]
Abstract
MOTIVATION An accurate estimation of the quality of protein model structures is a cornerstone of protein structure prediction. Despite the recent groundbreaking success in the field of protein structure prediction, there is still room to improve model quality estimation at multiple stages of protein structure prediction and, thus, to further push prediction accuracy. Here, a novel approach, named ProFitFun, for assessing the quality of protein models is proposed by harnessing the sequence and structural features of experimental protein structures in terms of the preferences of backbone dihedral angles and relative surface accessibility of their amino acid residues at the tripeptide level. The proposed approach leverages the backbone dihedral angle and surface accessibility preferences of each residue by accounting for its N-terminal and C-terminal neighbors in the protein structure. These preferences are used to evaluate protein structures through a machine learning approach and tested on an extensive dataset of diverse proteins. RESULTS The approach was extensively validated on a large test dataset (n = 25 005) of protein structures, comprising 23 661 models of 82 non-homologous proteins and 1344 non-homologous experimental structures. In addition, an external dataset of 40 000 models of 200 non-homologous proteins was also used for the validation of the proposed method. Both datasets were further used for benchmarking the proposed method against four different state-of-the-art methods for protein structure quality assessment. In the benchmarking, the proposed method outperformed some state-of-the-art methods in terms of Spearman's and Pearson's correlation coefficients, average GDT-TS loss, sum of z-scores and average absolute difference of predictions over corresponding observed values. The high accuracy of the proposed approach promises a potential use of the sequence and structural features in computational protein design. AVAILABILITY AND IMPLEMENTATION http://github.com/KYZ-LSB/ProTerS-FitFun. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
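Three of the benchmark measures listed above (Pearson's and Spearman's correlation coefficients and the average absolute difference between predicted and observed scores) can be computed as in the following minimal sketch. It uses only the Python standard library; the score values are made up for illustration, and the function names are ours, not ProFitFun's.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho computed as the Pearson correlation of tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def mean_absolute_difference(predicted, observed):
    """Average absolute difference between predicted and observed quality scores."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)


predicted = [0.52, 0.61, 0.47, 0.80]  # hypothetical predicted quality scores
observed = [0.50, 0.70, 0.40, 0.75]   # hypothetical observed GDT-TS values
r = pearson(predicted, observed)
rho = spearman(predicted, observed)
mad = mean_absolute_difference(predicted, observed)
```

Spearman's rho only rewards getting the order right, which is why a method can score a perfect rho here while Pearson's r and the absolute difference still register the errors in the predicted values.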
Affiliation(s)
- Rahul Kaushik
- Laboratory for Structural Bioinformatics, Center for Biosystems Dynamics Research, RIKEN, Yokohama, Kanagawa 230-0045, Japan
- Kam Y J Zhang
- Laboratory for Structural Bioinformatics, Center for Biosystems Dynamics Research, RIKEN, Yokohama, Kanagawa 230-0045, Japan
48
El-Jaick KB, Ribeiro-Alves M, Soares MVG, Araujo GEFD, Pereira GRC, Rolla VC, Mesquita JFD, De Castro L. Homozygotes NAT2*5B slow acetylators are highly associated with hepatotoxicity induced by anti-tuberculosis drugs. Mem Inst Oswaldo Cruz 2022; 117:e210328. [PMID: 35588539 PMCID: PMC9049236 DOI: 10.1590/0074-02760210328] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2021] [Accepted: 02/24/2022] [Indexed: 12/01/2022]
Abstract
BACKGROUND Distinct N-acetyltransferase 2 (NAT2) slow acetylator genotypes have been associated with a higher risk of developing anti-tuberculosis drug-induced hepatotoxicity (DIH). However, studies have not pointed out the relevance of the different acetylation phenotypes presented by homozygous and compound heterozygous slow acetylators on a clinical basis. OBJECTIVES This study aimed to investigate the association between NAT2 genotypes and the risk of developing DIH in Brazilian patients undergoing tuberculosis treatment, focusing on the discrimination of homozygous and compound heterozygous slow acetylators. METHODS/FINDINGS The frequency of NAT2 genotypes was analysed by DNA sequencing in 162 patients undergoing tuberculosis therapy. The mutation analyses revealed 15 variants, plus two new NAT2 mutations that computational simulations predicted to cause structural perturbations in the protein. The multivariate statistical analysis revealed that carriers of the NAT2*5/*5 slow acetylator genotype presented a higher risk of developing anti-tuberculosis DIH, on a clinical basis, when compared to the compound heterozygotes presenting NAT2*5 and any other slow acetylator haplotype [aOR 4.97, 95% confidence interval (CI) 1.47-16.82, p = 0.01]. CONCLUSION These findings suggest that patients with a TB diagnosis who present the NAT2*5B/*5B genotype should be properly identified and more carefully monitored until treatment outcome in order to prevent the occurrence of anti-tuberculosis DIH.
Affiliation(s)
- Kenia Balbi El-Jaick
- Universidade Federal do Estado do Rio de Janeiro, Brazil
- Joelma Freire De Mesquita
- Universidade Federal do Estado do Rio de Janeiro, Brazil
49
Lensink MF, Brysbaert G, Mauri T, Nadzirin N, Velankar S, Chaleil RAG, Clarence T, Bates PA, Kong R, Liu B, Yang G, Liu M, Shi H, Lu X, Chang S, Roy RS, Quadir F, Liu J, Cheng J, Antoniak A, Czaplewski C, Giełdoń A, Kogut M, Lipska AG, Liwo A, Lubecka EA, Maszota-Zieleniak M, Sieradzan AK, Ślusarz R, Wesołowski PA, Zięba K, Del Carpio Muñoz CA, Ichiishi E, Harmalkar A, Gray JJ, Bonvin AMJJ, Ambrosetti F, Vargas Honorato R, Jandova Z, Jiménez-García B, Koukos PI, Van Keulen S, Van Noort CW, Réau M, Roel-Touris J, Kotelnikov S, Padhorny D, Porter KA, Alekseenko A, Ignatov M, Desta I, Ashizawa R, Sun Z, Ghani U, Hashemi N, Vajda S, Kozakov D, Rosell M, Rodríguez-Lumbreras LA, Fernandez-Recio J, Karczynska A, Grudinin S, Yan Y, Li H, Lin P, Huang SY, Christoffer C, Terashi G, Verburgt J, Sarkar D, Aderinwale T, Wang X, Kihara D, Nakamura T, Hanazono Y, Gowthaman R, Guest JD, Yin R, Taherzadeh G, Pierce BG, Barradas-Bautista D, Cao Z, Cavallo L, Oliva R, Sun Y, Zhu S, Shen Y, Park T, Woo H, Yang J, Kwon S, Won J, Seok C, Kiyota Y, Kobayashi S, Harada Y, Takeda-Shitaka M, Kundrotas PJ, Singh A, Vakser IA, Dapkūnas J, Olechnovič K, Venclovas Č, Duan R, Qiu L, Xu X, Zhang S, Zou X, Wodak SJ. Prediction of protein assemblies, the next frontier: The CASP14-CAPRI experiment. Proteins 2021; 89:1800-1823. [PMID: 34453465 PMCID: PMC8616814 DOI: 10.1002/prot.26222] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 05/20/2021] [Revised: 07/24/2021] [Accepted: 08/05/2021] [Indexed: 12/19/2022]
Abstract
We present the results for CAPRI Round 50, the fourth joint CASP-CAPRI protein assembly prediction challenge. The Round comprised a total of twelve targets, including six dimers, three trimers, and three higher-order oligomers. Four of these were easy targets, for which good structural templates were available either for the full assembly or for the main interfaces (of the higher-order oligomers). Eight were difficult targets for which only distantly related templates were found for the individual subunits. Twenty-five CAPRI groups, including eight automatic servers, submitted ~1250 models per target. Twenty groups, including six servers, participated in the CAPRI scoring challenge and submitted ~190 models per target. The accuracy of the predicted models was evaluated using the classical CAPRI criteria. The prediction performance was measured by a weighted scoring scheme that takes into account the number of models of acceptable quality or higher submitted by each group as part of their five top-ranking models. Compared to the previous CASP-CAPRI challenge, top-performing groups submitted such models for a larger fraction (70-75%) of the targets in this Round, but fewer of these models were of high accuracy. Scorer groups achieved stronger performance, with more groups submitting correct models for 70-80% of the targets or achieving high-accuracy predictions. Servers performed less well in general, except for the MDOCKPP and LZERD servers, which performed on par with human groups. In addition to these results, major advances in methodology are discussed, providing an informative overview of where the prediction of protein assemblies currently stands.
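The classical CAPRI criteria and the top-five weighted scoring mentioned above can be sketched as follows. The thresholds are the standard CAPRI quality classes as we understand them, based on the fraction of native contacts (fnat), ligand RMSD (L-rms, Å) and interface RMSD (I-rms, Å). The 3/2/1 weights for high/medium/acceptable models are an assumption for illustration and may not match the exact weighting used in Round 50; the function names are hypothetical.

```python
def capri_class(fnat, l_rms, i_rms):
    """Classical CAPRI quality class; thresholds are checked from best class to worst."""
    if fnat >= 0.5 and (l_rms <= 1.0 or i_rms <= 1.0):
        return "high"
    if fnat >= 0.3 and (l_rms <= 5.0 or i_rms <= 2.0):
        return "medium"
    if fnat >= 0.1 and (l_rms <= 10.0 or i_rms <= 4.0):
        return "acceptable"
    return "incorrect"


# Assumed weights, for illustration only.
WEIGHTS = {"high": 3, "medium": 2, "acceptable": 1, "incorrect": 0}


def weighted_target_score(top_models):
    """Weighted count over a group's five top-ranked models (fnat, L-rms, I-rms) for one target."""
    return sum(WEIGHTS[capri_class(*m)] for m in top_models[:5])


models = [(0.6, 0.8, 2.0), (0.4, 6.0, 1.5), (0.2, 9.0, 5.0), (0.05, 1.0, 1.0)]
classes = [capri_class(*m) for m in models]
score = weighted_target_score(models)
```

Because the checks cascade, a model with high fnat but RMSD values above the high-accuracy limits falls through to the medium or acceptable class, which matches how the classes are nested.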
Affiliation(s)
- Marc F Lensink
- CNRS UMR8576 UGSF, Institute for Structural and Functional Glycobiology, University of Lille, Lille, France
- Guillaume Brysbaert
- CNRS UMR8576 UGSF, Institute for Structural and Functional Glycobiology, University of Lille, Lille, France
- Théo Mauri
- CNRS UMR8576 UGSF, Institute for Structural and Functional Glycobiology, University of Lille, Lille, France
- Nurul Nadzirin
- Protein Data Bank in Europe (PDBe), European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Sameer Velankar
- Protein Data Bank in Europe (PDBe), European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Tereza Clarence
- Biomolecular Modelling Laboratory, The Francis Crick Institute, London, UK
- Paul A Bates
- Biomolecular Modelling Laboratory, The Francis Crick Institute, London, UK
- Ren Kong
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
- Bin Liu
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
- Guangbo Yang
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
- Ming Liu
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
- Hang Shi
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
- Xufeng Lu
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
- Shan Chang
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
- Raj S Roy
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
- Farhan Quadir
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
- Jian Liu
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
- Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
- Institute for Data Science and Informatics, University of Missouri, Columbia, Missouri, USA
- Anna Antoniak
- Faculty of Chemistry, University of Gdansk, Gdansk, Poland
- Artur Giełdoń
- Faculty of Chemistry, University of Gdansk, Gdansk, Poland
- Mateusz Kogut
- Faculty of Chemistry, University of Gdansk, Gdansk, Poland
- Adam Liwo
- Faculty of Chemistry, University of Gdansk, Gdansk, Poland
- Emilia A Lubecka
- Faculty of Electronics, Telecommunications and Informatics, Gdansk University of Technology, Gdansk, Poland
- Rafał Ślusarz
- Faculty of Chemistry, University of Gdansk, Gdansk, Poland
- Patryk A Wesołowski
- Faculty of Chemistry, University of Gdansk, Gdansk, Poland
- Intercollegiate Faculty of Biotechnology, University of Gdansk and Medical University of Gdansk, Gdansk, Poland
- Karolina Zięba
- Faculty of Chemistry, University of Gdansk, Gdansk, Poland
- Eiichiro Ichiishi
- International University of Health and Welfare Hospital (IUHW Hospital), Nasushiobara City, Japan
- Ameya Harmalkar
- Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, Maryland, USA
- Jeffrey J Gray
- Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, Maryland, USA
- Alexandre M J J Bonvin
- Computational Structural Biology Group, Bijvoet Centre for Biomolecular Research, Department of Chemistry, Faculty of Science, Utrecht University, Utrecht, The Netherlands
- Francesco Ambrosetti
- Computational Structural Biology Group, Bijvoet Centre for Biomolecular Research, Department of Chemistry, Faculty of Science, Utrecht University, Utrecht, The Netherlands
- Rodrigo Vargas Honorato
- Computational Structural Biology Group, Bijvoet Centre for Biomolecular Research, Department of Chemistry, Faculty of Science, Utrecht University, Utrecht, The Netherlands
- Zuzana Jandova
- Computational Structural Biology Group, Bijvoet Centre for Biomolecular Research, Department of Chemistry, Faculty of Science, Utrecht University, Utrecht, The Netherlands
- Brian Jiménez-García
- Computational Structural Biology Group, Bijvoet Centre for Biomolecular Research, Department of Chemistry, Faculty of Science, Utrecht University, Utrecht, The Netherlands
- Panagiotis I Koukos
- Computational Structural Biology Group, Bijvoet Centre for Biomolecular Research, Department of Chemistry, Faculty of Science, Utrecht University, Utrecht, The Netherlands
- Siri Van Keulen
- Computational Structural Biology Group, Bijvoet Centre for Biomolecular Research, Department of Chemistry, Faculty of Science, Utrecht University, Utrecht, The Netherlands
- Charlotte W Van Noort
- Computational Structural Biology Group, Bijvoet Centre for Biomolecular Research, Department of Chemistry, Faculty of Science, Utrecht University, Utrecht, The Netherlands
- Manon Réau
- Computational Structural Biology Group, Bijvoet Centre for Biomolecular Research, Department of Chemistry, Faculty of Science, Utrecht University, Utrecht, The Netherlands
- Jorge Roel-Touris
- Computational Structural Biology Group, Bijvoet Centre for Biomolecular Research, Department of Chemistry, Faculty of Science, Utrecht University, Utrecht, The Netherlands
- Sergei Kotelnikov
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York, USA
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York, USA
- Innopolis University, Russia
- Dzmitry Padhorny
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York, USA
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York, USA
- Kathryn A Porter
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, USA
- Andrey Alekseenko
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York, USA
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York, USA
- Institute of Computer-Aided Design of the Russian Academy of Sciences, Moscow, Russia
- Mikhail Ignatov
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York, USA
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York, USA
- Israel Desta
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, USA
- Ryota Ashizawa
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York, USA
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York, USA
- Zhuyezi Sun
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, USA
- Usman Ghani
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, USA
- Nasser Hashemi
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, USA
- Sandor Vajda
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, USA
- Department of Chemistry, Boston University, Boston, Massachusetts, USA
- Dima Kozakov
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York, USA
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York, USA
- Mireia Rosell
- Instituto de Ciencias de la Vid y del Vino (ICVV), CSIC - Universidad de la Rioja - Gobierno de La Rioja, Logroño, Spain
- Barcelona Supercomputing Center (BSC), Barcelona, Spain
- Luis A Rodríguez-Lumbreras
- Instituto de Ciencias de la Vid y del Vino (ICVV), CSIC - Universidad de la Rioja - Gobierno de La Rioja, Logroño, Spain
- Barcelona Supercomputing Center (BSC), Barcelona, Spain
- Juan Fernandez-Recio
- Instituto de Ciencias de la Vid y del Vino (ICVV), CSIC - Universidad de la Rioja - Gobierno de La Rioja, Logroño, Spain
- Barcelona Supercomputing Center (BSC), Barcelona, Spain
- Sergei Grudinin
- Université Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, Grenoble, France
- Yumeng Yan
- School of Physics, Huazhong University of Science and Technology, Wuhan, China
- Hao Li
- School of Physics, Huazhong University of Science and Technology, Wuhan, China
- Peicong Lin
- School of Physics, Huazhong University of Science and Technology, Wuhan, China
- Sheng-You Huang
- School of Physics, Huazhong University of Science and Technology, Wuhan, China
- Charles Christoffer
- Department of Computer Science, Purdue University, West Lafayette, Indiana, USA
- Genki Terashi
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana, USA
- Jacob Verburgt
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana, USA
- Daipayan Sarkar
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana, USA
- Tunde Aderinwale
- Department of Computer Science, Purdue University, West Lafayette, Indiana, USA
- Xiao Wang
- Department of Computer Science, Purdue University, West Lafayette, Indiana, USA
- Daisuke Kihara
- Department of Computer Science, Purdue University, West Lafayette, Indiana, USA
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana, USA
- Tsukasa Nakamura
- Graduate School of Information Sciences, Tohoku University, Sendai, Miyagi, Japan
- Yuya Hanazono
- Institute for Quantum Life Science, National Institutes for Quantum and Radiological Science and Technology, Tokai, Ibaraki, Japan
- Ragul Gowthaman
- University of Maryland Institute for Bioscience and Biotechnology Research, Rockville, Maryland, USA
- Department of Cell Biology and Molecular Genetics, University of Maryland, Maryland, USA
- Johnathan D Guest
- University of Maryland Institute for Bioscience and Biotechnology Research, Rockville, Maryland, USA
- Department of Cell Biology and Molecular Genetics, University of Maryland, Maryland, USA
- Rui Yin
- University of Maryland Institute for Bioscience and Biotechnology Research, Rockville, Maryland, USA
- Department of Cell Biology and Molecular Genetics, University of Maryland, Maryland, USA
- Ghazaleh Taherzadeh
- University of Maryland Institute for Bioscience and Biotechnology Research, Rockville, Maryland, USA
- Department of Cell Biology and Molecular Genetics, University of Maryland, Maryland, USA
- Brian G Pierce
- University of Maryland Institute for Bioscience and Biotechnology Research, Rockville, Maryland, USA
- Department of Cell Biology and Molecular Genetics, University of Maryland, Maryland, USA
- Zhen Cao
- King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
- Luigi Cavallo
- King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
- Romina Oliva
- University of Naples "Parthenope", Napoli, Italy
- Yuanfei Sun
- Department of Electrical and Computer Engineering, Texas A&M University, Texas, USA
- Shaowen Zhu
- Department of Electrical and Computer Engineering, Texas A&M University, Texas, USA
- Yang Shen
- Department of Electrical and Computer Engineering, Texas A&M University, Texas, USA
- Taeyong Park
- Department of Chemistry, Seoul National University, Seoul, Republic of Korea
- Hyeonuk Woo
- Department of Chemistry, Seoul National University, Seoul, Republic of Korea
- Jinsol Yang
- Department of Chemistry, Seoul National University, Seoul, Republic of Korea
- Sohee Kwon
- Department of Chemistry, Seoul National University, Seoul, Republic of Korea
- Jonghun Won
- Department of Chemistry, Seoul National University, Seoul, Republic of Korea
- Chaok Seok
- Department of Chemistry, Seoul National University, Seoul, Republic of Korea
- Yasuomi Kiyota
- School of Pharmacy, Kitasato University, Minato-ku, Tokyo, Japan
- Yoshiki Harada
- School of Pharmacy, Kitasato University, Minato-ku, Tokyo, Japan
- Petras J Kundrotas
- Computational Biology Program and Department of Molecular Biosciences, University of Kansas, Lawrence, Kansas, USA
- Amar Singh
- Computational Biology Program and Department of Molecular Biosciences, University of Kansas, Lawrence, Kansas, USA
- Ilya A Vakser
- Computational Biology Program and Department of Molecular Biosciences, University of Kansas, Lawrence, Kansas, USA
- Justas Dapkūnas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Kliment Olechnovič
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Česlovas Venclovas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Rui Duan
- Dalton Cardiovascular Research Center, University of Missouri, Columbia, Missouri, USA
- Liming Qiu
- Dalton Cardiovascular Research Center, University of Missouri, Columbia, Missouri, USA
- Xianjin Xu
- Dalton Cardiovascular Research Center, University of Missouri, Columbia, Missouri, USA
- Shuang Zhang
- Dalton Cardiovascular Research Center, University of Missouri, Columbia, Missouri, USA
- Xiaoqin Zou
- Institute for Data Science and Informatics, University of Missouri, Columbia, Missouri, USA
- Dalton Cardiovascular Research Center, University of Missouri, Columbia, Missouri, USA
- Department of Physics and Astronomy, University of Missouri, Columbia, Missouri, USA
- Department of Biochemistry, University of Missouri, Columbia, Missouri, USA
50
Wang W, Wang J, Li Z, Xu D, Shang Y. MUfoldQA_G: High-accuracy protein model QA via retraining and transformation. Comput Struct Biotechnol J 2021; 19:6282-6290. [PMID: 34900138 PMCID: PMC8636996 DOI: 10.1016/j.csbj.2021.11.021] [Received: 07/07/2021] [Revised: 11/10/2021] [Accepted: 11/14/2021] [Indexed: 11/21/2022]
Abstract
Protein tertiary structure prediction is an active research area that has attracted significant attention recently owing to the success of AlphaFold from DeepMind. Methods capable of accurately evaluating the quality of predicted models are of great importance. Although many model quality assessment (QA) methods have been developed, their accuracy is not consistently high across different QA performance metrics and diverse target proteins. In this paper, we propose MUfoldQA_G, a new multi-model QA method that aims to simultaneously optimize Pearson correlation and average GDT-TS difference, two commonly used QA performance metrics. The method is based on two new algorithms, MUfoldQA_Gp and MUfoldQA_Gr. MUfoldQA_Gp uses a new technique that combines information from protein templates and reference protein models to maximize the Pearson correlation QA metric. MUfoldQA_Gr employs a new machine learning technique that resamples training data and retrains adaptively to learn a consensus model that is better than naïve consensus while minimizing average GDT-TS difference. MUfoldQA_G uses a new method to combine the results of MUfoldQA_Gr and MUfoldQA_Gp so that the final QA predictions achieve a low average GDT-TS difference, close to that of MUfoldQA_Gr, while maintaining the same high Pearson correlation as MUfoldQA_Gp. In the CASP14 QA categories, MUfoldQA_G ranked No. 1 in Pearson correlation and No. 2 in average GDT-TS difference.
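The abstract names two QA metrics and says the final predictor keeps MUfoldQA_Gp's Pearson correlation while approaching MUfoldQA_Gr's GDT-TS difference. A minimal Python sketch follows, assuming "average GDT-TS difference" means the mean absolute difference between predicted and observed GDT-TS over a target's models. It computes both metrics and shows one way such a combination could work: a positive-slope linear rescaling of one predictor's scores onto another's scale, which leaves Pearson correlation unchanged. The rescaling is an illustration of the idea, not the published MUfoldQA_G combination method, and `rescale_to` and the other function names are hypothetical.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation between predicted and observed scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def avg_gdt_diff(pred: list[float], true: list[float]) -> float:
    """Mean absolute predicted-vs-observed GDT-TS difference (assumed definition)."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(pred)


def rescale_to(ranker: list[float], calibrator: list[float]) -> list[float]:
    """Hypothetical combiner: least-squares line mapping `ranker` onto `calibrator`.

    Because the slope is kept positive, the output has exactly the same
    Pearson correlation with any reference as `ranker`, but its values sit
    on the scale (and mean) of `calibrator`.
    """
    n = len(ranker)
    mr, mc = sum(ranker) / n, sum(calibrator) / n
    var = sum((r - mr) ** 2 for r in ranker)
    slope = sum((r - mr) * (c - mc) for r, c in zip(ranker, calibrator)) / var
    slope = max(slope, 1e-6)  # a negative slope would invert the ranking
    return [mc + slope * (r - mr) for r in ranker]
```

With observed GDT-TS values `[0.8, 0.6, 0.4, 0.2]`, a well-ranked but badly scaled predictor such as `[10, 7, 5, 1]` rescaled onto a calibrated predictor such as `[0.7, 0.55, 0.5, 0.3]` keeps its correlation while its average GDT-TS difference drops from several units to below 0.1.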
Affiliation(s)
- Wenbo Wang
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
- Junlin Wang
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
- Zhaoyu Li
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
- Dong Xu
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
- Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
- Yi Shang
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA