1
Li J, Chen X, Huang H, Zeng M, Yu J, Gong X, Ye Q. $\mathcal{S}$able: bridging the gap in protein structure understanding with an empowering and versatile pre-training paradigm. Brief Bioinform 2025; 26:bbaf120. [PMID: 40163822 PMCID: PMC11957296 DOI: 10.1093/bib/bbaf120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2024] [Revised: 01/23/2025] [Accepted: 02/23/2025] [Indexed: 04/02/2025] Open
Abstract
Protein pre-training has emerged as a transformative approach for solving diverse biological tasks. While many contemporary methods focus on sequence-based language models, recent findings highlight that protein sequences alone are insufficient to capture the extensive information inherent in protein structures. Recognizing the crucial role of protein structure in defining function and interactions, we introduce $\mathcal{S}$able, a versatile pre-training model designed to comprehensively understand protein structures. $\mathcal{S}$able incorporates a novel structural encoding mechanism that enhances inter-atomic information exchange and spatial awareness, combined with robust pre-training strategies and lightweight decoders optimized for specific downstream tasks. This approach enables $\mathcal{S}$able to consistently outperform existing methods in tasks such as generation, classification, and regression, demonstrating its superior capability in protein structure representation. The code and models can be accessed via GitHub repository at https://github.com/baaihealth/Sable.
Affiliation(s)
- Jiashan Li
  - Institute for Mathematical Sciences, Renmin University of China, 59 Zhongguancun Street, Beijing 100872, China
- Xi Chen
  - Bio Computing Center, Beijing Academy of Artificial Intelligence, 150 Chengfu Road, Beijing 100084, China
- He Huang
  - Bio Computing Center, Beijing Academy of Artificial Intelligence, 150 Chengfu Road, Beijing 100084, China
- Mingliang Zeng
  - Bio Computing Center, Beijing Academy of Artificial Intelligence, 150 Chengfu Road, Beijing 100084, China
- Jingcheng Yu
  - Bio Computing Center, Beijing Academy of Artificial Intelligence, 150 Chengfu Road, Beijing 100084, China
- Xinqi Gong
  - Institute for Mathematical Sciences, Renmin University of China, 59 Zhongguancun Street, Beijing 100872, China
- Qiwei Ye
  - Bio Computing Center, Beijing Academy of Artificial Intelligence, 150 Chengfu Road, Beijing 100084, China
2
Nandigrami P, Goldman ID, Fiser A. Mechanistic insights into mutation in the proton-coupled folate transporter (SLC46A1) causing hereditary folate malabsorption. J Biol Chem 2025; 301:108280. [PMID: 39924111 PMCID: PMC11929075 DOI: 10.1016/j.jbc.2025.108280] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2024] [Revised: 01/17/2025] [Accepted: 01/31/2025] [Indexed: 02/11/2025] Open
Abstract
Hereditary folate malabsorption (HFM) is a rare, autosomal recessive disorder characterized by impaired intestinal absorption and impaired transport of folates across the choroid plexus into cerebrospinal fluid due to inactivating mutations in the human proton-coupled folate transporter (hPCFT) gene, which encodes the proton-coupled folate transporter (PCFT) SLC46A1. Understanding the structural impact of these mutations is crucial for elucidating the mechanistic basis for PCFT function and the pathophysiology of HFM. Recently, the cryo-electron microscopy structure of the Gallus gallus PCFT, which shares significant sequence identity with hPCFT, was determined. We conducted molecular dynamics simulations of hPCFT based on this structure to explore structural changes induced by functionally defective disease-causing and other mutant proteins, and by mutations that restore function. Simulations revealed that the common mechanistic basis for the loss of function is a partial loss of structural integrity of hPCFT, primarily manifested in an enlarged and distorted pore accompanied by loss of long-range contacts, less stable, fluctuating inner helices with reduced solvent accessibility, and a marked loss of ordered secondary structures. These changes are reversed by the introduction of compensatory mutations. These findings provide novel insights into the structural and functional consequences of PCFT mutations associated with HFM and provide correlations with kinetic and biochemical properties of the mutant proteins.
Affiliation(s)
- Prithviraj Nandigrami
  - Departments of Systems & Computational Biology, Albert Einstein College of Medicine, Bronx, New York, USA; Department of Biochemistry, Albert Einstein College of Medicine, Bronx, New York, USA
- I David Goldman
  - Departments of Medicine, Oncology and Molecular Pharmacology, Albert Einstein College of Medicine, Bronx, New York, USA
- Andras Fiser
  - Departments of Systems & Computational Biology, Albert Einstein College of Medicine, Bronx, New York, USA; Department of Biochemistry, Albert Einstein College of Medicine, Bronx, New York, USA
3
Liang F, Sun M, Xie L, Zhao X, Liu D, Zhao K, Zhang G. Recent advances and challenges in protein complex model accuracy estimation. Comput Struct Biotechnol J 2024; 23:1824-1832. [PMID: 38707538 PMCID: PMC11066466 DOI: 10.1016/j.csbj.2024.04.049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2024] [Revised: 04/18/2024] [Accepted: 04/18/2024] [Indexed: 05/07/2024] Open
Abstract
Estimation of model accuracy (EMA) plays a crucial role in protein structure prediction, aiming to evaluate the quality of predicted protein structure models accurately and objectively. This process is not only key to screening candidate models that are close to the real structure, but also provides guidance for further optimization of protein structures. With the significant advancements made by AlphaFold2 in monomer structure prediction, the problem of single-domain protein structure prediction has been largely solved. Correspondingly, the importance of assessing the quality of single-domain protein models has decreased, and the research focus has shifted to EMA of protein complexes. In this review, our goal is to provide a comprehensive overview of the reference and statistical metrics, the representative methods, and the current challenges within four distinct facets of complex EMA (Topology Global Score, Interface Total Score, Interface Residue-Wise Score, and Tertiary Residue-Wise Score).
Affiliation(s)
- Lei Xie
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Xuanfeng Zhao
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Dong Liu
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Kailong Zhao
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Guijun Zhang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
4
Matinja AI, Kamarudin NHA, Leow ATC, Oslan SN, Ali MSM. Structural Insights into Cold-Active Lipase from Glaciozyma antarctica PI12: Alphafold2 Prediction and Molecular Dynamics Simulation. J Mol Evol 2024; 92:944-963. [PMID: 39549052 DOI: 10.1007/s00239-024-10219-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2023] [Accepted: 11/06/2024] [Indexed: 11/18/2024]
Abstract
Cold-active enzymes have recently gained popularity because of their high activity at lower temperatures than their mesophilic and thermophilic counterparts, enabling them to withstand harsh reaction conditions and enhance industrial processes. Cold-active lipases are enzymes produced by psychrophiles that live and thrive in extremely cold conditions. Applications of cold-active lipases are now growing in the detergent, fine chemical synthesis, food processing, bioremediation, and pharmaceutical industries. The cold adaptation mechanisms exhibited by these enzymes are yet to be fully understood. Using phylogenetic analysis and the deep learning-based protein structure prediction tool AlphaFold2, we identified an evolutionary process in which a conserved cold-active-like motif is present in a distinct subclade of the tree, and we further predicted and simulated the three-dimensional structure of a putative cold-active lipase carrying this motif, Glalip03, from Glaciozyma antarctica PI12. Molecular dynamics simulations at low temperatures revealed global stability over a wide range of temperatures, flexibility, and the ability to cope with changes in water and solvent entropy. Therefore, the knowledge we uncover here will be crucial for future research into how these low-temperature-adapted enzymes maintain their overall flexibility and function at lower temperatures.
Affiliation(s)
- Adamu Idris Matinja
  - Enzyme and Microbial Technology Research Centre, Faculty of Biotechnology and Biomolecular Sciences, Universiti Putra Malaysia, 43400, Serdang, Malaysia
  - Department of Biochemistry, Faculty of Science, Bauchi State University, Gadau, 751105, Nigeria
- Nor Hafizah Ahmad Kamarudin
  - Enzyme and Microbial Technology Research Centre, Faculty of Biotechnology and Biomolecular Sciences, Universiti Putra Malaysia, 43400, Serdang, Malaysia
  - Centre of Foundation Studies for Agricultural Science, Universiti Putra Malaysia, 43400, Serdang, Malaysia
- Adam Thean Chor Leow
  - Enzyme and Microbial Technology Research Centre, Faculty of Biotechnology and Biomolecular Sciences, Universiti Putra Malaysia, 43400, Serdang, Malaysia
  - Enzyme Technology and X-ray Crystallography Laboratory, VacBio 5, Institute of Bioscience, Universiti Putra Malaysia, 43400, Serdang, Malaysia
  - Department of Cell and Molecular Biology, Faculty of Biotechnology and Biomolecular Sciences, Universiti Putra Malaysia, 43400 UPM, Serdang, Malaysia
- Siti Nurbaya Oslan
  - Enzyme and Microbial Technology Research Centre, Faculty of Biotechnology and Biomolecular Sciences, Universiti Putra Malaysia, 43400, Serdang, Malaysia
  - Enzyme Technology and X-ray Crystallography Laboratory, VacBio 5, Institute of Bioscience, Universiti Putra Malaysia, 43400, Serdang, Malaysia
  - Department of Biochemistry, Faculty of Biotechnology and Biomolecular Sciences, Universiti Putra Malaysia, 43400, Serdang, Malaysia
- Mohd Shukuri Mohamad Ali
  - Enzyme and Microbial Technology Research Centre, Faculty of Biotechnology and Biomolecular Sciences, Universiti Putra Malaysia, 43400, Serdang, Malaysia
  - Enzyme Technology and X-ray Crystallography Laboratory, VacBio 5, Institute of Bioscience, Universiti Putra Malaysia, 43400, Serdang, Malaysia
  - Department of Biochemistry, Faculty of Biotechnology and Biomolecular Sciences, Universiti Putra Malaysia, 43400, Serdang, Malaysia
5
McGuffin LJ, Alharbi SMA. ModFOLD9: A Web Server for Independent Estimates of 3D Protein Model Quality. J Mol Biol 2024; 436:168531. [PMID: 39237204 DOI: 10.1016/j.jmb.2024.168531] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 01/21/2024] [Revised: 02/19/2024] [Accepted: 03/06/2024] [Indexed: 09/07/2024]
Abstract
Accurate models of protein tertiary structures are now available from numerous advanced prediction methods, although the accuracy of each method often varies depending on the specific protein target. Additionally, many models may still contain significant local errors. Therefore, reliable, independent model quality estimates are essential both for identifying errors and selecting the very best models for further biological investigations. ModFOLD9 is a leading independent server for detecting the local errors in models produced by any method, and it can accurately discriminate between high-quality models from multiple alternative approaches. ModFOLD9 incorporates several new scores from deep learning-based approaches, leading to greatly improved prediction accuracy compared with earlier versions of the server. ModFOLD9 is continuously independently benchmarked, and it is shown to be highly competitive with other public servers. ModFOLD9 is freely available at https://www.reading.ac.uk/bioinf/ModFOLD/.
6
Liu J, Liu D, He G, Zhang G. Estimating protein complex model accuracy based on ultrafast shape recognition and deep learning in CASP15. Proteins 2023; 91:1861-1870. [PMID: 37553848 DOI: 10.1002/prot.26564] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 04/15/2023] [Revised: 07/05/2023] [Accepted: 07/11/2023] [Indexed: 08/10/2023]
Abstract
This article reports and analyzes the results of protein complex model accuracy estimation by our methods (DeepUMQA3 and GraphGPSM) in the 15th Critical Assessment of techniques for protein Structure Prediction (CASP15). The new deep learning-based multimeric complex model accuracy estimation methods are built on an ensemble of three-level features coupled with deep residual/graph neural networks. We describe the input multimeric complex model at three levels: overall complex features, intra-monomer features, and inter-monomer features. We designed an overall ultrafast shape recognition (USR) to characterize the relationship between local residues and the overall complex topology, and an inter-monomer USR to characterize the relationship between the residues of one monomer and the topology of other monomers. DeepUMQA3 (Group name: GuijunLab-RocketX) ranked first in the interface residue accuracy estimation of CASP15. The Pearson correlation between the interface residue Local Distance Difference Test (lDDT) predicted by DeepUMQA3 and the real lDDT is 0.570, making it the only method to exceed 0.5. Among the top 5 methods, DeepUMQA3 achieved the highest Pearson correlation of lDDT on 25 out of 39 targets. GraphGPSM (Group name: GuijunLab-PAthreader) has TM-score Pearson correlations greater than 0.9 on 14 targets, showing a good ability to estimate the overall fold accuracy. The DeepUMQA3 server is available at http://zhanglab-bioinf.com/DeepUMQA/ and the GraphGPSM server is available at http://zhanglab-bioinf.com/GraphGPSM/.
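For readers unfamiliar with ultrafast shape recognition, the following is a minimal sketch of the classic, general-purpose USR descriptor, not the overall or inter-monomer variants designed in the paper. It assumes a point set of 3D coordinates, four reference points (centroid, point closest to the centroid, point farthest from the centroid, and point farthest from that one), and three moments of each distance distribution. The function names and the exact moment convention (mean, variance, cube root of the third central moment) are illustrative assumptions.

```python
import math


def _moments(values):
    """Return mean, variance and signed cube root of the third central moment."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    third = sum((v - mean) ** 3 for v in values) / n
    skew = math.copysign(abs(third) ** (1.0 / 3.0), third)
    return [mean, variance, skew]


def usr_descriptor(coords):
    """Build a 12-number shape descriptor from a list of (x, y, z) tuples."""
    n = len(coords)
    # Reference point 1: centroid of all points.
    ctd = tuple(sum(c[i] for c in coords) / n for i in range(3))
    d_ctd = [math.dist(c, ctd) for c in coords]
    # Reference points 2 and 3: closest to and farthest from the centroid.
    cst = coords[d_ctd.index(min(d_ctd))]
    fct = coords[d_ctd.index(max(d_ctd))]
    # Reference point 4: farthest from reference point 3.
    d_fct = [math.dist(c, fct) for c in coords]
    ftf = coords[d_fct.index(max(d_fct))]
    descriptor = []
    for ref in (ctd, cst, fct, ftf):
        descriptor.extend(_moments([math.dist(c, ref) for c in coords]))
    return descriptor


def usr_similarity(a, b):
    """Inverse of (1 + mean absolute difference) between two descriptors."""
    return 1.0 / (1.0 + sum(abs(x - y) for x, y in zip(a, b)) / len(a))
```

Because the descriptor depends only on distances to reference points derived from the structure itself, it is unchanged by translation and rotation, which is what makes it cheap to compare shapes without superposition.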
Affiliation(s)
- Jun Liu
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, China
- Dong Liu
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, China
- Guangxing He
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, China
- Guijun Zhang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, China
7
Yue T, Wang Y, Zhang L, Gu C, Xue H, Wang W, Lyu Q, Dun Y. Deep Learning for Genomics: From Early Neural Nets to Modern Large Language Models. Int J Mol Sci 2023; 24:15858. [PMID: 37958843 PMCID: PMC10649223 DOI: 10.3390/ijms242115858] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/30/2023] [Revised: 10/24/2023] [Accepted: 10/30/2023] [Indexed: 11/15/2023] Open
Abstract
The data explosion driven by advancements in genomic research, such as high-throughput sequencing techniques, is constantly challenging conventional methods used in genomics. In parallel with the urgent demand for robust algorithms, deep learning has succeeded in various fields such as vision, speech, and text processing. Yet genomics poses unique challenges to deep learning, since we expect deep learning to provide a superhuman intelligence that explores beyond our knowledge to interpret the genome. A powerful deep learning model should rely on the insightful utilization of task-specific knowledge. In this paper, we briefly discuss the strengths of different deep learning models from a genomic perspective so as to fit each particular task with a proper deep learning-based architecture, and we remark on practical considerations of developing deep learning architectures for genomics. We also provide a concise review of deep learning applications in various aspects of genomic research and point out current challenges and potential research directions for future genomics applications. We believe the collaborative use of ever-growing diverse data and the fast iteration of deep learning models will continue to contribute to the future of genomics.
Affiliation(s)
- Tianwei Yue
  - School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Yuanxin Wang
  - School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Longxiang Zhang
  - School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Chunming Gu
  - Department of Biomedical Engineering, School of Medicine, Johns Hopkins University, Baltimore, MD 21218, USA
- Haoru Xue
  - The Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Wenping Wang
  - School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Qi Lyu
  - Department of Computational Mathematics, Science, and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Yujie Dun
  - School of Information and Communications Engineering, Xi’an Jiaotong University, Xi’an 710049, China
8
Roy S, Ben-Hur A. Protein quality assessment with a loss function designed for high-quality decoys. Front Bioinform 2023; 3:1198218. [PMID: 37915563 PMCID: PMC10616882 DOI: 10.3389/fbinf.2023.1198218] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2023] [Accepted: 09/29/2023] [Indexed: 11/03/2023] Open
Abstract
Motivation: The prediction of a protein 3D structure is essential for understanding protein function, drug discovery, and disease mechanisms; with the advent of methods like AlphaFold that are capable of producing very high-quality decoys, ensuring the quality of those decoys can provide further confidence in the accuracy of their predictions. Results: In this work, we describe Qϵ, a graph convolutional network (GCN) that utilizes a minimal set of atom and residue features as inputs to predict the global distance test total score (GDTTS) and local distance difference test (lDDT) score of a decoy. To improve the model's performance, we introduce a novel loss function based on the ϵ-insensitive loss function used for SVM regression. This loss function is specifically designed for evaluating the characteristics of the quality assessment problem and provides predictions with improved accuracy over standard loss functions used for this task. Despite using only a minimal set of features, it matches the performance of recent state-of-the-art methods like DeepUMQA. Availability: The code for Qϵ is available at https://github.com/soumyadip1997/qepsilon.
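As background for the loss described above, here is a minimal sketch of the standard ϵ-insensitive loss from SVM regression, on which the abstract says the paper's loss is based. It is not the modified loss used by Qϵ, whose details are not given here. The function name and the default `epsilon` value are illustrative assumptions.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Mean epsilon-insensitive loss: errors smaller than epsilon cost nothing.

    Standard SVR form: max(0, |y - y_hat| - epsilon), averaged over samples.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    total = sum(max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred))
    return total / len(y_true)


# Usage: a prediction within epsilon of the target contributes zero loss,
# while a larger error is penalized linearly beyond the epsilon margin.
small_error = epsilon_insensitive_loss([0.50], [0.55], epsilon=0.1)
large_error = epsilon_insensitive_loss([1.00], [0.50], epsilon=0.1)
```

The flat region around zero error means the model is not pushed to fit differences smaller than `epsilon`, which is one plausible reason such a loss suits score regression where small deviations are not meaningful.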
Affiliation(s)
- Asa Ben-Hur
  - Department of Computer Science, Colorado State University, Fort Collins, CO, United States
9
Liu J, Liu D, Zhang GJ. DeepUMQA3: a web server for accurate assessment of interface residue accuracy in protein complexes. Bioinformatics 2023; 39:btad591. [PMID: 37740296 PMCID: PMC10560100 DOI: 10.1093/bioinformatics/btad591] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 04/24/2023] [Revised: 08/21/2023] [Accepted: 09/21/2023] [Indexed: 09/24/2023] Open
Abstract
MOTIVATION Model quality assessment is a crucial part of protein structure prediction and a gateway to proper usage of models in biomedical applications. Many methods have been proposed for assessing the quality of structural models of protein monomers, but few exist for evaluating protein complex models. As protein complex structure prediction becomes a new challenge, there is an urgent need for model quality assessment methods that can accurately assess the accuracy of interface residues of complex structures. RESULTS Here, we present DeepUMQA3, a web server for evaluating the accuracy of interface residues of protein complex structures using deep neural networks. For an input complex structure, features are extracted at three levels (overall complex, intra-monomer, and inter-monomer), and an improved deep residual neural network is used to predict per-residue lDDT and interface residue accuracy. DeepUMQA3 ranks first in the blind test of interface residue accuracy estimation in CASP15, with Pearson, Spearman, and AUC of 0.564, 0.535, and 0.755 under the lDDT measurement, which are 17.6%, 23.6%, and 10.9% higher than the second best method, respectively. DeepUMQA3 can also assess the accuracy of all residues in the entire complex and distinguish high- and low-precision residues. AVAILABILITY AND IMPLEMENTATION The web server of DeepUMQA3 is freely available at http://zhanglab-bioinf.com/DeepUMQA_server/.
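The Pearson and Spearman figures quoted above compare predicted per-residue scores with reference scores. The sketch below shows how these two correlations are commonly computed. It is a generic illustration, not the CASP15 evaluation code, and the function names are assumptions introduced here.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Pearson rewards predictions that are linearly related to the reference scores, while Spearman only asks that residues are ordered correctly, so reporting both separates calibration from ranking ability.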
Affiliation(s)
- Jun Liu
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Dong Liu
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Gui-Jun Zhang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
10
Perea‐Cabrera M, Granados‐Riveron JT, Segura‐Stanford B, Moreno‐Vargas LM, Prada‐Gracia D, Moran‐Espinosa MC, Erdmenger J, Diaz‐Garcia H, Sánchez‐Urbina R. Opitz GBBB syndrome with total anomalous pulmonary venous connection: A new MID1 gene variant. Mol Genet Genomic Med 2023; 11:e2234. [PMID: 37498300 PMCID: PMC10496055 DOI: 10.1002/mgg3.2234] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2022] [Revised: 06/12/2023] [Accepted: 06/28/2023] [Indexed: 07/28/2023] Open
Abstract
BACKGROUND Opitz GBBB syndrome (GBBB) is an X-linked disease characterized by midline defects, including congenital heart defects. We present our diagnostic approach to the identification of GBBB in a consanguineous family in which two male siblings were concordant for a total anomalous connection of pulmonary veins and minor facial dysmorphias. METHODS Targeted exome sequencing analysis of a 380-gene panel associated with cardiovascular disease was performed on the propositus. Interpretative analysis of the exome results was conducted, and 3D models of the protein changes were generated. RESULTS We identified a NM_000381.4:c.608G>A;p.(Arg203Gln) change in MID1, affecting the conformation of the B-box 2 domain of the protein, with a zinc finger structure and associated protein interactions. This clinical phenotype is consistent with GBBB; however, the type of congenital heart disease observed in this case has not been previously reported. CONCLUSION A new likely pathogenic variant in MID1, c.608G>A, was found to be associated with Opitz GBBB syndrome.
Affiliation(s)
- Maryangel Perea‐Cabrera
  - Centro de Investigación en Malformaciones Congénitas, Hospital Infantil de México Federico Gómez, Mexico City, Mexico
- Javier T. Granados‐Riveron
  - Centro de Investigación en Malformaciones Congénitas, Hospital Infantil de México Federico Gómez, Mexico City, Mexico
- Liliana M. Moreno‐Vargas
  - Unidad de Investigación en Biología Computacional y Diseño de Fármacos, Hospital Infantil de México Federico Gómez, Ciudad de México, Mexico
- Diego Prada‐Gracia
  - Unidad de Investigación en Biología Computacional y Diseño de Fármacos, Hospital Infantil de México Federico Gómez, Ciudad de México, Mexico
- Mari C. Moran‐Espinosa
  - Centro de Investigación en Malformaciones Congénitas, Hospital Infantil de México Federico Gómez, Mexico City, Mexico
- Julio Erdmenger
  - Departamento de Cardiología, Hospital Infantil de México Federico Gómez, Mexico City, Mexico
- Hector Diaz‐Garcia
  - Centro de Investigación en Malformaciones Congénitas, Hospital Infantil de México Federico Gómez, Mexico City, Mexico
- Rocío Sánchez‐Urbina
  - Centro de Investigación en Malformaciones Congénitas, Hospital Infantil de México Federico Gómez, Mexico City, Mexico
  - Escuela Superior de Medicina del Instituto Politécnico Nacional, Mexico City, Mexico
11
Wu F, Wu L, Radev D, Xu J, Li SZ. Integration of pre-trained protein language models into geometric deep learning networks. Commun Biol 2023; 6:876. [PMID: 37626165 PMCID: PMC10457366 DOI: 10.1038/s42003-023-05133-1] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 03/23/2023] [Accepted: 07/11/2023] [Indexed: 08/27/2023] Open
Abstract
Geometric deep learning has recently achieved great success in non-Euclidean domains, and learning on 3D structures of large biomolecules is emerging as a distinct research area. However, its efficacy is largely constrained due to the limited quantity of structural data. Meanwhile, protein language models trained on substantial 1D sequences have shown burgeoning capabilities with scale in a broad range of applications. Several preceding studies consider combining these different protein modalities to promote the representation power of geometric neural networks but fail to present a comprehensive understanding of their benefits. In this work, we integrate the knowledge learned by well-trained protein language models into several state-of-the-art geometric networks and evaluate a variety of protein representation learning benchmarks, including protein-protein interface prediction, model quality assessment, protein-protein rigid-body docking, and binding affinity prediction. Our findings show an overall improvement of 20% over baselines. Strong evidence indicates that the incorporation of protein language models' knowledge enhances geometric networks' capacity by a significant margin and can be generalized to complex tasks.
Affiliation(s)
- Fang Wu
  - AI Research and Innovation Laboratory, Westlake University, 310030, Hangzhou, China
- Lirong Wu
  - AI Research and Innovation Laboratory, Westlake University, 310030, Hangzhou, China
- Dragomir Radev
  - Department of Computer Science, Yale University, New Haven, CT, 06511, USA
- Jinbo Xu
  - Institute of AI Industry Research, Tsinghua University, Haidian Street, 100084, Beijing, China
  - Toyota Technological Institute at Chicago, Chicago, IL, 60637, USA
- Stan Z Li
  - AI Research and Innovation Laboratory, Westlake University, 310030, Hangzhou, China
12
Gao Z, Jiang W, Zhang Y, Zhang L, Yi M, Wang H, Ma Z, Qu B, Ji X, Long H, Zhang S. Amphioxus adenosine-to-inosine tRNA-editing enzyme that can perform C-to-U and A-to-I deamination of DNA. Commun Biol 2023; 6:744. [PMID: 37464027 PMCID: PMC10354150 DOI: 10.1038/s42003-023-05134-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/04/2022] [Accepted: 07/11/2023] [Indexed: 07/20/2023] Open
Abstract
The adenosine-to-inosine tRNA-editing enzyme was identified more than two decades ago, but studies of its DNA editing activity are scarce. We show that amphioxus (Branchiostoma japonicum) ADAT2 (BjADAT2) contains the active site 'HxE-PCxxC' and the key residues for target-base-binding, and amphioxus ADAT3 (BjADAT3) harbors both the N-terminal positively charged region and the C-terminal pseudo-catalytic domain important for recognition of substrates. Sequencing of the BjADAT2-transformed Escherichia coli genome suggests that BjADAT2 has the potential to target E. coli DNA and can deaminate at TCG and GAA sites in the E. coli genome. Biochemical analyses further demonstrate that BjADAT2, in complex with BjADAT3, can perform A-to-I editing of tRNA as well as C-to-U and A-to-I deamination of DNA. We also show that BjADAT2 preferentially deaminates adenosines and cytidines in the loop of DNA hairpin structures of substrates, and that BjADAT3 affects the type of DNA substrate targeted by BjADAT2. Finally, we find that C89, N113, C148 and Y156 play critical roles in the DNA editing activity of BjADAT2. Collectively, our study indicates that BjADAT2/3 is the sole naturally occurring deaminase with both tRNA and DNA editing capacity identified so far in Metazoa.
Affiliation(s)
- Zhan Gao
  - Institute of Evolution & Marine Biodiversity and Department of Marine Biology, Ocean University of China, 266003, Qingdao, China
- Wanyue Jiang
  - Institute of Evolution & Marine Biodiversity, KLMME, Ocean University of China, 266003, Qingdao, China
- Yu Zhang
  - Institute of Evolution & Marine Biodiversity and Department of Marine Biology, Ocean University of China, 266003, Qingdao, China
- Liping Zhang
  - Institute of Evolution & Marine Biodiversity and Department of Marine Biology, Ocean University of China, 266003, Qingdao, China
- Mengmeng Yi
  - Institute of Evolution & Marine Biodiversity and Department of Marine Biology, Ocean University of China, 266003, Qingdao, China
- Haitao Wang
  - Institute of Evolution & Marine Biodiversity and Department of Marine Biology, Ocean University of China, 266003, Qingdao, China
- Zengyu Ma
  - Institute of Evolution & Marine Biodiversity and Department of Marine Biology, Ocean University of China, 266003, Qingdao, China
- Baozhen Qu
  - Institute of Evolution & Marine Biodiversity and Department of Marine Biology, Ocean University of China, 266003, Qingdao, China
- Xiaohan Ji
  - Institute of Evolution & Marine Biodiversity and Department of Marine Biology, Ocean University of China, 266003, Qingdao, China
- Hongan Long
  - Institute of Evolution & Marine Biodiversity, KLMME, Ocean University of China, 266003, Qingdao, China
  - Laboratory for Marine Biology and Biotechnology, Laoshan Laboratory, 266237, Qingdao, China
- Shicui Zhang
  - Institute of Evolution & Marine Biodiversity and Department of Marine Biology, Ocean University of China, 266003, Qingdao, China
  - Laboratory for Marine Biology and Biotechnology, Laoshan Laboratory, 266237, Qingdao, China
13
He G, Liu J, Liu D, Zhang G. GraphGPSM: a global scoring model for protein structure using graph neural networks. Brief Bioinform 2023:bbad219. [PMID: 37317619 DOI: 10.1093/bib/bbad219] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 01/17/2023] [Revised: 04/14/2023] [Accepted: 05/22/2023] [Indexed: 06/16/2023] Open
Abstract
The scoring models used for protein structure modeling and ranking are mainly divided into unified field and protein-specific scoring functions. Although protein structure prediction has made tremendous progress since CASP14, the modeling accuracy still cannot meet the requirements to a certain extent. Especially, accurate modeling of multi-domain and orphan proteins remains a challenge. Therefore, an accurate and efficient protein scoring model should be developed urgently to guide the protein structure folding or ranking through deep learning. In this work, we propose a protein structure global scoring model based on equivariant graph neural network (EGNN), named GraphGPSM, to guide protein structure modeling and ranking. We construct an EGNN architecture, and a message passing mechanism is designed to update and transmit information between nodes and edges of the graph. Finally, the global score of the protein model is output through a multilayer perceptron. Residue-level ultrafast shape recognition is used to describe the relationship between residues and the overall structure topology, and distance and direction encoded by Gaussian radial basis functions are designed to represent the overall topology of the protein backbone. These two features are combined with Rosetta energy terms, backbone dihedral angles and inter-residue distance and orientations to represent the protein model and embedded into the nodes and edges of the graph neural network. The experimental results on the CASP13, CASP14 and CAMEO test sets show that the scores of our developed GraphGPSM have a strong correlation with the TM-score of the models, which are significantly better than those of the unified field score function REF2015 and the state-of-the-art local lDDT-based scoring models ModFOLD8, ProQ3D and DeepAccNet, etc. The modeling experimental results on 484 test proteins demonstrate that GraphGPSM can greatly improve the modeling accuracy. 
GraphGPSM is further used to model 35 orphan proteins and 57 multi-domain proteins. The results show that the average TM-scores of the models predicted by GraphGPSM are 13.2% and 7.1% higher, respectively, than those of the models predicted by AlphaFold2. GraphGPSM also participated in CASP15 and achieved competitive performance in global accuracy estimation.
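As a rough illustration of the distance encoding mentioned in this abstract, the sketch below expands a single inter-residue distance into Gaussian radial basis features. The function name, the number of centers, the distance range and the width are assumptions chosen for illustration; the abstract does not state the values GraphGPSM uses.

```python
import math


def gaussian_rbf(distance, d_min=0.0, d_max=20.0, num_centers=16):
    """Expand a scalar distance (in angstroms) into Gaussian RBF features.

    All parameter defaults are illustrative assumptions, not values
    taken from the GraphGPSM paper.
    """
    # Evenly spaced centers between d_min and d_max.
    step = (d_max - d_min) / (num_centers - 1)
    centers = [d_min + i * step for i in range(num_centers)]
    # Use the center spacing as the Gaussian width (an assumption).
    sigma = step
    return [
        math.exp(-((distance - c) ** 2) / (2.0 * sigma ** 2))
        for c in centers
    ]


features = gaussian_rbf(8.0)
```

Each feature is close to 1 when the distance sits near that center and decays smoothly away from it, which gives a network a soft, binned view of the distance instead of a single raw number.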
Affiliation(s)
- Guangxing He
- College of Information Engineering, Zhejiang University of Technology
- Jun Liu
- College of Information Engineering, Zhejiang University of Technology
- Dong Liu
- College of Information Engineering, Zhejiang University of Technology
- Guijun Zhang
- College of Information Engineering, Zhejiang University of Technology
|
14
|
Deng L, Ly C, Abdollahi S, Zhao Y, Prinz I, Bonn S. Performance comparison of TCR-pMHC prediction tools reveals a strong data dependency. Front Immunol 2023; 14:1128326. [PMID: 37143667 PMCID: PMC10152969 DOI: 10.3389/fimmu.2023.1128326] [Received: 12/20/2022] [Accepted: 03/24/2023] [Indexed: 05/06/2023]
Abstract
The interaction of T-cell receptors with peptide-major histocompatibility complex molecules (TCR-pMHC) plays a crucial role in adaptive immune responses. Currently there are various models aiming at predicting TCR-pMHC binding, while a standard dataset and procedure to compare the performance of these approaches is still missing. In this work we provide a general method for data collection, preprocessing, splitting and generation of negative examples, as well as comprehensive datasets to compare TCR-pMHC prediction models. We collected, harmonized, and merged all the major publicly available TCR-pMHC binding data and compared the performance of five state-of-the-art deep learning models (TITAN, NetTCR-2.0, ERGO, DLpTCR and ImRex) using this data. Our performance evaluation focuses on two scenarios: 1) different splitting methods for generating training and testing data to assess model generalization and 2) different data versions that vary in size and peptide imbalance to assess model robustness. Our results indicate that the five contemporary models do not generalize to peptides that have not been in the training set. We can also show that model performance is strongly dependent on the data balance and size, which indicates a relatively low model robustness. These results suggest that TCR-pMHC binding prediction remains highly challenging and requires further high quality data and novel algorithmic approaches.
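The generalization scenario described in this abstract depends on how the data are split. One way to test unseen peptides is to hold out whole peptides so that no test peptide appears in training. The sketch below is a minimal version of that idea; the function name, the record layout `(tcr, peptide, label)` and the default fraction are assumptions for illustration and do not reproduce the authors' pipeline.

```python
import random


def split_by_peptide(pairs, test_fraction=0.2, seed=0):
    """Split (tcr, peptide, label) records so test peptides are unseen in training.

    Hypothetical helper: the record layout and defaults are assumptions.
    """
    # Collect the unique peptides and shuffle them reproducibly.
    peptides = sorted({peptide for _, peptide, _ in pairs})
    rng = random.Random(seed)
    rng.shuffle(peptides)
    # Hold out at least one peptide.
    n_test = max(1, round(len(peptides) * test_fraction))
    test_peptides = set(peptides[:n_test])
    train = [rec for rec in pairs if rec[1] not in test_peptides]
    test = [rec for rec in pairs if rec[1] in test_peptides]
    return train, test
```

A random split over individual records would instead let the same peptide appear on both sides, which tends to overstate how well a model handles new peptides.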
Affiliation(s)
- Lihua Deng
- Institute of Systems Immunology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Cedric Ly
- Institute of Medical Systems Biology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Sina Abdollahi
- Institute of Medical Systems Biology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Yu Zhao
- Institute of Medical Systems Biology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Immo Prinz
- Institute of Systems Immunology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Stefan Bonn
- Institute of Medical Systems Biology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
|
15
|
Zhang P, Xia C, Shen HB. High-accuracy protein model quality assessment using attention graph neural networks. Brief Bioinform 2023; 24:7025462. [PMID: 36736352 DOI: 10.1093/bib/bbac614] [Received: 09/23/2022] [Revised: 11/23/2022] [Accepted: 12/12/2022] [Indexed: 02/05/2023]
Abstract
Great improvement has been brought to protein tertiary structure prediction through deep learning. It is important but very challenging to accurately rank and score decoy structures predicted by different models. CASP14 results show that existing quality assessment (QA) approaches lag behind the development of protein structure prediction methods, where almost all existing QA models degrade in accuracy when the target is a decoy of high quality. How to give an accurate assessment to high-accuracy decoys is particularly useful with the availability of accurate structure prediction methods. Here we propose a fast and effective single-model QA method, QATEN, which can evaluate decoys only by their topological characteristics and atomic types. Our model uses graph neural networks and attention mechanisms to evaluate global and amino acid level scores, and uses specific loss functions to constrain the network to focus more on high-precision decoys and protein domains. On the CASP14 evaluation decoys, QATEN performs better than other QA models under all correlation coefficients when targeting average LDDT. QATEN shows promising performance when considering only high-accuracy decoys. Compared to the embedded evaluation modules of predicted $C_{\alpha}$-RMSD (pRMSD) in RosettaFold and predicted LDDT (pLDDT) in AlphaFold2, QATEN is complementary and capable of achieving better evaluation on some decoy structures generated by AlphaFold2 and RosettaFold. These results suggest that the new QATEN approach can be used as a reliable independent assessment algorithm for high-accuracy protein structure decoys.
Affiliation(s)
- Peidong Zhang
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, 200240 Shanghai, China
- Chunqiu Xia
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, 200240 Shanghai, China
- Hong-Bin Shen
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, 200240 Shanghai, China
|
16
|
Bhattacharya S, Roche R, Shuvo MH, Moussad B, Bhattacharya D. Contact-Assisted Threading in Low-Homology Protein Modeling. Methods Mol Biol 2023; 2627:41-59. [PMID: 36959441 DOI: 10.1007/978-1-0716-2974-1_3] [Indexed: 03/25/2023]
Abstract
The ability to successfully predict the three-dimensional structure of a protein from its amino acid sequence has made considerable progress in the recent past. The progress is propelled by the improved accuracy of deep learning-based inter-residue contact map predictors coupled with the rising growth of protein sequence databases. Contact map encodes interatomic interaction information that can be exploited for highly accurate prediction of protein structures via contact map threading even for the query proteins that are not amenable to direct homology modeling. As such, contact-assisted threading has garnered considerable research effort. In this chapter, we provide an overview of existing contact-assisted threading methods while highlighting the recent advances and discussing some of the current limitations and future prospects in the application of contact-assisted threading for improving the accuracy of low-homology protein modeling.
Affiliation(s)
- Sutanu Bhattacharya
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, USA
- Md Hossain Shuvo
- Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
- Bernard Moussad
- Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
|
17
|
Marin FI, Marcatili P. Computational Modeling of Antibody and T-Cell Receptor (CDR3 Loops). Methods Mol Biol 2023; 2552:83-100. [PMID: 36346586 DOI: 10.1007/978-1-0716-2609-2_3] [Indexed: 06/16/2023]
Abstract
Antibodies and T-cell receptors have been a subject of much interest due to their central role in the immune system and their potential applications in several biotechnological and medical applications from cancer therapy to vaccine development. A unique feature of these two lymphocyte receptors is their ability to bind a huge variety of different (pathogen) targets. This ability stems from six short loops in the binding domain that have hypervariable sequence due to genetic recombination mechanism. Particularly one of these loops, the third complementarity determining region (CDR3), has the highest sequence variability and a dominant role in binding the target. However, it has also been proven the most difficult to be modeled structurally, which is vitally important for downstream tasks such as binding prediction. This difficulty stems from its variability in sequence that both reduces the possibility of finding homologues and introduces unique structural features in the loop. We present here a general protocol for modeling such loops in antibodies and T-cell receptors. We also discuss the difficulties in loop modeling and the advantages and limitations of different modeling methods.
Affiliation(s)
- Frederikke I Marin
- Department of Health Technology, Technical University of Denmark, Lyngby, Denmark
- Paolo Marcatili
- Department of Health Technology, Technical University of Denmark, Lyngby, Denmark
|
18
|
Abstract
Protein structure modeling is one of the most advanced and complex processes in computational biology. One of the major problems for the protein structure prediction field has been how to estimate the accuracy of the predicted 3D models, on both a local and global level, in the absence of known structures. We must be able to accurately measure the confidence that we have in the quality of predicted 3D models of proteins for them to become widely adopted by the general bioscience community. To address this major issue, it was necessary to develop new model quality assessment (MQA) methods and integrate them into our pipelines for building 3D protein models. Our MQA method, called ModFOLD, has been ranked as one of the most accurate MQA tools in independent blind evaluations. This chapter discusses model quality assessment in the protein modeling field, demonstrating both its strengths and limitations. We also present some of the best methods according to independent benchmarking data, which have been gathered in recent years.
Affiliation(s)
- Ali H A Maghrabi
- College of Applied Sciences, Umm Al Qura University, Mecca, Saudi Arabia
- Liam J McGuffin
- School of Biological Sciences, University of Reading, Reading, UK
|
19
|
San A, Palmieri D, Saxena A, Singh S. In silico study predicts a key role of RNA-binding domains 3 and 4 in nucleolin-miRNA interactions. Proteins 2022; 90:1837-1850. [PMID: 35514080 DOI: 10.1002/prot.26355] [Received: 09/23/2021] [Revised: 04/07/2022] [Accepted: 04/26/2022] [Indexed: 12/17/2023]
Abstract
RNA binding proteins (RBPs) regulate many important cellular processes through their interactions with RNA molecules. RBPs are critical for posttranscriptional mechanisms keeping gene regulation in a fine equilibrium. Conversely, dysregulation of RBPs and RNA metabolism pathways is an established hallmark of tumorigenesis. Human nucleolin (NCL) is a multifunctional RBP that interacts with different types of RNA molecules, in part through its four RNA binding domains (RBDs). Particularly, NCL interacts directly with microRNAs (miRNAs) and is involved in their aberrant processing linked with many cancers, including breast cancer. Nonetheless, molecular details of the NCL-miRNA interaction remain obscure. In this study, we used an in silico approach to characterize how NCL targets miRNAs and whether this specificity is imposed by a definite RBD-interface. Here, we present structural models of NCL-RBDs and miRNAs, as well as predict scenarios of NCL-miRNA interactions generated using docking algorithms. Our study suggests a predominant role of NCL RBDs 3 and 4 (RBD3-4) in miRNA binding. We provide detailed analyses of specific motifs/residues at the NCL-substrate interface in both these RBDs and miRNAs. Finally, we propose that the evolutionary emergence of more than two RBDs in NCL in higher organisms coincides with its additional role/s in miRNA processing. Our study shows that RBD3-4 display sequence/structural determinants to specifically recognize miRNA precursor molecules. Moreover, the insights from this study can ultimately support the design of novel antineoplastic drugs aimed at regulating NCL-dependent biological pathways with a causal role in tumorigenesis.
Affiliation(s)
- Avdar San
- Department of Biology, Brooklyn College, The City University of New York, Brooklyn, New York, USA
- The Biochemistry PhD Program, The Graduate Center of the City University of New York, New York, New York, USA
- Dario Palmieri
- Department of Cancer Biology and Genetics, The Ohio State University Wexner Medical Center, Columbus, Ohio, USA
- Anjana Saxena
- Department of Biology, Brooklyn College, The City University of New York, Brooklyn, New York, USA
- The Biochemistry PhD Program, The Graduate Center of the City University of New York, New York, New York, USA
- Shaneen Singh
- Department of Biology, Brooklyn College, The City University of New York, Brooklyn, New York, USA
- The Biochemistry PhD Program, The Graduate Center of the City University of New York, New York, New York, USA
|
20
|
Kurniawan J, Ishida T. Protein Model Quality Estimation Using Molecular Dynamics Simulation. ACS Omega 2022; 7:24274-24281. [PMID: 35874260 PMCID: PMC9301944 DOI: 10.1021/acsomega.2c01475] [Indexed: 06/15/2023]
Abstract
The estimation of protein model quality remains a challenging task and is important for protein structural model utilization. In the last decade, methods ranging from machine learning to deep learning have been developed and have shown progressive improvement. Despite utilizing more sophisticated techniques and introducing new features, none of these methods employ explicit protein structure stability information. Hypothetically, protein model quality might be indicated by its structural stability in an in silico system, as revealed by the structural difference from its initial structure. One possible way to exploit such information is to run molecular dynamics simulations, which have been applied successfully in many research fields. We present a novel approach by introducing explicit protein structure stability information using molecular dynamics simulation. Despite using only simple features, small data with no training process required, and a short molecular dynamics simulation time, our method shows comparable performance to the state-of-the-art deep learning-based method.
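The "structural difference from its initial structure" in this abstract can be pictured as a root-mean-square deviation (RMSD) between the starting coordinates and a later simulation frame. The sketch below assumes the two coordinate sets are already superimposed and list the same atoms in the same order. It is an illustration of the general idea, not the feature set the authors actually use.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally sized lists of (x, y, z) coordinates.

    Assumes the structures are already superimposed; no fitting is done.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    # Sum of squared per-atom displacements over all three axes.
    squared = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for a, b in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(coords_a))
```

Under the abstract's hypothesis, a model that drifts far from its starting coordinates during a short simulation (a large RMSD) would be read as less stable and therefore likely of lower quality.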
|
21
|
Chen X, Cheng J. DISTEMA: distance map-based estimation of single protein model accuracy with attentive 2D convolutional neural network. BMC Bioinformatics 2022; 23:141. [PMID: 35439931 PMCID: PMC9019949 DOI: 10.1186/s12859-022-04683-1] [Received: 04/08/2022] [Accepted: 04/11/2022] [Indexed: 01/25/2023]
Abstract
BACKGROUND Estimation of the accuracy (quality) of protein structural models is important for both prediction and use of protein structural models. Deep learning methods have been used to integrate protein structure features to predict the quality of protein models. Inter-residue distances are key information for predicting a protein's tertiary structure and therefore have good potential to predict the quality of protein structural models. However, few methods have been developed to fully take advantage of predicted inter-residue distance maps to estimate the accuracy of a single protein structural model. RESULT We developed an attentive 2D convolutional neural network (CNN) with channel-wise attention to take only a raw difference map between the inter-residue distance map calculated from a single protein model and the distance map predicted from the protein sequence as input to predict the quality of the model. The network comprises multiple convolutional layers, batch normalization layers, dense layers, and Squeeze-and-Excitation blocks with attention to automatically extract features relevant to protein model quality from the raw input without using any expert-curated features. We evaluated DISTEMA's capability of selecting the best models for CASP13 targets in terms of ranking loss of GDT-TS score. The ranking loss of DISTEMA is 0.079, lower than several state-of-the-art single-model quality assessment methods. CONCLUSION This work demonstrates that using raw inter-residue distance information with deep learning can predict the quality of protein structural models reasonably well. DISTEMA is freely available at https://github.com/jianlin-cheng/DISTEMA.
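Two quantities in this abstract are simple enough to sketch: the difference map used as network input and the ranking loss used for evaluation. The code below assumes a signed element-wise difference (the abstract does not say whether the difference is signed or absolute) and the usual definition of ranking loss as the true score of the best model minus the true score of the model ranked first by the predictor. The function names are hypothetical and are not taken from the DISTEMA repository.

```python
def difference_map(model_dist, predicted_dist):
    """Element-wise difference between two square distance maps (lists of rows)."""
    return [
        [m - p for m, p in zip(model_row, pred_row)]
        for model_row, pred_row in zip(model_dist, predicted_dist)
    ]


def ranking_loss(true_scores, predicted_scores):
    """True score of the best model minus true score of the top-ranked model."""
    # Index of the model the predictor ranks first.
    top = max(range(len(predicted_scores)), key=predicted_scores.__getitem__)
    return max(true_scores) - true_scores[top]
```

A ranking loss of 0 means the predictor picked the truly best model for a target. The reported figure is typically this quantity averaged over targets.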
Affiliation(s)
- Xiao Chen
- Department of Electrical Engineering and Computer Science, University of Missouri Columbia, Columbia, MO 65211 USA
- Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri Columbia, Columbia, MO 65211 USA
|
22
|
A Benchmark Dataset for Evaluating Practical Performance of Model Quality Assessment of Homology Models. Bioengineering (Basel) 2022; 9:bioengineering9030118. [PMID: 35324806 PMCID: PMC8945737 DOI: 10.3390/bioengineering9030118] [Received: 02/09/2022] [Revised: 03/08/2022] [Accepted: 03/11/2022] [Indexed: 11/25/2022]
Abstract
Protein structure prediction is an important issue in structural bioinformatics. In this process, model quality assessment (MQA), which estimates the accuracy of the predicted structure, is also practically important. Currently, the most commonly used dataset to evaluate the performance of MQA is the critical assessment of the protein structure prediction (CASP) dataset. However, the CASP dataset does not contain enough targets with high-quality models, and thus cannot sufficiently evaluate the MQA performance in practical use. Additionally, most application studies employ homology modeling because of its reliability. However, the CASP dataset includes models generated by de novo methods, which may lead to the mis-estimation of MQA performance. In this study, we created new benchmark datasets, named a homology models dataset for model quality assessment (HMDM), that contain targets with high-quality models derived using homology modeling. We then benchmarked the performance of the MQA methods using the new datasets and compared their performance to that of the classical selection based on the sequence identity of the template proteins. The results showed that model selection by the latest MQA methods using deep learning is better than selection by template sequence identity and classical statistical potentials. Using HMDM, it is possible to verify the MQA performance for high-accuracy homology models.
|
23
|
Sandoval K, McCormack GP. Actinoporin-like Proteins Are Widely Distributed in the Phylum Porifera. Mar Drugs 2022; 20:md20010074. [PMID: 35049929 PMCID: PMC8778704 DOI: 10.3390/md20010074] [Received: 12/15/2021] [Revised: 01/07/2022] [Accepted: 01/10/2022] [Indexed: 11/16/2022]
Abstract
Actinoporins are proteinaceous toxins known for their ability to bind to and create pores in cellular membranes. This quality has generated interest in their potential use as new tools, such as therapeutic immunotoxins. Isolated historically from sea anemones, genes encoding for similar actinoporin-like proteins have since been found in a small number of other animal phyla. Sequencing and de novo assembly of Irish Haliclona transcriptomes indicated that sponges also possess similar genes. An exhaustive analysis of publicly available sequencing data from other sponges showed that this is a potentially widespread feature of the Porifera. While many sponge proteins possess a sequence similarity of 27.70–59.06% to actinoporins, they show consistency in predicted structure. One gene copy from H. indistincta has significant sequence similarity to sea anemone actinoporins and possesses conserved residues associated with the fundamental roles of sphingomyelin recognition, membrane attachment, oligomerization, and pore formation, indicating that it may be an actinoporin. Phylogenetic analyses indicate frequent gene duplication, no distinct clade for sponge-derived proteins, and a stronger signal towards actinoporins than similar proteins from other phyla. Overall, this study provides evidence that a diverse array of Porifera represents a novel source of actinoporin-like proteins which may have biotechnological and pharmaceutical applications.
|
24
|
Machine learning to estimate the local quality of protein crystal structures. Sci Rep 2021; 11:23599. [PMID: 34880321 PMCID: PMC8654820 DOI: 10.1038/s41598-021-02948-y] [Received: 07/09/2021] [Accepted: 11/24/2021] [Indexed: 11/23/2022]
Abstract
Low-resolution electron density maps can pose a major obstacle in the determination and use of protein structures. Herein, we describe a novel method, called quality assessment based on an electron density map (QAEmap), which evaluates local protein structures determined by X-ray crystallography and could be applied to correct structural errors using low-resolution maps. QAEmap uses a three-dimensional deep convolutional neural network with electron density maps and their corresponding coordinates as input and predicts the correlation between the local structure and putative high-resolution experimental electron density map. This correlation could be used as a metric to modify the structure. Further, we propose that this method may be applied to evaluate ligand binding, which can be difficult to determine at low resolution.
|
25
|
Jiang Z, Wang C, Wu Z, Chen K, Yang W, Deng H, Song H, Zhou X. Enzymatic deamination of the epigenetic nucleoside N6-methyladenosine regulates gene expression. Nucleic Acids Res 2021; 49:12048-12068. [PMID: 34850126 PMCID: PMC8643624 DOI: 10.1093/nar/gkab1124] [Received: 07/07/2021] [Revised: 10/20/2021] [Accepted: 11/16/2021] [Indexed: 12/26/2022]
Abstract
N6-methyladenosine (m6A) modification is the most extensively studied epigenetic modification due to its crucial role in regulating an array of biological processes. Herein, Bsu06560, formerly annotated as an adenine deaminase derived from Bacillus subtilis 168, was recognized as the first enzyme capable of metabolizing the epigenetic nucleoside N6-methyladenosine. A model of Bsu06560 was constructed, and several critical residues were putatively identified via mutational screening. Two mutants, F91L and Q150W, provided a superiorly enhanced conversion ratio of adenosine and N6-methyladenosine. The CRISPR-Cas9 system generated Bsu06560-knockout, F91L, and Q150W mutations from the B. subtilis 168 genome. Transcriptional profiling revealed a higher global gene expression level in BS-F91L and BS-Q150W strains with enhanced N6-methyladenosine deaminase activity. The differentially expressed genes were categorized using GO, COG, KEGG and verified through RT-qPCR. This study assessed the crucial roles of Bsu06560 in regulating adenosine and N6-methyladenosine metabolism, which influence a myriad of biological processes. This is the first systematic research to identify and functionally annotate an enzyme capable of metabolizing N6-methyladenosine and highlight its significant roles in regulation of bacterial metabolism. Besides, this study provides a novel method for controlling gene expression through the mutations of critical residues.
Affiliation(s)
- Zhuoran Jiang
- The Institute of Advanced Studies, and Key Laboratory of Biomedical Polymers-Ministry of Education, College of Chemistry and Molecular Sciences, Wuhan University, 40072 Wuhan, P.R. China
- Chao Wang
- The Institute of Advanced Studies, and Key Laboratory of Biomedical Polymers-Ministry of Education, College of Chemistry and Molecular Sciences, Wuhan University, 40072 Wuhan, P.R. China
- Zixin Wu
- The Institute of Advanced Studies, and Key Laboratory of Biomedical Polymers-Ministry of Education, College of Chemistry and Molecular Sciences, Wuhan University, 40072 Wuhan, P.R. China
- Kun Chen
- The Institute of Advanced Studies, and Key Laboratory of Biomedical Polymers-Ministry of Education, College of Chemistry and Molecular Sciences, Wuhan University, 40072 Wuhan, P.R. China
- Wei Yang
- The Institute of Advanced Studies, and Key Laboratory of Biomedical Polymers-Ministry of Education, College of Chemistry and Molecular Sciences, Wuhan University, 40072 Wuhan, P.R. China
- Hexiang Deng
- The Institute of Advanced Studies, and Key Laboratory of Biomedical Polymers-Ministry of Education, College of Chemistry and Molecular Sciences, Wuhan University, 40072 Wuhan, P.R. China
- Heng Song
- The Institute of Advanced Studies, and Key Laboratory of Biomedical Polymers-Ministry of Education, College of Chemistry and Molecular Sciences, Wuhan University, 40072 Wuhan, P.R. China
- Xiang Zhou
- The Institute of Advanced Studies, and Key Laboratory of Biomedical Polymers-Ministry of Education, College of Chemistry and Molecular Sciences, Wuhan University, 40072 Wuhan, P.R. China
|
26
|
Wang W, Wang J, Li Z, Xu D, Shang Y. MUfoldQA_G: High-accuracy protein model QA via retraining and transformation. Comput Struct Biotechnol J 2021; 19:6282-6290. [PMID: 34900138 PMCID: PMC8636996 DOI: 10.1016/j.csbj.2021.11.021] [Received: 07/07/2021] [Revised: 11/10/2021] [Accepted: 11/14/2021] [Indexed: 11/21/2022]
Abstract
Protein tertiary structure prediction is an active research area and has attracted significant attention recently due to the success of AlphaFold from DeepMind. Methods capable of accurately evaluating the quality of predicted models are of great importance. In the past, although many model quality assessment (QA) methods have been developed, their accuracies are not consistently high across different QA performance metrics for diverse target proteins. In this paper, we propose MUfoldQA_G, a new multi-model QA method that aims at simultaneously optimizing Pearson correlation and average GDT-TS difference, two commonly used QA performance metrics. This method is based on two new algorithms MUfoldQA_Gp and MUfoldQA_Gr. MUfoldQA_Gp uses a new technique to combine information from protein templates and reference protein models to maximize the Pearson correlation QA metric. MUfoldQA_Gr employs a new machine learning technique that resamples training data and retrains adaptively to learn a consensus model that is better than naïve consensus while minimizing average GDT-TS difference. MUfoldQA_G uses a new method to combine the results of MUfoldQA_Gr and MUfoldQA_Gp so that the final QA prediction results achieve low average GDT-TS difference that is close to the results from MUfoldQA_Gr, while maintaining high Pearson correlation that is the same as the results from MUfoldQA_Gp. In CASP14 QA categories, MUfoldQA_G ranked No. 1 in Pearson correlation and No. 2 in average GDT-TS difference.
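The two QA metrics named in this abstract can be written out directly. The sketch below computes the Pearson correlation between predicted and true quality scores, and an average GDT-TS difference taken here as the mean absolute difference over models. That reading of "average GDT-TS difference" is an assumption, and the function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def average_gdt_ts_difference(predicted, true):
    """Mean absolute difference between predicted and true GDT-TS scores."""
    return sum(abs(p - t) for p, t in zip(predicted, true)) / len(true)
```

The two metrics can pull in different directions: a predictor can rank models perfectly (correlation near 1) while being systematically offset in absolute score, which is why the abstract treats optimizing both at once as the goal.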
Affiliation(s)
- Wenbo Wang
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
- Junlin Wang
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
- Zhaoyu Li
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
- Dong Xu
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
- Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
- Yi Shang
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
|
27
|
Proteomic Tools for the Analysis of Cytoskeleton Proteins. Methods Mol Biol 2021; 2364:363-425. [PMID: 34542864 DOI: 10.1007/978-1-0716-1661-1_19] [Indexed: 10/20/2022]
Abstract
Proteomic analyses have become an essential part of the toolkit of the molecular biologist, given the widespread availability of genomic data and open source or freely accessible bioinformatics software. Tools are available for detecting homologous sequences, recognizing functional domains, and modeling the three-dimensional structure for any given protein sequence, as well as for predicting interactions with other proteins or macromolecules. Although a wealth of structural and functional information is available for many cytoskeletal proteins, with representatives spanning all of the major subfamilies, the majority of cytoskeletal proteins remain partially or totally uncharacterized. Moreover, bioinformatics tools provide a means for studying the effects of synthetic mutations or naturally occurring variants of these cytoskeletal proteins. This chapter discusses various freely available proteomic analysis tools, with a focus on in silico prediction of protein structure and function. The selected tools are notable for providing an easily accessible interface for the novice while retaining advanced functionality for more experienced computational biologists.
|
28
|
Kong L, Ju F, Zhang H, Sun S, Bu D. FALCON2: a web server for high-quality prediction of protein tertiary structures. BMC Bioinformatics 2021; 22:439. [PMID: 34525939 PMCID: PMC8444573 DOI: 10.1186/s12859-021-04353-8] [Received: 06/24/2021] [Accepted: 09/01/2021] [Indexed: 11/21/2022]
Abstract
BACKGROUND: Accurate prediction of protein tertiary structures is highly desired as the knowledge of protein structures provides invaluable insights into protein functions. We have designed two approaches to protein structure prediction, including a template-based modeling approach (called ProALIGN) and an ab initio prediction approach (called ProFOLD). Briefly speaking, ProALIGN aligns a target protein with templates through exploiting the patterns of context-specific alignment motifs and then builds the final structure with reference to the homologous templates. In contrast, ProFOLD uses an end-to-end neural network to estimate inter-residue distances of target proteins and builds structures that satisfy these distance constraints. These two approaches emphasize different characteristics of target proteins: ProALIGN exploits structure information of homologous templates of target proteins while ProFOLD exploits the co-evolutionary information carried by homologous protein sequences. Recent progress has shown that the combination of template-based modeling and ab initio approaches is promising.
RESULTS: In the study, we present FALCON2, a web server that integrates ProALIGN and ProFOLD to provide high-quality protein structure prediction service. For a target protein, FALCON2 executes ProALIGN and ProFOLD simultaneously to predict possible structures and selects the most likely one as the final prediction result. We evaluated FALCON2 on widely-used benchmarks, including 104 CASP13 (the 13th Critical Assessment of protein Structure Prediction) targets and 91 CASP14 targets. In-depth examination suggests that when high-quality templates are available, ProALIGN is superior to ProFOLD and in other cases, ProFOLD shows better performance. By integrating these two approaches with different emphasis, FALCON2 server outperforms the two individual approaches and also achieves state-of-the-art performance compared with existing approaches.
CONCLUSIONS: By integrating template-based modeling and ab initio approaches, FALCON2 provides an easy-to-use and high-quality protein structure prediction service for the community and we expect it to enable insights into a deep understanding of protein functions.
Affiliation(s)
- Lupeng Kong
- Key Lab of Intelligent Information Processing, Big-Data Academy, Institute of Computing Technology, Chinese Academy of Sciences, 100190 Beijing, China
- University of Chinese Academy of Sciences, 100049 Beijing, China
| | - Fusong Ju
- Key Lab of Intelligent Information Processing, Big-Data Academy, Institute of Computing Technology, Chinese Academy of Sciences, 100190 Beijing, China
- University of Chinese Academy of Sciences, 100049 Beijing, China
| | - Haicang Zhang
- Key Lab of Intelligent Information Processing, Big-Data Academy, Institute of Computing Technology, Chinese Academy of Sciences, 100190 Beijing, China
- University of Chinese Academy of Sciences, 100049 Beijing, China
| | - Shiwei Sun
- Key Lab of Intelligent Information Processing, Big-Data Academy, Institute of Computing Technology, Chinese Academy of Sciences, 100190 Beijing, China
- University of Chinese Academy of Sciences, 100049 Beijing, China
| | - Dongbo Bu
- Key Lab of Intelligent Information Processing, Big-Data Academy, Institute of Computing Technology, Chinese Academy of Sciences, 100190 Beijing, China
- University of Chinese Academy of Sciences, 100049 Beijing, China
| |
|
29
|
Komarevtsev SK, Evseev PV, Shneider MM, Popova EA, Tupikin AE, Stepanenko VN, Kabilov MR, Shabunin SV, Osmolovskiy AA, Miroshnikov KA. Gene Analysis, Cloning, and Heterologous Expression of Protease from a Micromycete Aspergillus ochraceus Capable of Activating Protein C of Blood Plasma. Microorganisms 2021; 9:1936. [PMID: 34576831 PMCID: PMC8471544 DOI: 10.3390/microorganisms9091936] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/06/2021] [Revised: 09/05/2021] [Accepted: 09/08/2021] [Indexed: 12/16/2022] Open
Abstract
Micromycetes are known to secrete numerous enzymes of biotechnological and medical potential. Fibrinolytic protease-activator of protein C (PAPC) of blood plasma from micromycete Aspergillus ochraceus VKM-F4104D was obtained in recombinant form utilising the bacterial expression system. This enzyme, which belongs to the proteinase-K-like proteases, is similar to the proteases encoded in the genomes of Aspergillus fumigatus ATCC MYA-4609, A. oryzae ATCC 42149 and A. flavus 28. Mature PAPC-4104 is 282 amino acids long, preceded by the 101-amino acid propeptide necessary for proper folding and maturation. The recombinant protease was identical to the native enzyme from micromycete in terms of its biological properties, including an ability to hydrolyse substrates of activated protein C (pGlu-Pro-Arg-pNA) and factor Xa (Z-D-Arg-Gly-Arg-pNA) in conjugant reactions with human blood plasma. Therefore, recombinant PAPC-4104 can potentially be used in medicine, veterinary science, diagnostics, and other applications.
Affiliation(s)
- Sergei K. Komarevtsev
- Biology Department, Lomonosov Moscow State University, 119234 Moscow, Russia; (E.A.P.); (A.A.O.)
- All-Russian Scientific Research Veterinary Institute of Pathology, Pharmacology and Therapy, 394087 Voronezh, Russia; (S.V.S.); (K.A.M.)
| | - Peter V. Evseev
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, 117997 Moscow, Russia; (P.V.E.); (M.M.S.); (V.N.S.)
| | - Mikhail M. Shneider
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, 117997 Moscow, Russia; (P.V.E.); (M.M.S.); (V.N.S.)
| | - Elizaveta A. Popova
- Biology Department, Lomonosov Moscow State University, 119234 Moscow, Russia; (E.A.P.); (A.A.O.)
| | - Alexey E. Tupikin
- Institute of Chemical Biology and Fundamental Medicine, Siberian Branch of the Russian Academy of Sciences, 630090 Novosibirsk, Russia; (A.E.T.); (M.R.K.)
| | - Vasiliy N. Stepanenko
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, 117997 Moscow, Russia; (P.V.E.); (M.M.S.); (V.N.S.)
| | - Marsel R. Kabilov
- Institute of Chemical Biology and Fundamental Medicine, Siberian Branch of the Russian Academy of Sciences, 630090 Novosibirsk, Russia; (A.E.T.); (M.R.K.)
| | - Sergei V. Shabunin
- All-Russian Scientific Research Veterinary Institute of Pathology, Pharmacology and Therapy, 394087 Voronezh, Russia; (S.V.S.); (K.A.M.)
| | - Alexander A. Osmolovskiy
- Biology Department, Lomonosov Moscow State University, 119234 Moscow, Russia; (E.A.P.); (A.A.O.)
- All-Russian Scientific Research Veterinary Institute of Pathology, Pharmacology and Therapy, 394087 Voronezh, Russia; (S.V.S.); (K.A.M.)
| | - Konstantin A. Miroshnikov
- All-Russian Scientific Research Veterinary Institute of Pathology, Pharmacology and Therapy, 394087 Voronezh, Russia; (S.V.S.); (K.A.M.)
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, 117997 Moscow, Russia; (P.V.E.); (M.M.S.); (V.N.S.)
| |
|
30
|
Schaeffer RD, Kinch L, Kryshtafovych A, Grishin NV. Assessment of domain interactions in the fourteenth round of the Critical Assessment of Structure Prediction (CASP14). Proteins 2021; 89:1700-1710. [PMID: 34455641 PMCID: PMC8616818 DOI: 10.1002/prot.26225] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 04/30/2021] [Revised: 08/07/2021] [Accepted: 08/24/2021] [Indexed: 12/29/2022]
Abstract
The high accuracy of some CASP14 models at the domain level prompted a more detailed evaluation of structure predictions on whole targets. For the first time in critical assessment of structure prediction (CASP), we evaluated accuracy of difficult domain assembly in models submitted for multidomain targets where the community predicted individual evaluation units (EUs) with greater accuracy than full-length targets. Ten proteins with domain interactions that did not show evidence of conformational change and were not involved in significant oligomeric contacts were chosen as targets for the domain interaction assessment. Groups were ranked using complementary interaction scores (F1, QS score, and Jaccard coefficient), and their predictions were evaluated for their ability to correctly model inter-domain interfaces and overall protein folds. Target performance was broadly grouped into two clusters. The first consisted primarily of targets containing two EUs wherein predictors more broadly predicted domain positioning and interfacial contacts correctly. The other consisted of complex two-EU and three-EU targets where few predictors performed well. The highest ranked predictor, AlphaFold2, produced high-accuracy models on eight out of 10 targets. Their interdomain scores on three of these targets were significantly higher than all other groups and were responsible for their overall outperformance in the category. We further highlight the performance of AlphaFold2 and the next best group, BAKER-experimental on several interesting targets.
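Two of the interaction scores named in this abstract, F1 and the Jaccard coefficient, can be illustrated on sets of inter-domain residue contacts. The following is only a minimal sketch: the contact pairs are invented toy data, the function name `interface_scores` is hypothetical, and the assessors' actual contact definition and the QS score are not reproduced here.

```python
def interface_scores(predicted, reference):
    """Return (F1, Jaccard) for two collections of inter-domain contacts.

    Each contact is a (residue_i, residue_j) pair; how a contact is defined
    (distance cutoff, atom choice) is an assumption left to the caller.
    """
    predicted, reference = set(predicted), set(reference)
    shared = len(predicted & reference)          # correctly predicted contacts
    union = len(predicted | reference)
    precision = shared / len(predicted) if predicted else 0.0
    recall = shared / len(reference) if reference else 0.0
    f1 = 2 * precision * recall / (precision + recall) if shared else 0.0
    jaccard = shared / union if union else 0.0
    return f1, jaccard


# Toy example (invented data): 3 of 5 predicted contacts match 4 reference contacts.
reference = {(10, 150), (11, 150), (12, 152), (40, 180)}
predicted = {(10, 150), (11, 150), (12, 153), (41, 180), (40, 180)}
f1, jaccard = interface_scores(predicted, reference)
print(f"F1 = {f1:.3f}, Jaccard = {jaccard:.3f}")
```

Both scores reward overlap between predicted and reference interfaces; the Jaccard coefficient penalizes extra and missing contacts symmetrically, while F1 weighs precision against recall.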
Affiliation(s)
- R Dustin Schaeffer
- Department of Biophysics, UT Southwestern Medical Center, Dallas, Texas, USA
| | - Lisa Kinch
- Howard Hughes Medical Institute, UT Southwestern Medical Center, Dallas, Texas, USA
| | - Andriy Kryshtafovych
- Protein Structure Prediction Center, Genome and Biomedical Sciences Facilities, University of California, Davis, California, USA
| | - Nick V Grishin
- Department of Biophysics, UT Southwestern Medical Center, Dallas, Texas, USA
- Howard Hughes Medical Institute, UT Southwestern Medical Center, Dallas, Texas, USA
| |
|
31
|
Bhattacharya S, Roche R, Shuvo MH, Bhattacharya D. Recent Advances in Protein Homology Detection Propelled by Inter-Residue Interaction Map Threading. Front Mol Biosci 2021; 8:643752. [PMID: 34046429 PMCID: PMC8148041 DOI: 10.3389/fmolb.2021.643752] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 12/18/2020] [Accepted: 04/21/2021] [Indexed: 11/13/2022] Open
Abstract
Sequence-based protein homology detection has emerged as one of the most sensitive and accurate approaches to protein structure prediction. Despite the success, homology detection remains very challenging for weakly homologous proteins with divergent evolutionary profile. Very recently, deep neural network architectures have shown promising progress in mining the coevolutionary signal encoded in multiple sequence alignments, leading to reasonably accurate estimation of inter-residue interaction maps, which serve as a rich source of additional information for improved homology detection. Here, we summarize the latest developments in protein homology detection driven by inter-residue interaction map threading. We highlight the emerging trends in distant-homology protein threading through the alignment of predicted interaction maps at various granularities ranging from binary contact maps to finer-grained distance and orientation maps as well as their combination. We also discuss some of the current limitations and possible future avenues to further enhance the sensitivity of protein homology detection.
Affiliation(s)
- Sutanu Bhattacharya
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, United States
| | - Rahmatullah Roche
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, United States
| | - Md Hossain Shuvo
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, United States
| | - Debswapna Bhattacharya
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, United States
- Department of Biological Sciences, Auburn University, Auburn, AL, United States
| |
|
32
|
Jiang V, Khare SD, Banta S. Computational structure prediction provides a plausible mechanism for electron transfer by the outer membrane protein Cyc2 from Acidithiobacillus ferrooxidans. Protein Sci 2021; 30:1640-1652. [PMID: 33969560 DOI: 10.1002/pro.4106] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 03/19/2021] [Revised: 04/30/2021] [Accepted: 05/03/2021] [Indexed: 12/14/2022]
Abstract
Cyc2 is the key protein in the outer membrane of Acidithiobacillus ferrooxidans that mediates electron transfer between extracellular inorganic iron and the intracellular central metabolism. This cytochrome c is specific for iron and interacts with periplasmic proteins to complete a reversible electron transport chain. A structure of Cyc2 has not yet been characterized experimentally. Here we describe a structural model of Cyc2, and associated proteins, to highlight a plausible mechanism for the ferrous iron electron transfer chain. A comparative modeling protocol specific for trans membrane beta barrel (TMBB) proteins in acidophilic conditions (pH ~ 2) was applied to the primary sequence of Cyc2. The proposed structure has three main regimes: Extracellular loops exposed to low-pH conditions, a TMBB, and an N-terminal cytochrome-like region within the periplasmic space. The Cyc2 model was further refined by identifying likely iron and heme docking sites. This represents the first computational model of Cyc2 that accounts for the membrane microenvironment and the acidity in the extracellular matrix. This approach can be used to model other TMBBs which can be critical for chemolithotrophic microbial growth.
Affiliation(s)
- Virginia Jiang
- Department of Chemical Engineering, Columbia University in the City of New York, New York, New York, USA
| | - Sagar D Khare
- Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
| | - Scott Banta
- Department of Chemical Engineering, Columbia University in the City of New York, New York, New York, USA
| |
|
33
|
Baldassarre F, Menéndez Hurtado D, Elofsson A, Azizpour H. GraphQA: protein model quality assessment using graph convolutional networks. Bioinformatics 2021; 37:360-366. [PMID: 32780838 PMCID: PMC8058777 DOI: 10.1093/bioinformatics/btaa714] [Citation(s) in RCA: 49] [Impact Index Per Article: 12.3] [Received: 01/18/2020] [Revised: 07/03/2020] [Accepted: 08/05/2020] [Indexed: 11/25/2022] Open
Abstract
MOTIVATION: Proteins are ubiquitous molecules whose function in biological processes is determined by their 3D structure. Experimental identification of a protein’s structure can be time-consuming, prohibitively expensive and not always possible. Alternatively, protein folding can be modeled using computational methods, which however are not guaranteed to always produce optimal results. GraphQA is a graph-based method to estimate the quality of protein models, that possesses favorable properties such as representation learning, explicit modeling of both sequential and 3D structure, geometric invariance and computational efficiency.
RESULTS: GraphQA performs similarly to state-of-the-art methods despite using a relatively low number of input features. In addition, the graph network structure provides an improvement over the architecture used in ProQ4 operating on the same input features. Finally, the individual contributions of GraphQA components are carefully evaluated.
AVAILABILITY AND IMPLEMENTATION: PyTorch implementation, datasets, experiments and link to an evaluation server are available through this GitHub repository: github.com/baldassarreFe/graphqa.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Federico Baldassarre
- Division of Robotics, Perception and Learning (RPL), KTH – Royal Institute of Technology, 10044 Stockholm, Sweden
| | - David Menéndez Hurtado
- Department of Intelligent Systems, Science for Life Laboratory, Stockholm University, Box 1031, 17121 Solna, Sweden
- Department of Biochemistry and Biophysics, school of Electrical Engineering and Computer Science (EECS), Stockholm University, 10691 Stockholm, Sweden
| | - Arne Elofsson
- Department of Intelligent Systems, Science for Life Laboratory, Stockholm University, Box 1031, 17121 Solna, Sweden
- Department of Biochemistry and Biophysics, school of Electrical Engineering and Computer Science (EECS), Stockholm University, 10691 Stockholm, Sweden
| | - Hossein Azizpour
- Division of Robotics, Perception and Learning (RPL), KTH – Royal Institute of Technology, 10044 Stockholm, Sweden
- To whom correspondence should be addressed.
| |
|
34
|
Protein Structure Refinement Using Multi-Objective Particle Swarm Optimization with Decomposition Strategy. Int J Mol Sci 2021; 22:ijms22094408. [PMID: 33922489 PMCID: PMC8122964 DOI: 10.3390/ijms22094408] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2021] [Revised: 04/16/2021] [Accepted: 04/20/2021] [Indexed: 12/02/2022] Open
Abstract
Protein structure refinement is a crucial step for more accurate protein structure predictions. Most existing approaches treat it as an energy minimization problem to intuitively improve the quality of initial models by searching for structures with lower energy. Considering that a single energy function could not reflect the accurate energy landscape of all the proteins, our previous AIR 1.0 pipeline uses multiple energy functions to realize a multi-objectives particle swarm optimization-based model refinement. It is expected to provide a general balanced conformation search protocol guided from different energy evaluations. However, AIR 1.0 solves the multi-objective optimization problem as a whole, which could not result in good solution diversity and convergence on some targets. In this study, we report a decomposition-based method AIR 2.0, which is an updated version of AIR, for protein structure refinement. AIR 2.0 decomposes a multi-objective optimization problem into a number of subproblems and optimizes them simultaneously using particle swarm optimization algorithm. The solutions yielded by AIR 2.0 show better convergence and diversity compared to its previous version, which increases the possibilities of digging out better structure conformations. The experimental results on CASP13 refinement benchmark targets and blind tests in CASP 14 demonstrate the efficacy of AIR 2.0.
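The decomposition idea described in this abstract can be sketched on a toy problem. Everything below is an assumption made for illustration: the abstract does not state which scalarization AIR 2.0 uses, so a plain weighted sum is swapped in; the two "energy" terms are invented one-dimensional functions; and the subproblems are solved one after another, whereas the abstract says AIR 2.0 optimizes them simultaneously.

```python
import random


def weight_vectors(n_subproblems):
    """Evenly spaced weight pairs for two objectives; each pair sums to 1."""
    step = 1.0 / (n_subproblems - 1)
    return [(i * step, 1.0 - i * step) for i in range(n_subproblems)]


def scalarize(objectives, weights):
    """Weighted-sum scalarization (an assumed choice, not taken from the paper)."""
    return sum(w * f for w, f in zip(weights, objectives))


def toy_energies(x):
    """Two hypothetical, conflicting energy terms of a single coordinate."""
    return (x ** 2, (x - 2.0) ** 2)


def pso_minimize(cost, n_particles=20, n_iters=100, seed=0):
    """Basic single-objective particle swarm optimization in one dimension."""
    rng = random.Random(seed)
    pos = [rng.uniform(-5.0, 5.0) for _ in range(n_particles)]
    vel = [0.0] * n_particles
    best_pos = pos[:]                          # personal best positions
    best_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=lambda i: best_cost[i])
    g_pos, g_cost = best_pos[g], best_cost[g]  # global best
    for _ in range(n_iters):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            vel[i] = (0.7 * vel[i]
                      + 1.5 * r1 * (best_pos[i] - pos[i])
                      + 1.5 * r2 * (g_pos - pos[i]))
            pos[i] += vel[i]
            c = cost(pos[i])
            if c < best_cost[i]:
                best_pos[i], best_cost[i] = pos[i], c
                if c < g_cost:
                    g_pos, g_cost = pos[i], c
    return g_pos


# One scalar subproblem per weight vector; together the solutions sample the
# trade-off between the two toy energy terms.
solutions = []
for w in weight_vectors(5):
    x = pso_minimize(lambda x, w=w: scalarize(toy_energies(x), w))
    solutions.append((w, x))
```

For this toy problem each weighted-sum subproblem has the closed-form minimizer x = 2·w₂, so the five subproblems spread their solutions between 0 and 2, which is the kind of diversity a decomposition strategy aims for.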
|
35
|
Waman VP, Sen N, Varadi M, Daina A, Wodak SJ, Zoete V, Velankar S, Orengo C. The impact of structural bioinformatics tools and resources on SARS-CoV-2 research and therapeutic strategies. Brief Bioinform 2021; 22:742-768. [PMID: 33348379 PMCID: PMC7799268 DOI: 10.1093/bib/bbaa362] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 09/10/2020] [Revised: 11/06/2020] [Accepted: 11/09/2020] [Indexed: 01/18/2023] Open
Abstract
SARS-CoV-2 is the causative agent of COVID-19, the ongoing global pandemic. It has posed a worldwide challenge to human health as no effective treatment is currently available to combat the disease. Its severity has led to unprecedented collaborative initiatives for therapeutic solutions against COVID-19. Studies resorting to structure-based drug design for COVID-19 are plethoric and show good promise. Structural biology provides key insights into 3D structures, critical residues/mutations in SARS-CoV-2 proteins, implicated in infectivity, molecular recognition and susceptibility to a broad range of host species. The detailed understanding of viral proteins and their complexes with host receptors and candidate epitope/lead compounds is the key to developing a structure-guided therapeutic design. Since the discovery of SARS-CoV-2, several structures of its proteins have been determined experimentally at an unprecedented speed and deposited in the Protein Data Bank. Further, specialized structural bioinformatics tools and resources have been developed for theoretical models, data on protein dynamics from computer simulations, impact of variants/mutations and molecular therapeutics. Here, we provide an overview of ongoing efforts on developing structural bioinformatics tools and resources for COVID-19 research. We also discuss the impact of these resources and structure-based studies, to understand various aspects of SARS-CoV-2 infection and therapeutic development. These include (i) understanding differences between SARS-CoV-2 and SARS-CoV, leading to increased infectivity of SARS-CoV-2, (ii) deciphering key residues in the SARS-CoV-2 involved in receptor-antibody recognition, (iii) analysis of variants in host proteins that affect host susceptibility to infection and (iv) analyses facilitating structure-based drug and vaccine design against SARS-CoV-2.
Affiliation(s)
- Antoine Daina
- Molecular Modeling Group at SIB, Swiss Institute of Bioinformatics
| | - Vincent Zoete
- Department of Fundamental Oncology at the University of Lausanne and Group leader at SIB
| |
|
36
|
Takei Y, Ishida T. P3CMQA: Single-Model Quality Assessment Using 3DCNN with Profile-Based Features. Bioengineering (Basel) 2021; 8:bioengineering8030040. [PMID: 33808604 PMCID: PMC8003382 DOI: 10.3390/bioengineering8030040] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 01/29/2021] [Revised: 03/12/2021] [Accepted: 03/16/2021] [Indexed: 11/16/2022] Open
Abstract
Model quality assessment (MQA), which selects near-native structures from structure models, is an important process in protein tertiary structure prediction. The three-dimensional convolution neural network (3DCNN) was applied to the task, but the performance was comparable to existing methods because it used only atom-type features as the input. Thus, we added sequence profile-based features, which are also used in other methods, to improve the performance. We developed a single-model MQA method for protein structures based on 3DCNN using sequence profile-based features, namely, P3CMQA. Performance evaluation using a CASP13 dataset showed that profile-based features improved the assessment performance, and the proposed method was better than currently available single-model MQA methods, including the previous 3DCNN-based method. We also implemented a web-interface of the method to make it more user-friendly.
Affiliation(s)
- Yuma Takei
- Department of Computer Science, School of Computing, Tokyo Institute of Technology, Ookayama, Meguro-ku, Tokyo 152-8550, Japan
- Real World Big-Data Computation Open Innovation Laboratory (RWBC-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Aomi, Koto-ku, Tokyo 135-0064, Japan
| | - Takashi Ishida
- Department of Computer Science, School of Computing, Tokyo Institute of Technology, Ookayama, Meguro-ku, Tokyo 152-8550, Japan
| |
|
37
|
Shuvo MH, Bhattacharya S, Bhattacharya D. QDeep: distance-based protein model quality estimation by residue-level ensemble error classifications using stacked deep residual neural networks. Bioinformatics 2021; 36:i285-i291. [PMID: 32657397 PMCID: PMC7355297 DOI: 10.1093/bioinformatics/btaa455] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Indexed: 12/31/2022] Open
Abstract
MOTIVATION: Protein model quality estimation, in many ways, informs protein structure prediction. Despite their tight coupling, existing model quality estimation methods do not leverage inter-residue distance information or the latest technological breakthrough in deep learning that has recently revolutionized protein structure prediction.
RESULTS: We present a new distance-based single-model quality estimation method called QDeep by harnessing the power of stacked deep residual neural networks (ResNets). Our method first employs stacked deep ResNets to perform residue-level ensemble error classifications at multiple predefined error thresholds, and then combines the predictions from the individual error classifiers for estimating the quality of a protein structural model. Experimental results show that our method consistently outperforms existing state-of-the-art methods including ProQ2, ProQ3, ProQ3D, ProQ4, 3DCNN, MESHI, and VoroMQA in multiple independent test datasets across a wide-range of accuracy measures; and that predicted distance information significantly contributes to the improved performance of QDeep.
AVAILABILITY AND IMPLEMENTATION: https://github.com/Bhattacharya-Lab/QDeep.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
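The combination step in this abstract, where per-residue classifiers at several error thresholds are merged into one quality estimate, might look like the sketch below. The threshold values, the simple averaging rule, the probabilities, and the name `global_quality` are all assumptions for illustration; the abstract says only that the predictions are combined, and the ResNet classifiers themselves are not modeled.

```python
# Assumed error thresholds in angstroms (illustrative, not taken from the abstract).
THRESHOLDS = (1.0, 2.0, 4.0, 8.0)


def global_quality(residue_probs):
    """Combine per-residue, per-threshold probabilities into one score.

    residue_probs holds one tuple per residue; entry k is the predicted
    probability that the residue's error is below THRESHOLDS[k].
    The score is the mean over thresholds, then the mean over residues.
    """
    per_residue = [sum(p) / len(p) for p in residue_probs]
    return sum(per_residue) / len(per_residue)


# Invented classifier outputs for a three-residue model.
probs = [
    (0.5, 0.75, 1.0, 1.0),
    (0.0, 0.25, 0.5, 0.75),
    (0.25, 0.5, 0.75, 1.0),
]
score = global_quality(probs)
print(f"estimated global quality: {score:.3f}")
```

Averaging over several thresholds gives partial credit to residues that are roughly but not precisely placed, in the same spirit as multi-cutoff accuracy measures used in structure assessment.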
Affiliation(s)
- Md Hossain Shuvo
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL 36849, USA
| | - Sutanu Bhattacharya
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL 36849, USA
| | - Debswapna Bhattacharya
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL 36849, USA
- Department of Biological Sciences, Auburn University, Auburn, AL 36849, USA
| |
|
38
|
Hiranuma N, Park H, Baek M, Anishchenko I, Dauparas J, Baker D. Improved protein structure refinement guided by deep learning based accuracy estimation. Nat Commun 2021; 12:1340. [PMID: 33637700 PMCID: PMC7910447 DOI: 10.1038/s41467-021-21511-x] [Citation(s) in RCA: 135] [Impact Index Per Article: 33.8] [Received: 08/11/2020] [Accepted: 01/18/2021] [Indexed: 11/22/2022] Open
Abstract
We develop a deep learning framework (DeepAccNet) that estimates per-residue accuracy and residue-residue distance signed error in protein models and uses these predictions to guide Rosetta protein structure refinement. The network uses 3D convolutions to evaluate local atomic environments followed by 2D convolutions to provide their global contexts and outperforms other methods that similarly predict the accuracy of protein structure models. Overall accuracy predictions for X-ray and cryoEM structures in the PDB correlate with their resolution, and the network should be broadly useful for assessing the accuracy of both predicted structure models and experimentally determined structures and identifying specific regions likely to be in error. Incorporation of the accuracy predictions at multiple stages in the Rosetta refinement protocol considerably increased the accuracy of the resulting protein structure models, illustrating how deep learning can improve search for global energy minima of biomolecules.
Affiliation(s)
- Naozumi Hiranuma
- Department of Biochemistry and Institute for Protein Design, University of Washington, Washington, WA, USA
- Paul G. Allen School of Computer Science & Engineering, University of Washington, Washington, WA, USA
| | - Hahnbeom Park
- Department of Biochemistry and Institute for Protein Design, University of Washington, Washington, WA, USA
| | - Minkyung Baek
- Department of Biochemistry and Institute for Protein Design, University of Washington, Washington, WA, USA
| | - Ivan Anishchenko
- Department of Biochemistry and Institute for Protein Design, University of Washington, Washington, WA, USA
| | - Justas Dauparas
- Department of Biochemistry and Institute for Protein Design, University of Washington, Washington, WA, USA
| | - David Baker
- Department of Biochemistry and Institute for Protein Design, University of Washington, Washington, WA, USA.
- Howard Hughes Medical Institute, University of Washington, Washington, WA, USA.
| |
|
39
|
Phan IQ, Subramanian S, Kim D, Murphy M, Pettie D, Carter L, Anishchenko I, Barrett LK, Craig J, Tillery L, Shek R, Harrington WE, Koelle DM, Wald A, Veesler D, King N, Boonyaratanakornkit J, Isoherranen N, Greninger AL, Jerome KR, Chu H, Staker B, Stewart L, Myler PJ, Van Voorhis WC. In silico detection of SARS-CoV-2 specific B-cell epitopes and validation in ELISA for serological diagnosis of COVID-19. Sci Rep 2021; 11:4290. [PMID: 33619344 PMCID: PMC7900118 DOI: 10.1038/s41598-021-83730-y] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 10/24/2020] [Accepted: 02/03/2021] [Indexed: 02/07/2023] Open
Abstract
Rapid generation of diagnostics is paramount to understand epidemiology and to control the spread of emerging infectious diseases such as COVID-19. Computational methods to predict serodiagnostic epitopes that are specific for the pathogen could help accelerate the development of new diagnostics. A systematic survey of 27 SARS-CoV-2 proteins was conducted to assess whether existing B-cell epitope prediction methods, combined with comprehensive mining of sequence databases and structural data, could predict whether a particular protein would be suitable for serodiagnosis. Nine of the predictions were validated with recombinant SARS-CoV-2 proteins in the ELISA format using plasma and sera from patients with SARS-CoV-2 infection, and a further 11 predictions were compared to the recent literature. Results appeared to be in agreement with 12 of the predictions, in disagreement with 3, while a further 5 were deemed inconclusive. We showed that two of our top five candidates, the N-terminal fragment of the nucleoprotein and the receptor-binding domain of the spike protein, have the highest sensitivity and specificity and signal-to-noise ratio for detecting COVID-19 sera/plasma by ELISA. Mixing the two antigens together for coating ELISA plates led to a sensitivity of 94% (N = 80 samples from persons with RT-PCR confirmed SARS-CoV-2 infection), and a specificity of 97.2% (N = 106 control samples).
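The figures in this abstract can be checked with a short worked calculation. The prediction tallies (9, 11, 12, 3, 5) and the sample sizes (80, 106) come from the abstract; the underlying hit counts of 75 and 103 are not stated there and are back-calculated assumptions, chosen because they are whole-sample counts that round to the reported 94% and 97.2%.

```python
def rate(hits, total):
    """Fraction of samples classified correctly within one group."""
    return hits / total


# Tallies stated in the abstract: predictions assessed vs. their outcomes.
validated_by_elisa, compared_to_literature = 9, 11
agree, disagree, inconclusive = 12, 3, 5
total_assessed = validated_by_elisa + compared_to_literature
total_outcomes = agree + disagree + inconclusive

# Sample sizes from the abstract; hit counts are ASSUMED (back-calculated).
n_infected, n_controls = 80, 106
assumed_true_positives = 75    # not stated in the abstract
assumed_true_negatives = 103   # not stated in the abstract

sensitivity = rate(assumed_true_positives, n_infected)
specificity = rate(assumed_true_negatives, n_controls)
print(f"sensitivity = {100 * sensitivity:.0f}%")
print(f"specificity = {100 * specificity:.1f}%")
```

The tallies are internally consistent: the 20 assessed predictions match the 20 reported outcomes.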
Affiliation(s)
- Isabelle Q Phan
- Seattle Structural Genomics Center for Infectious Disease (SSGCID), Seattle, WA, USA
- Center for Global Infectious Disease Research, Seattle Children's Research Institute, Seattle, WA, USA
| | - Sandhya Subramanian
- Seattle Structural Genomics Center for Infectious Disease (SSGCID), Seattle, WA, USA
- Center for Global Infectious Disease Research, Seattle Children's Research Institute, Seattle, WA, USA
| | - David Kim
- Seattle Structural Genomics Center for Infectious Disease (SSGCID), Seattle, WA, USA
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design (IPD), University of Washington, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Michael Murphy
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design (IPD), University of Washington, Seattle, WA, USA
| | - Deleah Pettie
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design (IPD), University of Washington, Seattle, WA, USA
| | - Lauren Carter
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design (IPD), University of Washington, Seattle, WA, USA
| | - Ivan Anishchenko
- Seattle Structural Genomics Center for Infectious Disease (SSGCID), Seattle, WA, USA
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design (IPD), University of Washington, Seattle, WA, USA
| | - Lynn K Barrett
- Seattle Structural Genomics Center for Infectious Disease (SSGCID), Seattle, WA, USA
- Division of Allergy and Infectious Diseases, Department of Medicine, Center for Emerging and Re-Emerging Infectious Diseases (CERID), University of Washington, Seattle, WA, USA
| | - Justin Craig
- Seattle Structural Genomics Center for Infectious Disease (SSGCID), Seattle, WA, USA
- Division of Allergy and Infectious Diseases, Department of Medicine, Center for Emerging and Re-Emerging Infectious Diseases (CERID), University of Washington, Seattle, WA, USA
| | - Logan Tillery
- Seattle Structural Genomics Center for Infectious Disease (SSGCID), Seattle, WA, USA
- Division of Allergy and Infectious Diseases, Department of Medicine, Center for Emerging and Re-Emerging Infectious Diseases (CERID), University of Washington, Seattle, WA, USA
| | - Roger Shek
- Seattle Structural Genomics Center for Infectious Disease (SSGCID), Seattle, WA, USA
- Division of Allergy and Infectious Diseases, Department of Medicine, Center for Emerging and Re-Emerging Infectious Diseases (CERID), University of Washington, Seattle, WA, USA
| | - Whitney E Harrington
- Center for Global Infectious Disease Research, Seattle Children's Research Institute, Seattle, WA, USA
- Department of Pediatrics, University of Washington, Seattle, WA, USA
| | - David M Koelle
- Division of Allergy and Infectious Diseases, Department of Medicine, Center for Emerging and Re-Emerging Infectious Diseases (CERID), University of Washington, Seattle, WA, USA
- Vaccine and Infectious Diseases Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, USA
- Benaroya Research Institute, Seattle, WA, USA
- Department of Global Health, University of Washington, Seattle, WA, USA
| | - Anna Wald
- Division of Allergy and Infectious Diseases, Department of Medicine, University of Washington, Seattle, WA, USA
- Vaccine and Infectious Diseases Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, USA
- Department of Epidemiology, University of Washington, Seattle, WA, USA
| | - David Veesler
- Department of Biochemistry, University of Washington, Seattle, WA, USA
| | - Neil King
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design (IPD), University of Washington, Seattle, WA, USA
| | - Jim Boonyaratanakornkit
- Division of Allergy and Infectious Diseases, Department of Medicine, University of Washington, Seattle, WA, USA
- Vaccine and Infectious Diseases Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
| | - Nina Isoherranen
- Department of Pharmaceutics, University of Washington, Seattle, WA, USA
| | - Alexander L Greninger
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, USA
| | - Keith R Jerome
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, USA
| | - Helen Chu
- Division of Allergy and Infectious Diseases, Department of Medicine, Center for Emerging and Re-Emerging Infectious Diseases (CERID), University of Washington, Seattle, WA, USA
| | - Bart Staker
- Seattle Structural Genomics Center for Infectious Disease (SSGCID), Seattle, WA, USA
- Center for Global Infectious Disease Research, Seattle Children's Research Institute, Seattle, WA, USA
| | - Lance Stewart
- Seattle Structural Genomics Center for Infectious Disease (SSGCID), Seattle, WA, USA
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design (IPD), University of Washington, Seattle, WA, USA
| | - Peter J Myler
- Seattle Structural Genomics Center for Infectious Disease (SSGCID), Seattle, WA, USA
- Center for Global Infectious Disease Research, Seattle Children's Research Institute, Seattle, WA, USA
- Department of Medical Education and Biomedical Informatics & Department of Global Health, University of Washington, Seattle, WA, USA
| | - Wesley C Van Voorhis
- Seattle Structural Genomics Center for Infectious Disease (SSGCID), Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- Department of Microbiology, University of Washington, Seattle, WA, USA
- Department of Global Health, University of Washington, Seattle, WA, USA
|
40
|
Aguirre-Plans J, Meseguer A, Molina-Fernandez R, Marín-López MA, Jumde G, Casanova K, Bonet J, Fornes O, Fernandez-Fuentes N, Oliva B. SPServer: split-statistical potentials for the analysis of protein structures and protein-protein interactions. BMC Bioinformatics 2021; 22:4. [PMID: 33407073 PMCID: PMC7788957 DOI: 10.1186/s12859-020-03770-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 04/30/2020] [Accepted: 09/20/2020] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND Statistical potentials, also named knowledge-based potentials, are scoring functions derived from empirical data that can be used to evaluate the quality of protein folds and protein-protein interaction (PPI) structures. In previous work, we decomposed the statistical potentials into different terms, named Split-Statistical Potentials, accounting for the type of amino acid pairs, their hydrophobicity, solvent accessibility and type of secondary structure. These potentials have been successfully used to identify near-native structures in protein structure prediction, rank protein docking poses, and predict PPI binding affinities. RESULTS Here, we present the SPServer, a web server that applies the Split-Statistical Potentials to analyze protein folds and protein interfaces. SPServer provides global scores as well as residue/residue-pair profiles presented as score plots and maps. This level of detail allows users to: (1) identify potentially problematic regions on protein structures; (2) identify disrupting amino acid pairs in protein interfaces; and (3) compare and analyze the quality of tertiary and quaternary structural models. CONCLUSIONS While there are many web servers that provide scoring functions to assess the quality of either protein folds or PPI structures, SPServer integrates both aspects in a unique, easy-to-use web server. Moreover, the server permits local assessment of the quality of structures and interfaces at the residue level and provides tools to compare these local assessments between structures. SERVER ADDRESS: https://sbi.upf.edu/spserver/.
Grants
- BIO2017-85329-R (FEDER,UE) Ministerio de Economía, Industria y Competitividad, Gobierno de España
- BIO2017-83591-R (FEDER,UE) Ministerio de Economía, Industria y Competitividad, Gobierno de España
- RYC-2015-17519 Ministerio de Economía, Industria y Competitividad, Gobierno de España
- MDM-2014-0370 Ministerio de Economía, Industria y Competitividad, Gobierno de España
- FI Agència de Gestió d'Ajuts Universitaris i de Recerca
- 2017 SGR 01020 Agència de Gestió d'Ajuts Universitaris i de Recerca
- PT13/0001/0023 Instituto de Salud Carlos III
- Agència de Gestió d’Ajuts Universitaris i de Recerca
Affiliation(s)
- Joaquim Aguirre-Plans
- Structural Bioinformatics Lab, Department of Experimental and Health Science, Universitat Pompeu Fabra, 08003, Barcelona, Catalonia, Spain
| | - Alberto Meseguer
- Structural Bioinformatics Lab, Department of Experimental and Health Science, Universitat Pompeu Fabra, 08003, Barcelona, Catalonia, Spain
| | - Ruben Molina-Fernandez
- Structural Bioinformatics Lab, Department of Experimental and Health Science, Universitat Pompeu Fabra, 08003, Barcelona, Catalonia, Spain
| | - Manuel Alejandro Marín-López
- Structural Bioinformatics Lab, Department of Experimental and Health Science, Universitat Pompeu Fabra, 08003, Barcelona, Catalonia, Spain
| | - Gaurav Jumde
- Structural Bioinformatics Lab, Department of Experimental and Health Science, Universitat Pompeu Fabra, 08003, Barcelona, Catalonia, Spain
| | - Kevin Casanova
- Structural Bioinformatics Lab, Department of Experimental and Health Science, Universitat Pompeu Fabra, 08003, Barcelona, Catalonia, Spain
| | - Jaume Bonet
- Laboratory of Protein Design and Immuno-Engineering, School of Engineering, Ecole Polytechnique Federale de Lausanne, 1015, Lausanne, Vaud, Switzerland
| | - Oriol Fornes
- Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, BC Children's Hospital Research Institute, University of British Columbia, Vancouver, BC, V5Z 4H4, Canada
| | - Narcis Fernandez-Fuentes
- Department of Biosciences, U Science Tech, Universitat de Vic-Universitat Central de Catalunya, Vic 08500, Barcelona, Catalonia, Spain
- Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth, SY23 3EB, UK
| | - Baldo Oliva
- Structural Bioinformatics Lab, Department of Experimental and Health Science, Universitat Pompeu Fabra, 08003, Barcelona, Catalonia, Spain
|
41
|
Studer G, Tauriello G, Bienert S, Biasini M, Johner N, Schwede T. ProMod3-A versatile homology modelling toolbox. PLoS Comput Biol 2021; 17:e1008667. [PMID: 33507980 PMCID: PMC7872268 DOI: 10.1371/journal.pcbi.1008667] [Citation(s) in RCA: 177] [Impact Index Per Article: 44.3] [Received: 07/16/2020] [Revised: 02/09/2021] [Accepted: 01/03/2021] [Indexed: 11/18/2022] Open
Abstract
Computational methods for protein structure modelling are routinely used to complement experimental structure determination, thus they help to address a broad spectrum of scientific questions in biomedical research. The most accurate methods today are based on homology modelling, i.e. detecting a homologue to the desired target sequence that can be used as a template for modelling. Here we present a versatile open source homology modelling toolbox as foundation for flexible and computationally efficient modelling workflows. ProMod3 is a fully scriptable software platform that can perform all steps required to generate a protein model by homology. Its modular design aims at fast prototyping of novel algorithms and implementing flexible modelling pipelines. Common modelling tasks, such as loop modelling, sidechain modelling or generating a full protein model by homology, are provided as production ready pipelines, forming the starting point for own developments and enhancements. ProMod3 is the central software component of the widely used SWISS-MODEL web-server.
Affiliation(s)
- Gabriel Studer
- Biozentrum, University of Basel, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
| | - Gerardo Tauriello
- Biozentrum, University of Basel, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
| | - Stefan Bienert
- Biozentrum, University of Basel, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
| | - Marco Biasini
- Biozentrum, University of Basel, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
| | - Niklaus Johner
- Biozentrum, University of Basel, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
| | - Torsten Schwede
- Biozentrum, University of Basel, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
|
42
|
Jing X, Xu J. Improved Protein Model Quality Assessment By Integrating Sequential And Pairwise Features Using Deep Learning. Bioinformatics 2020; 36:5361-5367. [PMID: 33325480 PMCID: PMC8016469 DOI: 10.1093/bioinformatics/btaa1037] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 08/21/2020] [Revised: 11/27/2020] [Accepted: 12/06/2020] [Indexed: 12/23/2022] Open
Abstract
MOTIVATION Accurately estimating protein model quality in the absence of experimental structure is not only important for model evaluation and selection, but also useful for model refinement. Progress has been steadily made by introducing new features and algorithms (especially deep neural networks), but the accuracy of quality assessment (QA) is still not very satisfactory, especially local QA on hard protein targets. RESULTS We propose a new single-model-based QA method ResNetQA for both local and global quality assessment. Our method predicts model quality by integrating sequential and pairwise features using a deep neural network composed of both 1D and 2D convolutional residual neural networks (ResNet). The 2D ResNet module extracts useful information from pairwise features such as model-derived distance maps, co-evolution information, and predicted distance potential from sequences. The 1D ResNet is used to predict local (global) model quality from sequential features and pooled pairwise information generated by 2D ResNet. Tested on the CASP12 and CASP13 datasets, our experimental results show that our method greatly outperforms existing state-of-the-art methods. Our ablation studies indicate that the 2D ResNet module and pairwise features play an important role in improving model quality assessment. AVAILABILITY https://github.com/AndersJing/ResNetQA. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Xiaoyang Jing
- Toyota Technological Institute at Chicago, Chicago, IL, 60637, USA
| | - Jinbo Xu
- Toyota Technological Institute at Chicago, Chicago, IL, 60637, USA
|
43
|
Ugarte-Alvarez O, Muñoz-López P, Moreno-Vargas LM, Prada-Gracia D, Mateos-Chávez AA, Becerra-Báez EI, Luria-Pérez R. Cell-Permeable Bak BH3 Peptide Induces Chemosensitization of Hematologic Malignant Cells. J Oncol 2020; 2020:2679046. [PMID: 33312200 PMCID: PMC7721494 DOI: 10.1155/2020/2679046] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 03/30/2020] [Revised: 07/04/2020] [Accepted: 07/13/2020] [Indexed: 12/24/2022]
Abstract
Hematologic malignancies such as leukemias and lymphomas are among the leading causes of pediatric cancer death worldwide, and although survival rates have improved with conventional treatments, the development of drug-resistant cancer cells may lead to patient relapse and limited possibilities of a cure. Drug resistance in these hematologic neoplasms is induced by overexpression of antiapoptotic proteins of the B-cell lymphoma 2 (Bcl-2) family, such as Bcl-XL, Bcl-2, and Mcl-1. We have previously shown that peptides from the BH3 domain of the proapoptotic Bax protein, which also belongs to the Bcl-2 family, may antagonize the antiapoptotic activity of the Bcl-2 family proteins, restore apoptosis, and induce chemosensitization of tumor cells. Furthermore, cell-permeable Bax BH3 peptides also elicit antitumor activity and extend survival in a murine xenograft model of human B non-Hodgkin's lymphoma. However, the activity of the BH3 peptides of the proapoptotic Bak protein of the Bcl-2 family against these hematologic malignant cells requires further characterization. In this study, we report the ability of the cell-permeable Bak BH3 peptide to restore apoptosis and induce chemosensitization of acute lymphoblastic leukemia and non-Hodgkin's lymphoma cell lines. This effect is enhanced by coadministration of the cell-permeable Bax BH3 peptide and represents an attractive approach to improving outcomes for patients with relapsed or refractory hematologic malignancies.
Affiliation(s)
- Omar Ugarte-Alvarez
- Unit of Investigative Research on Oncological Diseases, Children's Hospital of Mexico Federico Gomez, Mexico City 06720, Mexico
| | - Paola Muñoz-López
- Unit of Investigative Research on Oncological Diseases, Children's Hospital of Mexico Federico Gomez, Mexico City 06720, Mexico
- Posgrado en Biomedicina y Biotecnología Molecular, Escuela Nacional de Ciencias Biológicas, Instituto Politécnico Nacional, Mexico City 11340, Mexico
| | - Liliana Marisol Moreno-Vargas
- Research Unit on Computational Biology and Drug Design, Children's Hospital of Mexico Federico Gomez, Mexico City 06720, Mexico
| | - Diego Prada-Gracia
- Research Unit on Computational Biology and Drug Design, Children's Hospital of Mexico Federico Gomez, Mexico City 06720, Mexico
| | - Armando Alfredo Mateos-Chávez
- Unit of Investigative Research on Oncological Diseases, Children's Hospital of Mexico Federico Gomez, Mexico City 06720, Mexico
| | - Elayne Irene Becerra-Báez
- Unit of Investigative Research on Oncological Diseases, Children's Hospital of Mexico Federico Gomez, Mexico City 06720, Mexico
- Posgrado en Biomedicina y Biotecnología Molecular, Escuela Nacional de Ciencias Biológicas, Instituto Politécnico Nacional, Mexico City 11340, Mexico
| | - Rosendo Luria-Pérez
- Unit of Investigative Research on Oncological Diseases, Children's Hospital of Mexico Federico Gomez, Mexico City 06720, Mexico
|
44
|
Studer G, Rempfer C, Waterhouse AM, Gumienny R, Haas J, Schwede T. QMEANDisCo-distance constraints applied on model quality estimation. Bioinformatics 2020; 36:1765-1771. [PMID: 31697312 PMCID: PMC7075525 DOI: 10.1093/bioinformatics/btz828] [Citation(s) in RCA: 527] [Impact Index Per Article: 105.4] [Received: 06/19/2019] [Revised: 10/24/2019] [Accepted: 11/06/2019] [Indexed: 01/13/2023] Open
Abstract
Motivation Methods that estimate the quality of a 3D protein structure model in absence of an experimental reference structure are crucial to determine a model’s utility and potential applications. Single model methods assess individual models whereas consensus methods require an ensemble of models as input. In this work, we extend the single model composite score QMEAN that employs statistical potentials of mean force and agreement terms by introducing a consensus-based distance constraint (DisCo) score. Results DisCo exploits distance distributions from experimentally determined protein structures that are homologous to the model being assessed. Feed-forward neural networks are trained to adaptively weigh contributions by the multi-template DisCo score and classical single model QMEAN parameters. The result is the composite score QMEANDisCo, which combines the accuracy of consensus methods with the broad applicability of single model approaches. We also demonstrate that, despite being the de-facto standard for structure prediction benchmarking, CASP models are not the ideal data source to train predictive methods for model quality estimation. For performance assessment, QMEANDisCo is continuously benchmarked within the CAMEO project and participated in CASP13. For both, it ranks among the top performers and excels with low response times. Availability and implementation QMEANDisCo is available as web-server at https://swissmodel.expasy.org/qmean. The source code can be downloaded from https://git.scicore.unibas.ch/schwede/QMEAN. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Gabriel Studer
- Biozentrum, University of Basel, Basel 4056, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
| | - Christine Rempfer
- Biozentrum, University of Basel, Basel 4056, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
| | - Andrew M Waterhouse
- Biozentrum, University of Basel, Basel 4056, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
| | - Rafal Gumienny
- Biozentrum, University of Basel, Basel 4056, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
| | - Juergen Haas
- Biozentrum, University of Basel, Basel 4056, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
| | - Torsten Schwede
- Biozentrum, University of Basel, Basel 4056, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
|
45
|
Postic G, Janel N, Tufféry P, Moroy G. An information gain-based approach for evaluating protein structure models. Comput Struct Biotechnol J 2020; 18:2228-2236. [PMID: 32837711 PMCID: PMC7431362 DOI: 10.1016/j.csbj.2020.08.013] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 03/01/2020] [Revised: 08/06/2020] [Accepted: 08/07/2020] [Indexed: 12/23/2022] Open
Abstract
For three decades now, knowledge-based scoring functions that operate through the "potential of mean force" (PMF) approach have continuously proven useful for studying protein structures. Although these statistical potentials are not to be confused with their physics-based counterparts of the same name (i.e. PMFs obtained by molecular dynamics simulations), their particular success in assessing the native-like character of protein structure predictions has led authors to consider the computed scores as approximations of the free energy. However, this physical justification has been a matter of controversy since the beginning. Alternative interpretations based on Bayes' theorem have been proposed, but the misleading formalism that invokes the inverse Boltzmann law remains recurrent in the literature. In this article, we present a conceptually new method for ranking protein structure models by quality, which is (i) independent of any physics-based explanation and (ii) relevant to statistics and to a general definition of information gain. The theoretical development described in this study provides new insights into how statistical PMFs work, in comparison with our approach. To prove the concept, we have built interatomic distance-dependent scoring functions, based on the former and new equations, and compared their performance on an independent benchmark of 60,000 protein structures. The results demonstrate that our new formalism outperforms statistical PMFs in evaluating the quality of protein structural decoys. Therefore, this original type of score offers a possibility to improve the success of statistical PMFs in the various fields of structural biology where they are applied. The open-source code is available for download at https://gitlab.rpbs.univ-paris-diderot.fr/src/ig-score.
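For readers unfamiliar with the formalism this abstract argues against, the following is a minimal sketch of a conventional distance-dependent statistical PMF based on the inverse Boltzmann relation, score(r) = -kT ln(p_obs(r) / p_ref(r)). It is not the authors' information-gain score, whose equation is not given in the abstract. The bin counts, the pseudocount and the function name are hypothetical choices made for illustration.

```python
import math

KT = 0.593  # kcal/mol, roughly k_B * T at 298 K


def inverse_boltzmann_scores(obs_counts, ref_counts, pseudocount=1.0, kt=KT):
    """Per-bin score -kT * ln(p_obs / p_ref) for one atom-pair type.

    obs_counts: distance-bin counts observed in native structures.
    ref_counts: counts for the same bins in a reference state.
    A pseudocount keeps empty bins from producing log(0).
    """
    if len(obs_counts) != len(ref_counts):
        raise ValueError("count lists must have the same number of bins")
    n_bins = len(obs_counts)
    obs_total = sum(obs_counts) + pseudocount * n_bins
    ref_total = sum(ref_counts) + pseudocount * n_bins
    scores = []
    for obs, ref in zip(obs_counts, ref_counts):
        p_obs = (obs + pseudocount) / obs_total
        p_ref = (ref + pseudocount) / ref_total
        scores.append(-kt * math.log(p_obs / p_ref))
    return scores


# Hypothetical counts for four distance bins of a single atom-pair type.
observed = [5, 40, 30, 25]
reference = [25, 25, 25, 25]
scores = inverse_boltzmann_scores(observed, reference)

for index, score in enumerate(scores):
    print(f"bin {index}: {score:+.3f} kcal/mol")
```

Bins that are over-represented relative to the reference receive negative (favourable) scores, and under-represented bins receive positive ones. A model's total score would then be a sum of such terms over all atom pairs.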
Affiliation(s)
- Guillaume Postic
- Université de Paris, BFA, UMR 8251, CNRS, ERL U1133, Inserm, F-75013 Paris, France
- Université de Paris, BFA, UMR 8251, CNRS, F-75013 Paris, France
- Institut Français de Bioinformatique (IFB), UMS 3601-CNRS, Université Paris-Saclay, Orsay, France
- Ressource Parisienne en Bioinformatique Structurale (RPBS), Paris, France
| | - Nathalie Janel
- Université de Paris, BFA, UMR 8251, CNRS, F-75013 Paris, France
| | - Pierre Tufféry
- Université de Paris, BFA, UMR 8251, CNRS, ERL U1133, Inserm, F-75013 Paris, France
- Ressource Parisienne en Bioinformatique Structurale (RPBS), Paris, France
| | - Gautier Moroy
- Université de Paris, BFA, UMR 8251, CNRS, ERL U1133, Inserm, F-75013 Paris, France
|
46
|
Abstract
Atom pairwise potential functions make up an essential part of many scoring functions for protein decoy detection. With the development of machine learning (ML) tools, there are multiple ways to combine potential functions to create novel ML models and methods. Potential function parameters can be easily extracted; however, it is usually hard to directly obtain the calculated atom pairwise energies from scoring functions. Amber, as one of the most popular suites of modeling programs, has an extensive history and library of force field potential functions. In this work, we directly used the force field parameters in ff94 and ff14SB from Amber and encoded them to calculate atom pairwise energies for different interactions. Two sets of structures (a single amino acid set and a dipeptide set) were used to evaluate the performance of our encoded Amber potentials. From the comparison between energy terms obtained from our encoding and from Amber, we find energy differences within ±0.06 kcal/mol for all tested structures. Previously we have shown that the Random Forest (RF) model can help to emphasize more important atom pairwise interactions and ignore insignificant ones [Pei, J.; Zheng, Z.; Merz, K. M. J. Chem. Inf. Model. 2019, 59, 1919-1929]. Here, as an example of combining ML methods with traditional potential functions, we followed the same workflow to combine the RF models with force field potential functions from Amber. To determine the performance of our RF models with force field potential functions, 224 different protein native-decoy systems were used as our training and testing sets. We find that the RF models with ff94 and ff14SB force field parameters outperformed all other scoring functions (RF models with KECSA2, RWplus, DFIRE, dDFIRE, and GOAP) considered in this work for native structure detection, and they performed similarly in detecting the best decoy. Through inclusion of best decoy to decoy comparisons in building our RF models, we were able to generate models that outperformed the score functions tested herein both on accuracy and best decoy detection, again showing the performance and flexibility of our RF models to tackle this problem. Finally, the importance of the RF algorithm and force field parameters was also tested, and the comparison results suggest that both the RF algorithm and force field potentials are important, with the ML scoring function achieving its best performance only by combining them together. All code and data used in this work are available at https://github.com/JunPei000/FFENCODER_for_Protein_Folding_Pose_Selection.
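As a rough illustration of what an atom pairwise nonbonded energy looks like in an Amber-style force field, the sketch below evaluates a 12-6 Lennard-Jones term with Lorentz-Berthelot combining rules plus a Coulomb term. This is the standard functional form, not the authors' FFENCODER code. The function names and example parameter values are placeholders chosen for illustration, and 1-4 scaling, exclusions and cutoffs are deliberately left out.

```python
import math

# Electrostatic conversion factor in kcal*Angstrom/(mol*e^2), i.e. 18.2223**2.
COULOMB_CONSTANT = 332.0522


def lj_energy(r, rmin_half_i, eps_i, rmin_half_j, eps_j):
    """12-6 Lennard-Jones energy in kcal/mol at separation r (Angstrom).

    Combining rules: Rmin_ij = Rmin_i/2 + Rmin_j/2, eps_ij = sqrt(eps_i * eps_j).
    """
    rmin = rmin_half_i + rmin_half_j
    eps = math.sqrt(eps_i * eps_j)
    ratio6 = (rmin / r) ** 6
    return eps * (ratio6 * ratio6 - 2.0 * ratio6)


def coulomb_energy(r, q_i, q_j, dielectric=1.0):
    """Coulomb energy in kcal/mol for partial charges given in units of e."""
    return COULOMB_CONSTANT * q_i * q_j / (dielectric * r)


def pair_energy(r, atom_i, atom_j):
    """Sum of van der Waals and electrostatic terms for one atom pair."""
    vdw = lj_energy(r, atom_i["rmin_half"], atom_i["eps"],
                    atom_j["rmin_half"], atom_j["eps"])
    elec = coulomb_energy(r, atom_i["charge"], atom_j["charge"])
    return vdw + elec


# Placeholder parameters for two atoms (illustrative, not taken from the paper).
atom_a = {"rmin_half": 1.908, "eps": 0.1094, "charge": 0.10}
atom_b = {"rmin_half": 1.661, "eps": 0.2100, "charge": -0.50}

print(f"pair energy at 4.0 A: {pair_energy(4.0, atom_a, atom_b):.4f} kcal/mol")
```

Energies of this kind, computed per atom-type pair, are the sort of quantity that could then be supplied as features to a classifier such as a random forest.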
Affiliation(s)
- Jun Pei
- Department of Chemistry and the Department of Biochemistry and Molecular Biology, Michigan State University, 578 South Shaw Lane, East Lansing, Michigan 48824, United States
| | - Lin Frank Song
- Department of Chemistry and the Department of Biochemistry and Molecular Biology, Michigan State University, 578 South Shaw Lane, East Lansing, Michigan 48824, United States
| | - Kenneth M Merz
- Department of Chemistry and the Department of Biochemistry and Molecular Biology, Michigan State University, 578 South Shaw Lane, East Lansing, Michigan 48824, United States
|
47
|
Liu T, Wang Z. MASS: predict the global qualities of individual protein models using random forests and novel statistical potentials. BMC Bioinformatics 2020; 21:246. [PMID: 32631256 PMCID: PMC7336608 DOI: 10.1186/s12859-020-3383-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 01/21/2020] [Accepted: 01/22/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Protein model quality assessment (QA) is an essential procedure in protein structure prediction. QA methods can predict the qualities of protein models and identify good models from decoys. Clustering-based methods need a certain number of models as input. However, if a pool of models is not available, methods that only need a single model as input are indispensable. RESULTS We developed MASS, a QA method to predict the global qualities of individual protein models using random forests and various novel energy functions. We designed six novel energy functions or statistical potentials that can capture the structural characteristics of a protein model, which can also be used in other protein-related bioinformatics research. MASS potentials demonstrated higher importance than the energy functions of RWplus, GOAP, DFIRE and Rosetta when the scores they generated are used as machine learning features. MASS outperforms almost all of the four CASP11 top-performing single-model methods for global quality assessment in terms of all of the four evaluation criteria officially used by CASP, which measure the abilities to assign relative and absolute scores, identify the best model from decoys, and distinguish between good and bad models. MASS has also achieved comparable performances with the leading QA methods in CASP12 and CASP13. CONCLUSIONS MASS and the source code for all MASS potentials are publicly available at http://dna.cs.miami.edu/MASS/.
Affiliation(s)
- Tong Liu
- Department of Computer Science, University of Miami, 1365 Memorial Drive, P.O. Box 248154, Coral Gables, FL, 33124, USA
| | - Zheng Wang
- Department of Computer Science, University of Miami, 1365 Memorial Drive, P.O. Box 248154, Coral Gables, FL, 33124, USA
|
48
|
Miranda MRA, Uchôa AF, Ferreira SR, Ventury KE, Costa EP, Carmo PRL, Machado OLT, Fernandes KVS, Amancio Oliveira AE. Chemical Modifications of Vicilins Interfere with Chitin-Binding Affinity and Toxicity to Callosobruchus maculatus (Coleoptera: Chrysomelidae) Insect: A Combined In Vitro and In Silico Analysis. J Agric Food Chem 2020; 68:5596-5605. [PMID: 32343573 DOI: 10.1021/acs.jafc.9b08034] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 06/11/2023]
Abstract
Vicilins are related to cowpea seed resistance toward Callosobruchus maculatus due to their ability to bind to chitinous structures lining larval midgut. However, this binding mechanism is not fully understood. Here, we identified chitin binding sites and investigated how in vitro and in silico chemical modifications interfere with vicilin chitin binding and insect toxicity. In vitro assays showed that unmodified vicilin strongly binds to chitin matrices, mainly with acetylated chitin. Chemical modifications of specific amino acids (tryptophan, lysine, tyrosine), as well as glutaraldehyde cross-linking, decreased the evaluated parameters. In silico analyses identified at least one chitin binding site in vicilin monomer, the region between Arg208 and Lys216, which bears the sequence REGIRELMK and forms an α helix, exposed in the 3D structure. In silico modifications of Lys223 (acetylated at its terminal nitrogen) and Trp316 (iodinated to 7-iodine-L-tryptophan or oxidized to β-oxy-indolylalanine) decreased vicilin chitin binding affinity. Glucose, sucrose, and N-acetylglucosamine also interfered with vicilin chitin binding affinity.
Affiliation(s)
- Maria Raquel A Miranda
  - Departamento de Bioquímica, Centro de Ciências, Universidade Federal do Ceará (UFC), Fortaleza, Ceará 60440554, Brazil
- Adriana F Uchôa
  - Departamento de Biologia Celular e Genética, Centro de Biociências, Universidade Federal do Rio Grande do Norte, Natal, Rio Grande do Norte 59072970, Brazil
- Sarah R Ferreira
  - Centro de Biociências e Biotecnologia, Universidade Estadual do Norte Fluminense Darcy Ribeiro (UENF), Campos dos Goytacazes, Rio de Janeiro 28013-602, Brazil
- Kayan E Ventury
  - Centro de Biociências e Biotecnologia, Universidade Estadual do Norte Fluminense Darcy Ribeiro (UENF), Campos dos Goytacazes, Rio de Janeiro 28013-602, Brazil
- Evenilton P Costa
  - Centro de Biociências e Biotecnologia, Universidade Estadual do Norte Fluminense Darcy Ribeiro (UENF), Campos dos Goytacazes, Rio de Janeiro 28013-602, Brazil
- Paulo R Leitão Carmo
  - NUPEN, Universidade Federal do Rio de Janeiro (UFRJ) Macaé, Rio de Janeiro 27965-045, Brazil
- Olga L T Machado
  - Centro de Biociências e Biotecnologia, Universidade Estadual do Norte Fluminense Darcy Ribeiro (UENF), Campos dos Goytacazes, Rio de Janeiro 28013-602, Brazil
- Katia V S Fernandes
  - Centro de Biociências e Biotecnologia, Universidade Estadual do Norte Fluminense Darcy Ribeiro (UENF), Campos dos Goytacazes, Rio de Janeiro 28013-602, Brazil
- Antonia Elenir Amancio Oliveira
  - Centro de Biociências e Biotecnologia, Universidade Estadual do Norte Fluminense Darcy Ribeiro (UENF), Campos dos Goytacazes, Rio de Janeiro 28013-602, Brazil
49
Olechnovič K, Venclovas Č. VoroMQA web server for assessing three-dimensional structures of proteins and protein complexes. Nucleic Acids Res 2019; 47:W437-W442. [PMID: 31073605 PMCID: PMC6602437 DOI: 10.1093/nar/gkz367] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 03/04/2019] [Revised: 04/19/2019] [Accepted: 05/05/2019] [Indexed: 01/12/2023] Open
Abstract
The VoroMQA (Voronoi tessellation-based Model Quality Assessment) web server is dedicated to the estimation of protein structure quality, a common step in selecting realistic and most accurate computational models and in validating experimental structures. As an input, the VoroMQA web server accepts one or more protein structures in PDB format. Input structures may be either monomeric proteins or multimeric protein complexes. For every input structure, the server provides both global and local (per-residue) scores. Visualization of the local scores along the protein chain is enhanced by providing secondary structure assignment and information on solvent accessibility. A unique feature of the VoroMQA server is the ability to directly assess protein-protein interaction interfaces. If this type of assessment is requested, the web server provides interface quality scores, interface energy estimates, and local scores for residues involved in inter-chain interfaces. VoroMQA, the underlying method of the web server, was extensively tested in recent community-wide CASP and CAPRI experiments. During these experiments VoroMQA showed outstanding performance both in model selection and in estimation of accuracy of local structural regions. The VoroMQA web server is available at http://bioinformatics.ibt.lt/wtsam/voromqa.
Affiliation(s)
- Kliment Olechnovič
  - Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio av. 7, Vilnius LT-10257, Lithuania
- Česlovas Venclovas
  - Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio av. 7, Vilnius LT-10257, Lithuania
50
McGuffin LJ, Adiyaman R, Maghrabi AHA, Shuid AN, Brackenridge DA, Nealon JO, Philomina LS. IntFOLD: an integrated web resource for high performance protein structure and function prediction. Nucleic Acids Res 2019; 47:W408-W413. [PMID: 31045208 PMCID: PMC6602432 DOI: 10.1093/nar/gkz322] [Citation(s) in RCA: 82] [Impact Index Per Article: 16.4] [Received: 02/14/2019] [Revised: 04/05/2019] [Accepted: 04/23/2019] [Indexed: 12/14/2022] Open
Abstract
The IntFOLD server provides a unified resource for the automated prediction of: protein tertiary structures with built-in estimates of model accuracy (EMA), protein structural domain boundaries, natively unstructured or disordered regions in proteins, and protein–ligand interactions. The component methods have been independently evaluated via the successive blind CASP experiments and the continual CAMEO benchmarking project. The IntFOLD server has established its ranking as one of the best performing publicly available servers, based on independent official evaluation metrics. Here, we describe significant updates to the server back end, where we have focused on performance improvements in tertiary structure predictions, in terms of global 3D model quality and accuracy self-estimates (ASE), which we achieve using our newly improved ModFOLD7_rank algorithm. We also report on various upgrades to the front end including: a streamlined submission process, enhanced visualization of models, new confidence scores for ranking, and links for accessing all annotated model data. Furthermore, we now include an option for users to submit selected models for further refinement via convenient push buttons. The IntFOLD server is freely available at: http://www.reading.ac.uk/bioinf/IntFOLD/.
Affiliation(s)
- Liam J McGuffin
  - School of Biological Sciences, University of Reading, Whiteknights, Reading RG6 6AS, UK
- Recep Adiyaman
  - School of Biological Sciences, University of Reading, Whiteknights, Reading RG6 6AS, UK
- Ali H A Maghrabi
  - School of Biological Sciences, University of Reading, Whiteknights, Reading RG6 6AS, UK
- Ahmad N Shuid
  - School of Biological Sciences, University of Reading, Whiteknights, Reading RG6 6AS, UK; Infectomics cluster, Advanced Medical and Dental Institute, University of Science, Malaysia, Bertam, 13200, Kepala Batas, Pulau Pinang, Malaysia
- John O Nealon
  - School of Biological Sciences, University of Reading, Whiteknights, Reading RG6 6AS, UK
- Limcy S Philomina
  - School of Biological Sciences, University of Reading, Whiteknights, Reading RG6 6AS, UK