1
|
Genc AG, McGuffin LJ. Beyond AlphaFold2: The Impact of AI for the Further Improvement of Protein Structure Prediction. Methods Mol Biol 2025; 2867:121-139. [PMID: 39576578 DOI: 10.1007/978-1-0716-4196-5_7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2024]
Abstract
Protein structure prediction is fundamental to molecular biology and has numerous applications in areas such as drug discovery and protein engineering. Machine learning techniques have greatly advanced protein 3D modeling in recent years, particularly with the development of AlphaFold2 (AF2), which can analyze sequences of amino acids and predict 3D structures with near experimental accuracy. Since the release of AF2, numerous studies have been conducted, either using AF2 directly for large-scale modeling or building upon the software for other use cases. Many reviews have been published discussing the impact of AF2 in the field of protein bioinformatics, particularly in relation to neural networks, which have highlighted what AF2 can and cannot do. It is evident that AF2 and similar approaches are open to further development and several new approaches have emerged, in addition to older refinement approaches, for improving the quality of predictions. Here we provide a brief overview, aimed at the general biologist, of how machine learning techniques have been used for improvement of 3D models of proteins following AF2, and we highlight the impacts of these approaches. In the most recent experiment on the Critical Assessment of Techniques for Protein Structure Prediction (CASP15), the most successful groups all developed their own tools for protein structure modeling that were based at least in some part on AF2. This improvement involved employing techniques such as generative modeling, changing parameters such as dropout to generate more AF2 structures, and data-driven approaches including using alternative templates and MSAs.
Collapse
Affiliation(s)
| | - Liam J McGuffin
- School of Biological Sciences, University of Reading, Reading, UK.
| |
Collapse
|
2
|
Chen L, Li Q, Nasif KFA, Xie Y, Deng B, Niu S, Pouriyeh S, Dai Z, Chen J, Xie CY. AI-Driven Deep Learning Techniques in Protein Structure Prediction. Int J Mol Sci 2024; 25:8426. [PMID: 39125995 PMCID: PMC11313475 DOI: 10.3390/ijms25158426] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/15/2024] [Revised: 07/29/2024] [Accepted: 07/29/2024] [Indexed: 08/12/2024] Open
Abstract
Protein structure prediction is important for understanding their function and behavior. This review study presents a comprehensive review of the computational models used in predicting protein structure. It covers the progression from established protein modeling to state-of-the-art artificial intelligence (AI) frameworks. The paper will start with a brief introduction to protein structures, protein modeling, and AI. The section on established protein modeling will discuss homology modeling, ab initio modeling, and threading. The next section is deep learning-based models. It introduces some state-of-the-art AI models, such as AlphaFold (AlphaFold, AlphaFold2, AlphaFold3), RoseTTAFold, ProteinBERT, etc. This section also discusses how AI techniques have been integrated into established frameworks like Swiss-Model, Rosetta, and I-TASSER. The model performance is compared using the rankings of CASP14 (Critical Assessment of Structure Prediction) and CASP15. CASP16 is ongoing, and its results are not included in this review. Continuous Automated Model EvaluatiOn (CAMEO) complements the biennial CASP experiment. Template modeling score (TM-score), global distance test total score (GDT_TS), and Local Distance Difference Test (lDDT) score are discussed too. This paper then acknowledges the ongoing difficulties in predicting protein structure and emphasizes the necessity of additional searches like dynamic protein behavior, conformational changes, and protein-protein interactions. In the application section, this paper introduces some applications in various fields like drug design, industry, education, and novel protein development. In summary, this paper provides a comprehensive overview of the latest advancements in established protein modeling and deep learning-based models for protein structure predictions. It emphasizes the significant advancements achieved by AI and identifies potential areas for further investigation.
Collapse
Affiliation(s)
- Lingtao Chen
- College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA; (L.C.); (Q.L.); (K.F.A.N.); (Y.X.); (B.D.); (S.P.)
| | - Qiaomu Li
- College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA; (L.C.); (Q.L.); (K.F.A.N.); (Y.X.); (B.D.); (S.P.)
| | - Kazi Fahim Ahmad Nasif
- College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA; (L.C.); (Q.L.); (K.F.A.N.); (Y.X.); (B.D.); (S.P.)
| | - Ying Xie
- College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA; (L.C.); (Q.L.); (K.F.A.N.); (Y.X.); (B.D.); (S.P.)
| | - Bobin Deng
- College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA; (L.C.); (Q.L.); (K.F.A.N.); (Y.X.); (B.D.); (S.P.)
| | - Shuteng Niu
- Department of Computer Science, Bowling Green State University, Bowling Green, OH 43403, USA;
| | - Seyedamin Pouriyeh
- College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA; (L.C.); (Q.L.); (K.F.A.N.); (Y.X.); (B.D.); (S.P.)
| | - Zhiyu Dai
- Division of Pulmonary and Critical Care Medicine, John T. Milliken Department of Medicine, Washington University School of Medicine in St. Louis, St. Louis, MO 63110, USA;
| | - Jiawei Chen
- College of Computing, Data Science and Society, University of California, Berkeley, CA 94720, USA;
| | - Chloe Yixin Xie
- College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA; (L.C.); (Q.L.); (K.F.A.N.); (Y.X.); (B.D.); (S.P.)
| |
Collapse
|
3
|
Shuvo MH, Karim M, Bhattacharya D. iQDeep: an integrated web server for protein scoring using multiscale deep learning models. J Mol Biol 2023; 435:168057. [PMID: 37356909 PMCID: PMC10291203 DOI: 10.1016/j.jmb.2023.168057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/14/2022] [Revised: 03/01/2023] [Accepted: 03/17/2023] [Indexed: 06/27/2023]
Abstract
The remarkable recent advances in protein structure prediction have enabled computational modeling of protein structures with considerably higher accuracy than ever before. While state-of-the-art structure prediction methods provide self-assessment confidence scores of their own predictions, an independent and open-access system for protein scoring is still needed that can be applied to a broad range of predictive modeling scenarios. Here, we present iQDeep, an integrated and highly customizable web server for protein scoring, freely available at http://fusion.cs.vt.edu/iQDeep. The underlying method of iQDeep employs multiscale deep residual neural networks (ResNets) to perform residue-level error classifications, and then probabilistically combines the error classifications for protein scoring. By adjusting the error resolutions, our method can reliably estimate the standard- or high-accuracy variants of the Global Distance Test metric for versatile protein scoring. The performance of the method has been extensively tested and compared against the state-of-the-art approaches in multiple rounds of Critical Assessment of Techniques for Protein Structure Prediction (CASP) experiments including benchmark assessment in CASP12 and CASP13 as well as blind evaluation in CASP14. The iQDeep web server offers a number of convenient features, including (i) the choice of individual and batch processing modes; (ii) an interactive and privacy-preserving web interface for automated job submission, tracking, and results retrieval; (iii) web-based quantitative and visual analyses of the results including overall estimated score and its residue-wise breakdown along with agreements between various sequence- and structural-level features; (iv) extensive help information on job submission and results interpretation via web-based tutorial and help tooltips.
Collapse
Affiliation(s)
- Md Hossain Shuvo
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, United States. https://twitter.com/mzs0149
| | - Mohimenul Karim
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, United States. https://twitter.com/CaptainRafi97
| | | |
Collapse
|
4
|
Hameduh T, Mokry M, Miller AD, Heger Z, Haddad Y. Solvent Accessibility Promotes Rotamer Errors during Protein Modeling with Major Side-Chain Prediction Programs. J Chem Inf Model 2023. [PMID: 37410883 PMCID: PMC10369486 DOI: 10.1021/acs.jcim.3c00134] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 07/08/2023]
Abstract
Side-chain rotamer prediction is one of the most critical late stages in protein 3D structure building. Highly advanced and specialized algorithms (e.g., FASPR, RASP, SCWRL4, and SCWRL4v) optimize this process by use of rotamer libraries, combinatorial searches, and scoring functions. We seek to identify the sources of key rotamer errors as a basis for correcting and improving the accuracy of protein modeling going forward. In order to evaluate the aforementioned programs, we process 2496 high-quality single-chained all-atom filtered 30% homology protein 3D structures and use discretized rotamer analysis to compare original with calculated structures. Among 513,024 filtered residue records, increased amino acid residue-dependent rotamer errors─associated in particular with polar and charged amino acid residues (ARG, LYS, and GLN)─clearly correlate with increased amino acid residue solvent accessibility and an increased residue tendency toward the adoption of non-canonical off rotamers which modeling programs struggle to predict accurately. Understanding the impact of solvent accessibility now appears key to improved side-chain prediction accuracies.
Collapse
Affiliation(s)
- Tareq Hameduh
- Department of Chemistry and Biochemistry, Mendel University in Brno, Zemědělská 1665/1, CZ-613 00 Brno, Czech Republic
| | - Michal Mokry
- Department of Chemistry and Biochemistry, Mendel University in Brno, Zemědělská 1665/1, CZ-613 00 Brno, Czech Republic
| | - Andrew D Miller
- Department of Chemistry and Biochemistry, Mendel University in Brno, Zemědělská 1665/1, CZ-613 00 Brno, Czech Republic
- Veterinary Research Institute, Hudcova 296/70, CZ-621 00 Brno, Czech Republic
- KP Therapeutics (Europe) s.r.o., Purkyňova 649/127, CZ-612 00 Brno, Czech Republic
| | - Zbynek Heger
- Department of Chemistry and Biochemistry, Mendel University in Brno, Zemědělská 1665/1, CZ-613 00 Brno, Czech Republic
| | - Yazan Haddad
- Department of Chemistry and Biochemistry, Mendel University in Brno, Zemědělská 1665/1, CZ-613 00 Brno, Czech Republic
| |
Collapse
|
5
|
Pandya N, Kumar A. An immunoinformatics analysis: design of a multi-epitope vaccine against Cryptosporidium hominis by employing heat shock protein triggers the innate and adaptive immune responses. J Biomol Struct Dyn 2023; 41:13563-13579. [PMID: 36764824 DOI: 10.1080/07391102.2023.2175373] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/09/2022] [Accepted: 01/28/2023] [Indexed: 02/12/2023]
Abstract
Cryptosporidium hominis, an anthropologically transferred species in the Cryptosporidium genus, represents many clinical studies in several countries. Its growth in the recent decade is primarily owing to epidemiologic studies. This parasite has complicated life cycles that require differentiation through a variety of phases of development and passage across two or more hosts throughout their lifetimes. As they move from host to host and environment to environment, pathogenic organisms are continually exposed to unexpected changes in the circumstances under which they develop. Heat shock proteins (HSPs) are targets of the host immune response; they are involved in the progression of diseases and play a significant part in this process. It has been discovered that the immunodominant immunogenic antigens in parasite infections HSPs. In this study, we have generated a multi-epitope vaccine against Cryptosporidium hominis (C. hominis) by using heat shock proteins. The epitopes that were selected had a substantial binding affinity for the B- and T-cell reference set of alleles, a high antigenicity score, a nature that was not allergic, a high solubility, non-toxicity and good binders. The epitopes were incorporated into a chimeric vaccine by using appropriate linkers. In order to increase the immunogenicity of the connected epitopes and effectively activate both innate and adaptive immunity, an adjuvant was attached to the epitopes. We have also analyzed the physiochemical characteristics of the vaccine which were satisfactory and then lead to the development of a 3D model. In addition, the binding confirmation of the vaccine to the TLR-4 innate immune receptor was also determined using molecular docking and molecular dynamics (MD) simulation. The results of this simulation show that the vaccine has a strong binding affinity for TLR4, which indicates that the vaccine is highly effective. In general, the vaccine that has been described here has a good potential for inducing protective and targeted immunogenicity, however, this hypothesis is contingent upon more experimental testing.Communicated by Ramaswamy H. Sarma.
Collapse
Affiliation(s)
- Nirali Pandya
- Department of Chemistry, National University of Singapore, Singapore, Singapore
| | - Amit Kumar
- Department of Biosciences and Biomedical Engineering, Indian Institute of Technology Indore, Indore, Madhya Pradesh, India
| |
Collapse
|
6
|
Bhattacharya S, Roche R, Shuvo MH, Moussad B, Bhattacharya D. Contact-Assisted Threading in Low-Homology Protein Modeling. Methods Mol Biol 2023; 2627:41-59. [PMID: 36959441 DOI: 10.1007/978-1-0716-2974-1_3] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/25/2023]
Abstract
The ability to successfully predict the three-dimensional structure of a protein from its amino acid sequence has made considerable progress in the recent past. The progress is propelled by the improved accuracy of deep learning-based inter-residue contact map predictors coupled with the rising growth of protein sequence databases. Contact map encodes interatomic interaction information that can be exploited for highly accurate prediction of protein structures via contact map threading even for the query proteins that are not amenable to direct homology modeling. As such, contact-assisted threading has garnered considerable research effort. In this chapter, we provide an overview of existing contact-assisted threading methods while highlighting the recent advances and discussing some of the current limitations and future prospects in the application of contact-assisted threading for improving the accuracy of low-homology protein modeling.
Collapse
Affiliation(s)
- Sutanu Bhattacharya
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, USA
| | | | - Md Hossain Shuvo
- Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
| | - Bernard Moussad
- Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
| | | |
Collapse
|
7
|
Mufassirin MMM, Newton MAH, Sattar A. Artificial intelligence for template-free protein structure prediction: a comprehensive review. Artif Intell Rev 2022. [DOI: 10.1007/s10462-022-10350-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/23/2022]
|
8
|
Mahmoud NA, Elshafei AM, Almofti YA. A novel strategy for developing vaccine candidate against Jaagsiekte sheep retrovirus from the envelope and gag proteins: an in-silico approach. BMC Vet Res 2022; 18:343. [PMID: 36085036 PMCID: PMC9463060 DOI: 10.1186/s12917-022-03431-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/09/2022] [Accepted: 08/29/2022] [Indexed: 11/26/2022] Open
Abstract
BACKGROUND Sheep pulmonary adenocarcinoma (OPA) is a contagious lung cancer of sheep caused by the Jaagsiekte retrovirus (JSRV). OPA typically has a serious economic impact worldwide. A vaccine has yet to be developed, even though the disease has been globally spread, along with its complications. This study aimed to construct an effective multi-epitopes vaccine against JSRV eliciting B and T lymphocytes using immunoinformatics tools. RESULTS The designed vaccine was composed of 499 amino acids. Before the vaccine was computationally validated, all critical parameters were taken into consideration; including antigenicity, allergenicity, toxicity, and stability. The physiochemical properties of the vaccine displayed an isoelectric point of 9.88. According to the Instability Index (II), the vaccine was stable at 28.28. The vaccine scored 56.51 on the aliphatic index and -0.731 on the GRAVY, indicating that the vaccine was hydrophilic. The RaptorX server was used to predict the vaccine's tertiary structure, the GalaxyWEB server refined the structure, and the Ramachandran plot and the ProSA-web server validated the vaccine's tertiary structure. Protein-sol and the SOLPro servers showed the solubility of the vaccine. Moreover, the high mobile regions in the vaccine's structure were reduced and the vaccine's stability was improved by disulfide engineering. Also, the vaccine construct was docked with an ovine MHC-1 allele and showed efficient binding energy. Immune simulation remarkably showed high levels of immunoglobulins, T lymphocytes, and INF-γ secretions. The molecular dynamic simulation provided the stability of the constructed vaccine. Finally, the vaccine was back-transcribed into a DNA sequence and cloned into a pET-30a ( +) vector to affirm the potency of translation and microbial expression. CONCLUSION A novel multi-epitopes vaccine construct against JSRV, was formed from B and T lymphocytes epitopes, and was produced with potential protection. This study might help in controlling and eradicating OPA.
Collapse
Affiliation(s)
- Nuha Amin Mahmoud
- Department of Biochemistry, Genetics and Molecular Biology/ Faculty of Medicine and Surgery, National University, Khartoum, Sudan
| | - Abdelmajeed M Elshafei
- Department of Biochemistry, Genetics and Molecular Biology/ Faculty of Medicine and Surgery, National University, Khartoum, Sudan
| | - Yassir A Almofti
- Department of Biochemistry, Genetics and Molecular Biology/ Faculty of Medicine and Surgery, National University, Khartoum, Sudan.
- Department of Molecular Biology and Bioinformatics, College of Veterinary Medicine, University of Bahri, Khartoum, Sudan.
| |
Collapse
|
9
|
Yuan Z, Wu M, Meng Y, Niu Y, Xiao W, Ruan X, He G, Jiang X. Protein crystal regulation and harvest via electric field-based method. Curr Opin Chem Eng 2022. [DOI: 10.1016/j.coche.2021.100744] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/21/2022]
|
10
|
Non-Psychotropic Cannabinoids as Inhibitors of TET1 Protein. Bioorg Chem 2022; 124:105793. [DOI: 10.1016/j.bioorg.2022.105793] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/20/2021] [Revised: 03/30/2022] [Accepted: 04/03/2022] [Indexed: 12/18/2022]
|
11
|
Jindal S, Hsu PJ, Phan HT, Tsou PK, Kuo JL. Capturing the potential energy landscape of large size molecular clusters from atomic interactions up to a 4-body system using deep learning. Phys Chem Chem Phys 2022; 24:27263-27276. [DOI: 10.1039/d2cp04441b] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
Abstract
We propose a new method that utilizes the database of stable conformers and borrow the fragmentation concept of many-body-expansion (MBE) methods in ab initio methods to train a deep-learning machine learning (ML) model using SchNet.
Collapse
Affiliation(s)
- Shweta Jindal
- Institute of Atomic and Molecular Sciences, Academia Sinica, No. 1 Roosevelt Road, Section 4, Daan District, Taipei City 10617, Taiwan
| | - Po-Jen Hsu
- Institute of Atomic and Molecular Sciences, Academia Sinica, No. 1 Roosevelt Road, Section 4, Daan District, Taipei City 10617, Taiwan
| | - Huu Trong Phan
- Institute of Atomic and Molecular Sciences, Academia Sinica, No. 1 Roosevelt Road, Section 4, Daan District, Taipei City 10617, Taiwan
- Molecular Science and Technology Program, Taiwan International Graduate Program, Academia Sinica, Taipei, 11529, Taiwan
- Department of Chemistry, National Tsing Hua University, Hsinchu 30013, Taiwan
| | - Pei-Kang Tsou
- Institute of Atomic and Molecular Sciences, Academia Sinica, No. 1 Roosevelt Road, Section 4, Daan District, Taipei City 10617, Taiwan
| | - Jer-Lai Kuo
- Institute of Atomic and Molecular Sciences, Academia Sinica, No. 1 Roosevelt Road, Section 4, Daan District, Taipei City 10617, Taiwan
- Department of Chemistry, National Tsing Hua University, Hsinchu 30013, Taiwan
- Molecular Science and Technology, National Taiwan University, Section 4, Daan District, Taipei City 10617, Taiwan
| |
Collapse
|
12
|
Verburgt J, Kihara D. Benchmarking of structure refinement methods for protein complex models. Proteins 2022; 90:83-95. [PMID: 34309909 PMCID: PMC8671191 DOI: 10.1002/prot.26188] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/19/2021] [Revised: 06/24/2021] [Accepted: 07/22/2021] [Indexed: 01/03/2023]
Abstract
Protein structure docking is the process in which the quaternary structure of a protein complex is predicted from individual tertiary structures of the protein subunits. Protein docking is typically performed in two main steps. The subunits are first docked while keeping them rigid to form the complex, which is then followed by structure refinement. Structure refinement is crucial for a practical use of computational protein docking models, as it is aimed for correcting conformations of interacting residues and atoms at the interface. Here, we benchmarked the performance of eight existing protein structure refinement methods in refinement of protein complex models. We show that the fraction of native contacts between subunits is by far the most straightforward metric to improve. However, backbone dependent metrics, based on the Root Mean Square Deviation proved more difficult to improve via refinement.
Collapse
Affiliation(s)
- Jacob Verburgt
- Department of Biological Sciences, Purdue University, West Lafayette, IN, 47907, USA
| | - Daisuke Kihara
- Department of Biological Sciences, Purdue University, West Lafayette, IN, 47907, USA
- Department of Computer Science, Purdue University, West Lafayette, IN, 47907, USA
- Purdue University Center for Cancer Research, Purdue University, West Lafayette, IN, 47907, USA
| |
Collapse
|
13
|
Proteomic Tools for the Analysis of Cytoskeleton Proteins. METHODS IN MOLECULAR BIOLOGY (CLIFTON, N.J.) 2021; 2364:363-425. [PMID: 34542864 DOI: 10.1007/978-1-0716-1661-1_19] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]
Abstract
Proteomic analyses have become an essential part of the toolkit of the molecular biologist, given the widespread availability of genomic data and open source or freely accessible bioinformatics software. Tools are available for detecting homologous sequences, recognizing functional domains, and modeling the three-dimensional structure for any given protein sequence, as well as for predicting interactions with other proteins or macromolecules. Although a wealth of structural and functional information is available for many cytoskeletal proteins, with representatives spanning all of the major subfamilies, the majority of cytoskeletal proteins remain partially or totally uncharacterized. Moreover, bioinformatics tools provide a means for studying the effects of synthetic mutations or naturally occurring variants of these cytoskeletal proteins. This chapter discusses various freely available proteomic analysis tools, with a focus on in silico prediction of protein structure and function. The selected tools are notable for providing an easily accessible interface for the novice while retaining advanced functionality for more experienced computational biologists.
Collapse
|
14
|
Laine E, Eismann S, Elofsson A, Grudinin S. Protein sequence-to-structure learning: Is this the end(-to-end revolution)? Proteins 2021; 89:1770-1786. [PMID: 34519095 DOI: 10.1002/prot.26235] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/05/2021] [Revised: 08/16/2021] [Accepted: 09/03/2021] [Indexed: 01/08/2023]
Abstract
The potential of deep learning has been recognized in the protein structure prediction community for some time, and became indisputable after CASP13. In CASP14, deep learning has boosted the field to unanticipated levels reaching near-experimental accuracy. This success comes from advances transferred from other machine learning areas, as well as methods specifically designed to deal with protein sequences and structures, and their abstractions. Novel emerging approaches include (i) geometric learning, that is, learning on representations such as graphs, three-dimensional (3D) Voronoi tessellations, and point clouds; (ii) pretrained protein language models leveraging attention; (iii) equivariant architectures preserving the symmetry of 3D space; (iv) use of large meta-genome databases; (v) combinations of protein representations; and (vi) finally truly end-to-end architectures, that is, differentiable models starting from a sequence and returning a 3D structure. Here, we provide an overview and our opinion of the novel deep learning approaches developed in the last 2 years and widely used in CASP14.
Collapse
Affiliation(s)
- Elodie Laine
- Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), Paris, France
| | - Stephan Eismann
- Department of Computer Science and Applied Physics, Stanford University, Stanford, California, USA
| | - Arne Elofsson
- Department of Biochemistry and Biophysics and Science for Life Laboratory, Stockholm University, Solna, Sweden
| | - Sergei Grudinin
- Univ. Grenoble Alpes, CNRS, Grenoble INP, LJK, Grenoble, France
| |
Collapse
|
15
|
Nazet J, Lang E, Merkl R. Rosetta:MSF:NN: Boosting performance of multi-state computational protein design with a neural network. PLoS One 2021; 16:e0256691. [PMID: 34437621 PMCID: PMC8389498 DOI: 10.1371/journal.pone.0256691] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/23/2020] [Accepted: 08/12/2021] [Indexed: 12/05/2022] Open
Abstract
Rational protein design aims at the targeted modification of existing proteins. To reach this goal, software suites like Rosetta propose sequences to introduce the desired properties. Challenging design problems necessitate the representation of a protein by means of a structural ensemble. Thus, Rosetta multi-state design (MSD) protocols have been developed wherein each state represents one protein conformation. Computational demands of MSD protocols are high, because for each of the candidate sequences a costly three-dimensional (3D) model has to be created and assessed for all states. Each of these scores contributes one data point to a complex, design-specific energy landscape. As neural networks (NN) proved well-suited to learn such solution spaces, we integrated one into the framework Rosetta:MSF instead of the so far used genetic algorithm with the aim to reduce computational costs. As its predecessor, Rosetta:MSF:NN administers a set of candidate sequences and their scores and scans sequence space iteratively. During each iteration, the union of all candidate sequences and their Rosetta scores are used to re-train NNs that possess a design-specific architecture. The enormous speed of the NNs allows an extensive assessment of alternative sequences, which are ranked on the scores predicted by the NN. Costly 3D models are computed only for a small fraction of best-scoring sequences; these and the corresponding 3D-based scores replace half of the candidate sequences during each iteration. The analysis of two sets of candidate sequences generated for a specific design problem by means of a genetic algorithm confirmed that the NN predicted 3D-based scores quite well; the Pearson correlation coefficient was at least 0.95. Applying Rosetta:MSF:NN:enzdes to a benchmark consisting of 16 ligand-binding problems showed that this protocol converges ten-times faster than the genetic algorithm and finds sequences with comparable scores.
Collapse
Affiliation(s)
- Julian Nazet
- Institute of Biophysics and Physical Biochemistry, University of Regensburg, Regensburg, Germany
| | - Elmar Lang
- Institute of Biophysics and Physical Biochemistry, University of Regensburg, Regensburg, Germany
| | - Rainer Merkl
- Institute of Biophysics and Physical Biochemistry, University of Regensburg, Regensburg, Germany
- * E-mail:
| |
Collapse
|
16
|
Shuvo MH, Gulfam M, Bhattacharya D. DeepRefiner: high-accuracy protein structure refinement by deep network calibration. Nucleic Acids Res 2021; 49:W147-W152. [PMID: 33999209 PMCID: PMC8262753 DOI: 10.1093/nar/gkab361] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/08/2021] [Revised: 04/18/2021] [Accepted: 04/23/2021] [Indexed: 12/20/2022] Open
Abstract
The DeepRefiner webserver, freely available at http://watson.cse.eng.auburn.edu/DeepRefiner/, is an interactive and fully configurable online system for high-accuracy protein structure refinement. Fuelled by deep learning, DeepRefiner offers the ability to leverage cutting-edge deep neural network architectures which can be calibrated for on-demand selection of adventurous or conservative refinement modes targeted at degree or consistency of refinement. The method has been extensively tested in the Critical Assessment of Techniques for Protein Structure Prediction (CASP) experiments under the group name 'Bhattacharya-Server' and was officially ranked as the No. 2 refinement server in CASP13 (second only to 'Seok-server' and outperforming all other refinement servers) and No. 2 refinement server in CASP14 (second only to 'FEIG-S' and outperforming all other refinement servers including 'Seok-server'). The DeepRefiner web interface offers a number of convenient features, including (i) fully customizable refinement job submission and validation; (ii) automated job status update, tracking, and notifications; (ii) interactive and interpretable web-based results retrieval with quantitative and visual analysis and (iv) extensive help information on job submission and results interpretation via web-based tutorial and help tooltips.
Collapse
Affiliation(s)
- Md Hossain Shuvo
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL 36849, USA
| | - Muhammad Gulfam
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL 36849, USA
| | - Debswapna Bhattacharya
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL 36849, USA
- Department of Biological Sciences, Auburn University, Auburn, AL 36849, USA
| |
Collapse
|
17
|
Jing X, Xu J. Fast and effective protein model refinement using deep graph neural networks. NATURE COMPUTATIONAL SCIENCE 2021; 1:462-469. [PMID: 35321360 DOI: 10.1038/s43588-021-00098-9] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
Protein model refinement is the last step applied to improve the quality of a predicted protein model. Currently the most successful refinement methods rely on extensive conformational sampling and thus, take hours or days to refine even a single protein model. Here we propose a fast and effective model refinement method that applies GNN (graph neural networks) to predict refined inter-atom distance probability distribution from an initial model and then rebuilds 3D models from the predicted distance distribution. Tested on the CASP (Critical Assessment of Structure Prediction) refinement targets, our method has comparable accuracy as two leading human groups Feig and Baker, but runs substantially faster. Our method may refine one protein model within ~11 minutes on 1 CPU while Baker needs ~30 hours on 60 CPUs and Feig needs ~16 hours on 1 GPU. Finally, our study shows that GNN outperforms ResNet (convolutional residual neural networks) for model refinement when very limited conformational sampling is allowed.
Collapse
Affiliation(s)
- Xiaoyang Jing
- Toyota Technological Institute at Chicago, Chicago, IL 60637, USA
| | - Jinbo Xu
- Toyota Technological Institute at Chicago, Chicago, IL 60637, USA
| |
Collapse
|
18
|
Bhattacharya S, Roche R, Shuvo MH, Bhattacharya D. Recent Advances in Protein Homology Detection Propelled by Inter-Residue Interaction Map Threading. Front Mol Biosci 2021; 8:643752. [PMID: 34046429 PMCID: PMC8148041 DOI: 10.3389/fmolb.2021.643752] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/18/2020] [Accepted: 04/21/2021] [Indexed: 11/13/2022] Open
Abstract
Sequence-based protein homology detection has emerged as one of the most sensitive and accurate approaches to protein structure prediction. Despite the success, homology detection remains very challenging for weakly homologous proteins with divergent evolutionary profile. Very recently, deep neural network architectures have shown promising progress in mining the coevolutionary signal encoded in multiple sequence alignments, leading to reasonably accurate estimation of inter-residue interaction maps, which serve as a rich source of additional information for improved homology detection. Here, we summarize the latest developments in protein homology detection driven by inter-residue interaction map threading. We highlight the emerging trends in distant-homology protein threading through the alignment of predicted interaction maps at various granularities ranging from binary contact maps to finer-grained distance and orientation maps as well as their combination. We also discuss some of the current limitations and possible future avenues to further enhance the sensitivity of protein homology detection.
Collapse
Affiliation(s)
- Sutanu Bhattacharya
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, United States
| | - Rahmatullah Roche
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, United States
| | - Md Hossain Shuvo
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, United States
| | - Debswapna Bhattacharya
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, United States
- Department of Biological Sciences, Auburn University, Auburn, AL, United States
| |
Collapse
|
19
|
A hijack mechanism of Indian SARS-CoV-2 isolates for relapsing contemporary antiviral therapeutics. Comput Biol Med 2021; 132:104315. [PMID: 33705994 PMCID: PMC7935700 DOI: 10.1016/j.compbiomed.2021.104315] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/10/2021] [Accepted: 03/02/2021] [Indexed: 12/16/2022]
Abstract
Coronavirus disease (COVID-19) rapidly expands to a global pandemic and its impact on public health varies from country to country. It is caused by a new virus severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). It is imperative for relapsing current antiviral therapeutics owing to randomized genetic drift in global SARS-CoV-2 isolates. A molecular mechanism behind the emerging genomic variants is not yet understood for the prioritization of selective antivirals. The present computational study was aimed to repurpose existing antivirals for Indian SARS-CoV-2 isolates by uncovering a hijack mechanism based on structural and functional characteristics of protein variants. Forty-one protein mutations were identified in 12 Indian SARS-CoV-2 isolates by analysis of genome variations across 460 genome sequences obtained from 30 geographic sites in India. Two unique mutations such as W6152R and N5928H found in exonuclease of Surat (GBRC275b) and Gandhinagar (GBRC239) isolates. We report for the first time the impact of folding rate on stabilizing/retaining a sequence-structure-function-virulence link of emerging protein variants leading to accommodate hijack ability from current antivirals. Binding affinity analysis revealed the effect of point mutations on virus infectivity and the drug-escaping efficiency of Indian isolates. Emodin and artinemol suggested herein as repurposable antivirals for the treatment of COVID-19 patients infected with Indian isolates. Our study concludes that a protein folding rate is a key structural and evolutionary determinant to enhance the receptor-binding specificity and ensure hijack ability from the prevalent antiviral therapeutics.
Collapse
|
20
|
Hiranuma N, Park H, Baek M, Anishchenko I, Dauparas J, Baker D. Improved protein structure refinement guided by deep learning based accuracy estimation. Nat Commun 2021; 12:1340. [PMID: 33637700 PMCID: PMC7910447 DOI: 10.1038/s41467-021-21511-x] [Citation(s) in RCA: 133] [Impact Index Per Article: 33.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/11/2020] [Accepted: 01/18/2021] [Indexed: 11/22/2022] Open
Abstract
We develop a deep learning framework (DeepAccNet) that estimates per-residue accuracy and residue-residue distance signed error in protein models and uses these predictions to guide Rosetta protein structure refinement. The network uses 3D convolutions to evaluate local atomic environments followed by 2D convolutions to provide their global contexts and outperforms other methods that similarly predict the accuracy of protein structure models. Overall accuracy predictions for X-ray and cryoEM structures in the PDB correlate with their resolution, and the network should be broadly useful for assessing the accuracy of both predicted structure models and experimentally determined structures and identifying specific regions likely to be in error. Incorporation of the accuracy predictions at multiple stages in the Rosetta refinement protocol considerably increased the accuracy of the resulting protein structure models, illustrating how deep learning can improve search for global energy minima of biomolecules.
Collapse
Affiliation(s)
- Naozumi Hiranuma
- Department of Biochemistry and Institute for Protein Design, University of Washington, Washington, WA, USA
- Paul G. Allen School of Computer Science & Engineering, University of Washington, Washington, WA, USA
| | - Hahnbeom Park
- Department of Biochemistry and Institute for Protein Design, University of Washington, Washington, WA, USA
| | - Minkyung Baek
- Department of Biochemistry and Institute for Protein Design, University of Washington, Washington, WA, USA
| | - Ivan Anishchenko
- Department of Biochemistry and Institute for Protein Design, University of Washington, Washington, WA, USA
| | - Justas Dauparas
- Department of Biochemistry and Institute for Protein Design, University of Washington, Washington, WA, USA
| | - David Baker
- Department of Biochemistry and Institute for Protein Design, University of Washington, Washington, WA, USA.
- Howard Hughes Medical Institute, University of Washington, Washington, WA, USA.
| |
Collapse
|
21
|
Bell A, Christian L, Hecht D, Huisinga K, Rakus J, Bell E. Teaching virtual protein-centric CUREs and UREs using computational tools. BIOCHEMISTRY AND MOLECULAR BIOLOGY EDUCATION : A BIMONTHLY PUBLICATION OF THE INTERNATIONAL UNION OF BIOCHEMISTRY AND MOLECULAR BIOLOGY 2020; 48:646-647. [PMID: 32919430 DOI: 10.1002/bmb.21454] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/16/2020] [Revised: 08/21/2020] [Accepted: 08/23/2020] [Indexed: 06/11/2023]
Abstract
Readily available, free, computational approaches, adaptable for topics accessible for first to senior year classes and individual research projects, emphasizing contributions of noncovalent interactions to structure, binding and catalysis were used to teach Course-based Undergraduate Research Experiences that fulfil generally accepted main CURE components: Scientific Background, Hypothesis Development, Proposal, Experiments, Teamwork, Data Analysis of quantitative data, Conclusions, and Presentation.
Collapse
Affiliation(s)
- Anthony Bell
- University of San Diego, San Diego, California, USA
| | | | - David Hecht
- Southwestern College, Chula Vista, California, USA
| | | | - John Rakus
- Marshall University, Huntington, West Virginia, USA
| | - Ellis Bell
- University of San Diego, San Diego, California, USA
| |
Collapse
|
22
|
Lee GR, Won J, Heo L, Seok C. GalaxyRefine2: simultaneous refinement of inaccurate local regions and overall protein structure. Nucleic Acids Res 2020; 47:W451-W455. [PMID: 31001635 PMCID: PMC6602442 DOI: 10.1093/nar/gkz288] [Citation(s) in RCA: 70] [Impact Index Per Article: 14.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/31/2019] [Revised: 04/01/2019] [Accepted: 04/11/2019] [Indexed: 11/12/2022] Open
Abstract
The 3D structure of a protein can be predicted from its amino acid sequence with high accuracy for a large fraction of cases because of the availability of large quantities of experimental data and the advance of computational algorithms. Recently, deep learning methods exploiting the coevolution information obtained by comparing related protein sequences have been successfully used to generate highly accurate model structures even in the absence of template structure information. However, structures predicted based on either template structures or related sequences require further improvement in regions for which information is missing. Refining a predicted protein structure with insufficient information on certain regions is critical because these regions may be connected to functional specificity that is not conserved among related proteins. The GalaxyRefine2 web server, freely available via http://galaxy.seoklab.org/refine2, is an upgraded version of the GalaxyRefine protein structure refinement server and reflects recent developments successfully tested through CASP blind prediction experiments. This method adopts an iterative optimization approach involving various structure move sets to refine both local and global structures. The estimation of local error and hybridization of available homolog structures are also employed for effective conformation search.
Collapse
Affiliation(s)
- Gyu Rie Lee
- Department of Chemistry, Seoul National University, Seoul 08826, Korea
| | - Jonghun Won
- Department of Chemistry, Seoul National University, Seoul 08826, Korea
| | - Lim Heo
- Department of Chemistry, Seoul National University, Seoul 08826, Korea
| | - Chaok Seok
- Department of Chemistry, Seoul National University, Seoul 08826, Korea
| |
Collapse
|
23
|
Liu S, Xiang X, Gao X, Liu H. Neighborhood Preference of Amino Acids in Protein Structures and its Applications in Protein Structure Assessment. Sci Rep 2020; 10:4371. [PMID: 32152349 PMCID: PMC7062742 DOI: 10.1038/s41598-020-61205-w] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/19/2019] [Accepted: 02/24/2020] [Indexed: 12/02/2022] Open
Abstract
Amino acids form protein 3D structures in unique manners such that the folded structure is stable and functional under physiological conditions. Non-specific and non-covalent interactions between amino acids exhibit neighborhood preferences. Based on structural information from the protein data bank, a statistical energy function was derived to quantify amino acid neighborhood preferences. The neighborhood of one amino acid is defined by its contacting residues, and the energy function is determined by the neighboring residue types and relative positions. The neighborhood preference of amino acids was exploited to facilitate structural quality assessment, which was implemented in the neighborhood preference program NEPRE. The source codes are available via https://github.com/LiuLab-CSRC/NePre.
Collapse
Affiliation(s)
- Siyuan Liu
- Complex Systems Division, Beijing Computational Science Research Center, Haidian, Beijing, 100193, China
- School of Software Engineering, University of Science and Technology of China, Hefei, Anhui, 230026, China
| | - Xilun Xiang
- Complex Systems Division, Beijing Computational Science Research Center, Haidian, Beijing, 100193, China
- School of Software Engineering, University of Science and Technology of China, Hefei, Anhui, 230026, China
| | - Xiang Gao
- Complex Systems Division, Beijing Computational Science Research Center, Haidian, Beijing, 100193, China
- School of Software Engineering, University of Science and Technology of China, Hefei, Anhui, 230026, China
| | - Haiguang Liu
- Complex Systems Division, Beijing Computational Science Research Center, Haidian, Beijing, 100193, China.
- Physics Department, Beijing Normal University, Haidian, Beijing, 100875, China.
| |
Collapse
|
24
|
Badaczewska-Dawid AE, Kolinski A, Kmiecik S. Computational reconstruction of atomistic protein structures from coarse-grained models. Comput Struct Biotechnol J 2019; 18:162-176. [PMID: 31969975 PMCID: PMC6961067 DOI: 10.1016/j.csbj.2019.12.007] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/10/2019] [Accepted: 12/10/2019] [Indexed: 01/02/2023] Open
Abstract
Three-dimensional protein structures, whether determined experimentally or theoretically, are often too low resolution. In this mini-review, we outline the computational methods for protein structure reconstruction from incomplete coarse-grained to all atomistic models. Typical reconstruction schemes can be divided into four major steps. Usually, the first step is reconstruction of the protein backbone chain starting from the C-alpha trace. This is followed by side-chains rebuilding based on protein backbone geometry. Subsequently, hydrogen atoms can be reconstructed. Finally, the resulting all-atom models may require structure optimization. Many methods are available to perform each of these tasks. We discuss the available tools and their potential applications in integrative modeling pipelines that can transfer coarse-grained information from computational predictions, or experiment, to all atomistic structures.
Collapse
Affiliation(s)
| | | | - Sebastian Kmiecik
- Faculty of Chemistry, Biological and Chemical Research Center, University of Warsaw, Pasteura 1, 02-093 Warsaw, Poland
| |
Collapse
|
25
|
Park H, Lee GR, Kim DE, Anishchenko I, Cong Q, Baker D. High-accuracy refinement using Rosetta in CASP13. Proteins 2019; 87:1276-1282. [PMID: 31325340 DOI: 10.1002/prot.25784] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/01/2019] [Revised: 07/11/2019] [Accepted: 07/12/2019] [Indexed: 11/06/2022]
Abstract
Because proteins generally fold to their lowest free energy states, energy-guided refinement in principle should be able to systematically improve the quality of protein structure models generated using homologous structure or co-evolution derived information. However, because of the high dimensionality of the search space, there are far more ways to degrade the quality of a near native model than to improve it, and hence, refinement methods are very sensitive to energy function errors. In the 13th Critial Assessment of techniques for protein Structure Prediction (CASP13), we sought to carry out a thorough search for low energy states in the neighborhood of a starting model using restraints to avoid straying too far. The approach was reasonably successful in improving both regions largely incorrect in the starting models as well as core regions that started out closer to the correct structure. Models with GDT-HA over 70 were obtained for five targets and for one of those, an accuracy of 0.5 å backbone root-mean-square deviation (RMSD) was achieved. An important current challenge is to improve performance in refining oligomers and larger proteins, for which the search problem remains extremely difficult.
Collapse
Affiliation(s)
- Hahnbeom Park
- Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, Washington
| | - Gyu Rie Lee
- Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, Washington
| | - David E Kim
- Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, Washington.,Howard Hughes Medical Institute, University of Washington, Seattle, Washington
| | - Ivan Anishchenko
- Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, Washington
| | - Qian Cong
- Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, Washington
| | - David Baker
- Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, Washington.,Howard Hughes Medical Institute, University of Washington, Seattle, Washington
| |
Collapse
|