1
Ng CL, Lim TS, Choong YS. Application of Computational Techniques in Antibody Fc-Fused Molecule Design for Therapeutics. Mol Biotechnol 2024; 66:568-581. [PMID: 37742298 DOI: 10.1007/s12033-023-00885-x] [Received: 08/17/2022] [Accepted: 08/23/2023] [Indexed: 09/26/2023]
Abstract
Since the advent of hybridoma technology in 1975, it took a decade to see the first approved monoclonal antibody, Orthoclone OKT3 (muromonab-CD3), in 1986. Since then, continuous strides have been made to engineer antibodies for specific desired effects. These engineering efforts were not confined to the variable domains of the antibody but also extended to the fragment crystallizable (Fc) region, which influences the immune response and serum half-life. Engineering the Fc fragment can have a profound effect on the therapeutic dose, antibody-dependent cell-mediated cytotoxicity, and antibody-dependent cellular phagocytosis. The integration of computational techniques into antibody engineering has allowed for the generation of testable hypotheses and has guided rational antibody design prior to further experimental evaluation. In this article, we discuss recent work on Fc-fused molecule design that involves computational techniques. We also summarize the usefulness of in silico techniques in aiding Fc-fused molecule design and analysis for therapeutic applications.
Affiliation(s)
- Chong Lee Ng
- Institute for Research in Molecular Medicine (INFORMM), Universiti Sains Malaysia, Minden, Penang, Malaysia
- Theam Soon Lim
- Institute for Research in Molecular Medicine (INFORMM), Universiti Sains Malaysia, Minden, Penang, Malaysia
- Yee Siew Choong
- Institute for Research in Molecular Medicine (INFORMM), Universiti Sains Malaysia, Minden, Penang, Malaysia

2
Harihar B, Saravanan KM, Gromiha MM, Selvaraj S. Importance of Inter-residue Contacts for Understanding Protein Folding and Unfolding Rates, Remote Homology, and Drug Design. Mol Biotechnol 2024:10.1007/s12033-024-01119-4. [PMID: 38498284 DOI: 10.1007/s12033-024-01119-4] [Received: 12/16/2023] [Accepted: 02/10/2024] [Indexed: 03/20/2024]
Abstract
Inter-residue interactions in protein structures provide valuable insights into protein folding and stability. Understanding these interactions can be helpful in many crucial applications, including the rational design of therapeutic small molecules and biologics, locating functional protein sites, and predicting protein-protein and protein-ligand interactions. Machine learning models that incorporate inter-residue interactions have improved recently. This review highlights theoretical models that incorporate inter-residue interactions to predict the folding and unfolding rates of proteins. Using contact maps to depict inter-residue interactions helps researchers develop computational models for detecting remote homologs and interface residues within protein-protein complexes, which in turn enhances our knowledge of the relationship between protein sequence and structure. Further, the review highlights applications of contact maps derived from inter-residue interactions in drug discovery. Overall, this review presents an extensive assessment of significant models that use inter-residue interactions to investigate folding rates, unfolding rates, remote homology, and drug development, and outlines potential future advances in constructing efficient computational models in structural biology.
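Contact maps, the representation this review builds on, are straightforward to compute from coordinates. The sketch below is illustrative and not taken from the review: it assumes C-alpha coordinates supplied as an (L, 3) NumPy array and an 8 Å cutoff (a common convention, assumed here), and derives a C-alpha variant of relative contact order, a classic contact-map descriptor that has been correlated with folding rates (the commonly cited definition counts heavy-atom contacts within 6 Å instead). The function names and the `min_sep` parameter are hypothetical.

```python
import numpy as np


def contact_map(coords: np.ndarray, cutoff: float = 8.0) -> np.ndarray:
    """Boolean contact map from an (L, 3) array of C-alpha coordinates.

    Residues i and j are "in contact" when their distance is below `cutoff`
    (angstroms). The 8 A C-alpha cutoff is an assumption of this sketch.
    """
    diff = coords[:, None, :] - coords[None, :, :]  # (L, L, 3) pairwise vectors
    dist = np.linalg.norm(diff, axis=-1)            # (L, L) pairwise distances
    return dist < cutoff


def relative_contact_order(cmap: np.ndarray, min_sep: int = 3) -> float:
    """Mean sequence separation |j - i| of contacts, divided by chain length L.

    Only pairs with j - i >= min_sep are counted, so near neighbours along the
    chain are ignored; `min_sep` is a hypothetical parameter of this sketch.
    """
    length = cmap.shape[0]
    # Keep the upper triangle starting min_sep diagonals above the main one.
    i, j = np.nonzero(np.triu(cmap, k=min_sep))
    if len(i) == 0:
        return 0.0
    return float(np.mean(j - i) / length)
```

For a perfectly straight chain with 3.8 Å spacing, no residue pair three or more positions apart lies within 8 Å, so the relative contact order is 0; folding the chain back on itself (as in a hairpin) creates long-range contacts and raises it.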
Affiliation(s)
- Balasubramanian Harihar
- Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli, Tamil Nadu, 620024, India
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu, 600036, India
- Konda Mani Saravanan
- Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli, Tamil Nadu, 620024, India
- Department of Biotechnology, Bharath Institute of Higher Education and Research, Chennai, Tamil Nadu, 600073, India
- Michael M Gromiha
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu, 600036, India
- Samuel Selvaraj
- Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli, Tamil Nadu, 620024, India

3
Santos MNM, Pintor KL, Hsieh PY, Cheung YW, Sung LK, Shih YL, Lai EM. Agrobacteria deploy two classes of His-Me finger superfamily nuclease effectors exerting different antibacterial capacities against specific bacterial competitors. Front Microbiol 2024; 15:1351590. [PMID: 38426053 PMCID: PMC10902643 DOI: 10.3389/fmicb.2024.1351590] [Received: 12/06/2023] [Accepted: 01/22/2024] [Indexed: 03/02/2024]
Abstract
The type VI secretion system (T6SS) assembles into a contractile nanomachine that secretes effectors by injecting them across bacterial membranes. The Agrobacterium tumefaciens species complex is a group of soil inhabitants and phytopathogens that deploys the T6SS as an antibacterial weapon against bacterial competitors at both inter-species and intra-species levels. The genome of A. tumefaciens strain 1D1609 encodes one main T6SS gene cluster and four vgrG genes (i.e., vgrGa-d), each encoding a spike protein that serves as an effector carrier. A previous study reported that vgrGa-associated gene 2, named v2a, encodes a His-Me finger nuclease toxin (also named HNH/ENDO VII nuclease) that contributes to DNase-mediated antibacterial activity. However, the functions and roles of the other putative effectors remain unknown. In this study, we identified vgrGc-associated gene 2 (v2c), which encodes another His-Me finger nuclease but with a distinct Serine-Histidine-Histidine (SHH) motif that differs from the AHH motif of V2a. Using DNase activity assays and fluorescence microscopy, we demonstrated that ectopic expression of V2c in Escherichia coli caused growth inhibition, plasmid DNA degradation, and cell elongation. The cognate immunity protein, V3c, neutralizes the DNase activity and rescues the growth inhibition and cell elongation phenotypes. Ectopic expression of DNase-inactive V2c variants retains the cell elongation phenotype, whereas V2a induces cell elongation in a DNase-mediated manner. We also showed that the amino acids of the conserved SHH and HNH motifs are responsible for V2c DNase activity in vivo and in vitro. Notably, V2c also mediated DNA degradation and cell elongation of target cells in the context of interbacterial competition. Importantly, V2a and V2c exhibit different capacities against different bacterial species and function synergistically to exert stronger antibacterial activity against the soft rot phytopathogen Dickeya dadantii.
Affiliation(s)
- Mary Nia M. Santos
- Institute of Plant and Microbial Biology, Academia Sinica, Taipei, Taiwan
- Molecular and Biological Agricultural Sciences Program, Taiwan International Graduate Program, National Chung-Hsing University and Academia Sinica, Taipei, Taiwan
- Graduate Institute of Biotechnology, National Chung-Hsing University, Taichung, Taiwan
- Aquaculture Research and Development Division, Department of Agriculture-National Fisheries Research and Development Institute (DA-NFRDI), Manila, Philippines
- Pei-Yu Hsieh
- Institute of Biological Chemistry, Academia Sinica, Taipei, Taiwan
- Yee-Wai Cheung
- Institute of Plant and Microbial Biology, Academia Sinica, Taipei, Taiwan
- Li-Kang Sung
- Institute of Plant and Microbial Biology, Academia Sinica, Taipei, Taiwan
- Molecular and Biological Agricultural Sciences Program, Taiwan International Graduate Program, National Chung-Hsing University and Academia Sinica, Taipei, Taiwan
- Graduate Institute of Biotechnology, National Chung-Hsing University, Taichung, Taiwan
- Yu-Ling Shih
- Institute of Biological Chemistry, Academia Sinica, Taipei, Taiwan
- Erh-Min Lai
- Institute of Plant and Microbial Biology, Academia Sinica, Taipei, Taiwan
- Molecular and Biological Agricultural Sciences Program, Taiwan International Graduate Program, National Chung-Hsing University and Academia Sinica, Taipei, Taiwan
- Biotechnology Center, National Chung-Hsing University, Taichung, Taiwan

4
Wuyun Q, Chen Y, Shen Y, Cao Y, Hu G, Cui W, Gao J, Zheng W. Recent Progress of Protein Tertiary Structure Prediction. Molecules 2024; 29:832. [PMID: 38398585 PMCID: PMC10893003 DOI: 10.3390/molecules29040832] [Received: 12/30/2023] [Revised: 02/06/2024] [Accepted: 02/08/2024] [Indexed: 02/25/2024]
Abstract
The prediction of three-dimensional (3D) protein structure from amino acid sequence has been a significant challenge in computational and structural bioinformatics for decades. Recently, the widespread integration of artificial intelligence (AI) algorithms has substantially accelerated advances in protein structure prediction, yielding numerous significant milestones. In particular, the end-to-end deep learning method AlphaFold2 raised structure prediction performance to new heights, producing models regularly competitive with experimental structures in the 14th Critical Assessment of Protein Structure Prediction (CASP14). To provide a comprehensive understanding and guide future research in protein structure prediction, this review describes various methodologies, assessments, and databases, including traditional protein structure prediction methods, such as template-based modeling (TBM) and template-free modeling (FM) approaches; recently developed deep learning-based methods, such as contact/distance-guided methods, end-to-end folding methods, and protein language model (PLM)-based methods; multi-domain protein structure prediction methods; the CASP experiments and related assessments; and the recently released AlphaFold Protein Structure Database (AlphaFold DB). We discuss their advantages, disadvantages, and application scopes, aiming to help researchers understand the limitations and contexts of protein structure prediction methods and select among them effectively in protein-related fields.
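CASP assessments, one of the topics surveyed here, score models with superposition-based metrics such as GDT_TS, the mean percentage of residues that lie within 1, 2, 4, and 8 Å of their reference positions. A minimal sketch, assuming the per-residue C-alpha distances between a model and the reference have already been computed under a single fixed superposition; the official calculation searches over many superpositions, so numbers from this sketch are not interchangeable with published CASP scores. The function name is hypothetical.

```python
import numpy as np


def gdt_ts(distances) -> float:
    """Simplified GDT_TS (0-100) from per-residue C-alpha distances in angstroms.

    Supply one distance per reference residue; use float("inf") for residues
    the model does not cover, so they count as misses at every cutoff.
    Assumes one fixed superposition (a simplification of the CASP procedure).
    """
    d = np.asarray(distances, dtype=float)
    cutoffs = (1.0, 2.0, 4.0, 8.0)
    # Percentage of residues within each cutoff, then the mean of the four.
    return float(np.mean([100.0 * np.mean(d <= c) for c in cutoffs]))
```

For distances of 0.5, 1.5, 3.0, and 10.0 Å, the four percentages are 25, 50, 75, and 75, giving a GDT_TS of 56.25.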
Affiliation(s)
- Qiqige Wuyun
- Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Yihan Chen
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
- Yifeng Shen
- Faculty of Environment and Information Studies, Keio University, Fujisawa 252-0882, Kanagawa, Japan
- Yang Cao
- College of Life Sciences, Sichuan University, Chengdu 610065, China
- Gang Hu
- NITFID, School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin 300071, China
- Wei Cui
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
- Jianzhao Gao
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
- Wei Zheng
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA

5
Peng CX, Liang F, Xia YH, Zhao KL, Hou MH, Zhang GJ. Recent Advances and Challenges in Protein Structure Prediction. J Chem Inf Model 2024; 64:76-95. [PMID: 38109487 DOI: 10.1021/acs.jcim.3c01324] [Indexed: 12/20/2023]
Abstract
Artificial intelligence has made significant advances in the field of protein structure prediction in recent years. In particular, DeepMind's end-to-end model, AlphaFold2, has demonstrated the capability to predict the three-dimensional structures of numerous unknown proteins with accuracy comparable to that of experimental methods. This breakthrough has opened up new possibilities for understanding protein structure and function and for accelerating drug discovery and other applications in biology and medicine. Despite these remarkable achievements, challenges and limitations remain. In this Review, we discuss recent progress and some of the challenges in protein structure prediction, including predicting multidomain protein structures, protein complex structures, multiple conformational states of proteins, and protein folding pathways. Furthermore, we highlight directions in which further improvements can be made.
Affiliation(s)
- Chun-Xiang Peng
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Fang Liang
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Yu-Hao Xia
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Kai-Long Zhao
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Ming-Hua Hou
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Gui-Jun Zhang
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China

6
Peslalz P, Kraus F, Izzo F, Bleisch A, El Hamdaoui Y, Schulz I, Kany AM, Hirsch AKH, Friedland K, Plietker B. Selective Activation of a TRPC6 Ion Channel Over TRPC3 by Metalated Type-B Polycyclic Polyprenylated Acylphloroglucinols. J Med Chem 2023; 66:15061-15072. [PMID: 37922400 DOI: 10.1021/acs.jmedchem.3c01170] [Indexed: 11/05/2023]
Abstract
Selective modulation of TRPC6 ion channels is a promising therapeutic approach for neurodegenerative diseases and depression. This work achieves the selective activation of TRPC6 through a metalated type-B PPAP, termed PPAP53. This success stems from the 1,3-diketone motif of PPAP53, which facilitates metal coordination. PPAP53 is water-soluble and as potent as hyperforin, the gold standard in this field. In contrast to type-A PPAPs, type-B PPAPs offer advantages such as gram-scale synthesis, easy derivatization, and long-term stability. Our investigations reveal that PPAP53 binds selectively to the C-terminus of TRPC6. Although cryoelectron microscopy has resolved most of the TRPC6 structure, the binding site in the C-terminus remained unresolved. To address this issue, we employed state-of-the-art artificial-intelligence-based protein structure prediction algorithms to predict the missing region. Our computational results, validated against experimental data, indicate that PPAP53 binds to the LLKL region (residues 777-780) of the C-terminus, providing critical insights into the binding mechanism of PPAP53.
Affiliation(s)
- Philipp Peslalz
- Chair of Organic Chemistry, Faculty of Chemistry and Food Chemistry, Technical University Dresden, Bergstr. 66, Dresden 01069, Germany
- Frank Kraus
- Institut für Organische Chemie, Universität Stuttgart, Pfaffenwaldring 55, Stuttgart 70569, Germany
- Flavia Izzo
- Institut für Organische Chemie, Universität Stuttgart, Pfaffenwaldring 55, Stuttgart 70569, Germany
- Anton Bleisch
- Chair of Organic Chemistry, Faculty of Chemistry and Food Chemistry, Technical University Dresden, Bergstr. 66, Dresden 01069, Germany
- Yamina El Hamdaoui
- Institut für Biomedizinische und Pharmazeutische Wissenschaften Johannes Gutenberg-Universität Mainz, Mainz 55128, Germany
- Ina Schulz
- Institut für Biomedizinische und Pharmazeutische Wissenschaften Johannes Gutenberg-Universität Mainz, Mainz 55128, Germany
- Andreas M Kany
- Helmholtz Institute for Pharm. Research Saarland (HIPS)-Helmholtz Centre for Infection Research (HZI), Saarbrücken 66123, Germany
- Anna K H Hirsch
- Helmholtz Institute for Pharm. Research Saarland (HIPS)-Helmholtz Centre for Infection Research (HZI), Saarbrücken 66123, Germany
- Department of Pharmacy, Saarland University, Saarbrücken 66123, Germany
- Kristina Friedland
- Institut für Biomedizinische und Pharmazeutische Wissenschaften Johannes Gutenberg-Universität Mainz, Mainz 55128, Germany
- Bernd Plietker
- Chair of Organic Chemistry, Faculty of Chemistry and Food Chemistry, Technical University Dresden, Bergstr. 66, Dresden 01069, Germany
- Institut für Organische Chemie, Universität Stuttgart, Pfaffenwaldring 55, Stuttgart 70569, Germany

7
Du K, Huang H. Development of anti-PD-L1 antibody based on structure prediction of AlphaFold2. Front Immunol 2023; 14:1275999. [PMID: 37942332 PMCID: PMC10628240 DOI: 10.3389/fimmu.2023.1275999] [Received: 08/11/2023] [Accepted: 10/11/2023] [Indexed: 11/10/2023]
Abstract
Accurate structural information plays a crucial role in understanding biological processes and designing drugs. The remarkable precision of AlphaFold2 has facilitated significant advances in predicting molecular structures, including those of antibodies and antigens. This breakthrough has paved the way for rational drug design, opening new possibilities in pharmaceutical development. In this study, we performed analysis and humanization guided by the structures predicted by AlphaFold2. Notably, the resulting humanized antibody, h3D5-hIgG1, demonstrated exceptional binding affinity to the PD-L1 protein. The KD value of the parental antibody 3D5-hIgG1 was increased by nearly 7 times after humanization. Both h3D5-hIgG1 and 3D5-hIgG1 bound to cells expressing human PD-L1, with EC50 values of 5.13 and 9.92 nM, respectively. Humanization resulted in a twofold increase in the binding capacity of the antibody, with h3D5-hIgG1 outperforming the parental antibody 3D5-hIgG1. Furthermore, h3D5-hIgG1 promoted cytokine secretion by T cells and significantly suppressed MC38-hPD-L1 tumor growth. This study highlights the potential of artificial intelligence-assisted drug development, which is poised to become a prominent trend in the future.
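The twofold figure can be checked against the two reported EC50 values. Because a lower EC50 means half-maximal cell binding is reached at a lower antibody concentration, the ratio of the parental to the humanized EC50 gives the fold improvement (a worked check added for illustration, not a calculation reported by the authors):

```latex
\frac{\mathrm{EC}_{50}(\text{3D5-hIgG1})}{\mathrm{EC}_{50}(\text{h3D5-hIgG1})}
  = \frac{9.92\ \text{nM}}{5.13\ \text{nM}} \approx 1.93
```

This is consistent with the roughly twofold increase in binding capacity described for h3D5-hIgG1.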
Affiliation(s)
- Kun Du
- School of Chemical Engineering and Technology, Tianjin University, Tianjin, China
- Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin, China
- He Huang
- School of Chemical Engineering and Technology, Tianjin University, Tianjin, China
- Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin, China

8
Lategan FA, Schreiber C, Patterton HG. SeqPredNN: a neural network that generates protein sequences that fold into specified tertiary structures. BMC Bioinformatics 2023; 24:373. [PMID: 37789284 PMCID: PMC10546711 DOI: 10.1186/s12859-023-05498-4] [Received: 07/14/2022] [Accepted: 09/25/2023] [Indexed: 10/05/2023]
Abstract
BACKGROUND The relationship between the sequence of a protein and its structure, and the resulting connection between its structure and function, is a foundational principle in biological science. Only recently has the computational prediction of protein structure from sequence alone been addressed effectively, by AlphaFold, a neural network approach that can predict the majority of protein structures with X-ray crystallographic accuracy. A question now of acute relevance is the "inverse protein folding problem": predicting the sequence of a protein that folds into a specified structure. Solving it will be of immense value in protein engineering and biotechnology, and will allow the design and expression of recombinant proteins that can, for instance, fold into specified structures as a scaffold for the attachment of recombinant antigens, or enzymes with modified or novel catalytic activities. Here we describe the development of SeqPredNN, a feed-forward neural network trained with X-ray crystallographic structures from the RCSB Protein Data Bank to predict the identity of amino acids in a protein structure using only the relative positions, orientations, and backbone dihedral angles of nearby residues. RESULTS We predict the sequence of a protein expected to fold into a specified structure and assess the accuracy of the prediction by using both AlphaFold and RoseTTAFold to computationally generate the fold of the derived sequence. We show that, according to AlphaFold predictions, the sequences predicted by SeqPredNN fold into structures with a median TM-score of 0.638 relative to the crystal structure, yet these sequences are unique and only 28.4% identical to the sequence of the crystallized protein. CONCLUSIONS We propose that SeqPredNN will be a valuable tool to generate proteins of defined structure for the design of novel biomaterials, pharmaceuticals, catalysts, and reporter systems.
The low sequence identity of its predictions compared with the native sequence could prove useful for developing proteins with modified physical properties, such as water solubility and thermal stability. The speed and ease of use of SeqPredNN offer a significant advantage over physics-based protein design methods.
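The two figures reported for SeqPredNN, a median TM-score of 0.638 and 28.4% sequence identity, are standard measures. The sketch below shows how each is commonly defined; it is not SeqPredNN's evaluation code. It assumes the predicted fold has already been superimposed on the crystal structure so that per-residue distances are available (tools such as TM-align additionally search for the superposition that maximizes the score), and that the designed and native sequences have equal length, as they do when one residue is predicted per structural position. The function names and the 0.5 Å floor on d0 are assumptions of this sketch.

```python
import numpy as np


def tm_score(distances, target_length: int) -> float:
    """TM-score from distances (angstroms) between aligned residue pairs.

    `distances` holds one value per aligned pair. Unaligned target residues
    contribute nothing, because the sum is normalised by the full target
    length. Assumes a fixed, precomputed superposition.
    """
    if target_length <= 15:
        raise ValueError("the d0 formula below needs target_length > 15")
    d = np.asarray(distances, dtype=float)
    # Standard length-dependent distance scale used by TM-score.
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    d0 = max(d0, 0.5)  # floor for short chains; an assumption of this sketch
    return float(np.sum(1.0 / (1.0 + (d / d0) ** 2)) / target_length)


def sequence_identity(designed: str, native: str) -> float:
    """Fraction of positions at which two equal-length sequences agree."""
    if len(designed) != len(native):
        raise ValueError("sequences must have the same length")
    matches = sum(a == b for a, b in zip(designed, native))
    return matches / len(native)
```

A perfect superposition of every residue gives a TM-score of 1.0, while aligning only half of a 100-residue target perfectly gives 0.5, which illustrates why TM-score rewards global fold agreement rather than local matches.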
Affiliation(s)
- F Adriaan Lategan
- Center for Bioinformatics and Computational Biology, Stellenbosch University, Stellenbosch, 7600, South Africa
- Caroline Schreiber
- Center for Bioinformatics and Computational Biology, Stellenbosch University, Stellenbosch, 7600, South Africa
- Hugh G Patterton
- Center for Bioinformatics and Computational Biology, Stellenbosch University, Stellenbosch, 7600, South Africa

9
Ramans-Harborough S, Kalverda AP, Manfield IW, Thompson GS, Kieffer M, Uzunova V, Quareshy M, Prusinska JM, Roychoudhry S, Hayashi KI, Napier R, del Genio C, Kepinski S. Intrinsic disorder and conformational coexistence in auxin coreceptors. Proc Natl Acad Sci U S A 2023; 120:e2221286120. [PMID: 37756337 PMCID: PMC10556615 DOI: 10.1073/pnas.2221286120] [Received: 12/21/2022] [Accepted: 07/17/2023] [Indexed: 09/29/2023]
Abstract
AUXIN/INDOLE 3-ACETIC ACID (Aux/IAA) transcriptional repressor proteins and the TRANSPORT INHIBITOR RESISTANT 1/AUXIN SIGNALING F-BOX (TIR1/AFB) proteins to which they bind act as auxin coreceptors. While the structure of TIR1 has been solved, structural characterization of the regions of the Aux/IAA protein responsible for auxin perception has been complicated by their predicted disorder. Here, we use NMR, circular dichroism (CD), and molecular dynamics (MD) simulation to investigate the N-terminal domains of the Aux/IAA protein IAA17/AXR3. We show that despite the conformational flexibility of the region, a critical W-P bond in the core of the Aux/IAA degron motif occurs at a strikingly high (1:1) ratio of cis to trans isomers, consistent with the requirement of the cis conformer for formation of the fully docked receptor complex. We show that the N-terminal half of AXR3 is a mixture of multiple transiently structured conformations with a propensity for two predominant and distinct conformational subpopulations within the overall ensemble. These two states were modeled together with the C-terminal PB1 domain to provide the first complete simulation of an Aux/IAA. Using MD to recreate the assembly of each complex in the presence of auxin, we showed that both structural arrangements engage with the TIR1 receptor, and contact maps from the simulations closely match the observed decreases in NMR signal. Together, our results and approach provide a platform for exploring the functional significance of variation in the Aux/IAA coreceptor family and for understanding the role of intrinsic disorder in auxin signal transduction and other signaling systems.
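The meaning of a 1:1 cis:trans ratio can be made explicit with the textbook relation between an equilibrium constant and a standard free-energy difference (added here for illustration, not a calculation from the paper): a ratio of 1 means the two isomers are equally populated and neither is thermodynamically favored under the measurement conditions.

```latex
K = \frac{[\text{cis}]}{[\text{trans}]} = 1, \qquad
\Delta G^\circ_{\text{trans}\to\text{cis}} = -RT \ln K = 0, \qquad
f_{\text{cis}} = \frac{K}{1 + K} = 0.5
```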
Affiliation(s)
- Sigurd Ramans-Harborough
- School of Biology, Faculty of Biological Sciences, University of Leeds, Leeds LS2 9JT, United Kingdom
- Arnout P. Kalverda
- Astbury Centre for Structural Molecular Biology, Faculty of Biological Sciences, University of Leeds, Leeds LS2 9JT, United Kingdom
- Iain W. Manfield
- Astbury Centre for Structural Molecular Biology, Faculty of Biological Sciences, University of Leeds, Leeds LS2 9JT, United Kingdom
- Gary S. Thompson
- Wellcome Biological Nuclear Magnetic Resonance Facility, Division of Natural Sciences, University of Kent, Canterbury CT2 7NJ, United Kingdom
- Martin Kieffer
- School of Biology, Faculty of Biological Sciences, University of Leeds, Leeds LS2 9JT, United Kingdom
- Veselina Uzunova
- School of Life Sciences, University of Warwick, Coventry CV4 7AL, United Kingdom
- Mussa Quareshy
- School of Life Sciences, University of Warwick, Coventry CV4 7AL, United Kingdom
- Suruchi Roychoudhry
- School of Biology, Faculty of Biological Sciences, University of Leeds, Leeds LS2 9JT, United Kingdom
- Ken-ichiro Hayashi
- Department of Bioscience, Okayama University of Science, Okayama 700-0005, Japan
- Richard Napier
- School of Life Sciences, University of Warwick, Coventry CV4 7AL, United Kingdom
- Charo del Genio
- Centre for Fluid and Complex Systems, Coventry University, Coventry CV1 5FB, United Kingdom
- Stefan Kepinski
- School of Biology, Faculty of Biological Sciences, University of Leeds, Leeds LS2 9JT, United Kingdom

10
De Jesus DF, Kimura T, Gupta MK, Kulkarni RN. NREP contributes to development of NAFLD by regulating one-carbon metabolism in primary human hepatocytes. Cell Chem Biol 2023; 30:1144-1155.e4. [PMID: 37354909 PMCID: PMC10529627 DOI: 10.1016/j.chembiol.2023.06.001] [Received: 02/07/2022] [Revised: 03/06/2023] [Accepted: 06/01/2023] [Indexed: 06/26/2023]
Abstract
Non-alcoholic fatty liver disease (NAFLD) is the most common cause of chronic liver disease. We recently discovered that neuronal regeneration-related protein (NREP/P311), an epigenetically regulated gene reprogrammed by parental metabolic syndrome, is downregulated in human NAFLD. To investigate the impact of NREP insufficiency, we used RNA-sequencing, lipidomics, and antibody microarrays on primary human hepatocytes. NREP knockdown induced transcriptomic remodeling that overlapped with key pathways impacted in human steatosis and steatohepatitis. Additionally, we observed enrichment of pathways involving phosphatidylinositol signaling and one-carbon metabolism. Lipidomics analyses also revealed an increase in cholesterol esters and triglycerides and decreased phosphatidylcholine levels in NREP-deficient hepatocytes. Signalomics identified calcium signaling as a potential mediator of NREP insufficiency's effects. Our results, together with the encouraging observation that several single nucleotide polymorphisms (SNPs) spanning the NREP locus are associated with metabolic traits, provide a strong rationale for targeting hepatic NREP to improve NAFLD pathophysiology.
Affiliation(s)
- Dario F De Jesus
- Islet Cell and Regenerative Biology, Joslin Diabetes Center, Department of Medicine, Beth Israel Deaconess Medical Center, Harvard Stem Cell Institute, and Harvard Medical School, Boston, MA, USA
- Tomohiko Kimura
- Islet Cell and Regenerative Biology, Joslin Diabetes Center, Department of Medicine, Beth Israel Deaconess Medical Center, Harvard Stem Cell Institute, and Harvard Medical School, Boston, MA, USA
- Manoj K Gupta
- Islet Cell and Regenerative Biology, Joslin Diabetes Center, Department of Medicine, Beth Israel Deaconess Medical Center, Harvard Stem Cell Institute, and Harvard Medical School, Boston, MA, USA
- Rohit N Kulkarni
- Islet Cell and Regenerative Biology, Joslin Diabetes Center, Department of Medicine, Beth Israel Deaconess Medical Center, Harvard Stem Cell Institute, and Harvard Medical School, Boston, MA, USA

11
Hamamsy T, Morton JT, Blackwell R, Berenberg D, Carriero N, Gligorijevic V, Strauss CEM, Leman JK, Cho K, Bonneau R. Protein remote homology detection and structural alignment using deep learning. Nat Biotechnol 2023:10.1038/s41587-023-01917-2. [PMID: 37679542 DOI: 10.1038/s41587-023-01917-2] [Received: 09/07/2022] [Accepted: 07/26/2023] [Indexed: 09/09/2023]
Abstract
Exploiting sequence-structure-function relationships in biotechnology requires improved methods for aligning proteins that have low sequence similarity to previously annotated proteins. To address this gap, we develop two deep learning methods, TM-Vec and DeepBLAST. TM-Vec allows searching for structure-structure similarities in large sequence databases. It is trained to accurately predict TM-scores, as a metric of structural similarity, directly from sequence pairs without the need for intermediate computation or solution of structures. Once structurally similar proteins have been identified, DeepBLAST can structurally align them using only sequence information by identifying structurally homologous regions between proteins. DeepBLAST outperforms traditional sequence alignment methods and performs similarly to structure-based alignment methods. We show the merits of TM-Vec and DeepBLAST on a variety of datasets, including better identification of remotely homologous proteins compared with state-of-the-art sequence alignment and structure prediction methods.
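Embedding-based structural search of the kind TM-Vec enables reduces, at query time, to a nearest-neighbour lookup over vectors. The abstract does not describe the search procedure, so the following is a generic sketch under stated assumptions: each sequence has already been encoded as a fixed-length vector by some trained model (omitted here), and cosine similarity between vectors stands in for the predicted TM-score. The function name is hypothetical, and for very large databases an approximate nearest-neighbour index would replace this brute-force scan.

```python
import numpy as np


def top_k_by_cosine(query: np.ndarray, database: np.ndarray, k: int = 5):
    """Rank database vectors by cosine similarity to a query vector.

    query: shape (D,); database: shape (N, D). Returns the indices of the k
    most similar rows (best first) and their cosine similarities.
    """
    q = query / np.linalg.norm(query)
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    sims = db @ q                   # cosine similarity of every row to the query
    k = min(k, len(sims))
    order = np.argsort(-sims)[:k]   # indices sorted by descending similarity
    return order, sims[order]
```

With the database rows (1, 0), (0, 1), and (1, 1) and the query (1, 0), the two best hits are rows 0 and 2, with similarities 1.0 and about 0.707.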
Affiliation(s)
- Tymor Hamamsy
- Center for Data Science, New York University, New York, NY, USA
- James T Morton
- Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, USA
- Biostatistics and Bioinformatics Branch, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD, USA
- Robert Blackwell
- Scientific Computing Core, Flatiron Institute, Simons Foundation, New York, NY, USA
- Daniel Berenberg
- Department of Computer Science, Courant Institute of Mathematical Sciences, New York University, New York, NY, USA
- Prescient Design, New York, NY, USA
- Nicholas Carriero
- Scientific Computing Core, Flatiron Institute, Simons Foundation, New York, NY, USA
- Julia Koehler Leman
- Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, USA
- Kyunghyun Cho
- Center for Data Science, New York University, New York, NY, USA.
- Department of Computer Science, Courant Institute of Mathematical Sciences, New York University, New York, NY, USA.
- Prescient Design, New York, NY, USA.
- CIFAR, Toronto, Ontario, Canada.
- Richard Bonneau
- Center for Data Science, New York University, New York, NY, USA.
- Department of Computer Science, Courant Institute of Mathematical Sciences, New York University, New York, NY, USA.
- Prescient Design, New York, NY, USA.
- Department of Biology, New York University, New York, NY, USA.
12
Kandathil SM, Lau AM, Jones DT. Machine learning methods for predicting protein structure from single sequences. Curr Opin Struct Biol 2023; 81:102627. [PMID: 37320955 DOI: 10.1016/j.sbi.2023.102627] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2023] [Revised: 05/17/2023] [Accepted: 05/17/2023] [Indexed: 06/17/2023]
Abstract
Recent breakthroughs in protein structure prediction have increasingly relied on deep neural networks. These methods are notable in that they produce 3-D atomic coordinates as a direct output of the network, a feature that presents many advantages. Although most techniques of this type use multiple sequence alignments as their primary input, a new wave of methods has attempted to use single sequences alone as input. We discuss the make-up and operating principles of these models and highlight new developments in these areas, as well as areas for future development.
Affiliation(s)
- Shaun M Kandathil
- Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, United Kingdom
- Andy M Lau
- Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, United Kingdom
- David T Jones
- Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, United Kingdom.
13
Watanabe N, Kuriya Y, Murata M, Yamamoto M, Shimizu M, Araki M. Different Recognition of Protein Features Depending on Deep Learning Models: A Case Study of Aromatic Decarboxylase UbiD. Biology (Basel) 2023; 12:795. [PMID: 37372080 DOI: 10.3390/biology12060795] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2023] [Revised: 05/17/2023] [Accepted: 05/29/2023] [Indexed: 06/29/2023]
Abstract
The number of unannotated protein sequences is increasing explosively owing to genome sequencing technology. A more comprehensive understanding of protein functions for protein annotation requires the discovery of new features that cannot be captured by conventional methods. Deep learning can extract important features from input data and predict protein functions based on those features. Here, protein feature vectors generated by three deep learning models are analyzed using Integrated Gradients to explore important features of amino acid sites. As a case study, prediction and feature extraction models for UbiD enzymes were built using these models. The important amino acid residues extracted from the models differed from the secondary structures, conserved regions and active sites known for UbiD. Interestingly, different amino acid residues within UbiD sequences were regarded as important depending on the type of model and sequence. The Transformer models focused on more specific regions than the other models. These results suggest that each deep learning model captures aspects of protein features that differ from existing knowledge and has the potential to discover new principles of protein function. This study will help to extract new protein features for other protein annotations.
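Integrated Gradients attributes a model's output to its input features by averaging gradients along a straight path from a baseline input to the actual input and scaling by the displacement. The minimal pure-Python sketch below illustrates the idea; the toy scoring function, the all-zero baseline and the finite-difference gradients are assumptions for illustration and do not reflect the models or settings used in this study (a real network would supply gradients through automatic differentiation).

```python
def integrated_gradients(f, x, baseline=None, steps=200, eps=1e-6):
    """Approximate Integrated Gradients attributions of f at input x.

    f:        function mapping a list of floats to a float (a stand-in for a
              model's output score)
    x:        input feature vector
    baseline: reference input; defaults to all zeros (an assumption)
    The path integral is approximated with a midpoint Riemann sum over
    `steps` points, and gradients with central finite differences.
    """
    n = len(x)
    if baseline is None:
        baseline = [0.0] * n
    grad_sum = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval of [0, 1]
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i in range(n):
            up = list(point)
            up[i] += eps
            down = list(point)
            down[i] -= eps
            grad_sum[i] += (f(up) - f(down)) / (2 * eps)
    # Scale the path-averaged gradient by the displacement from the baseline.
    return [(xi - b) * g / steps for xi, b, g in zip(x, baseline, grad_sum)]


def toy_score(v):
    """Hypothetical two-feature 'model' used only for demonstration."""
    return v[0] ** 2 + 3.0 * v[1]


if __name__ == "__main__":
    print(integrated_gradients(toy_score, [2.0, 1.0]))
```

A useful sanity check is the completeness property: the attributions sum to the difference between the model output at the input and at the baseline, here approximately 4 + 3 = 7.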
Affiliation(s)
- Naoki Watanabe
- Artificial Intelligence Center for Health and Biomedical Research, National Institutes of Biomedical Innovation, Health and Nutrition, 3-17 Senrioka-shinmachi, Settsu 566-0002, Japan
- Yuki Kuriya
- Artificial Intelligence Center for Health and Biomedical Research, National Institutes of Biomedical Innovation, Health and Nutrition, 3-17 Senrioka-shinmachi, Settsu 566-0002, Japan
- Masahiro Murata
- Graduate School of Science, Technology and Innovation, Kobe University, 1-1 Rokkodai, Nada-Ku, Kobe 657-8501, Japan
- Masaki Yamamoto
- Artificial Intelligence Center for Health and Biomedical Research, National Institutes of Biomedical Innovation, Health and Nutrition, 3-17 Senrioka-shinmachi, Settsu 566-0002, Japan
- Masayuki Shimizu
- Bacchus Bio Innovation Co., Ltd., 6-3-7 Minatojima minami-machi, Kobe 650-0047, Japan
- Michihiro Araki
- Artificial Intelligence Center for Health and Biomedical Research, National Institutes of Biomedical Innovation, Health and Nutrition, 3-17 Senrioka-shinmachi, Settsu 566-0002, Japan
- Graduate School of Science, Technology and Innovation, Kobe University, 1-1 Rokkodai, Nada-Ku, Kobe 657-8501, Japan
- Graduate School of Medicine, Kyoto University, 54 Shogoin-Kawahara-cho, Sakyo-ku, Kyoto 606-8507, Japan
- National Cerebral and Cardiovascular Center, 6-1 Kishibe-Shinmachi, Suita 564-8565, Japan
14
Koehler Leman J, Szczerbiak P, Renfrew PD, Gligorijevic V, Berenberg D, Vatanen T, Taylor BC, Chandler C, Janssen S, Pataki A, Carriero N, Fisk I, Xavier RJ, Knight R, Bonneau R, Kosciolek T. Sequence-structure-function relationships in the microbial protein universe. Nat Commun 2023; 14:2351. [PMID: 37100781 PMCID: PMC10133388 DOI: 10.1038/s41467-023-37896-w] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 05/18/2022] [Accepted: 04/05/2023] [Indexed: 04/28/2023] [Open Access]
Abstract
For the past half-century, structural biologists have relied on the notion that similar protein sequences give rise to similar structures and functions. While this assumption has driven research to explore certain parts of the protein universe, it disregards regions where it does not hold. Here we explore areas of the protein universe where similar protein functions can be achieved by different sequences and different structures. We predict ~200,000 structures for diverse protein sequences from 1,003 representative genomes across the microbial tree of life and annotate them functionally on a per-residue basis. Structure prediction is accomplished using the World Community Grid, a large-scale citizen science initiative. The resulting database of structural models is complementary to the AlphaFold database with regard to domains of life, sequence diversity and sequence length. We identify 148 novel folds and describe examples where we map specific functions to structural motifs. We also show that the structural space is continuous and largely saturated, highlighting the need for a shift in focus across all branches of biology, from obtaining structures to putting them into context and from sequence-based to sequence-structure-function-based meta-omics analyses.
Affiliation(s)
- Julia Koehler Leman
- Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, USA.
- Department of Biology, New York University, New York, NY, USA.
- Pawel Szczerbiak
- Malopolska Centre of Biotechnology, Jagiellonian University, Krakow, Poland
- P Douglas Renfrew
- Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, USA
- Department of Biology, New York University, New York, NY, USA
- Vladimir Gligorijevic
- Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, USA
- Prescient Design, a Genentech accelerator, New York, NY, 10010, USA
- Daniel Berenberg
- Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, USA
- Prescient Design, a Genentech accelerator, New York, NY, 10010, USA
- Center for Data Science, New York University, New York, NY, 10011, USA
- Courant Institute of Mathematical Sciences, Department of Computer Science, New York University, New York, NY, USA
- Tommi Vatanen
- Broad Institute, Cambridge, MA, USA
- Liggins Institute, University of Auckland, Auckland, New Zealand
- Research Program for Clinical and Molecular Metabolism, Faculty of Medicine, 00014 University of Helsinki, Helsinki, Finland
- Bryn C Taylor
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
- In Silico Discovery and External Innovation, Janssen Research and Development, San Diego, CA, 92122, USA
- Chris Chandler
- Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, USA
- Stefan Janssen
- Center for Microbiome Innovation, University of California, San Diego, La Jolla, CA, 92093, USA
- Algorithmic Bioinformatics, Justus Liebig University Giessen, Giessen, Germany
- Andras Pataki
- Scientific Computing Core, Flatiron Institute, Simons Foundation, New York, NY, USA
- Nick Carriero
- Scientific Computing Core, Flatiron Institute, Simons Foundation, New York, NY, USA
- Ian Fisk
- Scientific Computing Core, Flatiron Institute, Simons Foundation, New York, NY, USA
- Ramnik J Xavier
- Broad Institute, Cambridge, MA, USA
- Center for Microbiome Informatics and Therapeutics, MIT, Cambridge, MA, 02139, USA
- Rob Knight
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
- Center for Microbiome Innovation, University of California, San Diego, La Jolla, CA, 92093, USA
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Department of Bioengineering, University of California, San Diego, USA
- Richard Bonneau
- Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, USA
- Department of Biology, New York University, New York, NY, USA
- Center for Data Science, New York University, New York, NY, 10011, USA
- Courant Institute of Mathematical Sciences, Department of Computer Science, New York University, New York, NY, USA
- Prescient Design, a Genentech accelerator, New York, NY, 10010, USA
- Tomasz Kosciolek
- Malopolska Centre of Biotechnology, Jagiellonian University, Krakow, Poland.
15
Bordin N, Sillitoe I, Nallapareddy V, Rauer C, Lam SD, Waman VP, Sen N, Heinzinger M, Littmann M, Kim S, Velankar S, Steinegger M, Rost B, Orengo C. AlphaFold2 reveals commonalities and novelties in protein structure space for 21 model organisms. Commun Biol 2023; 6:160. [PMID: 36755055 DOI: 10.1038/s42003-023-04488-9] [Citation(s) in RCA: 20] [Impact Index Per Article: 20.0] [Received: 07/06/2022] [Accepted: 01/16/2023] [Indexed: 02/10/2023] [Open Access]
Abstract
Deep-learning (DL) methods like DeepMind's AlphaFold2 (AF2) have led to substantial improvements in protein structure prediction. We analyse confident AF2 models from 21 model organisms using a new classification protocol (CATH-Assign) which exploits novel DL methods for structural comparison and classification. Of ~370,000 confident models, 92% can be assigned to 3253 superfamilies in our CATH domain superfamily classification. The remainder cluster into 2367 putative novel superfamilies. Detailed manual analysis of 618 of these, each having at least one human relative, reveals extremely remote homologies and further unusual features. Only 25 novel superfamilies could be confirmed. Although most models map to existing superfamilies, AF2 domains expand CATH by 67%, increase the number of unique 'global' folds by 36%, and will provide valuable insights into structure-function relationships. CATH-Assign will harness the huge expansion in structural data provided by DeepMind to rationalise evolutionary changes driving functional divergence.
16
Bhattacharya S, Roche R, Shuvo MH, Moussad B, Bhattacharya D. Contact-Assisted Threading in Low-Homology Protein Modeling. Methods Mol Biol 2023; 2627:41-59. [PMID: 36959441 DOI: 10.1007/978-1-0716-2974-1_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/25/2023]
Abstract
The ability to predict the three-dimensional structure of a protein from its amino acid sequence has advanced considerably in the recent past. This progress is propelled by the improved accuracy of deep learning-based inter-residue contact map predictors, coupled with the rapid growth of protein sequence databases. A contact map encodes interatomic interaction information that can be exploited for highly accurate prediction of protein structures via contact map threading, even for query proteins that are not amenable to direct homology modeling. As such, contact-assisted threading has attracted considerable research effort. In this chapter, we provide an overview of existing contact-assisted threading methods, highlight recent advances, and discuss current limitations and future prospects in applying contact-assisted threading to improve the accuracy of low-homology protein modeling.
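Contact-map threading compares the contacts predicted for a query sequence with the contacts observed in candidate template structures. The Python sketch below illustrates two basic ingredients under simplifying assumptions: deriving a contact map from coordinates with an 8 Å cutoff (the common CASP-style convention) and counting how many query contacts a given alignment preserves. Both function names and the dictionary representation of the alignment are hypothetical; the sketch does not reproduce any specific method reviewed in the chapter, which typically use predicted contact probabilities, gap penalties and an alignment search.

```python
import math


def contact_map(coords, threshold=8.0, min_separation=6):
    """Return residue pairs (i, j), i < j, that are in contact.

    `coords` is assumed to be a list of (x, y, z) coordinates, one
    representative atom per residue (conventionally C-beta, or C-alpha for
    glycine). Pairs fewer than `min_separation` positions apart in sequence
    are skipped, since near neighbours carry little fold information.
    """
    contacts = set()
    n = len(coords)
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) < threshold:
                contacts.add((i, j))
    return contacts


def contact_overlap(query_contacts, template_contacts, alignment):
    """Count query contacts preserved by an alignment onto a template.

    `alignment` is a hypothetical dict mapping query residue index to
    template residue index; unaligned query residues are simply absent.
    A query contact (i, j) counts if its aligned template residues are also
    in contact. Threading methods search for the alignment that maximises a
    (usually weighted, gap-penalised) score of this kind.
    """
    score = 0
    for i, j in query_contacts:
        if i in alignment and j in alignment:
            a, b = alignment[i], alignment[j]
            if (min(a, b), max(a, b)) in template_contacts:
                score += 1
    return score
```

For example, a fully extended chain with 3.8 Å spacing has no contacts at a sequence separation of 6 or more, whereas a compact fold yields many; the overlap count then rewards alignments that place the query's predicted contacts onto the template's observed ones.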
Affiliation(s)
- Sutanu Bhattacharya
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, USA
- Md Hossain Shuvo
- Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
- Bernard Moussad
- Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
17
Wu F, Jing X, Luo X, Xu J. Improving protein structure prediction using templates and sequence embedding. Bioinformatics 2023; 39:6820926. [PMID: 36355462 PMCID: PMC9805584 DOI: 10.1093/bioinformatics/btac723] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 02/24/2022] [Revised: 10/17/2022] [Accepted: 11/09/2022] [Indexed: 11/12/2022] [Open Access]
Abstract
MOTIVATION Protein structure prediction has been greatly improved by deep learning, but the contribution of different sources of information is yet to be fully understood. This article studies the impact of two kinds of information on structure prediction: templates and multiple sequence alignment (MSA) embedding. Templates have been used by some methods before, such as AlphaFold2, RoseTTAFold and RaptorX. AlphaFold2 and RoseTTAFold only used templates detected by HHsearch, which may not perform very well on some targets. In addition, sequence embedding generated by pre-trained protein language models has not been fully explored for structure prediction. In this article, we study the impact of templates (including the number of templates, the template quality and how the templates are generated) on protein structure prediction accuracy, especially when the templates are detected by methods other than HHsearch. We also study the impact of sequence embedding (generated by MSA Transformer and ESM-1b) on structure prediction. RESULTS We have implemented a deep learning method for protein structure prediction that may take templates and MSA embedding as extra inputs. We study the contribution of templates and MSA embedding to structure prediction accuracy. Our experimental results show that templates can improve structure prediction on 71 of 110 CASP13 (13th Critical Assessment of Structure Prediction) targets and 47 of 91 CASP14 (14th Critical Assessment of Structure Prediction) targets, and templates are particularly useful for targets with similar templates. MSA embedding can improve structure prediction on 63 of 91 CASP14 targets and 87 of 183 Continuous Automated Model Evaluation (CAMEO) targets and is particularly useful for proteins with shallow MSAs. When both templates and MSA embedding are used, our method can predict correct folds (TM-score > 0.5) for 16 of 23 CASP14 free-modeling (FM) targets and 14 of 18 CAMEO targets, outperforming RoseTTAFold by 5% and 7%, respectively.
AVAILABILITY AND IMPLEMENTATION Available at https://github.com/xluo233/RaptorXFold. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Xiao Luo
- Toyota Technological Institute at Chicago, Chicago, IL 60637, USA
- Jinbo Xu
- Corresponding author
18
Mufassirin MMM, Newton MAH, Sattar A. Artificial intelligence for template-free protein structure prediction: a comprehensive review. Artif Intell Rev 2022. [DOI: 10.1007/s10462-022-10350-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2022]
19
Peng Z, Wang W, Han R, Zhang F, Yang J. Protein structure prediction in the deep learning era. Curr Opin Struct Biol 2022; 77:102495. [PMID: 36371845 DOI: 10.1016/j.sbi.2022.102495] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 08/11/2022] [Revised: 10/03/2022] [Accepted: 10/04/2022] [Indexed: 11/11/2022]
Abstract
Significant advances have been achieved in protein structure prediction, especially with the recent development of the AlphaFold2 and RoseTTAFold systems. This article reviews the progress in deep learning-based protein structure prediction methods over the past two years. First, we divide the representative methods into two categories: the two-step approach and the end-to-end approach. Then, we show that the two-step approach can achieve accuracy similar to that of the state-of-the-art end-to-end approach, AlphaFold2. Compared with the end-to-end approach, the two-step approach requires fewer computing resources. We conclude that it is valuable to keep developing both approaches. Finally, a few outstanding challenges in function-oriented protein structure prediction are pointed out for future development.
20
Nyaribo CM, Ng'ong'a FA, Nyanjom SG. In silico investigation of acyclovir derivatives potency against herpes simplex virus. Scientific African 2022. [DOI: 10.1016/j.sciaf.2022.e01461] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022] [Open Access]
21
Dade CM, Douzi B, Cambillau C, Ball G, Voulhoux R, Forest KT. The crystal structure of CbpD clarifies substrate-specificity motifs in chitin-active lytic polysaccharide monooxygenases. Acta Crystallogr D Struct Biol 2022; 78:1064-1078. [PMID: 35916229 PMCID: PMC9344471 DOI: 10.1107/s2059798322007033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2022] [Accepted: 07/08/2022] [Indexed: 11/23/2022] [Open Access]
Abstract
The 3 Å resolution crystal structure of the Pseudomonas aeruginosa virulence factor CbpD both supports and challenges the current model of how lytic polysaccharide monooxygenases bind chitin and raises interesting possibilities about how type 2 secretion-system substrates may interact with the secretion machinery. This structure also demonstrates the utility of new, AI-powered protein structure-prediction algorithms in making challenging structural targets tractable. Pseudomonas aeruginosa secretes diverse proteins via its type 2 secretion system, including a 39 kDa chitin-binding protein, CbpD. CbpD has recently been shown to be a lytic polysaccharide monooxygenase (LPMO) active on chitin and to contribute substantially to virulence. To date, no structure of this virulence factor has been reported. Its first two domains are homologous to those found in the crystal structure of Vibrio cholerae GbpA, while the third domain is homologous to the NMR structure of the CBM73 domain of Cellvibrio japonicus CjLPMO10A. Here, the 3.0 Å resolution crystal structure of CbpD, solved by molecular replacement, is reported; this required ab initio models of each CbpD domain generated by the artificial intelligence deep-learning structure-prediction algorithm RoseTTAFold. The structure of CbpD confirms some previously reported substrate-specificity motifs among AA10 LPMOs, while challenging the predictive power of others. Additionally, the structure of CbpD shows that post-translational modifications occur on the chitin-binding surface. Moreover, the structure raises interesting possibilities about how type 2 secretion-system substrates may interact with the secretion machinery and demonstrates the utility of new artificial intelligence protein structure-prediction algorithms in making challenging structural targets tractable.
22
Tomaž Š, Gruden K, Coll A. TGA transcription factors: structural characteristics as basis for functional variability. Front Plant Sci 2022; 13:935819. [PMID: 35958211 PMCID: PMC9360754 DOI: 10.3389/fpls.2022.935819] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 05/04/2022] [Accepted: 07/04/2022] [Indexed: 06/15/2023]
Abstract
TGA transcription factors are essential regulators of various cellular processes, with their activity connected to different hormonal pathways, interacting proteins and regulatory elements. Belonging to the basic region leucine zipper (bZIP) family, TGAs operate by binding to their target DNA sequence as dimers through a conserved bZIP domain. Despite sharing the core DNA-binding sequence, the TGA paralogues exhibit somewhat different DNA-binding preferences. The sequence variability of their N- and C-terminal regions indicates that these regions help define TGA functional specificity through interactions with diverse proteins, which in turn affect their DNA-binding properties. In this review, we provide a concise summary of plant TGA transcription factors from a structural point of view, including the relationship between their structural characteristics and their functional roles in transcription regulation.
Affiliation(s)
- Špela Tomaž
- Department of Biotechnology and Systems Biology, National Institute of Biology, Ljubljana, Slovenia
- Jožef Stefan International Postgraduate School, Ljubljana, Slovenia
- Kristina Gruden
- Department of Biotechnology and Systems Biology, National Institute of Biology, Ljubljana, Slovenia
- Anna Coll
- Department of Biotechnology and Systems Biology, National Institute of Biology, Ljubljana, Slovenia
23
Sircar G, Ghosh N, Saha S. Designing Next-Generation Vaccines Against Common Pan-Allergens Using In Silico Approaches. Monoclon Antib Immunodiagn Immunother 2022; 41:231-242. [PMID: 35852870 DOI: 10.1089/mab.2021.0033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022] [Open Access]
Abstract
Next-generation allergy vaccines are allergen-derived attenuated molecules that can boost an allergen-blocking IgG response. These IgG antibodies are directed specifically toward the IgE epitopes of allergens and interfere with allergen-IgE interaction. Our study is a computational approach to designing such vaccines against four widespread pan-allergen families. Pan-allergens display extensive immunological cross-reactivity due to the presence of conserved IgE and T cell epitopes. In this study, the vaccine design is based on the hapten-carrier concept, in which the carrier protein is an immunogenic component providing T cell help. Either the PreS protein of hepatitis B or cholera enterotoxin B (CTB), each fused with three tetanus toxoid fragments (TTFrC), was used as the carrier. The hapten components are nonanaphylactic peptides (NAPs) derived from experimentally determined antigenic regions of the allergens. The charged residues of NAPs are selectively modified to abolish IgE as well as T cell reactivity, so the NAPs are safe to apply in allergy patients. Various combinations of vaccine constructs (PreS/CTB+TTFrC and NAPs) were designed with intermediate linker motifs. Constructs were screened with a three-step method covering physicochemical parameters, secondary structures, and tertiary structures, using various bioinformatic tools. The final construct with the best quality and stability was selected for each allergen family. The suitability of these constructs for expression in recombinant form was checked at the DNA, RNA, and protein levels. The presence of putative epitopes inducing tolerogenic interleukin-10 was also predicted for these constructs. The present work led to the design of putative vaccines with immunotherapeutic potential and broad applicability for allergic diseases caused by a wide array of cross-reactive allergens.
Affiliation(s)
- Gaurab Sircar
- Department of Botany, Visva-Bharati, Santiniketan, India
- Nandini Ghosh
- Department of Microbiology, Vidyasagar University, Paschim Medinipur, India
- Sudipto Saha
- Division of Bioinformatics, Bose Institute (Centenary Building), Kolkata, India
24
Kagaya Y, Flannery ST, Jain A, Kihara D. ContactPFP: Protein Function Prediction Using Predicted Contact Information. Front Bioinform 2022; 2. [PMID: 35875419 PMCID: PMC9302406 DOI: 10.3389/fbinf.2022.896295] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022] [Open Access]
Abstract
Computational function prediction is one of the most important problems in bioinformatics, as elucidating the function of genes is a central task in molecular biology and genomics. Most existing function prediction methods use protein sequences as the primary source of input information because the sequence is the most readily available information for query proteins. There have been attempts to consider other attributes of query proteins. Among these attributes, the three-dimensional (3D) structure of proteins is known to be very useful in identifying the evolutionary relationships of proteins, from which functional similarity can be inferred. Here, we report a novel protein function prediction method, ContactPFP, which uses predicted residue-residue contact maps as input structural features of query proteins. Although 3D structure information is known to be useful, it has not been routinely used in function prediction because the 3D structure has not been experimentally determined for many proteins. In ContactPFP, we overcome this limitation by using residue-residue contact prediction, which has become increasingly accurate due to rapid development in the protein structure prediction field. ContactPFP takes a query protein sequence as input and uses predicted residue-residue contacts as a proxy for the 3D protein structure. To characterize how predicted contacts contribute to function prediction accuracy, we compared the performance of ContactPFP with several well-established sequence-based function prediction methods. The comparative study revealed the advantages and weaknesses of ContactPFP compared with contemporary sequence-based methods. There were many cases where it showed higher prediction accuracy. We examined factors that affected the accuracy of ContactPFP using several illustrative cases that highlight the strength of our method.
Affiliation(s)
- Yuki Kagaya
- Department of Biological Sciences, Purdue University, West Lafayette, IN, United States
- Sean T. Flannery
- Department of Computer Science, Purdue University, West Lafayette, IN, United States
- Aashish Jain
- Department of Computer Science, Purdue University, West Lafayette, IN, United States
- Daisuke Kihara
- Department of Biological Sciences, Purdue University, West Lafayette, IN, United States
- Department of Computer Science, Purdue University, West Lafayette, IN, United States
- Corresponding author
25
Sen N, Anishchenko I, Bordin N, Sillitoe I, Velankar S, Baker D, Orengo C. Characterizing and explaining the impact of disease-associated mutations in proteins without known structures or structural homologs. Brief Bioinform 2022; 23:6596316. [PMID: 35641150 PMCID: PMC9294430 DOI: 10.1093/bib/bbac187] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 12/05/2021] [Revised: 04/23/2022] [Accepted: 04/27/2022] [Indexed: 12/12/2022] [Open Access]
Abstract
Mutations in human proteins lead to diseases. The structure of these proteins can help in understanding the mechanisms of such diseases and in developing therapeutics against them. With improved deep learning techniques, such as RoseTTAFold and AlphaFold, we can predict the structure of proteins even in the absence of structural homologs. We modeled and extracted the domains from 553 disease-associated human proteins without known protein structures or close homologs in the Protein Data Bank. We noticed that the model quality was higher, and the root-mean-square deviation (RMSD) between AlphaFold and RoseTTAFold models lower, for domains that could be assigned to CATH families than for those that could only be assigned to Pfam families of unknown structure or could not be assigned to either. We predicted ligand-binding sites, protein–protein interfaces and conserved residues in these predicted structures. We then explored whether the disease-associated missense mutations were in the proximity of these predicted functional sites, whether they destabilized the protein structure based on ddG calculations, or whether they were predicted to be pathogenic. We could explain 80% of these disease-associated mutations based on proximity to functional sites, structural destabilization or pathogenicity. Compared with polymorphisms, a larger percentage of disease-associated missense mutations were buried, closer to predicted functional sites, and predicted as destabilizing and pathogenic. Using models from the two state-of-the-art techniques provides better confidence in our predictions, and we explain 93 additional mutations based on RoseTTAFold models that could not be explained based solely on AlphaFold models.
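The agreement between AlphaFold and RoseTTAFold models in this abstract is expressed as an RMSD. As a reminder of what that number measures, here is a minimal Python sketch for two residue-aligned models. It removes translation by centring both coordinate sets but, as a simplifying assumption, does not optimise rotation (the Kabsch algorithm is normally used for that step). The function names are illustrative and are not taken from the authors' pipeline.

```python
import math


def centered(coords):
    """Translate a list of (x, y, z) points so their centroid is the origin."""
    n = len(coords)
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    return [(x - cx, y - cy, z - cz) for x, y, z in coords]


def rmsd(a, b):
    """Root-mean-square deviation between two residue-aligned coordinate sets.

    Both sets are centred on their centroids first, which removes any
    translation. Rotation is NOT optimised: the sketch assumes the models
    already share a common orientation.
    """
    if len(a) != len(b) or not a:
        raise ValueError("expected two non-empty, residue-aligned coordinate lists")
    a, b = centered(a), centered(b)
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(squared / len(a))
```

Because every squared deviation is averaged before the square root is taken, a few badly placed residues (for example, a flexible loop modeled differently by the two methods) can raise the RMSD noticeably even when the core of the domain agrees.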
Affiliation(s)
- Neeladri Sen
- Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
- Ivan Anishchenko
- Department of Biochemistry, University of Washington, Seattle, WA 98195, USA; Institute for Protein Design, University of Washington, Seattle, WA 98195, USA
- Nicola Bordin
- Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
- Ian Sillitoe
- Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
- Sameer Velankar
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- David Baker
- Department of Biochemistry, University of Washington, Seattle, WA 98195, USA; Institute for Protein Design, University of Washington, Seattle, WA 98195, USA; Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
- Christine Orengo
- Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK

26
Zhang H, Huang Y, Bei Z, Ju Z, Meng J, Hao M, Zhang J, Zhang H, Xi W. Inter-Residue Distance Prediction From Duet Deep Learning Models. Front Genet 2022; 13:887491. [PMID: 35651930 PMCID: PMC9148999 DOI: 10.3389/fgene.2022.887491] [Received: 03/01/2022] [Accepted: 03/30/2022] [Indexed: 12/04/2022]
Abstract
Residue distance prediction from sequence is critical for many biological applications, such as protein structure reconstruction, protein–protein interaction prediction, and protein design. However, predicting fine-grained distances between residues with long sequence separations remains challenging. In this study, we propose DuetDis, a method based on duet feature sets and a deep residual network with squeeze-and-excitation (SE), for protein inter-residue distance prediction. DuetDis can learn and fuse features extracted directly or indirectly from whole-genome/metagenomic databases and therefore minimizes information loss by ensembling models trained on different feature sets. We evaluated DuetDis and 11 widely used peer methods on a large-scale test set (610 protein chains). The experimental results suggest that 1) predictions from different feature sets differ markedly; 2) ensembling different feature sets can improve prediction performance; 3) a high-quality multiple sequence alignment (MSA) used for both training and testing can greatly improve prediction performance; and 4) DuetDis is more accurate than peer methods for overall prediction, more reliable in terms of model prediction score, and more robust against shallow MSAs.
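DuetDis combines models trained on different feature sets, but the abstract does not say how their outputs are merged. As a minimal sketch, assume each model emits a probability distribution over shared distance bins for a residue pair and that the ensemble is an unweighted mean; the function names, bin centres and probabilities below are hypothetical.

```python
from typing import List

Distribution = List[float]  # probabilities over distance bins for one residue pair

def ensemble_distributions(predictions: List[Distribution]) -> Distribution:
    """Average binned distance distributions from several models and renormalize."""
    if not predictions:
        raise ValueError("need at least one prediction")
    n_bins = len(predictions[0])
    if any(len(p) != n_bins for p in predictions):
        raise ValueError("all predictions must use the same bins")
    averaged = [sum(p[i] for p in predictions) / len(predictions) for i in range(n_bins)]
    total = sum(averaged)
    return [v / total for v in averaged]

def expected_distance(dist: Distribution, bin_centers: List[float]) -> float:
    """Expected inter-residue distance (in the units of bin_centers)."""
    return sum(p * c for p, c in zip(dist, bin_centers))

# Hypothetical outputs of two models trained on different feature sets (4 bins).
model_a = [0.1, 0.6, 0.2, 0.1]
model_b = [0.3, 0.4, 0.2, 0.1]
combined = ensemble_distributions([model_a, model_b])
centers = [4.0, 8.0, 12.0, 16.0]
print(combined, expected_distance(combined, centers))
```

A weighted mean or a learned fusion layer would be equally plausible designs; the unweighted mean is only the simplest choice for illustration.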
Affiliation(s)
- Huiling Zhang
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- University of Chinese Academy of Sciences, Beijing, China
- Ying Huang
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- University of Chinese Academy of Sciences, Beijing, China
- Zhendong Bei
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- University of Chinese Academy of Sciences, Beijing, China
- Zhen Ju
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- University of Chinese Academy of Sciences, Beijing, China
- Jintao Meng
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- University of Chinese Academy of Sciences, Beijing, China
- Min Hao
- College of Electronic and Information Engineering, Southwest University, Chongqing, China
- Jingjing Zhang
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- University of Chinese Academy of Sciences, Beijing, China
- Haiping Zhang
- University of Chinese Academy of Sciences, Beijing, China
- Wenhui Xi
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- University of Chinese Academy of Sciences, Beijing, China
- Correspondence: Wenhui Xi

27
Zhou X, Peng C, Zheng W, Li Y, Zhang G, Zhang Y. DEMO2: Assemble multi-domain protein structures by coupling analogous template alignments with deep-learning inter-domain restraint prediction. Nucleic Acids Res 2022; 50:W235-W245. [PMID: 35536281 PMCID: PMC9252800 DOI: 10.1093/nar/gkac340] [Received: 03/23/2022] [Revised: 04/13/2022] [Accepted: 04/22/2022] [Indexed: 01/19/2023]
Abstract
Most proteins in nature contain multiple folding units (or domains). The revolutionary success of AlphaFold2 in single-domain structure prediction showed the potential of extending deep-learning techniques to multi-domain structure modeling. This work presents a significantly improved method, DEMO2, which integrates analogous template structural alignments with deep-learning techniques for high-accuracy domain structure assembly. Starting from individual domain models, inter-domain spatial restraints are first predicted with deep residual convolutional networks; full-length structure models are then assembled using L-BFGS simulations guided by a hybrid energy function that combines the deep-learning restraints with analogous multi-domain template alignments searched from the PDB. The output of DEMO2 contains deep-learning inter-domain restraints, top-ranked multi-domain structure templates, and up to five full-length structure models. DEMO2 was tested on a large-scale benchmark and in the blind CASP14 experiment, where it significantly outperformed its predecessor and state-of-the-art protein structure prediction methods. By integrating new deep-learning techniques, DEMO2 should help fill the rapidly increasing gap between the improved ability to determine tertiary structures and the high demand for high-quality multi-domain protein structures. The DEMO2 server is available at https://zhanggroup.org/DEMO/.
Affiliation(s)
- Xiaogen Zhou
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA; College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Chunxiang Peng
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Wei Zheng
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Yang Li
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Guijun Zhang
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA; Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA

28
Mateen RM, Tariq A, Afzal MS, Ali M, Tipu I, Hussain M, Saleem M, Naveed M. TULP3 NLS inhibition: an in silico study to hamper cargo transport to nucleus. J Biomol Struct Dyn 2022:1-9. [PMID: 35510584 DOI: 10.1080/07391102.2022.2070283] [Indexed: 10/18/2022]
Abstract
TULP3 is involved in cell regulation pathways, including transcription and signal transduction. In some pathological states, such as cancers, increased levels of TULP3 have been observed, so it can serve as a potential target to hamper the activation of those pathways. We propose the novel idea of inhibiting the nuclear localization signal (NLS) to interrupt nuclear translocation of TULP3 so that downstream pathway activation is blocked. In the current in silico study, the 3D structure of TULP3 was modeled using 8 different tools: I-TASSER, CABS-FOLD, Phyre2, PSIPRED, RaptorX, Robetta, Rosetta and Prime by Schrödinger. The best structure was selected after quality evaluation by SAVES and used to investigate the NLS sequence. The mapped NLS sequence was then docked with its natural ligand, importin-α, as a control docking to validate the NLS sequence as the binding site. After docking and molecular dynamics (MD) simulation validation, these residues were used as the binding site for subsequent docking studies. Seventy alkaloids were selected after an extensive literature survey and virtually docked with the NLS sequence at the site where the natural ligand importin-α is expected to bind. This study demonstrates virtual inhibition of the NLS sequence and paves the way for future in vivo studies using the NLS as a new drug target for cancer therapeutics. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Rana Muhammad Mateen
- Department of Life Sciences, School of Science, University of Management and Technology, Lahore, Pakistan
- Asma Tariq
- School of Biochemistry & Biotechnology, University of the Punjab, Lahore, Pakistan
- Muhammad Sohail Afzal
- Department of Life Sciences, School of Science, University of Management and Technology, Lahore, Pakistan
- Muhammad Ali
- Department of Life Sciences, School of Science, University of Management and Technology, Lahore, Pakistan
- Imran Tipu
- Department of Life Sciences, School of Science, University of Management and Technology, Lahore, Pakistan
- Mureed Hussain
- Department of Life Sciences, School of Science, University of Management and Technology, Lahore, Pakistan
- Mahjabeen Saleem
- School of Biochemistry & Biotechnology, University of the Punjab, Lahore, Pakistan
- Muhammad Naveed
- Department of Life Sciences, University of Central Punjab, Lahore, Pakistan

29
Darmawan KK, Karagiannis TC, Hughes JG, Small DM, Hung A. Molecular modeling of lactoferrin for food and nutraceutical applications: insights from in silico techniques. Crit Rev Food Sci Nutr 2022; 63:9074-9097. [PMID: 35503258 DOI: 10.1080/10408398.2022.2067824] [Indexed: 12/23/2022]
Abstract
Lactoferrin is a protein, found primarily in milk, that has attracted the interest of the food industry because of its health properties. Nevertheless, the instability of lactoferrin has limited its commercial application. Recent studies have focused on encapsulation to enhance the stability of lactoferrin. However, the molecular basis of the changes in lactoferrin's structural properties and of its interaction with protectants remains poorly understood. Computational approaches have proven useful in understanding the structural properties of molecules and their key binding interactions with other constituents. This review covers the structure and function of lactoferrin and its binding with various molecules for food purposes, with special emphasis on the use of molecular dynamics simulations. The results demonstrate the application of modeling and simulations to determine the key residues of lactoferrin responsible for its stability and for its interactions with other biomolecular components under various conditions, which are also associated with its functional benefits. These findings have also been extended toward the potential creation of enhanced lactoferrin for commercial purposes. This review provides valuable strategies for designing novel nutraceuticals, both for food science practitioners and for those interested in applying computational modeling to food and health purposes.
Affiliation(s)
- Kevion K Darmawan
- School of Science, STEM College, RMIT University, Melbourne, Australia
- Tom C Karagiannis
- Epigenomic Medicine, Department of Diabetes, Central Clinical School, Monash University, Melbourne, Australia
- Department of Clinical Pathology, The University of Melbourne, Melbourne, Australia
- Jeff G Hughes
- School of Science, STEM College, RMIT University, Melbourne, Australia
- Darryl M Small
- School of Science, STEM College, RMIT University, Melbourne, Australia
- Andrew Hung
- School of Science, STEM College, RMIT University, Melbourne, Australia

30
Naveed M, Ali U, Karobari MI, Ahmed N, Mohamed RN, Abullais SS, Kader MA, Marya A, Messina P, Scardina GA. A Vaccine Construction against COVID-19-Associated Mucormycosis Contrived with Immunoinformatics-Based Scavenging of Potential Mucoralean Epitopes. Vaccines (Basel) 2022; 10. [PMID: 35632420 DOI: 10.3390/vaccines10050664] [Received: 03/24/2022] [Revised: 04/16/2022] [Accepted: 04/19/2022] [Indexed: 01/09/2023]
Abstract
Mucormycosis is a group of infections, caused by multiple fungal species, that affect many human organs and are lethal in immunocompromised patients. During the COVID-19 pandemic, the current wave of mucormycosis has challenged medical professionals because its effects are compounded by the severity of COVID-19 infection. The variant of concern, Omicron, has been linked to fatal mucormycosis infections in the US and Asia. Consequently, current postdiagnostic treatments of mucormycosis have been rendered unsatisfactory. A preinfection intervention is therefore needed to prevent lethal infections in immunocompromised individuals. This study proposes a potential vaccine construct targeting the Mucor and Rhizopus species responsible for mucormycosis infections, providing immunoprotection to immunocompromised patients. The vaccine construct, with an antigenicity score of 0.75 and covering, on average, 92-98% of the world population, was designed using an immunoinformatics approach. Molecular interactions with major histocompatibility complex class I (MHC-I), Toll-like receptor 2 (TLR2), and glucose-regulated protein 78 (GRP78), with scores of -896.0, -948.4, and -925.0, respectively, demonstrated its potential to bind human immune receptors. It elicited a strong predicted innate and adaptive immune response in the form of helper T (Th) cells, cytotoxic T (Tc) cells, B cells, natural killer (NK) cells, and macrophages. The vaccine cloned in the pBR322 vector showed positive amplification, further supporting its stability and potential. The proposed construct is a promising first step toward an antimucormycosis vaccine and may help minimize postdiagnostic burdens and failures.

31
Feng SH, Xia CQ, Zhang PD, Shen HB. Ab-Initio Membrane Protein Amphipathic Helix Structure Prediction Using Deep Neural Networks. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:795-805. [PMID: 33026978 DOI: 10.1109/tcbb.2020.3029274] [Indexed: 06/11/2023]
Abstract
Amphipathic helices (AHs) feature segregated polar and nonpolar residues and play important roles in many membrane-associated biological processes by interacting with both the lipid and the soluble phases. Although the AH structure has long been known, few ab initio machine learning-based prediction models have been reported, owing to the limited amount of training data. In this study, we report a new deep learning-based prediction model composed of a residual neural network and an uneven-thresholds decision algorithm. It is constructed on 121 membrane proteins, with 51,640 residue samples in total, curated from an up-to-date membrane protein structure database. Through a rigorous 10-fold nested cross-validation experiment, we demonstrate that our model achieves promising predictions and exceeds current state-of-the-art approaches in this field. This opens a new avenue for accurately predicting AHs. Analysis of the contribution of the input residues, together with several case studies, further reveals the high interpretability and generalizability of our model.

32
Ruffolo JA, Sulam J, Gray JJ. Antibody structure prediction using interpretable deep learning. Patterns (N Y) 2022; 3:100406. [PMID: 35199061 DOI: 10.1016/j.patter.2021.100406] [Received: 07/01/2021] [Revised: 11/03/2021] [Accepted: 11/15/2021] [Indexed: 12/12/2022]
Abstract
Therapeutic antibodies make up a rapidly growing segment of the biologics market. However, rational design of antibodies is hindered by reliance on experimental methods for determining antibody structures. Here, we present DeepAb, a deep learning method for predicting accurate antibody Fv structures from sequence. We evaluate DeepAb on a set of structurally diverse, therapeutically relevant antibodies and find that our method consistently outperforms the leading alternatives. Previous deep learning methods have operated as “black boxes” and offered few insights into their predictions. By introducing a directly interpretable attention mechanism, we show that our network attends to physically important residue pairs (e.g., proximal aromatics and key hydrogen bonding interactions). Finally, we present a novel mutant scoring metric derived from network confidence and show that, for a particular antibody, all eight of the top-ranked mutations improve binding affinity. This model will be useful for a broad range of antibody prediction and design tasks.
Highlights
- DeepAb, a deep learning method for antibody structure, is presented
- Structures from DeepAb are more accurate than alternatives
- Outputs of DeepAb provide interpretable insights into structure predictions
- DeepAb predictions should facilitate design of novel antibody therapeutics
Accurate structure models are critical for understanding the properties of potential therapeutic antibodies. Conventional methods for protein structure determination require significant investments of time and resources and may fail. Although greatly improved, methods for general protein structure prediction still cannot consistently provide the accuracy necessary to understand or design antibodies. We present a deep learning method for antibody structure prediction and demonstrate improvement over alternatives on diverse, therapeutically relevant benchmarks. In addition to its improved accuracy, our method reveals interpretable outputs about specific amino acids and residue interactions that should facilitate design of novel therapeutic antibodies.

33
Bhattacharya S, Roche R, Moussad B, Bhattacharya D. DisCovER: distance- and orientation-based covariational threading for weakly homologous proteins. Proteins 2022; 90:579-588. [PMID: 34599831 PMCID: PMC8738102 DOI: 10.1002/prot.26254] [Received: 07/24/2021] [Revised: 09/22/2021] [Accepted: 09/28/2021] [Indexed: 02/03/2023]
Abstract
Threading a query protein sequence onto a library of weakly homologous structural templates remains challenging, even when sequence-based predicted contact or distance information is used. Contact-assisted or distance-assisted threading methods utilize only the spatial proximity of the interacting residue pairs for template selection and alignment, ignoring their orientation. Moreover, existing threading methods fail to consider the neighborhood effect induced by the query-template alignment. We present a new distance- and orientation-based covariational threading method called DisCovER by effectively integrating information from inter-residue distance and orientation along with the topological network neighborhood of a query-template alignment. Our method first selects a subset of templates using standard profile-based threading coupled with topological network similarity terms to account for the neighborhood effect and subsequently performs distance- and orientation-based query-template alignment using an iterative double dynamic programming framework. Multiple large-scale benchmarking results on query proteins classified as weakly homologous from the continuous automated model evaluation experiment and from the current literature show that our method outperforms several existing state-of-the-art threading approaches, and that the integration of the neighborhood effect with the inter-residue distance and orientation information synergistically contributes to the improved performance of DisCovER. DisCovER is freely available at https://github.com/Bhattacharya-Lab/DisCovER.
Affiliation(s)
- Sutanu Bhattacharya
- Department of Computer Science, Florida Polytechnic University, Lakeland, FL 33805, USA
- Rahmatullah Roche
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
- Bernard Moussad
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA

34
Du Z, Peng Z, Yang J. Toward the assessment of predicted inter-residue distance. Bioinformatics 2022; 38:962-969. [PMID: 34791040 DOI: 10.1093/bioinformatics/btab781] [Received: 07/05/2021] [Revised: 10/07/2021] [Accepted: 11/10/2021] [Indexed: 02/03/2023]
Abstract
MOTIVATION Significant progress has been achieved in distance-based protein folding, owing to improved deep-learning prediction of inter-residue distances. Many efforts have thus been made to improve distance prediction in recent years. However, it remains unclear how best to objectively assess the accuracy of predicted distances. RESULTS A total of 19 metrics were proposed to measure the accuracy of predicted distances. These metrics were discussed and compared quantitatively on three benchmark datasets, with distance and structure models predicted by the trRosetta pipeline. The experiments show that a few metrics, such as distance precision, correlate highly with the model accuracy measure TM-score (Pearson's correlation coefficient >0.7). In addition, the metrics were applied to rank the distance prediction groups in CASP14, and the ranking by our metrics largely coincides with the official one. These data suggest that the proposed metrics are effective for measuring distance prediction. We anticipate that this study paves the way for objectively monitoring progress in inter-residue distance prediction. A web server and a standalone package are provided to implement the proposed metrics. AVAILABILITY AND IMPLEMENTATION http://yanglab.nankai.edu.cn/APD. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
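The agreement reported above between distance metrics and TM-score is expressed as a Pearson correlation coefficient computed across targets. A minimal stdlib sketch of that statistic follows; the per-target metric and TM-score values are invented for illustration and are not data from the study.

```python
import math
from typing import Sequence

def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-target values: a distance-accuracy metric and model TM-score.
distance_precision = [0.42, 0.55, 0.61, 0.73, 0.88]
tm_score = [0.45, 0.52, 0.66, 0.70, 0.85]
r = pearson_r(distance_precision, tm_score)
print(f"Pearson r = {r:.2f}")
```

Python 3.10 and later also offer `statistics.correlation`, which computes the same coefficient; the explicit version keeps each term of the formula visible.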
Affiliation(s)
- Zongyang Du
- School of Mathematical Sciences, Nankai University, Tianjin 300071, China
- Zhenling Peng
- Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao 266237, China
- Jianyi Yang
- School of Mathematical Sciences, Nankai University, Tianjin 300071, China

35
Kandathil SM, Greener JG, Lau AM, Jones DT. Ultrafast end-to-end protein structure prediction enables high-throughput exploration of uncharacterized proteins. Proc Natl Acad Sci U S A 2022; 119:e2113348119. [PMID: 35074909 DOI: 10.1073/pnas.2113348119] [Accepted: 12/07/2021] [Indexed: 12/12/2022]
Abstract
Deep learning-based prediction of protein structure usually begins by constructing a multiple sequence alignment (MSA) containing homologs of the target protein. The most successful approaches combine large feature sets derived from MSAs, and considerable computational effort is spent deriving these input features. We present a method that greatly reduces the amount of preprocessing required for a target MSA, while producing main chain coordinates as a direct output of a deep neural network. The network makes use of just three recurrent networks and a stack of residual convolutional layers, making the predictor very fast to run and easy to install and use. Our approach constructs a directly learned representation of the sequences in an MSA, starting from a one-hot encoding of the sequences. When supplemented with an approximate precision matrix, the learned representation can be used to produce structural models whose accuracy is comparable to or greater than that of our original DMPfold method, while requiring less than a second to produce a typical model. This level of accuracy and speed allows very large-scale three-dimensional modeling of proteins on minimal hardware, and we demonstrate this by producing models for over 1.3 million uncharacterized regions of proteins extracted from the BFD sequence clusters. After constructing an initial set of approximate models, we select a confident subset of over 30,000 models for further refinement and analysis, revealing putative novel protein folds. We also provide updated models for over 5,000 Pfam families studied in the original DMPfold paper.
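The method above starts from a one-hot encoding of the MSA sequences before learning its own representation. The sketch below shows only that encoding step, assuming a 21-symbol alphabet (the 20 standard amino acids plus a gap) and mapping non-standard letters to the gap symbol; neither choice is taken from the paper, and the alignment is hypothetical.

```python
from typing import List

# Assumed alphabet: 20 standard amino acids plus '-' for gaps.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"
INDEX = {aa: i for i, aa in enumerate(ALPHABET)}

def one_hot_msa(msa: List[str]) -> List[List[List[int]]]:
    """Encode an MSA as [sequence][position][symbol] one-hot vectors.

    Non-standard letters (e.g. 'X', 'B') are mapped to the gap symbol,
    a simplifying assumption for this sketch.
    """
    if not msa:
        raise ValueError("empty alignment")
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("aligned sequences must share one length")
    encoded = []
    for seq in msa:
        rows = []
        for residue in seq.upper():
            vec = [0] * len(ALPHABET)
            vec[INDEX.get(residue, INDEX["-"])] = 1
            rows.append(vec)
        encoded.append(rows)
    return encoded

# Hypothetical three-sequence alignment of a short region.
alignment = ["MKV-L", "MRVAL", "MKXAL"]
features = one_hot_msa(alignment)
print(len(features), len(features[0]), len(features[0][0]))  # 3 5 21
```

A production pipeline would typically store this as a dense numeric array rather than nested lists; plain lists keep the sketch dependency-free.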

36
Abstract
Artificial intelligence (AI) is poised to broadly reshape medicine, potentially improving the experiences of both clinicians and patients. We discuss key findings from a 2-year weekly effort to track and share key developments in medical AI. We cover prospective studies and advances in medical image analysis, which have reduced the gap between research and deployment. We also address several promising avenues for novel medical AI research, including non-image data sources, unconventional problem formulations and human-AI collaboration. Finally, we consider serious technical and ethical challenges in issues spanning from data scarcity to racial bias. As these challenges are addressed, AI's potential may be realized, making healthcare more accurate, efficient and accessible for patients worldwide.
Affiliation(s)
- Pranav Rajpurkar
- Department of Biomedical Informatics, Harvard University, Cambridge, MA, USA
- Emma Chen
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Oishi Banerjee
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Eric J Topol
- Scripps Translational Science Institute, San Diego, CA, USA

37
Tran NH, Xu J, Li M. A tale of solving two computational challenges in protein science: neoantigen prediction and protein structure prediction. Brief Bioinform 2022; 23:bbab493. [PMID: 34891158 PMCID: PMC8769896 DOI: 10.1093/bib/bbab493] [Received: 09/08/2021] [Revised: 10/11/2021] [Accepted: 10/26/2021] [Indexed: 12/30/2022]
Abstract
In this article, we review two challenging computational questions in protein science: neoantigen prediction and protein structure prediction. Both topics have seen significant leaps forward by deep learning within the past five years, which immediately unlocked new developments of drugs and immunotherapies. We show that deep learning models offer unique advantages, such as representation learning and multi-layer architecture, which make them an ideal choice to leverage a huge amount of protein sequence and structure data to address those two problems. We also discuss the impact and future possibilities enabled by those two applications, especially how the data-driven approach by deep learning shall accelerate the progress towards personalized biomedicine.
Affiliation(s)
- Jinbo Xu
- Toyota Technological Institute at Chicago, USA
- Ming Li
- University of Waterloo, Canada

38
Fanelli A, Sullivan ML. Tools for protein structure prediction and for molecular docking applied to enzyme active site analysis: A case study using a BAHD hydroxycinnamoyltransferase. Methods Enzymol 2022; 683:41-79. [PMID: 37087195 DOI: 10.1016/bs.mie.2022.10.004] [Indexed: 12/03/2022]
Abstract
Elucidating the structure of an enzyme and how substrates bind to the active site is an important step for understanding its reaction mechanism and function. Nevertheless, the methods available to obtain three-dimensional structures of proteins, such as x-ray crystallography and NMR, can be expensive and time-consuming. Considering this, an alternative is using structural bioinformatic tools to predict the tertiary structure of a protein from its primary sequence, followed by molecular docking of one or more substrates into the enzyme structure model. In the past few years, significant advances have been made in these computational tools, which can give useful information about the active site and enzyme-substrate interactions before the structure can be resolved using physical methods. Here, using common bean (Phaseolus vulgaris) hydroxycinnamoyl-coenzyme A:tetrahydroxyhexanedioic acid hydroxycinnamoyltransferase (HHHT) as an example, we describe methods and workflows for protein structure prediction and molecular docking that can be performed on a personal computer using only open-source tools.
Affiliation(s)
- Amanda Fanelli
- US Dairy Forage Research Center, USDA Agricultural Research Service, Madison, WI, United States
- Michael L Sullivan
- US Dairy Forage Research Center, USDA Agricultural Research Service, Madison, WI, United States

39
Cui Z, Zhang S, Zhang S, Chen B, Zhu Y, Tan T. Green biomanufacturing promoted by automatic retrobiosynthesis planning and computational enzyme design. Chin J Chem Eng 2022; 41:6-21. [DOI: 10.1016/j.cjche.2021.08.017] [Indexed: 11/18/2022]

40
Abstract
MOTIVATION Predicting the native state of a protein has long been considered a gateway problem for understanding protein folding. Recent advances in structural modeling driven by deep learning have achieved unprecedented success at predicting a protein’s crystal structure, but it is not clear whether these models are learning the physics of how proteins dynamically fold into their equilibrium structure or are just accurate knowledge-based predictors of the final state. RESULTS In this work, we compare the pathways generated by state-of-the-art protein structure prediction methods with experimental data on protein folding pathways. The methods considered were AlphaFold 2, RoseTTAFold, trRosetta, RaptorX, DMPfold, EVfold, SAINT2 and Rosetta. We find evidence that their simulated dynamics capture some information about the folding pathway, but their predictive ability is worse than that of a trivial classifier using sequence-agnostic features such as chain length. The folding trajectories produced are also uncorrelated with experimental observables such as intermediate structures and the folding rate constant. These results suggest that recent advances in structure prediction do not yet provide an enhanced understanding of protein folding. AVAILABILITY The data underlying this article are available on GitHub at https://github.com/oxpig/structure-vs-folding/ SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Carlos Outeiral
- Department of Statistics, University of Oxford, Oxford OX1 3PB, UK
- Daniel A Nissley
- Department of Statistics, University of Oxford, Oxford OX1 3PB, UK
41
Schwarz D, Georges G, Kelm S, Shi J, Vangone A, Deane CM. Co-evolutionary distance predictions contain flexibility information. Bioinformatics 2021; 38:65-72. [PMID: 34383892 DOI: 10.1093/bioinformatics/btab562] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 10/14/2020] [Revised: 06/19/2021] [Accepted: 08/10/2021] [Indexed: 02/03/2023]
Abstract
MOTIVATION Co-evolution analysis can be used to accurately predict residue-residue contacts from multiple sequence alignments. The introduction of machine-learning techniques has enabled substantial improvements in precision and a shift from predicting binary contacts to predicting distances between pairs of residues. These developments have significantly improved the accuracy of de novo prediction of static protein structures. With AlphaFold2 lifting the accuracy of some predicted protein models close to experimental levels, structure prediction research will move on to other challenges. One of those areas is the prediction of more than one conformation of a protein. Here, we examine the potential of residue-residue distance predictions to be informative of protein flexibility rather than simply static structure. RESULTS We used DMPfold to predict distance distributions for every residue pair in a set of proteins that showed both rigid and flexible behaviour. Residue pairs that were in contact in at least one reference structure were classified as rigid, flexible or neither. The predicted distance distribution of each residue pair was analysed for local maxima of probability indicating the most likely distance or distances between a pair of residues. We found that rigid residue pairs tended to have only a single local maximum in their predicted distance distributions while flexible residue pairs more often had multiple local maxima. These results suggest that the shape of predicted distance distributions contains information on the rigidity or flexibility of a protein and its constituent residues. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
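The central observation above (rigid pairs tend to show one local maximum in the predicted distance distribution, flexible pairs more often several) can be illustrated with a minimal Python sketch. It assumes the distribution is given as a plain list of per-bin probabilities; the strict-inequality peak definition, the `min_prob` noise floor and the `label_pair` labels are hypothetical choices for illustration, not DMPfold's output format or the authors' exact criteria.

```python
from typing import List


def count_peaks(probs: List[float], min_prob: float = 0.0) -> int:
    """Count local maxima in a binned distance distribution.

    A bin counts as a maximum if its probability is strictly greater than
    each neighbour (edge bins have only one neighbour) and is >= min_prob.
    """
    n = len(probs)
    peaks = 0
    for i, p in enumerate(probs):
        left = probs[i - 1] if i > 0 else float("-inf")
        right = probs[i + 1] if i < n - 1 else float("-inf")
        if p > left and p > right and p >= min_prob:
            peaks += 1
    return peaks


def label_pair(probs: List[float], min_prob: float = 0.05) -> str:
    """Hypothetical labelling rule: one peak -> 'single', several -> 'multiple'."""
    peaks = count_peaks(probs, min_prob)
    if peaks == 0:
        return "none"
    return "single" if peaks == 1 else "multiple"


# Toy distributions over five distance bins (made-up numbers, each sums to 1).
rigid_like = [0.01, 0.05, 0.60, 0.30, 0.04]
flexible_like = [0.02, 0.40, 0.10, 0.35, 0.13]

print(label_pair(rigid_like))     # single
print(label_pair(flexible_like))  # multiple
```

Plateaus of equal probability are deliberately not counted as peaks here; a real analysis would need an explicit policy for ties and for the bin width used by the predictor.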
Affiliation(s)
- Dominik Schwarz
- Department of Statistics, University of Oxford, Oxford OX1 3LB, UK
- Guy Georges
- Department of Computational Engineering and Data Science, Large Molecule Research, Penzberg 82377, Germany
- Sebastian Kelm
- Computer-Aided Drug Design, UCB Pharma, Slough SL1 3WE, UK
- Jiye Shi
- Computer-Aided Drug Design, UCB Pharma, Slough SL1 3WE, UK
- Anna Vangone
- Department of Computational Engineering and Data Science, Large Molecule Research, Penzberg 82377, Germany
42
Liu J, Zhao KL, He GX, Wang LJ, Zhou XG, Zhang GJ. A de novo protein structure prediction by iterative partition sampling, topology adjustment and residue-level distance deviation optimization. Bioinformatics 2021; 38:99-107. [PMID: 34459867 DOI: 10.1093/bioinformatics/btab620] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 04/25/2021] [Revised: 07/23/2021] [Accepted: 08/25/2021] [Indexed: 02/03/2023]
Abstract
MOTIVATION With the great progress of deep learning-based inter-residue contact/distance prediction, the discrete space formed by fragment assembly cannot satisfy the distance constraint well. Thus, the optimal solution of the continuous space may not be achieved. Designing an effective closed-loop continuous dihedral angle optimization strategy that complements the discrete fragment assembly is crucial to improve the performance of the distance-assisted fragment assembly method. RESULTS In this article, we proposed a de novo protein structure prediction method called IPTDFold based on closed-loop iterative partition sampling, topology adjustment and residue-level distance deviation optimization. First, local dihedral angle crossover and mutation operators are designed to explore the conformational space extensively and achieve information exchange between the conformations in the population. Then, the dihedral angle rotation model of loop region with partial inter-residue distance constraints is constructed, and the rotation angle satisfying the constraints is obtained by differential evolution algorithm, so as to adjust the spatial position relationship between the secondary structures. Finally, the residue distance deviation is evaluated according to the difference between the conformation and the predicted distance, and the dihedral angle of the residue is optimized with biased probability. The final model is generated by iterating the above three steps. IPTDFold is tested on 462 benchmark proteins, 24 FM targets of CASP13 and 20 FM targets of CASP14. Results show that IPTDFold is significantly superior to the distance-assisted fragment assembly method Rosetta_D (Rosetta with distance). In particular, the prediction accuracy of IPTDFold does not decrease as the length of the protein increases. 
When using the same FastRelax protocol, the prediction accuracy of IPTDFold is significantly superior to that of trRosetta without orientation constraints, and is equivalent to that of the full version of trRosetta. AVAILABILITY AND IMPLEMENTATION The source code and executable are freely available at https://github.com/iobio-zjut/IPTDFold. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
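IPTDFold's final step evaluates each residue by how far the model's distances deviate from the predicted ones and then optimizes that residue's dihedral angles with biased probability. The Python sketch below covers only the scoring and the biased residue selection, under two stated assumptions: the per-residue deviation is the mean absolute difference over the restrained pairs a residue takes part in, and selection probability is proportional to that deviation. Both formulas and the helper names are hypothetical, not the published IPTDFold definitions, and the dihedral update itself is not shown.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def residue_deviation(coords: Sequence[Coord],
                      predicted: Dict[Tuple[int, int], float]) -> List[float]:
    """Per-residue mean |model distance - predicted distance| over restrained pairs."""
    total = [0.0] * len(coords)
    count = [0] * len(coords)
    for (i, j), d_pred in predicted.items():
        dev = abs(math.dist(coords[i], coords[j]) - d_pred)
        for k in (i, j):  # credit the deviation to both residues of the pair
            total[k] += dev
            count[k] += 1
    return [t / c if c else 0.0 for t, c in zip(total, count)]


def pick_residue(deviations: Sequence[float], rng: random.Random) -> int:
    """Biased choice: residues with larger deviation are picked more often."""
    if sum(deviations) == 0:
        return rng.randrange(len(deviations))  # no signal: fall back to uniform
    return rng.choices(range(len(deviations)), weights=deviations, k=1)[0]


# Toy model: three residue positions and three predicted distances (in angstroms).
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
predicted = {(0, 1): 3.0, (0, 2): 4.0, (1, 2): 4.0}

devs = residue_deviation(coords, predicted)
print(devs)  # [0.5, 0.0, 0.5]
print(pick_residue(devs, random.Random(0)))
```

A residue whose restraints are already satisfied (residue 1 in the toy example) gets zero weight and is never chosen, so sampling effort concentrates on the parts of the model that disagree with the predicted distances.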
Affiliation(s)
- Jun Liu
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Kai-Long Zhao
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Guang-Xing He
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Liu-Jing Wang
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Xiao-Gen Zhou
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109-2218, USA
- Gui-Jun Zhang
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
43
Yang P, Zheng W, Ning K, Zhang Y. Decoding the link of microbiome niches with homologous sequences enables accurately targeted protein structure prediction. Proc Natl Acad Sci U S A 2021; 118:e2110828118. [PMID: 34873061 DOI: 10.1073/pnas.2110828118] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Accepted: 10/27/2021] [Indexed: 12/26/2022]
Abstract
Information derived from metagenome sequences through deep-learning techniques has significantly improved the accuracy of template-free protein structure modeling. However, most deep learning-based modeling studies rely on blind sequence database searches and suffer from low efficiency in computational resource utilization and model construction, especially when the sequence library becomes prohibitively large. We proposed a MetaSource model built on 4.25 billion microbiome sequences from four major biomes (Gut, Lake, Soil, and Fermentor) to decode the inherent linkage of microbial niches with protein homologous families. Large-scale protein family folding experiments on 8,700 unknown Pfam families showed that a microbiome-targeted approach with multiple sequence alignments constructed from individual MetaSource biomes requires more than threefold less computer memory and CPU (central processing unit) time but generates contact-map and three-dimensional structure models with significantly higher accuracy, compared with using combined metagenome datasets. These results demonstrate an avenue to bridge the gap between the rapidly increasing metagenome databases and the limited computing resources for efficient genome-wide database mining, which provides a useful bluebook to guide future microbiome sequence database and modeling development for high-accuracy protein structure and function prediction.
44
Su H, Wang W, Du Z, Peng Z, Gao S, Cheng M, Yang J. Improved Protein Structure Prediction Using a New Multi-Scale Network and Homologous Templates. Adv Sci (Weinh) 2021; 8:e2102592. [PMID: 34719864 PMCID: PMC8693034 DOI: 10.1002/advs.202102592] [Citation(s) in RCA: 41] [Impact Index Per Article: 13.7] [Received: 06/19/2021] [Revised: 09/12/2021] [Indexed: 06/04/2023]
Abstract
The accuracy of de novo protein structure prediction has improved considerably in recent years, mostly due to the introduction of deep learning techniques. In this work, trRosettaX, an improved version of trRosetta for protein structure prediction, is presented. The improvement over trRosetta is twofold. The first is the application of a new multi-scale network, i.e., Res2Net, for improved prediction of inter-residue geometries, including distance and orientations. The second is an attention-based module that exploits multiple homologous templates to further increase accuracy. Compared with trRosetta, trRosettaX improves the contact precision by 6% and 8% on the free modeling targets of CASP13 and CASP14, respectively. A preliminary version of trRosettaX ranked as one of the top server groups in CASP14's blind test. An additional benchmark test on 161 targets from CAMEO (between Jun and Sep 2020) shows that trRosettaX achieves an average TM-score ≈0.8, outperforming the top groups in CAMEO. These data suggest the effectiveness of using the multi-scale network and the benefit of incorporating homologous templates into the network. The trRosettaX algorithm has been incorporated into the trRosetta server since Nov 2020. The web server and the training and inference code are available at: https://yanglab.nankai.edu.cn/trRosetta/.
Affiliation(s)
- Hong Su
- School of Mathematical Sciences, Nankai University, Tianjin 300071, China
- Wenkai Wang
- School of Mathematical Sciences, Nankai University, Tianjin 300071, China
- Zongyang Du
- School of Mathematical Sciences, Nankai University, Tianjin 300071, China
- Zhenling Peng
- Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao 266237, China
- Shang-Hua Gao
- College of Computer Science, Nankai University, Tianjin 300071, China
- Ming-Ming Cheng
- College of Computer Science, Nankai University, Tianjin 300071, China
- Jianyi Yang
- Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao 266237, China
45
Du Z, Su H, Wang W, Ye L, Wei H, Peng Z, Anishchenko I, Baker D, Yang J. The trRosetta server for fast and accurate protein structure prediction. Nat Protoc 2021; 16:5634-5651. [PMID: 34759384 DOI: 10.1038/s41596-021-00628-9] [Citation(s) in RCA: 204] [Impact Index Per Article: 68.0] [Received: 01/19/2021] [Accepted: 08/31/2021] [Indexed: 11/10/2022]
Abstract
The trRosetta (transform-restrained Rosetta) server is a web-based platform for fast and accurate protein structure prediction, powered by deep learning and Rosetta. Given a protein's amino acid sequence, a deep neural network first predicts the inter-residue geometries, including distance and orientations. The predicted geometries are then transformed into restraints to guide structure prediction by direct energy minimization, which is implemented within the Rosetta framework. The trRosetta server distinguishes itself from other similar structure prediction servers by rapid and accurate de novo structure prediction. As an illustration, trRosetta was applied to two Pfam families with unknown structures, for which the predicted de novo models were estimated to have high accuracy. In addition, to take advantage of homology modeling, homologous templates are automatically used as additional inputs to the network. In general, it takes ~1 h to predict the final structure for a typical protein with ~300 amino acids, using a maximum of 10 CPU cores in parallel in our cluster system. To enable large-scale structure modeling, a downloadable package of trRosetta with open-source code is also available. Detailed guidance for using the package is provided in this protocol. The server and the package are available at https://yanglab.nankai.edu.cn/trRosetta/ and https://yanglab.nankai.edu.cn/trRosetta/download/, respectively.
Affiliation(s)
- Zongyang Du
- School of Mathematical Sciences, Nankai University, Tianjin, China
- Hong Su
- School of Mathematical Sciences, Nankai University, Tianjin, China
- Wenkai Wang
- School of Mathematical Sciences, Nankai University, Tianjin, China
- Lisha Ye
- School of Mathematical Sciences, Nankai University, Tianjin, China
- Hong Wei
- School of Mathematical Sciences, Nankai University, Tianjin, China
- Zhenling Peng
- Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao, China
- Ivan Anishchenko
- Department of Biochemistry, University of Washington, Seattle, WA, USA; Institute for Protein Design, University of Washington, Seattle, WA, USA
- David Baker
- Department of Biochemistry, University of Washington, Seattle, WA, USA; Institute for Protein Design, University of Washington, Seattle, WA, USA; Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- Jianyi Yang
- Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao, China
46
Piersimoni L, Kastritis PL, Arlt C, Sinz A. Cross-Linking Mass Spectrometry for Investigating Protein Conformations and Protein-Protein Interactions─A Method for All Seasons. Chem Rev 2021; 122:7500-7531. [PMID: 34797068 DOI: 10.1021/acs.chemrev.1c00786] [Citation(s) in RCA: 71] [Impact Index Per Article: 23.7] [Indexed: 12/20/2022]
Abstract
Mass spectrometry (MS) has become one of the key technologies of structural biology. In this review, the contributions of chemical cross-linking combined with mass spectrometry (XL-MS) for studying three-dimensional structures of proteins and for investigating protein-protein interactions are outlined. We summarize the most important cross-linking reagents, software tools, and XL-MS workflows and highlight prominent examples for characterizing proteins, their assemblies, and interaction networks in vitro and in vivo. Computational modeling plays a crucial role in deriving 3D-structural information from XL-MS data. Integrating XL-MS with other techniques of structural biology, such as cryo-electron microscopy, has been successful in addressing biological questions that to date could not be answered. XL-MS is therefore expected to play an increasingly important role in structural biology in the future.
Affiliation(s)
- Lolita Piersimoni
- Department of Pharmaceutical Chemistry & Bioanalytics, Institute of Pharmacy, Kurt-Mothes-Strasse 3, D-06120 Halle (Saale), Germany; Center for Structural Mass Spectrometry, Kurt-Mothes-Strasse 3, D-06120 Halle (Saale), Germany
- Panagiotis L Kastritis
- Interdisciplinary Research Center HALOmem, Charles Tanford Protein Center, Kurt-Mothes-Strasse 3a, D-06120 Halle (Saale), Germany; Institute of Biochemistry and Biotechnology, Martin Luther University Halle-Wittenberg, Kurt-Mothes-Strasse 3, D-06120 Halle (Saale), Germany; Biozentrum, Weinbergweg 22, D-06120 Halle (Saale), Germany
- Christian Arlt
- Department of Pharmaceutical Chemistry & Bioanalytics, Institute of Pharmacy, Kurt-Mothes-Strasse 3, D-06120 Halle (Saale), Germany; Center for Structural Mass Spectrometry, Kurt-Mothes-Strasse 3, D-06120 Halle (Saale), Germany
- Andrea Sinz
- Department of Pharmaceutical Chemistry & Bioanalytics, Institute of Pharmacy, Kurt-Mothes-Strasse 3, D-06120 Halle (Saale), Germany; Center for Structural Mass Spectrometry, Kurt-Mothes-Strasse 3, D-06120 Halle (Saale), Germany
47
Pozzati G, Zhu W, Bassot C, Lamb J, Kundrotas P, Elofsson A. Limits and potential of combined folding and docking. Bioinformatics 2021; 38:954-961. [PMID: 34788800 PMCID: PMC8796369 DOI: 10.1093/bioinformatics/btab760] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 06/16/2021] [Revised: 09/23/2021] [Accepted: 11/02/2021] [Indexed: 02/03/2023]
Abstract
MOTIVATION In the last decade, de novo protein structure prediction accuracy for individual proteins has improved significantly by utilising deep learning (DL) methods for harvesting the co-evolution information from large multiple sequence alignments (MSAs). The same approach can, in principle, also be used to extract information about evolutionary-based contacts across protein-protein interfaces. However, most earlier studies have not used the latest DL methods for inter-chain contact distance prediction. This article introduces a fold-and-dock method based on predicted residue-residue distances with trRosetta. RESULTS The method can simultaneously predict the tertiary and quaternary structure of a protein pair, even when the structures of the monomers are not known. The straightforward application of this method to a standard dataset for protein-protein docking yielded limited success. However, using alternative methods for generating MSAs allowed us to accurately dock significantly more proteins. We also introduced a novel scoring function, PconsDock, that accurately separates 98% of correctly and incorrectly folded and docked proteins. The average performance of the method is comparable to the use of traditional, template-based or ab initio shape-complementarity-only docking methods. Moreover, the results of conventional and fold-and-dock approaches are complementary, and thus a combined docking pipeline could increase overall docking success significantly. This methodology contributed to the best model for one of the CASP14 oligomeric targets, H1065. AVAILABILITY AND IMPLEMENTATION All scripts for predictions and analysis are available from https://github.com/ElofssonLab/bioinfo-toolbox/ and https://gitlab.com/ElofssonLab/benchmark5/. All models, joined alignments and evaluation results are available from the following figshare repository https://doi.org/10.6084/m9.figshare.14654886.v2.
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- John Lamb
- Science for Life Laboratory and Department of Biochemistry and Biophysics, Stockholm University, 171 21 Solna, Sweden
- Petras Kundrotas
- Science for Life Laboratory and Department of Biochemistry and Biophysics, Stockholm University, 171 21 Solna, Sweden; Center for Computational Biology, The University of Kansas, Lawrence, KS 66047, USA
48
Abstract
During their evolution, proteins explore sequence space via an interplay between random mutations and phenotypic selection. Here, we build upon recent progress in reconstructing data-driven fitness landscapes for families of homologous proteins to propose stochastic models of experimental protein evolution. These models quantitatively predict important features of experimentally evolved sequence libraries, such as fitness distributions and position-specific mutational spectra. They also allow us to efficiently simulate sequence libraries for a vast array of combinations of experimental parameters like sequence divergence, selection strength, and library size. We showcase the potential of the approach in reanalyzing two recent experiments to determine protein structure from signals of epistasis emerging in experimental sequence libraries. To be detectable, these signals require sufficiently large and sufficiently diverged libraries. Our modeling framework offers a quantitative explanation for different outcomes of recently published experiments. Furthermore, we can forecast the outcome of time- and resource-intensive evolution experiments, thereby opening a way to computationally optimize experimental protocols.
Affiliation(s)
- M Bisardi
- Laboratoire de Physique de l'Ecole Normale Supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université de Paris, Paris, F-75005, France; Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie Computationnelle et Quantitative LCQB, Paris, F-75005, France
- J Rodriguez-Rivas
- Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie Computationnelle et Quantitative LCQB, Paris, F-75005, France
- F Zamponi
- Laboratoire de Physique de l'Ecole Normale Supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université de Paris, Paris, F-75005, France
- M Weigt
- Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie Computationnelle et Quantitative LCQB, Paris, F-75005, France
49
Moffat L, Jones DT. Increasing the accuracy of single sequence prediction methods using a deep semi-supervised learning framework. Bioinformatics 2021; 37:3744-3751. [PMID: 34213528 PMCID: PMC8570780 DOI: 10.1093/bioinformatics/btab491] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 03/30/2021] [Revised: 06/08/2021] [Accepted: 06/30/2021] [Indexed: 11/14/2022]
Abstract
MOTIVATION Over the past 50 years, our ability to model protein sequences with evolutionary information has progressed in leaps and bounds. However, even with the latest deep learning methods, the modelling of a critically important class of proteins, single orphan sequences, remains unsolved. RESULTS By taking a bioinformatics approach to semi-supervised machine learning, we develop Profile Augmentation of Single Sequences (PASS), a simple but powerful framework for building accurate single-sequence methods. To demonstrate the effectiveness of PASS we apply it to the mature field of secondary structure prediction. In doing so we develop S4PRED, the successor to the open-source PSIPRED-Single method, which achieves an unprecedented Q3 score of 75.3% on the standard CB513 test. PASS provides a blueprint for the development of a new generation of predictive methods, advancing our ability to model individual protein sequences. AVAILABILITY AND IMPLEMENTATION The S4PRED model is available as open source software on the PSIPRED GitHub repository (https://github.com/psipred/s4pred), along with documentation. It will also be provided as a part of the PSIPRED web service (http://bioinf.cs.ucl.ac.uk/psipred/). SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
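The Q3 score quoted for S4PRED is three-state secondary structure accuracy: the percentage of residues whose helix/strand/coil (H/E/C) label is predicted correctly. A minimal Python sketch of the metric follows. The 8-to-3 state mapping used here (H, G, I to H; E, B to E; everything else to C) is one common convention and an assumption on our part; benchmark setups such as the CB513 evaluation may reduce DSSP states differently, and some studies average Q3 per protein rather than pooling residues as done here.

```python
# One common 8-to-3 state reduction (an assumption; conventions differ):
# H, G, I -> H (helix); E, B -> E (strand); everything else -> C (coil).
EIGHT_TO_THREE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def to_three_state(dssp: str) -> str:
    """Map an 8-state DSSP string to H/E/C; unlisted symbols become coil."""
    return "".join(EIGHT_TO_THREE.get(s, "C") for s in dssp)


def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose 3-state label is predicted correctly."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


observed = to_three_state("HHHGTEEB-")  # -> "HHHHCEEEC"
predicted = "HHHCCEEEC"
print(round(q3(predicted, observed), 1))  # 88.9
```

Pooling over residues weights long proteins more heavily than per-protein averaging, which is one reason Q3 values from different benchmarks are not always directly comparable.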
Affiliation(s)
- Lewis Moffat
- Department of Computer Science, University College London, London WC1E 6BT, UK
- Biomedical Data Science Laboratory, The Francis Crick Institute, London NW1 1AT, UK
- David T Jones
- Department of Computer Science, University College London, London WC1E 6BT, UK
- Biomedical Data Science Laboratory, The Francis Crick Institute, London NW1 1AT, UK
50
Torres MC, Martins Karl AL, Müller Pereira da Silva M, Dardenne LE, Bispo de Filippis AM. In Silico Analysis of Dengue Virus Serotype 2 Mutations Detected at the Intrahost Level in Patients with Different Clinical Outcomes. Microbiol Spectr 2021; 9:e0025621. [PMID: 34468189 DOI: 10.1128/Spectrum.00256-21] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/14/2022]
Abstract
Intrahost genetic diversity is thought to facilitate arbovirus adaptation to changing environments and hosts, and it may also be linked to viral pathogenesis. Intending to shed light on the viral determinants of severe dengue pathogenesis, we previously analyzed the DENV-2 intrahost genetic diversity in 68 patients clinically classified as dengue fever (n = 31), dengue with warning signs (n = 19), and severe dengue (n = 18), performing viral whole-genome deep sequencing from clinical samples with an amplicon-free approach. From these data, we identified a set of 141 relevant mutations distributed throughout the viral genome that deserved further attention. We therefore employed molecular modeling to build three-dimensional models of the viral proteins and secondary RNA structures in order to map the mutations and assess their potential effects. Results showed that, in general, disruptive variants were identified primarily among dengue fever cases. In contrast, potential immune-escape variants were associated mainly with warning signs and severe cases, in line with the latter's longer intrahost evolution times. Furthermore, several mutations were located on protein-surface regions with no associated function. They could represent sites for further investigation, as the interaction of viral and host proteins is critical for both host immunomodulation and virus hijacking of the cellular machinery. The present analysis provides new information about the implications of the intrahost genetic diversity of DENV-2, contributing to knowledge about the viral factors possibly involved in its pathogenesis within the human host. Strengthening our results with functional studies could allow many of these variants to be considered in the design of therapeutic or prophylactic compounds and the improvement of diagnostic assays.
IMPORTANCE Previous evidence showed that intrahost genetic diversity in arboviruses may be linked to viral pathogenesis and that one or a few amino acid replacements within a single protein are enough to modify a biological feature of an RNA virus. To assess dengue virus serotype 2 determinants potentially involved in pathogenesis, we previously analyzed the intrahost genetic diversity of the virus in patients with different clinical outcomes and identified a set of 141 mutations that deserved further study. Through a molecular modeling approach, we showed that disruptive variants were identified primarily among cases with mild dengue fever, while potential immune-escape variants were mainly associated with cases of greater severity. We believe that some of the variants identified in this study are attractive candidates for consideration in the future rational design of therapeutic or prophylactic compounds or the improvement of diagnostic tools. The present analysis provides new information about DENV-2 viral factors possibly involved in its pathogenesis within the human host.