1
Hameduh T, Miller AD, Heger Z, Haddad Y. The proteomic code: Novel amino acid residue pairing models "encode" protein folding and protein-protein interactions. Comput Biol Med 2025; 190:110033. [PMID: 40112562 DOI: 10.1016/j.compbiomed.2025.110033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2024] [Revised: 03/11/2025] [Accepted: 03/13/2025] [Indexed: 03/22/2025]
Abstract
Recent advances in protein 3D structure prediction using deep learning have focused on the importance of amino acid residue-residue connections (i.e., pairwise atomic contacts) for accuracy at the expense of mechanistic interpretability. Therefore, we decided to perform a series of analyses based on an alternative framework of residue-residue connections making primary use of the TOP2018 dataset. This framework of residue-residue connections is derived from amino acid residue pairing models both historic and new, all based on genetic principles complemented by relevant biophysical principles. Of these pairing models, three new models (named the GU, Transmuted and Shift pairing models) exhibit the highest observed-over-expected ratios and highest correlations in statistical analyses with various intra- and inter-chain datasets, in comparison to the remaining models. In addition, these new pairing models are universally frequent across different connection ranges, secondary structure connections, and protein sizes. Accordingly, following further statistical and other analyses described herein, we have come to a major conclusion that all three pairing models together could represent the basis of a universal proteomic code (second genetic code) sufficient, in and of itself, to "encode" for both protein folding mechanisms and protein-protein interactions.
Affiliation(s)
- Tareq Hameduh
- Department of Chemistry and Biochemistry, Mendel University in Brno, Zemědělská 1665/1, CZ-613 00, Brno, Czech Republic; MendelFOLD s.r.o., Zezulova 174/3, CZ-613 00, Brno, Czech Republic
- Andrew D Miller
- Department of Chemistry and Biochemistry, Mendel University in Brno, Zemědělská 1665/1, CZ-613 00, Brno, Czech Republic; MendelFOLD s.r.o., Zezulova 174/3, CZ-613 00, Brno, Czech Republic; Veterinary Research Institute, Hudcova 296/70, CZ-621 00, Brno, Czech Republic; KP Therapeutics (Europe) s.r.o., Purkyňova 649/127, CZ-612 00, Brno, Czech Republic
- Zbynek Heger
- Department of Chemistry and Biochemistry, Mendel University in Brno, Zemědělská 1665/1, CZ-613 00, Brno, Czech Republic; MendelFOLD s.r.o., Zezulova 174/3, CZ-613 00, Brno, Czech Republic
- Yazan Haddad
- Department of Chemistry and Biochemistry, Mendel University in Brno, Zemědělská 1665/1, CZ-613 00, Brno, Czech Republic; MendelFOLD s.r.o., Zezulova 174/3, CZ-613 00, Brno, Czech Republic.
2
Roy A, Paul I, Chakraborty P, Saha A, Ray S. Unlocking the influence of PNPLA3 mutations on lipolysis: Insights into lipid droplet formation and metabolic dynamics in metabolic dysfunction-associated steatotic liver disease. Biochim Biophys Acta Gen Subj 2025; 1869:130766. [PMID: 39832620 DOI: 10.1016/j.bbagen.2025.130766] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2024] [Revised: 12/23/2024] [Accepted: 01/14/2025] [Indexed: 01/22/2025]
Abstract
BACKGROUND Metabolic dysfunction-associated steatotic liver disease (MASLD) covers a range of liver conditions marked by the buildup of fat, spanning from simple fatty liver to more advanced stages like metabolic dysfunction-associated steatohepatitis and cirrhosis. METHODS Our in-depth analysis of PNPLA3_WT and mutants (I148M (MT1) and C15S (MT2)) provides insights into their structure-function dynamics in lipid metabolism, especially lipid droplet hydrolysis and ABHD5 binding. Employing molecular docking, binding affinity, MD analysis, dissociation constant, and MM/GBSA analysis, we delineated distinct binding characteristics between wild-type and mutants. RESULTS Structural dynamics analysis revealed that unbound mutants exhibited higher flexibility, increased Rg and SASA values, and broader energy landscapes, indicating multiple inactive states. Mutations, especially in PNPLA3_MT1, reduced the exposure of the catalytic serine, potentially impairing enzymatic activity and LD hydrolysis efficiency. Altered interaction patterns and dynamics, particularly a shift in ABHD5 binding regions towards the C-terminal domain, underscore its role in LD metabolism. Energy dynamics analysis of the protein complexes revealed PNPLA3_WT exhibited multiple low-energy macrostates, whereas the mutants displayed narrower energy landscapes, suggesting a more stable functional state. PNPLA3_MT1 demonstrated the highest affinity towards ABHD5, highlighting the complex interplay between protein structure, dynamics, and lipid metabolism regulation. CONCLUSION PNPLA3_MT1 mutant exhibits the highest flexibility and significantly reduced catalytic serine accessibility, leading to impaired lipolysis. Contrarily, PNPLA3_WT maintains stable catalytic efficiency and effective LD hydrolysis, with PNPLA3_MT2 displaying intermediate behavior. 
GENERAL SIGNIFICANCE Our research provides valuable insights into the metabolic implications of PNPLA3 mutations, offering a path for potential therapeutic interventions in MASLD.
Affiliation(s)
- Alankar Roy
- Amity Institute of Biotechnology, Amity University, Kolkata, India
- Ishani Paul
- Amity Institute of Biotechnology, Amity University, Kolkata, India
- Adrija Saha
- Amity Institute of Biotechnology, Amity University, Kolkata, India
- Sujay Ray
- Amity Institute of Biotechnology, Amity University, Kolkata, India.
3
Harihar B, Saravanan KM, Gromiha MM, Selvaraj S. Importance of Inter-residue Contacts for Understanding Protein Folding and Unfolding Rates, Remote Homology, and Drug Design. Mol Biotechnol 2025; 67:862-884. [PMID: 38498284 DOI: 10.1007/s12033-024-01119-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2023] [Accepted: 02/10/2024] [Indexed: 03/20/2024]
Abstract
Inter-residue interactions in protein structures provide valuable insights into protein folding and stability. Understanding these interactions can be helpful in many crucial applications, including rational design of therapeutic small molecules and biologics, locating functional protein sites, and predicting protein-protein and protein-ligand interactions. The process of developing machine learning models incorporating inter-residue interactions has been improved recently. This review highlights the theoretical models incorporating inter-residue interactions in predicting folding and unfolding rates of proteins. Utilizing contact maps to depict inter-residue interactions aids researchers in developing computer models for detecting remote homologs and interface residues within protein-protein complexes which, in turn, enhances our knowledge of the relationship between sequence and structure of proteins. Further, the application of contact maps derived from inter-residue interactions is highlighted in the field of drug discovery. Overall, this review presents an extensive assessment of the significant models that use inter-residue interactions to investigate folding rates, unfolding rates, remote homology, and drug development, providing potential future advancements in constructing efficient computational models in structural biology.
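As an illustration of the contact maps mentioned in this abstract, the following is a minimal sketch and not a method taken from the review. It assumes one representative 3D coordinate per residue (for example the Cα atom) and marks two residues as in contact when they lie within a distance cutoff. The 8 Å default, the `min_separation` parameter, the function name `contact_map`, and the toy coordinates are all assumptions chosen for illustration.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def contact_map(coords, cutoff=8.0, min_separation=1):
    """Return a symmetric 0/1 matrix of residue-residue contacts.

    coords: list of (x, y, z) tuples, one per residue (assumed input format).
    cutoff: distance threshold in angstroms (8.0 is an assumed convention).
    min_separation: minimum sequence separation |i - j| for a pair to count.
    """
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_separation, n):
            if dist(coords[i], coords[j]) <= cutoff:
                cmap[i][j] = cmap[j][i] = 1  # contacts are symmetric
    return cmap


# Hypothetical toy chain: three residues spaced 3.8 angstroms apart, one far away.
toy = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
cmap = contact_map(toy)
for row in cmap:
    print(row)
```

Raising `min_separation` excludes trivially close sequence neighbours, so the map can be restricted to medium- or long-range contacts.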
Affiliation(s)
- Balasubramanian Harihar
- Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli, Tamil Nadu, 620024, India
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu, 600036, India
- Konda Mani Saravanan
- Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli, Tamil Nadu, 620024, India
- Department of Biotechnology, Bharath Institute of Higher Education and Research, Chennai, Tamil Nadu, 600073, India
- Michael M Gromiha
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu, 600036, India
- Samuel Selvaraj
- Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli, Tamil Nadu, 620024, India.
4
Cueno ME, Kamio N, Imai K. Avian influenza A H5N1 hemagglutinin protein models have distinct structural patterns re-occurring across the 1959-2023 strains. Biosystems 2024; 246:105347. [PMID: 39349133 DOI: 10.1016/j.biosystems.2024.105347] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2024] [Revised: 09/26/2024] [Accepted: 09/27/2024] [Indexed: 10/02/2024]
Abstract
Influenza A H5N1 hemagglutinin (HA) plays a crucial role in viral pathogenesis, and changes in the HA receptor binding domain (RBD) have been attributed to alterations in viral pathogenesis. Mutations often occur within the HA, which in turn results in HA structural changes that consequently contribute to protein evolution. However, the possible occurrence of mutations that result in reversion of the HA protein (going back to an ancestral protein conformation), which in turn creates distinct HA structural patterns across the 1959-2023 H5N1 viral evolution, has never been investigated. Here, we generated and verified the quality of the HA models, identified similar HA structural patterns, and elucidated the possible variations in HA RBD structural dynamics. Our results show that there are 7 distinct structural patterns occurring among the 1959-2023 H5N1 HA models, which suggests that reversion of the HA protein putatively occurs during viral evolution. Similarly, we found that the HA RBD structural dynamics vary among the 7 distinct structural patterns, possibly affecting viral pathogenesis.
Affiliation(s)
- Marni E Cueno
- Department of Microbiology and Immunology, Nihon University School of Dentistry, Tokyo, 101-8310, Japan.
- Noriaki Kamio
- Department of Microbiology and Immunology, Nihon University School of Dentistry, Tokyo, 101-8310, Japan
- Kenichi Imai
- Department of Microbiology and Immunology, Nihon University School of Dentistry, Tokyo, 101-8310, Japan
5
Srivastava S, Kolbe M. Novel "GaEl Antigenic Patches" Identified by a "Reverse Epitomics" Approach to Design Multipatch Vaccines against NIPAH Infection, a Silent Threat to Global Human Health. ACS Omega 2023; 8:31698-31713. [PMID: 37692250 PMCID: PMC10483669 DOI: 10.1021/acsomega.3c01909] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2023] [Accepted: 08/01/2023] [Indexed: 09/12/2023]
Abstract
Nipah virus (NiV) is a zoonotic virus that causes lethal encephalitis and respiratory disease with the symptom of endothelial cell-cell fusion. Several NiV outbreaks have been reported since 1999 with nearly annual occurrences in Bangladesh. The outbreaks had high mortality rates ranging from 40 to 90%. No specific vaccine has yet been reported against NiV. Recently, several vaccine candidates and different designs of vaccines composed of epitopes against NiV were proposed. Most of the vaccines target single protein or protein complex subunits of the pathogen. The multiepitope vaccines proposed also cover a largely limited number of epitopes, and hence, their efficiency is still uncertain. To address the urgent need for a specific and effective vaccine against NiV infection, in the present study, we have utilized the "reverse epitomics" approach ("overlapping-epitope-clusters-to-patches" method) to identify "antigenic patches" (Ag-Patches) and utilize them as immunogenic composition for multipatch vaccine (MPV) design. The designed MPVs were analyzed for immunologically crucial parameters, physicochemical properties, and interaction with Toll-like receptor 3 ectodomain. In total, 30 CTL (cytotoxic T lymphocyte) and 27 HTL (helper T lymphocyte) antigenic patches were identified from the entire NiV proteome based on the clusters of overlapping epitopes. These identified Ag-Patches cover a total of 362 discrete CTL and 414 HTL epitopes from the entire proteome of NiV. The antigenic patches were utilized as immunogenic composition for the design of two CTL and two HTL multipatch vaccines. The 57 antigenic patches utilized here cover 776 overlapping epitopes targeting 52 different HLA class I and II alleles, providing a global ethnically distributed human population coverage of 99.71%. Such a large epitope coverage, resulting in large human population coverage, cannot be reached with single-protein/subunit or multiepitope based vaccines.
The reported antigenic patches also provide potential immunogenic composition for early detection diagnostic kits for NiV infection. Further, all the MPVs and Toll-like receptor ectodomain complexes show a stable nature of molecular interaction with numerous hydrogen bonds, salt bridges, and nonbonded contact formation and acceptable root mean square deviation and fluctuation. The cDNA analysis shows a favorable large-scale expression of the MPV constructs in a human cell line. By utilizing the novel "reverse epitomics" approach, highly immunogenic novel "GaEl antigenic patches" (GaEl Ag-Patches), a synonymous term for "antigenic patches", were identified and utilized as immunogenic composition to design four MPVs against NiV. We conclude that the novel multipatch vaccines are potential candidates to combat NiV, with greater effectiveness, high specificity, and large human population coverage worldwide.
Affiliation(s)
- Sukrit Srivastava
- Infection Biology Group, Indian Foundation for Fundamental Research Trust, Raebareli, Uttar Pradesh 229316, India
- Department for Structural Infection Biology, Centre for Structural Systems Biology (CSSB) & Helmholtz-Centre for Infection Research, Notkestraße 85, 22607 Hamburg, Germany
- Michael Kolbe
- Department for Structural Infection Biology, Centre for Structural Systems Biology (CSSB) & Helmholtz-Centre for Infection Research, Notkestraße 85, 22607 Hamburg, Germany
- Faculty of Mathematics, Informatics and Natural Sciences, University of Hamburg, Rothenbaumchaussee 19, 20148 Hamburg, Germany
6
Srivastava S, Verma S, Kamthania M, Saxena AK, Pandey KC, Pande V, Kolbe M. Exploring the structural basis to develop efficient multi-epitope vaccines displaying interaction with HLA and TAP and TLR3 molecules to prevent NIPAH infection, a global threat to human health. PLoS One 2023; 18:e0282580. [PMID: 36920996 PMCID: PMC10016716 DOI: 10.1371/journal.pone.0282580] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2022] [Accepted: 02/21/2023] [Indexed: 03/16/2023] Open
Abstract
Nipah virus (NiV) is an emerging zoonotic virus that has caused several serious outbreaks in the South Asian region since 2001, with high mortality rates ranging from 40 to 90%. NiV infection causes lethal encephalitis and respiratory disease with the symptom of endothelial cell-cell fusion. No specific and effective vaccine has yet been reported against NiV. To address the urgent need for a specific and effective vaccine against NiV infection, in the present study, we have designed two Multi-Epitope Vaccines (MEVs) composed of 33 Cytotoxic T lymphocyte (CTL) epitopes and 38 Helper T lymphocyte (HTL) epitopes. Of the 71 combined CTL and HTL epitopes, 61 novel epitopes targeting nine different NiV proteins had not been used before for vaccine design. Codon optimization for the cDNA of both the designed MEVs might ensure high expression potential in the human cell line as stable proteins. Both MEVs carry potential B cell linear epitope overlapping regions, B cell discontinuous epitopes as well as IFN-γ inducing epitopes. Additional criteria such as sequence consensus amongst CTL, HTL and B cell epitopes were implemented for the design of final constructs constituting MEVs. Hence, the designed MEVs carry the potential to elicit cell-mediated as well as humoral immune response. Selected overlapping CTL and HTL epitopes were validated for their stable molecular interactions with HLA class I and II alleles and, in the case of CTL epitopes, with the human Transporter Associated with antigen Processing (TAP) cavity. The structure-based epitope cross-validation for interaction with the TAP cavity was used as another criterion for choosing final epitopes for NiV MEVs. Finally, human Beta-defensin 2 and Beta-defensin 3 were used as adjuvants to enhance the immune response of both the MEVs. Molecular dynamics simulation studies of the MEVs-TLR3 ectodomain (Human Toll-Like Receptor 3) complex indicated a stable molecular interaction.
We conclude that the MEVs designed and in silico validated here could be high-potential vaccine candidates to combat NiV infections, with great effectiveness, high specificity and large human population coverage worldwide.
Affiliation(s)
- Sukrit Srivastava
- Infection Biology Group, Indian Foundation for Fundamental Research Trust, RaeBareli, India
- Department for Structural Infection Biology, Centre for Structural Systems Biology (CSSB) & Helmholtz-Centre for Infection Research, Hamburg, Germany
- Sonia Verma
- Protein Biochemistry & Engineering Lab, Parasite-Host Biology Group, ICMR-National Institute of Malaria Research, New Delhi, India
- Mohit Kamthania
- Infection Biology Group, Indian Foundation for Fundamental Research Trust, RaeBareli, India
- Ajay Kumar Saxena
- Molecular Medicine Lab., School of Life Science, Jawaharlal Nehru University, New Delhi, India
- Kailash C. Pandey
- Protein Biochemistry & Engineering Lab, Parasite-Host Biology Group, ICMR-National Institute of Malaria Research, New Delhi, India
- Veena Pande
- Kumaun University, Bheemtal, Nainital, Uttarakhand, India
- Michael Kolbe
- Department for Structural Infection Biology, Centre for Structural Systems Biology (CSSB) & Helmholtz-Centre for Infection Research, Hamburg, Germany
- Faculty of Mathematics, Informatics and Natural Sciences, University of Hamburg, Hamburg, Germany
7
Davies JS, Currie MJ, North RA, Scalise M, Wright JD, Copping JM, Remus DM, Gulati A, Morado DR, Jamieson SA, Newton-Vesty MC, Abeysekera GS, Ramaswamy S, Friemann R, Wakatsuki S, Allison JR, Indiveri C, Drew D, Mace PD, Dobson RCJ. Structure and mechanism of a tripartite ATP-independent periplasmic TRAP transporter. Nat Commun 2023; 14:1120. [PMID: 36849793 PMCID: PMC9971032 DOI: 10.1038/s41467-023-36590-1] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 10/22/2022] [Accepted: 02/07/2023] [Indexed: 03/01/2023] Open
Abstract
In bacteria and archaea, tripartite ATP-independent periplasmic (TRAP) transporters take up essential nutrients. TRAP transporters receive their substrates via a secreted soluble substrate-binding protein. How a sodium ion-driven secondary active transporter is strictly coupled to a substrate-binding protein is poorly understood. Here we report the cryo-EM structure of the sialic acid TRAP transporter SiaQM from Photobacterium profundum at 2.97 Å resolution. SiaM comprises a "transport" domain and a "scaffold" domain, with the transport domain consisting of helical hairpins as seen in the sodium ion-coupled elevator transporter VcINDY. The SiaQ protein forms intimate contacts with SiaM to extend the size of the scaffold domain, suggesting that TRAP transporters may operate as monomers, rather than the typically observed oligomers for elevator-type transporters. We identify the Na+ and sialic acid binding sites in SiaM and demonstrate a strict dependence on the substrate-binding protein SiaP for uptake. We report the SiaP crystal structure that, together with docking studies, suggests the molecular basis for how sialic acid is delivered to the SiaQM transporter complex. We thus propose a model for substrate transport by TRAP proteins, which we describe herein as an 'elevator-with-an-operator' mechanism.
Affiliation(s)
- James S Davies
- Biomolecular Interaction Centre, Maurice Wilkins Centre for Biodiscovery, MacDiarmid Institute for Advanced Materials and Nanotechnology and School of Biological Sciences, University of Canterbury, PO Box 4800, Christchurch, 8140, New Zealand; Department of Biochemistry and Biophysics, Stockholm University, 10691, Stockholm, Sweden
- Michael J Currie
- Biomolecular Interaction Centre, Maurice Wilkins Centre for Biodiscovery, MacDiarmid Institute for Advanced Materials and Nanotechnology and School of Biological Sciences, University of Canterbury, PO Box 4800, Christchurch, 8140, New Zealand
- Rachel A North
- Biomolecular Interaction Centre, Maurice Wilkins Centre for Biodiscovery, MacDiarmid Institute for Advanced Materials and Nanotechnology and School of Biological Sciences, University of Canterbury, PO Box 4800, Christchurch, 8140, New Zealand; Department of Biochemistry and Biophysics, Stockholm University, 10691, Stockholm, Sweden
- Mariafrancesca Scalise
- Department DiBEST (Biologia, Ecologia, Scienze della Terra) Unit of Biochemistry and Molecular Biotechnology, University of Calabria, Via P. Bucci 4C, 87036, Arcavacata di Rende, Italy
- Joshua D Wright
- Biomolecular Interaction Centre, Maurice Wilkins Centre for Biodiscovery, MacDiarmid Institute for Advanced Materials and Nanotechnology and School of Biological Sciences, University of Canterbury, PO Box 4800, Christchurch, 8140, New Zealand
- Jack M Copping
- Biomolecular Interaction Centre, Digital Life Institute, Maurice Wilkins Centre for Molecular Biodiscovery, and School of Biological Sciences, University of Auckland, Auckland, 1010, New Zealand
- Daniela M Remus
- Biomolecular Interaction Centre, Maurice Wilkins Centre for Biodiscovery, MacDiarmid Institute for Advanced Materials and Nanotechnology and School of Biological Sciences, University of Canterbury, PO Box 4800, Christchurch, 8140, New Zealand
- Ashutosh Gulati
- Department of Biochemistry and Biophysics, Stockholm University, 10691, Stockholm, Sweden
- Dustin R Morado
- Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, 17165, Solna, Sweden
- Sam A Jamieson
- Biochemistry Department, School of Biomedical Sciences, University of Otago, Dunedin, 9054, New Zealand
- Michael C Newton-Vesty
- Biomolecular Interaction Centre, Maurice Wilkins Centre for Biodiscovery, MacDiarmid Institute for Advanced Materials and Nanotechnology and School of Biological Sciences, University of Canterbury, PO Box 4800, Christchurch, 8140, New Zealand
- Gayan S Abeysekera
- Biomolecular Interaction Centre, Maurice Wilkins Centre for Biodiscovery, MacDiarmid Institute for Advanced Materials and Nanotechnology and School of Biological Sciences, University of Canterbury, PO Box 4800, Christchurch, 8140, New Zealand
- Subramanian Ramaswamy
- Biological Sciences and Biomedical Engineering, Bindley Bioscience Center, Purdue University, 1203 W State St, West Lafayette, IN 47906, USA
- Rosmarie Friemann
- Centre for Antibiotic Resistance Research (CARe) at University of Gothenburg, Box 440, S-40530, Gothenburg, Sweden
- Soichi Wakatsuki
- Biological Sciences Division, SLAC National Accelerator Laboratory, Menlo Park, CA, 94025, USA; Department of Structural Biology, Stanford University School of Medicine, Stanford, CA, 94305, USA
- Jane R Allison
- Biomolecular Interaction Centre, Digital Life Institute, Maurice Wilkins Centre for Molecular Biodiscovery, and School of Biological Sciences, University of Auckland, Auckland, 1010, New Zealand
- Cesare Indiveri
- Department DiBEST (Biologia, Ecologia, Scienze della Terra) Unit of Biochemistry and Molecular Biotechnology, University of Calabria, Via P. Bucci 4C, 87036, Arcavacata di Rende, Italy; CNR Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies (IBIOM), Via Amendola 122/O, 70126, Bari, Italy
- David Drew
- Department of Biochemistry and Biophysics, Stockholm University, 10691, Stockholm, Sweden
- Peter D Mace
- Biochemistry Department, School of Biomedical Sciences, University of Otago, Dunedin, 9054, New Zealand
- Renwick C J Dobson
- Biomolecular Interaction Centre, Maurice Wilkins Centre for Biodiscovery, MacDiarmid Institute for Advanced Materials and Nanotechnology and School of Biological Sciences, University of Canterbury, PO Box 4800, Christchurch, 8140, New Zealand; Bio21 Molecular Science and Biotechnology Institute, Department of Biochemistry and Molecular Biology, University of Melbourne, Parkville, Victoria, 3010, Australia.
8
Structural patterns of SARS-CoV-2 variants of concern (alpha, beta, gamma, delta) spike protein are influenced by variant-specific amino acid mutations: A computational study with implications on viral evolution. J Theor Biol 2023; 558:111376. [PMID: 36473508 PMCID: PMC9721161 DOI: 10.1016/j.jtbi.2022.111376] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2022] [Revised: 11/28/2022] [Accepted: 11/29/2022] [Indexed: 12/12/2022]
Abstract
SARS-CoV-2 (SARS2) regularly mutates, resulting in variants of concern (VOC) which have higher virulence and transmissibility rates while concurrently evading available therapeutic strategies. This highlights the importance of amino acid mutations occurring in the SARS2 spike protein structure, since they may affect virus biology. However, this has never been fully elucidated. Here, network analysis was performed based on the COVID-19 genomic epidemiology network between December 2019 and July 2021. Representative SARS2 VOC spike protein models were generated and quality checked, protein model superimposition was done, and common contact based on contact mapping was established. Throughout this study, we found that: (1) certain individual variant-specific amino acid mutations can affect the spike protein structural pattern; (2) certain individual variant-specific amino acid mutations had no effect on the spike protein structural pattern; and (3) certain combinations of variant-specific amino acids are putatively epistatic mutations that can potentially influence the VOC spike protein structural pattern. This manuscript was submitted as part of a theme issue on "Modelling COVID-19 and Preparedness for Future Pandemics".
9
Mufassirin MMM, Newton MAH, Sattar A. Artificial intelligence for template-free protein structure prediction: a comprehensive review. Artif Intell Rev 2022. [DOI: 10.1007/s10462-022-10350-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2022]
10
Roche R, Bhattacharya S, Shuvo MH, Bhattacharya D. rrQNet: Protein contact map quality estimation by deep evolutionary reconciliation. Proteins 2022; 90:2023-2034. [PMID: 35751651 PMCID: PMC9633355 DOI: 10.1002/prot.26394] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2022] [Revised: 05/31/2022] [Accepted: 06/21/2022] [Indexed: 11/10/2022]
Abstract
Protein contact maps have proven to be a valuable tool in the deep learning revolution of protein structure prediction, ushering in the recent breakthrough by AlphaFold2. However, self-assessment of the quality of predicted structures is typically performed at the granularity of three-dimensional coordinates as opposed to directly exploiting the rotation- and translation-invariant two-dimensional (2D) contact maps. Here, we present rrQNet, a deep learning method for self-assessment in 2D by contact map quality estimation. Our approach is based on the intuition that for a contact map to be of high quality, the residue pairs predicted to be in contact should be mutually consistent with the evolutionary context of the protein. The deep neural network architecture of rrQNet implements this intuition by cascading two deep modules, one encoding the evolutionary context and the other performing evolutionary reconciliation. The penultimate stage of rrQNet estimates the quality scores at the interacting residue-pair level, which are then aggregated for estimating the quality of a contact map. This design choice offers versatility at varied resolutions from individual residue pairs to full-fledged contact maps. Trained on multiple complementary sources of contact predictors, rrQNet facilitates generalizability across various contact maps. By rigorously testing using publicly available datasets and comparing against several in-house baseline approaches, we show that rrQNet accurately reproduces the true quality score of a predicted contact map and successfully distinguishes between accurate and inaccurate contact maps predicted by a wide variety of contact predictors. The open-source rrQNet software package is freely available at https://github.com/Bhattacharya-Lab/rrQNet.
Affiliation(s)
- Rahmatullah Roche
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
- Sutanu Bhattacharya
- Department of Computer Science, Florida Polytechnic University, Lakeland, FL 33805, USA
- Md Hossain Shuvo
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
11
Zhang H, Huang Y, Bei Z, Ju Z, Meng J, Hao M, Zhang J, Zhang H, Xi W. Inter-Residue Distance Prediction From Duet Deep Learning Models. Front Genet 2022; 13:887491. [PMID: 35651930 PMCID: PMC9148999 DOI: 10.3389/fgene.2022.887491] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/01/2022] [Accepted: 03/30/2022] [Indexed: 12/04/2022] Open
Abstract
Residue distance prediction from the sequence is critical for many biological applications such as protein structure reconstruction, protein–protein interaction prediction, and protein design. However, prediction of fine-grained distances between residues with long sequence separations still remains challenging. In this study, we propose DuetDis, a method based on duet feature sets and deep residual network with squeeze-and-excitation (SE), for protein inter-residue distance prediction. DuetDis embraces the ability to learn and fuse features directly or indirectly extracted from the whole-genome/metagenomic databases and, therefore, minimize the information loss through ensembling models trained on different feature sets. We evaluate DuetDis and 11 widely used peer methods on a large-scale test set (610 proteins chains). The experimental results suggest that 1) prediction results from different feature sets show obvious differences; 2) ensembling different feature sets can improve the prediction performance; 3) high-quality multiple sequence alignment (MSA) used for both training and testing can greatly improve the prediction performance; and 4) DuetDis is more accurate than peer methods for the overall prediction, more reliable in terms of model prediction score, and more robust against shallow multiple sequence alignment (MSA).
Affiliation(s)
- Huiling Zhang
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- University of Chinese Academy of Sciences, Beijing, China
- Ying Huang
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- University of Chinese Academy of Sciences, Beijing, China
- Zhendong Bei
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- University of Chinese Academy of Sciences, Beijing, China
- Zhen Ju
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- University of Chinese Academy of Sciences, Beijing, China
- Jintao Meng
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- University of Chinese Academy of Sciences, Beijing, China
- Min Hao
- College of Electronic and Information Engineering, Southwest University, Chongqing, China
- Jingjing Zhang
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- University of Chinese Academy of Sciences, Beijing, China
- Haiping Zhang
- University of Chinese Academy of Sciences, Beijing, China
- Wenhui Xi
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- University of Chinese Academy of Sciences, Beijing, China
- *Correspondence: Wenhui Xi,
|
12
|
Santra S, Jana M. Predicting the evolution of number of native contacts of a small protein by using deep learning approach. Comput Biol Chem 2022; 97:107625. [DOI: 10.1016/j.compbiolchem.2022.107625] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/14/2021] [Revised: 01/07/2022] [Accepted: 01/09/2022] [Indexed: 11/28/2022]
|
13
|
Gaur A, Jindal Y, Singh V, Tiwari R, Kumar D, Kaushik D, Singh J, Narwal S, Jaiswal S, Iquebal MA, Angadi UB, Singh G, Rai A, Singh GP, Sheoran S. GWAS to Identify Novel QTNs for WSCs Accumulation in Wheat Peduncle Under Different Water Regimes. Front Plant Sci 2022; 13:825687. [PMID: 35310635 PMCID: PMC8928439 DOI: 10.3389/fpls.2022.825687] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/30/2021] [Accepted: 01/27/2022] [Indexed: 05/27/2023]
Abstract
Water-soluble carbohydrates (WSCs) play a vital role in water stress avoidance and buffering wheat grain yield. However, the genetic architecture of stem WSCs' accumulation is only partially understood, and few candidate genes are known. This study utilizes the compressed mixed linear model-based genome wide association study (GWAS) and heuristic post GWAS analyses to identify causative quantitative trait nucleotides (QTNs) and candidate genes for stem WSCs' content at 15 days after anthesis under different water regimes (irrigated, rainfed, and drought). Glucose, fructose, sucrose, fructans, total non-structural carbohydrates (the sum of individual sugars), and total WSCs (anthrone based) quantified in the peduncle of 301 bread wheat genotypes under multiple environments (E01-E08) pertaining to different water regimes, together with 14,571 SNPs from the "35K Axiom Wheat Breeders" Array, were used for analysis. As a result, 570 significant nucleotide trait associations were identified on all chromosomes except for 4D, of which 163 were considered stable. A total of 112 quantitative trait nucleotide regions (QNRs) were identified, of which 47 were presumably novel. QNRs qWSC-3B.2 and qWSC-7A.2 were identified as the hotspots. Post GWAS integration of multiple data resources prioritized 208 putative candidate genes delimited into 64 QNRs, which can be critical in understanding the genetic architecture of stem WSCs accumulation in wheat under optimum and water-stressed environments. At least 19 stable QTNs were found associated with 24 prioritized candidate genes. Clusters of fructans metabolic genes were reported in the QNRs qWSC-4A.2 and qWSC-7A.2. These genes can be utilized to bring an optimum combination of various fructans metabolic genes to improve the accumulation and remobilization of stem WSCs and water stress tolerance. These results will further strengthen wheat breeding programs targeting sustainable wheat production under limited water conditions.
Affiliation(s)
- Arpit Gaur
- Department of Genetics and Plant Breeding, CCS Haryana Agricultural University, Hisar, India
- ICAR-Indian Institute of Wheat and Barley Research, Karnal, India
- Yogesh Jindal
- Department of Genetics and Plant Breeding, CCS Haryana Agricultural University, Hisar, India
- Vikram Singh
- Department of Genetics and Plant Breeding, CCS Haryana Agricultural University, Hisar, India
- Ratan Tiwari
- ICAR-Indian Institute of Wheat and Barley Research, Karnal, India
- Dinesh Kumar
- ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
- Deepak Kaushik
- Department of Genetics and Plant Breeding, CCS Haryana Agricultural University, Hisar, India
- Jogendra Singh
- ICAR-Central Soil Salinity Research Institute, Karnal, India
- Sneh Narwal
- ICAR-Indian Agricultural Research Institute, New Delhi, India
- Sarika Jaiswal
- ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
- Mir Asif Iquebal
- ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
- Ulavapp B. Angadi
- ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
- Gyanendra Singh
- ICAR-Indian Institute of Wheat and Barley Research, Karnal, India
- Anil Rai
- ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
- Sonia Sheoran
- ICAR-Indian Institute of Wheat and Barley Research, Karnal, India
|
14
|
Srivastava S, Verma S, Kamthania M, Agarwal D, Saxena AK, Kolbe M, Singh S, Kotnis A, Rathi B, Nayar SA, Shin HJ, Vashisht K, Pandey KC. Computationally validated SARS-CoV-2 CTL and HTL Multi-Patch vaccines, designed by reverse epitomics approach, show potential to cover large ethnically distributed human population worldwide. J Biomol Struct Dyn 2022; 40:2369-2388. [PMID: 33155524 PMCID: PMC7651196 DOI: 10.1080/07391102.2020.1838329] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/01/2020] [Accepted: 10/13/2020] [Indexed: 02/07/2023]
Abstract
SARS-CoV-2 (Severe Acute Respiratory Syndrome Coronavirus 2) is responsible for the COVID-19 outbreak. The highly contagious COVID-19 disease has spread to 216 countries in less than six months. Though several vaccine candidates are being claimed, an effective vaccine is yet to come. A novel reverse epitomics approach, the 'overlapping-epitope-clusters-to-patches' method, is utilized to identify the antigenic regions from the SARS-CoV-2 proteome. These antigenic regions are named 'Ag-Patch or Ag-Patches', for Antigenic Patch or Patches. The identification of Ag-Patches is based on the clusters of overlapping epitopes arising from SARS-CoV-2 proteins. Further, we have utilized the identified Ag-Patches to design Multi-Patch Vaccines (MPVs), proposing a novel method for the vaccine design. The designed MPVs were analyzed for immunologically crucial parameters, physicochemical properties and cDNA constructs. We identified 73 CTL (Cytotoxic T-Lymphocyte) and 49 HTL (Helper T-Lymphocyte) novel Ag-Patches from the proteome of SARS-CoV-2. The identified Ag-Patches utilized to design MPVs cover 768 overlapping epitopes targeting 55 different HLA alleles, leading to 99.98% of world human population coverage. The MPVs and Toll-Like Receptor ectodomain complex shows a tendency toward stable complex formation. Further, the cDNA analysis favors high expression of the MPV constructs in a human cell line. We identified highly immunogenic novel Ag-Patches from the entire proteome of SARS-CoV-2 by a novel reverse epitomics approach and utilized them to design MPVs. We conclude that the novel MPVs could be a high-potential approach to combat SARS-CoV-2, with greater effectiveness, high specificity and large human population coverage worldwide. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Sukrit Srivastava
- Molecular Medicine Lab., School of Life Science, Jawaharlal Nehru University, New Delhi, India
- Infection Biology Group, Indian Foundation for Fundamental Research, RaeBareli, India
- Sonia Verma
- Parasite-Host Biology Group, Protein Biochemistry & Engineering Lab, ICMR-National Institute of Malaria Research, New Delhi, India
- Mohit Kamthania
- Infection Biology Group, Indian Foundation for Fundamental Research, RaeBareli, India
- Deepa Agarwal
- Infection Biology Group, Indian Foundation for Fundamental Research, RaeBareli, India
- Ajay Kumar Saxena
- Molecular Medicine Lab., School of Life Science, Jawaharlal Nehru University, New Delhi, India
- Michael Kolbe
- Department for Structural Infection Biology, Centre for Structural Systems Biology (CSSB) & Helmholtz-Centre for Infection Research, Hamburg, Germany
- Faculty of Mathematics, Informatics and Natural Sciences, University of Hamburg, Hamburg, Germany
- Sarman Singh
- Department of Microbiology, All India Institute of Medical Sciences (AIIMS), Bhopal, India
- Ashwin Kotnis
- Department of Biochemistry, All India Institute of Medical Sciences (AIIMS), Bhopal, India
- Brijesh Rathi
- Laboratory For Translational Chemistry and Drug Discovery, Hansraj College, University of Delhi, New Delhi, India
- Seema A. Nayar
- Department of Microbiology, Government Medical College, Trivandrum, India
- Department of Microbiology, Sree Gokulam Medical College, Trivandrum, India
- Ho-Joon Shin
- Department of Microbiology, School of Medicine, Ajou University, Suwon, South Korea
- Kapil Vashisht
- Parasite-Host Biology Group, Protein Biochemistry & Engineering Lab, ICMR-National Institute of Malaria Research, New Delhi, India
- Kailash C. Pandey
- Parasite-Host Biology Group, Protein Biochemistry & Engineering Lab, ICMR-National Institute of Malaria Research, New Delhi, India
|
15
|
Tran NH, Xu J, Li M. A tale of solving two computational challenges in protein science: neoantigen prediction and protein structure prediction. Brief Bioinform 2022; 23:bbab493. [PMID: 34891158 PMCID: PMC8769896 DOI: 10.1093/bib/bbab493] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 09/08/2021] [Revised: 10/11/2021] [Accepted: 10/26/2021] [Indexed: 12/30/2022]
Abstract
In this article, we review two challenging computational questions in protein science: neoantigen prediction and protein structure prediction. Both topics have seen significant leaps forward by deep learning within the past five years, which immediately unlocked new developments of drugs and immunotherapies. We show that deep learning models offer unique advantages, such as representation learning and multi-layer architecture, which make them an ideal choice to leverage a huge amount of protein sequence and structure data to address those two problems. We also discuss the impact and future possibilities enabled by those two applications, especially how the data-driven approach by deep learning shall accelerate the progress towards personalized biomedicine.
Affiliation(s)
- Jinbo Xu
- Toyota Technological Institute at Chicago, USA
- Ming Li
- University of Waterloo, Canada
|
16
|
Li Y, Zhang C, Zheng W, Zhou X, Bell EW, Yu DJ, Zhang Y. Protein inter-residue contact and distance prediction by coupling complementary coevolution features with deep residual networks in CASP14. Proteins 2021; 89:1911-1921. [PMID: 34382712 PMCID: PMC8616805 DOI: 10.1002/prot.26211] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 04/21/2021] [Revised: 07/24/2021] [Accepted: 08/05/2021] [Indexed: 01/12/2023]
Abstract
This article reports and analyzes the results of protein contact and distance prediction by our methods in the 14th Critical Assessment of techniques for protein Structure Prediction (CASP14). A new deep learning-based contact/distance predictor was employed, based on an ensemble of two complementary coevolution features coupled with deep residual networks. We also improved our multiple sequence alignment (MSA) generation protocol with wholesale meta-genome sequence databases. On 22 CASP14 free modeling (FM) targets, the proposed model achieved a top-L/5 long-range precision of 63.8% and a mean distance bin error of 1.494. Based on the predicted distance potentials, 11 out of 22 FM targets and all of the 14 FM/template-based modeling (TBM) targets have correctly predicted folds (TM-score >0.5), suggesting that our approach can provide reliable distance potentials for ab initio protein folding.
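The top-L/5 long-range precision quoted in this abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code. It assumes the conventions commonly used in contact-prediction benchmarks, which the abstract does not state: a pair counts as long-range when the residues are at least 24 positions apart in sequence, and as a true contact when the reference distance is below 8 Å. The function name and its parameters are hypothetical.

```python
def top_l_precision(pred_prob, true_dist, divisor=5, min_sep=24, cutoff=8.0):
    """Precision of the top L/divisor long-range contact predictions.

    pred_prob: L x L nested list of predicted contact probabilities.
    true_dist: L x L nested list of reference inter-residue distances.
    min_sep and cutoff are assumed conventions, not taken from the abstract.
    """
    length = len(pred_prob)
    # Collect every residue pair (i, j) whose sequence separation is long-range.
    pairs = [(pred_prob[i][j], i, j)
             for i in range(length)
             for j in range(i + min_sep, length)]
    if not pairs:
        raise ValueError("sequence too short for any long-range pair")
    # Rank pairs by predicted probability, highest first.
    pairs.sort(key=lambda t: -t[0])
    top = pairs[:max(1, length // divisor)]
    # A prediction is a hit when the reference distance is below the cutoff.
    hits = sum(1 for _, i, j in top if true_dist[i][j] < cutoff)
    return hits / len(top)
```

For a 30-residue chain the function scores the 6 highest-ranked long-range pairs; short-range pairs are ignored however confident the prediction is.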
Affiliation(s)
- Yang Li
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Chengxin Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Wei Zheng
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Xiaogen Zhou
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Eric W. Bell
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Dong-Jun Yu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China
- Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
|
17
|
Cueno ME, Imai K. Structural Insights on the SARS-CoV-2 Variants of Concern Spike Glycoprotein: A Computational Study With Possible Clinical Implications. Front Genet 2021; 12:773726. [PMID: 34745235 PMCID: PMC8568765 DOI: 10.3389/fgene.2021.773726] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/10/2021] [Accepted: 10/07/2021] [Indexed: 12/31/2022]
Abstract
The coronavirus disease 2019 (COVID-19) pandemic has been attributed to SARS-CoV-2 (SARS2) and, consequently, SARS2 has evolved into multiple SARS2 variants driving subsequent waves of infections. In particular, variants of concern (VOC) were identified to have both increased transmissibility and virulence ascribable to mutational changes occurring within the spike protein, resulting in modifications in the protein structural orientation which in turn may affect viral pathogenesis. However, this was never fully elucidated. Here, we generated spike models of endemic HCoVs (HCoV 229E, HCoV OC43, HCoV NL63, HCoV HKU1, SARS CoV, MERS CoV), original SARS2, and VOC (alpha, beta, gamma, delta). Model quality check, structural superimposition, and structural comparison based on RMSD values, TM scores, and contact mapping were all performed. We found that: 1) the original SARS2 and VOC whole spike protein models have only minor structural differences (TM > 0.98); 2) the whole VOC spike models putatively have higher structural similarity (TM > 0.70) to spike models from endemic HCoVs coming from the same phylogenetic cluster; 3) original SARS2 S1-CTD and S1-NTD models are structurally comparable to VOC S1-CTD (TM = 1.0) and S1-NTD (TM > 0.96); and 4) endemic HCoV S1-CTD and S1-NTD models are structurally comparable to VOC S1-CTD (TM > 0.70) and S1-NTD (TM > 0.70) models belonging to the same phylogenetic cluster. Overall, we propose that structural similarities (possibly ascribable to similar conformational epitopes) may help determine immune cross-reactivity, whereas structural differences (possibly associated with varying conformational epitopes) may lead to viral infection (either reinfection or breakthrough infection).
Affiliation(s)
- Marni E Cueno
- Department of Microbiology, Nihon University School of Dentistry, Tokyo, Japan
- Kenichi Imai
- Department of Microbiology, Nihon University School of Dentistry, Tokyo, Japan
|
18
|
Zhang H, Bei Z, Xi W, Hao M, Ju Z, Saravanan KM, Zhang H, Guo N, Wei Y. Evaluation of residue-residue contact prediction methods: From retrospective to prospective. PLoS Comput Biol 2021; 17:e1009027. [PMID: 34029314 PMCID: PMC8177648 DOI: 10.1371/journal.pcbi.1009027] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 10/30/2020] [Revised: 06/04/2021] [Accepted: 04/28/2021] [Indexed: 12/31/2022]
Abstract
Sequence-based residue contact prediction plays a crucial role in protein structure reconstruction. In recent years, the combination of evolutionary coupling analysis (ECA) and deep learning (DL) techniques has made tremendous progress for residue contact prediction, so a comprehensive assessment of current methods based on a large-scale benchmark data set is much needed. In this study, we evaluate 18 contact predictors on 610 non-redundant proteins and 32 CASP13 targets from a wide range of perspectives. The results show that different methods have different application scenarios: (1) DL methods based on multi-categories of inputs and large training sets are the best choices for low-contact-density proteins such as the intrinsically disordered ones and proteins with shallow multi-sequence alignments (MSAs). (2) With at least 5L (L is sequence length) effective sequences in the MSA, all the methods show the best performance, and methods that rely only on MSA as input can reach achievements comparable to methods that adopt multi-source inputs. (3) For top L/5 and L/2 predictions, DL methods can predict more hydrophobic interactions while ECA methods predict more salt bridges and disulfide bonds. (4) ECA methods can detect more secondary structure interactions, while DL methods can accurately excavate more contact patterns and prune isolated false positives. In general, multi-input DL methods with large training sets dominate current approaches with the best overall performance. Despite the great success of current DL methods, there is still much room left for further improvement: (1) With shallow MSAs, the performance will be greatly affected. (2) Current methods show lower precisions for inter-domain compared with intra-domain contact predictions, as well as very high imbalances in precisions between intra-domains. (3) Strong prediction similarities between DL methods indicate that more feature types and diversified models need to be developed. (4) The runtime of most methods can be further optimized.
The amino acid sequence of a protein ultimately determines its tertiary structure, and the tertiary structure determines its function(s) and plays a key role in understanding biological processes and disease pathogenesis. Protein tertiary structure can be determined using experimental techniques such as cryo-electron microscopy, nuclear magnetic resonance and X-ray crystallography, which are very expensive and time-consuming. As an alternative, researchers are trying to use in silico methods to predict the 3D structures. Residue contact-assisted protein folding paves an avenue for sequence-based protein structure prediction and therefore has become one of the most challenging and promising problems in structural bioinformatics. Over the past years, contact prediction has undergone continuous evolution in techniques. Through a retrospective analysis of traditional machine learning methods, evolutionary coupling analysis methods, and consensus machine learning methods, and a multi-perspective study on recently developed deep learning methods, we explore the most advanced contact predictors, pursue application scenarios for different methods, and seek prospective directions for further improvement. We anticipate that our study will serve as a practical and useful guide for the development of future approaches to contact prediction.
Affiliation(s)
- Huiling Zhang
- University of Chinese Academy of Sciences, Beijing, China
- Centre for High Performance Computing, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Zhendong Bei
- Cloud Computing Department, Alibaba Group, Hangzhou, China
- Wenhui Xi
- Centre for High Performance Computing, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Min Hao
- College of Electronic and Information Engineering, Southwest University, Chongqing, China
- Zhen Ju
- University of Chinese Academy of Sciences, Beijing, China
- Centre for High Performance Computing, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Konda Mani Saravanan
- Centre for High Performance Computing, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Haiping Zhang
- Centre for High Performance Computing, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Ning Guo
- Centre for High Performance Computing, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Yanjie Wei
- University of Chinese Academy of Sciences, Beijing, China
- Centre for High Performance Computing, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
|
19
|
Deducing high-accuracy protein contact-maps from a triplet of coevolutionary matrices through deep residual convolutional networks. PLoS Comput Biol 2021; 17:e1008865. [PMID: 33770072 PMCID: PMC8026059 DOI: 10.1371/journal.pcbi.1008865] [Citation(s) in RCA: 55] [Impact Index Per Article: 13.8] [Received: 09/21/2020] [Revised: 04/07/2021] [Accepted: 03/10/2021] [Indexed: 12/24/2022]
Abstract
The topology of protein folds can be specified by the inter-residue contact-maps and accurate contact-map prediction can help ab initio structure folding. We developed TripletRes to deduce protein contact-maps from discretized distance profiles by end-to-end training of deep residual neural-networks. Compared to previous approaches, the major advantage of TripletRes is in its ability to learn and directly fuse a triplet of coevolutionary matrices extracted from the whole-genome and metagenome databases and therefore minimize the information loss during the course of contact model training. TripletRes was tested on a large set of 245 non-homologous proteins from CASP 11&12 and CAMEO experiments and outperformed other top methods from CASP12 by at least 58.4% for the CASP 11&12 targets and 44.4% for the CAMEO targets in the top-L long-range contact precision. On the 31 FM targets from the latest CASP13 challenge, TripletRes achieved the highest precision (71.6%) for the top-L/5 long-range contact predictions. It was also shown that a simple re-training of the TripletRes model with more proteins can lead to further improvement with precisions comparable to state-of-the-art methods developed after CASP13. These results demonstrate a novel efficient approach to extend the power of deep convolutional networks for high-accuracy medium- and long-range protein contact-map predictions starting from primary sequences, which are critical for constructing 3D structure of proteins that lack homologous templates in the PDB library.
Ab initio protein folding has been a major unsolved problem in computational biology for more than half a century. Recent community-wide Critical Assessment of Structure Prediction (CASP) experiments have witnessed exciting progress on ab initio structure prediction, which was mainly powered by the boosting of contact-map prediction as the latter can be used as constraints to guide ab initio folding simulations. In this work, we proposed a new open-source deep-learning architecture, TripletRes, built on the residual convolutional neural networks for high-accuracy contact prediction. The large-scale benchmark and blind test results demonstrate competitive performance of the proposed methods to other top approaches in predicting medium- and long-range contact-maps that are critical for guiding protein folding simulations. Detailed data analyses showed that the major advantage of TripletRes lies in the unique protocol to fuse multiple evolutionary feature matrices which are directly extracted from whole-genome and metagenome databases and therefore minimize the information loss during the contact model training.
|
20
|
Junqueira Alves C, Silva Ladeira J, Hannah T, Pedroso Dias RJ, Zabala Capriles PV, Yotoko K, Zou H, Friedel RH. Evolution and Diversity of Semaphorins and Plexins in Choanoflagellates. Genome Biol Evol 2021; 13:6149127. [PMID: 33624753 PMCID: PMC8011033 DOI: 10.1093/gbe/evab035] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Accepted: 02/21/2021] [Indexed: 12/22/2022]
Abstract
Semaphorins and plexins are cell surface ligand/receptor proteins that affect cytoskeletal dynamics in metazoan cells. Interestingly, they are also present in Choanoflagellata, a class of unicellular heterotrophic flagellates that forms the phylogenetic sister group to Metazoa. Several members of choanoflagellates are capable of forming transient colonies, whereas others reside solitary inside exoskeletons; their molecular diversity is only beginning to emerge. Here, we surveyed genomics data from 22 choanoflagellate species and detected semaphorin/plexin pairs in 16 species. Choanoflagellate semaphorins (Sema-FN1) contain several domain features distinct from metazoan semaphorins, including an N-terminal Reeler domain that may facilitate dimer stabilization, an array of fibronectin type III domains, a variable serine/threonine-rich domain that is a potential site for O-linked glycosylation, and a SEA domain that can undergo autoproteolysis. In contrast, choanoflagellate plexins (Plexin-1) harbor a domain arrangement that is largely identical to metazoan plexins. Both Sema-FN1 and Plexin-1 also contain a short homologous motif near the C-terminus, likely associated with a shared function. Three-dimensional molecular models revealed a highly conserved structural architecture of choanoflagellate Plexin-1 as compared to metazoan plexins, including similar predicted conformational changes in a segment that is involved in the activation of the intracellular Ras-GAP domain. The absence of semaphorins and plexins in several choanoflagellate species did not appear to correlate with unicellular versus colonial lifestyle or ecological factors such as fresh versus salt water environment. Together, our findings support a conserved mechanism of semaphorin/plexin proteins in regulating cytoskeletal dynamics in unicellular and multicellular organisms.
Affiliation(s)
- Chrystian Junqueira Alves
- Friedman Brain Institute, Nash Family Department of Neuroscience and Department of Neurosurgery, Icahn School of Medicine at Mount Sinai, New York, New York
- Júlia Silva Ladeira
- Programa de Pós-graduação em Modelagem Computacional, Universidade Federal de Juiz de Fora, Minas Gerais, Brazil
- Theodore Hannah
- Friedman Brain Institute, Nash Family Department of Neuroscience and Department of Neurosurgery, Icahn School of Medicine at Mount Sinai, New York, New York
- Roberto J Pedroso Dias
- Departamento de Zoologia, Instituto de Ciências Biológicas, Universidade Federal de Juiz de Fora, Minas Gerais, Brazil
- Priscila V Zabala Capriles
- Programa de Pós-graduação em Modelagem Computacional, Universidade Federal de Juiz de Fora, Minas Gerais, Brazil
- Karla Yotoko
- Departamento de Biologia Geral, Universidade Federal de Viçosa, Minas Gerais, Brazil
- Hongyan Zou
- Friedman Brain Institute, Nash Family Department of Neuroscience and Department of Neurosurgery, Icahn School of Medicine at Mount Sinai, New York, New York
- Roland H Friedel
- Friedman Brain Institute, Nash Family Department of Neuroscience and Department of Neurosurgery, Icahn School of Medicine at Mount Sinai, New York, New York
|
21
|
Guo Y, Wu J, Ma H, Wang S, Huang J. Comprehensive Study on Enhancing Low-Quality Position-Specific Scoring Matrix with Deep Learning for Accurate Protein Structure Property Prediction: Using Bagging Multiple Sequence Alignment Learning. J Comput Biol 2021; 28:346-361. [PMID: 33617347 DOI: 10.1089/cmb.2020.0416] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 01/08/2023]
Abstract
Accurate predictions of protein structure properties, for example, secondary structure and solvent accessibility, are essential in analyzing the structure and function of a protein. Position-specific scoring matrix (PSSM) features are widely used in structure property prediction. However, some proteins may have low-quality PSSM features due to insufficient homologous sequences, leading to limited prediction accuracy. To address this limitation, we propose an enhancing scheme for PSSM features. We introduce the "Bagging MSA" (multiple sequence alignment) method to calculate PSSM features used to train our model, adopt a convolutional network to capture local context features and bidirectional long short-term memory for long-term dependencies, and integrate them under an unsupervised framework. Structure property prediction models are then built upon such enhanced PSSM features for more accurate predictions. Moreover, we develop two frameworks to evaluate the effectiveness of the enhanced PSSM features, which also bring the proposed method into real-world scenarios. Empirical evaluation on the CB513, CASP11, and CASP12 data sets indicates that our unsupervised enhancing scheme indeed generates more informative PSSM features for structure property prediction.
Affiliation(s)
- Yuzhi Guo
- Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, Texas, USA
- Tencent AI Lab, Shenzhen, China
- Hehuan Ma
- Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, Texas, USA
- Sheng Wang
- Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, Texas, USA
- Junzhou Huang
- Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, Texas, USA
|
22
|
Cueno ME, Imai K. Structural Comparison of the SARS CoV 2 Spike Protein Relative to Other Human-Infecting Coronaviruses. Front Med (Lausanne) 2021; 7:594439. [PMID: 33585502 PMCID: PMC7874069 DOI: 10.3389/fmed.2020.594439] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 08/13/2020] [Accepted: 12/14/2020] [Indexed: 12/19/2022]
Abstract
Coronaviruses (CoV) are enveloped positive-stranded RNA viruses and, historically, there are seven known human-infecting CoVs with varying degrees of virulence. CoV attachment to the host is the first step of viral pathogenesis and mainly relies on the spike glycoprotein located on the viral surface. Among the human-infecting CoVs, only the infection of SARS CoV 2 (SARS2) among humans resulted in a pandemic, which would suggest that the protein structural conformation of the SARS2 spike protein is distinct compared to other human-infecting CoVs. Surprisingly, the possible differences and similarities in the protein structural conformation between the various human-infecting CoV spike proteins have not been fully elucidated. In this study, we utilized a computational approach to generate models and analyze the seven human-infecting CoV spike proteins, namely: HCoV 229E, HCoV OC43, HCoV NL63, HCoV HKU1, SARS CoV, MERS CoV, and SARS2. Model quality assessment of all CoV models generated, structural superimposition of the whole protein model and selected S1 domains (S1-CTD and S1-NTD), and structural comparison based on RMSD values, TM-scores, and contact mapping were all performed. We found that the structural orientation of S1-CTD is a potential structural feature associated with both the CoV phylogenetic cluster and lineage. Moreover, we observed that spike models in the same phylogenetic cluster or lineage could potentially have similar protein structure. Additionally, we established that there are potentially three distinct S1-CTD orientations (Pattern I, Pattern II, Pattern III) among the human-infecting CoVs. Furthermore, we postulate that human-infecting CoVs in the same phylogenetic cluster may have similar S1-CTD and S1-NTD structural orientations. Taken together, we propose that the SARS2 spike S1-CTD follows a Pattern III orientation, which has a higher degree of similarity with SARS1 and some degree of similarity with both OC43 and HKU1, which coincidentally are in the same phylogenetic cluster and lineage, whereas the SARS2 spike S1-NTD has some degree of similarity among human-infecting CoVs that are either in the same phylogenetic cluster or lineage.
Affiliation(s)
- Marni E Cueno
- Department of Microbiology, Nihon University School of Dentistry, Tokyo, Japan
| | - Kenichi Imai
- Department of Microbiology, Nihon University School of Dentistry, Tokyo, Japan
| |
|
23
|
Getting to Know Your Neighbor: Protein Structure Prediction Comes of Age with Contextual Machine Learning. J Comput Biol 2020; 27:796-814. [DOI: 10.1089/cmb.2019.0193] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 12/12/2022] Open
|
24
|
Feng J, Shukla D. FingerprintContacts: Predicting Alternative Conformations of Proteins from Coevolution. J Phys Chem B 2020; 124:3605-3615. [PMID: 32283936 DOI: 10.1021/acs.jpcb.9b11869] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 11/30/2022]
Abstract
Proteins are dynamic molecules which perform diverse molecular functions by adopting different three-dimensional structures. Recent progress in residue-residue contacts prediction opens up new avenues for the de novo protein structure prediction from sequence information. However, it is still difficult to predict more than one conformation from residue-residue contacts alone. This is due to the inability to deconvolve the complex signals of residue-residue contacts, i.e., spatial contacts relevant for protein folding, conformational diversity, and ligand binding. Here, we introduce a machine learning based method, called FingerprintContacts, for extending the capabilities of residue-residue contacts. This algorithm leverages the features of residue-residue contacts, that is, (1) a single conformation outperforms the others in the structural prediction using all the top ranking residue-residue contacts as structural constraints and (2) conformation specific contacts rank lower and constitute a small fraction of residue-residue contacts. We demonstrate the capabilities of FingerprintContacts on eight ligand binding proteins with varying conformational motions. Furthermore, FingerprintContacts identifies small clusters of residue-residue contacts which are preferentially located in the dynamically fluctuating regions. With the rapid growth in protein sequence information, we expect FingerprintContacts to be a powerful first step in structural understanding of protein functional mechanisms.
|
25
|
Bhattacharya S, Bhattacharya D. Evaluating the significance of contact maps in low-homology protein modeling using contact-assisted threading. Sci Rep 2020; 10:2908. [PMID: 32076047 PMCID: PMC7031282 DOI: 10.1038/s41598-020-59834-2] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 10/08/2019] [Accepted: 02/04/2020] [Indexed: 12/02/2022] Open
Abstract
The development of improved threading algorithms for remote homology modeling is a critical step forward in template-based protein structure prediction. We have recently demonstrated the utility of contact information to boost protein threading by developing a new contact-assisted threading method. However, the nature and extent to which the quality of a predicted contact map impacts the performance of contact-assisted threading remains elusive. Here, we systematically analyze and explore this interdependence by employing our newly-developed contact-assisted threading method over a large-scale benchmark dataset using predicted contact maps from four complementary methods including direct coupling analysis (mfDCA), sparse inverse covariance estimation (PSICOV), classical neural network-based meta approach (MetaPSICOV), and state-of-the-art ultra-deep learning model (RaptorX). Experimental results demonstrate that contact-assisted threading using high-quality contacts with a Matthews Correlation Coefficient (MCC) ≥ 0.5 improves threading performance in nearly 30% of cases, while low-quality contacts with MCC < 0.35 degrade performance in 50% of cases. This holds true even on the CASP13 dataset, where threading using high-quality contacts (MCC ≥ 0.5) significantly improves the performance in 22 instances out of 29. Collectively, our study uncovers the mutual association between the quality of predicted contacts and its possible utility in boosting threading performance for improving low-homology protein modeling.
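The MCC thresholds quoted above can be made concrete with a small sketch. It computes the Matthews Correlation Coefficient between a predicted and a native contact set; the function name `contact_mcc` and the representation of contacts as sets of `(i, j)` index pairs are assumptions for illustration, and the abstract does not state how the authors select the candidate pairs or binarize predicted probabilities.

```python
import math


def contact_mcc(predicted, native, n_candidate_pairs):
    """Matthews Correlation Coefficient for a predicted contact set.

    predicted, native: sets of (i, j) residue-index pairs with i < j.
    n_candidate_pairs: total number of residue pairs considered
    (hypothetical parameter; needed to count true negatives).
    """
    tp = len(predicted & native)
    fp = len(predicted - native)
    fn = len(native - predicted)
    tn = n_candidate_pairs - tp - fp - fn
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom
```

Because MCC uses all four cells of the confusion matrix, it stays informative even though native contacts are a small minority of all residue pairs.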
Affiliation(s)
- Sutanu Bhattacharya
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, 36849, USA
| | - Debswapna Bhattacharya
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, 36849, USA.
- Department of Biological Sciences, Auburn University, Auburn, AL, 36849, USA.
| |
|
26
|
Hattori LT, Gutoski M, Vargas Benítez CM, Nunes LF, Lopes HS. A benchmark of optimally folded protein structures using integer programming and the 3D-HP-SC model. Comput Biol Chem 2020; 84:107192. [PMID: 31918170 DOI: 10.1016/j.compbiolchem.2019.107192] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 07/06/2019] [Revised: 12/09/2019] [Accepted: 12/10/2019] [Indexed: 01/04/2023]
Abstract
The Protein Structure Prediction (PSP) problem comprises, among other issues, forecasting the three-dimensional native structure of proteins using only their primary structure information. Most computational studies in this area use synthetic data instead of real biological data. However, the closer the data are to the real world, the greater the impact of the results and their applicability. This work presents 17 real protein sequences extracted from the Protein Data Bank as a benchmark for the PSP problem using the three-dimensional Hydrophobic-Polar with Side-Chains model (3D-HP-SC). The native structure of these proteins was found by maximizing the number of hydrophobic contacts between the side-chains of amino acids. The problem was treated as an optimization problem and solved by means of an Integer Programming approach. Although the method solves the problem optimally, the processing time grows exponentially. Therefore, due to computational limitations, the method is a proof-of-concept and is not applicable to large sequences. For unknown sequences, an upper bound on the number of hydrophobic contacts (using this model) can be found, due to a linear relationship with the number of hydrophobic residues. The comparison between the predicted and the biological structures showed that the highest similarity between them was found with distance thresholds around 5.2-8.2 Å. Both the dataset and the programs developed will be freely available to foster further research in the area.
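The objective being maximized can be illustrated with a deliberately simplified sketch. It counts hydrophobic-hydrophobic contacts in the plain 3D HP lattice model, with one bead per residue and no side chains, so it is not the 3D-HP-SC model or the integer program of the abstract above; the function name `hh_contacts` and the input layout are assumptions.

```python
def hh_contacts(sequence, coords):
    """Count H-H contacts on a cubic lattice (plain HP model, no side chains).

    sequence: string over {"H", "P"}.
    coords: list of integer (x, y, z) lattice positions, one per residue.
    A contact is a pair of non-consecutive H residues at lattice distance 1.
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    count = 0
    for i in range(len(sequence)):
        for j in range(i + 2, len(sequence)):  # skip chain neighbours
            if sequence[i] == "H" and sequence[j] == "H":
                manhattan = sum(abs(a - b) for a, b in zip(coords[i], coords[j]))
                if manhattan == 1:
                    count += 1
    return count
```

An optimizer, whether integer programming or a heuristic, searches over `coords` to maximize this count; the sketch only evaluates a given conformation and does not check chain connectivity.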
Affiliation(s)
- Leandro Takeshi Hattori
- Bioinformatics and Computational Intelligence Laboratory, Federal University of Technology Paraná (UTFPR), Av. 7 de Setembro, 3165, 80230-901 Curitiba (PR), Brazil.
| | - Matheus Gutoski
- Bioinformatics and Computational Intelligence Laboratory, Federal University of Technology Paraná (UTFPR), Av. 7 de Setembro, 3165, 80230-901 Curitiba (PR), Brazil
| | - César Manuel Vargas Benítez
- Bioinformatics and Computational Intelligence Laboratory, Federal University of Technology Paraná (UTFPR), Av. 7 de Setembro, 3165, 80230-901 Curitiba (PR), Brazil
| | - Luiz Fernando Nunes
- Bioinformatics and Computational Intelligence Laboratory, Federal University of Technology Paraná (UTFPR), Av. 7 de Setembro, 3165, 80230-901 Curitiba (PR), Brazil.
| | - Heitor Silvério Lopes
- Bioinformatics and Computational Intelligence Laboratory, Federal University of Technology Paraná (UTFPR), Av. 7 de Setembro, 3165, 80230-901 Curitiba (PR), Brazil.
| |
|
27
|
Fukuda H, Tomii K. DeepECA: an end-to-end learning framework for protein contact prediction from a multiple sequence alignment. BMC Bioinformatics 2020; 21:10. [PMID: 31918654 PMCID: PMC6953294 DOI: 10.1186/s12859-019-3190-x] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 07/16/2019] [Accepted: 11/04/2019] [Indexed: 12/30/2022] Open
Abstract
Background: Recently developed methods of protein contact prediction, a crucially important step for protein structure prediction, depend heavily on deep neural networks (DNNs) and multiple sequence alignments (MSAs) of target proteins. Protein sequences are accumulating to an increasing degree such that abundant sequences to construct an MSA of a target protein are readily obtainable. Nevertheless, many cases present different ends of the number of sequences that can be included in an MSA used for contact prediction. The abundant sequences might degrade prediction results, but opportunities remain for a limited number of sequences to construct an MSA. To resolve these persistent issues, we strove to develop a novel framework using DNNs in an end-to-end manner for contact prediction. Results: We developed neural network models to improve precision of both deep and shallow MSAs. Results show that higher prediction accuracy was achieved by assigning weights to sequences in a deep MSA. Moreover, for shallow MSAs, adding a few sequential features was useful to increase the prediction accuracy of long-range contacts in our model. Based on these models, we expanded our model to a multi-task model to achieve higher accuracy by incorporating predictions of secondary structures and solvent-accessible surface areas. Moreover, we demonstrated that ensemble averaging of our models can raise accuracy. Using past CASP target protein domains, we tested our models and demonstrated that our final model is superior to or equivalent to existing meta-predictors. Conclusions: The end-to-end learning framework we built can use information derived from either deep or shallow MSAs for contact prediction. Recently, an increasing number of protein sequences have become accessible, including metagenomic sequences, which might degrade contact prediction results. Under such circumstances, our model can provide a means to reduce noise automatically. According to results of tertiary structure prediction based on contacts and secondary structures predicted by our model, more accurate three-dimensional models of a target protein are obtainable than those from existing ECA methods, starting from its MSA. DeepECA is available from https://github.com/tomiilab/DeepECA.
Affiliation(s)
- Hiroyuki Fukuda
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa-shi, Chiba-ken, 277-8562, Japan
| | - Kentaro Tomii
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa-shi, Chiba-ken, 277-8562, Japan; Artificial Intelligence Research Center (AIRC), Biotechnology Research Institute for Drug Discovery, Real World Big-Data Computation Open Innovation Laboratory (RWBC-OIL), National Institute of Advanced Industrial Science and Technology (AIST), 2-4-7 Aomi, Koto-ku, Tokyo, 135-0064, Japan.
| |
|
28
|
Srivastava S, Verma S, Kamthania M, Kaur R, Badyal RK, Saxena AK, Shin HJ, Kolbe M, Pandey KC. Structural Basis for Designing Multiepitope Vaccines Against COVID-19 Infection: In Silico Vaccine Design and Validation. JMIR BIOINFORMATICS AND BIOTECHNOLOGY 2020; 1:e19371. [PMID: 32776022 PMCID: PMC7370533 DOI: 10.2196/19371] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 04/15/2020] [Revised: 05/26/2020] [Accepted: 05/27/2020] [Indexed: 11/13/2022]
Abstract
BACKGROUND The novel coronavirus disease (COVID-19), which is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has led to the ongoing 2019-2020 pandemic. SARS-CoV-2 is a positive-sense single-stranded RNA coronavirus. Effective countermeasures against SARS-CoV-2 infection require the design and development of specific and effective vaccine candidates. OBJECTIVE To address the urgent need for a SARS-CoV-2 vaccine, in the present study, we designed and validated one cytotoxic T lymphocyte (CTL) and one helper T lymphocyte (HTL) multi-epitope vaccine (MEV) against SARS-CoV-2 using various in silico methods. METHODS Both designed MEVs are composed of CTL and HTL epitopes screened from 11 Open Reading Frame (ORF), structural and nonstructural proteins of the SARS-CoV-2 proteome. Both MEVs also carry potential B-cell linear and discontinuous epitopes as well as interferon gamma-inducing epitopes. To enhance the immune response of our vaccine design, truncated (residues 10-153) Onchocerca volvulus activation-associated secreted protein-1 was used as an adjuvant at the N termini of both MEVs. The tertiary models for both the designed MEVs were generated, refined, and further analyzed for stable molecular interaction with toll-like receptor 3. Codon-biased complementary DNA (cDNA) was generated for both MEVs and analyzed in silico for high level expression in a mammalian (human) host cell line. RESULTS In the present study, we screened and shortlisted 38 CTL, 33 HTL, and 12 B cell epitopes from the 11 ORF protein sequences of the SARS-CoV-2 proteome. Moreover, the molecular interactions of the screened epitopes with their respective human leukocyte antigen allele binders and the transporter associated with antigen processing (TAP) complex were positively validated. The shortlisted screened epitopes were utilized to design two novel MEVs against SARS-CoV-2. 
Further molecular models of both MEVs were prepared, and their stable molecular interactions with toll-like receptor 3 were positively validated. The codon-optimized cDNAs of both MEVs were also positively analyzed for high levels of overexpression in a human cell line. CONCLUSIONS The present study is highly significant in terms of the molecular design of prospective CTL and HTL vaccines against SARS-CoV-2 infection with potential to elicit cellular and humoral immune responses. The epitopes of the designed MEVs are predicted to cover the large human population worldwide (96.10%). Hence, both designed MEVs could be tried in vivo as potential vaccine candidates against SARS-CoV-2.
Affiliation(s)
- Sukrit Srivastava
- Infection Biology Group, Department of Biotechnology, Mangalayatan University, Aligarh, India
- Molecular Medicine Laboratory, School of Life Science, Jawaharlal Nehru University, New Delhi, India
| | - Sonia Verma
- Parasite-Host Biology Group, Protein Biochemistry and Engineering Lab, ICMR-National Institute of Malaria Research, New Delhi, India
| | - Mohit Kamthania
- Department of Biotechnology, Institute of Applied Medicines and Research, Ghaziabad, India
| | - Rupinder Kaur
- Department of Chemistry, Guru Nanak Dev University, Amritsar, India
| | | | - Ajay Kumar Saxena
- Molecular Medicine Laboratory, School of Life Science, Jawaharlal Nehru University, New Delhi, India
| | - Ho-Joon Shin
- Department of Microbiology, School of Medicine, Ajou University, Suwon, Gyeonggi-do, Republic of Korea
| | - Michael Kolbe
- Centre for Structural Systems Biology, Department for Structural Infection Biology, Helmholtz-Centre for Infection Research, Hamburg, Germany
- Faculty of Mathematics, Informatics and Natural Sciences, University of Hamburg, Hamburg, Germany
| | - Kailash C Pandey
- Parasite-Host Biology Group, Protein Biochemistry and Engineering Lab, ICMR-National Institute of Malaria Research, New Delhi, India
| |
|
29
|
Zhang H, Zhang Q, Ju F, Zhu J, Gao Y, Xie Z, Deng M, Sun S, Zheng WM, Bu D. Predicting protein inter-residue contacts using composite likelihood maximization and deep learning. BMC Bioinformatics 2019; 20:537. [PMID: 31664895 PMCID: PMC6821021 DOI: 10.1186/s12859-019-3051-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 01/14/2019] [Accepted: 08/22/2019] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Accurate prediction of inter-residue contacts of a protein is important to calculating its tertiary structure. Analysis of co-evolutionary events among residues has been proved effective in inferring inter-residue contacts. The Markov random field (MRF) technique, although being widely used for contact prediction, suffers from the following dilemma: the actual likelihood function of MRF is accurate but time-consuming to calculate; in contrast, approximations to the actual likelihood, say pseudo-likelihood, are efficient to calculate but inaccurate. Thus, how to achieve both accuracy and efficiency simultaneously remains a challenge. RESULTS In this study, we present such an approach (called clmDCA) for contact prediction. Unlike plmDCA using pseudo-likelihood, i.e., the product of conditional probability of individual residues, our approach uses composite-likelihood, i.e., the product of conditional probability of all residue pairs. Composite likelihood has been theoretically proved as a better approximation to the actual likelihood function than pseudo-likelihood. Meanwhile, composite likelihood is still efficient to maximize, thus ensuring the efficiency of clmDCA. We present comprehensive experiments on popular benchmark datasets, including PSICOV dataset and CASP-11 dataset, to show that: i) clmDCA alone outperforms the existing MRF-based approaches in prediction accuracy. ii) When equipped with deep learning technique for refinement, the prediction accuracy of clmDCA was further significantly improved, suggesting the suitability of clmDCA for subsequent refinement procedure. We further present a successful application of the predicted contacts to accurately build tertiary structures for proteins in the PSICOV dataset. CONCLUSIONS Composite likelihood maximization algorithm can efficiently estimate the parameters of Markov Random Fields and can improve the prediction accuracy of protein inter-residue contacts.
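The contrast between the two approximations can be written out explicitly. The notation below is an assumption introduced for illustration (a single sequence $x = (x_1, \dots, x_L)$ of length $L$, MRF parameters $\theta$, and $x_{\setminus S}$ denoting all positions outside the set $S$); in practice both objectives are also multiplied over the sequences of the alignment and regularized, details the abstract does not give.

```latex
% Pseudo-likelihood (plmDCA): product over single residues
\mathcal{L}_{\mathrm{PL}}(\theta)
  = \prod_{i=1}^{L} P\bigl(x_i \mid x_{\setminus \{i\}};\, \theta\bigr)

% Composite likelihood (clmDCA): product over residue pairs
\mathcal{L}_{\mathrm{CL}}(\theta)
  = \prod_{1 \le i < j \le L} P\bigl(x_i, x_j \mid x_{\setminus \{i,j\}};\, \theta\bigr)
```

Each pairwise factor normalizes over the joint states of two positions rather than one, which keeps the partition function tractable while modeling the coupled pair jointly.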
Affiliation(s)
- Haicang Zhang
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
| | - Qi Zhang
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
| | - Fusong Ju
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
| | - Jianwei Zhu
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
| | - Yujuan Gao
- Center for Quantitative Biology, School of Mathematical Sciences, Center for Statistical Sciences, Peking University, Beijing, China
| | - Ziwei Xie
- College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, China
| | - Minghua Deng
- Center for Quantitative Biology, School of Mathematical Sciences, Center for Statistical Sciences, Peking University, Beijing, China
| | - Shiwei Sun
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China.
| | - Wei-Mou Zheng
- Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing, China.
| | - Dongbo Bu
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China.
| |
|
30
|
Hanson J, Paliwal K, Litfin T, Yang Y, Zhou Y. Accurate prediction of protein contact maps by coupling residual two-dimensional bidirectional long short-term memory with convolutional neural networks. Bioinformatics 2019; 34:4039-4045. [PMID: 29931279 DOI: 10.1093/bioinformatics/bty481] [Citation(s) in RCA: 71] [Impact Index Per Article: 11.8] [Received: 02/09/2018] [Accepted: 06/13/2018] [Indexed: 11/12/2022] Open
Abstract
Motivation: Accurate prediction of a protein contact map depends greatly on capturing as much contextual information as possible from surrounding residues for a target residue pair. Recently, ultra-deep residual convolutional networks were found to be state-of-the-art in the latest Critical Assessment of Structure Prediction techniques (CASP12) for protein contact map prediction by attempting to provide a protein-wide context at each residue pair. Recurrent neural networks have seen great success in recent protein residue classification problems due to their ability to propagate information through long protein sequences, especially Long Short-Term Memory (LSTM) cells. Here, we propose a novel protein contact map prediction method by stacking residual convolutional networks with two-dimensional residual bidirectional recurrent LSTM networks, and using both one-dimensional sequence-based and two-dimensional evolutionary coupling-based information. Results: We show that the proposed method achieves a robust performance over validation and independent test sets with the Area Under the receiver operating characteristic Curve (AUC) > 0.95 in all tests. When compared to several state-of-the-art methods for independent testing of 228 proteins, the method yields an AUC value of 0.958, whereas the next-best method obtains an AUC of 0.909. More importantly, the improvement is over contacts at all sequence-position separations. Specifically, 8.95%, 5.65% and 2.84% increases in precision were observed for the top L/10 predictions over the next best for short, medium and long-range contacts, respectively. This confirms the usefulness of ResNets to congregate the short-range relations and 2D-BRLSTM to propagate the long-range dependencies throughout the entire protein contact map 'image'. Availability and implementation: SPOT-Contact server URL: http://sparks-lab.org/jack/server/SPOT-Contact/. Supplementary information: Supplementary data are available at Bioinformatics online.
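The "top L/10" precision reported above can be sketched as follows. The function name `top_precision` and the input layout are assumptions for illustration. The separation ranges in the usage note (short 6 to 11, medium 12 to 23, long 24 or more residues apart) follow the convention commonly used in CASP assessments and are assumed here rather than taken from the abstract.

```python
def top_precision(scores, native, length, k=10, min_sep=24, max_sep=None):
    """Precision of the top L/k predicted contacts within a separation range.

    scores: dict mapping (i, j) with i < j to a predicted contact score.
    native: set of (i, j) pairs that are true contacts.
    length: sequence length L; the top max(1, L // k) pairs are evaluated.
    """
    in_range = [
        (pair, score) for pair, score in scores.items()
        if pair[1] - pair[0] >= min_sep
        and (max_sep is None or pair[1] - pair[0] <= max_sep)
    ]
    in_range.sort(key=lambda item: item[1], reverse=True)
    top = in_range[: max(1, length // k)]
    if not top:
        return 0.0
    return sum(1 for pair, _ in top if pair in native) / len(top)
```

Calling it with `min_sep=6, max_sep=11`, then `min_sep=12, max_sep=23`, then `min_sep=24` gives the short-, medium-, and long-range figures under that assumed convention.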
Affiliation(s)
- Jack Hanson
- Signal Processing Laboratory, Griffith University, Brisbane, Australia
| | - Kuldip Paliwal
- Signal Processing Laboratory, Griffith University, Brisbane, Australia
| | - Thomas Litfin
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Southport, Australia
| | - Yuedong Yang
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Southport, Australia
- School of Data and Computer Science, Sun-Yat Sen University, Guangzhou, Guangdong, China
| | - Yaoqi Zhou
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Southport, Australia
| |
|
31
|
Abstract
Motivation: Template-based modeling, including homology modeling and protein threading, is a popular method for protein 3D structure prediction. However, alignment generation and template selection for protein sequences without close templates remain very challenging. Results: We present a new method called DeepThreader to improve protein threading, including both alignment generation and template selection, by making use of deep learning (DL) and residue co-variation information. Our method first employs DL to predict inter-residue distance distribution from residue co-variation and sequential information (e.g. sequence profile and predicted secondary structure), and then builds sequence-template alignment by integrating predicted distance information and sequential features through an ADMM algorithm. Experimental results suggest that predicted inter-residue distance is helpful to both protein alignment and template selection especially for protein sequences without very close templates, and that our method outperforms currently popular homology modeling method HHpred and threading method CNFpred by a large margin and greatly outperforms the latest contact-assisted protein threading method EigenTHREADER. Availability and implementation: http://raptorx.uchicago.edu/ Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jianwei Zhu
- Toyota Technological Institute, Chicago, IL, USA; Key Lab of Intelligent Information Process, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
| | - Sheng Wang
- Toyota Technological Institute, Chicago, IL, USA
| | - Dongbo Bu
- Key Lab of Intelligent Information Process, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
| | - Jinbo Xu
- Toyota Technological Institute, Chicago, IL, USA
| |
|
32
|
Wang C, Wei Y, Zhang H, Kong L, Sun S, Zheng WM, Bu D. Constructing effective energy functions for protein structure prediction through broadening attraction-basin and reverse Monte Carlo sampling. BMC Bioinformatics 2019; 20:135. [PMID: 30925867 PMCID: PMC6439974 DOI: 10.1186/s12859-019-2652-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The ab initio approaches to protein structure prediction usually employ the Monte Carlo technique to search the structural conformation that has the lowest energy. However, the widely-used energy functions are usually ineffective for conformation search. How to construct an effective energy function remains a challenging task. RESULTS Here, we present a framework to construct effective energy functions for protein structure prediction. Unlike existing energy functions only requiring the native structure to be the lowest one, we attempt to maximize the attraction-basin where the native structure lies in the energy landscape. The underlying rationale is that each energy function determines a specific energy landscape together with a native attraction-basin, and the larger the attraction-basin is, the more likely for the Monte Carlo search procedure to find the native structure. Following this rationale, we constructed effective energy functions as follows: i) To explore the native attraction-basin determined by a certain energy function, we performed reverse Monte Carlo sampling starting from the native structure, identifying the structural conformations on the edge of attraction-basin. ii) To broaden the native attraction-basin, we smoothened the edge points of attraction-basin through tuning weights of energy terms, thus acquiring an improved energy function. Our framework alternates the broadening attraction-basin and reverse sampling steps (thus called BARS) until the native attraction-basin is sufficiently large. We present extensive experimental results to show that using the BARS framework, the constructed energy functions could greatly facilitate protein structure prediction in improving the quality of predicted structures and speeding up conformation search. 
CONCLUSION Using the BARS framework, we constructed effective energy functions for protein structure prediction, which could improve the quality of predicted structures and speed up conformation search as well.
Affiliation(s)
- Chao Wang
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, 6, Kexueyuan South Road, Zhongguancun, Beijing, 100190 China
- University of Chinese Academy of Sciences, 19-1, Yuquan Road, Shijingshan, Beijing, 100049 China
| | - Yi Wei
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, 6, Kexueyuan South Road, Zhongguancun, Beijing, 100190 China
- University of Chinese Academy of Sciences, 19-1, Yuquan Road, Shijingshan, Beijing, 100049 China
| | - Haicang Zhang
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, 6, Kexueyuan South Road, Zhongguancun, Beijing, 100190 China
- University of Chinese Academy of Sciences, 19-1, Yuquan Road, Shijingshan, Beijing, 100049 China
| | - Lupeng Kong
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, 6, Kexueyuan South Road, Zhongguancun, Beijing, 100190 China
- University of Chinese Academy of Sciences, 19-1, Yuquan Road, Shijingshan, Beijing, 100049 China
| | - Shiwei Sun
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, 6, Kexueyuan South Road, Zhongguancun, Beijing, 100190 China
- University of Chinese Academy of Sciences, 19-1, Yuquan Road, Shijingshan, Beijing, 100049 China
| | - Wei-Mou Zheng
- University of Chinese Academy of Sciences, 19-1, Yuquan Road, Shijingshan, Beijing, 100049 China
- Institute of Theoretical Physics, Chinese Academy of Sciences, 55, Zhongguancun East Road, Beijing, 100190 China
| | - Dongbo Bu
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, 6, Kexueyuan South Road, Zhongguancun, Beijing, 100190 China
- University of Chinese Academy of Sciences, 19-1, Yuquan Road, Shijingshan, Beijing, 100049 China
| |
|
33
|
Jing X, Dong Q, Lu R, Dong Q. Protein Inter-Residue Contacts Prediction: Methods, Performances and Applications. Curr Bioinform 2019. [DOI: 10.2174/1574893613666181109130430] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 11/22/2022]
Abstract
Background: Protein inter-residue contact prediction plays an important role in protein structure and function research. As a low-dimensional representation of protein tertiary structure, inter-residue contacts can greatly help de novo protein structure prediction methods reduce the conformational search space. Over the past two decades, various methods have been developed for protein inter-residue contact prediction. Objective: We provide a comprehensive and systematic review of protein inter-residue contact prediction methods. Results: Protein inter-residue contact prediction methods are roughly classified into five categories: correlated mutations methods, machine-learning methods, fusion methods, template-based methods and 3D model-based methods. In this paper, we first describe the common definition of protein inter-residue contacts and show typical applications of such contacts. Then, we present a comprehensive review of the three main categories of protein inter-residue contact prediction: correlated mutations methods, machine-learning methods and fusion methods. We also analyze the constraints of each category. Furthermore, we compare several representative methods on the CASP11 dataset and discuss their performance in detail. Conclusion: Correlated mutations methods achieve better performance for long-range contacts, while machine-learning methods perform well for short-range contacts. Fusion methods can take advantage of both the machine-learning and correlated mutations methods. Employing a more effective fusion strategy could help to further improve the performance of fusion methods.
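The abstract does not spell out the contact definition it reviews. A minimal sketch under the widely used CASP convention is given here as an assumption: two residues are in contact when their Cβ atoms (Cα for glycine) lie within 8 Å, and contacts are binned by sequence separation into short (6 to 11), medium (12 to 23) and long (24 or more) range. The function names and the toy coordinates are hypothetical.

```python
import numpy as np

def contact_map(coords, threshold=8.0):
    """Boolean L x L map: True where two representative atoms are within threshold (in Å)."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return dist < threshold

def range_class(i, j):
    """Classify a residue pair by sequence separation (assumed CASP-style bins)."""
    sep = abs(i - j)
    if sep >= 24:
        return "long"
    if sep >= 12:
        return "medium"
    if sep >= 6:
        return "short"
    return "local"

# Toy chain: 40 points on a straight line, 3.8 Å apart (not a real structure).
coords = np.array([[3.8 * k, 0.0, 0.0] for k in range(40)])
cm = contact_map(coords)
```

On this straight toy chain only pairs up to two positions apart fall under 8 Å, so every contact is "local"; a folded protein is what brings medium- and long-range pairs under the threshold.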
Affiliation(s)
- Xiaoyang Jing
- School of Computer Science, Fudan University, Shanghai, China
| | - Qimin Dong
- Vocational and Technical Education Center of Linxi County, Chifeng, Inner Mongolia, China
| | - Ruqian Lu
- School of Computer Science, Fudan University, Shanghai, China
| | - Qiwen Dong
- Faculty of Education, East China Normal University, Shanghai, China
| |
|
34
|
DESTINI: A deep-learning approach to contact-driven protein structure prediction. Sci Rep 2019; 9:3514. [PMID: 30837676 PMCID: PMC6401133 DOI: 10.1038/s41598-019-40314-1] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Received: 11/01/2018] [Accepted: 02/12/2019] [Indexed: 11/09/2022] Open
Abstract
The amino acid sequence of a protein encodes the blueprint of its native structure. Predicting the corresponding structural fold from the protein’s sequence is one of the most challenging problems in computational biology. In this work, we introduce DESTINI (deep structural inference for proteins), a novel computational approach that combines a deep-learning algorithm for protein residue/residue contact prediction with template-based structural modelling. For the first time, significantly improved predictive ability is demonstrated in the large-scale tertiary structure prediction of over 1,200 single-domain proteins. DESTINI successfully predicts the tertiary structure of four times the number of “hard” targets (those with poor quality templates) that were previously intractable, viz., a “glass-ceiling” for previous template-based approaches, and also improves model quality for “easy” targets (those with good quality templates). The significantly better performance of DESTINI is largely due to the incorporation of better contact prediction into template modelling. To understand why deep learning achieves more accurate contact prediction, we performed systematic clustering, which reveals that deep learning predicts coherent, native-like contact patterns compared to co-evolutionary analysis. Taken together, this work presents a promising strategy towards solving the protein structure prediction problem.
|
35
|
Wuyun Q, Zheng W, Peng Z, Yang J. A large-scale comparative assessment of methods for residue-residue contact prediction. Brief Bioinform 2019; 19:219-230. [PMID: 27802931 DOI: 10.1093/bib/bbw106] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 07/19/2016] [Indexed: 11/14/2022] Open
Abstract
Sequence-based prediction of residue-residue contacts in proteins is becoming increasingly important for improving protein structure prediction in the big data era. In this study, we performed a large-scale comparative assessment of 15 locally installed contact predictors. To assess these methods, we collected a large data set consisting of 680 nonredundant proteins covering different structural classes and target difficulties. We investigated a wide range of factors that may influence the precision of contact prediction, including target difficulty, structural class, alignment depth and the distribution of contact pairs in a protein structure. We found that: (1) the machine learning-based methods outperform the direct-coupling-based methods for short-range contact prediction, while the latter are significantly better for long-range contact prediction. The consensus-based methods, which combine machine learning and direct-coupling methods, perform the best. (2) Target difficulty has no clear influence on the machine learning-based methods, while it does affect the direct-coupling and consensus-based methods significantly. (3) Alignment depth has a relatively weak effect on the machine learning-based methods. However, for the direct-coupling-based and consensus-based methods, the predicted contacts for targets with deeper alignments tend to be more accurate. (4) All methods perform relatively better on β and α + β proteins than on α proteins. (5) Residues buried in the core of a protein structure are more likely to be in contact than residues on the surface (22 versus 6%). We believe these results are useful for guiding the future development of new approaches to contact prediction.
Affiliation(s)
- Qiqige Wuyun
- School of Mathematical Sciences, Nankai University, Tianjin, China
| | - Wei Zheng
- School of Mathematical Sciences, Nankai University, Tianjin, China
| | - Zhenling Peng
- Center for Applied Mathematics, Tianjin University, Tianjin, China
| | - Jianyi Yang
- School of Mathematical Sciences, Nankai University, Tianjin, China
| |
|
36
|
Rosales Ramirez R, Ludert JE. The Dengue Virus Nonstructural Protein 1 (NS1) Is Secreted from Mosquito Cells in Association with the Intracellular Cholesterol Transporter Chaperone Caveolin Complex. J Virol 2019; 93:e01985-18. [PMID: 30463973 PMCID: PMC6364000 DOI: 10.1128/jvi.01985-18] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 11/07/2018] [Accepted: 11/10/2018] [Indexed: 12/16/2022] Open
Abstract
Dengue virus (DENV) is a mosquito-borne virus of the family Flaviviridae. The RNA viral genome encodes three structural and seven nonstructural proteins. Nonstructural protein 1 (NS1) is a multifunctional protein actively secreted in vertebrate and mosquito cells during infection. In mosquito cells, NS1 is secreted in a caveolin-1-dependent manner by an unconventional route. The caveolin chaperone complex (CCC) is a cytoplasmic complex formed by caveolin-1 and the chaperones FKBP52, Cy40, and CyA and is responsible for the cholesterol traffic inside the cell. In this work, we demonstrate that in mosquito cells, but not in vertebrate cells, NS1 associates with and relies on the CCC for secretion. Treatment of mosquito cells with classic secretion inhibitors, such as brefeldin A, Golgicide A, and Fli-06, showed no effect on NS1 secretion but significant reductions in recombinant luciferase secretion and virion release. Silencing the expression of CAV-1 or FKBP52 with short interfering RNAs or the inhibition of CyA by cyclosporine resulted in a significant decrease in NS1 secretion, again without affecting virion release. Colocalization, coimmunoprecipitation, and proximity ligation assays indicated that NS1 colocalizes and interacts with all proteins of the CCC. In addition, CAV-1 and FKBP52 expression was found to be augmented in DENV-infected cells. Results obtained with Zika virus-infected cells suggest that in mosquito cells, ZIKV NS1 follows the same secretory pathway as that observed for DENV NS1. These results uncover important differences in the dengue virus-cell interactions between the vertebrate host and the mosquito vector as well as novel functions for the chaperone caveolin complex. IMPORTANCE The dengue virus protein NS1 is secreted efficiently from both infected vertebrate and mosquito cells. Previously, our group reported that NS1 secretion in mosquito cells follows an unconventional secretion pathway dependent on caveolin-1.
In this work, we demonstrate that in mosquito cells, but not in vertebrate cells, NS1 secretion takes place in association with the chaperone caveolin complex, a complex formed by caveolin-1 and the chaperones FKBP52, CyA, and Cy40, which are in charge of cholesterol transport inside the cell. Results obtained with ZIKV-infected mosquito cells suggest that ZIKV NS1 is released following an unconventional secretory route in association with the chaperone caveolin complex. These results uncover important differences in the virus-cell interactions between the vertebrate host and the mosquito vector, as well as novel functions for the chaperone caveolin complex. Moreover, manipulation of the NS1 secretory route may prove a valuable strategy to combat these two mosquito-borne diseases.
Affiliation(s)
- Romel Rosales Ramirez
- Department of Infectomics and Molecular Pathogenesis, Center for Research and Advanced Studies (CINVESTAV-IPN), Mexico City, Mexico
| | - Juan E Ludert
- Department of Infectomics and Molecular Pathogenesis, Center for Research and Advanced Studies (CINVESTAV-IPN), Mexico City, Mexico
| |
|
37
|
Ding W, Mao W, Shao D, Zhang W, Gong H. DeepConPred2: An Improved Method for the Prediction of Protein Residue Contacts. Comput Struct Biotechnol J 2018; 16:503-510. [PMID: 30505403 PMCID: PMC6247404 DOI: 10.1016/j.csbj.2018.10.009] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 09/14/2018] [Revised: 10/16/2018] [Accepted: 10/18/2018] [Indexed: 12/18/2022] Open
Abstract
Information on residue-residue contacts is essential for understanding the mechanism of protein folding, and has been successfully applied as special topological restraints to simplify the conformational sampling in de novo protein structure prediction. Prediction of protein residue contacts has made remarkably rapid progress recently, with prediction accuracy approaching impressively high levels in the past two years. In this work, we introduce a second version of our residue contact predictor, DeepConPred2, which exhibits substantially improved performance and sufficiently reduced running time after model re-optimization and feature updates. When tested on the CASP12 free modeling targets, our program reaches at least the same level of prediction accuracy as the best contact predictors so far and provides information complementary to other state-of-the-art methods in contact-assisted folding.
Affiliation(s)
- Wenze Ding
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing 100084, China.,Beijing Innovation Center of Structural Biology, Tsinghua University, Beijing 100084, China
| | - Wenzhi Mao
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing 100084, China.,Beijing Innovation Center of Structural Biology, Tsinghua University, Beijing 100084, China
| | - Di Shao
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing 100084, China.,Beijing Innovation Center of Structural Biology, Tsinghua University, Beijing 100084, China
| | - Wenxuan Zhang
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing 100084, China.,Beijing Innovation Center of Structural Biology, Tsinghua University, Beijing 100084, China
| | - Haipeng Gong
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing 100084, China.,Beijing Innovation Center of Structural Biology, Tsinghua University, Beijing 100084, China
| |
|
38
|
Chakravorty D, Patra S. RankProt: A multi criteria-ranking platform to attain protein thermostabilizing mutations and its in vitro applications - Attribute based prediction method on the principles of Analytical Hierarchical Process. PLoS One 2018; 13:e0203036. [PMID: 30286107 PMCID: PMC6171822 DOI: 10.1371/journal.pone.0203036] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/14/2018] [Accepted: 08/14/2018] [Indexed: 01/15/2023] Open
Abstract
Attaining recombinant thermostable proteins is still a challenge for protein engineering. The difficulty lies in the length of time and the enormous effort required to achieve the desired results. The present work proposes a novel and economical strategy for attaining protein thermostability by predicting site-specific mutations in the shortest possible time. The success of the approach can be attributed to the Analytical Hierarchical Process, and the outcome was a rationalized tool for predicting thermostabilizing mutation(s), RankProt. Briefly, the method involved ranking 17 biophysical protein features as class predictors, derived from 127 pairs of thermostable and mesostable proteins. Among the 17 predictors, ionic interactions and main-chain to main-chain hydrogen bonds were the highest ranked features, with an eigenvalue of 0.091. The success of the tool was judged by multi-fold in silico validation tests, and it achieved a prediction accuracy of 91% with an AUC of 0.927. Further, in vitro validation was carried out by predicting thermostabilizing mutations for mesostable Bacillus subtilis lipase and introducing the predicted mutations by multi-site directed mutagenesis. The rationalized method succeeded in rendering the lipase thermostable, with optimum temperature stability and Tm increasing by 20°C and 7°C, respectively. In conclusion, this was achieved with fewer mutations than the number incorporated to render Bacillus subtilis lipase thermostable by directed evolution techniques. The present work shows that protein stabilizing mutations can be rationally designed by balancing the biophysical pleiotropy of proteins, in accordance with the selection pressure.
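The ranking step can be illustrated with a generic Analytic Hierarchy Process calculation. This sketch is not RankProt's code: it assumes the standard AHP recipe, in which criteria weights are the normalized principal eigenvector of a reciprocal pairwise-comparison matrix, and the three-criterion matrix and function name are hypothetical.

```python
import numpy as np

def ahp_priorities(pairwise):
    """Return (weights, consistency_index) for a reciprocal comparison matrix."""
    a = np.asarray(pairwise, dtype=float)
    eigvals, eigvecs = np.linalg.eig(a)
    k = int(np.argmax(eigvals.real))           # principal eigenvalue
    principal = np.abs(eigvecs[:, k].real)     # fix the arbitrary sign
    weights = principal / principal.sum()      # normalize so weights sum to 1
    n = a.shape[0]
    consistency_index = (eigvals[k].real - n) / (n - 1)
    return weights, consistency_index

# Hypothetical, perfectly consistent comparisons built from known weights.
true_w = np.array([0.5, 0.3, 0.2])
pairwise = true_w[:, None] / true_w[None, :]   # entry (i, j) = w_i / w_j
weights, ci = ahp_priorities(pairwise)
```

For a perfectly consistent matrix the principal eigenvalue equals the number of criteria, so the consistency index is zero; real expert judgments usually give a small positive value.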
Affiliation(s)
- Debamitra Chakravorty
- Department of Biosciences and Bioengineering, Indian Institute of Technology Guwahati, Guwahati, Assam, India
| | - Sanjukta Patra
- Department of Biosciences and Bioengineering, Indian Institute of Technology Guwahati, Guwahati, Assam, India
| |
|
39
|
Wozniak PP, Konopka BM, Xu J, Vriend G, Kotulska M. Forecasting residue-residue contact prediction accuracy. Bioinformatics 2017; 33:3405-3414. [PMID: 29036497 DOI: 10.1093/bioinformatics/btx416] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 02/27/2017] [Accepted: 06/22/2017] [Indexed: 11/14/2022] Open
Abstract
Motivation Apart from meta-predictors, most of today's methods for residue-residue contact prediction are based entirely on Direct Coupling Analysis (DCA) of correlated mutations in multiple sequence alignments (MSAs). These methods are on average ∼40% correct for the 100 strongest predicted contacts in each protein. The end-user who works on a single protein of interest will not know whether predictions are much more or much less correct than 40%, which is especially a problem if contacts are predicted to steer experimental research on that protein. Results We designed a regression model that forecasts the accuracy of residue-residue contact prediction for individual proteins with an average error of 7 percentage points. Contacts were predicted with two DCA methods (gplmDCA and PSICOV). The models were built on parameters that describe the MSA, the predicted secondary structure, the predicted solvent accessibility and the contact prediction scores for the target protein. Results show that our models can also be applied to meta-methods, which was tested on RaptorX. Availability and implementation All data and scripts are available from http://comprec-lin.iiar.pwr.edu.pl/dcaQ/. Contact malgorzata.kotulska@pwr.edu.pl. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- P P Wozniak
- Department of Biomedical Engineering, Faculty of Fundamental Problems of Technology, Wroclaw University of Science and Technology, Wroclaw, Poland
| | - B M Konopka
- Department of Biomedical Engineering, Faculty of Fundamental Problems of Technology, Wroclaw University of Science and Technology, Wroclaw, Poland
| | - J Xu
- Toyota Technological Institute at Chicago, Chicago, IL 60637, USA
| | - G Vriend
- Centre for Molecular and Biomolecular Informatics, Radboud University Medical Centre, GA 6525, Nijmegen, The Netherlands
| | - M Kotulska
- Department of Biomedical Engineering, Faculty of Fundamental Problems of Technology, Wroclaw University of Science and Technology, Wroclaw, Poland
| |
|
40
|
Adhikari B, Hou J, Cheng J. Protein contact prediction by integrating deep multiple sequence alignments, coevolution and machine learning. Proteins 2017; 86 Suppl 1:84-96. [PMID: 29047157 DOI: 10.1002/prot.25405] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 06/23/2017] [Revised: 09/08/2017] [Accepted: 10/16/2017] [Indexed: 12/14/2022]
Abstract
In this study, we report the evaluation of the residue-residue contacts predicted by our three different methods in the CASP12 experiment, focusing on the impact of multiple sequence alignment, residue coevolution, and machine learning on contact prediction. The first method (MULTICOM-NOVEL) uses only traditional features (sequence profile, secondary structure, and solvent accessibility) with deep learning to predict contacts and serves as a baseline. The second method (MULTICOM-CONSTRUCT) uses our new alignment algorithm to generate deep multiple sequence alignments from which coevolution-based features are derived; these are integrated by a neural network method to predict contacts. The third method (MULTICOM-CLUSTER) is a consensus combination of the predictions of the first two methods. We evaluated our methods on 94 CASP12 domains. On a subset of 38 free-modeling domains, our methods achieved an average precision of up to 41.7% for top L/5 long-range contact predictions. The comparison of the three methods shows that the quality and effective depth of multiple sequence alignments, coevolution-based features, and machine learning integration of coevolution-based features and traditional features drive the quality of predicted protein contacts. On the full CASP12 dataset, the coevolution-based features alone can improve the average precision from 28.4% to 41.6%, and the machine learning integration of all the features further raises the precision to 56.3%, when top L/5 predicted long-range contacts are evaluated. The correlation between the precision of contact prediction and the logarithm of the number of effective sequences in alignments is 0.66.
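The "top L/5 long-range precision" reported above can be computed as in the following sketch. It assumes the usual CASP convention that long-range means a sequence separation of at least 24 residues; the function name, the toy scores and the toy native map are hypothetical and are not data from the study.

```python
import numpy as np

def top_l_precision(scores, native, divisor=5, min_sep=24):
    """Precision of the top L/divisor predicted pairs with |i - j| >= min_sep."""
    scores = np.asarray(scores, dtype=float)
    native = np.asarray(native, dtype=bool)
    length = scores.shape[0]
    # Upper-triangle pairs (i < j) whose sequence separation is at least min_sep.
    i, j = np.triu_indices(length, k=min_sep)
    k = max(1, length // divisor)
    top = np.argsort(-scores[i, j], kind="stable")[:k]
    return float(native[i[top], j[top]].mean())

# Toy example for a 30-residue protein, so L/5 = 6 pairs are evaluated.
L = 30
scores = np.zeros((L, L))
native = np.zeros((L, L), dtype=bool)
predicted = {(0, 24): 0.9, (0, 25): 0.8, (1, 26): 0.7,
             (2, 28): 0.6, (3, 29): 0.5, (4, 29): 0.4}
for (a, b), s in predicted.items():
    scores[a, b] = s
for a, b in [(0, 24), (0, 25), (1, 26)]:
    native[a, b] = True
scores[0, 3] = 1.0    # short-range pair: excluded by the separation filter
native[0, 3] = True
precision = top_l_precision(scores, native)
```

In this toy case three of the six top-ranked long-range pairs are native contacts, so the precision is 0.5; the confidently predicted short-range pair does not count.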
Affiliation(s)
- Badri Adhikari
- Department of Mathematics and Computer Science, University of Missouri-St. Louis, St. Louis, Missouri
| | - Jie Hou
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri
| | - Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri
| |
|
41
|
Wang S, Li Z, Yu Y, Xu J. Folding Membrane Proteins by Deep Transfer Learning. Cell Syst 2017; 5:202-211.e3. [PMID: 28957654 PMCID: PMC5637520 DOI: 10.1016/j.cels.2017.09.001] [Citation(s) in RCA: 45] [Impact Index Per Article: 5.6] [Received: 02/22/2017] [Revised: 06/01/2017] [Accepted: 08/29/2017] [Indexed: 01/02/2023]
Abstract
Computational elucidation of membrane protein (MP) structures is challenging, partly due to the lack of sufficient solved structures for homology modeling. Here, we describe a high-throughput deep transfer learning method that first predicts MP contacts by learning from non-MPs and then predicts 3D structure models using the predicted contacts as distance restraints. Tested on 510 non-redundant MPs, our method has contact prediction accuracy at least 0.18 better than existing methods, predicts correct folds for 218 MPs, and generates 3D models with root-mean-square deviation (RMSD) less than 4 and 5 Å for 57 and 108 MPs, respectively. A rigorous blind test in the continuous automated model evaluation project shows that our method predicted high-resolution 3D models for two recent test MPs of 210 residues with RMSD ∼2 Å. We estimated that our method could predict correct folds for 1,345-1,871 reviewed human multi-pass MPs including a few hundred new folds, which should facilitate the discovery of drugs targeting MPs.
Affiliation(s)
- Sheng Wang
- Toyota Technological Institute at Chicago, Chicago, IL 60637, USA; Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA; Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
| | - Zhen Li
- Toyota Technological Institute at Chicago, Chicago, IL 60637, USA; Department of Computer Science, University of Hong Kong, Hong Kong
| | - Yizhou Yu
- Department of Computer Science, University of Hong Kong, Hong Kong
| | - Jinbo Xu
- Toyota Technological Institute at Chicago, Chicago, IL 60637, USA.
| |
|
42
|
Sabzekar M, Naghibzadeh M, Eghdami M, Aydin Z. Protein β-sheet prediction using an efficient dynamic programming algorithm. Comput Biol Chem 2017; 70:142-155. [PMID: 28881217 DOI: 10.1016/j.compbiolchem.2017.08.011] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 03/05/2017] [Revised: 07/25/2017] [Accepted: 08/18/2017] [Indexed: 11/28/2022]
Abstract
Predicting the β-sheet structure of a protein is one of the most important intermediate steps towards the identification of its tertiary structure. However, it is regarded as the primary bottleneck due to the presence of non-local interactions between several discontinuous regions in β-sheets. To predict long-range interactions reliably, a promising approach is to enumerate and rank all β-sheet conformations for a given protein and find the one with the highest score. The problem with this solution is that the search space grows exponentially with the number of β-strands. Additionally, brute-force calculation in this conformational space leads to a combinatorial explosion with intractable computational complexity. The main contribution of this paper is to generate and search the problem space efficiently in order to reduce the time complexity. To achieve this, two tree structures, called sheet-tree and grouping-tree, are proposed. They model the search space by breaking it into sub-problems. Then, an advanced dynamic programming algorithm is proposed that stores the intermediate results, avoids repetitive calculation by reusing them efficiently in successive steps, and reduces the problem space by removing those intermediate results that will no longer be required in later steps. As a consequence, the following contributions have been made. Firstly, more accurate β-sheet structures are found by searching all possible conformations, and secondly, the time complexity is reduced by searching the problem space efficiently, which makes the proposed method applicable to predicting β-sheet structures with a high number of β-strands. 
Experimental results on the BetaSheet916 dataset showed significant improvements of the proposed method in both execution time and prediction accuracy in comparison with the state-of-the-art β-sheet structure prediction methods. Moreover, we investigate the effect of different contact map predictors on the performance of the proposed method using the BetaSheet1452 dataset. The source code is available at http://www.conceptsgate.com/BetaTop.rar.
Affiliation(s)
- Mostafa Sabzekar
- Department of Computer Engineering, Ferdowsi University of Mashhad, Mashhad, Iran
| | - Mahmoud Naghibzadeh
- Department of Computer Engineering, Ferdowsi University of Mashhad, Mashhad, Iran.
| | - Mahdie Eghdami
- Department of Computer Engineering, Ferdowsi University of Mashhad, Mashhad, Iran
| | - Zafer Aydin
- Department of Computer Engineering, Abdullah Gul University, Kayseri, Turkey
| |
|
43
|
Wang S, Sun S, Xu J. Analysis of deep learning methods for blind protein contact prediction in CASP12. Proteins 2017; 86 Suppl 1:67-77. [PMID: 28845538 DOI: 10.1002/prot.25377] [Citation(s) in RCA: 61] [Impact Index Per Article: 7.6] [Received: 06/14/2017] [Revised: 08/18/2017] [Accepted: 08/25/2017] [Indexed: 11/08/2022]
Abstract
Here we present the results of protein contact prediction achieved in CASP12 by our RaptorX-Contact server, which is an early implementation of our deep learning method for contact prediction. On a set of 38 free-modeling target domains with a median family size of around 58 effective sequences, our server obtained an average top L/5 long- and medium-range contact accuracy of 47% and 44%, respectively (L = length). A complete implementation has an average accuracy of 59% and 57%, respectively. Our deep learning method formulates contact prediction as a pixel-level image labeling problem and simultaneously predicts all residue pairs of a protein using a combination of two deep residual neural networks, taking as input the residue conservation information, predicted secondary structure and solvent accessibility, contact potential, and coevolution information. Our approach differs from existing methods mainly in (1) formulating contact prediction as a pixel-level image labeling problem instead of an image-level classification problem; (2) simultaneously predicting all contacts of an individual protein to make effective use of contact occurrence patterns; and (3) integrating both one-dimensional and two-dimensional deep convolutional neural networks to effectively learn complex sequence-structure relationships, including high-order residue correlation. This paper discusses the RaptorX-Contact pipeline, both contact prediction and contact-based folding results, and finally the strengths and weaknesses of our method.
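To treat contact prediction as pixel-level labeling, per-residue (1D) features have to be lifted into an L x L "image" in which each pixel describes a residue pair. The sketch below shows one simple way to do this, by concatenating the feature vectors of residues i and j and appending pairwise features such as coevolution scores. This construction is an assumption for illustration and may differ from the actual RaptorX-Contact pipeline; the function names and array sizes are hypothetical.

```python
import numpy as np

def outer_concat(seq_feats):
    """Tile (L, d) per-residue features into an (L, L, 2d) pairwise tensor."""
    x = np.asarray(seq_feats, dtype=float)
    length, d = x.shape
    rows = np.broadcast_to(x[:, None, :], (length, length, d))  # features of residue i
    cols = np.broadcast_to(x[None, :, :], (length, length, d))  # features of residue j
    return np.concatenate([rows, cols], axis=-1)

def build_pair_input(seq_feats, pair_feats):
    """Stack tiled 1D features with (L, L, c) pairwise features along the channel axis."""
    pair = np.asarray(pair_feats, dtype=float)
    return np.concatenate([outer_concat(seq_feats), pair], axis=-1)

# Toy sizes: 4 residues, 3 per-residue features, 2 pairwise features.
seq = np.arange(12, dtype=float).reshape(4, 3)
pair = np.ones((4, 4, 2))
x = build_pair_input(seq, pair)   # shape (4, 4, 8)
```

A 2D convolutional network applied to such a tensor outputs one label per pixel, that is, one contact prediction per residue pair, for the whole protein at once.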
Affiliation(s)
- Sheng Wang
- Toyota Technological Institute at Chicago, Chicago, Illinois
| | - Siqi Sun
- Toyota Technological Institute at Chicago, Chicago, Illinois
| | - Jinbo Xu
- Toyota Technological Institute at Chicago, Chicago, Illinois
| |
|
44
|
Jing X, Dong Q, Lu R. RRCRank: a fusion method using rank strategy for residue-residue contact prediction. BMC Bioinformatics 2017; 18:390. [PMID: 28865433 PMCID: PMC5581475 DOI: 10.1186/s12859-017-1811-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 05/04/2017] [Accepted: 08/28/2017] [Indexed: 11/10/2022] Open
Abstract
Background In structural biology, protein residue-residue contacts play a crucial role in protein structure prediction. Some researchers have found that predicted residue-residue contacts can effectively constrain the conformational search space, which is significant for de novo protein structure prediction. In the last few decades, researchers have developed various methods to predict residue-residue contacts; in particular, fusion methods have achieved significant performance in recent years. In this work, a novel fusion method based on a ranking strategy is proposed to predict contacts. Unlike the traditional regression or classification strategies, the contact prediction task is regarded as a ranking task. First, two kinds of features are extracted from correlated mutations methods and ensemble machine-learning classifiers, and then the proposed method uses a learning-to-rank algorithm to predict the contact probability of each residue pair. Results First, we perform two benchmark tests of the proposed fusion method (RRCRank) on the CASP11 and CASP12 datasets, respectively. The test results show that the RRCRank method outperforms other well-developed methods, especially for medium- and short-range contacts. Second, to verify the superiority of the ranking strategy, we predict contacts using the traditional regression and classification strategies based on the same features as the ranking strategy. Compared with these two traditional strategies, the proposed ranking strategy shows better performance for all three contact types, in particular for long-range contacts. Third, the proposed RRCRank has been compared with several state-of-the-art methods in CASP11 and CASP12. The results show that RRCRank achieves comparable prediction precision and is better than three methods in most assessment metrics. 
Conclusions The learning-to-rank algorithm is introduced to develop a novel rank-based method for the residue-residue contact prediction of proteins, which achieves state-of-the-art performance based on the extensive assessment.
Affiliation(s)
- Xiaoyang Jing
- School of Computer Science, Fudan University, Shanghai, 200433, People's Republic of China
| | - Qiwen Dong
- School of Data Science and Engineering, East China Normal University, Shanghai, 200062, People's Republic of China.
| | - Ruqian Lu
- School of Computer Science, Fudan University, Shanghai, 200433, People's Republic of China
| |
|
45
|
Stahl K, Schneider M, Brock O. EPSILON-CP: using deep learning to combine information from multiple sources for protein contact prediction. BMC Bioinformatics 2017; 18:303. [PMID: 28623886 PMCID: PMC5474060 DOI: 10.1186/s12859-017-1713-x] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Received: 09/27/2016] [Accepted: 05/30/2017] [Indexed: 01/12/2023] Open
Abstract
BACKGROUND: Accurately predicted contacts make it possible to compute the 3D structure of a protein. Since the solution space of native residue-residue contact pairs is very large, it is necessary to leverage information to identify relevant regions of the solution space, i.e. correct contacts. Every additional source of information can contribute to narrowing down candidate regions. Therefore, recent methods combined evolutionary and sequence-based information as well as evolutionary and physicochemical information. We develop a new contact predictor (EPSILON-CP) that goes beyond current methods by combining evolutionary, physicochemical, and sequence-based information. The problems resulting from the increased dimensionality and complexity of the learning problem are combated with a careful feature analysis, which results in a drastically reduced feature set. The different information sources are combined using deep neural networks. RESULTS: On 21 hard CASP11 FM targets, EPSILON-CP achieves a mean precision of 35.7% for top-L/10 predicted long-range contacts, which is 11% better than the CASP11 winning version of MetaPSICOV. The improvement on 1.5L is 17%. Furthermore, in this study we find that the amino acid composition, a commonly used feature, is rendered ineffective in the context of meta approaches. The size of the refined feature set decreased by 75%, enabling a significant increase in training data for machine learning, contributing significantly to the observed improvements. CONCLUSIONS: Exploiting as much and as diverse information as possible is key to accurate contact prediction. Simply merging the information introduces new challenges. Our study suggests that critical feature analysis can improve the performance of contact prediction methods that combine multiple information sources. EPSILON-CP is available as a webservice: http://compbio.robotics.tu-berlin.de/epsilon/.
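The top-L/10 long-range precision quoted in this abstract can be illustrated with a minimal sketch. It is not code from EPSILON-CP. It assumes that L is the sequence length, that "long-range" means a sequence separation of at least 24 residues (a common CASP convention, taken here as an assumption), and that precision is the fraction of the selected predictions that are native contacts. The function name `top_l_over_k_precision` and the toy data are hypothetical.

```python
def top_l_over_k_precision(scores, native, length, k=10, min_sep=24):
    """Precision of the top (length // k) predictions among long-range pairs.

    scores : dict mapping (i, j) residue pairs (i < j) to a predicted score
    native : set of (i, j) pairs that are true contacts
    min_sep: assumed long-range threshold on |i - j| (hypothetical default)
    """
    # Keep only pairs whose sequence separation counts as long-range.
    long_range = [(s, pair) for pair, s in scores.items()
                  if abs(pair[0] - pair[1]) >= min_sep]
    # Rank by predicted score, highest first.
    long_range.sort(key=lambda item: item[0], reverse=True)
    n_top = max(1, length // k)
    selected = [pair for _, pair in long_range[:n_top]]
    if not selected:
        return 0.0
    hits = sum(1 for pair in selected if pair in native)
    return hits / len(selected)


# Toy example for a 40-residue protein, so top-L/10 keeps 4 predictions.
toy_scores = {
    (0, 30): 0.90, (1, 35): 0.80, (2, 10): 0.95,  # (2, 10) is not long-range
    (3, 39): 0.70, (5, 33): 0.60, (0, 25): 0.10,
}
toy_native = {(0, 30), (3, 39), (2, 10)}
toy_precision = top_l_over_k_precision(toy_scores, toy_native, length=40)
```

In the toy data, two of the four selected long-range pairs are native contacts, so the precision is 0.5. Assessments differ on whether to divide by the number of selected pairs or by L/k when fewer predictions are available; this sketch divides by the number selected.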
Affiliation(s)
- Kolja Stahl
- Robotics and Biology Laboratory, Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Marchstraße 23, Berlin, 10587 Germany
- Michael Schneider
- Robotics and Biology Laboratory, Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Marchstraße 23, Berlin, 10587 Germany
- Oliver Brock
- Robotics and Biology Laboratory, Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Marchstraße 23, Berlin, 10587 Germany
|
46
|
Xiong D, Zeng J, Gong H. A deep learning framework for improving long-range residue–residue contact prediction using a hierarchical strategy. Bioinformatics 2017; 33:2675-2683. [DOI: 10.1093/bioinformatics/btx296] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Received: 10/18/2016] [Accepted: 05/02/2017] [Indexed: 12/31/2022] Open
Affiliation(s)
- Dapeng Xiong
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China
- Beijing Innovation Center of Structural Biology, Tsinghua University, Beijing, China
- Jianyang Zeng
- Beijing Innovation Center of Structural Biology, Tsinghua University, Beijing, China
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
- Haipeng Gong
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China
- Beijing Innovation Center of Structural Biology, Tsinghua University, Beijing, China
|
47
|
Simkovic F, Ovchinnikov S, Baker D, Rigden DJ. Applications of contact predictions to structural biology. IUCrJ 2017; 4:291-300. [PMID: 28512576 PMCID: PMC5414403 DOI: 10.1107/s2052252517005115] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Received: 12/12/2016] [Accepted: 04/03/2017] [Indexed: 06/07/2023]
Abstract
Evolutionary pressure on residue interactions, intramolecular or intermolecular, that are important for protein structure or function can lead to covariance between the two positions. Recent methodological advances allow much more accurate contact predictions to be derived from this evolutionary covariance signal. The practical application of contact predictions has largely been confined to structural bioinformatics, yet, as this work seeks to demonstrate, the data can be of enormous value to the structural biologist working in X-ray crystallography, cryo-EM or NMR. Integrative structural bioinformatics packages such as Rosetta can already exploit contact predictions in a variety of ways. The contribution of contact predictions begins at construct design, where structural domains may need to be expressed separately and contact predictions can help to predict domain limits. Structure solution by molecular replacement (MR) benefits from contact predictions in diverse ways: in difficult cases, more accurate search models can be constructed using ab initio modelling when predictions are available, while intermolecular contact predictions can allow the construction of larger, oligomeric search models. Furthermore, MR using supersecondary motifs or large-scale screens against the PDB can exploit information, such as the parallel or antiparallel nature of any β-strand pairing in the target, that can be inferred from contact predictions. Contact information will be particularly valuable in the determination of lower resolution structures by helping to assign sequence register. In large complexes, contact information may allow the identity of a protein responsible for a certain region of density to be determined and then assist in the orientation of an available model within that density. In NMR, predicted contacts can provide long-range information to extend the upper size limit of the technique in a manner analogous but complementary to experimental methods.
Finally, predicted contacts can distinguish between biologically relevant interfaces and mere lattice contacts in a final crystal structure, and have potential in the identification of functionally important regions and in foreseeing the consequences of mutations.
Affiliation(s)
- Felix Simkovic
- Institute of Integrative Biology, University of Liverpool, Liverpool L69 7ZB, England
- Sergey Ovchinnikov
- Department of Biochemistry, University of Washington, Seattle, WA 98195, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98195, USA
- Howard Hughes Medical Institute, University of Washington, Box 357370, Seattle, WA 98195, USA
- David Baker
- Department of Biochemistry, University of Washington, Seattle, WA 98195, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98195, USA
- Howard Hughes Medical Institute, University of Washington, Box 357370, Seattle, WA 98195, USA
- Daniel J. Rigden
- Institute of Integrative Biology, University of Liverpool, Liverpool L69 7ZB, England
|
48
|
Chapman SD, Adami C, Wilke CO, B Kc D. The evolution of logic circuits for the purpose of protein contact map prediction. PeerJ 2017; 5:e3139. [PMID: 28439455 PMCID: PMC5398280 DOI: 10.7717/peerj.3139] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 06/29/2016] [Accepted: 03/02/2017] [Indexed: 11/20/2022] Open
Abstract
Predicting protein structure from sequence remains a major open problem in protein biochemistry. One component of predicting complete structures is the prediction of inter-residue contact patterns (contact maps). Here, we discuss protein contact map prediction by machine learning. We describe a novel method for contact map prediction that uses the evolution of logic circuits. These logic circuits operate on feature data and output whether two amino acids in a protein are in contact. We show that such a method is feasible and, in addition, that evolution allows the logic circuits to be trained on the dataset in an unbiased manner, so that they can be used both for contact map prediction and for selecting relevant features in a dataset.
Affiliation(s)
- Samuel D Chapman
- Department of Computational Science and Engineering, North Carolina A&T State University, Greensboro, NC, USA
- Christoph Adami
- Department of Microbiology and Molecular Genetics and Department of Physics and Astronomy, Michigan State University, East Lansing, MI, USA
- Claus O Wilke
- Department of Integrative Biology, The University of Texas at Austin, Austin, TX, USA
- Dukka B Kc
- Department of Computational Science and Engineering, North Carolina A&T State University, Greensboro, NC, USA
|
49
|
Huang W, Zeng X, Shi Y, Liu M. Functional characterization of human equilibrative nucleoside transporter 1. Protein Cell 2017; 8:284-295. [PMID: 27995448 PMCID: PMC5359181 DOI: 10.1007/s13238-016-0350-x] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Received: 10/10/2016] [Accepted: 11/04/2016] [Indexed: 12/15/2022] Open
Abstract
Equilibrative nucleoside transporters (ENTs), which facilitate cross-membrane transport of nucleosides and nucleoside-derived drugs, play an important role in the salvage pathways of nucleotide synthesis, cancer chemotherapy, and treatment for virus infections. Functional characterization of ENTs at the molecular level remains technically challenging and hence scant. In this study, we report successful purification and biochemical characterization of human equilibrative nucleoside transporter 1 (hENT1) in vitro. The HEK293F-derived, recombinant hENT1 is homogeneous and functionally active in proteoliposome-based counter flow assays. hENT1 transports the substrate adenosine with a Km of 215 ± 34 µmol/L and a Vmax of 578 ± 23.4 nmol mg⁻¹ min⁻¹. Adenosine uptake by hENT1 is competitively inhibited by nitrobenzylmercaptopurine ribonucleoside (NBMPR), nucleosides, deoxynucleosides, and nucleoside-derived anti-cancer and anti-viral drugs. Binding of hENT1 to adenosine, deoxyadenosine, and adenine by isothermal titration calorimetry is in general agreement with results of the competitive inhibition assays. These results validate hENT1 as a bona fide potential drug target and serve as a useful basis for future biophysical and structural studies.
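To make the reported kinetic constants concrete, the sketch below evaluates the standard Michaelis-Menten rate law with the Km and Vmax from this abstract, and the textbook form for competitive inhibition. It assumes simple Michaelis-Menten behaviour. The inhibitor concentration and inhibition constant Ki are hypothetical placeholders, since the abstract reports no Ki values, and the function names are illustrative only.

```python
KM_ADENOSINE = 215.0    # µmol/L, Km reported in the abstract
VMAX_ADENOSINE = 578.0  # nmol mg^-1 min^-1, Vmax reported in the abstract


def transport_rate(substrate, vmax=VMAX_ADENOSINE, km=KM_ADENOSINE):
    """Michaelis-Menten rate v = Vmax * [S] / (Km + [S])."""
    return vmax * substrate / (km + substrate)


def transport_rate_competitive(substrate, inhibitor, ki,
                               vmax=VMAX_ADENOSINE, km=KM_ADENOSINE):
    """Competitive inhibition raises the apparent Km to Km * (1 + [I] / Ki).

    `inhibitor` and `ki` must share the same units; both are hypothetical
    inputs here because the abstract gives no inhibition constants.
    """
    apparent_km = km * (1.0 + inhibitor / ki)
    return vmax * substrate / (apparent_km + substrate)


# At [S] = Km the uninhibited rate is half of Vmax.
half_max_rate = transport_rate(KM_ADENOSINE)
# With [I] = Ki the apparent Km doubles, so the rate at [S] = Km is Vmax / 3.
inhibited_rate = transport_rate_competitive(KM_ADENOSINE, inhibitor=1.0, ki=1.0)
```

Because a competitive inhibitor changes only the apparent Km, the inhibited rate still approaches the same Vmax at saturating adenosine concentrations.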
Affiliation(s)
- Weiyun Huang
- Beijing Advanced Innovation Center for Structural Biology, Tsinghua-Peking Joint Center for Life Sciences, School of Life Sciences, Tsinghua University, Beijing, 100084, China
- Xin Zeng
- Beijing Advanced Innovation Center for Structural Biology, Tsinghua-Peking Joint Center for Life Sciences, School of Life Sciences, Tsinghua University, Beijing, 100084, China
- Yigong Shi
- Beijing Advanced Innovation Center for Structural Biology, Tsinghua-Peking Joint Center for Life Sciences, School of Life Sciences, Tsinghua University, Beijing, 100084, China
- Minhao Liu
- Beijing Advanced Innovation Center for Structural Biology, Tsinghua-Peking Joint Center for Life Sciences, School of Life Sciences, Tsinghua University, Beijing, 100084, China
|
50
|
Wang S, Sun S, Li Z, Zhang R, Xu J. Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model. PLoS Comput Biol 2017; 13:e1005324. [PMID: 28056090 PMCID: PMC5249242 DOI: 10.1371/journal.pcbi.1005324] [Citation(s) in RCA: 589] [Impact Index Per Article: 73.6] [Received: 09/14/2016] [Revised: 01/20/2017] [Accepted: 12/20/2016] [Indexed: 12/02/2022] Open
Abstract
Motivation: Protein contacts contain key information for the understanding of protein structure and function, and thus contact prediction from sequence is an important problem. Recently, exciting progress has been made on this problem, but the predicted contacts for proteins without many sequence homologs are still of low quality and not very useful for de novo structure prediction. Method: This paper presents a new deep learning method that predicts contacts by integrating both evolutionary coupling (EC) and sequence conservation information through an ultra-deep neural network formed by two deep residual neural networks. The first residual network conducts a series of 1-dimensional convolutional transformations of sequential features; the second residual network conducts a series of 2-dimensional convolutional transformations of pairwise information, including the output of the first residual network, EC information and pairwise potential. By using very deep residual networks, we can accurately model contact occurrence patterns and complex sequence-structure relationships and thus obtain higher-quality contact prediction regardless of how many sequence homologs are available for the proteins in question. Results: Our method greatly outperforms existing methods and leads to much more accurate contact-assisted folding. Tested on 105 CASP11 targets, 76 past CAMEO hard targets, and 398 membrane proteins, the average top L long-range prediction accuracy obtained by our method, one representative EC method CCMpred and the CASP11 winner MetaPSICOV is 0.47, 0.21 and 0.30, respectively; the average top L/10 long-range accuracy of our method, CCMpred and MetaPSICOV is 0.77, 0.47 and 0.59, respectively. Ab initio folding using our predicted contacts as restraints but without any force fields can yield correct folds (i.e., TMscore>0.6) for 203 of the 579 test proteins, while that using MetaPSICOV- and CCMpred-predicted contacts can do so for only 79 and 62 of them, respectively.
Our contact-assisted models also have much better quality than template-based models, especially for membrane proteins. The 3D models built from our contact prediction have TMscore>0.5 for 208 of the 398 membrane proteins, while those from homology modeling have TMscore>0.5 for only 10 of them. Further, even if trained mostly on soluble proteins, our deep learning method works very well on membrane proteins. In the recent blind CAMEO benchmark, our fully-automated web server implementing this method successfully folded 6 targets with a new fold and only 0.3L-2.3L effective sequence homologs, including one β protein of 182 residues, one α+β protein of 125 residues, one α protein of 140 residues, one α protein of 217 residues, one α/β protein of 260 residues and one α protein of 462 residues. Our method also achieved the highest F1 score on free-modeling targets in the latest CASP (Critical Assessment of Structure Prediction), although it was not fully implemented back then. Availability: http://raptorx.uchicago.edu/ContactMap/
Protein contact prediction and contact-assisted folding have made good progress due to direct evolutionary coupling analysis (DCA). However, DCA is effective on only some proteins with a very large number of sequence homologs. To further improve contact prediction, we borrow ideas from deep learning, which has recently revolutionized object recognition, speech recognition and the game of Go. Our deep learning method can model complex sequence-structure relationships and high-order correlation (i.e., contact occurrence patterns) and thus improve contact prediction accuracy greatly. Our test results show that our method greatly outperforms the state-of-the-art methods regardless of how many sequence homologs are available for a protein in question. Ab initio folding guided by our predicted contacts may fold many more test proteins than the other contact predictors.
Our contact-assisted 3D models also have much better quality than homology models built from the training proteins, especially for membrane proteins. One interesting finding is that even when trained mostly on soluble proteins, our method performs very well on membrane proteins. A recent blind CAMEO test confirms that our method can fold large proteins with a new fold and only a small number of sequence homologs.
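The abstract states that the second (2-dimensional) residual network takes pairwise information that includes the output of the first (1-dimensional) network. One simple way to turn per-residue features into per-pair features is to concatenate the feature vectors of residues i and j. The sketch below illustrates that idea in plain Python; it is an assumption made for illustration and is not confirmed by the abstract as the authors' exact operation. The names `outer_concat` and `append_pair_channels` and the toy features are hypothetical.

```python
def outer_concat(seq_feats):
    """Map L per-residue vectors (length d) to an L x L grid of vectors (length 2d).

    Entry [i][j] is the concatenation of the features of residue i and residue j.
    """
    n = len(seq_feats)
    return [[list(seq_feats[i]) + list(seq_feats[j]) for j in range(n)]
            for i in range(n)]


def append_pair_channels(pair_feats, extra):
    """Append extra per-pair channels (e.g. an EC score) to each grid entry."""
    n = len(pair_feats)
    return [[pair_feats[i][j] + list(extra[i][j]) for j in range(n)]
            for i in range(n)]


# Toy input: 3 residues with 2 features each, standing in for the 1D network output.
toy_seq_feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
# Toy single-channel pairwise feature standing in for an evolutionary-coupling score.
toy_ec = [[[float(abs(i - j))] for j in range(3)] for i in range(3)]

toy_pairs = outer_concat(toy_seq_feats)               # 3 x 3 grid, 4 channels
toy_input_2d = append_pair_channels(toy_pairs, toy_ec)  # 3 x 3 grid, 5 channels
```

The resulting L x L grid with a fixed number of channels per pair is the kind of image-like input that a 2-dimensional convolutional network can process.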
Affiliation(s)
- Sheng Wang
- Toyota Technological Institute at Chicago, Chicago, Illinois, United States of America
- Siqi Sun
- Toyota Technological Institute at Chicago, Chicago, Illinois, United States of America
- Zhen Li
- Toyota Technological Institute at Chicago, Chicago, Illinois, United States of America
- Renyu Zhang
- Toyota Technological Institute at Chicago, Chicago, Illinois, United States of America
- Jinbo Xu
- Toyota Technological Institute at Chicago, Chicago, Illinois, United States of America
|