101
Chen T. Identification and characterization of the LRR repeats in plant LRR-RLKs. BMC Mol Cell Biol 2021; 22:9. [PMID: 33509084 PMCID: PMC7841916 DOI: 10.1186/s12860-021-00344-y] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 09/27/2019] [Accepted: 01/12/2021] [Indexed: 01/11/2023] Open
Abstract
Background: Leucine-rich-repeat receptor-like kinases (LRR-RLKs) play central roles in sensing various signals to regulate plant development and environmental responses. The extracellular domains (ECDs) of plant LRR-RLKs contain LRR motifs, consisting of highly conserved residues and variable residues, and are responsible for ligand perception as a receptor or co-receptor. However, there are few comprehensive studies on the ECDs of LRR-RLKs due to the difficulty in effectively identifying the divergent LRR repeats. Results: In the current study, an efficient LRR motif prediction program, the “Phyto-LRR prediction” program, was developed based on the position-specific scoring matrix (PSSM) algorithm with some optimizations. This program was trained on 16-residue plant-specific LRR highly conserved segments (HCS) from LRR-RLKs of 17 representative land plant species, and a database containing more than 55,000 LRRs predicted by this program was constructed. Both the prediction tool and database are freely available at http://phytolrr.com/ for website usage and at http://github.com/phytolrr for local usage. The LRR-RLKs were classified into 18 subgroups (SGs) according to the maximum-likelihood phylogenetic analysis of the kinase domains (KDs) of the sequences. Based on the database and the SGs, the characteristics of the LRR motifs in the ECDs of the LRR-RLKs were examined, such as the arrangement of the LRRs, the solvent accessibility, the variable residues, and the N-glycosylation sites, revealing a comprehensive profile of the plant LRR-RLK ectodomains. Conclusion: The “Phyto-LRR prediction” program is effective in predicting the LRR segments in plant LRR-RLKs, which, together with the database, will facilitate the exploration of plant LRR-RLK functions. Based on the database, comprehensive sequence characteristics of the plant LRR-RLK ectodomains were profiled and analyzed.
Supplementary Information The online version contains supplementary material available at 10.1186/s12860-021-00344-y.
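The PSSM idea mentioned in the abstract above can be illustrated with a minimal sketch. This is not the Phyto-LRR implementation: the four 16-residue training segments, the uniform background, the pseudocount, and the threshold are all hypothetical choices made only to show how a position-specific scoring matrix is built and used to scan a sequence.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(segments, pseudocount=1.0):
    """Build a log-odds PSSM from equal-length aligned segments.

    Assumes a uniform background frequency (1/20), which is a simplification.
    """
    width = len(segments[0])
    if any(len(seg) != width for seg in segments):
        raise ValueError("all training segments must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(segments) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for pos in range(width):
        counts = Counter(seg[pos] for seg in segments)
        column = {
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        }
        pssm.append(column)
    return pssm

def score_window(pssm, window):
    """Sum the per-position log-odds scores; unknown residues score 0."""
    return sum(column.get(aa, 0.0) for column, aa in zip(pssm, window))

def scan(pssm, sequence, threshold):
    """Return (start, score) for every window scoring at or above threshold."""
    width = len(pssm)
    hits = []
    for start in range(len(sequence) - width + 1):
        score = score_window(pssm, sequence[start:start + width])
        if score >= threshold:
            hits.append((start, score))
    return hits

# Hypothetical 16-residue LRR-like training segments (made up for illustration).
TRAINING_SEGMENTS = [
    "LSNLTSLQVLDLSNNS",
    "LGNLKNLTYLDLSSNQ",
    "IGSLSSLEVLNLSYNN",
    "LGKLSKLRILDLSNNK",
]

pssm = build_pssm(TRAINING_SEGMENTS)
query = "MKT" + TRAINING_SEGMENTS[0] + "AAAA"
hits = scan(pssm, query, threshold=15.0)
```

A real predictor would use many more training segments, an empirical background distribution, and a threshold calibrated on held-out data.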
Affiliation(s)
- Tianshu Chen
  - State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, 163 Xianlin Ave, Nanjing, 210046, China.
102
Bhagwat NR, Owens SN, Ito M, Boinapalli JV, Poa P, Ditzel A, Kopparapu S, Mahalawat M, Davies OR, Collins SR, Johnson JR, Krogan NJ, Hunter N. SUMO is a pervasive regulator of meiosis. eLife 2021; 10:57720. [PMID: 33502312 PMCID: PMC7924959 DOI: 10.7554/elife.57720] [Citation(s) in RCA: 51] [Impact Index Per Article: 12.8] [Received: 04/09/2020] [Accepted: 01/26/2021] [Indexed: 02/06/2023] Open
Abstract
Protein modification by SUMO helps orchestrate the elaborate events of meiosis to faithfully produce haploid gametes. To date, only a handful of meiotic SUMO targets have been identified. Here, we delineate a multidimensional SUMO-modified meiotic proteome in budding yeast, identifying 2747 conjugation sites in 775 targets, and defining their relative levels and dynamics. Modified sites cluster in disordered regions and only a minority match consensus motifs. Target identities and modification dynamics imply that SUMOylation regulates all levels of chromosome organization and each step of meiotic prophase I. Execution-point analysis confirms these inferences, revealing functions for SUMO in S-phase, the initiation of recombination, chromosome synapsis and crossing over. K15-linked SUMO chains become prominent as chromosomes synapse and recombine, consistent with roles in these processes. SUMO also modifies ubiquitin, forming hybrid oligomers with potential to modulate ubiquitin signaling. We conclude that SUMO plays diverse and unanticipated roles in regulating meiotic chromosome metabolism.
Most mammalian, yeast and other eukaryote cells have two sets of chromosomes, one from each parent, which contain all the cell’s DNA. Sex cells – like the sperm and egg – however, have half the number of chromosomes and are formed by a specialized type of cell division known as meiosis. At the start of meiosis, each cell replicates its chromosomes so that it has twice the amount of DNA. The cell then undergoes two rounds of division to form sex cells which each contain only one set of chromosomes. Before the cell divides, the two duplicated sets of chromosomes pair up and swap sections of their DNA. This exchange allows each new sex cell to have a unique combination of DNA, resulting in offspring that are genetically distinct from their parents.
This complex series of events is tightly regulated, in part, by a protein called the 'small ubiquitin-like modifier' (or SUMO for short), which attaches itself to other proteins and modifies their behavior. This process, known as SUMOylation, can affect a protein’s stability, where it is located in the cell and how it interacts with other proteins. However, despite SUMO being known as a key regulator of meiosis, only a handful of its protein targets have been identified. To gain a better understanding of what SUMO does during meiosis, Bhagwat et al. set out to find which proteins are targeted by SUMO in budding yeast and to map the specific sites of modification. The experiments identified 2,747 different sites on 775 different proteins, suggesting that SUMO regulates all aspects of meiosis. Consistently, inactivating SUMOylation at different times revealed SUMO plays a role at every stage of meiosis, including the replication of DNA and the exchanges between chromosomes. In depth analysis of the targeted proteins also revealed that SUMOylation targets different groups of proteins at different stages of meiosis and interacts with other protein modifications, including the ubiquitin system which tags proteins for destruction. The data gathered by Bhagwat et al. provide a starting point for future research into precisely how SUMO proteins control meiosis in yeast and other organisms. In humans, errors in meiosis are the leading cause of pregnancy loss and congenital diseases. Most of the proteins identified as SUMO targets in budding yeast are also present in humans. So, this research could provide a platform for medical advances in the future. The next step is to study mammalian models, such as mice, to confirm that the regulation of meiosis by SUMO is the same in mammals as in yeast.
Affiliation(s)
- Nikhil R Bhagwat
  - Howard Hughes Medical Institute, University of California Davis, Davis, United States; Department of Microbiology & Molecular Genetics, University of California Davis, Davis, United States
- Shannon N Owens
  - Department of Microbiology & Molecular Genetics, University of California Davis, Davis, United States
- Masaru Ito
  - Howard Hughes Medical Institute, University of California Davis, Davis, United States; Department of Microbiology & Molecular Genetics, University of California Davis, Davis, United States
- Jay V Boinapalli
  - Department of Microbiology & Molecular Genetics, University of California Davis, Davis, United States
- Philip Poa
  - Department of Microbiology & Molecular Genetics, University of California Davis, Davis, United States
- Alexander Ditzel
  - Department of Microbiology & Molecular Genetics, University of California Davis, Davis, United States
- Srujan Kopparapu
  - Department of Microbiology & Molecular Genetics, University of California Davis, Davis, United States
- Meghan Mahalawat
  - Department of Microbiology & Molecular Genetics, University of California Davis, Davis, United States
- Owen Richard Davies
  - Institute for Cell and Molecular Biosciences, University of Newcastle, Newcastle upon Tyne, United Kingdom
- Sean R Collins
  - Department of Microbiology & Molecular Genetics, University of California Davis, Davis, United States
- Jeffrey R Johnson
  - Department of Cellular & Molecular Pharmacology, University of California San Francisco, San Francisco, United States
- Nevan J Krogan
  - Department of Cellular & Molecular Pharmacology, University of California San Francisco, San Francisco, United States
- Neil Hunter
  - Howard Hughes Medical Institute, University of California Davis, Davis, United States; Department of Microbiology & Molecular Genetics, University of California Davis, Davis, United States; Department of Molecular & Cellular Biology, University of California Davis, Davis, United States
103
Zhang J, Chen Q, Liu B. NCBRPred: predicting nucleic acid binding residues in proteins based on multilabel learning. Brief Bioinform 2021; 22:6102667. [PMID: 33454744 DOI: 10.1093/bib/bbaa397] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 10/01/2020] [Revised: 11/05/2020] [Accepted: 12/03/2020] [Indexed: 01/01/2023] Open
Abstract
The interactions between proteins and nucleic acid sequences play many important roles in gene expression and some cellular activities. Accurate prediction of the nucleic acid binding residues in proteins will facilitate research on protein function, gene expression, drug design, etc. In this regard, several computational methods have been proposed to predict the nucleic acid binding residues in proteins. However, these methods cannot satisfactorily measure the global interactions among the residues along a protein. Furthermore, these methods suffer from the cross-prediction problem, and new strategies should be explored to solve it. In this study, a new computational method called NCBRPred is proposed to predict the nucleic acid binding residues based on a multilabel sequence labeling model. NCBRPred uses bidirectional Gated Recurrent Units (BiGRUs) to capture the global interactions among the residues and treats the task as a multilabel learning task. Experimental results on three widely used benchmark datasets and an independent dataset showed that NCBRPred achieved higher predictive performance with lower cross-prediction, outperforming 10 existing state-of-the-art predictors. The web server and a stand-alone package of NCBRPred are freely available at http://bliulab.net/NCBRPred. It is anticipated that NCBRPred will become a very useful tool for identifying nucleic acid binding residues.
Affiliation(s)
- Jun Zhang
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China
- Qingcai Chen
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China
- Bin Liu
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China
104
Bernier SC, Millette MA, Roy S, Cantin L, Coutinho A, Salesse C. Structural information and membrane binding of truncated RGS9-1 Anchor Protein and its C-terminal hydrophobic segment. Biochimica et Biophysica Acta - Biomembranes 2021; 1863:183566. [PMID: 33453187 DOI: 10.1016/j.bbamem.2021.183566] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/11/2019] [Revised: 12/22/2020] [Accepted: 01/10/2021] [Indexed: 01/19/2023]
Abstract
Visual phototransduction takes place in photoreceptor cells. Light absorption by rhodopsin leads to the activation of transducin as a result of the exchange of its GDP for GTP. The GTP-bound α-subunit of transducin then activates phosphodiesterase (PDE), which in turn hydrolyzes cGMP leading to photoreceptor hyperpolarization. Photoreceptors return to the dark state upon inactivation of these proteins. In particular, PDE is inactivated by the protein complex R9AP/RGS9-1/Gβ5. R9AP (RGS9-1 anchor protein) is responsible for the membrane anchoring of this protein complex to photoreceptor outer segment disk membranes most likely by the combined involvement of its C-terminal hydrophobic domain as well as other types of interactions. This study thus aimed to gather information on the structure and membrane binding of the C-terminal hydrophobic segment of R9AP as well as of truncated R9AP (without its C-terminal domain, R9APΔTM). Circular dichroism and infrared spectroscopic measurements revealed that the secondary structure of R9APΔTM mainly includes α-helical structural elements. Moreover, intrinsic fluorescence measurements of native R9APΔTM and individual mutants lacking one tryptophan demonstrated that W79 is more buried than W173 but that they are both located in a hydrophobic environment. This method also revealed that membrane binding of R9APΔTM does not involve regions near its tryptophan residues, while infrared spectroscopy validated its binding to lipid vesicles. Additional fluorescence measurements showed that the C-terminal segment of R9AP is membrane embedded. Maximum insertion pressure and synergy data using Langmuir monolayers suggest that interactions with specific phospholipids could be involved in the membrane binding of R9APΔTM.
Affiliation(s)
- Sarah C Bernier
  - CUO-Recherche, Centre de recherche du CHU de Québec and Département d'ophtalmologie, Faculté de Médecine, and Regroupement Stratégique PROTEO, Université Laval, Québec, Québec, Canada
- Marc-Antoine Millette
  - CUO-Recherche, Centre de recherche du CHU de Québec and Département d'ophtalmologie, Faculté de Médecine, and Regroupement Stratégique PROTEO, Université Laval, Québec, Québec, Canada
- Sarah Roy
  - CUO-Recherche, Centre de recherche du CHU de Québec and Département d'ophtalmologie, Faculté de Médecine, and Regroupement Stratégique PROTEO, Université Laval, Québec, Québec, Canada
- Line Cantin
  - CUO-Recherche, Centre de recherche du CHU de Québec and Département d'ophtalmologie, Faculté de Médecine, and Regroupement Stratégique PROTEO, Université Laval, Québec, Québec, Canada
- Ana Coutinho
  - iBB-Institute for Bioengineering and Biosciences, Department of Bioengineering, Instituto Superior Técnico, Universidade de Lisboa, Av. Rovisco Pais, 1049-001 Lisboa, Portugal; Department of Chemistry and Biochemistry, Faculdade de Ciências, Universidade de Lisboa, Campo Grande, 1749-016 Lisboa, Portugal
- Christian Salesse
  - CUO-Recherche, Centre de recherche du CHU de Québec and Département d'ophtalmologie, Faculté de Médecine, and Regroupement Stratégique PROTEO, Université Laval, Québec, Québec, Canada.
105
Karimi M, Wu D, Wang Z, Shen Y. Explainable Deep Relational Networks for Predicting Compound-Protein Affinities and Contacts. J Chem Inf Model 2020; 61:46-66. [PMID: 33347301 DOI: 10.1021/acs.jcim.0c00866] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Indexed: 12/21/2022]
Abstract
Predicting compound-protein affinity is beneficial for accelerating drug discovery. Doing so without the often-unavailable structure data is gaining interest. However, recent progress in structure-free affinity prediction, made by machine learning, focuses on accuracy but leaves much to be desired for interpretability. Defining intermolecular contacts underlying affinities as a vehicle for interpretability, our large-scale interpretability assessment finds previously used attention mechanisms inadequate. We thus formulate a hierarchical multiobjective learning problem, where predicted contacts form the basis for predicted affinities. We solve the problem by embedding protein sequences (by hierarchical recurrent neural networks) and compound graphs (by graph neural networks) with joint attentions between protein residues and compound atoms. We further introduce three methodological advances to enhance interpretability: (1) structure-aware regularization of attentions using protein sequence-predicted solvent exposure and residue-residue contact maps; (2) supervision of attentions using known intermolecular contacts in training data; and (3) an intrinsically explainable architecture where atomic-level contacts or "relations" lead to molecular-level affinity prediction. The first two and all three advances result in DeepAffinity+ and DeepRelations, respectively. Our methods show generalizability in affinity prediction for molecules that are new and dissimilar to training examples. Moreover, they show superior interpretability compared to state-of-the-art interpretable methods: with similar or better affinity prediction, they boost the AUPRC of contact prediction by around 33-, 35-, 10-, and 9-fold for the default test, new-compound, new-protein, and both-new sets, respectively. We further demonstrate their potential utilities in contact-assisted docking, structure-free binding site prediction, and structure-activity relationship studies without docking.
Our study represents the first model development and systematic model assessment dedicated to interpretable machine learning for structure-free compound-protein affinity prediction.
Affiliation(s)
- Mostafa Karimi
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas 77843, United States; TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, College Station, Texas 77840, United States
- Di Wu
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas 77843, United States
- Zhangyang Wang
  - Department of Computer Science and Engineering, Texas A&M University, College Station, Texas 77843, United States; Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, Texas 78712, United States
- Yang Shen
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas 77843, United States; TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, College Station, Texas 77840, United States
106
Phylogenomic analyses recover a clade of large-bodied decapodiform cephalopods. Mol Phylogenet Evol 2020; 156:107038. [PMID: 33285289 DOI: 10.1016/j.ympev.2020.107038] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 05/15/2020] [Revised: 10/30/2020] [Accepted: 12/01/2020] [Indexed: 12/14/2022]
Abstract
Phylogenetic relationships among the squids and cuttlefishes (Cephalopoda:Decapodiformes) have resisted clarification for decades, despite multiple analyses of morphological, molecular and combined data sets. More recently, analyses of complete mitochondrial genomes and hundreds of nuclear loci have yielded similarly ambiguous results. In this study, we re-evaluate hypotheses of decapodiform relationships by increasing taxonomic breadth and utilizing higher-quality genome and transcriptome data for several taxa. We also employ analytical approaches to (1) identify contamination in transcriptome data, (2) better assess model adequacy, and (3) account for potential biases. Using this larger data set, we consistently recover a clade comprising Myopsida (closed-eye squid), Sepiida (cuttlefishes), and Oegopsida (open-eye squid) that is sister to a Sepiolida (bobtail and bottletail squid) clade. Idiosepiida (pygmy squid) is consistently recovered as the sister group to all sampled decapodiform lineages. Further, a weighted Shimodaira-Hasegawa test applied to one of our larger data matrices rejects all alternatives to these ordinal-level relationships. At present, available nuclear genome-scale data support nested clades of relatively large-bodied decapodiform cephalopods to the exclusion of pygmy squids, but improved taxon sampling and additional genomic data will be needed to test these novel hypotheses rigorously.
107
Enhancing protein backbone angle prediction by using simpler models of deep neural networks. Sci Rep 2020; 10:19430. [PMID: 33173130 PMCID: PMC7655839 DOI: 10.1038/s41598-020-76317-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 05/06/2020] [Accepted: 10/23/2020] [Indexed: 11/09/2022] Open
Abstract
Protein structure prediction is a grand challenge. Prediction of protein structures via the representations using backbone dihedral angles has recently achieved significant progress along with the on-going surge of deep neural network (DNN) research in general. However, we observe that in the protein backbone angle prediction research, there is an overall trend to employ more and more complex neural networks and then to throw more and more features to the neural networks. While more features might add more predictive power to the neural network, we argue that redundant features could rather clutter the scenario and more complex neural networks then just could counterbalance the noise. From artificial intelligence and machine learning perspectives, problem representations and solution approaches do mutually interact and thus affect performance. We also argue that comparatively simpler predictors can more easily be reconstructed than the more complex ones. With these arguments in mind, we present a deep learning method named Simpler Angle Predictor (SAP) to train simpler DNN models that enhance protein backbone angle prediction. We then empirically show that SAP significantly outperforms existing state-of-the-art methods on well-known benchmark datasets: for some types of angles, the differences are above 3 in mean absolute error (MAE). The SAP program along with its data is available from the website https://gitlab.com/mahnewton/sap.
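Because backbone dihedral angles are periodic, a mean absolute error over them is usually computed on the shorter arc between predicted and observed values. The sketch below shows that convention; it is an assumption about how such an MAE is typically defined, not code taken from SAP, and the function name is hypothetical.

```python
def angular_mae(predicted, observed):
    """Mean absolute error in degrees, taking angle periodicity into account.

    Each difference is reduced to the shorter arc, so 179 and -179 degrees
    are treated as 2 degrees apart rather than 358.
    """
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for p, o in zip(predicted, observed):
        diff = abs(p - o) % 360.0
        total += min(diff, 360.0 - diff)
    return total / len(predicted)

# Toy phi angles in degrees (made-up values).
example_mae = angular_mae([179.0, -60.0], [-179.0, -65.0])
```

Without the wrap-around step, residues near the ±180 degree boundary would dominate the error and distort comparisons between predictors.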
108
Urban G, Torrisi M, Magnan CN, Pollastri G, Baldi P. Protein profiles: Biases and protocols. Comput Struct Biotechnol J 2020; 18:2281-2289. [PMID: 32994887 PMCID: PMC7486441 DOI: 10.1016/j.csbj.2020.08.015] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 05/14/2020] [Revised: 08/14/2020] [Accepted: 08/15/2020] [Indexed: 11/13/2022] Open
Abstract
The use of evolutionary profiles to predict protein secondary structure, as well as other protein structural features, has been standard practice since the 1990s. Using profiles in the input of such predictors, in place of or in addition to the sequence itself, leads to significantly more accurate predictions. While profiles can enhance structural signals, their role remains somewhat surprising as proteins do not use profiles when folding in vivo. Furthermore, the same sequence-based redundancy reduction protocols initially derived to train and evaluate sequence-based predictors have been applied to train and evaluate profile-based predictors. This can lead to unfair comparisons since profiles may facilitate the bleeding of information between training and test sets. Here we use the extensively studied problem of secondary structure prediction to better evaluate the role of profiles and show that: (1) high levels of profile similarity between training and test proteins are observed when using standard sequence-based redundancy protocols; (2) the gain in accuracy for profile-based predictors, over sequence-based predictors, strongly relies on these high levels of profile similarity between training and test proteins; and (3) the overall accuracy of a profile-based predictor on a given protein dataset provides a biased measure when trying to estimate the actual accuracy of the predictor, or when comparing it to other predictors. We show, however, that this bias can be mitigated by implementing a new protocol (EVALpro) which evaluates the accuracy of profile-based predictors as a function of the profile similarity between training and test proteins. Such a protocol not only allows for a fair comparison of the predictors on equally hard or easy examples, but also reduces the impact of choosing a given similarity cutoff when selecting test proteins.
The EVALpro program is available in the SCRATCH suite (www.scratch.proteomics.ics.uci.edu) and can be downloaded at: www.download.igb.uci.edu/#evalpro.
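The core idea of reporting accuracy as a function of train/test profile similarity, rather than as a single number, can be sketched as a simple binning step. This is not the EVALpro program: the similarity values are assumed to be precomputed and scaled to [0, 1], and the function name, bin edges, and toy records are hypothetical.

```python
def accuracy_by_similarity(records, bin_edges):
    """Average per-protein accuracy within bins of profile similarity.

    records   -- iterable of (similarity, accuracy) pairs, one per test protein
    bin_edges -- ascending edges; bins are [lo, hi), and the last bin also
                 includes its upper edge
    Returns a dict mapping (lo, hi) to the mean accuracy, or None if empty.
    """
    bins = {(lo, hi): [] for lo, hi in zip(bin_edges, bin_edges[1:])}
    top_edge = bin_edges[-1]
    for similarity, accuracy in records:
        for (lo, hi), values in bins.items():
            if lo <= similarity < hi or (hi == top_edge and similarity == hi):
                values.append(accuracy)
                break
    return {
        edges: (sum(values) / len(values) if values else None)
        for edges, values in bins.items()
    }

# Toy data: (similarity to training profiles, per-protein accuracy).
toy_records = [(0.1, 0.70), (0.2, 0.74), (0.8, 0.90), (1.0, 0.92)]
curve = accuracy_by_similarity(toy_records, [0.0, 0.5, 1.0])
```

Comparing two predictors bin by bin keeps the comparison on equally hard examples, which is the point the abstract makes about avoiding a single similarity cutoff.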
Affiliation(s)
- Gregor Urban
  - Department of Computer Science & Institute for Genomics and Bioinformatics, University of California, Irvine, CA 92697, USA
- Mirko Torrisi
  - UCD Institute for Discovery, University College Dublin, Dublin, 4, Ireland
- Christophe N Magnan
  - Department of Computer Science & Institute for Genomics and Bioinformatics, University of California, Irvine, CA 92697, USA
- Gianluca Pollastri
  - UCD Institute for Discovery, University College Dublin, Dublin, 4, Ireland
- Pierre Baldi
  - Department of Computer Science & Institute for Genomics and Bioinformatics, University of California, Irvine, CA 92697, USA
109
de Brevern AG. Impact of protein dynamics on secondary structure prediction. Biochimie 2020; 179:14-22. [PMID: 32946990 DOI: 10.1016/j.biochi.2020.09.006] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 04/21/2020] [Revised: 09/04/2020] [Accepted: 09/10/2020] [Indexed: 02/08/2023]
Abstract
Protein 3D structures support their biological functions. As the number of protein structures is negligible compared with the number of available protein sequences, prediction methodologies relying only on protein sequences are essential tools. In this field, protein secondary structure prediction (PSSP) is a mature area and is considered to have reached a plateau. Nonetheless, proteins are highly dynamic macromolecules, a property that could impact PSSP methods. Indeed, in a previous study, the stability of local protein conformations was evaluated, demonstrating that some regions easily changed to another type of secondary structure. The protein sequences of this dataset were submitted to PSSP methods and the results compared to molecular dynamics to investigate the potential impact of dynamics on the quality of the secondary structure prediction. Interestingly, a direct link is observed between the quality of the prediction and the stability of the assignment to the secondary structure state. The more stable a local protein conformation is, the better the prediction will be. Taking the secondary structure assignment not from the crystallized structures but from the conformations observed during the dynamics slightly increases the quality of the secondary structure prediction. These results show that evaluation of PSSP methods can be done differently, and also that the notion of dynamics can be included in the development of PSSP methods and other approaches such as de novo approaches.
Affiliation(s)
- Alexandre G de Brevern
  - Biologie Intégrée Du Globule Rouge UMR_S1134, Inserm, Université de Paris, Univ. de la Réunion, Univ. des Antilles, F-75739, Paris, France; Laboratoire D'Excellence GR-Ex, F-75739, Paris, France; Institut National de la Transfusion Sanguine (INTS), F-75739, Paris, France; IBL, F-75015, Paris, France.
110
Guo Z, Hou J, Cheng J. DNSS2: Improved ab initio protein secondary structure prediction using advanced deep learning architectures. Proteins 2020; 89:207-217. [PMID: 32893403 DOI: 10.1002/prot.26007] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 03/05/2020] [Revised: 07/07/2020] [Accepted: 09/02/2020] [Indexed: 12/27/2022]
Abstract
Accurate prediction of protein secondary structure (alpha-helix, beta-strand and coil) is a crucial step for protein inter-residue contact prediction and ab initio tertiary structure prediction. In a previous study, we developed a deep belief network-based protein secondary structure method (DNSS1) and successfully advanced the prediction accuracy beyond 80%. In this work, we developed multiple advanced deep learning architectures (DNSS2) to further improve secondary structure prediction. The major improvements over the DNSS1 method include (a) designing and integrating six advanced one-dimensional deep convolutional/recurrent/residual/memory/fractal/inception networks to predict 3-state and 8-state secondary structure, and (b) using more sensitive profile features inferred from Hidden Markov model (HMM) and multiple sequence alignment (MSA). Most of the deep learning architectures are novel for protein secondary structure prediction. DNSS2 was systematically benchmarked on independent test data sets with eight state-of-art tools and consistently ranked as one of the best methods. Particularly, DNSS2 was tested on the protein targets of 2018 CASP13 experiment and achieved the Q3 score of 81.62%, SOV score of 72.19%, and Q8 score of 73.28%. DNSS2 is freely available at: https://github.com/multicom-toolbox/DNSS2.
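The Q3 and Q8 scores quoted above are per-residue accuracies over 3-state and 8-state secondary structure labels. A minimal sketch of how such a score is computed follows; the 8-to-3 state reduction shown (H, G, I to helix; E, B to strand; everything else to coil) is one commonly used mapping and is an assumption here, since DNSS2 may use a different reduction. The function names are hypothetical.

```python
# One common reduction of DSSP 8-state labels to 3 states (an assumption).
DSSP8_TO_3 = {
    "H": "H", "G": "H", "I": "H",   # helices
    "E": "E", "B": "E",             # strands and bridges
    "T": "C", "S": "C", "-": "C",   # turns, bends, and coil
}

def to_three_state(ss8):
    """Map an 8-state secondary structure string to 3 states (H, E, C)."""
    return "".join(DSSP8_TO_3[label] for label in ss8)

def q_score(predicted, observed):
    """Percentage of residues whose predicted state equals the observed state.

    Gives Q3 for 3-state strings and Q8 for 8-state strings.
    """
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(predicted)

# Toy example with made-up label strings.
observed3 = to_three_state("HGIEBTS-")
toy_q3 = q_score("HHHEEECC", observed3)
```

Q3 treats every residue equally, which is why segment-based measures such as SOV are usually reported alongside it.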
Affiliation(s)
- Zhiye Guo
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
- Jie Hou
  - Department of Computer Science, Saint Louis University, St. Louis, Missouri, USA
- Jianlin Cheng
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
111
Yazdani Z, Rafiei A, Yazdani M, Valadan R. Design an Efficient Multi-Epitope Peptide Vaccine Candidate Against SARS-CoV-2: An in silico Analysis. Infect Drug Resist 2020; 13:3007-3022. [PMID: 32943888 PMCID: PMC7459237 DOI: 10.2147/idr.s264573] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 06/09/2020] [Accepted: 07/28/2020] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND: To date, no specific vaccine or drug has been proven to be effective against SARS-CoV-2 infection. Therefore, we implemented an immunoinformatic approach to design an efficient multi-epitope vaccine against SARS-CoV-2. RESULTS: The designed vaccine construct consists of several immunodominant epitopes from the structural proteins spike, nucleocapsid, membrane, and envelope. These peptides promote cellular and humoral immunity and interferon-gamma responses. Also, these epitopes have a high antigenic capacity and are not likely to cause allergies. To enhance the vaccine immunogenicity, we used three potent adjuvants: flagellin of Salmonella enterica subsp. enterica serovar Dublin, a peptide derived from high mobility group box 1 (HP-91), and human beta-defensin 3 protein. The physicochemical and immunological properties of the vaccine structure were evaluated. The tertiary structure of the vaccine protein was predicted and refined by Phyre2 and GalaxyRefine and validated using RAMPAGE and ERRAT. Results of ElliPro showed that 246 residues of the vaccine might be conformational B-cell epitopes. Docking of the vaccine with toll-like receptors (TLR) 3, 5, 8, and angiotensin-converting enzyme 2 indicated an appropriate interaction between the vaccine and receptors. Prediction of mRNA secondary structure and in silico cloning demonstrated that the vaccine can be efficiently expressed in Escherichia coli. CONCLUSION: Our results demonstrated that the multi-epitope vaccine might be potentially antigenic and induce humoral and cellular immune responses against SARS-CoV-2. This vaccine can interact appropriately with TLR3, 5, and 8. Also, it has a high-quality structure and suitable characteristics such as high stability and potential for expression in Escherichia coli.
Affiliation(s)
- Zahra Yazdani
  - Department of Immunology, Molecular and Cell Biology Research Center, School of Medicine, Mazandaran University of Medical Sciences, Sari, Iran
- Alireza Rafiei
  - Department of Immunology, Molecular and Cell Biology Research Center, School of Medicine, Mazandaran University of Medical Sciences, Sari, Iran
- Mohammadreza Yazdani
  - Department of Chemistry, Isfahan University of Technology, Isfahan 84156-83111, Iran
- Reza Valadan
  - Department of Immunology, Molecular and Cell Biology Research Center, School of Medicine, Mazandaran University of Medical Sciences, Sari, Iran
|
112
|
Azginoglu N, Aydin Z, Celik M. Structural profile matrices for predicting structural properties of proteins. J Bioinform Comput Biol 2020; 18:2050022. [PMID: 32649260 DOI: 10.1142/s0219720020500225] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/18/2022]
Abstract
Predicting structural properties of proteins plays a key role in predicting the 3D structure of proteins. In this study, new structural profile matrices (SPM) are developed for protein secondary structure, solvent accessibility and torsion angle class predictions, which could be used as input to 3D prediction algorithms. The structural templates employed in computing SPMs are detected by eight alignment methods in LOMETS server, gap affine alignment method, ScanProsite, PfamScan, and HHblits. The contribution of each template is weighted by its similarity to target, which is assessed by several sequence alignment scores. For comparison, the SPMs are also computed using Homolpro, which uses BLAST for target template alignments and does not assign weights to templates. Incorporating the SPMs into DSPRED classifier, the prediction accuracy improves significantly as demonstrated by cross-validation experiments on two difficult benchmarks. The most accurate predictions are obtained using the SPMs derived by threading methods in LOMETS server. On the other hand, the computational cost of computing these SPMs was the highest.
Affiliation(s)
- Nuh Azginoglu
  - Department of Computer Engineering, Nevsehir Haci Bektas Veli University, Nevsehir 50300, Turkey
- Zafer Aydin
  - Department of Computer Engineering, Abdullah Gul University, Kayseri 38080, Turkey
- Mete Celik
  - Department of Computer Engineering, Erciyes University, Kayseri 38039, Turkey
|
113
|
Sun J, Frishman D. DeepHelicon: Accurate prediction of inter-helical residue contacts in transmembrane proteins by residual neural networks. J Struct Biol 2020; 212:107574. [PMID: 32663598 DOI: 10.1016/j.jsb.2020.107574] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 04/10/2020] [Revised: 07/03/2020] [Accepted: 07/07/2020] [Indexed: 01/16/2023]
Abstract
Accurate prediction of amino acid residue contacts is an important prerequisite for generating high-quality 3D models of transmembrane (TM) proteins. While a large number of compositional, evolutionary, and structural properties of proteins can be used to train contact prediction methods, recent research suggests that coevolution between residues provides the strongest indication of their spatial proximity. We have developed a deep learning approach, DeepHelicon, to predict inter-helical residue contacts in TM proteins by considering only coevolutionary features. DeepHelicon comprises a two-stage supervised learning process by residual neural networks for a gradual refinement of contact maps, followed by variance reduction by an ensemble of models. We present a benchmark study of 12 contact predictors and conclude that DeepHelicon together with the two other state-of-the-art methods DeepMetaPSICOV and Membrain2 outperforms the 10 remaining algorithms on all datasets and at all settings. On a set of 44 TM proteins with an average length of 388 residues DeepHelicon achieves the best performance among all benchmarked methods in predicting the top L/5 and L/2 inter-helical contacts, with the mean precision of 87.42% and 77.84%, respectively. On a set of 57 relatively small TM proteins with an average length of 298 residues DeepHelicon ranks second best after DeepMetaPSICOV. DeepHelicon produces the most accurate predictions for large proteins with more than 10 transmembrane helices. Coevolutionary features alone allow to predict inter-helical residue contacts with an accuracy sufficient for generating acceptable 3D models for up to 30% of proteins using a fully automated modeling method such as CONFOLD2.
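The "top L/5" and "top L/2" precision figures quoted above follow a common convention in contact prediction: rank all candidate residue pairs by predicted score, keep the L/k highest (L being the sequence length), and report the fraction that are true contacts. The following is a minimal sketch of that metric, not the DeepHelicon evaluation code; the function name, the dictionary input format, and the toy scores are assumptions made for illustration, and the restriction to inter-helical pairs is left to the caller.

```python
def top_fraction_precision(scores, true_contacts, seq_len, divisor):
    """Precision of the top L/divisor predicted contacts.

    scores: dict mapping residue pairs (i, j) to a predicted contact score
            (hypothetical input format, chosen for this sketch).
    true_contacts: set of residue pairs (i, j) observed in the structure.
    """
    k = max(1, seq_len // divisor)
    # Rank pairs from highest to lowest score and keep the first k.
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    if not ranked:
        return 0.0
    hits = sum(1 for pair in ranked if pair in true_contacts)
    return hits / len(ranked)


# Toy data for a hypothetical 10-residue protein; all values are invented.
scores = {(0, 5): 0.9, (1, 7): 0.8, (2, 9): 0.4, (3, 8): 0.7, (0, 9): 0.2}
true_contacts = {(0, 5), (3, 8), (2, 9)}

precision_l5 = top_fraction_precision(scores, true_contacts, seq_len=10, divisor=5)
precision_l2 = top_fraction_precision(scores, true_contacts, seq_len=10, divisor=2)
```

With these toy inputs the top L/5 list holds two pairs, one of which is a true contact, and the top L/2 list holds all five pairs, three of which are true contacts.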
Affiliation(s)
- Jianfeng Sun
  - Department of Bioinformatics, Wissenschaftzentrum Weihenstephan, Technische Universität München, 85354 Freising, Germany
- Dmitrij Frishman
  - Department of Bioinformatics, Wissenschaftzentrum Weihenstephan, Technische Universität München, 85354 Freising, Germany
|
114
|
Zhu M, Kuechler ER, Zhang J, Matalon O, Dubreuil B, Hofmann A, Loewen C, Levy ED, Gsponer J, Mayor T. Proteomic analysis reveals the direct recruitment of intrinsically disordered regions to stress granules in S. cerevisiae. J Cell Sci 2020; 133:jcs244657. [PMID: 32503941 DOI: 10.1242/jcs.244657] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 01/30/2020] [Accepted: 05/15/2020] [Indexed: 01/21/2023] Open
Abstract
Stress granules (SGs) are stress-induced membraneless condensates that store non-translating mRNA and stalled translation initiation complexes. Although metazoan SGs are dynamic compartments where proteins can rapidly exchange with their surroundings, yeast SGs seem largely static. To gain a better understanding of yeast SGs, we identified proteins that sediment after heat shock using mass spectrometry. Proteins that sediment upon heat shock are biased toward a subset of abundant proteins that are significantly enriched in intrinsically disordered regions (IDRs). Heat-induced SG localization of over 80 proteins was confirmed using microscopy, including 32 proteins not previously known to localize to SGs. We found that several IDRs were sufficient to mediate SG recruitment. Moreover, the dynamic exchange of IDRs can be observed using fluorescence recovery after photobleaching, whereas other components remain immobile. Lastly, we showed that the IDR of the Ubp3 deubiquitinase was critical for yeast SG formation. This work shows that IDRs can be sufficient for SG incorporation, can remain dynamic in vitrified SGs, and can play an important role in cellular compartmentalization upon stress. This article has an associated First Person interview with the first author of the paper.
Affiliation(s)
- Mang Zhu
  - Michael Smith Laboratories, Department of Biochemistry and Molecular Biology, University of British Columbia, Vancouver, BC, Canada, V6T 1Z4
- Erich R Kuechler
  - Michael Smith Laboratories, Department of Biochemistry and Molecular Biology, University of British Columbia, Vancouver, BC, Canada, V6T 1Z4
- Joyce Zhang
  - Michael Smith Laboratories, Department of Biochemistry and Molecular Biology, University of British Columbia, Vancouver, BC, Canada, V6T 1Z4
- Or Matalon
  - Department of Structural Biology, Weizmann Institute of Science, Rehovot 7610001, Israel
- Benjamin Dubreuil
  - Department of Structural Biology, Weizmann Institute of Science, Rehovot 7610001, Israel
- Analise Hofmann
  - Department of Cellular and Physiological Sciences, University of British Columbia, Vancouver, BC, Canada, V6T 1Z3
- Chris Loewen
  - Department of Cellular and Physiological Sciences, University of British Columbia, Vancouver, BC, Canada, V6T 1Z3
- Emmanuel D Levy
  - Department of Structural Biology, Weizmann Institute of Science, Rehovot 7610001, Israel
- Joerg Gsponer
  - Michael Smith Laboratories, Department of Biochemistry and Molecular Biology, University of British Columbia, Vancouver, BC, Canada, V6T 1Z4
- Thibault Mayor
  - Michael Smith Laboratories, Department of Biochemistry and Molecular Biology, University of British Columbia, Vancouver, BC, Canada, V6T 1Z4
|
115
|
Liu T, Wang Z. MASS: predict the global qualities of individual protein models using random forests and novel statistical potentials. BMC Bioinformatics 2020; 21:246. [PMID: 32631256 PMCID: PMC7336608 DOI: 10.1186/s12859-020-3383-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 01/21/2020] [Accepted: 01/22/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Protein model quality assessment (QA) is an essential procedure in protein structure prediction. QA methods can predict the qualities of protein models and identify good models from decoys. Clustering-based methods need a certain number of models as input. However, if a pool of models are not available, methods that only need a single model as input are indispensable. RESULTS We developed MASS, a QA method to predict the global qualities of individual protein models using random forests and various novel energy functions. We designed six novel energy functions or statistical potentials that can capture the structural characteristics of a protein model, which can also be used in other protein-related bioinformatics research. MASS potentials demonstrated higher importance than the energy functions of RWplus, GOAP, DFIRE and Rosetta when the scores they generated are used as machine learning features. MASS outperforms almost all of the four CASP11 top-performing single-model methods for global quality assessment in terms of all of the four evaluation criteria officially used by CASP, which measure the abilities to assign relative and absolute scores, identify the best model from decoys, and distinguish between good and bad models. MASS has also achieved comparable performances with the leading QA methods in CASP12 and CASP13. CONCLUSIONS MASS and the source code for all MASS potentials are publicly available at http://dna.cs.miami.edu/MASS/ .
Affiliation(s)
- Tong Liu
  - Department of Computer Science, University of Miami, 1365 Memorial Drive, P.O. Box 248154, Coral Gables, FL, 33124, USA
- Zheng Wang
  - Department of Computer Science, University of Miami, 1365 Memorial Drive, P.O. Box 248154, Coral Gables, FL, 33124, USA
|
116
|
Bhatnager R, Bhasin M, Arora J, Dang AS. Epitope based peptide vaccine against SARS-COV2: an immune-informatics approach. J Biomol Struct Dyn 2020; 39:5690-5705. [PMID: 32619134 DOI: 10.1080/07391102.2020.1787227] [Citation(s) in RCA: 44] [Impact Index Per Article: 8.8] [Indexed: 12/23/2022]
Abstract
The world is witnessing exponential growth of SARS-CoV2, and the fatal outcomes of COVID 19 have already proved its pandemic potential by claiming more than 3 lakh deaths globally. If not controlled, this ongoing pandemic can cause irreparable socio-economic and psychological impact worldwide. Therefore, a safe and effective vaccine against COVID 19 is exigent. Recent advances in immunoinformatics approaches could potentially reduce the attrition rate and accelerate the process of vaccine development in these unprecedented times. In the present study, a multivalent subunit vaccine targeting the S2 subunit of the SARS-CoV2 S glycoprotein has been designed using open-source immunoinformatics tools. The designed construct comprises epitopes capable of inducing T cell, B cell (linear and discontinuous) and interferon γ responses. Physiologically, the vaccine construct is predicted to be thermostable, antigenic, immunogenic, non-allergenic and non-toxic in nature. According to population coverage analysis, the designed multiepitope vaccine covers 99.26% of the population globally. The 3D structure of the vaccine construct was designed, validated and refined to obtain a high-quality structure. The refined structure was docked against Toll-like receptors to confirm the interactions between them. The vaccine peptide sequence was reverse transcribed, codon optimized and cloned in a pET vector. Our in-silico study suggests that the proposed vaccine against the fusion domain of the virus has the potential to elicit an innate as well as humoral immune response in humans and restrict the entry of the virus into the cell. Results of the study offer a framework for in-vivo analysis that may hasten the development of therapeutic tools against COVID 19. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Richa Bhatnager
  - Centre for Medical Biotechnology, M.D. University, Rohtak, Haryana, India
- Maheshwar Bhasin
  - Department of Neonatology, Lady Hardinge Medical College and associated hospital, New Delhi, India
- Jyoti Arora
  - Centre for Medical Biotechnology, M.D. University, Rohtak, Haryana, India
- Amita S Dang
  - Centre for Medical Biotechnology, M.D. University, Rohtak, Haryana, India
|
117
|
Ayub U, Naveed H, Shahzad W. PRRAT_AM—An advanced ant-miner to extract accurate and comprehensible classification rules. Appl Soft Comput 2020. [DOI: 10.1016/j.asoc.2020.106326] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/24/2022]
|
118
|
Juan SH, Chen TR, Lo WC. A simple strategy to enhance the speed of protein secondary structure prediction without sacrificing accuracy. PLoS One 2020; 15:e0235153. [PMID: 32603341 PMCID: PMC7326220 DOI: 10.1371/journal.pone.0235153] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 12/17/2019] [Accepted: 06/09/2020] [Indexed: 01/06/2023] Open
Abstract
The secondary structure prediction of proteins is a classic topic of computational structural biology with a variety of applications. During the past decade, the accuracy of prediction achieved by state-of-the-art algorithms has been >80%; meanwhile, the time cost of prediction increased rapidly because of the exponential growth of fundamental protein sequence data. Based on literature studies and preliminary observations on the relationships between the size/homology of the fundamental protein dataset and the speed/accuracy of predictions, we raised two hypotheses that might be helpful to determine the main influence factors of the efficiency of secondary structure prediction. Experimental results of size and homology reductions of the fundamental protein dataset supported those hypotheses. They revealed that shrinking the size of the dataset could substantially cut down the time cost of prediction with a slight decrease of accuracy, which could be increased on the contrary by homology reduction of the dataset. Moreover, the Shannon information entropy could be applied to explain how accuracy was influenced by the size and homology of the dataset. Based on these findings, we proposed that a proper combination of size and homology reductions of the protein dataset could speed up the secondary structure prediction while preserving the high accuracy of state-of-the-art algorithms. Testing the proposed strategy with the fundamental protein dataset of the year 2018 provided by the Universal Protein Resource, the speed of prediction was enhanced over 20 folds while all accuracy measures remained equivalently high. These findings are supposed helpful for improving the efficiency of researches and applications depending on the secondary structure prediction of proteins. To make future implementations of the proposed strategy easy, we have established a database of size and homology reduced protein datasets at http://10.life.nctu.edu.tw/UniRefNR.
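The abstract above invokes Shannon information entropy to explain how dataset size and homology affect accuracy, but does not say how the entropy was computed. As a hedged illustration only, the sketch below computes the entropy (in bits) of the pooled amino-acid composition of a set of sequences; treating residue composition as the random variable, and the function name `shannon_entropy`, are assumptions for this example rather than the authors' formulation.

```python
import math
from collections import Counter


def shannon_entropy(sequences):
    """Shannon entropy (bits) of the residue distribution pooled over sequences."""
    counts = Counter("".join(sequences))
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # H = -sum(p * log2(p)) over the observed residue types.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Toy datasets, invented for illustration.
redundant = ["AAAA", "AAAA"]   # one residue type only
diverse = ["ACDE", "FGHI"]     # eight residue types, equally frequent

h_redundant = shannon_entropy(redundant)
h_diverse = shannon_entropy(diverse)
```

A dataset made of a single residue type carries zero entropy, while eight equally frequent residue types give log2(8) = 3 bits, which is the sense in which a less redundant dataset can carry more information per entry.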
Affiliation(s)
- Sheng-Hung Juan
  - Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, Taiwan
- Teng-Ruei Chen
  - Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, Taiwan
- Wei-Cheng Lo
  - Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, Taiwan
  - Department of Biological Science and Technology, National Chiao Tung University, Hsinchu, Taiwan
  - The Center for Bioinformatics Research, National Chiao Tung University, Hsinchu, Taiwan
|
119
|
Shi Q, Chen W, Huang S, Jin F, Dong Y, Wang Y, Xue Z. DNN-Dom: predicting protein domain boundary from sequence alone by deep neural network. Bioinformatics 2020; 35:5128-5136. [PMID: 31197306 DOI: 10.1093/bioinformatics/btz464] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 02/18/2019] [Revised: 05/07/2019] [Accepted: 06/05/2019] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Accurate delineation of protein domain boundary plays an important role for protein engineering and structure prediction. Although machine-learning methods are widely used to predict domain boundary, these approaches often ignore long-range interactions among residues, which have been proven to improve the prediction performance. However, how to simultaneously model the local and global interactions to further improve domain boundary prediction is still a challenging problem. RESULTS This article employs a hybrid deep learning method that combines convolutional neural network and gate recurrent units' models for domain boundary prediction. It not only captures the local and non-local interactions, but also fuses these features for prediction. Additionally, we adopt balanced Random Forest for classification to deal with high imbalance of samples and high dimensions of deep features. Experimental results show that our proposed approach (DNN-Dom) outperforms existing machine-learning-based methods for boundary prediction. We expect that DNN-Dom can be useful for assisting protein structure and function prediction. AVAILABILITY AND IMPLEMENTATION The method is available as DNN-Dom Server at http://isyslab.info/DNN-Dom/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Qiang Shi
  - School of Software Engineering and College of Life Science & Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Weiya Chen
  - School of Software Engineering and College of Life Science & Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Siqi Huang
  - School of Software Engineering and College of Life Science & Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Fanglin Jin
  - School of Software Engineering and College of Life Science & Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Yinghao Dong
  - School of Software Engineering and College of Life Science & Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Yan Wang
  - School of Software Engineering and College of Life Science & Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Zhidong Xue
  - School of Software Engineering and College of Life Science & Technology, Huazhong University of Science and Technology, Wuhan 430074, China
|
120
|
Cao Z, Du W, Li G, Cao H. DEEPSMP: A deep learning model for predicting the ectodomain shedding events of membrane proteins. J Bioinform Comput Biol 2020; 18:2050017. [PMID: 32576054 DOI: 10.1142/s0219720020500171] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/18/2022]
Abstract
Membrane proteins play essential roles in modern medicine. In recent studies, some membrane proteins involved in ectodomain shedding events have been reported as the potential drug targets and biomarkers of some serious diseases. However, there are few effective tools for identifying the shedding event of membrane proteins. So, it is necessary to design an effective tool for predicting shedding event of membrane proteins. In this study, we design an end-to-end prediction model using deep neural networks with long short-term memory (LSTM) units and attention mechanism, to predict the ectodomain shedding events of membrane proteins only by sequence information. Firstly, the evolutional profiles are encoded from original sequences of these proteins by Position-Specific Iterated BLAST (PSI-BLAST) on Uniref50 database. Then, the LSTM units which contain memory cells are used to hold information from past inputs to the network and the attention mechanism is applied to detect sorting signals in proteins regardless of their position in the sequence. Finally, a fully connected dense layer and a softmax layer are used to obtain the final prediction results. Additionally, we also try to reduce overfitting of the model by using dropout, L2 regularization, and bagging ensemble learning in the model training process. In order to ensure the fairness of performance comparison, firstly we use cross validation process on training dataset obtained from an existing paper. The average accuracy and area under a receiver operating characteristic curve (AUC) of five-fold cross-validation are 81.19% and 0.835 using our proposed model, compared to 75% and 0.78 by a previously published tool, respectively. To better validate the performance of the proposed model, we also evaluate the performance of the proposed model on independent test dataset. 
The accuracy, sensitivity, and specificity are 83.14%, 84.08%, and 81.63% using our proposed model, compared to 70.20%, 71.97%, and 67.35% by the existing model. The experimental results validate that the proposed model can be regarded as a general tool for predicting ectodomain shedding events of membrane proteins. The pipeline of the model and prediction results can be accessed at the following URL: http://www.csbg-jlu.info/DeepSMP/.
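The accuracy, sensitivity, and specificity reported above are standard confusion-matrix quantities for a binary classifier. The sketch below shows how they are derived from true and predicted labels; it is a generic illustration with invented labels, not the DeepSMP evaluation script, and the function name `binary_metrics` is hypothetical.

```python
def binary_metrics(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity) for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall on positives
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # recall on negatives
    return accuracy, sensitivity, specificity


# Invented labels: 1 = shedding event, 0 = no shedding event.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]

accuracy, sensitivity, specificity = binary_metrics(y_true, y_pred)
```

Reporting sensitivity and specificity alongside accuracy, as the abstract does, matters when the two classes are unevenly represented, since accuracy alone can hide poor performance on the smaller class.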
Affiliation(s)
- Zhongbo Cao
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, P. R. China
  - School of Management Science and Information Engineering, Jilin University of Finance and Economics, Changchun 130117, P. R. China
- Wei Du
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, P. R. China
- Gaoyang Li
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, P. R. China
- Huansheng Cao
  - Center for Fundamental and Applied Microbiomics, Biodesign Institute, Arizona State University, Tempe, AZ 85287, USA
|
121
|
Karimi M, Wu D, Wang Z, Shen Y. DeepAffinity: interpretable deep learning of compound-protein affinity through unified recurrent and convolutional neural networks. Bioinformatics 2020; 35:3329-3338. [PMID: 30768156 DOI: 10.1093/bioinformatics/btz111] [Citation(s) in RCA: 250] [Impact Index Per Article: 50.0] [Received: 06/19/2018] [Revised: 12/26/2018] [Accepted: 02/12/2019] [Indexed: 02/02/2023] Open
Abstract
MOTIVATION Drug discovery demands rapid quantification of compound-protein interaction (CPI). However, there is a lack of methods that can predict compound-protein affinity from sequences alone with high applicability, accuracy and interpretability. RESULTS We present a seamless integration of domain knowledges and learning-based approaches. Under novel representations of structurally annotated protein sequences, a semi-supervised deep learning model that unifies recurrent and convolutional neural networks has been proposed to exploit both unlabeled and labeled data, for jointly encoding molecular representations and predicting affinities. Our representations and models outperform conventional options in achieving relative error in IC50 within 5-fold for test cases and 20-fold for protein classes not included for training. Performances for new protein classes with few labeled data are further improved by transfer learning. Furthermore, separate and joint attention mechanisms are developed and embedded to our model to add to its interpretability, as illustrated in case studies for predicting and explaining selective drug-target interactions. Lastly, alternative representations using protein sequences or compound graphs and a unified RNN/GCNN-CNN model using graph CNN (GCNN) are also explored to reveal algorithmic challenges ahead. AVAILABILITY AND IMPLEMENTATION Data and source codes are available at https://github.com/Shen-Lab/DeepAffinity. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Mostafa Karimi
  - Department of Electrical and Computer Engineering, College Station, TX, USA
  - TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering, College Station, TX, USA
- Di Wu
  - Department of Electrical and Computer Engineering, College Station, TX, USA
- Zhangyang Wang
  - Department of Computer Science and Engineering, Texas A&M University, College Station, TX, USA
- Yang Shen
  - Department of Electrical and Computer Engineering, College Station, TX, USA
  - TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering, College Station, TX, USA
|
122
|
Du W, Sun Y, Li G, Cao H, Pang R, Li Y. CapsNet-SSP: multilane capsule network for predicting human saliva-secretory proteins. BMC Bioinformatics 2020; 21:237. [PMID: 32517646 PMCID: PMC7285745 DOI: 10.1186/s12859-020-03579-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 03/10/2020] [Accepted: 06/01/2020] [Indexed: 01/24/2023] Open
Abstract
Background Compared with disease biomarkers in blood and urine, biomarkers in saliva have distinct advantages in clinical tests, as they can be conveniently examined through noninvasive sample collection. Therefore, identifying human saliva-secretory proteins and further detecting protein biomarkers in saliva have significant value in clinical medicine. There are only a few methods for predicting saliva-secretory proteins based on conventional machine learning algorithms, and all are highly dependent on annotated protein features. Unlike conventional machine learning algorithms, deep learning algorithms can automatically learn feature representations from input data and thus hold promise for predicting saliva-secretory proteins. Results We present a novel end-to-end deep learning model based on multilane capsule network (CapsNet) with differently sized convolution kernels to identify saliva-secretory proteins only from sequence information. The proposed model CapsNet-SSP outperforms existing methods based on conventional machine learning algorithms. Furthermore, the model performs better than other state-of-the-art deep learning architectures mostly used to analyze biological sequences. In addition, we further validate the effectiveness of CapsNet-SSP by comparison with human saliva-secretory proteins from existing studies and known salivary protein biomarkers of cancer. Conclusions The main contributions of this study are as follows: (1) an end-to-end model based on CapsNet is proposed to identify saliva-secretory proteins from the sequence information; (2) the proposed model achieves better performance and outperforms existing models; and (3) the saliva-secretory proteins predicted by our model are statistically significant compared with existing cancer biomarkers in saliva. In addition, a web server of CapsNet-SSP is developed for saliva-secretory protein identification, and it can be accessed at the following URL: http://www.csbg-jlu.info/CapsNet-SSP/. 
We believe that our model and web server will be useful for biomedical researchers who are interested in finding salivary protein biomarkers, especially when they have identified candidate proteins for analyzing diseased tissues near or distal to salivary glands using transcriptome or proteomics.
Affiliation(s)
- Wei Du
  - Key Laboratory of Symbol Computation and Knowledge Engineering of the Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, 130012, China
- Yu Sun
  - Key Laboratory of Symbol Computation and Knowledge Engineering of the Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, 130012, China
- Gaoyang Li
  - Key Laboratory of Symbol Computation and Knowledge Engineering of the Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, 130012, China
- Huansheng Cao
  - Center for Fundamental and Applied Microbiomics, Biodesign Institute, Arizona State University, Tempe, AZ, 85287, USA
- Ran Pang
  - Key Laboratory of Symbol Computation and Knowledge Engineering of the Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, 130012, China
- Ying Li
  - Key Laboratory of Symbol Computation and Knowledge Engineering of the Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, 130012, China
|
123
|
Hou J, Adhikari B, Tanner JJ, Cheng J. SAXSDom: Modeling multidomain protein structures using small-angle X-ray scattering data. Proteins 2020; 88:775-787. [PMID: 31860156 PMCID: PMC7230021 DOI: 10.1002/prot.25865] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 08/15/2019] [Revised: 11/18/2019] [Accepted: 12/14/2019] [Indexed: 12/27/2022]
Abstract
Many proteins are composed of several domains that pack together into a complex tertiary structure. Multidomain proteins can be challenging for protein structure modeling, particularly those for which templates can be found for individual domains but not for the entire sequence. In such cases, homology modeling can generate high quality models of the domains but not for the orientations between domains. Small-angle X-ray scattering (SAXS) reports the structural properties of entire proteins and has the potential for guiding homology modeling of multidomain proteins. In this article, we describe a novel multidomain protein assembly modeling method, SAXSDom that integrates experimental knowledge from SAXS with probabilistic Input-Output Hidden Markov model to assemble the structures of individual domains together. Four SAXS-based scoring functions were developed and tested, and the method was evaluated on multidomain proteins from two public datasets. Incorporation of SAXS information improved the accuracy of domain assembly for 40 out of 46 critical assessment of protein structure prediction multidomain protein targets and 45 out of 73 multidomain protein targets from the ab initio domain assembly dataset. The results demonstrate that SAXS data can provide useful information to improve the accuracy of domain-domain assembly. The source code and tool packages are available at https://github.com/jianlin-cheng/SAXSDom.
Affiliation(s)
- Jie Hou
- Department of Computer Science, Saint Louis University, St. Louis, MO, 63103, USA
- Badri Adhikari
- Department of Computer Science, University of Missouri-St. Louis, Saint Louis, MO 63121, USA
- John J. Tanner
- Departments of Biochemistry and Chemistry, University of Missouri, Columbia, MO, 65211, USA
- Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
|
124
|
Gress A, Kalinina OV. SphereCon-a method for precise estimation of residue relative solvent accessible area from limited structural information. Bioinformatics 2020; 36:3372-3378. [PMID: 32154837 DOI: 10.1093/bioinformatics/btaa159] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 10/15/2019] [Revised: 02/28/2020] [Accepted: 03/04/2020] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION In proteins, solvent accessibility of individual residues is a factor contributing to their importance for protein function and stability. Hence one might wish to calculate solvent accessibility in order to predict the impact of mutations, their pathogenicity and for other biomedical applications. A direct computation of solvent accessibility is only possible if all atoms of a protein three-dimensional structure are reliably resolved. RESULTS We present SphereCon, a new precise measure that can estimate residue relative solvent accessibility (RSA) from limited data. The measure is based on calculating the volume of intersection of a sphere with a cone cut out in the direction opposite of the residue with surrounding atoms. We propose a method for estimating the position and volume of residue atoms in cases when they are not known from the structure, or when the structural data are unreliable or missing. We show that in cases of reliable input structures, SphereCon correlates almost perfectly with the directly computed RSA, and outperforms other previously suggested indirect methods. Moreover, SphereCon is the only measure that yields accurate results when the identities of amino acids are unknown. A significant novel feature of SphereCon is that it can estimate RSA from inter-residue distance and contact matrices, without any information about the actual atom coordinates. AVAILABILITY AND IMPLEMENTATION https://github.com/kalininalab/spherecon. CONTACT alexander.gress@helmholtz-hips.de. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Alexander Gress
- Department of Drug Bioinformatics, Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Campus E8.1, Saarbrücken 66123, Germany; Graduate School of Computer Science, Saarland University, Saarbrücken 66123, Germany
- Olga V Kalinina
- Department of Drug Bioinformatics, Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Campus E8.1, Saarbrücken 66123, Germany; Medical Faculty, Saarland University, Homburg 66421, Germany
|
125
|
Wekesa JS, Meng J, Luan Y. A deep learning model for plant lncRNA-protein interaction prediction with graph attention. Mol Genet Genomics 2020; 295:1091-1102. [DOI: 10.1007/s00438-020-01682-w] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 04/04/2020] [Accepted: 05/01/2020] [Indexed: 02/06/2023]
|
126
|
Wekesa JS, Meng J, Luan Y. Multi-feature fusion for deep learning to predict plant lncRNA-protein interaction. Genomics 2020; 112:2928-2936. [PMID: 32437848 DOI: 10.1016/j.ygeno.2020.05.005] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 12/17/2019] [Revised: 04/22/2020] [Accepted: 05/05/2020] [Indexed: 12/28/2022]
Abstract
Long non-coding RNAs (lncRNAs) play key roles in regulating cellular biological processes through diverse molecular mechanisms including binding to RNA binding proteins. The majority of plant lncRNAs are functionally uncharacterized, thus, accurate prediction of plant lncRNA-protein interaction is imperative for subsequent functional studies. We present an integrative model, namely DRPLPI. Its uniqueness is that it predicts by multi-feature fusion. Structural and four groups of sequence features are used, including tri-nucleotide composition, gapped k-mer, recursive complement and binary profile. We design a multi-head self-attention long short-term memory encoder-decoder network to extract generative high-level features. To obtain robust results, DRPLPI combines categorical boosting and extra trees into a single meta-learner. Experiments on Zea mays and Arabidopsis thaliana obtained 0.9820 and 0.9652 area under precision/recall curve (AUPRC) respectively. The proposed method shows significant enhancement in the prediction performance compared with existing state-of-the-art methods.
Affiliation(s)
- Jael Sanyanda Wekesa
- School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116023, China; School of Computing and Information Technology, Jomo Kenyatta University of Agriculture and Technology, Nairobi 62000-00200, Kenya
- Jun Meng
- School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116023, China
- Yushi Luan
- School of Bioengineering, Dalian University of Technology, Dalian, Liaoning 116023, China
|
127
|
Shapovalov M, Dunbrack RL, Vucetic S. Multifaceted analysis of training and testing convolutional neural networks for protein secondary structure prediction. PLoS One 2020; 15:e0232528. [PMID: 32374785 PMCID: PMC7202669 DOI: 10.1371/journal.pone.0232528] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 10/24/2019] [Accepted: 04/16/2020] [Indexed: 11/30/2022] Open
Abstract
Protein secondary structure prediction remains a vital topic with broad applications. Due to lack of a widely accepted standard in secondary structure predictor evaluation, a fair comparison of predictors is challenging. A detailed examination of factors that contribute to higher accuracy is also lacking. In this paper, we present: (1) new test sets, Test2018, Test2019, and Test2018-2019, consisting of proteins from structures released in 2018 and 2019 with less than 25% identity to any protein published before 2018; (2) a 4-layer convolutional neural network, SecNet, with an input window of ±14 amino acids which was trained on proteins ≤25% identical to proteins in Test2018 and the commonly used CB513 test set; (3) an additional test set that shares no homologous domains with the training set proteins, according to the Evolutionary Classification of Proteins (ECOD) database; (4) a detailed ablation study where we reverse one algorithmic choice at a time in SecNet and evaluate the effect on the prediction accuracy; (5) new 4- and 5-label prediction alphabets that may be more practical for tertiary structure prediction methods. The 3-label accuracy (helix, sheet, coil) of the leading predictors on both Test2018 and CB513 is 81-82%, while SecNet's accuracy is 84% for both sets. Accuracy on the non-homologous ECOD set is only 0.6 points (83.9%) lower than the results on the Test2018-2019 set (84.5%). The ablation study of features, neural network architecture, and training hyper-parameters suggests the best accuracy results are achieved with good choices for each of them while the neural network architecture is not as critical as long as it is not too simple. Protocols for generating and using unbiased test, validation, and training sets are provided. Our data sets, including input features and assigned labels, and SecNet software including third-party dependencies and databases, are downloadable from dunbrack.fccc.edu/ss and github.com/sh-maxim/ss.
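The 3-label accuracy quoted in this abstract is usually computed as the fraction of residues whose predicted label (helix, sheet, coil) matches the observed one. The following is a minimal sketch of that calculation, not code from SecNet. The function names are hypothetical, and the 8-to-3 label reduction is one common convention that is assumed here; the abstract does not say which reduction the authors used.

```python
# Illustrative sketch only: names and the 8-to-3 mapping are assumptions,
# not taken from the SecNet software.

# One common reduction of the 8 DSSP states to 3 labels (H, E, C).
DSSP8_TO_3 = {
    "H": "H", "G": "H", "I": "H",  # helix types
    "E": "E", "B": "E",            # strand and isolated bridge
    "T": "C", "S": "C", "-": "C",  # turn, bend, and unassigned become coil
}


def to_three_state(dssp8: str) -> str:
    """Reduce an 8-state DSSP string to 3 labels; unknown symbols become coil."""
    return "".join(DSSP8_TO_3.get(state, "C") for state in dssp8)


def q3_accuracy(predicted: str, observed: str) -> float:
    """Fraction of positions where the predicted 3-state label equals the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    if not observed:
        raise ValueError("cannot score an empty sequence")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


if __name__ == "__main__":
    observed = to_three_state("-HHHHGGTEEEEBS-")  # toy observed assignment
    predicted = "CHHHHHCCEEEECCC"                 # toy prediction
    print(observed)
    print(round(q3_accuracy(predicted, observed), 3))
```

Because the choice of reduction changes the score, two predictors are only comparable when they are evaluated with the same mapping and the same test set, which is the point the abstract makes about fair comparison.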
Affiliation(s)
- Maxim Shapovalov
- Fox Chase Cancer Center, Philadelphia, PA, United States of America
- Temple University, Philadelphia, PA, United States of America
|
128
|
Mohl JE, Gerken T, Leung MY. Predicting mucin-type O-Glycosylation using enhancement value products from derived protein features. Journal of Theoretical & Computational Chemistry 2020; 19:2040003. [PMID: 33208985 PMCID: PMC7671581 DOI: 10.1142/s0219633620400039] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 12/13/2022]
Abstract
Mucin-type O-glycosylation is one of the most common post-translational modifications of proteins. This glycosylation is initiated in the Golgi by the addition of the sugar N-acetylgalactosamine (GalNAc) onto protein Ser and Thr residues by a family of polypeptide GalNAc transferases. In humans there are 20 isoforms that are differentially expressed across tissues that serve multiple important biological roles. Using random peptide substrates, isoform specific amino acid preferences have been obtained in the form of enhancement values (EV). These EVs alone have previously been used to predict O-glycosylation sites via the web based ISOGlyP (Isoform Specific O-Glycosylation Prediction) tool. Here we explore additional protein features to determine whether these can complement the random peptide derived enhancement values and increase the predictive power of ISOGlyP. The inclusion of additional protein substrate features (such as secondary structure and surface accessibility) was found to increase sensitivity with minimal loss of specificity, when tested with three different published in vivo O-glycoproteomics data sets, thus increasing the overall accuracy of the ISOGlyP predictions.
Affiliation(s)
- Jonathon E. Mohl
- Department of Mathematical Sciences and Border Biomedical Research Center, The University of Texas at El Paso, El Paso, TX 79968, USA
- Thomas Gerken
- Departments of Biochemistry and Chemistry, Case Western Reserve University, Cleveland, OH, 44106, USA
- Ming-Ying Leung
- Department of Mathematical Sciences and Border Biomedical Research Center, The University of Texas at El Paso, El Paso, TX 79968, USA
|
129
|
León Y, Zapata L, Salas-Burgos A, Oñate A. In silico design of a vaccine candidate based on autotransporters and HSP against the causal agent of shigellosis, Shigella flexneri. Mol Immunol 2020; 121:47-58. [DOI: 10.1016/j.molimm.2020.02.008] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 09/09/2019] [Revised: 02/10/2020] [Accepted: 02/12/2020] [Indexed: 12/19/2022]
|
130
|
Pandey A, Braun EL. Phylogenetic Analyses of Sites in Different Protein Structural Environments Result in Distinct Placements of the Metazoan Root. Biology 2020; 9:E64. [PMID: 32231097 PMCID: PMC7235752 DOI: 10.3390/biology9040064] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 10/24/2019] [Revised: 03/09/2020] [Accepted: 03/20/2020] [Indexed: 12/23/2022]
Abstract
Phylogenomics, the use of large datasets to examine phylogeny, has revolutionized the study of evolutionary relationships. However, genome-scale data have not been able to resolve all relationships in the tree of life; this could reflect, at least in part, the poor-fit of the models used to analyze heterogeneous datasets. Some of the heterogeneity may reflect the different patterns of selection on proteins based on their structures. To test that hypothesis, we developed a pipeline to divide phylogenomic protein datasets into subsets based on secondary structure and relative solvent accessibility. We then tested whether amino acids in different structural environments had distinct signals for the topology of the deepest branches in the metazoan tree. We focused on a dataset that appeared to have a mixture of signals and we found that the most striking difference in phylogenetic signal reflected relative solvent accessibility. Analyses of exposed sites (residues located on the surface of proteins) yielded a tree that placed ctenophores sister to all other animals whereas sites buried inside proteins yielded a tree with a sponge+ctenophore clade. These differences in phylogenetic signal were not ameliorated when we conducted analyses using a set of maximum-likelihood profile mixture models. These models are very similar to the Bayesian CAT model, which has been used in many analyses of deep metazoan phylogeny. In contrast, analyses conducted after recoding amino acids to limit the impact of deviations from compositional stationarity increased the congruence in the estimates of phylogeny for exposed and buried sites; after recoding amino acid trees estimated using the exposed and buried site both supported placement of ctenophores sister to all other animals. 
Although the central conclusion of our analyses is that sites in different structural environments yield distinct trees when analyzed using models of protein evolution, our amino acid recoding analyses also have implications for metazoan evolution. Specifically, our results add to the evidence that ctenophores are the sister group of all other animals and they further suggest that the placozoa+cnidaria clade found in some other studies deserves more attention. Taken as a whole, these results provide striking evidence that it is necessary to achieve a better understanding of the constraints due to protein structure to improve phylogenetic estimation.
Affiliation(s)
- Akanksha Pandey
- Department of Biology, University of Florida, Gainesville, FL 32611, USA
- Edward L. Braun
- Department of Biology, University of Florida, Gainesville, FL 32611, USA
- Genetics Institute, University of Florida, Gainesville, FL 32611, USA
|
131
|
Smolarczyk T, Roterman-Konieczna I, Stapor K. Protein Secondary Structure Prediction: A Review of Progress and Directions. Curr Bioinform 2020. [DOI: 10.2174/1574893614666191017104639] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Indexed: 11/22/2022]
Abstract
Background: Over the last few decades, a search for the theory of protein folding has grown into a full-fledged research field at the intersection of biology, chemistry and informatics. Despite enormous effort, there are still open questions and challenges, like understanding the rules by which amino acid sequence determines protein secondary structure. Objective: In this review, we depict the progress of the prediction methods over the years and identify sources of improvement. Methods: The protein secondary structure prediction problem is described followed by the discussion on theoretical limitations, description of the commonly used data sets, features and a review of three generations of methods with the focus on the most recent advances. Additionally, methods with available online servers are assessed on the independent data set. Results: The state-of-the-art methods are currently reaching almost 88% for 3-class prediction and 76.5% for an 8-class prediction. Conclusion: This review summarizes recent advances and outlines further research directions.
Affiliation(s)
- Tomasz Smolarczyk
- Institute of Informatics, Silesian University of Technology, Gliwice, Poland
- Irena Roterman-Konieczna
- Department of Bioinformatics and Telemedicine, Jagiellonian University Medical College, Krakow, Poland
- Katarzyna Stapor
- Institute of Informatics, Silesian University of Technology, Gliwice, Poland
|
132
|
The Order-Disorder Continuum: Linking Predictions of Protein Structure and Disorder through Molecular Simulation. Sci Rep 2020; 10:2068. [PMID: 32034199 PMCID: PMC7005769 DOI: 10.1038/s41598-020-58868-w] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 07/24/2019] [Accepted: 10/16/2019] [Indexed: 12/11/2022] Open
Abstract
Intrinsically disordered proteins (IDPs) and intrinsically disordered regions within proteins (IDRs) serve an increasingly expansive list of biological functions, including regulation of transcription and translation, protein phosphorylation, cellular signal transduction, as well as mechanical roles. The strong link between protein function and disorder motivates a deeper fundamental characterization of IDPs and IDRs for discovering new functions and relevant mechanisms. We review recent advances in experimental techniques that have improved identification of disordered regions in proteins. Yet, experimentally curated disorder information still does not currently scale to the level of experimentally determined structural information in folded protein databases, and disorder predictors rely on several different binary definitions of disorder. To link secondary structure prediction algorithms developed for folded proteins and protein disorder predictors, we conduct molecular dynamics simulations on representative proteins from the Protein Data Bank, comparing secondary structure and disorder predictions with simulation results. We find that structure predictor performance from neural networks can be leveraged for the identification of highly dynamic regions within molecules, linked to disorder. Low accuracy structure predictions suggest a lack of static structure for regions that disorder predictors fail to identify. While disorder databases continue to expand, secondary structure predictors and molecular simulations can improve disorder predictor performance, which aids discovery of novel functions of IDPs and IDRs. These observations provide a platform for the development of new, integrated structural databases and fusion of prediction tools toward protein disorder characterization in health and disease.
|
133
|
Bohra N, Sasidharan S, Raj S, Balaji SN, Saudagar P. Utilising capsid proteins of poliovirus to design a multi-epitope based subunit vaccine by immunoinformatics approach. Molecular Simulation 2020. [DOI: 10.1080/08927022.2020.1720916] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 12/22/2022]
Affiliation(s)
- Nitin Bohra
- Department of Biotechnology, National Institute of Technology, Warangal, Telangana, India
- Santanu Sasidharan
- Department of Biotechnology, National Institute of Technology, Warangal, Telangana, India
- Shweta Raj
- Department of Biotechnology, National Institute of Technology, Warangal, Telangana, India
- S. N. Balaji
- Department of Biotechnology, National Institute of Technology, Warangal, Telangana, India
- Prakash Saudagar
- Department of Biotechnology, National Institute of Technology, Warangal, Telangana, India
|
134
|
Torrisi M, Pollastri G, Le Q. Deep learning methods in protein structure prediction. Comput Struct Biotechnol J 2020; 18:1301-1310. [PMID: 32612753 PMCID: PMC7305407 DOI: 10.1016/j.csbj.2019.12.011] [Citation(s) in RCA: 132] [Impact Index Per Article: 26.4] [Received: 10/15/2019] [Revised: 12/19/2019] [Accepted: 12/20/2019] [Indexed: 01/01/2023] Open
Abstract
Protein Structure Prediction is a central topic in Structural Bioinformatics. Since the '60s statistical methods, followed by increasingly complex Machine Learning and recently Deep Learning methods, have been employed to predict protein structural information at various levels of detail. In this review, we briefly introduce the problem of protein structure prediction and essential elements of Deep Learning (such as Convolutional Neural Networks, Recurrent Neural Networks and basic feed-forward Neural Networks they are founded on), after which we discuss the evolution of predictive methods for one-dimensional and two-dimensional Protein Structure Annotations, from the simple statistical methods of the early days, to the computationally intensive highly-sophisticated Deep Learning algorithms of the last decade. In the process, we review the growth of the databases these algorithms are based on, and how this has impacted our ability to leverage knowledge about evolution and co-evolution to achieve improved predictions. We conclude this review outlining the current role of Deep Learning techniques within the wider pipelines to predict protein structures and trying to anticipate what challenges and opportunities may arise next.
Affiliation(s)
- Mirko Torrisi
- School of Computer Science, University College Dublin, Ireland
- Quan Le
- Centre for Applied Data Analytics Research, University College Dublin, Ireland
|
135
|
Fukuda H, Tomii K. DeepECA: an end-to-end learning framework for protein contact prediction from a multiple sequence alignment. BMC Bioinformatics 2020; 21:10. [PMID: 31918654 PMCID: PMC6953294 DOI: 10.1186/s12859-019-3190-x] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 07/16/2019] [Accepted: 11/04/2019] [Indexed: 12/30/2022] Open
Abstract
Background Recently developed methods of protein contact prediction, a crucially important step for protein structure prediction, depend heavily on deep neural networks (DNNs) and multiple sequence alignments (MSAs) of target proteins. Protein sequences are accumulating to an increasing degree such that abundant sequences to construct an MSA of a target protein are readily obtainable. Nevertheless, many cases present different ends of the number of sequences that can be included in an MSA used for contact prediction. The abundant sequences might degrade prediction results, but opportunities remain for a limited number of sequences to construct an MSA. To resolve these persistent issues, we strove to develop a novel framework using DNNs in an end-to-end manner for contact prediction. Results We developed neural network models to improve precision of both deep and shallow MSAs. Results show that higher prediction accuracy was achieved by assigning weights to sequences in a deep MSA. Moreover, for shallow MSAs, adding a few sequential features was useful to increase the prediction accuracy of long-range contacts in our model. Based on these models, we expanded our model to a multi-task model to achieve higher accuracy by incorporating predictions of secondary structures and solvent-accessible surface areas. Moreover, we demonstrated that ensemble averaging of our models can raise accuracy. Using past CASP target protein domains, we tested our models and demonstrated that our final model is superior to or equivalent to existing meta-predictors. Conclusions The end-to-end learning framework we built can use information derived from either deep or shallow MSAs for contact prediction. Recently, an increasing number of protein sequences have become accessible, including metagenomic sequences, which might degrade contact prediction results. Under such circumstances, our model can provide a means to reduce noise automatically. 
According to results of tertiary structure prediction based on contacts and secondary structures predicted by our model, more accurate three-dimensional models of a target protein are obtainable than those from existing ECA methods, starting from its MSA. DeepECA is available from https://github.com/tomiilab/DeepECA.
Affiliation(s)
- Hiroyuki Fukuda
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa-shi, Chiba-ken, 277-8562, Japan
- Kentaro Tomii
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa-shi, Chiba-ken, 277-8562, Japan; Artificial Intelligence Research Center (AIRC), Biotechnology Research Institute for Drug Discovery, Real World Big-Data Computation Open Innovation Laboratory (RWBC-OIL), National Institute of Advanced Industrial Science and Technology (AIST), 2-4-7 Aomi, Koto-ku, Tokyo, 135-0064, Japan
|
136
|
An enhanced protein secondary structure prediction using deep learning framework on hybrid profile based features. Appl Soft Comput 2020. [DOI: 10.1016/j.asoc.2019.105926] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 11/18/2022]
|
137
|
Abstract
Modeling the tertiary structure of protein-protein interaction complex has been well studied over many years, especially in the case where the structures of both binding partners are roughly the same before and after binding. However, the assembly of complexes with less-ordered partners is a much harder problem, and modeling even small amounts of flexibility can pose a challenge. In an extreme case, where one of the binding partners is intrinsically disordered before binding, we have previously shown that by initially disregarding the coupling between windows of these intrinsically disordered proteins (IDPs), we can reliably assemble complexes involving IDPs up to at least 69 residues long. Here, we detail the use of the IDP-LZerD package and protocol.
|
138
|
The MULTICOM Protein Structure Prediction Server Empowered by Deep Learning and Contact Distance Prediction. Methods Mol Biol 2020; 2165:13-26. [PMID: 32621217 DOI: 10.1007/978-1-0716-0708-4_2] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 12/28/2022]
Abstract
Prediction of the three-dimensional (3D) structure of a protein from its sequence is important for studying its biological function. With the advancement in deep learning contact distance prediction and residue-residue coevolutionary analysis, significant progress has been made in both template-based and template-free protein structure prediction in the last several years. Here, we provide a practical guide for our latest MULTICOM protein structure prediction system built on top of the latest advances, which was rigorously tested in the 2018 CASP13 experiment. Its specific functionalities include: (1) prediction of 1D structural features (secondary structure, solvent accessibility, disordered regions) and 2D interresidue contacts; (2) domain boundary prediction; (3) template-based (or homology) 3D structure modeling; (4) contact distance-driven ab initio 3D structure modeling; and (5) large-scale protein quality assessment enhanced by deep learning and predicted contacts. The MULTICOM web server (http://sysbio.rnet.missouri.edu/multicom_cluster/) presents all the 1D, 2D, and 3D prediction results and quality assessment to users via user-friendly web interfaces and e-mails. The source code of the MULTICOM package is also available at https://github.com/multicom-toolbox/multicom.
|
139
|
Shi Q, Chen W, Huang S, Wang Y, Xue Z. Deep learning for mining protein data. Brief Bioinform 2019; 22:194-218. [PMID: 31867611 DOI: 10.1093/bib/bbz156] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Received: 08/16/2019] [Revised: 10/21/2019] [Accepted: 11/07/2019] [Indexed: 01/16/2023] Open
Abstract
The recent emergence of deep learning to characterize complex patterns of protein big data reveals its potential to address the classic challenges in the field of protein data mining. Much research has revealed the promise of deep learning as a powerful tool to transform protein big data into valuable knowledge, leading to scientific discoveries and practical solutions. In this review, we summarize recent publications on deep learning predictive approaches in the field of mining protein data. The application architectures of these methods include multilayer perceptrons, stacked autoencoders, deep belief networks, two- or three-dimensional convolutional neural networks, recurrent neural networks, graph neural networks, and complex neural networks and are described from five perspectives: residue-level prediction, sequence-level prediction, three-dimensional structural analysis, interaction prediction, and mass spectrometry data mining. The advantages and deficiencies of these architectures are presented in relation to various tasks in protein data mining. Additionally, some practical issues and their future directions are discussed, such as robust deep learning for protein noisy data, architecture optimization for specific tasks, efficient deep learning for limited protein data, multimodal deep learning for heterogeneous protein data, and interpretable deep learning for protein understanding. This review provides comprehensive perspectives on general deep learning techniques for protein data analysis.
Affiliation(s)
- Qiang Shi
- School of Software Engineering, Huazhong University of Science and Technology. His main interests cover machine learning, especially deep learning, protein data analysis, and big data mining
- Weiya Chen
- School of Software Engineering, Huazhong University of Science & Technology, Wuhan, China. His research interests cover bioinformatics, virtual reality, and data visualization
- Siqi Huang
- Software Engineering at Huazhong University of Science and Technology, focusing on machine learning and data mining
- Yan Wang
- School of life, University of Science & Technology; her main interests cover protein structure and function prediction and big data mining
- Zhidong Xue
- School of Software Engineering, Huazhong University of Science & Technology, Wuhan, China. His research interests cover bioinformatics, machine learning, and image processing
|
140
|
Pal A, Saha BK, Saha J. Comparative in silico analysis of ftsZ gene from different bacteria reveals the preference for core set of codons in coding sequence structuring and secondary structural elements determination. PLoS One 2019; 14:e0219231. [PMID: 31841523 PMCID: PMC6913975 DOI: 10.1371/journal.pone.0219231] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 06/17/2019] [Accepted: 11/28/2019] [Indexed: 11/19/2022] Open
Abstract
The deluge of sequence information in recent times provides us with an excellent opportunity to compare organisms on a large genomic scale. In this study we have tried to decipher the variation in the gene organization and structuring of a vital bacterial gene called ftsZ which codes for an integral component of the bacterial cell division, the FtsZ protein. FtsZ is homologous to tubulin protein and has been found to be ubiquitous in eubacteria. FtsZ is showing increasing promise as a target for antibacterial drug discovery. Our study of ftsZ protein from 143 different bacterial species spanning a wider range of morphological and physiological types demonstrates that the ftsZ gene of about ninety-three percent of the organisms shows a relatively biased codon usage profile and significant GC deviation from their genomic GC content. Comparative codon usage analysis of ftsZ and a core housekeeping gene rpoB demonstrated that the codon usage pattern of ftsZ CDS is shaped by natural selection to a large extent and mimics that of a housekeeping gene. We have also detected a tendency among the different organisms to utilize a core set of codons in structuring the ftsZ coding sequence. We observed that the compositional frequency of the amino acid serine in the FtsZ protein appears to be an indicator of the bacterial lifestyle. Our meticulous analysis of the ftsZ gene linked with the corresponding FtsZ protein shows that there is a bias towards the use of specific synonymous codons, particularly in the helix and strand regions of the multi-domain FtsZ protein. Overall our findings suggest that in an indispensable and vital protein such as FtsZ, there is an inherent tendency to maintain form for optimized performance in spite of the extrinsic variability in coding features.
Affiliation(s)
- Ayon Pal
- Microbiology & Computational Biology Laboratory, Department of Botany, Raiganj University, Raiganj, West Bengal, India
- Barnan Kumar Saha
- Microbiology & Computational Biology Laboratory, Department of Botany, Raiganj University, Raiganj, West Bengal, India
- Jayanti Saha
- Microbiology & Computational Biology Laboratory, Department of Botany, Raiganj University, Raiganj, West Bengal, India
|
141
|
Raimondi D, Orlando G, Vranken WF, Moreau Y. Exploring the limitations of biophysical propensity scales coupled with machine learning for protein sequence analysis. Sci Rep 2019; 9:16932. [PMID: 31729443 PMCID: PMC6858301 DOI: 10.1038/s41598-019-53324-w] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 07/05/2019] [Accepted: 10/25/2019] [Indexed: 11/21/2022] Open
Abstract
Machine learning (ML) is ubiquitous in bioinformatics due to its versatility. One of the most crucial aspects to consider while training an ML model is to carefully select the optimal feature encoding for the problem at hand. Biophysical propensity scales are widely adopted in structural bioinformatics because they describe amino acid properties that are intuitively relevant for many structural and functional aspects of proteins, and are thus commonly used as input features for ML methods. In this paper we reproduce three classical structural bioinformatics prediction tasks to investigate the main assumptions about the use of propensity scales as input features for ML methods. We investigate their usefulness with different randomization experiments and we show that their effectiveness varies among the ML methods used and the tasks. We show that while linear methods are more dependent on the feature encoding, the specific biophysical meaning of the features is less relevant for non-linear methods. Moreover, we show that even among linear ML methods, the simpler one-hot encoding can surprisingly outperform the “biologically meaningful” scales. We also show that feature selection performed with non-linear ML methods may not be able to distinguish between randomized and “real” propensity scales by properly prioritizing the latter. Finally, we show that learning problem-specific embeddings could be a simple, assumption-free and optimal way to perform feature learning/engineering for structural bioinformatics tasks.
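To make the encodings compared in this abstract concrete, the sketch below encodes a peptide with one-hot vectors, with a propensity scale, and with a shuffled copy of that scale. It is an illustration under stated assumptions: the Kyte-Doolittle hydropathy scale is used as a single example scale, the shuffling is one plausible form of randomization rather than the authors' exact protocol, and the function names and the peptide are hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values, used here as one example propensity scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def one_hot(sequence):
    """Encode each residue as a 20-dimensional indicator vector."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    return [[1 if index[aa] == i else 0 for i in range(len(AMINO_ACIDS))]
            for aa in sequence]


def scale_encode(sequence, scale):
    """Encode each residue as a single value taken from a propensity scale."""
    return [scale[aa] for aa in sequence]


def randomized_scale(scale, seed=0):
    """Reassign the scale's values to residues at random (same value set)."""
    rng = random.Random(seed)
    values = list(scale.values())
    rng.shuffle(values)
    return dict(zip(scale.keys(), values))


peptide = "MKVLA"  # hypothetical example peptide
print(one_hot(peptide)[0])
print(scale_encode(peptide, KYTE_DOOLITTLE))
print(scale_encode(peptide, randomized_scale(KYTE_DOOLITTLE)))
```

A randomized scale keeps the numeric distribution of the features but discards their biophysical meaning, which is what makes it a useful control when asking whether a model actually exploits that meaning.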
Affiliation(s)
- Gabriele Orlando
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, 1050, Brussels, Belgium
- Wim F Vranken
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, 1050, Brussels, Belgium; Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, 1050, Belgium
- Yves Moreau
- ESAT-STADIUS, KU Leuven, 3001, Leuven, Belgium
|
142
|
Hong J, Luo Y, Mou M, Fu J, Zhang Y, Xue W, Xie T, Tao L, Lou Y, Zhu F. Convolutional neural network-based annotation of bacterial type IV secretion system effectors with enhanced accuracy and reduced false discovery. Brief Bioinform 2019; 21:1825-1836. [PMID: 31860715 DOI: 10.1093/bib/bbz120] [Citation(s) in RCA: 94] [Impact Index Per Article: 15.7] [Received: 06/23/2019] [Revised: 08/12/2019] [Accepted: 08/21/2019] [Indexed: 12/20/2022] Open
Abstract
The type IV bacterial secretion system (SS) is reported to be one of the most ubiquitous SSs in nature and can induce serious conditions by secreting type IV SS effectors (T4SEs) into the host cells. Recent studies mainly focus on annotating new T4SEs from the huge amount of sequencing data, and various computational tools have therefore been developed to accelerate T4SE annotation. However, these tools are reported to be heavily dependent on the selected methods, and their annotation performance needs to be further enhanced. Herein, a convolutional neural network (CNN) technique was used to annotate T4SEs by integrating multiple protein encoding strategies. First, the annotation accuracies of nine encoding strategies integrated with CNN were assessed and compared with those of the popular T4SE annotation tools based on an independent benchmark. Second, false discovery rates of various models were systematically evaluated by (1) scanning the genome of Legionella pneumophila subsp. ATCC 33152 and (2) predicting the real-world non-T4SEs validated using published experiments. Based on the above analyses, the encoding strategies (a) position-specific scoring matrix (PSSM), (b) protein secondary structure & solvent accessibility (PSSSA) and (c) one-hot encoding scheme (Onehot) were identified as well-performing when integrated with CNN. Finally, a novel strategy that collectively considers the three well-performing models (CNN-PSSM, CNN-PSSSA and CNN-Onehot) was proposed, and a new tool (CNN-T4SE, https://idrblab.org/cnnt4se/) was constructed to facilitate T4SE annotation. All in all, this study conducted a comprehensive analysis of the performance of a collection of encoding strategies when integrated with CNN, which could facilitate the suppression of T4SS in infection and limit the spread of antimicrobial resistance.
Affiliation(s)
- Jiajun Hong
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Yongchao Luo
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Minjie Mou
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Jianbo Fu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Yang Zhang
- School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China
- Weiwei Xue
- School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China
- Tian Xie
- Key Laboratory of Elemene Class Anti-Cancer Chinese Medicine of Zhejiang Province, School of Medicine, Hangzhou Normal University, Hangzhou 310036, China
- Lin Tao
- Key Laboratory of Elemene Class Anti-Cancer Chinese Medicine of Zhejiang Province, School of Medicine, Hangzhou Normal University, Hangzhou 310036, China
- Yan Lou
- Zhejiang Provincial Key Laboratory for Drug Clinical Research and Evaluation, The First Affiliated Hospital, Zhejiang University, Hangzhou 310000, Zhejiang, China
- Feng Zhu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
|
143
|
Kaul T, Eswaran M, Ahmad S, Thangaraj A, Jain R, Kaul R, Raman NM, Bharti J. Probing the effect of a plus 1bp frameshift mutation in protein-DNA interface of domestication gene, NAMB1, in wheat. J Biomol Struct Dyn 2019; 38:3633-3647. [PMID: 31621500 DOI: 10.1080/07391102.2019.1680435] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Indexed: 10/25/2022]
Abstract
Transcription factor NAM-B1 has a major role in the process of senescence, which results in higher Fe and Zn concentrations in grains of wild wheat (T. durum; Td). The absence of the wild-type NAMB1 in T. aestivum (Ta), one of the cardinal crops essential for more than one third of the global population, affects Fe and Zn remobilisation to the maturing grain from the flag leaf, resulting in lower micronutrient bioavailability. The cardinal difference in the NAMB1 gene between the two species is the absence of the +1 bp allele in Ta. In silico studies using NAMB1 from Td and Ta were performed to explore the variation in the interaction with the conserved cis-element DNA motif (CATGTG), as both proteins share the same domain and no in silico studies of these proteins have been reported. The secondary structure, 3D-modelling of the proteins, DNA-protein docking and dynamics were computed with the Schrodinger Prime Suite. Predicted secondary structures were energy minimised using Macromodel, and docking was performed based on binding energy and hydrogen bonds. Molecular dynamics simulation of NAMB1-Ta and NAMB1-Td individually and with the cis-element motif, performed for 100 ns, revealed significant variations in the protein-DNA interaction in Ta. This work provides the modelled 3D-interaction profile resulting from a single-bp frameshift mutation, which helps in understanding the difference in function between NAMB1 orthologs due to the lack of the NAC domain. The overall computational analysis reveals that NAMB1-Ta and NAMB1-Td proteins display a good amount of dissimilarity in their structure, dynamics and DNA-binding characteristics. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Tanushri Kaul
- Nutritional Improvement of Crops Group, Plant Molecular Biology Division, International Centre for Genetic Engineering and Biotechnology (ICGEB), New Delhi, India
- Murugesh Eswaran
- Nutritional Improvement of Crops Group, Plant Molecular Biology Division, International Centre for Genetic Engineering and Biotechnology (ICGEB), New Delhi, India
- Shaban Ahmad
- Nutritional Improvement of Crops Group, Plant Molecular Biology Division, International Centre for Genetic Engineering and Biotechnology (ICGEB), New Delhi, India
- Arulprakash Thangaraj
- Nutritional Improvement of Crops Group, Plant Molecular Biology Division, International Centre for Genetic Engineering and Biotechnology (ICGEB), New Delhi, India
- Rashmi Jain
- Nutritional Improvement of Crops Group, Plant Molecular Biology Division, International Centre for Genetic Engineering and Biotechnology (ICGEB), New Delhi, India
- Rashmi Kaul
- Nutritional Improvement of Crops Group, Plant Molecular Biology Division, International Centre for Genetic Engineering and Biotechnology (ICGEB), New Delhi, India
- Nitya Meenakshi Raman
- Nutritional Improvement of Crops Group, Plant Molecular Biology Division, International Centre for Genetic Engineering and Biotechnology (ICGEB), New Delhi, India
- Jyotsna Bharti
- Nutritional Improvement of Crops Group, Plant Molecular Biology Division, International Centre for Genetic Engineering and Biotechnology (ICGEB), New Delhi, India
|
144
|
Wozniak PP, Pelc J, Skrzypecki M, Vriend G, Kotulska M. Bio-knowledge-based filters improve residue-residue contact prediction accuracy. Bioinformatics 2019; 34:3675-3683. [PMID: 29850768 DOI: 10.1093/bioinformatics/bty416] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 11/17/2017] [Accepted: 05/19/2018] [Indexed: 11/13/2022] Open
Abstract
Motivation: Residue-residue contact prediction through direct coupling analysis has reached impressive accuracy, but yet higher accuracy will be needed to allow for routine modelling of protein structures. One way to improve the prediction accuracy is to filter predicted contacts using knowledge about the particular protein of interest or knowledge about protein structures in general. Results: We focus on the latter and discuss a set of filters that can be used to remove false positive contact predictions. Each filter depends on one or a few cut-off parameters for which the filter performance was investigated. For a test set of 851 protein domains, combining all filters with default parameters removed 29% of the predictions, of which 92% were indeed false positives. Availability and implementation: All data and scripts are available at http://comprec-lin.iiar.pwr.edu.pl/FPfilter/. Supplementary information: Supplementary data are available at Bioinformatics online.
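The two reported percentages can be turned into an estimate of how much the precision of the surviving contacts improves. The sketch below works this out in Python; the starting precision is not given in the abstract, so the value 0.50 is purely an assumption for illustration, and the function name is hypothetical.

```python
def precision_after_filtering(initial_precision,
                              removed_fraction=0.29,
                              removed_false_positive_rate=0.92):
    """Precision of the predictions that survive filtering.

    The defaults are the figures quoted in the abstract; initial_precision
    is an assumption supplied by the caller. All quantities are expressed
    as fractions of the original number of predictions.
    """
    if not 0.0 <= initial_precision <= 1.0:
        raise ValueError("initial_precision must be in [0, 1]")
    false_removed = removed_fraction * removed_false_positive_rate
    true_removed = removed_fraction - false_removed
    if false_removed > 1.0 - initial_precision or true_removed > initial_precision:
        raise ValueError("inputs are inconsistent with each other")
    true_kept = initial_precision - true_removed
    kept_fraction = 1.0 - removed_fraction
    return true_kept / kept_fraction


# Hypothetical starting precision of 0.50 (not a figure from the paper).
print(round(precision_after_filtering(0.50), 3))
```

Under that assumed starting point, removing 29% of the predictions at a 92% false-positive rate leaves a set whose precision is roughly 0.67, because only 8% of the removed predictions (about 2.3% of all predictions) were true contacts.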
Affiliation(s)
- P P Wozniak
- Department of Biomedical Engineering, Faculty of Fundamental Problems of Technology, Wroclaw University of Science and Technology, Wroclaw, Poland
- J Pelc
- Department of Biomedical Engineering, Faculty of Fundamental Problems of Technology, Wroclaw University of Science and Technology, Wroclaw, Poland
- M Skrzypecki
- Department of Biomedical Engineering, Faculty of Fundamental Problems of Technology, Wroclaw University of Science and Technology, Wroclaw, Poland
- G Vriend
- Centre for Molecular and Biomolecular Informatics, Radboud University Medical Centre, Nijmegen, The Netherlands
- M Kotulska
- Department of Biomedical Engineering, Faculty of Fundamental Problems of Technology, Wroclaw University of Science and Technology, Wroclaw, Poland
|
145
|
Dhal AK, Pani A, Mahapatra RK, Yun SI. An immunoinformatics approach for design and validation of multi-subunit vaccine against Cryptosporidium parvum. Immunobiology 2019; 224:747-757. [PMID: 31522782 DOI: 10.1016/j.imbio.2019.09.001] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 03/21/2019] [Revised: 08/29/2019] [Accepted: 09/03/2019] [Indexed: 12/30/2022]
Abstract
An immunoinformatics-based approach is explored for potential multi-subunit vaccine candidates against Cryptosporidium parvum. We applied a protein structure-based systematic methodology to develop a proficient multi-subunit vaccine candidate against C. parvum, using the probability of antigenicity, allergenicity and transmembrane helices as the screening criteria. The best-screened epitopes, namely B-cell epitopes (BCL), helper T-lymphocyte (HTL) epitopes and cytotoxic T-lymphocyte (CTL) epitopes, were joined using appropriate linkers to enhance the presentation and processing of the antigenic molecules. Modeller software was used to generate the best 3D model of the subunit protein. RAMPAGE and other web servers were employed for the validation of the modeled protein. Furthermore, the predicted modeled structure was docked with two known receptors, TLR2 and TLR4, through the ClusPro web server. Based on the docking score, the multi-subunit vaccine docked with TLR2 was subjected to energy minimization by molecular dynamics (MD) simulation to examine its stability within a solvent system. From the simulation study, we found that the residue Glu-107 of the subunit vaccine formed a hydrogen bond interaction with Arg-299 of the TLR2 receptor throughout the time frame of the MD simulation. The overall results showed that the multi-subunit vaccine could be an efficient vaccine candidate against C. parvum.
Affiliation(s)
- Ajit Kumar Dhal
- School of Biotechnology, KIIT Deemed to be University, Bhubaneswar 751024, Odisha, India
- Alok Pani
- Department of Food Science and Technology, Chonbuk National University, Jeonju, 561756, South Korea
- Rajani Kanta Mahapatra
- School of Biotechnology, KIIT Deemed to be University, Bhubaneswar 751024, Odisha, India
- Soon-Il Yun
- Department of Food Science and Technology, Chonbuk National University, Jeonju, 561756, South Korea
|
146
|
Abelin JG, Harjanto D, Malloy M, Suri P, Colson T, Goulding SP, Creech AL, Serrano LR, Nasir G, Nasrullah Y, McGann CD, Velez D, Ting YS, Poran A, Rothenberg DA, Chhangawala S, Rubinsteyn A, Hammerbacher J, Gaynor RB, Fritsch EF, Greshock J, Oslund RC, Barthelme D, Addona TA, Arieta CM, Rooney MS. Defining HLA-II Ligand Processing and Binding Rules with Mass Spectrometry Enhances Cancer Epitope Prediction. Immunity 2019; 51:766-779.e17. [PMID: 31495665 DOI: 10.1016/j.immuni.2019.08.012] [Citation(s) in RCA: 167] [Impact Index Per Article: 27.8] [Received: 03/13/2019] [Revised: 06/19/2019] [Accepted: 08/15/2019] [Indexed: 12/30/2022]
Abstract
Increasing evidence indicates CD4+ T cells can recognize cancer-specific antigens and control tumor growth. However, it remains difficult to predict the antigens that will be presented by human leukocyte antigen class II molecules (HLA-II), hindering efforts to optimally target them therapeutically. Obstacles include inaccurate peptide-binding prediction and unsolved complexities of the HLA-II pathway. To address these challenges, we developed an improved technology for discovering HLA-II binding motifs and conducted a comprehensive analysis of tumor ligandomes to learn processing rules relevant in the tumor microenvironment. We profiled >40 HLA-II alleles and showed that binding motifs were highly sensitive to HLA-DM, a peptide-loading chaperone. We also revealed that intratumoral HLA-II presentation was dominated by professional antigen-presenting cells (APCs) rather than cancer cells. Integrating these observations, we developed algorithms that accurately predicted APC ligandomes, including peptides from phagocytosed cancer cells. These tools and biological insights will enable improved HLA-II-directed cancer therapies.
Affiliation(s)
- Asaf Poran
- Neon Therapeutics, Cambridge, MA 02139, USA
- Alex Rubinsteyn
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Jeff Hammerbacher
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
|
147
|
Bahrami AA, Payandeh Z, Khalili S, Zakeri A, Bandehpour M. Immunoinformatics: In Silico Approaches and Computational Design of a Multi-epitope, Immunogenic Protein. Int Rev Immunol 2019; 38:307-322. [PMID: 31478759 DOI: 10.1080/08830185.2019.1657426] [Citation(s) in RCA: 76] [Impact Index Per Article: 12.7] [Indexed: 10/26/2022]
Abstract
Immunoinformatics is a new and critical field with several tools and databases that guide experimental selection, facilitate analysis of the large amount of immunological data obtained from experimental research, and help to design and introduce new hypotheses. Given these features, immunoinformatics appears to be a way to develop and advance immunological research. Bioinformatics methods and applications are successfully employed in vaccine informatics to assist different sites of the preclinical, clinical, and post-licensure vaccine enterprises. On the other hand, with the progress of molecular biology and immunology, epitope vaccines have become the focus of research on molecular vaccines. Moreover, reverse vaccinology could improve vaccine production and vaccination protocols by in silico prediction of protein-vaccine candidates from genome sequences. B- and T-cell immune epitopes could be predicted by immunoinformatics algorithms and computational methods to improve vaccine design, protective immunity analysis, assessment of vaccine safety and efficacy, and immunization modeling. This review aims to discuss the power of computational approaches in vaccine design and their relevance to the development of effective vaccines. Furthermore, the various divisions of this field and the available tools in each are introduced and reviewed.
Affiliation(s)
- Armina Alagheband Bahrami
- Department of Biotechnology, School of Advanced Technologies in Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Zahra Payandeh
- Immunology Research Center, Tabriz University of Medical Sciences, Tabriz, Iran
- Saeed Khalili
- Department of Biology Sciences, Shahid Rajaee Teacher Training University, Tehran, Iran
- Alireza Zakeri
- Department of Biology Sciences, Shahid Rajaee Teacher Training University, Tehran, Iran
- Mojgan Bandehpour
- Department of Biotechnology, School of Advanced Technologies in Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran
|
148
|
Khurana S, Rawi R, Kunji K, Chuang GY, Bensmail H, Mall R. DeepSol: a deep learning framework for sequence-based protein solubility prediction. Bioinformatics 2019; 34:2605-2613. [PMID: 29554211 DOI: 10.1093/bioinformatics/bty166] [Citation(s) in RCA: 114] [Impact Index Per Article: 19.0] [Received: 11/14/2017] [Accepted: 03/13/2018] [Indexed: 01/09/2023] Open
Abstract
Motivation: Protein solubility plays a vital role in pharmaceutical research and production yield. For a given protein, the extent of its solubility can represent the quality of its function, and is ultimately defined by its sequence. Thus, it is imperative to develop novel, highly accurate in silico sequence-based protein solubility predictors. In this work we propose DeepSol, a novel Deep Learning-based protein solubility predictor. The backbone of our framework is a convolutional neural network that exploits k-mer structure and additional sequence and structural features extracted from the protein sequence. Results: DeepSol outperformed all known sequence-based state-of-the-art solubility prediction methods and attained an accuracy of 0.77 and a Matthews correlation coefficient of 0.55. The superior prediction accuracy of DeepSol allows screening for sequences with enhanced production capacity and more reliable prediction of the solubility of novel proteins. Availability and implementation: DeepSol's best performing models and results are publicly deposited at https://doi.org/10.5281/zenodo.1162886 (Khurana and Mall, 2018). Supplementary information: Supplementary data are available at Bioinformatics online.
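The two reported metrics, accuracy and the Matthews correlation coefficient (MCC), are both functions of a binary confusion matrix. The Python sketch below shows the standard definitions; the counts in the example are hypothetical and were chosen only to give an accuracy of 0.77, they are not DeepSol's actual confusion matrix.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Common convention when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts for 100 proteins (soluble = positive class).
counts = dict(tp=40, tn=37, fp=13, fn=10)
print(accuracy(**counts))
print(matthews_corrcoef(**counts))
```

Unlike accuracy, the MCC uses all four cells of the matrix and stays near zero for a predictor that only exploits class imbalance, which is why the two numbers are usually reported together.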
Affiliation(s)
- Sameer Khurana
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Reda Rawi
- Vaccine Research Center, National Institute of Allergy and Infectious Diseases, National Institute of Health, Bethesda, MD, USA
- Khalid Kunji
- Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
- Gwo-Yu Chuang
- Vaccine Research Center, National Institute of Allergy and Infectious Diseases, National Institute of Health, Bethesda, MD, USA
- Halima Bensmail
- Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
- Raghvendra Mall
- Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
|
149
|
Torrisi M, Kaleel M, Pollastri G. Deeper Profiles and Cascaded Recurrent and Convolutional Neural Networks for state-of-the-art Protein Secondary Structure Prediction. Sci Rep 2019; 9:12374. [PMID: 31451723 PMCID: PMC6710256 DOI: 10.1038/s41598-019-48786-x] [Citation(s) in RCA: 56] [Impact Index Per Article: 9.3] [Received: 03/26/2019] [Accepted: 08/12/2019] [Indexed: 01/10/2023] Open
Abstract
Protein Secondary Structure prediction has been a central topic of research in Bioinformatics for decades. In spite of this, even the most sophisticated ab initio SS predictors are not able to reach the theoretical limit of three-state prediction accuracy (88–90%), while only a few predict more than the 3 traditional Helix, Strand and Coil classes. In this study we present tests on different models trained both on single sequence and evolutionary profile-based inputs and develop a new state-of-the-art system with Porter 5. Porter 5 is composed of ensembles of cascaded Bidirectional Recurrent Neural Networks and Convolutional Neural Networks, incorporates new input encoding techniques and is trained on a large set of protein structures. Porter 5 achieves 84% accuracy (81% SOV) when tested on 3 classes and 73% accuracy (70% SOV) on 8 classes on a large independent set. In our tests Porter 5 is 2% more accurate than its previous version and outperforms or matches the most recent predictors of secondary structure we tested. When Porter 5 is retrained on SCOPe based sets that eliminate homology between training/testing samples we obtain similar results. Porter is available as a web server and standalone program at http://distilldeep.ucd.ie/porter/ alongside all the datasets and alignments.
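The 3-class and 8-class accuracies quoted here are per-residue agreement rates between predicted and observed secondary structure strings. The Python sketch below illustrates the calculation; the 8-to-3 state reduction shown is one common convention and is an assumption here (variants exist, and the abstract does not say which one Porter 5 uses), and the two example strings are made up.

```python
# One common reduction of the eight DSSP states to helix/strand/coil.
# This mapping is an assumption for illustration; other variants exist.
DSSP8_TO_3 = {
    "H": "H", "G": "H", "I": "H",   # helices
    "E": "E", "B": "E",             # strands and bridges
    "T": "C", "S": "C", "C": "C", "-": "C",  # everything else as coil
}


def reduce_to_three(states: str) -> str:
    """Map an 8-state secondary structure string to 3 states."""
    return "".join(DSSP8_TO_3[s] for s in states)


def per_residue_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted state matches the observed one."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Made-up 14-residue example strings.
observed8 = "HHHHGTTEEEEB-S"
predicted8 = "HHHHHTSEEEE--S"

q8 = per_residue_accuracy(predicted8, observed8)
q3 = per_residue_accuracy(reduce_to_three(predicted8), reduce_to_three(observed8))
print(q8, q3)
```

Because several 8-state errors collapse into the same 3-state class, the 3-class score is never lower than the 8-class score for the same prediction, consistent with the gap between the 84% and 73% figures in the abstract.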
Affiliation(s)
- Mirko Torrisi
- School of Computer Science, University College Dublin, Belfield, Dublin 4, Ireland
- Manaz Kaleel
- School of Computer Science, University College Dublin, Belfield, Dublin 4, Ireland
- Gianluca Pollastri
- School of Computer Science, University College Dublin, Belfield, Dublin 4, Ireland
|
150
|
Eng CH, Backman TWH, Bailey CB, Magnan C, García Martín H, Katz L, Baldi P, Keasling JD. ClusterCAD: a computational platform for type I modular polyketide synthase design. Nucleic Acids Res 2019; 46:D509-D515. [PMID: 29040649 PMCID: PMC5753242 DOI: 10.1093/nar/gkx893] [Citation(s) in RCA: 45] [Impact Index Per Article: 7.5] [Received: 08/15/2017] [Accepted: 09/24/2017] [Indexed: 01/10/2023] Open
Abstract
ClusterCAD is a web-based toolkit designed to leverage the collinear structure and deterministic logic of type I modular polyketide synthases (PKSs) for synthetic biology applications. The unique organization of these megasynthases, combined with the diversity of their catalytic domain building blocks, has fueled an interest in harnessing the biosynthetic potential of PKSs for the microbial production of both novel natural product analogs and industrially relevant small molecules. However, a limited theoretical understanding of the determinants of PKS fold and function poses a substantial barrier to the design of active variants, and identifying strategies to reliably construct functional PKS chimeras remains an active area of research. In this work, we formalize a paradigm for the design of PKS chimeras and introduce ClusterCAD as a computational platform to streamline and simplify the process of designing experiments to test strategies for engineering PKS variants. ClusterCAD provides chemical structures with stereochemistry for the intermediates generated by each PKS module, as well as sequence- and structure-based search tools that allow users to identify modules based either on amino acid sequence or on the chemical structure of the cognate polyketide intermediate. ClusterCAD can be accessed at https://clustercad.jbei.org and at http://clustercad.igb.uci.edu.
Affiliation(s)
- Clara H Eng
- Department of Chemical and Biomolecular Engineering, University of California, Berkeley, CA 94720, USA
- Tyler W H Backman
- Joint BioEnergy Institute, 5885 Hollis Street, Emeryville, CA 94608, USA; Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA; Department of Energy Agile BioFoundry, Emeryville, CA 94608, USA
- Constance B Bailey
- Joint BioEnergy Institute, 5885 Hollis Street, Emeryville, CA 94608, USA; Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Christophe Magnan
- Department of Computer Science, University of California, Irvine, CA 92697, USA; Institute for Genomics and Bioinformatics, University of California, Irvine, CA 92697, USA
- Héctor García Martín
- Joint BioEnergy Institute, 5885 Hollis Street, Emeryville, CA 94608, USA; Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA; Department of Energy Agile BioFoundry, Emeryville, CA 94608, USA
- Leonard Katz
- QB3 Institute, University of California, Berkeley, CA 94720, USA
- Pierre Baldi
- Department of Computer Science, University of California, Irvine, CA 92697, USA; Institute for Genomics and Bioinformatics, University of California, Irvine, CA 92697, USA
- Jay D Keasling
- Department of Chemical and Biomolecular Engineering, University of California, Berkeley, CA 94720, USA; Joint BioEnergy Institute, 5885 Hollis Street, Emeryville, CA 94608, USA; Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA; Department of Energy Agile BioFoundry, Emeryville, CA 94608, USA; QB3 Institute, University of California, Berkeley, CA 94720, USA; Department of Bioengineering, University of California, Berkeley, CA 94720, USA; Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, DK2970 Horsholm, Denmark
|