51
Liu X, Xu LL, Lu YP, Yang T, Gu XY, Wang L, Liu Y. Deep_KsuccSite: A novel deep learning method for the identification of lysine succinylation sites. Front Genet 2022; 13:1007618. [PMID: 36246655 PMCID: PMC9557156 DOI: 10.3389/fgene.2022.1007618] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2022] [Accepted: 09/08/2022] [Indexed: 11/13/2022]
Abstract
Identification of lysine (symbol Lys or K) succinylation (Ksucc) sites is the basis for elucidating the mechanism and function of lysine succinylation modifications. Traditional experimental methods for Ksucc site identification are often costly and time-consuming. Therefore, it is necessary to construct an efficient computational method to predict the presence of Ksucc sites in protein sequences. In this study, we proposed a novel and effective deep learning-based predictor for the identification of Ksucc sites, termed Deep_KsuccSite. The predictor adopted Composition, Transition, and Distribution (CTD) Composition (CTDC), Enhanced Grouped Amino Acid Composition (EGAAC), Amphiphilic Pseudo-Amino Acid Composition (APAAC), and Embedding Encoding methods to encode peptides, then constructed three base classifiers using a one-dimensional (1D) convolutional neural network (CNN) and a 2D-CNN, and finally used a voting method to obtain the final results. K-fold cross-validation and independent testing showed that Deep_KsuccSite could serve as an effective tool to identify Ksucc sites in protein sequences. In addition, the ablation experiment results based on voting, feature combination, and model architecture showed that Deep_KsuccSite could make full use of the information of different features to construct an effective classifier. Taken together, we developed Deep_KsuccSite, which is based on deep learning and could achieve better prediction accuracy than current methods for lysine succinylation sites. The code and dataset involved in this methodological study are permanently available at https://github.com/flyinsky6/Deep_KsuccSite.
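The voting step mentioned in the abstract can be illustrated with a minimal sketch. It assumes hard (majority) voting over binary calls from three base classifiers; the abstract does not say whether hard or soft voting is used, and the function name `majority_vote` and the example predictions are hypothetical, not taken from the Deep_KsuccSite repository.

```python
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine binary (0/1) predictions from several classifiers by hard voting."""
    if not predictions:
        raise ValueError("need at least one classifier")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all classifiers must score the same samples")
    n_models = len(predictions)
    # A site is called positive when more than half of the classifiers say so.
    return [1 if 2 * sum(votes) > n_models else 0 for votes in zip(*predictions)]


# Hypothetical outputs of three base classifiers on four candidate lysines.
cnn_a = [1, 0, 1, 0]
cnn_b = [1, 1, 0, 0]
cnn_c = [0, 1, 1, 0]
final_calls = majority_vote([cnn_a, cnn_b, cnn_c])
print(final_calls)
```

With an odd number of classifiers there are no ties, which is one practical reason to use three base models for a binary task.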
Affiliation(s)
- Xin Liu
  - School of Medical Informatics and Engineering, Xuzhou Medical University, Xuzhou, China
- Lin-Lin Xu
  - School of Medical Informatics and Engineering, Xuzhou Medical University, Xuzhou, China
- Ya-Ping Lu
  - College of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China
- Ting Yang
  - School of Medical Informatics and Engineering, Xuzhou Medical University, Xuzhou, China
- Xin-Yu Gu
  - School of Medical Informatics and Engineering, Xuzhou Medical University, Xuzhou, China
- Liang Wang
  - Laboratory Medicine, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Yong Liu
  - Jiangsu Center for the Collaboration and Innovation of Cancer Biotherapy, Cancer Institute, Xuzhou Medical University, Xuzhou, Jiangsu, China
- *Correspondence: Xin Liu; Liang Wang; Yong Liu
52
Pu Y, Li J, Tang J, Guo F. DeepFusionDTA: Drug-Target Binding Affinity Prediction With Information Fusion and Hybrid Deep-Learning Ensemble Model. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:2760-2769. [PMID: 34379594 DOI: 10.1109/tcbb.2021.3103966] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Indexed: 06/13/2023]
Abstract
Identification of drug-target interaction (DTI) is the most important issue in the broad field of drug discovery. Using purely biological experiments to verify drug-target binding profiles takes a great deal of time and effort, so computational technologies for this task have clear benefits in reducing the drug search space. Most computational methods for predicting DTI treat it as a binary classification problem, which ignores the influence of binding strength. Therefore, drug-target binding affinity prediction is still a challenging issue. Currently, many studies extract only sequence information, which lacks a feature-rich representation, whereas we consider more spatial features in order to merge various data in the drug and target spaces. In this study, we propose a two-stage deep neural network ensemble model for predicting drug-target binding affinity, called DeepFusionDTA, via various information analysis modules. The first stage uses sequence and structure information to generate a fusion feature map of a candidate protein-drug pair through various deep learning-based analysis modules. The second stage applies a bagging-based ensemble learning strategy for regression prediction, and we obtain outstanding results by combining the advantages of various algorithms in efficient feature abstraction and regression calculation. Importantly, compared with the existing prediction tool DeepDTA, DeepFusionDTA delivers a 1.5 percent CI increase on the KIBA dataset and a 1.0 percent increase on the Davis dataset. Furthermore, the ideas we have offered can be applied to in-silico screening of the interaction space, to provide novel DTIs which can be experimentally pursued. The code and data are available from https://github.com/guofei-tju/DeepFusionDTA.
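The CI figures quoted above refer to the concordance index commonly used to evaluate binding affinity regression. A minimal sketch of that metric follows, assuming the usual pairwise definition (a pair counts fully when the predicted order matches the true order and counts half when the predictions tie). The affinity values are invented for illustration and are not from KIBA or Davis.

```python
from typing import Sequence


def concordance_index(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Fraction of comparable pairs whose predicted order matches the true order."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    concordant = 0.0
    comparable = 0
    n = len(y_true)
    for i in range(n):
        for j in range(n):
            # Only pairs with strictly different true affinities are comparable.
            if y_true[i] > y_true[j]:
                comparable += 1
                if y_pred[i] > y_pred[j]:
                    concordant += 1.0
                elif y_pred[i] == y_pred[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs: all true values are equal")
    return concordant / comparable


# Hypothetical measured and predicted affinities for four drug-target pairs.
measured = [5.0, 6.2, 7.1, 8.4]
predicted = [5.3, 6.0, 7.5, 7.4]
ci = concordance_index(measured, predicted)
print(round(ci, 4))
```

The double loop is quadratic in the number of pairs, which is fine for a sketch but would usually be vectorised for benchmark-sized datasets.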
53
Antony JV, Koya R, Pournami PN, Nair GG, Balakrishnan JP. Protein secondary structure assignment using residual networks. J Mol Model 2022; 28:269. [PMID: 35997827 DOI: 10.1007/s00894-022-05271-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/31/2021] [Accepted: 08/12/2022] [Indexed: 11/27/2022]
Abstract
Proteins are constructed from amino acid sequences. Their structural classifications include primary, secondary, tertiary, and quaternary, with tertiary and quaternary structures influencing protein function. Because a protein's structure is inextricably connected to its biological function, machine learning algorithms that can better anticipate the structures have the potential to lead to new scientific discoveries in human health and improve our capacity to develop new treatments. Protein secondary structure assignment enriches the structural and functional understanding of proteins. It helps in protein structure comparison and classification studies, besides facilitating secondary and tertiary structure prediction systems. Several secondary structure assignment methods have been developed since the 1980s, most of which are based on hydrogen bond analysis and atomic coordinate features. However, the assignment process becomes complex when protein data includes missing atoms. Deep neural networks are often referred to as universal function approximators because they can approximate any function to produce the desired output when properly designed and trained. Optimised deep learning architectures have already proven their ability to increase performance in a wide range of problems. Recently, the ResNet architecture has garnered significant interest due to its applicability in various areas, including image classification and protein contact map prediction. The proposed model, which is based on the ResNet architecture, assigns secondary structures using Cα atom coordinates. The model achieved an accuracy of 94% when evaluated against the benchmark and independent test sets. The findings encourage the development of new deep learning-based methods that are more generalised across various protein learning tasks. Furthermore, it allows computational biologists to delve deeper into integrating these techniques with experimental methods. 
The model code is available at: https://github.com/jisnava/ResNet_for_Structure_Assignments/
Affiliation(s)
- Jisna Vellara Antony
  - Department of Computer Science and Engineering, National Institute of Technology Calicut, Kattangal, Kerala, 673601, India
- Roosafeed Koya
  - Department of Computer Science and Engineering, National Institute of Technology Calicut, Kattangal, Kerala, 673601, India
- Gopakumar Gopalakrishnan Nair
  - Department of Computer Science and Engineering, National Institute of Technology Calicut, Kattangal, Kerala, 673601, India
54
DeepRHD: An efficient Hybrid feature Extraction technique for protein remote homology detection using Deep learning strategies. Comput Biol Chem 2022; 100:107749. [DOI: 10.1016/j.compbiolchem.2022.107749] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2021] [Revised: 07/28/2022] [Accepted: 07/30/2022] [Indexed: 11/19/2022]
55
Multi-task learning to leverage partially annotated data for PPI interface prediction. Sci Rep 2022; 12:10487. [PMID: 35729253 PMCID: PMC9213449 DOI: 10.1038/s41598-022-13951-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2022] [Accepted: 05/31/2022] [Indexed: 11/29/2022]
Abstract
Protein-protein interactions (PPI) are crucial for protein functioning; nevertheless, predicting residues in PPI interfaces from the protein sequence remains a challenging problem. In addition, structure-based functional annotations, such as the PPI interface annotations, are scarce: residue-based PPI interface annotations are available for only about one-third of all protein structures. If we want to use a deep learning strategy, we have to overcome the problem of limited data availability. Here we use a multi-task learning strategy that can handle missing data. We start with the multi-task model architecture and adapt it to carefully handle missing data in the cost function. As related learning tasks we include prediction of secondary structure, solvent accessibility, and buried residues. Our results show that the multi-task learning strategy significantly outperforms single-task approaches. Moreover, only the multi-task strategy is able to effectively learn over a dataset extended with structural feature data, without additional PPI annotations. The multi-task setup becomes even more important if the fraction of PPI annotations becomes very small: the multi-task learner trained on only one-eighth of the PPI annotations (with data extension) reaches the same performance as the single-task learner on all PPI annotations. Thus, we show that the multi-task learning strategy can be beneficial for a small training dataset where the protein’s functional properties of interest are only partially annotated.
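One common way to "handle missing data in the cost function" is to mask unannotated positions so that they contribute nothing to the loss. The sketch below illustrates that idea only; it is not the authors' implementation. It assumes every task is binary for simplicity (secondary structure would normally be multi-class), marks a missing label with `None`, and uses hypothetical task names.

```python
import math
from typing import Dict, List, Optional


def masked_bce(probs: List[float], labels: List[Optional[int]]) -> float:
    """Mean binary cross-entropy over annotated residues; None marks a missing label."""
    total, count = 0.0, 0
    for p, y in zip(probs, labels):
        if y is None:
            continue  # unannotated residue: contributes nothing to the loss
        p = min(max(p, 1e-7), 1.0 - 1e-7)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        count += 1
    return total / count if count else 0.0


def multitask_loss(task_probs: Dict[str, List[float]],
                   task_labels: Dict[str, List[Optional[int]]]) -> float:
    """Sum of per-task masked losses; a task with no labels adds zero."""
    return sum(masked_bce(task_probs[t], task_labels[t]) for t in task_probs)


# Hypothetical three-residue protein with structural labels but no PPI annotation.
probs = {"ppi_interface": [0.9, 0.2, 0.6], "buried": [0.5, 0.5, 0.5]}
labels = {"ppi_interface": [None, None, None], "buried": [1, 0, 1]}
loss = multitask_loss(probs, labels)
print(round(loss, 4))
```

Because the unannotated PPI task adds zero here, the protein can still be used to train the shared layers through its structural labels, which is the benefit the abstract describes for the extended dataset.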
56
Bokor M, Házy E, Tantos Á. Wide-Line NMR Melting Diagrams, Their Thermodynamic Interpretation, and Secondary Structure Predictions for A30P and E46K α-Synuclein. ACS Omega 2022; 7:18323-18330. [PMID: 35694516 PMCID: PMC9178613 DOI: 10.1021/acsomega.2c00477] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/24/2022] [Accepted: 05/06/2022] [Indexed: 06/15/2023]
Abstract
Parkinson's disease is thought to be caused by aggregation of the intrinsically disordered protein α-synuclein. Two amyloidogenic familial mutants, A30P and E46K, were investigated by wide-line ¹H NMR spectrometry as a completion of our earlier work on wild-type and A53T α-synuclein (Bokor M. et al. WT and A53T α-synuclein systems: melting diagram and its new interpretation. Int. J. Mol. Sci. 2020, 21, 3997). A monolayer of mobile water molecules hydrates A30P α-synuclein at the lowest potential barriers (temperatures), while E46K α-synuclein has a third as much mobile hydration there, insufficient for functionality. According to wide-line ¹H NMR results and secondary structure predictions, E46K α-synuclein is more compact than the A30P variant, and both are more compact than the wild-type (WT) and A53T variants. The linear hydration vs potential barrier sections show one slope for A30P α-synuclein and two slopes for E46K. The different slopes of the latter between potential barriers E_a,1 and E_a,2 reflect a change in water-protein interactions. The 31-32% homogeneous potential barrier distribution of the protein-water bonds indicates a non-negligible amount of secondary structure in all four α-synuclein variants. The secondary structures detected by wide-line ¹H NMR are solvent-exposed α-helices, which are predicted by secondary structure models. β-sheets are only minor components of the protein structures, as three- and eight-state predicted secondary structures are dominated by α-helices and coils.
Affiliation(s)
- Mónika Bokor
  - Institute for Solid State Physics and Optics, Wigner Research Centre for Physics, 1121 Budapest, Hungary
- Eszter Házy
  - Institute of Enzymology, Research Centre for Natural Sciences, 1117 Budapest, Hungary
- Ágnes Tantos
  - Institute of Enzymology, Research Centre for Natural Sciences, 1117 Budapest, Hungary
57
Guo Y, Wu J, Ma H, Wang S, Huang J. Deep Ensemble Learning with Atrous Spatial Pyramid Networks for Protein Secondary Structure Prediction. Biomolecules 2022; 12:biom12060774. [PMID: 35740899 PMCID: PMC9221033 DOI: 10.3390/biom12060774] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/02/2022] [Revised: 05/26/2022] [Accepted: 05/30/2022] [Indexed: 02/04/2023]
Abstract
The secondary structure of proteins is significant for studying the three-dimensional structure and functions of proteins. Several models from image understanding and natural language modeling have been successfully adapted to protein sequence studies, such as the Long Short-term Memory (LSTM) network and the Convolutional Neural Network (CNN). Recently, the Gated Convolutional Neural Network (GCNN) has been proposed for natural language processing. It has achieved high levels of sentence scoring and reduced latency. Conditionally Parameterized Convolution (CondConv) is another novel approach which has had great success in image processing. Compared with a vanilla CNN, CondConv uses extra sample-dependent modules to conditionally adjust the convolutional network. In this paper, we propose a novel Conditionally Parameterized Convolutional network (CondGCNN) which utilizes the power of both CondConv and GCNN. CondGCNN leverages an ensemble encoder to combine the capabilities of both LSTM and CondGCNN to encode protein sequences by better capturing protein sequential features. In addition, we explore the similarity between the secondary structure prediction problem and the image segmentation problem, and propose an ASP network (Atrous Spatial Pyramid Pooling (ASPP) based network) to capture fine boundary details in secondary structure. Extensive experiments show that the proposed method can achieve higher performance on the protein secondary structure prediction task than existing methods on the CB513, CASP11, CASP12, CASP13, and CASP14 datasets. We also conducted ablation studies over each component to verify its effectiveness. Our method is expected to be useful for any protein-related prediction task, not only protein secondary structure prediction.
Affiliation(s)
- Yuzhi Guo
  - Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, TX 76019, USA
- Hehuan Ma
  - Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, TX 76019, USA
- Sheng Wang
  - Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, TX 76019, USA
- Junzhou Huang
  - Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, TX 76019, USA
  - Corresponding author
58
Zhang X, Liu Y, Wang Y, Zhang L, Feng L, Jin B, Zhang H. Multistage Combination Classifier Augmented Model for Protein Secondary Structure Prediction. Front Genet 2022; 13:769828. [PMID: 35677562 PMCID: PMC9170271 DOI: 10.3389/fgene.2022.769828] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2021] [Accepted: 01/25/2022] [Indexed: 11/13/2022]
Abstract
In the field of bioinformatics, understanding protein secondary structure is very important for exploring diseases and finding new treatments. Considering that the physical experiment-based protein secondary structure prediction methods are time-consuming and expensive, some pattern recognition and machine learning methods are proposed. However, most of the methods achieve quite similar performance, which seems to reach a model capacity bottleneck. As both model design and learning process can affect the model learning capacity, we pay attention to the latter part. To this end, a framework called Multistage Combination Classifier Augmented Model (MCCM) is proposed to solve the protein secondary structure prediction task. Specifically, first, a feature extraction module is introduced to extract features with different levels of learning difficulties. Second, multistage combination classifiers are proposed to learn decision boundaries for easy and hard samples, respectively, with the latter penalizing the loss value of the hard samples and finally improving the prediction performance of hard samples. Third, based on the Dirichlet distribution and information entropy measurement, a sample difficulty discrimination module is designed to assign samples with different learning difficulty levels to the aforementioned classifiers. The experimental results on the publicly available benchmark CB513 dataset show that our method outperforms most state-of-the-art models.
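The idea of routing samples by difficulty with an information entropy measure can be sketched as follows. This illustrates only the entropy part; the Dirichlet-based component of the MCCM module is not reproduced, and the threshold value, function names, and three-state probabilities are assumptions made for the example.

```python
import math
from typing import Sequence


def shannon_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def route_sample(probs: Sequence[float], threshold: float = 0.5) -> str:
    """Send confident (low-entropy) residues to the easy classifier, the rest to the hard one."""
    return "hard" if shannon_entropy(probs) > threshold else "easy"


# Hypothetical three-state (helix, strand, coil) predictions for two residues.
confident = [0.98, 0.01, 0.01]
uncertain = [1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0]
print(route_sample(confident), route_sample(uncertain))
```

For three classes the entropy ranges from 0 (one class certain) to ln 3 (uniform), so any threshold has to be chosen within that range, typically on validation data.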
Affiliation(s)
- Xu Zhang
  - College of Mechanical Engineering, Dalian University of Technology, Dalian, China
- Yiwei Liu
  - School of Innovation and Entrepreneurship, Dalian University of Technology, Dalian, China
- Yaming Wang
  - The First Affiliated Hospital, Dalian Medical University, Dalian, China
- Liang Zhang
  - International Business School, Dongbei University of Finance and Economics, Dalian, China
- Lin Feng
  - School of Innovation and Entrepreneurship, Dalian University of Technology, Dalian, China
- Bo Jin
  - School of Innovation and Entrepreneurship, Dalian University of Technology, Dalian, China
- Hongzhe Zhang
  - College of Mechanical Engineering, Dalian University of Technology, Dalian, China
- *Correspondence: Bo Jin
59
DeepMHADTA: Prediction of Drug-Target Binding Affinity Using Multi-Head Self-Attention and Convolutional Neural Network. Curr Issues Mol Biol 2022; 44:2287-2299. [PMID: 35678684 PMCID: PMC9164023 DOI: 10.3390/cimb44050155] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2022] [Revised: 05/08/2022] [Accepted: 05/14/2022] [Indexed: 11/17/2022]
Abstract
Drug-target interactions provide insight into drug side effects and drug repositioning. However, wet-lab biochemical experiments are time-consuming and labor-intensive, and are insufficient to meet the pressing demand for drug research and development. With the rapid advancement of deep learning, computational methods are increasingly applied to screen drug-target interactions. Many methods treat this problem as a binary classification task (binding or not) but ignore the quantitative binding affinity. In this paper, we propose a new end-to-end deep learning method called DeepMHADTA, which uses the multi-head self-attention mechanism in a deep residual network to predict drug-target binding affinity. On two benchmark datasets, our method outperformed several current state-of-the-art methods in terms of multiple performance measures, including mean square error (MSE), concordance index (CI), r_m^2, and area under the precision-recall curve (AUPR). The results demonstrated that our method achieved better performance in predicting drug-target binding affinity.
60
Structural Insights into the Intrinsically Disordered GPCR C-Terminal Region, Major Actor in Arrestin-GPCR Interaction. Biomolecules 2022; 12:biom12050617. [PMID: 35625550 PMCID: PMC9138321 DOI: 10.3390/biom12050617] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/11/2022] [Revised: 04/12/2022] [Accepted: 04/19/2022] [Indexed: 02/04/2023]
Abstract
Arrestin-dependent pathways are a central component of G protein-coupled receptor (GPCR) signaling. However, the molecular processes regulating arrestin binding remain to be further elucidated, in particular with regard to the structural impact of GPCR C-terminal disordered regions. Here, we used an integrated biophysical strategy to describe the basal conformations of the C-terminal domains of three class A GPCRs: the vasopressin V2 receptor (V2R), the growth hormone secretagogue or ghrelin receptor type 1a (GHSR), and the β2-adrenergic receptor (β2AR). By doing so, we revealed the presence of transient secondary structures in these regions that are potentially involved in the interaction with arrestin. These secondary structure elements differ from those described in the literature in interaction with arrestin. This suggests a mechanism where the secondary structure conformational preferences in the C-terminal regions of GPCRs could be a central feature for optimizing arrestin recognition.
61
Erath J, Djuranovic S. Association of the receptor for activated C-kinase 1 with ribosomes in Plasmodium falciparum. J Biol Chem 2022; 298:101954. [PMID: 35452681 PMCID: PMC9120242 DOI: 10.1016/j.jbc.2022.101954] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2021] [Revised: 03/31/2022] [Accepted: 04/13/2022] [Indexed: 11/18/2022]
Abstract
The receptor for activated C-kinase 1 (RACK1), a highly conserved eukaryotic protein, is known to have many varying biological roles and functions. Previous work has established RACK1 as a ribosomal protein, with defined regions important for ribosome binding in eukaryotic cells. In Plasmodium falciparum, RACK1 has been shown to be required for parasite growth; however, conflicting evidence has been presented about RACK1 ribosome binding and its role in mRNA translation. Given the importance of RACK1 as a regulatory component of mRNA translation and ribosome quality control, the case could be made in parasites that RACK1 either binds or does not bind the ribosome. Here, we used bioinformatics and transcription analyses to further characterize the P. falciparum RACK1 protein. Based on homology modeling and structural analyses, we generated a model of P. falciparum RACK1. We then explored the binding properties of mutant and chimeric human and P. falciparum RACK1 proteins to the human and P. falciparum ribosomes. We found that WT, chimeric, and mutant RACK1 exhibit distinct ribosome interactions, suggesting different binding characteristics for P. falciparum and human RACK1 proteins. The ribosomal binding of RACK1 variants in human and parasite cells shown here demonstrates that although RACK1 proteins have highly conserved sequences and structures across species, ribosomal binding is affected by species-specific alterations to this protein. In conclusion, we show that in the case of P. falciparum, contrary to the structural data, RACK1 is found to bind ribosomes and actively translating polysomes in parasite cells.
Affiliation(s)
- Jessey Erath
  - Department of Cell Biology and Physiology, Washington University School of Medicine, St Louis, Missouri, USA
- Sergej Djuranovic
  - Department of Cell Biology and Physiology, Washington University School of Medicine, St Louis, Missouri, USA
62
Yang W, Liu Y, Xiao C. Deep metric learning for accurate protein secondary structure prediction. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.108356] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
63
Feng SH, Xia CQ, Zhang PD, Shen HB. Ab-Initio Membrane Protein Amphipathic Helix Structure Prediction Using Deep Neural Networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:795-805. [PMID: 33026978 DOI: 10.1109/tcbb.2020.3029274] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/11/2023]
Abstract
The amphipathic helix (AH) features the segregation of polar and nonpolar residues and plays important roles in many membrane-associated biological processes by interacting with both the lipid and the soluble phases. Although the AH structure has long been known, few ab initio machine learning-based prediction models have been reported, owing to the limited amount of training data. In this study, we report a new deep learning-based prediction model, which is composed of a residual neural network and the uneven-thresholds decision algorithm. It is constructed on 121 membrane proteins, totaling 51640 residue samples, curated from an up-to-date membrane protein structure database. Through a rigorous 10-fold nested cross-validation experiment, we demonstrate that our model can achieve promising predictions and exceed current state-of-the-art approaches in this field. This presents a new avenue for accurately predicting AHs. Analysis of the contribution of the input residues and of some cases further reveals the high interpretability and the generalization of our model.
64
Alencar WLM, da Silva Arouche T, Neto AFG, de Castro Ramalho T, de Carvalho Júnior RN, de Jesus Chaves Neto AM. Interactions of Co, Cu, and non-metal phthalocyanines with external structures of SARS-CoV-2 using docking and molecular dynamics. Sci Rep 2022; 12:3316. [PMID: 35228662 PMCID: PMC8885651 DOI: 10.1038/s41598-022-07396-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/30/2021] [Accepted: 02/10/2022] [Indexed: 02/06/2023]
Abstract
The new coronavirus, SARS-CoV-2, caused the COVID-19 pandemic, characterized by its high rate of contamination, propagation capacity, and lethality rate. In this work, we examine the use of phthalocyanines (Pc) as inhibitors of SARS-CoV-2. Phthalocyanines of cobalt (CoPc) and copper (CuPc) and without a metal group (NoPc) can interact with SARS-CoV-2, showing potential to be used for filtering by adsorption in paints on walls, masks, clothes, and air conditioning filters. Molecular docking and molecular dynamics were used, targeting the external structures of the virus, specifically the envelope protein, main protease, and Spike glycoprotein. Studies with the g_MM-GBSA module and molecular docking show that the ligands have interaction characteristics capable of adsorbing the structures. Molecular dynamics gave root-mean-square deviations of the atomic positions between 1 and 2.5. The generalized Born implicit solvation model, Gibbs free energy, and solvent accessible surface area approach were used. Among the results obtained through molecular dynamics, it was noticed that interactions occur, since Pc could bind to residues of the active site of the macromolecules, demonstrating good interactions, in particular with CoPc. Molecular couplings and free energy showed that S-gly active site residues interacted strongly with phthalocyanines, with values of -182.443 kJ/mol (CoPc), 158.954 kJ/mol (CuPc), and -129.963 kJ/mol (NoPc). The interactions of Pcs with SARS-CoV-2 may predict some promising candidates for antagonists to the virus, which, if confirmed through experimental approaches, may contribute to resolving the global crisis of the COVID-19 pandemic.
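For readers unfamiliar with the root-mean-square deviation reported above, the quantity for N matched atoms is the square root of the mean squared distance between corresponding positions. The sketch below assumes the two coordinate sets are already superposed and share the same atom order; no alignment step is performed, the coordinates are invented, and the abstract does not state the units of its values.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(ref: Sequence[Coord], mobile: Sequence[Coord]) -> float:
    """Root-mean-square deviation between two pre-aligned sets of atomic positions."""
    if len(ref) != len(mobile) or not ref:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared = sum(
        (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
        for (x1, y1, z1), (x2, y2, z2) in zip(ref, mobile)
    )
    return math.sqrt(squared / len(ref))


# Hypothetical two-atom example: one atom displaced by 2 units along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
snapshot = [(0.0, 0.0, 2.0), (1.0, 0.0, 0.0)]
deviation = rmsd(reference, snapshot)
print(round(deviation, 4))
```

In a trajectory analysis each frame would normally be superposed on the reference structure before this calculation, which the sketch leaves out.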
Affiliation(s)
- Wilson Luna Machado Alencar
  - Laboratory of Preparation and Computation of Nanomaterials (LPCN), Federal University of Pará, C. P. 479, Belém, PA, 66075-110, Brazil
  - Pos-Graduation Program in Engineering of Natural Resources of the Amazon, ITEC, Federal University of Pará, C. P. 2626, Belém, PA, 66050-540, Brazil
  - Federal Institute of Pará (IFPA), C. P. BR 316, Km 61, Castanhal, PA, 68740-970, Brazil
- Tiago da Silva Arouche
  - Laboratory of Preparation and Computation of Nanomaterials (LPCN), Federal University of Pará, C. P. 479, Belém, PA, 66075-110, Brazil
- Raul Nunes de Carvalho Júnior
  - Pos-Graduation Program in Engineering of Natural Resources of the Amazon, ITEC, Federal University of Pará, C. P. 2626, Belém, PA, 66050-540, Brazil
  - Pos-Graduation Program in Chemical Engineering, ITEC, Federal University of Pará, C. P. 479, Belém, PA, 66075-900, Brazil
- Antonio Maia de Jesus Chaves Neto
  - Laboratory of Preparation and Computation of Nanomaterials (LPCN), Federal University of Pará, C. P. 479, Belém, PA, 66075-110, Brazil
  - Pos-Graduation Program in Engineering of Natural Resources of the Amazon, ITEC, Federal University of Pará, C. P. 2626, Belém, PA, 66050-540, Brazil
  - Pos-Graduation Program in Chemical Engineering, ITEC, Federal University of Pará, C. P. 479, Belém, PA, 66075-900, Brazil
  - National Professional Master's in Physics Teaching, Federal University of Pará, C. P. 479, Belém, PA, 66075-110, Brazil
65
Wang P, Zheng S, Jiang Y, Li C, Liu J, Wen C, Patronov A, Qian D, Chen H, Yang Y. Structure-Aware Multimodal Deep Learning for Drug-Protein Interaction Prediction. J Chem Inf Model 2022; 62:1308-1317. [PMID: 35200015 DOI: 10.1021/acs.jcim.2c00060] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Indexed: 12/16/2022]
Abstract
Identifying drug-protein interactions (DPIs) is crucial in drug discovery, and a number of machine learning methods have been developed to predict DPIs. Existing methods usually use unrealistic data sets with hidden bias, which limits the accuracy of virtual screening methods. Meanwhile, most DPI prediction methods pay more attention to molecular representation but lack effective research on protein representation and high-level associations between different instances. To this end, we present a novel structure-aware multimodal deep DPI prediction model, STAMP-DPI, which was trained on a curated industry-scale benchmark data set. We built a high-quality benchmark data set named GalaxyDB for DPI prediction. This industry-scale data set, along with an unbiased training procedure, resulted in a more robust benchmark study. For informative protein representation, we constructed a structure-aware graph neural network method from the protein sequence by combining predicted contact maps and graph neural networks. Through further integration of structure-based representation and high-level pretrained embeddings for molecules and proteins, our model effectively captures the feature representation of the interactions between them. As a result, STAMP-DPI outperformed state-of-the-art DPI prediction methods, reducing the mean square error (MSE) by 7.00% on the Davis data set and improving the area under the curve (AUC) by 8.89% on the GalaxyDB data set. Moreover, our model is interpretable through its transformer-based interaction mechanism, which can accurately reveal the binding sites between molecules and proteins.
Affiliation(s)
- Penglei Wang
  - School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
- Shuangjia Zheng
  - School of Data and Computer Science, Sun Yat-Sen University, Guangzhou 510275, China
- Chang Wen
  - Guangzhou Laboratory, Guangzhou 510000, China
- Atanas Patronov
  - MolecularAI, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Gothenburg 405 30, Sweden
- Dahong Qian
  - School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
- Yuedong Yang
  - School of Data and Computer Science, Sun Yat-Sen University, Guangzhou 510275, China
|
66
|
Maljković MM, Mitić NS, de Brevern AG. Prediction of structural alphabet protein blocks using data mining. Biochimie 2022; 197:74-85. [PMID: 35143919 DOI: 10.1016/j.biochi.2022.01.019] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/20/2021] [Revised: 01/22/2022] [Accepted: 01/31/2022] [Indexed: 11/17/2022]
Abstract
3D protein structures determine proteins' biological functions. The 3D structure of the protein backbone can be approximated using the prototypes of local protein conformations. Sets of these prototypes are called structural alphabets (SAs). Amongst several approaches to the prediction of 3D structures from amino acid sequences, one approach is based on the prediction of SA prototypes for a given amino acid sequence. Protein Blocks (PBs) is the best-known SA, and it is composed of 16 prototypes of five consecutive amino acids which were identified as optimal prototypes considering the ability to correctly approximate the local structure and the prediction accuracy of prototypes from an amino acid sequence. We developed models for PBs prediction from sequence information using different data mining approaches and machine learning algorithms. Besides the amino acid sequences, the results of the following tools were used to train the models: the Spider3 predictor of protein structure properties, several predictors of the protein's intrinsically disordered regions, and a tool for finding repeats in amino acid sequences. The highest accuracy of the constructed models is 80%, which is a significant improvement compared to the previous best available prediction, whose accuracy was 61%. Analyzing the models constructed by applying different algorithms, it was noticed that the significance of input attributes differs among the models constructed by different algorithms. Using the information about amino acids belonging to intrinsically disordered regions and repeats improves the precision of prediction for some PBs using the CART classification algorithm, while this is not the case with the C5.0 classification algorithm. Improved prediction approaches can have interesting applications in protein structural model approaches or computational protein design.
Affiliation(s)
- Mirjana M Maljković
  - Faculty of Mathematics, University of Belgrade, Studentski Trg 16, 11000, Belgrade, Serbia
- Nenad S Mitić
  - Faculty of Mathematics, University of Belgrade, Studentski Trg 16, 11000, Belgrade, Serbia
- Alexandre G de Brevern
  - Université de Paris, INSERM UMR_S 1134, DSIMB, Université de la Réunion, INTS6, Rue Alexandre Cabanel, 75015, Paris, France
|
67
|
Protein secondary structure prediction using a lightweight convolutional network and label distribution aware margin loss. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2021.107771] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/14/2023]
|
68
|
Bhattacharya S, Roche R, Moussad B, Bhattacharya D. DisCovER: distance- and orientation-based covariational threading for weakly homologous proteins. Proteins 2022; 90:579-588. [PMID: 34599831 PMCID: PMC8738102 DOI: 10.1002/prot.26254] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/24/2021] [Revised: 09/22/2021] [Accepted: 09/28/2021] [Indexed: 02/03/2023]
Abstract
Threading a query protein sequence onto a library of weakly homologous structural templates remains challenging, even when sequence-based predicted contact or distance information is used. Contact-assisted or distance-assisted threading methods utilize only the spatial proximity of the interacting residue pairs for template selection and alignment, ignoring their orientation. Moreover, existing threading methods fail to consider the neighborhood effect induced by the query-template alignment. We present a new distance- and orientation-based covariational threading method called DisCovER by effectively integrating information from inter-residue distance and orientation along with the topological network neighborhood of a query-template alignment. Our method first selects a subset of templates using standard profile-based threading coupled with topological network similarity terms to account for the neighborhood effect and subsequently performs distance- and orientation-based query-template alignment using an iterative double dynamic programming framework. Multiple large-scale benchmarking results on query proteins classified as weakly homologous from the continuous automated model evaluation experiment and from the current literature show that our method outperforms several existing state-of-the-art threading approaches, and that the integration of the neighborhood effect with the inter-residue distance and orientation information synergistically contributes to the improved performance of DisCovER. DisCovER is freely available at https://github.com/Bhattacharya-Lab/DisCovER.
Affiliation(s)
- Sutanu Bhattacharya
  - Department of Computer Science, Florida Polytechnic University, Lakeland, FL 33805, USA
- Rahmatullah Roche
  - Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
- Bernard Moussad
  - Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
|
69
|
Mahbub S, Bayzid MS. EGRET: edge aggregated graph attention networks and transfer learning improve protein-protein interaction site prediction. Brief Bioinform 2022; 23:6518045. [PMID: 35106547 DOI: 10.1093/bib/bbab578] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/13/2021] [Revised: 11/25/2021] [Accepted: 12/16/2021] [Indexed: 12/18/2022] Open
Abstract
MOTIVATION Protein-protein interactions (PPIs) are central to most biological processes. However, reliable identification of PPI sites using conventional experimental methods is slow and expensive. Therefore, great efforts are being put into computational methods to identify PPI sites. RESULTS We present Edge Aggregated GRaph Attention NETwork (EGRET), a highly accurate deep learning-based method for PPI site prediction, where we have used an edge aggregated graph attention network to effectively leverage the structural information. We, for the first time, have used transfer learning in PPI site prediction. Our proposed edge aggregated network, together with transfer learning, has achieved notable improvement over the best alternate methods. Furthermore, we systematically investigated EGRET's network behavior to provide insights about the causes of its decisions. AVAILABILITY EGRET is freely available as an open source project at https://github.com/Sazan-Mahbub/EGRET. CONTACT shams_bayzid@cse.buet.ac.bd.
Affiliation(s)
- Sazan Mahbub
  - Department of Computer Science, University of Maryland, College Park, Maryland 20742, USA
- Md Shamsuzzoha Bayzid
  - Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka-1205, Bangladesh
|
70
|
A two-step ensemble learning for predicting protein hot spot residues from whole protein sequence. Amino Acids 2022; 54:765-776. [DOI: 10.1007/s00726-022-03129-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/09/2020] [Accepted: 01/17/2022] [Indexed: 11/26/2022]
|
71
|
Wei J, Chen S, Zong L, Gao X, Li Y. Protein-RNA interaction prediction with deep learning: structure matters. Brief Bioinform 2022; 23:bbab540. [PMID: 34929730 PMCID: PMC8790951 DOI: 10.1093/bib/bbab540] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/29/2021] [Revised: 11/14/2021] [Accepted: 11/22/2021] [Indexed: 12/11/2022] Open
Abstract
Protein-RNA interactions are of vital importance to a variety of cellular activities. Both experimental and computational techniques have been developed to study the interactions. Because of the limitation of the previous database, especially the lack of protein structure data, most of the existing computational methods rely heavily on the sequence data, with only a small portion of the methods utilizing the structural information. Recently, AlphaFold has revolutionized the entire protein and biology field. Foreseeably, the protein-RNA interaction prediction will also be promoted significantly in the upcoming years. In this work, we give a thorough review of this field, surveying both the binding site and binding preference prediction problems and covering the commonly used datasets, features and models. We also point out the potential challenges and opportunities in this field. This survey summarizes the development of the RNA-binding protein-RNA interaction field in the past and foresees its future development in the post-AlphaFold era.
Affiliation(s)
- Junkang Wei
  - Department of Computer Science and Engineering (CSE), The Chinese University of Hong Kong (CUHK), 999077, Hong Kong SAR, China
- Siyuan Chen
  - Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), 23955-6900, Thuwal, Saudi Arabia
- Licheng Zong
  - Department of Computer Science and Engineering (CSE), The Chinese University of Hong Kong (CUHK), 999077, Hong Kong SAR, China
- Xin Gao
  - Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), 23955-6900, Thuwal, Saudi Arabia
- Yu Li
  - Department of Computer Science and Engineering (CSE), The Chinese University of Hong Kong (CUHK), 999077, Hong Kong SAR, China
  - The CUHK Shenzhen Research Institute, Hi-Tech Park, 518057, Shenzhen, China
|
72
|
Abstract
Protein secondary structure prediction is an important topic in bioinformatics. This paper proposed a novel model named WS-BiLSTM, which combined the wavelet scattering convolutional network and the long-short-term memory network for the first time to predict protein secondary structure. This model captures nonlocal interactions between amino acid sequences and remembers long-range interactions between amino acids. In our WS-BiLSTM model, the wavelet scattering convolutional network is used to extract protein features from the PSSM sliding window; the extracted features are combined with the original PSSM data as the input features of the long-short-term memory network to predict protein secondary structure. It is worth noting that the wavelet scattering convolutional network is asymmetric as a member of the continuous wavelet family. The Q3 accuracy on the test sets CASP9, CASP10, CASP11, CASP12, CB513, and PDB25 reached 85.26%, 85.84%, 84.91%, 85.13%, 86.10%, and 85.52%, which were 2.15%, 2.16%, 3.5%, 3.19%, 4.22%, and 2.75% higher, respectively, than using the long-short-term memory network alone. Comparing our results with the state-of-the-art methods shows that our proposed model achieved better results on the CB513 and CASP12 data sets. The experimental results show that the features extracted from the wavelet scattering convolutional network can effectively improve the accuracy of protein secondary structure prediction.
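The abstract refers to extracting features from a PSSM sliding window. The sketch below shows one common way such windows can be formed: one window of PSSM rows centred on each residue, with zero rows as padding at the termini. The function name `sliding_windows`, the window size, and the zero padding are assumptions for illustration and are not taken from the WS-BiLSTM paper.

```python
def sliding_windows(pssm, window=13):
    """Return one window of PSSM rows per residue, centred on that residue.

    pssm is a list of rows (one row of scores per residue). Positions that
    fall outside the sequence are filled with zero rows (an assumed padding
    scheme). The window size must be odd so that it has a centre.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    half = window // 2
    width = len(pssm[0]) if pssm else 0

    def zero_rows(count):
        # Fresh lists each time so padding rows are never shared objects.
        return [[0.0] * width for _ in range(count)]

    padded = zero_rows(half) + [list(row) for row in pssm] + zero_rows(half)
    return [padded[i:i + window] for i in range(len(pssm))]


# Toy PSSM with 3 residues and 2 columns (real PSSMs have 20 columns).
toy_pssm = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
windows = sliding_windows(toy_pssm, window=3)
```

Each window can then be passed to a feature extractor, while the central row remains available as the raw PSSM input for the same residue.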
|
73
|
Newton MAH, Mataeimoghadam F, Zaman R, Sattar A. Secondary structure specific simpler prediction models for protein backbone angles. BMC Bioinformatics 2022; 23:6. [PMID: 34983370 PMCID: PMC8728911 DOI: 10.1186/s12859-021-04525-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/24/2021] [Accepted: 12/07/2021] [Indexed: 11/10/2022] Open
Abstract
Motivation Protein backbone angle prediction has achieved significant accuracy improvement with the development of deep learning methods. Usually the same deep learning model is used in making prediction for all residues regardless of the categories of secondary structures they belong to. In this paper, we propose to train separate deep learning models for each category of secondary structures. Machine learning methods strive to achieve generality over the training examples and consequently lose accuracy. In this work, we explicitly exploit classification knowledge to restrict generalisation within the specific class of training examples. This is to compensate for the loss of generalisation by exploiting specialisation knowledge in an informed way. Results The new method named SAP4SS obtains mean absolute error (MAE) values of 15.59, 18.87, 6.03, and 21.71 respectively for four types of backbone angles ϕ, ψ, θ, and τ. Consequently, SAP4SS significantly outperforms existing state-of-the-art methods SAP, OPUS-TASS, and SPOT-1D: the differences in MAE for all four types of angles are from 1.5 to 4.1% compared to the best known results. Availability SAP4SS along with its data is available from https://gitlab.com/mahnewton/sap4ss.
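The abstract reports mean absolute error (MAE) for backbone angles. Because dihedral angles are periodic, a prediction of 179° against a true value of -179° is only 2° off, so MAE for such angles is commonly computed on the wrapped difference. The sketch below assumes that convention and angles in degrees; whether SAP4SS computes its MAE in exactly this way is an assumption, and `angular_mae` is a name invented for this example.

```python
def angular_mae(predicted, actual):
    """Mean absolute error between two lists of angles in degrees.

    Each difference is wrapped so that it never exceeds 180 degrees,
    which treats 179 and -179 as 2 degrees apart rather than 358.
    """
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for p, a in zip(predicted, actual):
        diff = abs(p - a) % 360.0
        total += min(diff, 360.0 - diff)  # take the shorter way round the circle
    return total / len(predicted)


# Two toy residues: wrapped errors are 2 and 20 degrees.
mae = angular_mae([179.0, -170.0], [-179.0, 170.0])
```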
Affiliation(s)
- M A Hakim Newton
  - School of Information and Communication Technology, Griffith University, Brisbane, Australia
  - Institute of Integrated and Intelligent Systems, Griffith University, Brisbane, Australia
- Rianon Zaman
  - School of Information and Communication Technology, Griffith University, Brisbane, Australia
- Abdul Sattar
  - School of Information and Communication Technology, Griffith University, Brisbane, Australia
  - Institute of Integrated and Intelligent Systems, Griffith University, Brisbane, Australia
|
74
|
Greener JG, Kandathil SM, Moffat L, Jones DT. A guide to machine learning for biologists. Nat Rev Mol Cell Biol 2022; 23:40-55. [PMID: 34518686 DOI: 10.1038/s41580-021-00407-0] [Citation(s) in RCA: 790] [Impact Index Per Article: 263.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 07/23/2021] [Indexed: 02/08/2023]
Abstract
The expanding scale and inherent complexity of biological data have encouraged a growing use of machine learning in biology to build informative and predictive models of the underlying biological processes. All machine learning techniques fit models to data; however, the specific methods are quite varied and can at first glance seem bewildering. In this Review, we aim to provide readers with a gentle introduction to a few key machine learning techniques, including the most recently developed and widely used techniques involving deep neural networks. We describe how different techniques may be suited to specific types of biological data, and also discuss some best practices and points to consider when one is embarking on experiments involving machine learning. Some emerging directions in machine learning methodology are also discussed.
Affiliation(s)
- Joe G Greener
  - Department of Computer Science, University College London, London, UK
- Shaun M Kandathil
  - Department of Computer Science, University College London, London, UK
- Lewis Moffat
  - Department of Computer Science, University College London, London, UK
- David T Jones
  - Department of Computer Science, University College London, London, UK
|
75
|
Taherzadeh G, Campbell M, Zhou Y. Computational Prediction of N- and O-Linked Glycosylation Sites for Human and Mouse Proteins. Methods Mol Biol 2022; 2499:177-186. [PMID: 35696081 DOI: 10.1007/978-1-0716-2317-6_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/15/2023]
Abstract
Protein glycosylation is one of the most complex posttranslational modifications (PTMs) that play a fundamental role in protein function. Identification and annotation of these sites using experimental approaches are challenging and time-consuming. Hence, there is a demand to build fast and efficient computational methods to address this problem. Here, we present the SPRINT-Gly framework containing the largest dataset and a prediction model of glycosylation sites for a given protein sequence. In this framework, we construct a large dataset containing N- and O-linked glycosylation sites of human and mouse proteins, collected from different sources. We then introduce the SPRINT-Gly method to predict putative N- and O-linked sites. SPRINT-Gly is a machine learning-based approach consisting of a number of trained predictive models for glycosylation sites in both human and mouse proteins, separately. The method is built by incorporating sequence-based, predicted structural, and physicochemical information of the neighboring residues of each N- and O-linked glycosylation site and by training deep learning neural network and support vector machine as classifiers. SPRINT-Gly outperformed other existing methods by achieving 18% and 50% higher Matthews correlation coefficient for N- and O-linked glycosylation site prediction, respectively. SPRINT-Gly is publicly available as an online and stand-alone predictor at https://sparks-lab.org/server/sprint-gly/.
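The comparison above is stated in terms of the Matthews correlation coefficient (MCC). For reference, the sketch below computes MCC from the four confusion-matrix counts using the standard formula. The function name `matthews_cc` and the convention of returning 0.0 when the denominator is zero are choices made for this example.

```python
import math


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    Returns 0.0 when the denominator is zero (an assumed convention).
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


perfect = matthews_cc(tp=50, tn=50, fp=0, fn=0)
inverted = matthews_cc(tp=0, tn=0, fp=50, fn=50)
chance = matthews_cc(tp=25, tn=25, fp=25, fn=25)
```

MCC ranges from -1 to 1 and stays informative on imbalanced data, which is one reason it is widely reported for site-prediction tasks where negatives greatly outnumber positives.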
Affiliation(s)
- Ghazaleh Taherzadeh
  - Department of Mathematics and Computer Science, Wilkes University, Wilkes-Barre, PA, USA
- Matthew Campbell
  - Institute for Glycomics, Griffith University, Southport, QLD, Australia
- Yaoqi Zhou
  - Institute for Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen, China
|
76
|
Dehzangi I, Sharma A, Shatabda S. iProtGly-SS: A Tool to Accurately Predict Protein Glycation Site Using Structural-Based Features. Methods Mol Biol 2022; 2499:125-134. [PMID: 35696077 DOI: 10.1007/978-1-0716-2317-6_5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/15/2023]
Abstract
Posttranslational modification (PTM) is an important biological mechanism to promote functional diversity among the proteins. So far, a wide range of PTMs has been identified. Among them, glycation is considered one of the most important PTMs. Glycation is associated with different neurological disorders including Parkinson's and Alzheimer's diseases. It is also shown to be responsible for different diseases, including vascular complications of diabetes mellitus. Despite all the efforts made so far, the prediction performance of glycation sites using computational methods remains limited. Here we present a newly developed machine learning tool called iProtGly-SS that utilizes sequential and structural information as well as a Support Vector Machine (SVM) classifier to enhance lysine glycation site prediction accuracy. The performance of iProtGly-SS was investigated using the three most popular benchmarks used for this task. Our results demonstrate that iProtGly-SS is able to achieve 81.61%, 93.62%, and 92.95% prediction accuracies on these benchmarks, which are significantly better than those results reported in the previous studies. iProtGly-SS is implemented as a web-based tool which is publicly available at http://brl.uiu.ac.bd/iprotgly-ss/.
Affiliation(s)
- Iman Dehzangi
  - Department of Computer Science, Rutgers University, Camden, NJ, USA
  - Center for Computational and Integrative Biology, Rutgers University, Camden, NJ, USA
- Alok Sharma
  - Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, QLD, Australia
  - Department of Medical Science Mathematics, Tokyo Medical and Dental University (TMDU), Tokyo, Japan
  - Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan
- Swakkhar Shatabda
  - Department of Computer Science and Engineering, United International University, Dhaka, Bangladesh
|
77
|
Liang S, Li Z, Zhan J, Zhou Y. De novo protein design by an energy function based on series expansion in distance and orientation dependence. Bioinformatics 2021; 38:86-93. [PMID: 34406339 DOI: 10.1093/bioinformatics/btab598] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/05/2021] [Revised: 08/11/2021] [Accepted: 08/16/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION Despite many successes, de novo protein design is not yet a solved problem as its success rate remains low. The low success rate is largely because we do not yet have an accurate energy function for describing the solvent-mediated interaction between amino acid residues in a protein chain. Previous studies showed that an energy function based on series expansions with its parameters optimized for side-chain and loop conformations can lead to one of the most accurate methods for side chain (OSCAR) and loop prediction (LEAP). Following the same strategy, we developed an energy function based on series expansions with the parameters optimized in four separate stages (recovering single-residue types without and with orientation dependence, selecting loop decoys and maintaining the composition of amino acids). We tested the energy function for de novo design by using Monte Carlo simulated annealing. RESULTS The method for protein design (OSCAR-Design) is found to be as accurate as OSCAR and LEAP for side-chain and loop prediction, respectively. In de novo design, it can recover native residue types ranging from 38% to 43% depending on test sets, conserve hydrophobic/hydrophilic residues at ∼75%, and yield the overall similarity in amino acid compositions at more than 90%. These performance measures are all statistically significantly better than several protein design programs compared. Moreover, the largest hydrophobic patch areas in designed proteins are near or smaller than those in native proteins. Thus, an energy function based on series expansion can be made useful for protein design. AVAILABILITY AND IMPLEMENTATION The Linux executable version is freely available for academic users at http://zhouyq-lab.szbl.ac.cn/resources/.
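The abstract states that the energy function was tested for de novo design with Monte Carlo simulated annealing. The sketch below shows the general shape of such a search over sequences: propose a single-residue mutation, accept it by the Metropolis criterion, and lower the temperature over time. It is not OSCAR-Design; the geometric cooling schedule, the step count, and `toy_energy` (which simply counts mismatches to a made-up target) are all assumptions standing in for the real energy function.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def anneal(sequence, energy, steps=2000, t_start=2.0, t_end=0.05, seed=0):
    """Minimise energy(sequence) by simulated annealing over point mutations."""
    rng = random.Random(seed)
    current = list(sequence)
    current_e = energy(current)
    best, best_e = current[:], current_e
    for step in range(steps):
        # Geometric cooling from t_start down to t_end (assumed schedule).
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        pos = rng.randrange(len(current))
        old = current[pos]
        current[pos] = rng.choice(AMINO_ACIDS)
        delta = energy(current) - current_e
        # Metropolis criterion: always accept downhill, sometimes uphill.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current_e += delta
            if current_e < best_e:
                best, best_e = current[:], current_e
        else:
            current[pos] = old  # reject: undo the mutation
    return "".join(best), best_e


def toy_energy(seq):
    """Hypothetical energy: number of mismatches to an arbitrary target."""
    target = "ACDEFG"
    return sum(a != b for a, b in zip(seq, target))


start = "GGGGGG"
designed, designed_e = anneal(start, toy_energy)
```

In a real design setting the energy callable would score a sequence threaded onto a fixed backbone, which is far more expensive than this toy function.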
Affiliation(s)
- Shide Liang
  - Department of R & D, Bio-Thera Solutions, Guangzhou 510530, China
- Zhixiu Li
  - Institute of Health and Biomedical Innovation, Queensland University of Technology at Translational Research Institute, Woolloongabba, QLD 3001, Australia
- Jian Zhan
  - Institute for Glycomics and School of Information and Communication Technology, Griffith University, Gold Coast Campus, Southport, QLD 4222, Australia
  - Institute for Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen 518055, China
- Yaoqi Zhou
  - Institute for Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen 518055, China
  - Peking University Shenzhen Graduate School, Shenzhen 518055, China
|
78
|
Li TJ, Wen BY, Ma XH, Huang WT, Wu JZ, Lin XM, Zhang YJ, Li JF. Rapid and Simple Analysis of the Human Pepsin Secondary Structure Using a Portable Raman Spectrometer. Anal Chem 2021; 94:1318-1324. [PMID: 34928126 DOI: 10.1021/acs.analchem.1c04531] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Abstract
Human pepsin is a digestive protease that plays an important role in the human digestive system. The secondary structure of human pepsin determines its bioactivity. Therefore, an in-depth understanding of human pepsin secondary structure changes is particularly important for the further improvement of the efficiency of human pepsin biological function. However, the complexity and diversity of the human pepsin secondary structure make its analysis difficult. Herein, a convenient method has been developed to quickly detect the secondary structure of human pepsin using a portable Raman spectrometer. According to the change of surface-enhanced Raman spectroscopy (SERS) signal intensity and activity of human pepsin at different pH values, we analyze the change of the human pepsin secondary structure. The results show that the content of the β-sheet gradually increased with the increase in the pH in the active range, which is in good agreement with circular dichroism (CD) measurements. The change of the secondary structure improves the sensitivity of human pepsin SERS detection. Meanwhile, human pepsin is a commonly used disease marker for the noninvasive diagnosis of gastroesophageal reflux disease (GERD); the detection limit of human pepsin we obtained is 2 μg/mL by the abovementioned method. The real clinical detection scenario is also simulated by spiking pepsin solution in saliva, and the standard recovery rate is 80.7-92.3%. These results show the great prospect of our method in studying the protein secondary structure and furthermore promote the application of SERS in clinical diagnosis.
Affiliation(s)
- Tong-Jiang Li
  - Women and Children's Hospital Affiliated to Xiamen University, School of Medicine, College of Chemistry and Chemical Engineering, College of Energy, Xiamen University, Xiamen 361005, China
- Bao-Ying Wen
  - Women and Children's Hospital Affiliated to Xiamen University, School of Medicine, College of Chemistry and Chemical Engineering, College of Energy, Xiamen University, Xiamen 361005, China
- Xiao-Hui Ma
  - Women and Children's Hospital Affiliated to Xiamen University, School of Medicine, College of Chemistry and Chemical Engineering, College of Energy, Xiamen University, Xiamen 361005, China
- Wan-Ting Huang
  - Women and Children's Hospital Affiliated to Xiamen University, School of Medicine, College of Chemistry and Chemical Engineering, College of Energy, Xiamen University, Xiamen 361005, China
- Jin-Zhun Wu
  - Women and Children's Hospital Affiliated to Xiamen University, School of Medicine, College of Chemistry and Chemical Engineering, College of Energy, Xiamen University, Xiamen 361005, China
- Xiu-Mei Lin
  - Women and Children's Hospital Affiliated to Xiamen University, School of Medicine, College of Chemistry and Chemical Engineering, College of Energy, Xiamen University, Xiamen 361005, China
- Yue-Jiao Zhang
  - Women and Children's Hospital Affiliated to Xiamen University, School of Medicine, College of Chemistry and Chemical Engineering, College of Energy, Xiamen University, Xiamen 361005, China
- Jian-Feng Li
  - Women and Children's Hospital Affiliated to Xiamen University, School of Medicine, College of Chemistry and Chemical Engineering, College of Energy, Xiamen University, Xiamen 361005, China
|
79
|
Pakhrin SC, Aoki-Kinoshita KF, Caragea D, KC DB. DeepNGlyPred: A Deep Neural Network-Based Approach for Human N-Linked Glycosylation Site Prediction. Molecules 2021; 26:molecules26237314. [PMID: 34885895 PMCID: PMC8658957 DOI: 10.3390/molecules26237314] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/01/2021] [Revised: 11/22/2021] [Accepted: 11/26/2021] [Indexed: 12/21/2022] Open
Abstract
Protein N-linked glycosylation is a post-translational modification that plays an important role in a myriad of biological processes. Computational prediction approaches serve as complementary methods for the characterization of glycosylation sites. Most of the existing predictors for N-linked glycosylation utilize the information that the glycosylation site occurs at the N-X-[S/T] sequon, where X is any amino acid except proline. Not all N-X-[S/T] sequons are glycosylated; thus, the N-X-[S/T] sequon is a necessary but not sufficient determinant for protein glycosylation. In that regard, computational prediction of N-linked glycosylation sites confined to N-X-[S/T] sequons is an important problem. Here, we report DeepNGlyPred, a deep learning-based approach that encodes the positive and negative sequences in the human proteome dataset (extracted from N-GlycositeAtlas) using sequence-based features (gapped-dipeptide), predicted structural features, and evolutionary information. DeepNGlyPred produces SN, SP, MCC, and ACC of 88.62%, 73.92%, 0.60, and 79.41%, respectively, on the N-GlyDE independent test set, which is better than the compared approaches. These results demonstrate that DeepNGlyPred is a robust computational technique to predict N-linked glycosylation sites confined to N-X-[S/T] sequons. DeepNGlyPred will be a useful resource for the glycobiology community.
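The N-X-[S/T] sequon rule described above (X is any amino acid except proline) maps directly onto a regular expression. The sketch below lists candidate sites in a sequence; it only enumerates sequons and does not predict which of them are actually glycosylated. The function name `find_sequons` and the toy sequence are made up for this example, and the input is assumed to be an upper-case one-letter amino acid string.

```python
import re

# N, then any residue except P, then S or T. The lookahead lets overlapping
# sequons (for example "NNTS") be reported individually.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence):
    """Return 0-based positions of the asparagine in each N-X-[S/T] sequon."""
    return [match.start() for match in SEQUON.finditer(sequence)]


# Toy sequence: NGS at 1, NPT at 5 (excluded, X is proline), NNT at 8, NTS at 9.
positions = find_sequons("MNGSANPTNNTS")
```

Candidate positions found this way would be the inputs on which a predictor restricted to sequons makes its positive or negative call.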
Affiliation(s)
- Subash C. Pakhrin
  - School of Computing, Wichita State University, 1845 Fairmount St., Wichita, KS 67260, USA
- Doina Caragea
  - Department of Computer Science, Kansas State University, Manhattan, KS 66506, USA
- Dukka B. KC
  - Department of Computer Science, Michigan Technological University, Houghton, MI 49931, USA
  - Correspondence: ; Tel.: +1-906-487-1657
|
80
|
Mulnaes D, Schott-Verdugo S, Koenig F, Gohlke H. TopProperty: Robust Metaprediction of Transmembrane and Globular Protein Features Using Deep Neural Networks. J Chem Theory Comput 2021; 17:7281-7289. [PMID: 34663069 DOI: 10.1021/acs.jctc.1c00685] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Abstract
Transmembrane proteins (TMPs) are critical components of cellular life. However, due to experimental challenges, the number of experimentally resolved TMP structures is severely underrepresented in databases compared to their cellular abundance. Prediction of (per-residue) features such as transmembrane topology, membrane exposure, secondary structure, and solvent accessibility can be a useful starting point for experimental design or protein structure prediction but often requires different computational tools for different features or types of proteins. We present TopProperty, a metapredictor that predicts all of these features for TMPs or globular proteins. TopProperty is trained on datasets without bias toward a high number of sequence homologs, and the predictions are significantly better than the evaluated state-of-the-art primary predictors on all quality metrics. TopProperty eliminates the need for protein type- or feature-tailored tools, specifically for TMPs. TopProperty is freely available as a web server and standalone at https://cpclab.uni-duesseldorf.de/topsuite/.
Affiliation(s)
- Daniel Mulnaes
- Institut für Pharmazeutische und Medizinische Chemie, Heinrich-Heine-Universität Düsseldorf, Düsseldorf 40225, Germany
- Stephan Schott-Verdugo
- John von Neumann Institute for Computing (NIC), Jülich Supercomputing Centre (JSC), Institute of Biological Information Processing (IBI-7: Structural Bioinformatics), and Institute of Bio- and Geosciences (IBG-4: Bioinformatics), Forschungszentrum Jülich GmbH, Wilhelm-Johnen-Str., Jülich 52425, Germany
- Filip Koenig
- Institut für Pharmazeutische und Medizinische Chemie, Heinrich-Heine-Universität Düsseldorf, Düsseldorf 40225, Germany
- Holger Gohlke
- Institut für Pharmazeutische und Medizinische Chemie, Heinrich-Heine-Universität Düsseldorf, Düsseldorf 40225, Germany; John von Neumann Institute for Computing (NIC), Jülich Supercomputing Centre (JSC), Institute of Biological Information Processing (IBI-7: Structural Bioinformatics), and Institute of Bio- and Geosciences (IBG-4: Bioinformatics), Forschungszentrum Jülich GmbH, Wilhelm-Johnen-Str., Jülich 52425, Germany
|
81
|
Ho CT, Huang YW, Chen TR, Lo CH, Lo WC. Discovering the Ultimate Limits of Protein Secondary Structure Prediction. Biomolecules 2021; 11:1627. [PMID: 34827624 PMCID: PMC8615938 DOI: 10.3390/biom11111627] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 09/21/2021] [Revised: 10/25/2021] [Accepted: 10/28/2021] [Indexed: 12/29/2022] Open
Abstract
Secondary structure prediction (SSP) of proteins is an important structural biology technique with many applications. About 300 algorithms have been published in the past seven decades, with fierce competition in accuracy. In the first 60 years, the accuracy of three-state SSP rose from ~56% to 81%; since then, it has stayed at 81-86%. In the 1990s, the theoretical limit of three-state SSP accuracy was estimated to be 88%. Thus, SSP is now generally considered either no longer challenging or too challenging to improve. However, we found that the limit of three-state SSP might be underestimated. Moreover, there is still much room for improving segment-based and eight-state SSPs, but the limits of these emerging topics have not been determined. This work performs large-scale sequence and structural analyses to estimate SSP accuracy limits and assess state-of-the-art SSP methods. The limit of three-state SSP is re-estimated to be ~92%, 4-5% higher than previously expected, indicating that SSP is still challenging. The estimated limit of eight-state SSP is 84-87%. Several proposals for improving future SSP algorithms are made based on our results. We hope that these findings will help move forward the development of SSP and all its applications.
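To make the three-state and eight-state accuracy figures concrete, the sketch below computes per-residue accuracy between a predicted and a reference secondary structure string. It is an illustration only, not the authors' evaluation code. The eight-to-three reduction table (H/G/I to helix, E/B to strand, everything else to coil) is one widely used convention and is an assumption here; other reductions exist and give slightly different accuracies. The function names are hypothetical.

```python
# One common reduction of the eight DSSP states to three states (an assumption;
# the abstract does not say which reduction the authors used).
EIGHT_TO_THREE = {
    "H": "H", "G": "H", "I": "H",  # helices
    "E": "E", "B": "E",            # strands and bridges
    "T": "C", "S": "C", "-": "C",  # turns, bends, and coil
}


def reduce_to_three_state(ss8: str) -> str:
    """Map an eight-state secondary structure string to three states."""
    return "".join(EIGHT_TO_THREE[state] for state in ss8)


def per_residue_accuracy(predicted: str, reference: str) -> float:
    """Fraction of residues whose predicted state equals the reference state."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference strings must have equal length")
    if not reference:
        raise ValueError("empty secondary structure string")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


if __name__ == "__main__":
    # Hypothetical toy strings, not data from the paper.
    reference_ss8 = "HHHGEEBT"
    predicted_ss8 = "HHHHEEET"
    print(per_residue_accuracy(predicted_ss8, reference_ss8))
    print(per_residue_accuracy(reduce_to_three_state(predicted_ss8),
                               reduce_to_three_state(reference_ss8)))
```

The same pair of strings scores higher after reduction to three states, which is consistent with the abstract reporting a lower estimated limit for eight-state than for three-state prediction.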
Affiliation(s)
- Chia-Tzu Ho
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan; (C.-T.H.); (Y.-W.H.); (T.-R.C.); (C.-H.L.)
- Yu-Wei Huang
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan; (C.-T.H.); (Y.-W.H.); (T.-R.C.); (C.-H.L.)
- Teng-Ruei Chen
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan; (C.-T.H.); (Y.-W.H.); (T.-R.C.); (C.-H.L.)
- Chia-Hua Lo
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan; (C.-T.H.); (Y.-W.H.); (T.-R.C.); (C.-H.L.)
- Department of Biological Science and Technology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
- Wei-Cheng Lo
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan; (C.-T.H.); (Y.-W.H.); (T.-R.C.); (C.-H.L.)
- Department of Biological Science and Technology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
- The Center for Bioinformatics Research, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
|
82
|
Zhang Y, Jiang Z, Chen C, Wei Q, Gu H, Yu B. DeepStack-DTIs: Predicting Drug-Target Interactions Using LightGBM Feature Selection and Deep-Stacked Ensemble Classifier. Interdiscip Sci 2021; 14:311-330. [PMID: 34731411 DOI: 10.1007/s12539-021-00488-7] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 06/26/2021] [Revised: 10/19/2021] [Accepted: 10/21/2021] [Indexed: 12/12/2022]
Abstract
Accurate prediction of drug-target interactions (DTIs), which is often used in the fields of drug discovery and drug repositioning, is regarded as a key challenge in the study of drug science. In this paper, a new method called DeepStack-DTIs is proposed to predict DTIs. First, for the target protein, pseudo-position specific score matrix, pseudo amino acid composition and SPIDER3 are used to extract the different feature information of the target protein. Meanwhile, the path-based fingerprint features of each drug are extracted. Then, the synthetic minority oversampling technique (SMOTE) and light gradient boosting machine (LightGBM) are used for data balancing and feature selection, respectively. Finally, the processed features are input to the deep-stacked ensemble classifier composed of gated recurrent unit (GRU), deep neural network (DNN), support vector machine (SVM), eXtreme gradient boosting (XGBoost) and logistic regression (LR) to predict DTIs. Under five-fold cross-validation and compared with existing methods, the proposed method achieves higher prediction accuracy on the gold standard dataset. To evaluate the predictive power of DeepStack-DTIs, we validate the method on another dataset and predict the drug-target interaction network. The results indicate that DeepStack-DTIs has better predictive ability than the other methods, and provides novel insights for the prediction of DTIs. A novel method, DeepStack-DTIs, is proposed for drug-target interaction prediction. PsePSSM, PseAAC, SPIDER3 and FP2 are fused to convert protein sequence and drug molecule information into digital information, respectively. The SMOTE algorithm is used to balance the dataset, and the LightGBM feature selection algorithm is employed to remove redundant and irrelevant features to select the optimal feature subset. This optimal feature subset is input into the deep-stacked ensemble classifier to predict drug-target interactions. The experimental results show that the DeepStack-DTIs method can significantly improve the prediction accuracy of drug-target interactions.
Affiliation(s)
- Yan Zhang
- College of Mechanical and Electrical Engineering, Qingdao University of Science and Technology, Qingdao, 266061, China; College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China
- Zhiwen Jiang
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China
- Cheng Chen
- School of Computer Science and Technology, Shandong University, Qingdao, 266237, China
- Qinqin Wei
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China
- Haiming Gu
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China
- Bin Yu
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China; Key Laboratory of Computational Science and Application of Hainan Province, Haikou, 571158, China.
|
83
|
Cook AD, Roberts AJ, Atherton J, Tewari R, Topf M, Moores CA. Cryo-EM structure of a microtubule-bound parasite kinesin motor and implications for its mechanism and inhibition. J Biol Chem 2021; 297:101063. [PMID: 34375637 PMCID: PMC8526983 DOI: 10.1016/j.jbc.2021.101063] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 04/19/2021] [Revised: 07/23/2021] [Accepted: 08/05/2021] [Indexed: 11/25/2022] Open
Abstract
Plasmodium parasites cause malaria and are responsible annually for hundreds of thousands of deaths. Kinesins are a superfamily of microtubule-dependent ATPases that play important roles in the parasite replicative machinery, which is a potential target for antiparasite drugs. Kinesin-5, a molecular motor that cross-links microtubules, is an established antimitotic target in other disease contexts, but its mechanism in Plasmodium falciparum is unclear. Here, we characterized P. falciparum kinesin-5 (PfK5) using cryo-EM to determine the motor's nucleotide-dependent microtubule-bound structure and introduced 3D classification of individual motors into our microtubule image processing pipeline to maximize our structural insights. Despite sequence divergence in PfK5, the motor exhibits classical kinesin mechanochemistry, including ATP-induced subdomain rearrangement and cover neck bundle formation, consistent with its plus-ended directed motility. We also observed that an insertion in loop5 of the PfK5 motor domain creates a different environment in the well-characterized human kinesin-5 drug-binding site. Our data reveal the possibility for selective inhibition of PfK5 and can be used to inform future exploration of Plasmodium kinesins as antiparasite targets.
Affiliation(s)
- Alexander D Cook
- Institute of Structural and Molecular Biology, Department of Biological Sciences, Birkbeck, University of London, London, United Kingdom
- Anthony J Roberts
- Institute of Structural and Molecular Biology, Department of Biological Sciences, Birkbeck, University of London, London, United Kingdom
- Joseph Atherton
- Institute of Structural and Molecular Biology, Department of Biological Sciences, Birkbeck, University of London, London, United Kingdom
- Rita Tewari
- School of Life Sciences, University of Nottingham, Nottingham, United Kingdom
- Maya Topf
- Institute of Structural and Molecular Biology, Department of Biological Sciences, Birkbeck, University of London, London, United Kingdom
- Carolyn A Moores
- Institute of Structural and Molecular Biology, Department of Biological Sciences, Birkbeck, University of London, London, United Kingdom.
|
84
|
Accurate prediction of protein torsion angles using evolutionary signatures and recurrent neural network. Sci Rep 2021; 11:21033. [PMID: 34702851 PMCID: PMC8548351 DOI: 10.1038/s41598-021-00477-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/02/2021] [Accepted: 09/27/2021] [Indexed: 11/08/2022] Open
Abstract
The amino acid sequence of a protein contains all the necessary information to specify its shape, which dictates its biological activities. However, it is challenging and expensive to experimentally determine the three-dimensional structure of proteins. The backbone torsion angles play a critical role in protein structure prediction, and accurately predicting the angles can considerably advance tertiary structure prediction by accelerating efficient sampling of the large conformational space for low-energy structures. Here, for the first time, we propose evolutionary signatures computed from protein sequence profiles and a novel recurrent architecture, termed ESIDEN, that adopts a straightforward architecture of recurrent neural networks with a small number of learnable parameters. ESIDEN can efficiently capture information from both the classic and new features, benefiting from different recurrent architectures in processing information. Moreover, compared to widely used classic features, the new features, especially the Ramachandran basin potential, provide statistical and evolutionary information to improve prediction accuracy. On four widely used benchmark datasets, ESIDEN significantly improves the accuracy in predicting the torsion angles compared with the best methods to date. As demonstrated in the present study, the predicted angles can be used as structural constraints to accurately infer protein tertiary structures. Moreover, the proposed features would pave the way to improve machine learning-based methods in protein folding and structure prediction, as well as function prediction. The source code and data are available at https://kornmann.bioch.ox.ac.uk/leri/resources/download.html.
|
85
|
Wang H, Zhao J, Zhao H, Li H, Wang J. CL-ACP: a parallel combination of CNN and LSTM anticancer peptide recognition model. BMC Bioinformatics 2021; 22:512. [PMID: 34670488 PMCID: PMC8527680 DOI: 10.1186/s12859-021-04433-9] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 05/10/2021] [Accepted: 10/05/2021] [Indexed: 01/10/2023] Open
Abstract
BACKGROUND: Anticancer peptides are defence substances with innate immune functions that can selectively act on cancer cells without harming normal cells, and many studies have been conducted to identify anticancer peptides. In this paper, we introduce the anticancer peptide secondary structures as additional features and propose an effective computational model, CL-ACP, that uses a combined network and attention mechanism to predict anticancer peptides. RESULTS: The CL-ACP model uses secondary structures and original sequences of anticancer peptides to construct the feature space. The long short-term memory and convolutional neural network are used to extract the contextual dependence and local correlations of the feature space. Furthermore, a multi-head self-attention mechanism is used to strengthen the anticancer peptide sequences. Finally, three categories of feature information are classified by cascading. CL-ACP was validated using two types of datasets, anticancer peptide datasets and antimicrobial peptide datasets, on which it achieved good results compared to previous methods. CL-ACP achieved the highest AUC values of 0.935 and 0.972 on the anticancer peptide and antimicrobial peptide datasets, respectively. CONCLUSIONS: CL-ACP can effectively recognize antimicrobial peptides, especially anticancer peptides, and the parallel combined neural network structure of CL-ACP requires neither complex feature design nor a high time cost. It is suitable for application as a useful tool in antimicrobial peptide design.
Affiliation(s)
- Huiqing Wang
- College of Information and Computer, Taiyuan University of Technology, Taiyuan, 030024, China
- Jian Zhao
- College of Information and Computer, Taiyuan University of Technology, Taiyuan, 030024, China.
- Hong Zhao
- College of Information and Computer, Taiyuan University of Technology, Taiyuan, 030024, China
- Haolin Li
- College of Information and Computer, Taiyuan University of Technology, Taiyuan, 030024, China
- Juan Wang
- College of Information and Computer, Taiyuan University of Technology, Taiyuan, 030024, China
|
86
|
Wong SWK, Liu Z. Conformational variability of loops in the SARS-CoV-2 spike protein. Proteins 2021; 90:691-703. [PMID: 34661307 PMCID: PMC8662175 DOI: 10.1002/prot.26266] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2021] [Revised: 10/05/2021] [Accepted: 10/12/2021] [Indexed: 11/07/2022]
Abstract
The SARS‐CoV‐2 spike (S) protein facilitates viral infection, and has been the focus of many structure determination efforts. Its flexible loop regions are known to be involved in protein binding and may adopt multiple conformations. This article identifies the S protein loops and studies their conformational variability based on the available Protein Data Bank structures. While most loops had essentially one stable conformation, 17 of 44 loop regions were observed to be structurally variable with multiple substantively distinct conformations based on a cluster analysis. Loop modeling methods were then applied to the S protein loop targets, and the prediction accuracies discussed in relation to the characteristics of the conformational clusters identified. Loops with multiple conformations were found to be challenging to model based on a single structural template.
Affiliation(s)
- Samuel W. K. Wong
- Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Canada
- Zongjun Liu
- Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Canada
|
87
|
Improved protein relative solvent accessibility prediction using deep multi-view feature learning framework. Anal Biochem 2021; 631:114358. [PMID: 34478704 DOI: 10.1016/j.ab.2021.114358] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/08/2021] [Revised: 08/22/2021] [Accepted: 08/25/2021] [Indexed: 11/20/2022]
Abstract
The accurate prediction of the relative solvent accessibility of a protein is critical to understanding its 3D structure and biological function. In this study, a novel deep multi-view feature learning (DMVFL) framework that integrates three different neural network units, i.e., bidirectional long short-term memory recurrent neural network, squeeze-and-excitation, and fully-connected hidden layer, with four sequence-based single-view features, i.e., position-specific scoring matrix, position-specific frequency matrix, predicted secondary structure, and roughly predicted three-state relative solvent accessibility probability, is developed to accurately predict the relative solvent accessibility of proteins. On the basis of this newly developed framework, a new protein relative solvent accessibility predictor, called DMVFL-RSA, was proposed; it employs a customized multiple feedback mechanism that helps to extract discriminative information embedded in the four single-view features. In benchmark tests on the TEST524 and CASP14-derived (CASP14set) datasets, DMVFL-RSA outperforms other existing state-of-the-art protein relative solvent accessibility predictors when predicting two-state (exposure threshold of 25%), three-state (exposure thresholds of 9% and 36%), and four-state (exposure thresholds of 4%, 25%, and 50%) discrete values. For real-valued prediction on TEST524 and CASP14set, DMVFL-RSA also achieves high Pearson correlation coefficient values, indicating a positive correlation between the predicted and native relative solvent accessibility. Detailed analyses show that the major advantages of DMVFL-RSA lie in the high efficiency of the DMVFL framework, the applied multiple feedback mechanism, and the strong sensitivity of the sequence-based features. The web server of DMVFL-RSA is freely available at https://jun-csbio.github.io/DMVFL-RSA/ for academic use. The standalone package of DMVFL-RSA is downloadable at https://github.com/XueQiangFan/DMVFL-RSA.
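The two-, three-, and four-state schemes mentioned in this abstract amount to binning a real-valued relative solvent accessibility (RSA, in percent) at the stated exposure thresholds. The sketch below shows one way to do that binning in Python. It is not taken from the DMVFL-RSA package: the names `THRESHOLDS` and `rsa_state` are hypothetical, and treating a value exactly equal to a threshold as belonging to the more exposed state is an assumption, since the abstract does not specify the boundary convention.

```python
from bisect import bisect_right

# Exposure thresholds (percent RSA) quoted in the abstract for each scheme.
THRESHOLDS = {
    2: (25.0,),
    3: (9.0, 36.0),
    4: (4.0, 25.0, 50.0),
}


def rsa_state(rsa_percent: float, n_states: int) -> int:
    """Return the discrete state index (0 = most buried) for an RSA value.

    A value equal to a threshold is assigned to the more exposed state;
    this boundary convention is an assumption made for illustration.
    """
    if n_states not in THRESHOLDS:
        raise ValueError("n_states must be 2, 3, or 4")
    return bisect_right(THRESHOLDS[n_states], rsa_percent)


if __name__ == "__main__":
    # Hypothetical RSA values, not data from the paper.
    for value in (3.0, 10.0, 25.0, 40.0, 60.0):
        print(value, [rsa_state(value, n) for n in (2, 3, 4)])
```

Because the thresholds differ between schemes, the same residue can be "buried" in the two-state scheme and "intermediate" in the three-state scheme, as the value 10.0 in the usage loop shows.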
|
88
|
Taujale R, Zhou Z, Yeung W, Moremen KW, Li S, Kannan N. Mapping the glycosyltransferase fold landscape using interpretable deep learning. Nat Commun 2021; 12:5656. [PMID: 34580305 PMCID: PMC8476585 DOI: 10.1038/s41467-021-25975-9] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 04/05/2021] [Accepted: 08/31/2021] [Indexed: 12/28/2022] Open
Abstract
Glycosyltransferases (GTs) play fundamental roles in nearly all cellular processes through the biosynthesis of complex carbohydrates and glycosylation of diverse protein and small molecule substrates. The extensive structural and functional diversification of GTs presents a major challenge in mapping the relationships connecting sequence, structure, fold and function using traditional bioinformatics approaches. Here, we present a convolutional neural network with attention (CNN-attention) based deep learning model that leverages simple secondary structure representations generated from primary sequences to provide GT fold prediction with high accuracy. The model learns distinguishing secondary structure features free of primary sequence alignment constraints and is highly interpretable. It delineates sequence and structural features characteristic of individual fold types, while classifying them into distinct clusters that group evolutionarily divergent families based on shared secondary structural features. We further extend our model to classify GT families of unknown folds and variants of known folds. By identifying families that are likely to adopt novel folds such as GT91, GT96 and GT97, our studies expand the GT fold landscape and prioritize targets for future structural studies.
Affiliation(s)
- Rahil Taujale
- Institute of Bioinformatics, University of Georgia, Athens, GA, USA
- Complex Carbohydrate Research Center, University of Georgia, Athens, GA, USA
- Zhongliang Zhou
- Department of Computer Science, University of Georgia, Athens, GA, USA
- Wayland Yeung
- Institute of Bioinformatics, University of Georgia, Athens, GA, USA
- Kelley W Moremen
- Complex Carbohydrate Research Center, University of Georgia, Athens, GA, USA
- Biochemistry and Molecular Biology, University of Georgia, Athens, GA, USA
- Sheng Li
- Department of Computer Science, University of Georgia, Athens, GA, USA
- Natarajan Kannan
- Institute of Bioinformatics, University of Georgia, Athens, GA, USA.
- Biochemistry and Molecular Biology, University of Georgia, Athens, GA, USA.
|
89
|
Griffith D, Holehouse AS. PARROT is a flexible recurrent neural network framework for analysis of large protein datasets. eLife 2021; 10:e70576. [PMID: 34533455 PMCID: PMC8448528 DOI: 10.7554/elife.70576] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 05/21/2021] [Accepted: 09/06/2021] [Indexed: 11/29/2022] Open
Abstract
The rise of high-throughput experiments has transformed how scientists approach biological questions. The ubiquity of large-scale assays that can test thousands of samples in a day has necessitated the development of new computational approaches to interpret this data. Among these tools, machine learning approaches are increasingly being utilized due to their ability to infer complex nonlinear patterns from high-dimensional data. Despite their effectiveness, machine learning (and in particular deep learning) approaches are not always accessible or easy to implement for those with limited computational expertise. Here we present PARROT, a general framework for training and applying deep learning-based predictors on large protein datasets. Using an internal recurrent neural network architecture, PARROT is capable of tackling both classification and regression tasks while only requiring raw protein sequences as input. We showcase the potential uses of PARROT on three diverse machine learning tasks: predicting phosphorylation sites, predicting transcriptional activation function of peptides generated by high-throughput reporter assays, and predicting the fibrillization propensity of amyloid beta with data generated by deep mutational scanning. Through these examples, we demonstrate that PARROT is easy to use, performs comparably to state-of-the-art computational tools, and is applicable for a wide array of biological problems.
Affiliation(s)
- Daniel Griffith
- Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St Louis, United States
- Center for Science and Engineering Living Systems, Washington University, St Louis, United States
- Alex S Holehouse
- Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St Louis, United States
- Center for Science and Engineering Living Systems, Washington University, St Louis, United States
|
90
|
Hybrid Deep Learning Based on a Heterogeneous Network Profile for Functional Annotations of Plasmodium falciparum Genes. Int J Mol Sci 2021; 22:ijms221810019. [PMID: 34576183 PMCID: PMC8468833 DOI: 10.3390/ijms221810019] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 08/27/2021] [Revised: 09/13/2021] [Accepted: 09/14/2021] [Indexed: 12/15/2022] Open
Abstract
Functional annotation of genes of unknown function reveals unidentified functions that can enhance our understanding of complex genome communications. A common approach for inferring gene function involves the ortholog-based method. However, genetic data alone are often not enough to provide information for function annotation. Thus, integrating other sources of data can potentially increase the possibility of retrieving annotations. Network-based methods are efficient techniques for exploring interactions among genes and can be used for functional inference. In this study, we present an analysis framework for inferring the functions of Plasmodium falciparum genes based on connection profiles in a heterogeneous network between human and Plasmodium falciparum proteins. These profiles were fed into a hybrid deep learning algorithm to predict the orthologs of genes of unknown function. The results show high performance of the model's predictions, with an AUC of 0.89. One hundred and twenty-one predicted pairs with high prediction scores were selected for inferring the functions using statistical enrichment analysis. Using this method, PF3D7_1248700 and PF3D7_0401800 were found to be involved with muscle contraction and striated muscle tissue development, while PF3D7_1303800 and PF3D7_1201000 were found to be related to protein dephosphorylation. In conclusion, combining a heterogeneous network and a hybrid deep learning technique can allow us to identify unknown gene functions of malaria parasites. This approach is general and can be applied to other diseases, which may benefit the field of biomedical science.
|
91
|
Xu G, Wang Q, Ma J. OPUS-X: an open-source toolkit for protein torsion angles, secondary structure, solvent accessibility, contact map predictions and 3D folding. Bioinformatics 2021; 38:108-114. [PMID: 34478500 PMCID: PMC8696105 DOI: 10.1093/bioinformatics/btab633] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 05/09/2021] [Revised: 07/09/2021] [Accepted: 09/01/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION: The development of an open-source platform to predict protein 1D features and 3D structure is an important task. In this paper, we report an open-source toolkit for protein 3D structure modeling, named OPUS-X. It contains three modules: OPUS-TASS2, which predicts protein torsion angles, secondary structure and solvent accessibility; OPUS-Contact, which measures the distance and orientation information between different residue pairs; and OPUS-Fold2, which uses the constraints derived from the first two modules to guide folding. RESULTS: OPUS-TASS2 is an upgraded version of our previous method OPUS-TASS. OPUS-TASS2 integrates protein global structure information and significantly outperforms OPUS-TASS. OPUS-Contact combines multiple raw co-evolutionary features with protein 1D features predicted by OPUS-TASS2, and delivers better results than the open-source state-of-the-art method trRosetta. OPUS-Fold2 is a complementary version of our previous method OPUS-Fold. OPUS-Fold2 is a gradient-based protein folding framework based on differentiable energy terms, as opposed to OPUS-Fold, which is a sampling-based method used to deal with non-differentiable terms. OPUS-Fold2 exhibits comparable performance to the Rosetta folding protocol in trRosetta when using identical inputs. OPUS-Fold2 is written in Python and TensorFlow 2.4, which makes it amenable to source-code-level modification. AVAILABILITY AND IMPLEMENTATION: The code and pre-trained models of OPUS-X can be downloaded from https://github.com/OPUS-MaLab/opus_x. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Gang Xu
- Multiscale Research Institute of Complex Systems, Fudan University, Shanghai 200433, China; Zhangjiang Fudan International Innovation Center, Fudan University, Shanghai 201210, China; Shanghai AI Laboratory, Shanghai 200030, China
- Qinghua Wang
- Verna and Marrs Mclean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, TX 77030, USA
|
92
|
Akbar S, Pardasani KR, Panda NR. PSO Based Neuro-fuzzy Model for Secondary Structure Prediction of Protein. Neural Process Lett 2021. [DOI: 10.1007/s11063-021-10615-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 02/07/2023]
|
93
|
Chen C, Shi H, Jiang Z, Salhi A, Chen R, Cui X, Yu B. DNN-DTIs: Improved drug-target interactions prediction using XGBoost feature selection and deep neural network. Comput Biol Med 2021; 136:104676. [PMID: 34375902 DOI: 10.1016/j.compbiomed.2021.104676] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Received: 03/13/2021] [Revised: 07/18/2021] [Accepted: 07/19/2021] [Indexed: 02/03/2023]
Abstract
Analysis and prediction of drug-target interactions (DTIs) play an important role in understanding drug mechanisms, as well as drug repositioning and design. Machine learning (ML)-based methods for DTIs prediction can mitigate the shortcomings of time-consuming and labor-intensive experimental approaches, while providing new ideas and insights for drug design. We propose a novel pipeline for predicting drug-target interactions, called DNN-DTIs. First, the target information is characterized by a number of features, namely, pseudo-amino acid composition, pseudo position-specific scoring matrix, conjoint triad composition, transition and distribution, Moreau-Broto autocorrelation, and structural features. The drug compounds are subsequently encoded using substructure fingerprints. Next, eXtreme gradient boosting (XGBoost) is used to determine the subset of non-redundant features of importance. The optimal balanced set of sample vectors is obtained by applying the synthetic minority oversampling technique (SMOTE). Finally, a DTIs predictor, DNN-DTIs, is developed based on a deep neural network (DNN) via a layer-by-layer learning scheme. Experimental results indicate that DNN-DTIs achieves better performance than other state-of-the-art predictors with ACC values of 98.78%, 98.60%, 97.98%, 98.24% and 98.00% on Enzyme, Ion Channels (IC), GPCR, Nuclear Receptors (NR) and Kuang's datasets. Therefore, the accurate prediction performance of DNN-DTIs makes it a favored choice for contributing to the study of DTIs, especially drug repositioning.
Affiliation(s)
- Cheng Chen
  - College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China
  - Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China
  - School of Computer Science and Technology, Shandong University, Qingdao, 266237, China
- Han Shi
  - Key Laboratory of Synthetic Biology, CAS Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai, 200032, China
- Zhiwen Jiang
  - College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China
  - Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China
- Adil Salhi
  - Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
- Ruixin Chen
  - College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China
  - Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China
- Xuefeng Cui
  - School of Computer Science and Technology, Shandong University, Qingdao, 266237, China
- Bin Yu
  - College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China
  - Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China
  - Key Laboratory of Computational Science and Application of Hainan Province, Haikou, 571158, China
|
94
|
Chen TR, Juan SH, Huang YW, Lin YC, Lo WC. A secondary structure-based position-specific scoring matrix applied to the improvement in protein secondary structure prediction. PLoS One 2021; 16:e0255076. [PMID: 34320027 PMCID: PMC8318245 DOI: 10.1371/journal.pone.0255076] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 05/21/2020] [Accepted: 07/11/2021] [Indexed: 11/18/2022] Open
Abstract
Protein secondary structure prediction (SSP) has a variety of applications; however, there has been relatively limited improvement in accuracy for years. With the aim of advancing all related fields, we sought to make a fundamental advance in SSP. Many admirable efforts have been made to improve the machine learning algorithms for SSP; this work instead took a step back and revisited the input features. A secondary structure element-based position-specific scoring matrix (SSE-PSSM) is proposed, based on which a new set of machine learning features can be established. The feasibility of this new PSSM was evaluated by rigorous independent tests in which the training and testing datasets shared <25% sequence identity. In all experiments, the proposed PSSM outperformed the traditional amino acid PSSM. This new PSSM can be easily combined with the amino acid PSSM, and the improvement in accuracy was remarkable. Preliminary tests combining the SSE-PSSM with well-known SSP methods showed 2.0% and 5.2% average improvements in three- and eight-state SSP accuracies, respectively. If this PSSM can be integrated into state-of-the-art SSP methods, the overall accuracy of SSP may break through the current limit and eventually benefit all research and applications in which secondary structure prediction plays a vital role. To facilitate the application and integration of the SSE-PSSM with modern SSP methods, we have established a web server and standalone programs for generating the SSE-PSSM, available at http://10.life.nctu.edu.tw/SSE-PSSM.
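For readers unfamiliar with position-specific scoring matrices, the sketch below builds a generic PSSM from a toy gap-free alignment: per-column symbol frequencies are smoothed with a pseudocount and converted to log2-odds against a background frequency. The abstract does not describe how the SSE-PSSM itself is derived, so this is only the textbook construction; `build_pssm`, the uniform background, the pseudocount, and the five-letter alphabet are assumptions for illustration.

```python
import math


def build_pssm(alignment, alphabet, pseudocount=1.0):
    """Return one {symbol: log2-odds score} dict per alignment column."""
    length = len(alignment[0])
    assert all(len(seq) == length for seq in alignment), "rows must be aligned"
    background = 1.0 / len(alphabet)  # assumed uniform background frequency
    pssm = []
    for col in range(length):
        column = [seq[col] for seq in alignment]
        total = len(column) + pseudocount * len(alphabet)
        scores = {}
        for symbol in alphabet:
            # smoothed frequency of this symbol in this column
            freq = (column.count(symbol) + pseudocount) / total
            scores[symbol] = math.log2(freq / background)
        pssm.append(scores)
    return pssm


alignment = ["ACDA", "ACEA", "GCDA"]  # toy gap-free alignment (hypothetical)
pssm = build_pssm(alignment, alphabet="ACDEG")
```

A fully conserved column, such as the second one here, gives a positive score to the conserved symbol and negative scores to all others, which is the signal a downstream predictor consumes.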
Affiliation(s)
- Teng-Ruei Chen
  - Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, Taiwan
  - Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
- Sheng-Hung Juan
  - Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, Taiwan
  - Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
- Yu-Wei Huang
  - Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, Taiwan
  - Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
- Yen-Cheng Lin
  - Department of Biological Science and Technology, National Chiao Tung University, Hsinchu, Taiwan
  - Department of Biological Science and Technology, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
- Wei-Cheng Lo
  - Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, Taiwan
  - Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
  - Department of Biological Science and Technology, National Chiao Tung University, Hsinchu, Taiwan
  - Department of Biological Science and Technology, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
  - The Center for Bioinformatics Research, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
|
95
|
Mulnaes D, Golchin P, Koenig F, Gohlke H. TopDomain: Exhaustive Protein Domain Boundary Metaprediction Combining Multisource Information and Deep Learning. J Chem Theory Comput 2021; 17:4599-4613. [PMID: 34161735 DOI: 10.1021/acs.jctc.1c00129] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/14/2022]
Abstract
Protein domains are independent, functional, and stable structural units of proteins. Accurate protein domain boundary prediction plays an important role in understanding protein structure and evolution, as well as for protein structure prediction. Current domain boundary prediction methods differ in terms of boundary definition, methodology, and training databases resulting in disparate performance for different proteins. We developed TopDomain, an exhaustive metapredictor, that uses deep neural networks to combine multisource information from sequence- and homology-based features of over 50 primary predictors. For this purpose, we developed a new domain boundary data set termed the TopDomain data set, in which the true annotations are informed by SCOPe annotations, structural domain parsers, human inspection, and deep learning. We benchmark TopDomain against 2484 targets with 3354 boundaries from the TopDomain test set and achieve F1 scores of 78.4% and 73.8% for multidomain boundary prediction within ±20 residues and ±10 residues of the true boundary, respectively. When examined on targets from CASP11-13 competitions, TopDomain achieves F1 scores of 47.5% and 42.8% for multidomain proteins. TopDomain significantly outperforms 15 widely used, state-of-the-art ab initio and homology-based domain boundary predictors. Finally, we implemented TopDomainTMC, which accurately predicts whether domain parsing is necessary for the target protein.
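The tolerance-based F1 scores quoted in this abstract (within ±20 and ±10 residues) can be made concrete with a small sketch. The exact matching protocol used for TopDomain is not given here, so the code assumes a simple greedy rule: a predicted boundary is a hit if it lies within the tolerance of a true boundary that has not already been matched. The function name `boundary_f1` and the example positions are hypothetical.

```python
def boundary_f1(predicted, true, tolerance):
    """F1 score where a predicted boundary counts as a hit if it lies within
    `tolerance` residues of a true boundary that is not already matched."""
    unmatched = sorted(true)
    hits = 0
    for pred in sorted(predicted):
        # true boundaries still available and close enough to this prediction
        candidates = [t for t in unmatched if abs(t - pred) <= tolerance]
        if candidates:
            best = min(candidates, key=lambda t: abs(t - pred))
            unmatched.remove(best)
            hits += 1
    if hits == 0:
        return 0.0
    precision = hits / len(predicted)
    recall = hits / len(true)
    return 2 * precision * recall / (precision + recall)


true_boundaries = [120, 250]            # hypothetical residue positions
predicted_boundaries = [112, 265, 400]  # hypothetical predictions
f1_20 = boundary_f1(predicted_boundaries, true_boundaries, tolerance=20)
f1_10 = boundary_f1(predicted_boundaries, true_boundaries, tolerance=10)
```

On this toy input the score drops from 0.8 at ±20 residues to 0.4 at ±10, because the prediction at 265 is 15 residues from the true boundary at 250 and only counts under the looser tolerance.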
Affiliation(s)
- Daniel Mulnaes
  - Institut für Pharmazeutische und Medizinische Chemie, Heinrich-Heine-Universität Düsseldorf, Universitätsstr. 1, 40225 Düsseldorf, Germany
- Pegah Golchin
  - Institut für Pharmazeutische und Medizinische Chemie, Heinrich-Heine-Universität Düsseldorf, Universitätsstr. 1, 40225 Düsseldorf, Germany
- Filip Koenig
  - Institut für Pharmazeutische und Medizinische Chemie, Heinrich-Heine-Universität Düsseldorf, Universitätsstr. 1, 40225 Düsseldorf, Germany
- Holger Gohlke
  - Institut für Pharmazeutische und Medizinische Chemie, Heinrich-Heine-Universität Düsseldorf, Universitätsstr. 1, 40225 Düsseldorf, Germany
  - John von Neumann Institute for Computing (NIC), Jülich Supercomputing Centre (JSC), Institute of Biological Information Processing (IBI-7: Structural Biochemistry) & Institute of Bio- and Geosciences (IBG-4: Bioinformatics), Forschungszentrum Jülich GmbH, 52425 Jülich, Germany
|
96
|
Chen Z, Zhao P, Li C, Li F, Xiang D, Chen YZ, Akutsu T, Daly RJ, Webb GI, Zhao Q, Kurgan L, Song J. iLearnPlus: a comprehensive and automated machine-learning platform for nucleic acid and protein sequence analysis, prediction and visualization. Nucleic Acids Res 2021; 49:e60. [PMID: 33660783 PMCID: PMC8191785 DOI: 10.1093/nar/gkab122] [Citation(s) in RCA: 157] [Impact Index Per Article: 39.3] [Received: 10/26/2020] [Revised: 02/05/2021] [Accepted: 02/25/2021] [Indexed: 12/14/2022] Open
Abstract
Sequence-based analysis and prediction are fundamental bioinformatic tasks that facilitate understanding of the sequence(-structure)-function paradigm for DNAs, RNAs and proteins. Rapid accumulation of sequences requires equally pervasive development of new predictive models, which depends on the availability of effective tools that support these efforts. We introduce iLearnPlus, the first machine-learning platform with graphical- and web-based interfaces for the construction of machine-learning pipelines for analysis and predictions using nucleic acid and protein sequences. iLearnPlus provides a comprehensive set of algorithms and automates sequence-based feature extraction and analysis, construction and deployment of models, assessment of predictive performance, statistical analysis, and data visualization; all without programming. iLearnPlus includes a wide range of feature sets which encode information from the input sequences and over twenty machine-learning algorithms that cover several deep-learning approaches, outnumbering the current solutions by a wide margin. Our solution caters to experienced bioinformaticians, given the broad range of options, and biologists with no programming background, given the point-and-click interface and easy-to-follow design process. We showcase iLearnPlus with two case studies concerning prediction of long noncoding RNAs (lncRNAs) from RNA transcripts and prediction of crotonylation sites in protein chains. iLearnPlus is an open-source platform available at https://github.com/Superzchen/iLearnPlus/ with the webserver at http://ilearnplus.erc.monash.edu/.
Affiliation(s)
- Zhen Chen
  - Collaborative Innovation Center of Henan Grain Crops, Henan Agricultural University, Zhengzhou 450046, China
- Pei Zhao
  - State Key Laboratory of Cotton Biology, Institute of Cotton Research of Chinese Academy of Agricultural Sciences (CAAS), Anyang 455000, China
- Chen Li
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Fuyi Li
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
  - Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
  - Department of Microbiology and Immunology, The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, Melbourne, Victoria 3000, Australia
- Dongxu Xiang
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
  - Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
- Yong-Zi Chen
  - Laboratory of Tumor Cell Biology, Key Laboratory of Cancer Prevention and Therapy, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin 300060, China
- Tatsuya Akutsu
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto 611-0011, Japan
- Roger J Daly
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Geoffrey I Webb
  - Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
- Quanzhi Zhao
  - Collaborative Innovation Center of Henan Grain Crops, Henan Agricultural University, Zhengzhou 450046, China
  - Key Laboratory of Rice Biology in Henan Province, Henan Agricultural University, Zhengzhou 450046, China
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
- Jiangning Song
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
  - Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
|
97
|
Lyu Z, Wang Z, Luo F, Shuai J, Huang Y. Protein Secondary Structure Prediction With a Reductive Deep Learning Method. Front Bioeng Biotechnol 2021; 9:687426. [PMID: 34211967 PMCID: PMC8240957 DOI: 10.3389/fbioe.2021.687426] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 03/29/2021] [Accepted: 04/26/2021] [Indexed: 12/12/2022] Open
Abstract
Protein secondary structures have been identified as the links in the physical process by which primary sequences, typically random coils, fold into functional tertiary structures that enable proteins to participate in a variety of biological events. An efficient protein secondary structure predictor is therefore important, especially when the structure of an amino acid sequence fragment has not been solved by high-resolution experiments such as X-ray crystallography, cryo-electron microscopy, and nuclear magnetic resonance spectroscopy, which are usually time-consuming and expensive. In this paper, a reductive deep learning model, MLPRNN, is proposed to predict either 3-state or 8-state protein secondary structures. The prediction accuracy of MLPRNN on the publicly available benchmark CB513 data set is comparable with that of other state-of-the-art models. More importantly, given its reductive architecture, MLPRNN could serve as a baseline for future developments.
Affiliation(s)
- Zhiliang Lyu
  - College of Computer Engineering, Jimei University, Xiamen, China
- Zhijin Wang
  - College of Computer Engineering, Jimei University, Xiamen, China
- Fangfang Luo
  - College of Computer Engineering, Jimei University, Xiamen, China
- Jianwei Shuai
  - Department of Physics and Fujian Provincial Key Laboratory for Soft Functional Materials Research, Xiamen University, Xiamen, China
  - National Institute for Data Science in Health and Medicine, and State Key Laboratory of Cellular Stress Biology, Innovation Center for Cell Signaling Network, Xiamen University, Xiamen, China
- Yandong Huang
  - College of Computer Engineering, Jimei University, Xiamen, China
|
98
|
Wang H, Zhao H, Yan Z, Zhao J, Han J. MDCAN-Lys: A Model for Predicting Succinylation Sites Based on Multilane Dense Convolutional Attention Network. Biomolecules 2021; 11:biom11060872. [PMID: 34208298 PMCID: PMC8231176 DOI: 10.3390/biom11060872] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 04/25/2021] [Revised: 05/30/2021] [Accepted: 06/07/2021] [Indexed: 12/26/2022] Open
Abstract
Lysine succinylation is an important post-translational modification, whose abnormalities are closely related to the occurrence and development of many diseases. Therefore, exploring effective methods to identify succinylation sites is helpful for disease treatment and research on related drugs. However, most existing computational methods for the prediction of succinylation sites are still based on machine learning. With the increasing volume of data and complexity of feature representations, it is necessary to explore effective deep learning methods to recognize succinylation sites. In this paper, we propose a multilane dense convolutional attention network, MDCAN-Lys. MDCAN-Lys extracts sequence information, physicochemical properties of amino acids, and structural properties of proteins using a three-way network and constructs a feature space from them. For each sub-network, MDCAN-Lys cascades a dense convolutional block and a convolutional block attention module to capture feature information at different levels and improve the abstraction ability of the network. The experimental results of 10-fold cross-validation and independent testing show that MDCAN-Lys can recognize more succinylation sites, which is consistent with the conclusion of the case study. Thus, it is worthwhile to explore deep learning-based methods for the recognition of succinylation sites.
|
99
|
Liu Y, Gong W, Zhao Y, Deng X, Zhang S, Li C. aPRBind: protein-RNA interface prediction by combining sequence and I-TASSER model-based structural features learned with convolutional neural networks. Bioinformatics 2021; 37:937-942. [PMID: 32821925 DOI: 10.1093/bioinformatics/btaa747] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 03/07/2020] [Revised: 07/26/2020] [Accepted: 08/17/2020] [Indexed: 12/13/2022] Open
Abstract
MOTIVATION: Protein-RNA interactions play a critical role in various biological processes. The accurate prediction of RNA-binding residues in proteins has been one of the most challenging and intriguing problems in the field of computational biology. Existing methods still have relatively low accuracy, especially the sequence-based ab-initio methods. RESULTS: In this work, we propose aPRBind, a convolutional neural network-based ab-initio method for RNA-binding residue prediction. aPRBind is trained with sequence features and structural ones (particularly including residue dynamics information and residue-nucleotide propensity developed by us) that are extracted from the structures predicted by I-TASSER. The analysis of feature contributions indicates that the sequence features are most important, followed by dynamics information, and that the sequence and structural features are complementary in binding site prediction. A performance comparison with peer methods on a benchmark dataset shows that aPRBind outperforms some state-of-the-art ab-initio methods. Additionally, aPRBind gives better predictions for modeled structures with TM-score≥0.5. Because the structural features are not very sensitive to the refinement of the 3D structures, aPRBind has only a marginal dependence on the accuracy of the structure model, which allows it to be applied to RNA-binding site prediction for modeled or unbound structures. AVAILABILITY AND IMPLEMENTATION: The source code is available at https://github.com/ChunhuaLiLab/aPRbind. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yang Liu
  - Department of Biomedical Engineering, Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing 100124, China
- Weikang Gong
  - Department of Biomedical Engineering, Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing 100124, China
- Yanpeng Zhao
  - Department of Biomedical Engineering, Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing 100124, China
- Xueqing Deng
  - Department of Biomedical Engineering, Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing 100124, China
- Shan Zhang
  - Department of Biomedical Engineering, Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing 100124, China
- Chunhua Li
  - Department of Biomedical Engineering, Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing 100124, China
|
100
|
Mauri T, Menu-Bouaouiche L, Bardor M, Lefebvre T, Lensink MF, Brysbaert G. O-GlcNAcylation Prediction: An Unattained Objective. Adv Appl Bioinform Chem 2021; 14:87-102. [PMID: 34135600 PMCID: PMC8197665 DOI: 10.2147/aabc.s294867] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 12/04/2020] [Accepted: 04/28/2021] [Indexed: 11/23/2022] Open
Abstract
BACKGROUND: O-GlcNAcylation is an essential post-translational modification (PTM) in mammalian cells. It consists of the addition of an N-acetylglucosamine (GlcNAc) residue onto serines or threonines by an O-GlcNAc transferase (OGT). Inhibition of OGT is lethal, and misregulation of this PTM can lead to diverse pathologies including diabetes, Alzheimer's disease and cancers. Knowing the location of O-GlcNAcylation sites and being able to predict them accurately is therefore of prime importance for a better understanding of this process and its related pathologies. PURPOSE: Here, we present an evaluation of the current predictors of O-GlcNAcylation sites based on a newly built dataset and an investigation to improve predictions. METHODS: Several datasets of experimentally proven O-GlcNAcylated sites were combined, and the resulting meta-dataset was used to evaluate three prediction tools. We further defined a set of new features, following an analysis of the primary to tertiary structures of experimentally proven O-GlcNAcylated sites, in order to improve predictions using different types of machine learning techniques. RESULTS: Our results show that currently available algorithms fail to predict O-GlcNAcylated sites with a precision exceeding 9%. Our efforts to improve the precision with new features using machine learning techniques succeed for equal proportions of O-GlcNAcylated and non-O-GlcNAcylated sites but fail, like the other tools, for real-life proportions where ~1.4% of S/T are O-GlcNAcylated. CONCLUSION: Present-day algorithms for O-GlcNAcylation prediction narrowly outperform random prediction. The inclusion of additional features, in combination with machine learning algorithms, does not enhance these predictions, emphasizing a pressing need for further development. We hypothesize that the improvement of prediction algorithms requires characterization of OGT's partners.
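The gap this abstract reports between balanced and real-life proportions follows from how precision depends on class prevalence. The sketch below computes the expected precision of a classifier from its sensitivity, specificity, and the fraction of truly positive sites. The 1.4% prevalence is taken from the abstract; the sensitivity and specificity values are hypothetical and do not describe any of the evaluated tools.

```python
def precision_at_prevalence(sensitivity, specificity, prevalence):
    """Expected precision (positive predictive value) of a classifier applied
    to data in which a fraction `prevalence` of sites is truly positive."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical classifier, not one of the tools evaluated in the study.
sensitivity, specificity = 0.80, 0.90
balanced = precision_at_prevalence(sensitivity, specificity, prevalence=0.5)
realistic = precision_at_prevalence(sensitivity, specificity, prevalence=0.014)
print(f"balanced: {balanced:.3f}, realistic: {realistic:.3f}")
```

With these assumed rates, precision is about 0.89 on a balanced set but only about 0.10 at 1.4% prevalence, since false positives drawn from the much larger negative class outnumber the true positives.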
Affiliation(s)
- Theo Mauri
  - Univ. Lille, CNRS; UMR8576 - UGSF - Unité de Glycobiologie Structurale et Fonctionnelle, Lille, F-59000, France
- Muriel Bardor
  - Normandy University, UNIROUEN, Laboratoire Glyco-MEV EA4358, Rouen, 76000, France
- Tony Lefebvre
  - Univ. Lille, CNRS; UMR8576 - UGSF - Unité de Glycobiologie Structurale et Fonctionnelle, Lille, F-59000, France
- Marc F Lensink
  - Univ. Lille, CNRS; UMR8576 - UGSF - Unité de Glycobiologie Structurale et Fonctionnelle, Lille, F-59000, France
- Guillaume Brysbaert
  - Univ. Lille, CNRS; UMR8576 - UGSF - Unité de Glycobiologie Structurale et Fonctionnelle, Lille, F-59000, France
|