51
|
Sun Z, Huang S, Zheng L, Liang P, Yang W, Zuo Y. ICTC-RAAC: An improved web predictor for identifying the types of ion channel-targeted conotoxins by using reduced amino acid cluster descriptors. Comput Biol Chem 2020; 89:107371. [PMID: 32950852 DOI: 10.1016/j.compbiolchem.2020.107371] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 07/07/2020] [Revised: 09/01/2020] [Accepted: 09/02/2020] [Indexed: 12/27/2022]
Abstract
Conotoxins are small, disulfide-rich peptide toxins with highly diverse sequences. Correctly identifying the types of ion channel-targeted conotoxins is important because their ability to bind specifically to ion channels and interfere with neural transmission makes them promising pharmacological candidates in drug design. Compared with other feature extraction methods, the reduced amino acid cluster (RAAC) is better at simplifying protein complexity and identifying functionally conserved regions. Thus, in our study, 673 RAACs generated from 74 types of reduced amino acid alphabet were comprehensively assessed to establish a state-of-the-art predictor of ion channel-targeted conotoxins. The results showed that Type 20, Cluster 9 (T = 20, C = 9) with the tripeptide composition (N = 3), which is based on a variance-maximization algorithm for amino acid reduction, achieved the best accuracy, 89.3%. Further, analysis of variance (ANOVA) with incremental feature selection (IFS) was used for feature selection to improve prediction performance. Finally, the cross-validation results showed that the best overall accuracy was 96.4%, which is 1.8% higher than the best accuracy of previous studies. Based on the proposed predictor, a user-friendly web server was established and can be accessed at http://bioinfor.imu.edu.cn/ictcraac.
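The reduced-alphabet tripeptide composition described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the nine clusters in `CLUSTERS` are a hypothetical grouping chosen for the example and are not the paper's Type 20, Cluster 9 alphabet, and the function names are assumptions.

```python
from itertools import product

# Hypothetical 9-cluster reduction of the 20 amino acids (illustration only,
# NOT the Type 20 / Cluster 9 alphabet reported in the paper).
CLUSTERS = ["AG", "C", "DE", "FWY", "HKR", "ILMV", "NQ", "P", "ST"]

# Map each residue to a one-character cluster label ("0" .. "8").
RESIDUE_TO_CLUSTER = {
    aa: str(i) for i, cluster in enumerate(CLUSTERS) for aa in cluster
}


def reduce_sequence(seq):
    """Rewrite a peptide in the reduced alphabet, skipping unknown residues."""
    return "".join(
        RESIDUE_TO_CLUSTER[aa] for aa in seq.upper() if aa in RESIDUE_TO_CLUSTER
    )


def npeptide_composition(seq, n=3):
    """Return the frequency of every length-n word over the reduced alphabet."""
    reduced = reduce_sequence(seq)
    symbols = [str(i) for i in range(len(CLUSTERS))]
    counts = {"".join(word): 0 for word in product(symbols, repeat=n)}
    total = max(len(reduced) - n + 1, 0)
    for i in range(total):
        counts[reduced[i:i + n]] += 1
    return [c / total if total else 0.0 for c in counts.values()]


# Example peptide (made up for illustration).
features = npeptide_composition("GCCSDPRCAWRC", n=3)
print(len(features))  # 9 ** 3 possible reduced tripeptides
```

With 9 clusters and N = 3 the vector has 9^3 = 729 entries instead of the 20^3 = 8000 needed for the full alphabet, which is the dimensionality saving that motivates the reduction. A feature-selection step such as ANOVA with IFS would then operate on these vectors.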
Affiliation(s)
- Zijie Sun
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China; School of Mathematical Sciences, Inner Mongolia University, Hohhot, 010021, China
- Shenghui Huang
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China
- Lei Zheng
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China
- Pengfei Liang
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China
- Wuritu Yang
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China
- Yongchun Zuo
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China
|
52
|
Li Q, Zhou W, Wang D, Wang S, Li Q. Prediction of Anticancer Peptides Using a Low-Dimensional Feature Model. Front Bioeng Biotechnol 2020; 8:892. [PMID: 32903381 PMCID: PMC7434836 DOI: 10.3389/fbioe.2020.00892] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 05/25/2020] [Accepted: 07/10/2020] [Indexed: 01/09/2023] Open
Abstract
Cancer remains a severe health problem globally. Cancer therapy traditionally relies on radiotherapy or anticancer drugs to kill cancer cells, but these methods are expensive and have side effects that can cause great harm to patients. With the discovery of anticancer peptides (ACPs), significant progress has been achieved in tumor therapy. It is therefore valuable to identify anticancer peptides accurately. Although biochemical experiments can accomplish this, they are expensive and time-consuming. To promote the application of anticancer peptides in cancer therapy, machine learning can be used to recognize them by extracting feature vectors from their sequences. In practice, however, machine learning models trained on high-dimensional features often perform poorly. To address this problem, this paper puts forward a 19-dimensional feature model based on anticancer peptide sequences, which has lower dimensionality and better performance than some existing methods. In addition, this paper also isolates a model with a low number of dimensions and acceptable performance. The few features identified in this study may represent the important features of anticancer peptides.
Affiliation(s)
- Qingwen Li
  - College of Animal Science and Technology, Northeast Agricultural University, Harbin, China
- Wenyang Zhou
  - Center for Bioinformatics, School of Life Sciences and Technology, Harbin Institute of Technology, Harbin, China
- Donghua Wang
  - Department of General Surgery, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, China
- Sui Wang
  - Key Laboratory of Soybean Biology in Chinese Ministry of Education, Northeast Agricultural University, Harbin, China
  - State Key Laboratory of Tree Genetics and Breeding, Northeast Forestry University, Harbin, China
- Qingyuan Li
  - Forestry and Fruit Tree Research Institute, Wuhan Academy of Agricultural Sciences, Wuhan, China
|
53
|
Xu Z, Shen D, Nie T, Kou Y. A hybrid sampling algorithm combining M-SMOTE and ENN based on Random forest for medical imbalanced data. J Biomed Inform 2020; 107:103465. [DOI: 10.1016/j.jbi.2020.103465] [Citation(s) in RCA: 54] [Impact Index Per Article: 10.8] [Received: 05/22/2019] [Revised: 05/10/2020] [Accepted: 05/31/2020] [Indexed: 02/04/2023]
|
54
|
Zhang D, Guan ZX, Zhang ZM, Li SH, Dao FY, Tang H, Lin H. Recent Development of Computational Predicting Bioluminescent Proteins. Curr Pharm Des 2020; 25:4264-4273. [PMID: 31696804 DOI: 10.2174/1381612825666191107100758] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 09/18/2019] [Accepted: 11/04/2019] [Indexed: 12/22/2022]
Abstract
Bioluminescent proteins (BLPs) are widely distributed among living organisms and play a key role in light emission during bioluminescence. Bioluminescence serves various functions, such as finding food and protecting organisms from predators. With its routine biotechnological application, bioluminescence is recognized as essential for many medical, commercial and other general technological advances. Therefore, the prediction and characterization of BLPs are significant: they can help reveal more about bioluminescence and promote the development of its applications. Because experimental methods for BLP identification are costly and time-consuming, bioinformatics tools have played an important role in the fast and accurate prediction of BLPs by combining sequence information with machine learning methods. In this review, we summarize and compare the application of machine learning methods in the prediction of BLPs from different aspects. We hope that this review will provide insights and inspiration for research on BLPs.
Affiliation(s)
- Dan Zhang
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Zheng-Xing Guan
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Zi-Mei Zhang
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Shi-Hao Li
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Fu-Ying Dao
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hua Tang
  - Department of Pathophysiology, Southwest Medical University, Luzhou 646000, China
- Hao Lin
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
|
55
|
Feng P, Wang Z. Recent Advances in Computational Methods for Identifying Anticancer Peptides. Curr Drug Targets 2020; 20:481-487. [PMID: 30068270 DOI: 10.2174/1389450119666180801121548] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 05/10/2018] [Revised: 05/28/2018] [Accepted: 05/28/2018] [Indexed: 01/10/2023]
Abstract
Anticancer peptides (ACPs) are small peptides that can kill cancer cells without damaging normal cells. In recent years, ACPs have been used pre-clinically for cancer treatment. Therefore, accurate identification of ACPs will promote their clinical application. As an alternative to labor-intensive experimental techniques, a series of computational methods have been proposed for identifying ACPs. In this review, we briefly summarize the current progress in the computational identification of ACPs. The challenges and future perspectives in developing reliable methods for identifying ACPs are also discussed. We anticipate that this review will provide novel insights for future research on anticancer peptides.
Affiliation(s)
- Pengmian Feng
  - School of Public Health, North China University of Science and Technology, Tangshan, 063000, China
- Zhenyi Wang
  - Center for Genomics and Computational Biology, School of Life Science, North China University of Science and Technology, Tangshan, 063000, China
|
56
|
Wang L, Zhang R. Towards Computational Models of Identifying Protein Ubiquitination Sites. Curr Drug Targets 2020; 20:565-578. [PMID: 30246637 DOI: 10.2174/1389450119666180924150202] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 07/31/2018] [Revised: 08/29/2018] [Accepted: 09/04/2018] [Indexed: 12/25/2022]
Abstract
Ubiquitination is an important post-translational modification (PTM) process for the regulation of protein functions, which is associated with cancer, cardiovascular and other diseases. Recent initiatives have focused on the detection of potential ubiquitination sites with the aid of physicochemical test approaches in conjunction with the application of computational methods. The identification of ubiquitination sites using laboratory tests is especially susceptible to the temporality and reversibility of the ubiquitination processes, and is also costly and time-consuming. It has been demonstrated that computational methods are effective in extracting potential rules or inferences from biological sequence collections. Up to the present, the computational strategy has been one of the critical research approaches that have been applied for the identification of ubiquitination sites, and currently, there are numerous state-of-the-art computational methods that have been developed from machine learning and statistical analysis to undertake such work. In the present study, the construction of benchmark datasets is summarized, together with feature representation methods, feature selection approaches and the classifiers involved in several previous publications. In an attempt to explore pertinent development trends for the identification of ubiquitination sites, an independent test dataset was constructed and the predicting results obtained from five prediction tools are reported here, together with some related discussions.
Affiliation(s)
- Lidong Wang
  - College of Science, Dalian Maritime University, Dalian, China
- Ruijun Zhang
  - College of Science, Dalian Maritime University, Dalian, China
|
57
|
Abstract
Background: Protein phosphorylation is one of the most important post-translational modifications (PTMs), occurring at the amino acid residues serine (S), threonine (T), and tyrosine (Y). It plays critical roles in protein structure and function. With the development of novel high-throughput sequencing technologies, a huge number of protein sequences are being generated and stored in databases.
Objective: It is of great importance in both basic research and drug development to quickly and accurately predict which S, T, or Y residues can be phosphorylated.
Methods: To solve this problem, a novel hybrid deep learning model combining a convolutional neural network and a bi-directional long short-term memory recurrent neural network (CNN+BLSTM) is proposed for predicting phosphorylation sites in proteins. The model contains a series of layers that transform the input data into an output class: the convolution layer captures higher-level abstract features of amino acids, while the recurrent layer captures long-term dependencies between amino acids to improve predictions. The joint model learns interactions between higher-level features derived from the protein sequence to predict the phosphorylated sites.
Results: We compared our model with two canonical methods, namely iPhos-PseEn and MusiteDeep. A 5-fold cross-validation indicated that CNN+BLSTM outperforms the two competitors on various evaluation metrics, including the areas under the receiver operating characteristic and precision-recall curves, the Matthews correlation coefficient, F-measure, and accuracy.
Conclusion: CNN+BLSTM is promising for identifying potential protein phosphorylation sites for further experimental validation.
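A sequence model of this kind needs each candidate S, T, or Y residue presented as a fixed-size input. One common way to prepare such input, sketched below under stated assumptions, is to cut a window around every candidate residue and one-hot encode it. The window half-width `flank=7`, the padding symbol `X`, and the function names are assumptions for illustration; the abstract does not specify how the inputs were built.

```python
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def candidate_windows(seq, flank=7, pad="X"):
    """Return (1-based position, residue, window) for every S, T or Y.

    The sequence is padded on both sides so that residues near the termini
    still yield windows of length 2 * flank + 1.
    """
    seq = seq.upper()
    padded = pad * flank + seq + pad * flank
    windows = []
    for pos, aa in enumerate(seq):
        if aa in "STY":
            windows.append((pos + 1, aa, padded[pos:pos + 2 * flank + 1]))
    return windows


def one_hot(window):
    """Encode a window as a list of 20-dimensional 0/1 vectors.

    Padding or non-standard residues become all-zero rows.
    """
    return [[1 if aa == ref else 0 for ref in ALPHABET] for aa in window]


for position, residue, window in candidate_windows("MSTA", flank=2):
    print(position, residue, window)
```

Each encoded window is a (2 * flank + 1) x 20 matrix, a shape that a convolutional layer followed by a recurrent layer can consume directly.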
Affiliation(s)
- Haixia Long
  - Department of Information Science and Technology, Hainan Normal University, Haikou, Hainan 571158, China
- Zhao Sun
  - Department of Acupuncture, First Affiliated Hospital of Hainan Medical College, Hainan Medical University, Haikou, Hainan 571199, China
- Manzhi Li
  - Department of Mathematics and Statistics, Hainan Normal University, Haikou, Hainan 571158, China
- Hai Yan Fu
  - Department of Information Science and Technology, Hainan Normal University, Haikou, Hainan 571158, China
- Ming Cai Lin
  - Department of National University Science Park, Hainan Normal University, Haikou, Hainan 571158, China
|
58
|
Zuo Y, Zou Q, Lin J, Jiang M, Liu X. 2lpiRNApred: a two-layered integrated algorithm for identifying piRNAs and their functions based on LFE-GM feature selection. RNA Biol 2020; 17:892-902. [PMID: 32138598 PMCID: PMC7549647 DOI: 10.1080/15476286.2020.1734382] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 09/05/2019] [Revised: 12/16/2019] [Accepted: 02/18/2020] [Indexed: 12/18/2022] Open
Abstract
Piwi-interacting RNAs (piRNAs) are indispensable for transposon silencing, including during germ cell formation, germline stem cell maintenance, spermatogenesis, and oogenesis. piRNA pathways are among the major genome defence mechanisms that maintain genome integrity. They also have important functions in tumorigenesis, as aberrantly expressed piRNAs have recently been shown to play roles in cancer development. A number of computational methods for identifying piRNAs have recently been proposed, but they have not yet yielded satisfactory predictive performance. Moreover, only one computational method that identifies whether piRNAs function in inducing target mRNA deadenylation has been reported in the literature. In this study, we developed a two-layered integrated classifier algorithm, 2lpiRNApred. It identifies piRNAs in the first layer and determines whether they function in inducing target mRNA deadenylation in the second layer. A new feature selection algorithm based on Luca fuzzy entropy and the Gaussian membership function (LFE-GM) was proposed to reduce the dimensionality of the features. Five feature extraction strategies (Kmer, General parallel correlation pseudo-dinucleotide composition, General series correlation pseudo-dinucleotide composition, Normalized Moreau-Broto autocorrelation, and Geary autocorrelation) and two types of classifier (Sparse Representation Classifier (SRC) and support vector machine with Mahalanobis distance-based radial basis function (SVMMDRBF)) were used to construct 2lpiRNApred. The results indicate that 2lpiRNApred performs significantly better than six other existing prediction tools.
Affiliation(s)
- Yun Zuo
  - Department of Computer Science, Xiamen University, Xiamen, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, China
- Jianyuan Lin
  - Department of Computer Science, Xiamen University, Xiamen, China
- Min Jiang
  - Department of Cognitive Science and Technology, Xiamen University, Xiamen, China
- Xiangrong Liu
  - Department of Computer Science, Xiamen University, Xiamen, China
|
59
|
Ju Z, Wang SY. Prediction of 2-hydroxyisobutyrylation sites by integrating multiple sequence features with ensemble support vector machine. Comput Biol Chem 2020; 87:107280. [PMID: 32505881 DOI: 10.1016/j.compbiolchem.2020.107280] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 06/30/2019] [Revised: 05/05/2020] [Accepted: 05/07/2020] [Indexed: 10/24/2022]
Abstract
Lysine 2-hydroxyisobutyrylation (Khib) is a new type of histone mark that has been found to affect the association between histone and DNA. To better understand the molecular mechanism of Khib, it is important to identify 2-hydroxyisobutyrylated substrates and their corresponding Khib sites accurately. In this study, a novel bioinformatics tool named KhibPred is proposed to predict Khib sites in human HeLa cells. Three kinds of effective features (the composition of k-spaced amino acid pairs, binary encoding and amino acid factors) are incorporated to encode Khib sites. Moreover, an ensemble support vector machine is employed to overcome the class imbalance problem in the prediction. In 10-fold cross-validation, KhibPred achieves satisfactory performance, with an area under the receiver operating characteristic curve of 0.7937. Therefore, KhibPred can be a useful tool for predicting protein Khib sites. Feature analysis shows that the polarity factor features play significant roles in the prediction of Khib sites. The conclusions derived from this study might provide useful insights for in-depth investigation into the molecular mechanisms of Khib.
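The composition of k-spaced amino acid pairs mentioned in this abstract can be sketched as below. This is an illustrative version rather than KhibPred's code: the maximum gap `k_max=2`, the normalisation by the number of valid pairs, and the function name are assumptions.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order.
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def cksaap(fragment, k_max=2):
    """Composition of k-spaced amino acid pairs for k = 0 .. k_max.

    For each gap k, count pairs (fragment[i], fragment[i + k + 1]) and
    normalise by the number of pairs made of standard residues.
    Returns a vector of length 400 * (k_max + 1).
    """
    fragment = fragment.upper()
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        total = 0
        for i in range(len(fragment) - k - 1):
            pair = fragment[i] + fragment[i + k + 1]
            if pair in counts:  # skip pairs containing padding or unknowns
                counts[pair] += 1
                total += 1
        features.extend(counts[p] / total if total else 0.0 for p in PAIRS)
    return features


vec = cksaap("AKA", k_max=1)
print(len(vec))  # 400 features for k = 0 plus 400 for k = 1
```

For the toy fragment `AKA`, the k = 0 block contains `AK` and `KA` at frequency 0.5 each, and the k = 1 block contains only `AA`.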
Affiliation(s)
- Zhe Ju
  - College of Science, Shenyang Aerospace University, 110136, People's Republic of China
- Shi-Yun Wang
  - College of Science, Shenyang Aerospace University, 110136, People's Republic of China
|
60
|
A Novel Triple Matrix Factorization Method for Detecting Drug-Side Effect Association Based on Kernel Target Alignment. BIOMED RESEARCH INTERNATIONAL 2020; 2020:4675395. [PMID: 32596314 PMCID: PMC7275954 DOI: 10.1155/2020/4675395] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 03/15/2020] [Accepted: 04/08/2020] [Indexed: 01/01/2023]
Abstract
Drugs usually have side effects, which endanger the health of patients. Biological and pharmacological experiments can identify potential side effects of drugs, but they are expensive and time-consuming. Computation-based methods have therefore been developed to predict side effects accurately and quickly. To predict potential associations between drugs and side effects, we propose a novel method called the Triple Matrix Factorization- (TMF-) based model. TMF is built from the biprojection matrix and the latent features of kernels, and is based on Low Rank Approximation (LRA). LRA constructs a lower-rank matrix to approximate the original matrix, which not only retains the characteristics of the original matrix but also reduces the storage space and computational complexity of the data. To fuse multivariate information, multiple kernel matrices are constructed and integrated via Kernel Target Alignment-based Multiple Kernel Learning (KTA-MKL) in the drug space and the side effect space, respectively. Compared with other methods, our model achieves better performance on three benchmark datasets. The values of the Area Under the Precision-Recall curve (AUPR) are 0.677, 0.685, and 0.680 on the three datasets, respectively.
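For readers unfamiliar with the two building blocks named in this abstract, the standard textbook definitions are given below. They are general definitions of low-rank approximation and kernel alignment, not the paper's specific TMF objective, and the symbols (Y, K, K*, r) are chosen here for illustration.

```latex
% Best rank-r approximation of an m x n matrix Y in the Frobenius norm,
% obtained from the truncated singular value decomposition:
\hat{Y}_r \;=\; \arg\min_{\operatorname{rank}(X) \le r} \lVert Y - X \rVert_F
          \;=\; \sum_{i=1}^{r} \sigma_i \, u_i v_i^{\top}

% Alignment between a kernel matrix K and an ideal (target) kernel K^{*}:
A(K, K^{*}) \;=\; \frac{\langle K, K^{*} \rangle_F}
                       {\lVert K \rVert_F \, \lVert K^{*} \rVert_F},
\qquad
\langle P, Q \rangle_F \;=\; \operatorname{tr}\!\left(P^{\top} Q\right)
```

Storing the rank-r factors takes r(m + n + 1) numbers instead of mn, which is the storage saving the abstract refers to. In alignment-based multiple kernel learning, kernels that align better with the target typically receive larger weights in the combined kernel.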
|
61
|
Li F, Fan C, Marquez-Lago TT, Leier A, Revote J, Jia C, Zhu Y, Smith AI, Webb GI, Liu Q, Wei L, Li J, Song J. PRISMOID: a comprehensive 3D structure database for post-translational modifications and mutations with functional impact. Brief Bioinform 2020; 21:1069-1079. [PMID: 31161204 PMCID: PMC7299293 DOI: 10.1093/bib/bbz050] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 02/21/2019] [Revised: 03/26/2019] [Accepted: 03/29/2019] [Indexed: 12/26/2022] Open
Abstract
Post-translational modifications (PTMs) play very important roles in various cell signaling pathways and biological processes. Because of these roles, many major PTMs have been studied, and their functional and mechanical characterization is well documented in several databases. However, most currently available databases focus mainly on protein sequences, while the real 3D structures of PTMs have been largely ignored. Therefore, studies of the 3D structural signatures of PTMs have been severely limited by a lack of data. Here, we develop PRISMOID, a novel, publicly available and free 3D structure database for a wide range of PTMs. PRISMOID represents an up-to-date and interactive online knowledge base with a specific focus on the 3D structural contexts of PTM sites and on mutations with functional impact that occur on PTMs and in the close proximity of PTM sites. The first version of PRISMOID encompasses 17 145 non-redundant modification sites on 3919 related protein 3D structure entries pertaining to 37 different types of PTMs. Our entry web page is organized in a comprehensive manner, including detailed PTM annotation on the 3D structure and biological information in terms of mutations affecting PTMs, secondary structure features and per-residue solvent accessibility features of PTM sites, domain context, predicted natively disordered regions and sequence alignments. In addition, high-definition JavaScript packages are employed to enhance information visualization in PRISMOID. PRISMOID offers a variety of interactive and customizable search options and data browsing functions; these capabilities allow users to access data via keyword, ID and advanced option combination searches in an efficient and user-friendly way. A download page is also provided to enable users to download the SQL file, computational structural features and PTM site data.
We anticipate PRISMOID will swiftly become an invaluable online resource, assisting both biologists and bioinformaticians to conduct experiments and develop applications supporting discovery efforts in the sequence-structural-functional relationship of PTMs and providing important insight into mutations and PTM sites interaction mechanisms. The PRISMOID database is freely accessible at http://prismoid.erc.monash.edu/. The database and web interface are implemented in MySQL, JSP, JavaScript and HTML with all major browsers supported.
Affiliation(s)
- Fuyi Li
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria, Australia
  - Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC, Australia
- Cunshuo Fan
  - College of Information Engineering, Northwest A&F University, Yangling, China
- Tatiana T Marquez-Lago
  - Department of Genetics and Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA
- André Leier
  - Department of Genetics and Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA
- Jerico Revote
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria, Australia
- Cangzhi Jia
  - College of Science, Dalian Maritime University, Dalian, China
  - School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore
- Yan Zhu
  - Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, Victoria, Australia
- A Ian Smith
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria, Australia
- Geoffrey I Webb
  - Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC, Australia
- Quanzhong Liu
  - College of Information Engineering, Northwest A&F University, Yangling, China
- Leyi Wei
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Jian Li
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria, Australia
- Jiangning Song
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria, Australia
  - Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC, Australia
|
62
|
Wei H, Xu Y, Liu B. iPiDi-PUL: identifying Piwi-interacting RNA-disease associations based on positive unlabeled learning. Brief Bioinform 2020; 22:5829704. [PMID: 32393982 DOI: 10.1093/bib/bbaa058] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 02/16/2020] [Revised: 03/15/2020] [Accepted: 03/24/2020] [Indexed: 12/20/2022] Open
Abstract
Accumulating research has revealed that Piwi-interacting RNAs (piRNAs) regulate the development of germ and stem cells and are closely associated with the progression of many diseases. As the number of detected piRNAs is increasing rapidly, it is important to identify new piRNA-disease associations computationally at low cost and to provide candidate piRNA targets for disease treatment. However, it is challenging to learn effective association patterns from the positive piRNA-disease associations and the large number of unknown piRNA-disease pairs. In this study, we propose a computational predictor called iPiDi-PUL to identify piRNA-disease associations. iPiDi-PUL extracts the features of piRNA-disease associations from three biological data sources: piRNA sequence information, disease semantic terms and the available piRNA-disease association network. Principal component analysis (PCA) is then performed on these features to extract the key features. The training datasets are constructed from known positive associations and negative associations selected from the unknown pairs. Random forest classifiers trained on these different training sets are merged via an ensemble learning approach to give the predictive results. Finally, the web server of iPiDi-PUL was established at http://bliulab.net/iPiDi-PUL to help researchers explore the diseases associated with newly discovered piRNAs.
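The positive-unlabeled strategy outlined in this abstract, several training sets that share the known positives but draw different negatives from the unknown pairs, followed by an ensemble over the resulting models, can be sketched as below. This is a schematic under stated assumptions rather than the iPiDi-PUL code: equal-sized random negative samples, the number of sets, score averaging, and all function names are illustrative, and any classifier (the source uses random forests) would take the place of the placeholder callables.

```python
import random


def build_training_sets(positives, unlabeled, n_sets=5, seed=0):
    """Create n_sets labelled training sets for positive-unlabeled learning.

    Every set contains all known positives (label 1) plus a different random
    sample of unlabeled pairs treated as negatives (label 0).
    """
    rng = random.Random(seed)
    training_sets = []
    for _ in range(n_sets):
        negatives = rng.sample(unlabeled, k=len(positives))
        labelled = [(x, 1) for x in positives] + [(x, 0) for x in negatives]
        training_sets.append(labelled)
    return training_sets


def ensemble_score(models, x):
    """Average the scores of models trained on the different training sets.

    Each model is any callable returning a score in [0, 1].
    """
    return sum(model(x) for model in models) / len(models)


# Toy piRNA-disease pairs (identifiers are made up for illustration).
known = [("piR-1", "disease-A"), ("piR-2", "disease-B")]
unknown = [("piR-%d" % i, "disease-C") for i in range(3, 9)]
sets = build_training_sets(known, unknown, n_sets=5)
print(len(sets), len(sets[0]))
```

Drawing a fresh negative sample for each base learner limits the damage done by any true association that happens to be hidden among the unlabeled pairs, since it can mislead only the learners whose sample contains it.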
|
63
|
Wang S, Cao Z, Li M, Yue Y. G-DipC: An Improved Feature Representation Method for Short Sequences to Predict the Type of Cargo in Cell-Penetrating Peptides. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020; 17:739-747. [PMID: 31352350 DOI: 10.1109/tcbb.2019.2930993] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 06/10/2023]
Abstract
Cell-penetrating peptides (CPPs) are functional short peptides with high carrying capacity. CPP sequences with targeting functions enable the highly efficient delivery of drugs to target cells. This paper focuses on predicting the cargo category of CPPs: a biocomputational model is constructed to efficiently distinguish, among the seven known deliverable cargo categories, the category of cargo carried by CPPs acting as macromolecular carriers. Based on dipeptide composition (DipC), an improved feature representation method, general dipeptide composition (G-DipC), is proposed for short peptide sequences; it can effectively increase the abundance of the features represented. Linear discriminant analysis (LDA) is then applied to mine important low-dimensional features of G-DipC, and a predictive model is built with the XGBoost algorithm. Experimental results with five-fold cross-validation show that G-DipC improves accuracy by 25 and 5 percent compared with amino acid composition (AAC) and DipC, respectively. G-DipC is even found to be better than tripeptide composition (TipC). Thus, the proposed model provides a novel resource for the study of cell-penetrating peptides, and the improved dipeptide composition G-DipC can be widely adapted to the feature representation of other biological sequences.
|
64
|
Hou R, Wang L, Wu YJ. Predicting ATP-Binding Cassette Transporters Using the Random Forest Method. Front Genet 2020; 11:156. [PMID: 32269586 PMCID: PMC7109328 DOI: 10.3389/fgene.2020.00156] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 12/26/2019] [Accepted: 02/11/2020] [Indexed: 12/21/2022] Open
Abstract
ATP-binding cassette (ABC) proteins play important roles in a wide variety of species. These proteins are involved in absorbing nutrients, exporting toxic substances, and regulating potassium channels, and they contribute to drug resistance in cancer cells. Therefore, the identification of ABC transporters is an urgent task. The present study used 188D, which is based on sequence information and physicochemical properties, as the feature extraction method. We also visualized the extracted features with t-Distributed Stochastic Neighbor Embedding (t-SNE); the samples can be separated on the basis of the features extracted by 188D. Further, random forest (RF) is an efficient classifier for identifying proteins. Under 10-fold cross-validation of the proposed model on the training set, the average accuracy over the 10 folds was 89.54%. We obtained values of 0.87 for specificity, 0.92 for sensitivity, and 0.79 for MCC. On the testing set, the accuracy achieved was 89%. These results suggest that the model combining 188D with RF is an optimal tool for identifying ABC transporters.
Affiliation(s)
- Ruiyan Hou
- Laboratory of Molecular Toxicology, State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing, China; College of Life Science, University of Chinese Academy of Sciences, Beijing, China
| | - Lida Wang
- Department of Scientific Research, General Hospital of Heilongjiang Province Land Reclamation Bureau, Harbin, China
| | - Yi-Jun Wu
- Laboratory of Molecular Toxicology, State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
| |
|
65
|
Liu T, Tang H. A Brief Survey of Machine Learning Methods in Identification of Mitochondria Proteins in Malaria Parasite. Curr Pharm Des 2020; 26:3049-3058. [PMID: 32156226 DOI: 10.2174/1381612826666200310122324] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 01/01/2020] [Accepted: 02/10/2020] [Indexed: 11/22/2022]
Abstract
The number of human deaths caused by malaria is increasing day by day. The mitochondrial proteins of the malaria parasite play vital roles in the organism. To develop effective drugs and vaccines against infection, it is necessary to accurately identify mitochondrial proteins of the malaria parasite. Although biochemical experiments can provide precise details about mitochondrial proteins, they are expensive and time-consuming. In this review, we summarize the machine learning-based methods for identifying mitochondrial proteins in the malaria parasite and compare the construction strategies of these computational methods. Finally, we discuss the future development of algorithms for mitochondrial protein recognition.
Affiliation(s)
- Ting Liu
- Department of Pathophysiology, Key Laboratory of Medical Electrophysiology, Ministry of Education, Southwest Medical University, Luzhou 646000, China
| | - Hua Tang
- Department of Pathophysiology, Key Laboratory of Medical Electrophysiology, Ministry of Education, Southwest Medical University, Luzhou 646000, China
| |
|
66
|
iDHS-DSAMS: Identifying DNase I hypersensitive sites based on the dinucleotide property matrix and ensemble bagged tree. Genomics 2020; 112:1282-1289. [DOI: 10.1016/j.ygeno.2019.07.017] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 06/15/2019] [Revised: 07/14/2019] [Accepted: 07/30/2019] [Indexed: 11/21/2022]
|
67
|
Ru X, Wang L, Li L, Ding H, Ye X, Zou Q. Exploration of the correlation between GPCRs and drugs based on a learning to rank algorithm. Comput Biol Med 2020; 119:103660. [PMID: 32090901 DOI: 10.1016/j.compbiomed.2020.103660] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 12/18/2019] [Revised: 02/04/2020] [Accepted: 02/12/2020] [Indexed: 02/01/2023]
Abstract
Exploring protein-drug correlations can not only solve the problem of selecting candidate compounds but also address related problems such as drug redirection and finding potential drug targets. Therefore, many researchers have proposed different machine learning methods for the prediction of protein-drug correlations. However, many existing models simply divide the protein-drug relationship into related or irrelevant categories and do not deeply explore the most relevant target (or drug) for a given drug (or target). To solve this problem, this paper applies the ranking concept to the prediction of GPCR (G protein-coupled receptor)-drug correlations. This study uses two different types of data sets to explore the candidate compound and potential target problems, and both sets achieved good results. In addition, this study found that the family to which a protein belongs is not an inherent factor affecting the ranking of GPCR-drug correlations; however, if the drug affects other family members of the protein, then the protein is likely to be a potential target of the drug. This study shows that the learning to rank algorithm is a good tool for exploring protein-drug correlations.
Affiliation(s)
- Xiaoqing Ru
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China; School of Information and Electrical Engineering, Hebei University of Engineering, Handan, China
| | - Lida Wang
- Scientific Research Department, Heilongjiang Agricultural Reclamation General Hospital, Harbin, China.
| | - Lihong Li
- School of Information and Electrical Engineering, Hebei University of Engineering, Handan, China
| | - Hui Ding
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Xiucai Ye
- Department of Computer Science, University of Tsukuba, Tsukuba Science City, Japan
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China; Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China.
| |
|
68
|
Zhao X, Jiao Q, Li H, Wu Y, Wang H, Huang S, Wang G. ECFS-DEA: an ensemble classifier-based feature selection for differential expression analysis on expression profiles. BMC Bioinformatics 2020; 21:43. [PMID: 32024464 PMCID: PMC7003361 DOI: 10.1186/s12859-020-3388-y] [Citation(s) in RCA: 61] [Impact Index Per Article: 12.2] [Received: 12/03/2019] [Accepted: 01/27/2020] [Indexed: 11/27/2022] Open
Abstract
Background: Various methods for differential expression analysis have been widely used to identify features which best distinguish between different categories of samples. Multiple hypothesis testing may leave out explanatory features, each of which may be composed of individually insignificant variables. Multivariate hypothesis testing holds a non-mainstream position, considering the large computation overhead of large-scale matrix operations. Random forest provides a classification strategy for calculating variable importance. However, it may be unsuitable for different distributions of samples. Results: Based on the idea of using an ensemble classifier, we develop a feature selection tool for differential expression analysis on expression profiles (ECFS-DEA for short). Considering the differences in sample distribution, a graphical user interface is designed to allow the selection of different base classifiers. Inspired by random forest, a common measure applicable to any base classifier is proposed for calculating variable importance. After an interactive selection of a feature on sorted individual variables, a projection heatmap is presented using k-means clustering. An ROC curve is also provided; both can intuitively demonstrate the effectiveness of the selected feature. Conclusions: Feature selection through ensemble classifiers helps to select important variables and thus is applicable to different sample distributions. Experiments on simulated and real data demonstrate the effectiveness of ECFS-DEA for differential expression analysis on expression profiles. The software is available at http://bio-nefu.com/resource/ecfs-dea.
Affiliation(s)
- Xudong Zhao
- College of Information and Computer Engineering, Northeast Forestry University, No.26 Hexing Road, Harbin, 150040, China
| | - Qing Jiao
- College of Information and Computer Engineering, Northeast Forestry University, No.26 Hexing Road, Harbin, 150040, China
| | - Hangyu Li
- College of Information and Computer Engineering, Northeast Forestry University, No.26 Hexing Road, Harbin, 150040, China
| | - Yiming Wu
- College of Information and Computer Engineering, Northeast Forestry University, No.26 Hexing Road, Harbin, 150040, China
| | - Hanxu Wang
- College of Information and Computer Engineering, Northeast Forestry University, No.26 Hexing Road, Harbin, 150040, China
| | - Shan Huang
- Department of Neurology, The 2nd Affiliated Hospital of Harbin Medical University, No. 246 Xuefu Road, Harbin, 150086, China
| | - Guohua Wang
- College of Information and Computer Engineering, Northeast Forestry University, No.26 Hexing Road, Harbin, 150040, China; State Key Laboratory of Tree Genetics and Breeding, Northeast Forestry University, No.26 Hexing Road, Harbin, 150040, China.
| |
|
69
|
Li F, Leier A, Liu Q, Wang Y, Xiang D, Akutsu T, Webb GI, Smith AI, Marquez-Lago T, Li J, Song J. Procleave: Predicting Protease-specific Substrate Cleavage Sites by Combining Sequence and Structural Information. GENOMICS, PROTEOMICS & BIOINFORMATICS 2020; 18:52-64. [PMID: 32413515 PMCID: PMC7393547 DOI: 10.1016/j.gpb.2019.08.002] [Citation(s) in RCA: 64] [Impact Index Per Article: 12.8] [Received: 04/30/2019] [Revised: 08/08/2019] [Accepted: 10/23/2019] [Indexed: 10/29/2022]
Abstract
Proteases are enzymes that cleave and hydrolyse the peptide bonds between two specific amino acid residues of target substrate proteins. Protease-controlled proteolysis plays a key role in the degradation and recycling of proteins, which is essential for various physiological processes. Thus, solving the substrate identification problem will have important implications for the precise understanding of functions and physiological roles of proteases, as well as for therapeutic target identification and pharmaceutical applicability. Consequently, there is a great demand for bioinformatics methods that can predict novel substrate cleavage events with high accuracy by utilizing both sequence and structural information. In this study, we present Procleave, a novel bioinformatics approach for predicting protease-specific substrates and specific cleavage sites by taking into account both their sequence and 3D structural information. Structural features of known cleavage sites were represented by discrete values using a LOWESS data-smoothing optimization method, which turned out to be critical for the performance of Procleave. The optimal approximations of all structural parameter values were encoded in a conditional random field (CRF) computational framework, alongside sequence and chemical group-based features. Here, we demonstrate the outstanding performance of Procleave through extensive benchmarking and independent tests. Procleave is capable of correctly identifying most cleavage sites in the case study. Importantly, when applied to the human structural proteome encompassing 17,628 protein structures, Procleave suggests a number of potential novel target substrates and their corresponding cleavage sites of different proteases. Procleave is implemented as a webserver and is freely accessible at http://procleave.erc.monash.edu/.
Affiliation(s)
- Fuyi Li
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia; Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
| | - Andre Leier
- School of Medicine, University of Alabama at Birmingham, Birmingham, AL 35233, USA
| | - Quanzhong Liu
- College of Information Engineering, Northwest A&F University, Yangling 712100, China
| | - Yanan Wang
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia; Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
| | - Dongxu Xiang
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia; College of Information Engineering, Northwest A&F University, Yangling 712100, China
| | - Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan
| | - Geoffrey I Webb
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
| | - A Ian Smith
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia; ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia
| | - Tatiana Marquez-Lago
- School of Medicine, University of Alabama at Birmingham, Birmingham, AL 35233, USA.
| | - Jian Li
- Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, VIC 3800, Australia.
| | - Jiangning Song
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia; Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia; ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia.
| |
|
70
|
Zhao T, Wang D, Hu Y, Zhang N, Zang T, Wang Y. Identifying Alzheimer’s Disease-related miRNA Based on Semi-clustering. Curr Gene Ther 2019; 19:216-223. [DOI: 10.2174/1566523219666190924113737] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 04/30/2019] [Revised: 06/05/2019] [Accepted: 06/12/2019] [Indexed: 01/14/2023]
Abstract
Background:
More and more scholars are trying to use miRNAs as specific biomarkers for Alzheimer’s Disease (AD) and mild cognitive impairment (MCI). Multiple studies have indicated that miRNAs are associated with poor axonal growth and loss of synaptic structures, both of which are early events in AD. The overall loss of miRNA may be associated with aging, increasing the incidence of AD, and may also be involved in the disease through some specific molecular mechanisms.
Objective:
Identifying Alzheimer’s disease-related miRNAs can help us find new drug targets and support early diagnosis.
Materials and Methods:
We used genes as a bridge to connect AD and miRNAs. First, a protein-protein interaction network is used to find more AD-related genes from known AD-related genes. Then, each miRNA’s correlation with these genes is obtained from miRNA-gene interactions. Finally, each miRNA is assigned a feature vector representing its correlation with AD. Unlike other studies, we do not randomly generate negative samples and then apply a classification method to identify AD-related miRNAs. Instead, we use a semi-clustering method, the one-class SVM. AD-related miRNAs are considered as outliers, and our aim is to identify the miRNAs that are similar to known AD-related miRNAs (outliers).
Results and Conclusion:
We identified 257 novel AD-related miRNAs and compared our method with an SVM trained on generated negative samples. The AUC of our method is much higher than that of the SVM, and we carried out case studies to show that our results are reliable.
Affiliation(s)
- Tianyi Zhao
- Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Donghua Wang
- Department of General Surgery, General Hospital of Heilongjiang Province Land Reclamation Bureau, Harbin, China
| | - Yang Hu
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Ningyi Zhang
- Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Tianyi Zang
- Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Yadong Wang
- Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| |
|
71
|
Manguy J, Shields DC. Implications of kappa-casein evolutionary diversity for the self-assembly and aggregation of casein micelles. ROYAL SOCIETY OPEN SCIENCE 2019; 6:190939. [PMID: 31824707 PMCID: PMC6837221 DOI: 10.1098/rsos.190939] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 05/23/2019] [Accepted: 09/24/2019] [Indexed: 06/10/2023]
Abstract
Milk alpha-, beta- and kappa-casein proteins assemble into casein micelles in breast epithelial cells. The glycomacropeptide (GMP) tails of kappa-casein that extend from the surface of the micelle are key to assembly and aggregation. Aggregation is triggered by stomach pepsin cleavage of GMP from para-kappa-casein (PKC). While one casein micelle model emphasizes the importance of hydrophobic interactions, another focuses on polar residues. We performed an evolutionary analysis of kappa-casein primary sequence and predicted features that potentially impact on protein interactions. We noted more rapid change in the earlier period (166 to 60 Ma). Pepsin and plasmin cleavage sites were avoided in the GMP, which may partly explain its amino acid composition. Short tandem repeats have led to modest expansions of PKC, and to large GMP expansions, suggesting the GMP is less length constrained. Amino acid compositional constraints were assessed across species. Polarity and hydrophobicity properties were insufficient to explain differences between PKC and GMP. Among polar residues, threonine dominates the GMP, compared to serine, probably reflecting its preference for O-glycosylation over phosphorylation. Glutamine, enriched in the bovine PQ-rich region, is not positionally conserved in other species. Among hydrophobic residues, isoleucine is clearly preferred over leucine in the GMP, and patches of hydrophobicity are not markedly positionally conserved. PKC tyrosine and charged residues showed stronger conservation of position, suggesting a role for pi-interactions, seen in other structurally dynamic protein membraneless assemblies. Independent acquisitions of cysteines are consistent with a trend of increasing stabilization of multimers by covalent disulphide bonds, over evolutionary time. In conclusion, kappa-casein compositional and positional constraints appear to be influenced by modification preferences, protease evasion and protein-protein interactions.
Affiliation(s)
- Jean Manguy
- UCD Conway Institute, University College Dublin, Belfield, Dublin 4, Ireland
- School of Medicine, University College Dublin, Belfield, Dublin 4, Ireland
- Food for Health Ireland, University College Dublin, Belfield, Dublin 4, Ireland
| | - Denis C. Shields
- UCD Conway Institute, University College Dublin, Belfield, Dublin 4, Ireland
- School of Medicine, University College Dublin, Belfield, Dublin 4, Ireland
- Food for Health Ireland, University College Dublin, Belfield, Dublin 4, Ireland
| |
|
72
|
FKRR-MVSF: A Fuzzy Kernel Ridge Regression Model for Identifying DNA-Binding Proteins by Multi-View Sequence Features via Chou's Five-Step Rule. Int J Mol Sci 2019; 20:ijms20174175. [PMID: 31454964 PMCID: PMC6747228 DOI: 10.3390/ijms20174175] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 07/30/2019] [Revised: 08/10/2019] [Accepted: 08/19/2019] [Indexed: 12/22/2022] Open
Abstract
DNA-binding proteins play an important role in cell metabolism. In biological laboratories, the detection methods for DNA-binding proteins include yeast one-hybrid methods, bacterial singles, X-ray crystallography methods and others, but these methods involve a lot of labor, material and time. In recent years, many computation-based approaches have been proposed to detect DNA-binding proteins. In this paper, a machine learning-based method, called the Fuzzy Kernel Ridge Regression model based on Multi-View Sequence Features (FKRR-MVSF), is proposed to identify DNA-binding proteins. First, multi-view sequence features are extracted from protein sequences. Next, a Multiple Kernel Learning (MKL) algorithm is employed to combine multiple features. Finally, a Fuzzy Kernel Ridge Regression (FKRR) model is built to detect DNA-binding proteins. Compared with other methods, our model achieves good results. Our method obtains accuracies of 83.26% and 81.72% on two benchmark datasets (PDB1075 and PDB186), respectively.
|
73
|
A Novel on Transmission Line Tower Big Data Analysis Model Using Altered K-means and ADQL. SUSTAINABILITY 2019. [DOI: 10.3390/su11133499] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 11/16/2022]
Abstract
This study proposes a big data analysis and prediction model for transmission line tower outliers, based on deep reinforcement learning, to assess when something is wrong with transmission line tower big data. The model enables choosing automatic cluster K values based on non-labeled sensor big data. It also allows measuring the distance of action between data inside a cluster with the Q-value representing network output in the altered transmission line tower big data clustering algorithm containing transmission line tower outliers and old Deep Q Network. Specifically, this study performed principal component analysis to categorize transmission line tower data and proposed an automatic initial central point approach through standard normal distribution. It also proposed the A-Deep Q-Learning algorithm, altered from the deep Q-Learning algorithm, to explore policies based on the experiences of clustered data learning. It can be used to perform transmission line tower outlier data learning based on the distance of data within a cluster. The performance evaluation results show that the proposed model recorded an approximately 2.29% to 4.19% higher prediction rate and around 0.8% to 4.3% higher accuracy rate compared to the old transmission line tower big data analysis model.
|
74
|
Lv H, Zhang ZM, Li SH, Tan JX, Chen W, Lin H. Evaluation of different computational methods on 5-methylcytosine sites identification. Brief Bioinform 2019; 21:982-995. [DOI: 10.1093/bib/bbz048] [Citation(s) in RCA: 82] [Impact Index Per Article: 13.7] [Received: 02/20/2019] [Revised: 03/25/2019] [Accepted: 04/01/2019] [Indexed: 11/13/2022] Open
Abstract
5-Methylcytosine (m5C) plays an extremely important role in basic biochemical processes. Despite the great increase in identified m5C sites in a wide variety of organisms, their epigenetic roles remain largely unknown. Hence, accurate identification of m5C sites is a key step in understanding their biological functions. Over the past several years, more attention has been paid to the identification of m5C sites in multiple species. In this work, we first summarized the current progress in computational prediction of m5C sites and then constructed a more powerful and reliable model for identifying m5C sites. To train the model, we collected experimentally confirmed m5C data from Homo sapiens, Mus musculus, Saccharomyces cerevisiae and Arabidopsis thaliana, and compared the performance of different feature extraction methods and classification algorithms to optimize the prediction model. Based on the optimal model, a novel predictor called iRNA-m5C was developed for the recognition of m5C sites. Finally, we critically evaluated the performance of iRNA-m5C and compared it with existing methods. The results showed that iRNA-m5C produced the best prediction performance. We hope that this paper can provide a guide to the computational identification of m5C sites and anticipate that the proposed iRNA-m5C will become a powerful tool for large-scale identification of m5C sites.
Affiliation(s)
- Hao Lv
- Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Zi-Mei Zhang
- Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Shi-Hao Li
- Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Jiu-Xin Tan
- Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Wei Chen
- Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu, China
| | - Hao Lin
- Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| |
|
75
|
Chen W, Song X, Lin H. Combinatorial Pattern of Histone Modifications in Exon Skipping Event. Front Genet 2019; 10:122. [PMID: 30833963 PMCID: PMC6387913 DOI: 10.3389/fgene.2019.00122] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 12/19/2018] [Accepted: 02/04/2019] [Indexed: 11/18/2022] Open
Abstract
Histone modifications are associated with alternative splicing. It has been suggested that histone modifications act in combinatorial patterns in gene expression regulation. However, how they interact with each other and what their causal relationships are in the process of RNA splicing remain unclear. In this study, the combinatorial patterns of 38 kinds of histone modifications in the exon skipping event of the CD4+ T cell were analyzed by constructing Bayesian networks. Distinct combinatorial patterns of histone modifications that illustrate their causal relationships were observed in excluded/included exons and the surrounding intronic regions. The Bayesian networks also indicate that some histone modifications directly correlate with RNA splicing. We anticipate that this work could provide novel insights into the effects of histone modifications on RNA splicing regulation.
Affiliation(s)
- Wei Chen
- Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu, China; Center for Genomics and Computational Biology, School of Life Sciences, North China University of Science and Technology, Tangshan, China; Key Laboratory for Neuro-Information of Ministry of Education, Center for Informational Biology, School of Life Sciences and Technology, University of Electronic Science and Technology of China, Chengdu, China
| | - Xiaoming Song
- Center for Genomics and Computational Biology, School of Life Sciences, North China University of Science and Technology, Tangshan, China
| | - Hao Lin
- Key Laboratory for Neuro-Information of Ministry of Education, Center for Informational Biology, School of Life Sciences and Technology, University of Electronic Science and Technology of China, Chengdu, China
| |
|
76
|
Zou Q, Xing P, Wei L, Liu B. Gene2vec: gene subsequence embedding for prediction of mammalian N6-methyladenosine sites from mRNA. RNA (NEW YORK, N.Y.) 2019; 25:205-218. [PMID: 30425123 PMCID: PMC6348985 DOI: 10.1261/rna.069112.118] [Citation(s) in RCA: 338] [Impact Index Per Article: 56.3] [Received: 10/03/2018] [Accepted: 11/01/2018] [Indexed: 05/20/2023]
Abstract
N6-Methyladenosine (m6A) refers to methylation modification of the adenosine nucleotide acid at the nitrogen-6 position. Many conventional computational methods for identifying N6-methyladenosine sites are limited by the small amount of data available. Taking advantage of the thousands of m6A sites detected by high-throughput sequencing, it is now possible to discover the characteristics of m6A sequences using deep learning techniques. To the best of our knowledge, our work is the first attempt to use word embedding and deep neural networks for m6A prediction from mRNA sequences. Using four deep neural networks, we developed a model inferred from a larger sequence shifting window that can predict m6A accurately and robustly. Four prediction schemes were built with various RNA sequence representations and optimized convolutional neural networks. The soft voting results from the four deep networks were shown to outperform all of the state-of-the-art methods. We evaluated these predictors mentioned above on a rigorous independent test data set and proved that our proposed method outperforms the state-of-the-art predictors. The training, independent, and cross-species testing data sets are much larger than in previous studies, which could help to avoid the problem of overfitting. Furthermore, an online prediction web server implementing the four proposed predictors has been built and is available at http://server.malab.cn/Gene2vec/.
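The soft voting step mentioned in this abstract can be illustrated with a small sketch: each model outputs a probability per sequence, and the ensemble averages them before thresholding. The function name `soft_vote`, the equal model weights, and the 0.5 threshold are assumptions for illustration; the abstract does not state how the four networks were weighted.

```python
def soft_vote(prob_lists, threshold=0.5):
    """Average per-model positive-class probabilities and threshold the mean.

    prob_lists: one list of probabilities per model, all of equal length.
    Returns (averaged_probabilities, predicted_labels).
    """
    if not prob_lists:
        raise ValueError("at least one model's probabilities are required")
    n_samples = len(prob_lists[0])
    if any(len(p) != n_samples for p in prob_lists):
        raise ValueError("all models must score the same number of samples")
    # zip(*prob_lists) groups the scores of all models for each sample.
    averaged = [sum(scores) / len(prob_lists) for scores in zip(*prob_lists)]
    labels = [int(p >= threshold) for p in averaged]
    return averaged, labels
```

Averaging probabilities rather than counting hard votes lets a confident model outweigh several uncertain ones, which is the usual motivation for soft over hard voting.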
Affiliation(s)
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, 610051 Chengdu, China
- School of Computer Science and Technology, Tianjin University, 300350 Tianjin, China
| | - Pengwei Xing
- School of Computer Science and Technology, Tianjin University, 300350 Tianjin, China
| | - Leyi Wei
- School of Computer Science and Technology, Tianjin University, 300350 Tianjin, China
| | - Bin Liu
- School of Computer Science and Technology, Harbin Institute of Technology, 150001 Shenzhen, China
| |
|
77
|
Identification of D Modification Sites by Integrating Heterogeneous Features in Saccharomyces cerevisiae. Molecules 2019; 24:molecules24030380. [PMID: 30678171 PMCID: PMC6384727 DOI: 10.3390/molecules24030380] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 12/02/2018] [Revised: 12/17/2018] [Accepted: 12/17/2018] [Indexed: 11/16/2022] Open
Abstract
As an abundant post-transcriptional modification, dihydrouridine (D) has been found in transfer RNA (tRNA) from bacteria, eukaryotes, and archaea. Nonetheless, knowledge of the exact biochemical roles of dihydrouridine in mediating tRNA function is still limited. Accurate identification of the position of D sites is essential for understanding their functions. Therefore, it is desirable to develop novel methods to identify D sites. In this study, an ensemble classifier was proposed for the detection of D modification sites in the Saccharomyces cerevisiae transcriptome by using heterogeneous features. The jackknife test results demonstrate that the proposed predictor is promising for the identification of D modification sites. It is anticipated that the proposed method can be widely used for identifying D modification sites in tRNA.
|
78
|
Scalable Extraction of Big Macromolecular Data in Azure Data Lake Environment. Molecules 2019; 24:molecules24010179. [PMID: 30621295 PMCID: PMC6337464 DOI: 10.3390/molecules24010179] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 11/08/2018] [Revised: 12/29/2018] [Accepted: 01/01/2019] [Indexed: 11/16/2022] Open
Abstract
Calculation of structural features of proteins, nucleic acids, and nucleic acid-protein complexes on the basis of their geometries and studying various interactions within these macromolecules, for which high-resolution structures are stored in Protein Data Bank (PDB), require parsing and extraction of suitable data stored in text files. To perform these operations on large scale in the face of the growing amount of macromolecular data in public repositories, we propose to perform them in the distributed environment of Azure Data Lake and scale the calculations on the Cloud. In this paper, we present dedicated data extractors for PDB files that can be used in various types of calculations performed over protein and nucleic acids structures in the Azure Data Lake. Results of our tests show that the Cloud storage space occupied by the macromolecular data can be successfully reduced by using compression of PDB files without significant loss of data processing efficiency. Moreover, our experiments show that the performed calculations can be significantly accelerated when using large sequential files for storing macromolecular data and by parallelizing the calculations and data extractions that precede them. Finally, the paper shows how all the calculations can be performed in a declarative way in U-SQL scripts for Data Lake Analytics.
|
79
|
Wang L, Zhang R, Mu Y. Fu-SulfPred: Identification of Protein S-sulfenylation Sites by Fusing Forests via Chou’s General PseAAC. J Theor Biol 2019; 461:51-58. [DOI: 10.1016/j.jtbi.2018.10.046] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Received: 07/22/2018] [Revised: 10/14/2018] [Accepted: 10/22/2018] [Indexed: 10/28/2022]
|
80
|
Zhang S, Lin J, Su L, Zhou Z. pDHS-DSET: Prediction of DNase I hypersensitive sites in plant genome using DS evidence theory. Anal Biochem 2019; 564-565:54-63. [DOI: 10.1016/j.ab.2018.10.018] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 08/23/2018] [Revised: 10/10/2018] [Accepted: 10/15/2018] [Indexed: 10/28/2022]
|
81
|
Cao M, Chen G, Yu J, Shi S. Computational prediction and analysis of species-specific fungi phosphorylation via feature optimization strategy. Brief Bioinform 2018; 21:595-608. [DOI: 10.1093/bib/bby122] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 08/02/2018] [Revised: 11/16/2018] [Accepted: 11/22/2018] [Indexed: 11/12/2022] Open
Abstract
Protein phosphorylation is a reversible and ubiquitous post-translational modification that primarily occurs at serine, threonine and tyrosine residues and regulates a variety of biological processes. In this paper, we first briefly summarize the current progress in computational prediction of eukaryotic protein phosphorylation sites, which has mainly focused on animals and plants, especially humans, and to a lesser extent on fungi. Since the number of identified fungi phosphorylation sites has greatly increased in a wide variety of organisms and their roles in pathological physiology still remain largely unknown, more attention has been paid to the identification of fungi-specific phosphorylation. Here, experimental fungi phosphorylation site data were collected, and most of the sites were classified into different types to be encoded with various features and trained via a two-step feature optimization method. A novel method for prediction of species-specific fungi phosphorylation, PreSSFP, was developed, which can identify fungi phosphorylation in seven species for specific serine, threonine and tyrosine residues (http://computbiol.ncu.edu.cn/PreSSFP). Meanwhile, we critically evaluated the performance of PreSSFP and compared it with other existing tools. The satisfying results showed that PreSSFP is a robust predictor. Feature analyses showed that there are some significant differences among the seven species. The species-specific prediction via the two-step feature optimization method to mine important features for training could considerably improve the prediction performance. We anticipate that our study provides a new lead for future computational analysis of fungi phosphorylation.
Affiliation(s)
- Man Cao
- Department of Mathematics and Numerical Simulation and High-Performance Computing Laboratory, School of Sciences, Nanchang University, Nanchang, China
| | - Guodong Chen
- Department of Mathematics and Numerical Simulation and High-Performance Computing Laboratory, School of Sciences, Nanchang University, Nanchang, China
| | - Jialin Yu
- Department of Mathematics and Numerical Simulation and High-Performance Computing Laboratory, School of Sciences, Nanchang University, Nanchang, China
| | - Shaoping Shi
- Department of Mathematics and Numerical Simulation and High-Performance Computing Laboratory, School of Sciences, Nanchang University, Nanchang, China
| |
|
82
|
Yu J, Shi S, Zhang F, Chen G, Cao M. PredGly: predicting lysine glycation sites for Homo sapiens based on XGboost feature optimization. Bioinformatics 2018; 35:2749-2756. [DOI: 10.1093/bioinformatics/bty1043] [Citation(s) in RCA: 39] [Impact Index Per Article: 5.6] [Received: 09/26/2018] [Revised: 12/13/2018] [Accepted: 12/20/2018] [Indexed: 01/22/2023] Open
Abstract
Motivation
Protein glycation is a familiar post-translational modification (PTM) that proceeds as a two-step non-enzymatic reaction. Glycation not only impairs the function of proteins but also changes their characteristics, so it is related to many human diseases. It remains difficult to detect glycation sites systematically because glycated residues lack crucial patterns. Computational approaches, which can filter putative sites prior to experimental verification, can greatly increase the efficiency of experimental work. However, the previous lysine glycation prediction method was trained on a small amount of data. Hence, the model does not generalize well.
Results
By searching a new database, we collected a large dataset for Homo sapiens. PredGly, a novel software tool developed by combining multiple features, can predict lysine glycation sites for H. sapiens. In addition, XGboost was adopted to optimize feature vectors and to improve the model performance. In a comparison of various classifiers, the support vector machine achieved the best performance. On a new independent test set, PredGly outperformed other glycation tools. This suggests that PredGly can provide more instructive guidance for further experimental research on lysine glycation.
Availability and implementation
https://github.com/yujialinncu/PredGly
Supplementary information
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jialin Yu
- Department of Mathematics and Numerical Simulation and High-Performance Computing Laboratory, School of Sciences, Nanchang University, Nanchang, China
| | - Shaoping Shi
- Department of Mathematics and Numerical Simulation and High-Performance Computing Laboratory, School of Sciences, Nanchang University, Nanchang, China
| | - Fang Zhang
- Department of Mathematics and Numerical Simulation and High-Performance Computing Laboratory, School of Sciences, Nanchang University, Nanchang, China
| | - Guodong Chen
- Department of Mathematics and Numerical Simulation and High-Performance Computing Laboratory, School of Sciences, Nanchang University, Nanchang, China
| | - Man Cao
- Department of Mathematics and Numerical Simulation and High-Performance Computing Laboratory, School of Sciences, Nanchang University, Nanchang, China
| |
|
83
|
He W, Wei L, Zou Q. Research progress in protein posttranslational modification site prediction. Brief Funct Genomics 2018; 18:220-229. [DOI: 10.1093/bfgp/ely039] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Received: 10/18/2018] [Revised: 11/15/2018] [Accepted: 11/22/2018] [Indexed: 01/24/2023] Open
Abstract
Posttranslational modifications (PTMs) play an important role in regulating protein folding, activity and function and are involved in almost all cellular processes. Identification of PTMs of proteins is the basis for elucidating the mechanisms of cell biology and disease treatments. Compared with the laboriousness of equivalent experimental work, PTM prediction using various machine-learning methods can provide accurate, simple and rapid research solutions and generate valuable information for further laboratory studies. In this review, we manually curate most of the bioinformatics tools published since 2008. We also summarize the approaches for predicting ubiquitination sites and glycosylation sites. Moreover, we discuss the challenges of current PTM bioinformatics tools and look forward to future research possibilities.
Affiliation(s)
- Wenying He
- School of Computer Science and Technology, Tianjin University, Tianjin, China
| | - Leyi Wei
- School of Computer Science and Technology, Tianjin University, Tianjin, China
| | - Quan Zou
- School of Computer Science and Technology, Tianjin University, Tianjin, China
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| |
|
84
|
Wang E, Zhao H, Zhao D, Li L, Du L. Functional Prediction of Chronic Kidney Disease Susceptibility Gene PRKAG2 by Comprehensively Bioinformatics Analysis. Front Genet 2018; 9:573. [PMID: 30559760 PMCID: PMC6287114 DOI: 10.3389/fgene.2018.00573] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 09/15/2018] [Accepted: 11/08/2018] [Indexed: 02/01/2023] Open
Abstract
The genetic predisposition to chronic kidney disease (CKD) has been widely evaluated, especially using genome-wide association studies, which highlighted some novel genetic susceptibility variants in many genes, and estimated glomerular filtration rate to diagnose and stage CKD. Of these variants, rs7805747 in PRKAG2 was identified to be significantly associated with both serum creatinine and CKD at the genome-wide significance level. Until now, the potential mechanism by which rs7805747 affects CKD risk has remained unclear. Here, we performed a functional analysis of the rs7805747 variant using multiple bioinformatics software and databases. Using RegulomeDB and HaploReg (version 4.1), rs7805747 was predicted to be located in enhancer histone marks (Liver, Duodenum Mucosa, Fetal Intestine Large, Fetal Intestine Small, and Right Ventricle tissues). Using GWAS analysis in PhenoScanner, we showed that rs7805747 is not only associated with CKD, but is also significantly associated with other diseases or phenotypes. Using metabolite analysis in PhenoScanner, rs7805747 was found to be significantly associated not only with serum creatinine, but also with 16 other metabolites. Using eQTL analysis in PhenoScanner, rs7805747 was found to be significantly associated with gene expression in multiple human tissues and multiple genes including PRKAG2. The gene expression analysis of PRKAG2 using 53 tissues from GTEx RNA-Seq of 8555 samples (570 donors) showed that PRKAG2 had the highest median expression in Heart-Atrial Appendage. Using the gene expression profiles in human CKD, we further identified differential expression of the PRKAG2 gene in CKD cases compared with control samples. In summary, our findings provide new insight into the underlying susceptibility of the PRKAG2 gene to CKD.
Affiliation(s)
- Ermin Wang
- Department of Nephrology, The First Affiliated Hospital, Jinzhou Medical University, Jinzhou, China
| | - Hainan Zhao
- Department of Nephrology, The First Affiliated Hospital, Jinzhou Medical University, Jinzhou, China
| | - Deyan Zhao
- Department of Nephrology, The First Affiliated Hospital, Jinzhou Medical University, Jinzhou, China
| | - Lijing Li
- Department of Nephrology, The First Affiliated Hospital, Jinzhou Medical University, Jinzhou, China
| | - Limin Du
- Jinzhou Medical University, Jinzhou, China
| |
|
85
|
Li T, Chen Y, Li T, Jia C. Recognition of Protein Pupylation Sites by Adopting Resampling Approach. Molecules 2018; 23:molecules23123097. [PMID: 30486421 PMCID: PMC6321382 DOI: 10.3390/molecules23123097] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 10/13/2018] [Revised: 11/21/2018] [Accepted: 11/22/2018] [Indexed: 12/28/2022] Open
Abstract
With the in-depth study of posttranslational modification sites, protein ubiquitination has become the key problem in studying the molecular mechanism of posttranslational modification. Pupylation is a widely used process in which a prokaryotic ubiquitin-like protein (Pup) is attached to a substrate through a series of biochemical reactions. However, experimental methods for identifying pupylation sites are often time-consuming and laborious. This study aims to propose an improved approach for predicting pupylation sites. Firstly, the Pearson correlation coefficient was used to reflect the correlation among different amino acid pairs calculated by the frequency of each amino acid. Secondly, the multiple types of features were ranked in descending order and filtered separately by their Pearson correlation coefficient values. Thirdly, to get a qualified balanced dataset, the K-means principal component analysis (KPCA) oversampling technique was employed to synthesize new positive samples and the Fuzzy undersampling method was employed to reduce the number of negative samples. Finally, the performance of our method was verified by means of jackknife and a 10-fold cross-validation test. The average results of 10-fold cross-validation showed that the sensitivity (Sn) was 90.53%, specificity (Sp) was 99.8%, accuracy (Acc) was 95.09%, and Matthews Correlation Coefficient (MCC) was 0.91. Moreover, an independent test dataset was used to further measure its performance, and the prediction results achieved an Acc of 83.75% and an MCC of 0.49, which was superior to previous predictors. The better performance and stability of our proposed method showed it is an effective way to predict pupylation sites.
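The four evaluation measures reported in this abstract (Sn, Sp, Acc, MCC) follow from a binary confusion matrix by their standard definitions. The sketch below shows those definitions; the function name and the example counts are hypothetical and are not taken from the paper.

```python
import math


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts."""
    sn = tp / (tp + fn)                      # sensitivity: recall on positives
    sp = tn / (tn + fp)                      # specificity: recall on negatives
    acc = (tp + tn) / (tp + tn + fp + fn)    # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


# Hypothetical counts for illustration only.
metrics = binary_metrics(tp=90, tn=95, fp=5, fn=10)
print(metrics)
```

MCC uses all four cells of the confusion matrix, which is why it can stay low (as in the independent test above) even when accuracy looks high on an unbalanced dataset.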
Affiliation(s)
- Tao Li
- School of Transportation Management, Dalian Maritime University, Dalian 116026, China.
- China Waterborne Transport Research Institute, Beijing 100088, China.
| | - Yan Chen
- School of Transportation Management, Dalian Maritime University, Dalian 116026, China.
| | - Taoying Li
- School of Transportation Management, Dalian Maritime University, Dalian 116026, China.
| | - Cangzhi Jia
- College of Science, Dalian Maritime University, Dalian 116026, China.
| |
|
86
|
Prediction of GluN2B-CT 1290-1310/DAPK1 Interaction by Protein–Peptide Docking and Molecular Dynamics Simulation. Molecules 2018; 23:molecules23113018. [PMID: 30463177 PMCID: PMC6278559 DOI: 10.3390/molecules23113018] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 08/20/2018] [Revised: 11/04/2018] [Accepted: 11/06/2018] [Indexed: 02/08/2023] Open
Abstract
The interaction of death-associated protein kinase 1 (DAPK1) with the 2B subunit (GluN2B) C-terminus of N-methyl-D-aspartate receptor (NMDAR) plays a critical role in the pathophysiology of depression and is considered a potential target for the structure-based discovery of new antidepressants. However, the 3D structures of C-terminus residues 1290–1310 of GluN2B (GluN2B-CT1290-1310) remain elusive and the interaction between GluN2B-CT1290-1310 and DAPK1 is unknown. In this study, the mechanism of interaction between DAPK1 and GluN2B-CT1290-1310 was predicted by computational simulation methods including protein–peptide docking and molecular dynamics (MD) simulation. Based on the equilibrated MD trajectory, the total binding free energy between GluN2B-CT1290-1310 and DAPK1 was computed by the molecular mechanics/generalized Born surface area (MM/GBSA) approach. The simulation results showed that hydrophobic, van der Waals, and electrostatic interactions are responsible for the binding of GluN2B-CT1290-1310/DAPK1. Moreover, through per-residue free energy decomposition and in silico alanine scanning analysis, hotspot residues at the interface between GluN2B-CT1290-1310 and DAPK1 were identified. In conclusion, this work predicted the binding mode and quantitatively characterized the protein–peptide interface, which will aid in the discovery of novel drugs targeting the GluN2B-CT1290-1310 and DAPK1 interface.
|
87
|
Support Vector Machine Classifier for Accurate Identification of piRNA. APPLIED SCIENCES-BASEL 2018. [DOI: 10.3390/app8112204] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 11/16/2022]
Abstract
Piwi-interacting RNA (piRNA) is a newly identified class of small non-coding RNAs. It can combine with PIWI proteins to regulate the transcriptional gene silencing process and heterochromatin modifications, and to maintain germline and stem cell function in animals. To better understand the function of piRNA, it is imperative to improve the accuracy of identifying piRNAs. In this study, the feature vector was constructed from sequence information that included the single nucleotide composition, the 16 dinucleotide compositions, six physicochemical properties in RNA, the position specificities of nucleotides both in N-terminal and C-terminal, and the proportions of the similar peptide sequence of both N-terminal and C-terminal in positive and negative samples. Then, the F-Score was applied to choose an optimal single type of features. By combining these selected features, we achieved the best results on the jackknife and the 5-fold cross-validation running 10 times based on the support vector machine algorithm. Moreover, we further evaluated the stability and robustness of our new method.
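Two of the feature types named in this abstract, the single nucleotide composition and the 16 dinucleotide compositions, are simple frequency counts. The sketch below is one common way to compute them and is not the authors' code; the function name, the A/C/G/U alphabet order, and the T-to-U conversion are assumptions.

```python
from itertools import product
from typing import List

BASES = "ACGU"
DINUCLEOTIDES = ["".join(pair) for pair in product(BASES, repeat=2)]  # 16 pairs


def composition_features(seq: str) -> List[float]:
    """Return 4 mononucleotide + 16 dinucleotide frequencies (20 values).

    Assumes the sequence has at least two characters; characters outside
    A/C/G/U are ignored in the counts but still contribute to the length.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA-style input
    mono = [seq.count(base) / len(seq) for base in BASES]

    n_pairs = len(seq) - 1  # number of overlapping dinucleotide windows
    counts = {d: 0 for d in DINUCLEOTIDES}
    for i in range(n_pairs):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    di = [counts[d] / n_pairs for d in DINUCLEOTIDES]
    return mono + di


features = composition_features("ACGU")
print(len(features))
```

A feature-ranking step such as the F-Score mentioned in the abstract would then be applied to vectors like this one across all samples.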
|
88
|
Zou Q, Qu K, Luo Y, Yin D, Ju Y, Tang H. Predicting Diabetes Mellitus With Machine Learning Techniques. Front Genet 2018; 9:515. [PMID: 30459809 PMCID: PMC6232260 DOI: 10.3389/fgene.2018.00515] [Citation(s) in RCA: 223] [Impact Index Per Article: 31.9] [Received: 07/29/2018] [Accepted: 10/12/2018] [Indexed: 12/30/2022] Open
Abstract
Diabetes mellitus is a chronic disease characterized by hyperglycemia. It may cause many complications. Given the growing morbidity in recent years, the number of diabetic patients worldwide will reach 642 million in 2040, which means that one in ten adults will be suffering from diabetes. There is no doubt that this alarming figure needs great attention. With its rapid development, machine learning has been applied to many aspects of medical health. In this study, we used decision tree, random forest and neural network to predict diabetes mellitus. The dataset is the hospital physical examination data in Luzhou, China. It contains 14 attributes. Five-fold cross validation was used to examine the models. In order to verify the universal applicability of the methods, we chose some of the better-performing methods to conduct independent test experiments. We randomly selected the data of 68994 healthy people and diabetic patients, respectively, as the training set. Because the data are unbalanced, we randomly extracted data five times, and the result is the average of these five experiments. We used principal component analysis (PCA) and minimum redundancy maximum relevance (mRMR) to reduce the dimensionality. The results showed that prediction with random forest could reach the highest accuracy (ACC = 0.8084) when all the attributes were used.
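The five-fold cross validation used in this abstract partitions the samples into five disjoint folds and holds each fold out once for testing. The sketch below shows only the index bookkeeping; the function name, the shuffling seed, and the round-robin fold assignment are assumptions and not details from the paper.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int = 5, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of k folds.

    Indices are shuffled once, then dealt round-robin into k folds so that
    fold sizes differ by at most one.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


for train_idx, test_idx in k_fold_indices(10, k=5):
    print(len(train_idx), len(test_idx))
```

Each sample appears in exactly one test fold, so the averaged score uses every sample for testing once and for training four times.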
Affiliation(s)
- Quan Zou
- School of Computer Science and Technology, Tianjin University, Tianjin, China
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| | - Kaiyang Qu
- School of Computer Science and Technology, Tianjin University, Tianjin, China
| | - Yamei Luo
- School of Medical Information and Engineering, Southwest Medical University, Luzhou, China
| | - Dehui Yin
- School of Medical Information and Engineering, Southwest Medical University, Luzhou, China
| | - Ying Ju
- School of Information Science and Technology, Xiamen University, Xiamen, China
| | - Hua Tang
- Department of Pathophysiology, School of Basic Medicine, Southwest Medical University, Luzhou, China
| |
|
89
|
Xiong Y, Wang Q, Yang J, Zhu X, Wei DQ. PredT4SE-Stack: Prediction of Bacterial Type IV Secreted Effectors From Protein Sequences Using a Stacked Ensemble Method. Front Microbiol 2018; 9:2571. [PMID: 30416498 PMCID: PMC6212463 DOI: 10.3389/fmicb.2018.02571] [Citation(s) in RCA: 74] [Impact Index Per Article: 10.6] [Received: 09/01/2018] [Accepted: 10/09/2018] [Indexed: 11/13/2022] Open
Abstract
Gram-negative bacteria use various secretion systems to deliver their secreted effectors. Among them, type IV secretion system exists widely in a variety of bacterial species, and secretes type IV secreted effectors (T4SEs), which play vital roles in host-pathogen interactions. However, experimental approaches to identify T4SEs are time- and resource-consuming. In the present study, we aim to develop an in silico stacked ensemble method to predict whether a protein is an effector of type IV secretion system or not based on its sequence information. The protein sequences were encoded by the feature of position specific scoring matrix (PSSM)-composition by summing rows that correspond to the same amino acid residues in PSSM profiles. Based on the PSSM-composition features, we develop a stacked ensemble model PredT4SE-Stack to predict T4SEs, which utilized an ensemble of base-classifiers implemented by various machine learning algorithms, such as support vector machine, gradient boosting machine, and extremely randomized trees, to generate outputs for the meta-classifier in the classification system. Our results demonstrated that the framework of PredT4SE-Stack was a feasible and effective way to accurately identify T4SEs based on protein sequence information. The datasets and source code of PredT4SE-Stack are freely available at http://xbioinfo.sjtu.edu.cn/PredT4SE_Stack/index.php.
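The PSSM-composition encoding described in this abstract sums the PSSM rows that belong to the same residue type, turning an L x 20 profile into a fixed 20 x 20 matrix (400 features) regardless of sequence length. The sketch below illustrates that summation only; the function name, the alphabetical residue order, and the toy profile are assumptions, and a real PSSM would come from a profile search tool.

```python
from typing import List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed row order for the output matrix


def pssm_composition(sequence: str, pssm: Sequence[Sequence[float]]) -> List[float]:
    """Sum PSSM rows by residue type and flatten to a 400-dimensional vector.

    pssm[i] holds the 20 substitution scores for position i of `sequence`.
    Non-standard residues are skipped.
    """
    index = {aa: k for k, aa in enumerate(AMINO_ACIDS)}
    summed = [[0.0] * 20 for _ in range(20)]
    for residue, row in zip(sequence, pssm):
        k = index.get(residue)
        if k is None:
            continue
        for j, score in enumerate(row):
            summed[k][j] += score
    return [value for row in summed for value in row]


# Toy profile: three positions, constant scores per row (hypothetical values).
toy_features = pssm_composition("AAC", [[1.0] * 20, [2.0] * 20, [5.0] * 20])
print(len(toy_features))
```

Because the output size is fixed, vectors like this can be fed directly to the base classifiers of a stacked ensemble such as the one the abstract describes.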
Affiliation(s)
- Yi Xiong
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
| | - Qiankun Wang
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
| | - Junchen Yang
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
| | - Xiaolei Zhu
- School of Sciences, Anhui Agricultural University, Hefei, China
| | - Dong-Qing Wei
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
| |
|
90
|
Machine Learning Approaches for Protein–Protein Interaction Hot Spot Prediction: Progress and Comparative Assessment. Molecules 2018; 23:molecules23102535. [PMID: 30287797 PMCID: PMC6222875 DOI: 10.3390/molecules23102535] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.9] [Received: 08/30/2018] [Revised: 09/27/2018] [Accepted: 10/02/2018] [Indexed: 12/27/2022] Open
Abstract
Hot spots are the subset of interface residues that account for most of the binding free energy, and they play essential roles in the stability of protein binding. Effectively identifying which specific interface residues of protein–protein complexes form the hot spots is critical for understanding the principles of protein interactions, and it has broad application prospects in protein design and drug development. Experimental methods like alanine scanning mutagenesis are labor-intensive and time-consuming. At present, the experimentally measured hot spots are very limited. Hence, the use of computational approaches to predicting hot spots is becoming increasingly important. Here, we describe the basic concepts and recent advances of machine learning applications in inferring the protein–protein interaction hot spots, and assess the performance of widely used features, machine learning algorithms, and existing state-of-the-art approaches. We also discuss the challenges and future directions in the prediction of hot spots.
|
91
|
Chen W, Feng P, Ding H, Lin H. Classifying Included and Excluded Exons in Exon Skipping Event Using Histone Modifications. Front Genet 2018; 9:433. [PMID: 30327665 PMCID: PMC6174203 DOI: 10.3389/fgene.2018.00433] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 07/30/2018] [Accepted: 09/12/2018] [Indexed: 12/15/2022] Open
Abstract
Alternative splicing (AS) not only ensures the diversity of gene expression products, but is also closely correlated with genetic diseases. Therefore, knowledge about the regulatory mechanisms of AS will provide useful clues for understanding its biological functions. In the current study, a random forest based method was developed to classify included and excluded exons in exon skipping events. In this method, the samples in the dataset were encoded by using optimal histone modification features which were optimized by using the Maximum Relevance Maximum Distance (MRMD) feature selection technique. The proposed method obtained an accuracy of 72.91% in the 10-fold cross validation test and outperformed existing methods. Meanwhile, we also systematically analyzed the distribution of histone modifications between included and excluded exons and discovered their preference in both kinds of exons, which might provide insights for research on the regulatory mechanisms of alternative splicing.
Affiliation(s)
- Wei Chen
- Center for Genomics and Computational Biology, School of Life Science, North China University of Science and Technology, Tangshan, China
- Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu, China
| | - Pengmian Feng
- School of Public Health, North China University of Science and Technology, Tangshan, China
| | - Hui Ding
- Key Laboratory for Neuro-Information of Ministry of Education, Center of Bioinformatics and Center for Information in Biomedicine, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
| | - Hao Lin
- Key Laboratory for Neuro-Information of Ministry of Education, Center of Bioinformatics and Center for Information in Biomedicine, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
| |
|
92
|
Prediction Methods of Herbal Compounds in Chinese Medicinal Herbs. Molecules 2018; 23:molecules23092303. [PMID: 30201875 PMCID: PMC6225236 DOI: 10.3390/molecules23092303] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 08/12/2018] [Revised: 09/06/2018] [Accepted: 09/07/2018] [Indexed: 12/12/2022] Open
Abstract
Chinese herbal medicine has recently gained worldwide attention. The curative mechanism of Chinese herbal medicine is compared with that of western medicine at the molecular level. The treatment mechanism of most Chinese herbal medicines is still not clear. How do we integrate Chinese herbal medicine compounds with modern medicine? Methods for predicting the drug-likeness of Chinese herbal medicine compounds are particularly important. A growing number of compounds from Chinese herbal sources are now widely used as drug-like compound candidates. An important way for pharmaceutical companies to develop drugs is to discover potentially active compounds in related Chinese herbs. The methods for predicting the drug-like properties of Chinese herbal compounds include the virtual screening method, pharmacophore model method and machine learning method. In this paper, we focus on the prediction methods for the medicinal properties of Chinese herbal medicines. We analyze the advantages and disadvantages of the above three methods, and then introduce the specific steps of the virtual screening method. Finally, we present the prospect of the joint application of various methods.
|
93
|
Małysiak-Mrozek B. Uncertainty, imprecision, and many-valued logics in protein bioinformatics. Math Biosci 2018; 309:143-162. [PMID: 30118719 DOI: 10.1016/j.mbs.2018.08.004] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 05/31/2018] [Revised: 07/24/2018] [Accepted: 08/09/2018] [Indexed: 11/15/2022]
Abstract
Understanding proteins, their structures, functions, mutual interactions, activity in cellular reactions, interactions with drugs, and expression in body cells is a key to efficient medical diagnosis, drug production, and treatment of patients. Machine learning and data exploration methods supported by many-valued logics make it possible to capture the imprecision and uncertainties that naturally occur in proteins and other biomolecules. Many-valued logics, like Łukasiewicz logic or fuzzy logic, are non-classical logics that do not restrict the number of truth values to only two values of true or false, but allow for a larger set of truth degrees. In this paper, we briefly review the use of many-valued logics, especially fuzzy logic, in bioinformatics. Then, we focus on protein bioinformatics, and present selected applications of many-valued logics in the analysis of complex protein structures, including: (1) potential-based protein similarity searching, (2) matching proteins on the basis of secondary structures, (3) 3D protein structure alignment, (4) prediction of intrinsically disordered proteins, and (5) fuzzy querying in large collections of Big macromolecular Data. Results of the presented studies show that the utilization of many-valued logics can enrich the investigations of protein molecules, in which uncertainty and imprecision are prevalent problems. The paper discusses all observed benefits brought by the application of many-valued logics in investigations related to selected protein analyses carried out by the author.
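To make the idea of truth degrees concrete, the sketch below implements the standard Łukasiewicz connectives on values in [0, 1]. It is a textbook illustration rather than code from the paper; the function names are hypothetical, and the "strong" conjunction and disjunction are used (min and max give the weak variants).

```python
def luk_not(a: float) -> float:
    """Łukasiewicz negation: 1 - a."""
    return 1.0 - a


def luk_and(a: float, b: float) -> float:
    """Łukasiewicz strong conjunction (t-norm): max(0, a + b - 1)."""
    return max(0.0, a + b - 1.0)


def luk_or(a: float, b: float) -> float:
    """Łukasiewicz strong disjunction (t-conorm): min(1, a + b)."""
    return min(1.0, a + b)


def luk_implies(a: float, b: float) -> float:
    """Łukasiewicz implication: min(1, 1 - a + b)."""
    return min(1.0, 1.0 - a + b)


# With truth degrees restricted to 0 and 1 these reduce to classical logic.
print(luk_and(0.75, 0.5), luk_or(0.25, 0.5), luk_implies(0.75, 0.25))
```

A fuzzy query over protein data would combine membership degrees (for example, "similar structure" and "high resolution") with connectives of this kind instead of Boolean AND and OR.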
Affiliation(s)
- Bożena Małysiak-Mrozek
- Institute of Informatics, Silesian University of Technology, Akademicka 16, Gliwice 44-100, Poland.
94
Yang H, Lv H, Ding H, Chen W, Lin H. iRNA-2OM: A Sequence-Based Predictor for Identifying 2'-O-Methylation Sites in Homo sapiens. J Comput Biol 2018; 25:1266-1277. [PMID: 30113871 DOI: 10.1089/cmb.2018.0004] [Citation(s) in RCA: 119] [Impact Index Per Article: 17.0] [Indexed: 01/27/2023]
Abstract
2'-O-methylation plays an important biological role in gene expression. Owing to the explosive increase in genomic sequencing data, a method is needed for quickly and efficiently identifying whether a sequence contains a 2'-O-methylation site. As a complement to experimental techniques, computational methods may help to identify 2'-O-methylation sites. In this study, based on experimental 2'-O-methylation data from Homo sapiens, we proposed a support vector machine-based model to predict 2'-O-methylation sites in H. sapiens. In this model, the RNA sequences were encoded with the optimal features obtained from feature selection. In the fivefold cross-validation test, the accuracy reached 97.95%.
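The fivefold cross-validation protocol mentioned in the abstract can be sketched as below. This is only an illustration of the evaluation scheme: the abstract's model is a support vector machine on selected RNA features, whereas the nearest-centroid classifier, the function names, and the toy data here are all assumptions introduced to keep the example self-contained.

```python
import random
from typing import Callable, Hashable, List, Sequence

Vector = Sequence[float]
Predictor = Callable[[Vector], Hashable]

def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> List[List[int]]:
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_nearest_centroid(X: Sequence[Vector], y: Sequence[Hashable]) -> Predictor:
    """Stand-in classifier (NOT an SVM): predict the class with the closest mean."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}

    def predict(x: Vector) -> Hashable:
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
        )

    return predict

def cross_val_accuracy(X, y, fit=fit_nearest_centroid, k: int = 5, seed: int = 0) -> float:
    """Train on k-1 folds, test on the held-out fold, pool the correct predictions."""
    correct = 0
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct += sum(predict(X[i]) == y[i] for i in test_idx)
    return correct / len(X)

# Toy, well-separated two-class data (hypothetical, for illustration only).
X = [[i * 0.1, 0.0] for i in range(10)] + [[5 + i * 0.1, 1.0] for i in range(10)]
y = [0] * 10 + [1] * 10
print(cross_val_accuracy(X, y, k=5))
```

Every sample is used for testing exactly once, so the pooled accuracy is computed over the whole data set while no prediction is made by a model that saw that sample during training.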
Affiliation(s)
- Hui Yang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Hao Lv
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Hui Ding
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Wei Chen
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Department of Physics, School of Sciences, and Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan, China
- Hao Lin
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
95
Hasan MM, Khatun MS, Mollah MNH, Yong C, Dianjing G. NTyroSite: Computational Identification of Protein Nitrotyrosine Sites Using Sequence Evolutionary Features. Molecules 2018; 23:E1667. [PMID: 29987232 PMCID: PMC6099560 DOI: 10.3390/molecules23071667] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Received: 05/22/2018] [Revised: 06/28/2018] [Accepted: 06/28/2018] [Indexed: 02/06/2023]
Abstract
Nitrotyrosine is a product of tyrosine nitration mediated by reactive nitrogen species. As an indicator of cell damage and inflammation, protein nitrotyrosine reveals biological changes associated with various diseases or oxidative stress. Accurate identification of nitrotyrosine sites provides an important foundation for further elucidating the mechanism of protein nitrotyrosination. However, experimental identification of nitrotyrosine sites through traditional methods is laborious and expensive. In silico prediction of nitrotyrosine sites based on protein sequence information is thus highly desirable. Here, we report a novel predictor, NTyroSite, for accurate prediction of nitrotyrosine sites using sequence evolutionary information. The generated features were optimized using a Wilcoxon rank-sum test. A random forest classifier was then trained using these features to build the predictor. The final NTyroSite predictor achieved an area under the receiver operating characteristic curve (AUC) of 0.904 in a 10-fold cross-validation test. It also significantly outperformed other existing implementations in an independent test. In addition, to aid understanding of the prediction model, the predominant rules and informative features were extracted from the NTyroSite model to explain the prediction results. We expect that the NTyroSite predictor may serve as a useful computational resource for high-throughput nitrotyrosine site prediction. The online interface of the software is publicly available at https://biocomputer.bio.cuhk.edu.hk/NTyroSite/.
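One common way to use a Wilcoxon rank-sum test for feature optimization is to score each feature by how strongly its values separate positive from negative samples, then rank features by that score. The sketch below assumes this usage; the abstract does not give the exact procedure, so the function names, the normal approximation without tie correction, and the toy data are all illustrative assumptions rather than the NTyroSite implementation.

```python
import math
from typing import List, Sequence

def midranks(values: Sequence[float]) -> List[float]:
    """1-based ranks of the values; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum_z(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Rank-sum z-score of the positive group (normal approximation, no tie correction)."""
    n1, n2 = len(pos), len(neg)
    ranks = midranks(list(pos) + list(neg))
    w = sum(ranks[:n1])                       # rank sum of the positive group
    mean = n1 * (n1 + n2 + 1) / 2             # expected rank sum under the null
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (w - mean) / sd

def rank_features(pos_rows: Sequence[Sequence[float]],
                  neg_rows: Sequence[Sequence[float]]) -> List[int]:
    """Feature indices ordered from most to least discriminative by |z|."""
    scores = []
    for j in range(len(pos_rows[0])):
        z = rank_sum_z([r[j] for r in pos_rows], [r[j] for r in neg_rows])
        scores.append((abs(z), j))
    return [j for _, j in sorted(scores, reverse=True)]

# Toy data: feature 0 separates the classes, feature 1 does not.
pos_rows = [[5, 1], [6, 2], [7, 1], [8, 2]]
neg_rows = [[1, 2], [2, 1], [3, 2], [4, 1]]
print(rank_features(pos_rows, neg_rows))
```

In a full pipeline, the top-ranked features would then be passed to the classifier (a random forest in the abstract); that step is omitted here.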
Affiliation(s)
- Md Mehedi Hasan
- School of Life Sciences and the State Key Lab of Agrobiotechnology, The Chinese University of Hong Kong, Shatin, Hong Kong.
- Mst Shamima Khatun
- Laboratory of Bioinformatics, Department of Statistics, University of Rajshahi, Rajshahi 6205, Bangladesh.
- Md Nurul Haque Mollah
- Laboratory of Bioinformatics, Department of Statistics, University of Rajshahi, Rajshahi 6205, Bangladesh.
- Cao Yong
- Department of Mechanical Engineering and Automation, Harbin Institute of Technology, Shenzhen Graduate School, Shenzhen 518000, China.
- Guo Dianjing
- School of Life Sciences and the State Key Lab of Agrobiotechnology, The Chinese University of Hong Kong, Shatin, Hong Kong.
96
Ren Y, Feng X, Xia X, Zhang Y, Zhang W, Su J, Wang Z, Xu Y, Zhou F. Gender specificity improves the early-stage detection of clear cell renal cell carcinoma based on methylomic biomarkers. Biomark Med 2018; 12:607-618. [PMID: 29707986 DOI: 10.2217/bmm-2018-0084] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 12/15/2022]
Abstract
AIM: The two genders differ at levels ranging from the molecular to the phenotypic, but most studies have not used this important information. We hypothesize that integrating gender information may improve overall prediction accuracy. MATERIALS & METHODS: A comprehensive comparative study was carried out to test the hypothesis, using the classification of stages I + II versus III + IV of clear cell renal cell carcinoma samples as an example. RESULTS & CONCLUSION: In most cases, the female-specific model significantly outperformed the both-gender model, as did the male-specific model. Our data suggest that gender information is essential for building biomedical classification models and that even a simple strategy of building two gender-specific models may outperform a gender-mixed model.
Affiliation(s)
- Yanjiao Ren
- College of Computer Science & Technology, Key Laboratory of Symbolic Computation & Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
- College of Information Technology, Jilin Agricultural University, Changchun, Jilin 130118, China
- Xin Feng
- College of Computer Science & Technology, Key Laboratory of Symbolic Computation & Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
- Xin Xia
- College of Software, Jilin University, Changchun, Jilin 130012, China
- Yexian Zhang
- College of Computer Science & Technology, Key Laboratory of Symbolic Computation & Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
- Wenniu Zhang
- College of Computer Science & Technology, Key Laboratory of Symbolic Computation & Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
- Jing Su
- Department of Pathophysiology, College of Basic Medical Sciences, Jilin University, Changchun, Jilin 130021, China
- Zhongyu Wang
- College of Computer Science & Technology, Key Laboratory of Symbolic Computation & Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
- Ying Xu
- College of Computer Science & Technology, Key Laboratory of Symbolic Computation & Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
- Computational Systems Biology Lab, Department of Biochemistry & Molecular Biology, University of Georgia, Athens, Georgia, 30602, USA
- College of Public Health, Jilin University, Changchun, Jilin 130012, China
- Fengfeng Zhou
- College of Computer Science & Technology, Key Laboratory of Symbolic Computation & Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
97
Huang G, Li J, Zhao C. Computational Prediction and Analysis of Associations between Small Molecules and Binding-Associated S-Nitrosylation Sites. Molecules 2018; 23:molecules23040954. [PMID: 29671802 PMCID: PMC6017196 DOI: 10.3390/molecules23040954] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 02/24/2018] [Revised: 03/30/2018] [Accepted: 04/09/2018] [Indexed: 01/12/2023]
Abstract
Interactions between drugs and proteins occupy a central position in drug discovery and development. Numerous methods have recently been developed for identifying drug–target interactions, but few have been devoted to finding interactions between post-translationally modified proteins and drugs. We present a machine learning-based method for identifying associations between small molecules and binding-associated S-nitrosylated (SNO-) proteins. Specifically, small molecules were encoded by molecular fingerprints, SNO-proteins were encoded by an information entropy-based method, and a random forest was used to train the classifier. Ten-fold and leave-one-out cross-validation achieved areas under the receiver operating characteristic curve of 0.7235 and 0.7490, respectively. Computational analysis of similarity suggested that SNO-proteins associated with the same drug shared statistically significant similarity, and vice versa. The method and findings are useful for identifying drug–SNO associations and may further facilitate the discovery and development of SNO-associated drugs.
Affiliation(s)
- Guohua Huang
- Provincial Key Laboratory of Informational Service for Rural Area of Southwestern Hunan, Shaoyang University, Shaoyang 422000, China.
- College of Information Engineering, Shaoyang University, Shaoyang 422000, China.
- Jincheng Li
- Provincial Key Laboratory of Informational Service for Rural Area of Southwestern Hunan, Shaoyang University, Shaoyang 422000, China.
- College of Information Engineering, Shaoyang University, Shaoyang 422000, China.
- Chenglin Zhao
- Provincial Key Laboratory of Informational Service for Rural Area of Southwestern Hunan, Shaoyang University, Shaoyang 422000, China.
- College of Information Engineering, Shaoyang University, Shaoyang 422000, China.
98
Bourré G, Cantrelle FX, Kamah A, Chambraud B, Landrieu I, Smet-Nocca C. Direct Crosstalk Between O-GlcNAcylation and Phosphorylation of Tau Protein Investigated by NMR Spectroscopy. Front Endocrinol (Lausanne) 2018; 9:595. [PMID: 30386294 PMCID: PMC6198643 DOI: 10.3389/fendo.2018.00595] [Citation(s) in RCA: 42] [Impact Index Per Article: 6.0] [Received: 07/20/2018] [Accepted: 09/19/2018] [Indexed: 12/20/2022]
Abstract
The formation of intraneuronal fibrillar inclusions of tau protein is associated with several neurodegenerative diseases referred to as tauopathies, including Alzheimer's disease (AD). A common feature of these pathologies is hyperphosphorylation of tau, the main component of fibrillar assemblies such as Paired Helical Filaments (PHFs). O-β-linked N-acetylglucosaminylation (O-GlcNAcylation) is another important posttranslational modification involved in the regulation of tau pathophysiology. Among the benefits of O-GlcNAcylation, modulation of tau phosphorylation levels and inhibition of tau aggregation have been described, while decreased O-GlcNAcylation could be involved in the rise in tau phosphorylation associated with AD. However, the molecular mechanisms underlying these observations remain to be defined. In this study, we use NMR spectroscopy to identify O-GlcNAc sites in the longest isoform of tau and investigate the direct role of O-GlcNAcylation in tau phosphorylation and, conversely, the role of phosphorylation in tau O-GlcNAcylation. By systematically examining the quantitative modification patterns by NMR spectroscopy, we show that O-GlcNAcylation does not modify phosphorylation of tau by the kinase activity of ERK2 or a rat brain extract, while phosphorylation slightly increases tau O-GlcNAcylation by OGT. Our data suggest that indirect mechanisms act in the reciprocal regulation of tau phosphorylation and O-GlcNAcylation in vivo, involving regulation of the enzymes responsible for phosphate and O-GlcNAc dynamics.
Affiliation(s)
- Gwendoline Bourré
- Univ. Lille, CNRS UMR8576, Unité de Glycobiologie Structurale et Fonctionnelle, Lille, France
- Amina Kamah
- Univ. Lille, CNRS UMR8576, Unité de Glycobiologie Structurale et Fonctionnelle, Lille, France
- Isabelle Landrieu
- Univ. Lille, CNRS UMR8576, Unité de Glycobiologie Structurale et Fonctionnelle, Lille, France
- Caroline Smet-Nocca
- Univ. Lille, CNRS UMR8576, Unité de Glycobiologie Structurale et Fonctionnelle, Lille, France
- *Correspondence: Caroline Smet-Nocca