101
|
Schaduangrat N, Nantasenamat C, Prachayasittikul V, Shoombuatong W. Meta-iAVP: A Sequence-Based Meta-Predictor for Improving the Prediction of Antiviral Peptides Using Effective Feature Representation. Int J Mol Sci 2019; 20:ijms20225743. [PMID: 31731751 PMCID: PMC6888698 DOI: 10.3390/ijms20225743] [Citation(s) in RCA: 82] [Impact Index Per Article: 13.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/24/2019] [Revised: 11/07/2019] [Accepted: 11/13/2019] [Indexed: 12/31/2022] Open
Abstract
In spite of the large-scale production and widespread distribution of vaccines and antiviral drugs, viruses remain a prominent human disease. Recently, the discovery of antiviral peptides (AVPs) has become an influential antiviral agent due to their extraordinary advantages. With the avalanche of newly-found peptide sequences in the post-genomic era, there is a great demand to develop a sequence-based predictor for timely identifying AVPs as this information is very useful for both basic research and drug development. In this study, we propose a novel sequence-based meta-predictor with an effective feature representation, called Meta-iAVP, for the accurate prediction of AVPs from given peptide sequences. Herein, the effective feature representation was extracted from a set of prediction scores derived from various machine learning algorithms and types of features. To the best of our knowledge, the model proposed herein represents the first meta-based approach for the prediction of AVPs. An overall accuracy and Matthews correlation coefficient of 95.20% and 0.90, respectively, was achieved from the independent test set on an objective benchmark dataset. Comparative analysis suggested that Meta-iAVP was superior to that of existing methods and therefore represents a useful tool for AVP prediction. Finally, in an effort to facilitate high-throughput prediction of AVPs, the model was deployed as the Meta-iAVP web server and is made freely available online at http://codes.bio/meta-iavp/ where users can submit query peptide sequences for determining the likelihood of whether or not these peptides are AVPs.
Collapse
Affiliation(s)
- Nalini Schaduangrat
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand; (N.S.); (C.N.)
| | - Chanin Nantasenamat
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand; (N.S.); (C.N.)
| | - Virapong Prachayasittikul
- Department of Clinical Microbiology and Applied Technology, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand;
| | - Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand; (N.S.); (C.N.)
- Correspondence: ; Tel.: +66-2441-4371 (ext. 2715)
| |
Collapse
|
102
|
Wang X, Zhu X, Ye M, Wang Y, Li CD, Xiong Y, Wei DQ. STS-NLSP: A Network-Based Label Space Partition Method for Predicting the Specificity of Membrane Transporter Substrates Using a Hybrid Feature of Structural and Semantic Similarity. Front Bioeng Biotechnol 2019; 7:306. [PMID: 31781551 PMCID: PMC6851049 DOI: 10.3389/fbioe.2019.00306] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/05/2019] [Accepted: 10/17/2019] [Indexed: 12/11/2022] Open
Abstract
Membrane transport proteins play crucial roles in the pharmacokinetics of substrate drugs, the drug resistance in cancer and are vital to the process of drug discovery, development and anti-cancer therapeutics. However, experimental methods to profile a substrate drug against a panel of transporters to determine its specificity are labor intensive and time consuming. In this article, we aim to develop an in silico multi-label classification approach to predict whether a substrate can specifically recognize one of the 13 categories of drug transporters ranging from ATP-binding cassette to solute carrier families using both structural fingerprints and chemical ontologies information of substrates. The data-driven network-based label space partition (NLSP) method was utilized to construct the model based on a hybrid of similarity-based feature by the integration of 2D fingerprint and semantic similarity. This method builds predictors for each label cluster (possibly intersecting) detected by community detection algorithms and takes union of label sets for a compound as final prediction. NLSP lies into the ensembles of multi-label classifier category in multi-label learning field. We utilized Cramér's V statistics to quantify the label correlations and depicted them via a heatmap. The jackknife tests and iterative stratification based cross-validation method were adopted on a benchmark dataset to evaluate the prediction performance of the proposed models both in multi-label and label-wise manner. Compared with other powerful multi-label methods, ML-kNN, MTSVM, and RAkELd, our multi-label classification model of NLPS-RF (random forest-based NLSP) has proven to be a feasible and effective model, and performed satisfactorily in the predictive task of transporter-substrate specificity. The idea behind NLSP method is intriguing and the power of NLSP remains to be explored for the multi-label learning problems in bioinformatics. The benchmark dataset, intermediate results and python code which can fully reproduce our experiments and results are available at https://github.com/dqwei-lab/STS.
Collapse
Affiliation(s)
- Xiangeng Wang
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China.,Peng Cheng Laboratory, Shenzhen, China
| | - Xiaolei Zhu
- School of Sciences, Anhui Agricultural University, Hefei, China
| | - Mingzhi Ye
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
| | - Yanjing Wang
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
| | - Cheng-Dong Li
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
| | - Yi Xiong
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
| | - Dong-Qing Wei
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China.,Peng Cheng Laboratory, Shenzhen, China
| |
Collapse
|
103
|
Wang F, Guan ZX, Dao FY, Ding H. A Brief Review of the Computational Identification of Antifreeze Protein. CURR ORG CHEM 2019. [DOI: 10.2174/1385272823666190718145613] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]
Abstract
Lots of cold-adapted organisms could produce antifreeze proteins (AFPs) to counter the freezing of cell fluids by controlling the growth of ice crystal. AFPs have been found in various species such as in vertebrates, invertebrates, plants, bacteria, and fungi. These AFPs from fish, insects and plants displayed a high diversity. Thus, the identification of the AFPs is a challenging task in computational proteomics. With the accumulation of AFPs and development of machine meaning methods, it is possible to construct a high-throughput tool to timely identify the AFPs. In this review, we briefly reviewed the application of machine learning methods in antifreeze proteins identification from difference section, including published benchmark dataset, sequence descriptor, classification algorithms and published methods. We hope that this review will produce new ideas and directions for the researches in identifying antifreeze proteins.
Collapse
Affiliation(s)
- Fang Wang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Zheng-Xing Guan
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Fu-Ying Dao
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Hui Ding
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| |
Collapse
|
104
|
4mCpred-EL: An Ensemble Learning Framework for Identification of DNA N4-methylcytosine Sites in the Mouse Genome. Cells 2019; 8:cells8111332. [PMID: 31661923 PMCID: PMC6912380 DOI: 10.3390/cells8111332] [Citation(s) in RCA: 77] [Impact Index Per Article: 12.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/21/2019] [Revised: 10/21/2019] [Accepted: 10/24/2019] [Indexed: 12/24/2022] Open
Abstract
DNA N4-methylcytosine (4mC) is one of the key epigenetic alterations, playing essential roles in DNA replication, differentiation, cell cycle, and gene expression. To better understand 4mC biological functions, it is crucial to gain knowledge on its genomic distribution. In recent times, few computational studies, in particular machine learning (ML) approaches have been applied in the prediction of 4mC site predictions. Although ML-based methods are promising for 4mC identification in other species, none are available for detecting 4mCs in the mouse genome. Our novel computational approach, called 4mCpred-EL, is the first method for identifying 4mC sites in the mouse genome where four different ML algorithms with a wide range of seven feature encodings are utilized. Subsequently, those feature encodings predicted probabilistic values are used as a feature vector and are once again inputted to ML algorithms, whose corresponding models are integrated into ensemble learning. Our benchmarking results demonstrated that 4mCpred-EL achieved an accuracy and MCC values of 0.795 and 0.591, which significantly outperformed seven other classifiers by more than 1.5–5.9% and 3.2–11.7%, respectively. Additionally, 4mCpred-EL attained an overall accuracy of 79.80%, which is 1.8–5.1% higher than that yielded by seven other classifiers in the independent evaluation. We provided a user-friendly web server, namely 4mCpred-EL which could be implemented as a pre-screening tool for the identification of potential 4mC sites in the mouse genome.
Collapse
|
105
|
Bahrami AA, Payandeh Z, Khalili S, Zakeri A, Bandehpour M. Immunoinformatics: In Silico Approaches and Computational Design of a Multi-epitope, Immunogenic Protein. Int Rev Immunol 2019; 38:307-322. [PMID: 31478759 DOI: 10.1080/08830185.2019.1657426] [Citation(s) in RCA: 60] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/26/2022]
Abstract
Immunoinformatics is a new critical field with several tools and databases that conduct the eyesight of experimental selection and facilitate analysis of the great amount of immunologic data obtained from experimental researches and helps to design and introducing new hypothesis. Given these visages, immunoinformatics seems to be the way that develop and progress the immunological research. Bioinformatics methods and applications are successfully employed in vaccine informatics to assist different sites of the preclinical, clinical, and post-licensure vaccine enterprises. On the other hand, the progression of molecular biology and immunology caused epitope vaccines have become the focus of research on molecular vaccines. Moreover, reverse vaccinology could improve vaccine production and vaccination protocols by in silico prediction of protein-vaccine candidates from genome sequences. B- and T-cell immune epitopes could be predicted by immunoinformatics algorithms and computational methods to improve the vaccine design, protective immunity analysis, assessment of vaccine safety and efficacy, and immunization modeling. This review aims to discuss the power of computational approaches in vaccine design and their relevance to the development of effective vaccines. Furthermore, the various divisions of this field and available tools in each item are introduced and reviewed.
Collapse
Affiliation(s)
- Armina Alagheband Bahrami
- Department of Biotechnology, School of Advanced Technologies in Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran
| | - Zahra Payandeh
- Immunology Research Center, Tabriz University of Medical Sciences, Tabriz, Iran
| | - Saeed Khalili
- Department of Biology Sciences, Shahid Rajaee Teacher Training University, Tehran, Iran
| | - Alireza Zakeri
- Department of Biology Sciences, Shahid Rajaee Teacher Training University, Tehran, Iran
| | - Mojgan Bandehpour
- Department of Biotechnology, School of Advanced Technologies in Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran
| |
Collapse
|
106
|
Khatun S, Hasan M, Kurata H. Efficient computational model for identification of antitubercular peptides by integrating amino acid patterns and properties. FEBS Lett 2019; 593:3029-3039. [PMID: 31297788 DOI: 10.1002/1873-3468.13536] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/21/2019] [Revised: 06/25/2019] [Accepted: 07/05/2019] [Indexed: 12/30/2022]
Abstract
Tuberculosis (TB) is a leading killer caused by Mycobacterium tuberculosis. Recently, anti-TB peptides have provided an alternative approach to combat antibiotic tolerance. We have developed an effective computational predictor, identification of antitubercular peptides (iAntiTB), by the integration of multiple feature vectors deriving from the amino acid sequences via random forest (RF) and support vector machine (SVM) classifiers. The iAntiTB combines the RF and SVM scores via linear regression to enhance the prediction accuracy. To make a robust and accurate predictor, we prepared the two datasets with different types of negative samples. The iAntiTB achieved area under the ROC curve values of 0.896 and 0.946 on the training datasets of the first and second datasets, respectively. The iAntiTB outperformed the other existing predictors.
Collapse
Affiliation(s)
- Shamima Khatun
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Iizuka, Fukuoka, Japan
| | - Mehedi Hasan
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Iizuka, Fukuoka, Japan
| | - Hiroyuki Kurata
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Iizuka, Fukuoka, Japan.,Biomedical Informatics R&D Center, Kyushu Institute of Technology, Iizuka, Fukuoka, Japan
| |
Collapse
|
107
|
AtbPpred: A Robust Sequence-Based Prediction of Anti-Tubercular Peptides Using Extremely Randomized Trees. Comput Struct Biotechnol J 2019; 17:972-981. [PMID: 31372196 PMCID: PMC6658830 DOI: 10.1016/j.csbj.2019.06.024] [Citation(s) in RCA: 72] [Impact Index Per Article: 12.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/21/2019] [Revised: 06/27/2019] [Accepted: 06/28/2019] [Indexed: 01/01/2023] Open
Abstract
Mycobacterium tuberculosis is one of the most dangerous pathogens in humans. It acts as an etiological agent of tuberculosis (TB), infecting almost one-third of the world's population. Owing to the high incidence of multidrug-resistant TB and extensively drug-resistant TB, there is an urgent need for novel and effective alternative therapies. Peptide-based therapy has several advantages, such as diverse mechanisms of action, low immunogenicity, and selective affinity to bacterial cell envelopes. However, the identification of anti-tubercular peptides (AtbPs) via experimentation is laborious and expensive; hence, the development of an efficient computational method is necessary for the prediction of AtbPs prior to both in vitro and in vivo experiments. To this end, we developed a two-layer machine learning (ML)-based predictor called AtbPpred for the identification of AtbPs. In the first layer, we applied a two-step feature selection procedure and identified the optimal feature set individually for nine different feature encodings, whose corresponding models were developed using extremely randomized tree (ERT). In the second-layer, the predicted probability of AtbPs from the above nine models were considered as input features to ERT and developed the final predictor. AtbPpred respectively achieved average accuracies of 88.3% and 87.3% during cross-validation and an independent evaluation, which were ~8.7% and 10.0% higher than the state-of-the-art method. Furthermore, we established a user-friendly webserver which is currently available at http://thegleelab.org/AtbPpred. We anticipate that this predictor could be useful in the high-throughput prediction of AtbPs and also provide mechanistic insights into its functions. We developed a novel computational framework for the identification of anti-tubercular peptides using Extremely randomized tree. AtbPpred displayed superior performance compared to the existing method on both benchmark and independent datasets. We constructed a user-friendly web server that implements the proposed AtbPpred method.
Collapse
|
108
|
Manavalan B, Basith S, Shin TH, Wei L, Lee G. Meta-4mCpred: A Sequence-Based Meta-Predictor for Accurate DNA 4mC Site Prediction Using Effective Feature Representation. MOLECULAR THERAPY. NUCLEIC ACIDS 2019; 16:733-744. [PMID: 31146255 PMCID: PMC6540332 DOI: 10.1016/j.omtn.2019.04.019] [Citation(s) in RCA: 165] [Impact Index Per Article: 27.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/10/2018] [Revised: 04/16/2019] [Accepted: 04/22/2019] [Indexed: 11/19/2022]
Abstract
DNA N4-methylcytosine (4mC) is an important genetic modification and plays crucial roles in differentiation between self and non-self DNA and in controlling DNA replication, cell cycle, and gene-expression levels. Accurate 4mC site identification is fundamental to improve the understanding of 4mC biological functions and mechanisms. Hence, it is necessary to develop in silico approaches for efficient and high-throughput 4mC site identification. Although some bioinformatic tools have been developed in this regard, their prediction accuracy and generalizability require improvement to optimize their usability in practical applications. For this purpose, we here proposed Meta-4mCpred, a meta-predictor for 4mC site prediction. In Meta-4mCpred, we employed a feature representation learning scheme and generated 56 probabilistic features based on four different machine-learning algorithms and seven feature encodings covering diverse sequence information, including compositional, physicochemical, and position-specific information. Subsequently, the probabilistic features were used as an input to support vector machine and developed a final meta-predictor. To the best of our knowledge, this is the first meta-predictor for 4mC site prediction. Cross-validation results show that Meta-4mCpred achieved an overall average accuracy of 84.2% from six different species, which is ∼2%–4% higher than those attainable using the state-of-the-art predictors. Furthermore, Meta-4mCpred achieved an overall average accuracy of 86% on independent datasets evaluation, which is over 4% higher than those yielded by the state-of-the-art predictors. The user-friendly webserver employed to implement the proposed Meta-4mCpred is freely accessible at http://thegleelab.org/Meta-4mCpred.
Collapse
Affiliation(s)
| | - Shaherin Basith
- Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea
| | - Tae Hwan Shin
- Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea; Institute of Molecular Science and Technology, Ajou University, Suwon, Republic of Korea
| | - Leyi Wei
- School of Computer Science and Technology, Tianjin University, China.
| | - Gwang Lee
- Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea; Institute of Molecular Science and Technology, Ajou University, Suwon, Republic of Korea.
| |
Collapse
|
109
|
mACPpred: A Support Vector Machine-Based Meta-Predictor for Identification of Anticancer Peptides. Int J Mol Sci 2019; 20:ijms20081964. [PMID: 31013619 PMCID: PMC6514805 DOI: 10.3390/ijms20081964] [Citation(s) in RCA: 137] [Impact Index Per Article: 22.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/15/2019] [Revised: 04/08/2019] [Accepted: 04/18/2019] [Indexed: 12/24/2022] Open
Abstract
Anticancer peptides (ACPs) are promising therapeutic agents for targeting and killing cancer cells. The accurate prediction of ACPs from given peptide sequences remains as an open problem in the field of immunoinformatics. Recently, machine learning algorithms have emerged as a promising tool for helping experimental scientists predict ACPs. However, the performance of existing methods still needs to be improved. In this study, we present a novel approach for the accurate prediction of ACPs, which involves the following two steps: (i) We applied a two-step feature selection protocol on seven feature encodings that cover various aspects of sequence information (composition-based, physicochemical properties and profiles) and obtained their corresponding optimal feature-based models. The resultant predicted probabilities of ACPs were further utilized as feature vectors. (ii) The predicted probability feature vectors were in turn used as an input to support vector machine to develop the final prediction model called mACPpred. Cross-validation analysis showed that the proposed predictor performs significantly better than individual feature encodings. Furthermore, mACPpred significantly outperformed the existing methods compared in this study when objectively evaluated on an independent dataset.
Collapse
|
110
|
Chen ZH, You ZH, Li LP, Wang YB, Wong L, Yi HC. Prediction of Self-Interacting Proteins from Protein Sequence Information Based on Random Projection Model and Fast Fourier Transform. Int J Mol Sci 2019; 20:ijms20040930. [PMID: 30795499 PMCID: PMC6412412 DOI: 10.3390/ijms20040930] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/05/2018] [Revised: 01/06/2019] [Accepted: 01/07/2019] [Indexed: 12/30/2022] Open
Abstract
It is significant for biological cells to predict self-interacting proteins (SIPs) in the field of bioinformatics. SIPs mean that two or more identical proteins can interact with each other by one gene expression. This plays a major role in the evolution of protein‒protein interactions (PPIs) and cellular functions. Owing to the limitation of the experimental identification of self-interacting proteins, it is more and more significant to develop a useful biological tool for the prediction of SIPs from protein sequence information. Therefore, we propose a novel prediction model called RP-FFT that merges the Random Projection (RP) model and Fast Fourier Transform (FFT) for detecting SIPs. First, each protein sequence was transformed into a Position Specific Scoring Matrix (PSSM) using the Position Specific Iterated BLAST (PSI-BLAST). Second, the features of protein sequences were extracted by the FFT method on PSSM. Lastly, we evaluated the performance of RP-FFT and compared the RP classifier with the state-of-the-art support vector machine (SVM) classifier and other existing methods on the human and yeast datasets; after the five-fold cross-validation, the RP-FFT model can obtain high average accuracies of 96.28% and 91.87% on the human and yeast datasets, respectively. The experimental results demonstrated that our RP-FFT prediction model is reasonable and robust.
Collapse
Affiliation(s)
- Zhan-Heng Chen
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China.
- University of Chinese Academy of Sciences, Beijing 100049, China.
| | - Zhu-Hong You
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China.
- University of Chinese Academy of Sciences, Beijing 100049, China.
| | - Li-Ping Li
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China.
| | - Yan-Bin Wang
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China.
| | - Leon Wong
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China.
- University of Chinese Academy of Sciences, Beijing 100049, China.
| | - Hai-Cheng Yi
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China.
- University of Chinese Academy of Sciences, Beijing 100049, China.
| |
Collapse
|
111
|
Analysis and Forecasting of the Carbon Price in China’s Regional Carbon Markets Based on Fast Ensemble Empirical Mode Decomposition, Phase Space Reconstruction, and an Improved Extreme Learning Machine. ENERGIES 2019. [DOI: 10.3390/en12020277] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/22/2023]
Abstract
With the development of the carbon market in China, research on the carbon price has received more and more attention in related fields. However, due to its nonlinearity and instability, the carbon price is undoubtedly difficult to predict using a single model. This paper proposes a new hybrid model for carbon price forecasting that combines fast ensemble empirical mode decomposition, sample entropy, phase space reconstruction, a partial autocorrelation function, and an extreme learning machine that has been improved by particle swarm optimization. The original carbon price series is decomposed using the fast ensemble empirical mode decomposition and sample entropy methods, which eliminate noise interference. Then, the phase space reconstruction and partial autocorrelation function methods are combined to determine the input and output variables in the forecasting models. An extreme learning machine optimized by particle swarm optimization was employed to forecast carbon prices. An empirical study based on carbon prices in three typical regional carbon markets in China found that this new hybrid model performed better than other comparable models.
Collapse
|
112
|
Hasan MM, Manavalan B, Khatun MS, Kurata H. Prediction of S-nitrosylation sites by integrating support vector machines and random forest. Mol Omics 2019; 15:451-458. [DOI: 10.1039/c9mo00098d] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/31/2022]
Abstract
Cysteine S-nitrosylation is a type of reversible post-translational modification of proteins, which controls diverse biological processes.
Collapse
Affiliation(s)
- Md. Mehedi Hasan
- Department of Bioscience and Bioinformatics
- Kyushu Institute of Technology
- Iizuka
- Japan
- Japan Society for the Promotion of Science
| | | | - Mst. Shamima Khatun
- Department of Bioscience and Bioinformatics
- Kyushu Institute of Technology
- Iizuka
- Japan
| | - Hiroyuki Kurata
- Department of Bioscience and Bioinformatics
- Kyushu Institute of Technology
- Iizuka
- Japan
- Biomedical Informatics R&D Center
| |
Collapse
|
113
|
A Multitask-Based Neural Machine Translation Model with Part-of-Speech Tags Integration for Arabic Dialects. APPLIED SCIENCES-BASEL 2018. [DOI: 10.3390/app8122502] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
Abstract
The statistical machine translation for the Arabic language integrates external linguistic resources such as part-of-speech tags. The current research presents a Bidirectional Long Short-Term Memory (Bi-LSTM) - Conditional Random Fields (CRF) segment-level Arabic Dialect POS tagger model, which will be integrated into the Multitask Neural Machine Translation (NMT) model. The proposed solution for NMT is based on the recurrent neural network encoder-decoder NMT model that has been introduced recently. The study has proposed and developed a unified Multitask NMT model that shares an encoder between the two tasks; Arabic Dialect (AD) to Modern Standard Arabic (MSA) translation task and the segment-level POS tagging tasks. A shared layer and an invariant layer are shared between the translation tasks. By training translation tasks and POS tagging task alternately, the proposed model can leverage the characteristic information and improve the translation quality from Arabic dialects to Modern Standard Arabic. The experiments are conducted from Levantine Arabic (LA) to MSA and Maghrebi Arabic (MA) to MSA translation tasks. As an additional linguistic resource, the segment-level part-of-speech tags for Arabic dialects were also exploited. Experiments suggest that translation quality and the performance of POS tagger were improved with the implementation of multitask learning approach.
Collapse
|
114
|
Basith S, Manavalan B, Shin TH, Lee G. iGHBP: Computational identification of growth hormone binding proteins from sequences using extremely randomised tree. Comput Struct Biotechnol J 2018; 16:412-420. [PMID: 30425802 PMCID: PMC6222285 DOI: 10.1016/j.csbj.2018.10.007] [Citation(s) in RCA: 87] [Impact Index Per Article: 12.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/24/2018] [Revised: 10/04/2018] [Accepted: 10/12/2018] [Indexed: 11/27/2022] Open
Abstract
A soluble carrier growth hormone binding protein (GHBP) that can selectively and non-covalently interact with growth hormone, thereby acting as a modulator or inhibitor of growth hormone signalling. Accurate identification of the GHBP from a given protein sequence also provides important clues for understanding cell growth and cellular mechanisms. In the postgenomic era, there has been an abundance of protein sequence data garnered, hence it is crucial to develop an automated computational method which enables fast and accurate identification of putative GHBPs within a vast number of candidate proteins. In this study, we describe a novel machine-learning-based predictor called iGHBP for the identification of GHBP. In order to predict GHBP from a given protein sequence, we trained an extremely randomised tree with an optimal feature set that was obtained from a combination of dipeptide composition and amino acid index values by applying a two-step feature selection protocol. During cross-validation analysis, iGHBP achieved an accuracy of 84.9%, which was ~7% higher than the control extremely randomised tree predictor trained with all features, thus demonstrating the effectiveness of our feature selection protocol. Furthermore, when objectively evaluated on an independent data set, our proposed iGHBP method displayed superior performance compared to the existing method. Additionally, a user-friendly web server that implements the proposed iGHBP has been established and is available at http://thegleelab.org/iGHBP.
Collapse
Affiliation(s)
- Shaherin Basith
- Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea
| | | | - Tae Hwan Shin
- Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea
- Institute of Molecular Science and Technology, Ajou University, Suwon, Republic of Korea
| | - Gwang Lee
- Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea
- Institute of Molecular Science and Technology, Ajou University, Suwon, Republic of Korea
| |
Collapse
|
115
|
A Hybrid Deep Learning Model for Predicting Protein Hydroxylation Sites. Int J Mol Sci 2018; 19:ijms19092817. [PMID: 30231550 PMCID: PMC6164125 DOI: 10.3390/ijms19092817] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/19/2018] [Revised: 09/12/2018] [Accepted: 09/15/2018] [Indexed: 12/17/2022] Open
Abstract
Protein hydroxylation is one type of post-translational modifications (PTMs) playing critical roles in human diseases. It is known that protein sequence contains many uncharacterized residues of proline and lysine. The question that needs to be answered is: which residue can be hydroxylated, and which one cannot. The answer will not only help understand the mechanism of hydroxylation but can also benefit the development of new drugs. In this paper, we proposed a novel approach for predicting hydroxylation using a hybrid deep learning model integrating the convolutional neural network (CNN) and long short-term memory network (LSTM). We employed a pseudo amino acid composition (PseAAC) method to construct valid benchmark datasets based on a sliding window strategy and used the position-specific scoring matrix (PSSM) to represent samples as inputs to the deep learning model. In addition, we compared our method with popular predictors including CNN, iHyd-PseAAC, and iHyd-PseCp. The results for 5-fold cross-validations all demonstrated that our method significantly outperforms the other methods in prediction accuracy.
Collapse
|
116
|
Manavalan B, Shin TH, Kim MO, Lee G. PIP-EL: A New Ensemble Learning Method for Improved Proinflammatory Peptide Predictions. Front Immunol 2018; 9:1783. [PMID: 30108593 PMCID: PMC6079197 DOI: 10.3389/fimmu.2018.01783] [Citation(s) in RCA: 90] [Impact Index Per Article: 12.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/08/2018] [Accepted: 07/19/2018] [Indexed: 02/03/2023] Open
Abstract
Proinflammatory cytokines have the capacity to increase inflammatory reaction and play a central role in first line of defence against invading pathogens. Proinflammatory inducing peptides (PIPs) have been used as an antineoplastic agent, an antibacterial agent and a vaccine in immunization therapies. Due to the advancement in sequence technologies that resulted an avalanche of protein sequence data. Therefore, it is necessary to develop an automated computational method to enable fast and accurate identification of novel PIPs within the vast number of candidate proteins and peptides. To address this, we proposed a new predictor, PIP-EL, for predicting PIPs using the strategy of ensemble learning (EL). Our benchmarking dataset is imbalanced. Thus, we applied a random under-sampling technique to generate 10 balanced models for each composition. Technically, PIP-EL is the fusion of 50 independent random forest (RF) models, where each of the five different compositions, including amino acid, dipeptide, composition-transition-distribution, physicochemical properties, and amino acid index contains 10 RF models. PIP-EL achieves the Matthews' correlation coefficient (MCC) of 0.435 in a 5-fold cross-validation test, which is ~2-5% higher than that of the individual classifiers and hybrid feature-based classifier. Furthermore, we evaluate the performance of PIP-EL on the independent dataset, showing that our method outperforms the existing method and two different machine learning methods developed in this study, with an MCC of 0.454. These results indicate that PIP-EL will be a useful tool for predicting PIPs and for researchers working in the field of peptide therapeutics and immunotherapy. The user-friendly web server, PIP-EL, is freely accessible.
Collapse
Affiliation(s)
| | - Tae Hwan Shin
- Department of Physiology, Ajou University School of Medicine, Suwon, South Korea
- Institute of Molecular Science and Technology, Ajou University, Suwon, South Korea
| | - Myeong Ok Kim
- Division of Life Science and Applied Life Science (BK21 Plus), College of Natural Sciences, Gyeongsang National University, Jinju, South Korea
| | - Gwang Lee
- Department of Physiology, Ajou University School of Medicine, Suwon, South Korea
- Institute of Molecular Science and Technology, Ajou University, Suwon, South Korea
| |
Collapse
|