151
Govindaraj RG, Subramaniyam S, Manavalan B. Extremely-randomized-tree-based Prediction of N6-Methyladenosine Sites in Saccharomyces cerevisiae. Curr Genomics 2020; 21:26-33. [PMID: 32655295 PMCID: PMC7324895 DOI: 10.2174/1389202921666200219125625] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 11/01/2019] [Revised: 12/28/2019] [Accepted: 01/24/2020] [Indexed: 02/07/2023]
Abstract
Introduction: N6-methyladenosine (m6A) is one of the most common post-transcriptional modifications in RNA and has been linked to several biological processes. Accurately predicting m6A sites from RNA sequences is a challenging task in computational biology. Several computational methods based on machine-learning algorithms have been proposed to accelerate in silico screening of m6A sites, thereby drastically reducing the experimental time and labor costs involved. Methodology: In this study, we propose a novel computational predictor, termed ERT-m6Apred, for the accurate prediction of m6A sites. To identify the feature encodings with the most discriminative capability, we applied a two-step feature selection technique to seven different feature encodings and identified the corresponding optimal feature set for each. Results: A performance comparison of extremely randomized tree models built on these optimal feature sets revealed that the pseudo k-tuple composition encoding, which incorporates 14 physicochemical properties, significantly outperformed the other encodings. Moreover, ERT-m6Apred achieved an accuracy of 78.84% in cross-validation, which is better than that of recently reported predictors. Conclusion: In summary, ERT-m6Apred predicts Saccharomyces cerevisiae m6A sites with higher accuracy, thus facilitating biological hypothesis generation and experimental validation.
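Pseudo k-tuple composition (PseKNC) extends normalized k-tuple (k-mer) nucleotide frequencies with correlation terms derived from physicochemical properties. As a minimal sketch of the first half only, the Python below computes the normalized k-tuple frequencies of a sequence; the physicochemical correlation terms and the 14 properties used by the authors are not reproduced, and the function name is hypothetical.

```python
from itertools import product

def ktuple_composition(seq: str, k: int = 2) -> dict[str, float]:
    """Normalized frequencies of all 4**k nucleotide k-tuples (k-mers) in a sequence."""
    seq = seq.upper().replace("U", "T")  # treat RNA U as T so one alphabet covers RNA and DNA
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing ambiguous bases such as N
            counts[kmer] += 1
            total += 1
    return {kmer: (c / total if total else 0.0) for kmer, c in counts.items()}
```

For k = 2 this yields a 16-dimensional vector per sequence; a full PseKNC encoding would append the pseudo (correlation) components to it.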
Affiliation(s)
- Rajiv G Govindaraj
- (1) HotSpot Therapeutics, 50 Milk Street, 16th Floor, Boston, MA 02109, USA; (2) Research and Development Center, In-silicogen Inc., Yongin-si 16954, Gyeonggi-do, Republic of Korea; (3) Department of Biotechnology, Dr. N.G.P. Arts and Science College, Coimbatore, Tamil Nadu 641048, India; (4) Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea
- Sathiyamoorthy Subramaniyam
- (1) HotSpot Therapeutics, 50 Milk Street, 16th Floor, Boston, MA 02109, USA; (2) Research and Development Center, In-silicogen Inc., Yongin-si 16954, Gyeonggi-do, Republic of Korea; (3) Department of Biotechnology, Dr. N.G.P. Arts and Science College, Coimbatore, Tamil Nadu 641048, India; (4) Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea
- Balachandran Manavalan
- (1) HotSpot Therapeutics, 50 Milk Street, 16th Floor, Boston, MA 02109, USA; (2) Research and Development Center, In-silicogen Inc., Yongin-si 16954, Gyeonggi-do, Republic of Korea; (3) Department of Biotechnology, Dr. N.G.P. Arts and Science College, Coimbatore, Tamil Nadu 641048, India; (4) Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea
152
Li HF, Wang XF, Tang H. Predicting Bacteriophage Enzymes and Hydrolases by Using Combined Features. Front Bioeng Biotechnol 2020; 8:183. [PMID: 32266225 PMCID: PMC7105632 DOI: 10.3389/fbioe.2020.00183] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 02/02/2020] [Accepted: 02/24/2020] [Indexed: 12/19/2022]
Abstract
Bacteriophages are viruses that infect bacteria, and they have been applied in the treatment of pathogenic bacterial infections. Phage enzymes and hydrolases play the most important role in the destruction of bacterial cells. Correctly identifying the hydrolases encoded by phages is not only beneficial to studying their function but also conducive to antibacterial drug discovery. Thus, this work aims to recognize the enzymes and hydrolases in phages. A combination of different features was used to represent samples of phage proteins and hydrolases. A feature selection technique based on analysis of variance (ANOVA) was used to optimize the features, and classification was performed with a support vector machine (SVM). The prediction process includes two steps: the first identifies phage enzymes, and the second determines whether a phage enzyme is a hydrolase. Jackknife cross-validation showed that the method achieved overall accuracies of 85.1% and 94.3% for the two prediction steps, respectively, demonstrating that the proposed method is promising.
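ANOVA-based feature selection typically ranks each feature by its one-way ANOVA F-value across classes, that is, the ratio of between-class to within-class variance, and keeps the top-ranked features for the classifier. A minimal pure-Python sketch follows, assuming a numeric feature matrix and integer class labels; the function names are hypothetical, this is not the authors' code, and the number of features to retain is left open because the abstract does not state it.

```python
from statistics import mean

def anova_f(groups: list[list[float]]) -> float:
    """One-way ANOVA F-value for one feature, given its values split by class."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-class sum of squares: how far each class mean sits from the grand mean
    ssb = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-class sum of squares: spread of values around their own class mean
    ssw = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    if ssw == 0:
        return float("inf") if ssb > 0 else 0.0
    return (ssb / (k - 1)) / (ssw / (n - k))

def rank_features_by_anova(X: list[list[float]], y: list[int]) -> list[int]:
    """Return feature (column) indices sorted from most to least discriminative."""
    labels = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        groups = [[row[j] for row, lab in zip(X, y) if lab == c] for c in labels]
        scores.append(anova_f(groups))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
```

The top entries of the returned ranking would then form the reduced feature set passed to the SVM.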
Affiliation(s)
- Hong-Fei Li
- Department of Pathophysiology, Key Laboratory of Medical Electrophysiology, Ministry of Education, Southwest Medical University, Luzhou, China; School of Computer and Information Engineering, Henan Normal University, Henan, China
- Xian-Fang Wang
- School of Computer and Information Engineering, Henan Normal University, Henan, China
- Hua Tang
- Department of Pathophysiology, Key Laboratory of Medical Electrophysiology, Ministry of Education, Southwest Medical University, Luzhou, China
153
Lv Z, Zhang J, Ding H, Zou Q. RF-PseU: A Random Forest Predictor for RNA Pseudouridine Sites. Front Bioeng Biotechnol 2020; 8:134. [PMID: 32175316 PMCID: PMC7054385 DOI: 10.3389/fbioe.2020.00134] [Citation(s) in RCA: 69] [Impact Index Per Article: 13.8] [Received: 01/17/2020] [Accepted: 02/10/2020] [Indexed: 12/21/2022]
Abstract
Pseudouridine modification, a ubiquitous chemical modification of RNA, is crucial for various cellular biological and physiological processes. To gain more insight into the functional mechanisms involved, it is of fundamental importance to precisely identify pseudouridine sites in RNA. With the rapid progress of next-generation sequencing technology, several useful machine learning approaches have recently become available; however, existing methods cannot predict these sites with high accuracy, so a more accurate predictor is required. In this study, a random forest-based predictor named RF-PseU is proposed for the prediction of pseudouridylation sites. To optimize the feature representation and obtain a better model, the light gradient boosting machine (LightGBM) algorithm and an incremental feature selection strategy were used to select the optimal feature space vector for training the random forest model RF-PseU. On the same benchmark data sets of three species, RF-PseU performs better overall than previous state-of-the-art predictors. The integrated average leave-one-out cross-validation and independent testing accuracy scores were 71.4% and 74.7%, respectively, representing increments of 3.63% and 4.77% over the best existing predictor. Moreover, the final RF-PseU model for prediction was built on leave-one-out cross-validation and provides a reliable and robust tool for identifying pseudouridine sites. A web server with a user-friendly interface is accessible at http://148.70.81.170:10228/rfpseu.
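Incremental feature selection (IFS) starts from a feature ranking (here assumed to come from something like LightGBM feature importance) and grows the feature set one ranked feature at a time, keeping the prefix that scores best. The sketch below assumes the ranking and an `evaluate` callable are supplied by the caller; in RF-PseU that score would come from cross-validating a random forest, whereas the demo uses a hypothetical toy scorer. Names are illustrative only.

```python
from typing import Callable, Sequence

def incremental_feature_selection(
    ranked: Sequence[str],
    evaluate: Callable[[Sequence[str]], float],
) -> tuple[list[str], float]:
    """Grow the feature set one ranked feature at a time; keep the best-scoring prefix."""
    best_score, best_k = float("-inf"), 0
    for k in range(1, len(ranked) + 1):
        score = evaluate(ranked[:k])  # e.g. cross-validated accuracy on the first k features
        if score > best_score:  # strict '>' keeps the smaller subset on ties
            best_score, best_k = score, k
    return list(ranked[:best_k]), best_score

if __name__ == "__main__":
    # Hypothetical toy scorer: each feature adds a fixed gain, minus a size penalty
    gains = {"a": 0.5, "b": 0.3, "c": 0.0, "d": 0.0}

    def toy_score(feats: Sequence[str]) -> float:
        return sum(gains[f] for f in feats) - 0.1 * len(feats)

    print(incremental_feature_selection(["a", "b", "c", "d"], toy_score))
```

Because every prefix is evaluated, the cost is one model evaluation per feature, which is why IFS is usually run on a pre-ranked, already reduced feature list.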
Affiliation(s)
- Zhibin Lv
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Jun Zhang
- Rehabilitation Department, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, China
- Hui Ding
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
154
Charoenkwan P, Kanthawong S, Schaduangrat N, Yana J, Shoombuatong W. PVPred-SCM: Improved Prediction and Analysis of Phage Virion Proteins Using a Scoring Card Method. Cells 2020; 9:E353. [PMID: 32028709 PMCID: PMC7072630 DOI: 10.3390/cells9020353] [Citation(s) in RCA: 39] [Impact Index Per Article: 7.8] [Received: 12/26/2019] [Revised: 01/20/2020] [Accepted: 01/27/2020] [Indexed: 12/16/2022]
Abstract
Although existing methods have successfully predicted phage (or bacteriophage) virion proteins (PVPs) using various types of protein features and complex classifiers, such as support vector machines and naïve Bayes, these classifiers do not offer interpretability. However, the characterization and analysis of PVPs could be of great significance for understanding the molecular mechanisms of bacteriophage genetics and for developing antibacterial drugs. Hence, we propose a novel method (PVPred-SCM) based on the scoring card method (SCM) in conjunction with dipeptide composition to identify and characterize PVPs. In PVPred-SCM, the propensity scores of 400 dipeptides were calculated using a statistical discrimination approach. A rigorous independent validation test showed that PVPred-SCM, using only dipeptide composition, yielded an accuracy of 77.56%, indicating that it performed well relative to the state-of-the-art method that uses a number of protein features. Furthermore, the propensity scores of dipeptides were used to provide insights into the biochemical and biophysical properties of PVPs. Upon comparison, PVPred-SCM was found to be superior to the existing methods in terms of simplicity, interpretability, and implementation. Finally, to facilitate high-throughput prediction of PVPs, we provide a user-friendly web server that estimates the likelihood that query sequences are PVPs. It is anticipated that PVPred-SCM will become a useful tool, or at least a complement to existing methods, for predicting and analyzing PVPs.
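In a scoring card method, a sequence's score is its dipeptide composition weighted by a propensity score for each of the 400 dipeptides, and the score is compared with a threshold to make the call. The sketch below assumes the propensity table and threshold are already available; in PVPred-SCM they were derived from training data, which is not reproduced here, and the function names are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # all 400 dipeptides

def dipeptide_composition(seq: str) -> dict[str, float]:
    """Normalized frequency of each of the 400 dipeptides in a protein sequence."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        dp = seq[i:i + 2].upper()
        if dp in counts:  # ignore pairs containing non-standard residues (e.g. X)
            counts[dp] += 1
            total += 1
    return {dp: (c / total if total else 0.0) for dp, c in counts.items()}

def scm_score(seq: str, propensity: dict[str, float]) -> float:
    """Scoring-card score: dipeptide composition weighted by per-dipeptide propensity."""
    dpc = dipeptide_composition(seq)
    return sum(freq * propensity.get(dp, 0.0) for dp, freq in dpc.items())

def is_pvp(seq: str, propensity: dict[str, float], threshold: float) -> bool:
    """Hypothetical decision rule: predict PVP when the score exceeds the threshold."""
    return scm_score(seq, propensity) > threshold
```

The interpretability claimed for SCM comes from this linear form: each dipeptide's propensity score directly shows how much it pushes a sequence toward the PVP class.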
Affiliation(s)
- Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai 50200, Thailand
- Sakawrat Kanthawong
- Department of Microbiology, Faculty of Medicine, Khon Kaen University, Khon Kaen 40002, Thailand
- Nalini Schaduangrat
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
- Janchai Yana
- Department of Chemistry, Faculty of Science and Technology, Chiang Mai Rajabhat University, Chiang Mai 50300, Thailand
- Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
155
Basith S, Manavalan B, Hwan Shin T, Lee G. Machine intelligence in peptide therapeutics: A next‐generation tool for rapid disease screening. Med Res Rev 2020; 40:1276-1314. [DOI: 10.1002/med.21658] [Citation(s) in RCA: 139] [Impact Index Per Article: 27.8] [Received: 10/18/2019] [Revised: 11/26/2019] [Accepted: 12/16/2019] [Indexed: 12/12/2022]
Affiliation(s)
- Shaherin Basith
- Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea
- Tae Hwan Shin
- Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea
- Gwang Lee
- Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea
156
Cai J, Wang D, Chen R, Niu Y, Ye X, Su R, Xiao G, Wei L. A Bioinformatics Tool for the Prediction of DNA N6-Methyladenine Modifications Based on Feature Fusion and Optimization Protocol. Front Bioeng Biotechnol 2020; 8:502. [PMID: 32582654 PMCID: PMC7287168 DOI: 10.3389/fbioe.2020.00502] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 02/17/2020] [Accepted: 04/29/2020] [Indexed: 01/04/2023]
Abstract
DNA N6-methyladenine (6mA) is closely involved in various biological processes. Identifying the genome-wide distribution of 6mA modifications is of great significance for an in-depth understanding of their functions. In recent years, various experimental and computational methods have been proposed for this purpose; unfortunately, existing methods cannot provide accurate and fast 6mA prediction. In this study, we present 6mAPred-FO, a bioinformatics tool that enables researchers to make predictions based on sequences alone. To sufficiently capture the characteristics of 6mA sites, we integrate sequence-order information with nucleotide positional specificity information for feature encoding, and further improve the feature representation capacity with a feature optimization protocol based on analysis of variance. The experimental results show that this feature protocol significantly improves predictive performance. Further feature analysis showed that the sequence-order information and positional specificity information are complementary, which contributes to the performance improvement. The improvement is also due to the feature optimization protocol, which effectively captures the most informative features from the original feature space. Moreover, benchmarking comparison results demonstrate that 6mAPred-FO outperforms several existing predictors. Finally, we established a web server implementing the proposed method for the convenience of researchers, which is currently available at http://server.malab.cn/6mAPred-FO.
Affiliation(s)
- Jianhua Cai
- Fujian Provincial Key Laboratory of Information Processing and Intelligent Control, College of Computer and Control Engineering, Minjiang University, Fuzhou, China
- College of Mathematics and Computer Science, Fuzhou University, Fuzhou, China
- Donghua Wang
- Department of General Surgery, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, China
- Riqing Chen
- College of Computer and Information Sciences, Fujian Agriculture and Forestry University, Fuzhou, China
- Yuzhen Niu
- Fujian Provincial Key Laboratory of Information Processing and Intelligent Control, College of Computer and Control Engineering, Minjiang University, Fuzhou, China
- Xiucai Ye
- Department of Computer Science, University of Tsukuba, Tsukuba, Japan
- Ran Su
- College of Intelligence and Computing, Tianjin University, Tianjin, China
- Guobao Xiao
- Fujian Provincial Key Laboratory of Information Processing and Intelligent Control, College of Computer and Control Engineering, Minjiang University, Fuzhou, China
- *Correspondence: Guobao Xiao
- Leyi Wei
- Fujian Provincial Key Laboratory of Information Processing and Intelligent Control, College of Computer and Control Engineering, Minjiang University, Fuzhou, China
- School of Software, Shandong University, Jinan, China
157
iQSP: A Sequence-Based Tool for the Prediction and Analysis of Quorum Sensing Peptides via Chou's 5-Steps Rule and Informative Physicochemical Properties. Int J Mol Sci 2019; 21:ijms21010075. [PMID: 31861928 PMCID: PMC6981611 DOI: 10.3390/ijms21010075] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Received: 11/18/2019] [Revised: 12/13/2019] [Accepted: 12/18/2019] [Indexed: 01/18/2023]
Abstract
Understanding the functional mechanism of quorum-sensing peptides (QSPs) plays an essential role in finding new opportunities to combat bacterial infections through drug design. With the avalanche of newly available peptide sequences in the post-genomic age, it is highly desirable to develop a computational model for efficient, rapid, and high-throughput QSP identification based on peptide sequence information alone. Although a few methods have been developed for predicting QSPs, their prediction accuracy and interpretability still require further improvement. Thus, in this work, we propose an accurate sequence-based predictor (called iQSP) and a set of interpretable rules (called IR-QSP) for predicting and analyzing QSPs. In iQSP, we utilized a support vector machine (SVM) with 18 informative features derived from physicochemical properties (PCPs). A rigorous independent validation test showed that iQSP achieved a maximum accuracy and MCC of 93.00% and 0.86, respectively. Furthermore, the set of interpretable rules, IR-QSP, was extracted using a random forest model and the 18 informative PCPs. Finally, for the convenience of experimental scientists, the iQSP web server was established and made freely available online. It is anticipated that iQSP will become a useful tool, or at least a complement to existing methods, for predicting and analyzing QSPs.
158
Basith S, Manavalan B, Shin TH, Lee G. SDM6A: A Web-Based Integrative Machine-Learning Framework for Predicting 6mA Sites in the Rice Genome. Mol Ther Nucleic Acids 2019; 18:131-141. [PMID: 31542696 PMCID: PMC6796762 DOI: 10.1016/j.omtn.2019.08.011] [Citation(s) in RCA: 110] [Impact Index Per Article: 18.3] [Received: 06/12/2019] [Revised: 07/30/2019] [Accepted: 08/08/2019] [Indexed: 12/19/2022]
Abstract
DNA N6-adenine methylation (6mA) is an epigenetic modification found in prokaryotes and eukaryotes. Identifying 6mA sites in the rice genome is important for rice epigenetics and breeding, but the non-random distribution and biological functions of these sites remain unclear. Several machine-learning tools can identify 6mA sites but show limited prediction accuracy, which limits their usability in epigenetic research. Here, we developed a novel computational predictor, called the Sequence-based DNA N6-methyladenine predictor (SDM6A), which is a two-layer ensemble approach for identifying 6mA sites in the rice genome. Unlike existing methods, which are based on single models with basic features, SDM6A explores various features, and five encoding methods were identified as appropriate for this problem. Subsequently, an optimal feature set was identified from each encoding, and corresponding models were developed individually using a support vector machine and an extremely randomized tree. First, the five single models were integrated via an ensemble approach to define the class for each classifier. Second, the two classifiers were integrated to generate a final prediction. SDM6A achieved robust performance in cross-validation and independent evaluation, with an average accuracy and Matthews correlation coefficient (MCC) of 88.2% and 0.764, respectively. These metrics were 4.7%-11.0% and 2.3%-5.5% higher, respectively, than those of existing methods. A user-friendly, publicly accessible web server (http://thegleelab.org/SDM6A) was implemented to predict novel putative 6mA sites in the rice genome.
Affiliation(s)
- Shaherin Basith
- Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea
- Tae Hwan Shin
- Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea
- Gwang Lee
- Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea
159
Hasan MM, Manavalan B, Khatun MS, Kurata H. i4mC-ROSE, a bioinformatics tool for the identification of DNA N4-methylcytosine sites in the Rosaceae genome. Int J Biol Macromol 2019; 157:752-758. [PMID: 31805335 DOI: 10.1016/j.ijbiomac.2019.12.009] [Citation(s) in RCA: 60] [Impact Index Per Article: 10.0] [Received: 10/22/2019] [Revised: 11/29/2019] [Accepted: 12/02/2019] [Indexed: 12/18/2022]
Abstract
N4-methylcytosine is one of the most important epigenetic modifications and regulates many biological processes, including DNA replication and chromosome stability. Identifying N4-methylcytosine sites is pivotal to understanding specific biological functions. Herein, we developed the first bioinformatics tool, called i4mC-ROSE, for identifying N4-methylcytosine sites in the genomes of Fragaria vesca and Rosa chinensis in the Rosaceae; it uses a random forest classifier with six encoding methods that cover various aspects of DNA sequence information. i4mC-ROSE achieves area under the curve (AUC) scores of 0.883 and 0.889 for the two genomes in cross-validation. Moreover, i4mC-ROSE outperforms the other classifiers tested in this study when objectively evaluated on the independent datasets. The proposed i4mC-ROSE tool can meet users' demand for the prediction of 4mC sites in the Rosaceae genome. The i4mC-ROSE predictor and the datasets used are publicly accessible at http://kurata14.bio.kyutech.ac.jp/i4mC-ROSE/.
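The AUC reported for i4mC-ROSE can be computed from classifier scores with the rank (Mann-Whitney) formulation: the probability that a randomly chosen positive site scores higher than a randomly chosen negative one, with ties counted as one half. A minimal pure-Python sketch follows; the function name is hypothetical and this is not the authors' evaluation code.

```python
def roc_auc(scores: list[float], labels: list[int]) -> float:
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            # A correctly ordered pair counts 1, a tie counts 0.5, a misordered pair 0
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic and meant only to show the definition; for large datasets a sort-based rank computation gives the same value faster.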
Affiliation(s)
- Md Mehedi Hasan
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan; Japan Society for the Promotion of Science, 5-3-1 Kojimachi, Chiyoda-ku, Tokyo 102-0083, Japan
- Balachandran Manavalan
- Department of Physiology, Ajou University School of Medicine, Suwon 443380, Republic of Korea
- Mst Shamima Khatun
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
- Hiroyuki Kurata
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan; Biomedical Informatics R&D Center, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
160
Numerical Modeling of Suspension Force for Bearingless Flywheel Machine Based on Differential Evolution Extreme Learning Machine. Energies 2019. [DOI: 10.3390/en12234470] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 11/16/2022]
Abstract
The analytical model (AM) of suspension force in a bearingless flywheel machine suffers from model mismatch due to magnetic saturation and rotor eccentricity. This paper proposes a numerical modeling method based on a differential evolution (DE) extreme learning machine (ELM). A representative input and output sample set is obtained by finite-element analysis (FEA) and principal component analysis (PCA), and the numerical model of suspension force is obtained by training the ELM. Additionally, the DE algorithm is employed to optimize the ELM parameters to improve model accuracy. Finally, absolute error (AE) and root mean squared error (RMSE) are used as evaluation indexes in comparative analyses with other commonly used machine learning algorithms, such as k-nearest neighbors (KNN), the back propagation (BP) algorithm, and support vector machines (SVMs). The results show that, compared with these algorithms, the proposed method has smaller fitting and prediction errors; its RMSE is just 22.88% of that of KNN, 39.90% of that of BP, and 58.37% of that of SVM, which verifies the effectiveness and validity of the proposed numerical modeling method.
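The comparison in this abstract expresses the proposed model's RMSE as a percentage of each baseline's RMSE (for example, 22.88% of KNN's). The sketch below shows only that evaluation arithmetic on generic predictions; it does not implement the ELM or the differential-evolution tuning, and the function names are hypothetical.

```python
import math

def rmse(y_true: list[float], y_pred: list[float]) -> float:
    """Root mean squared error between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def rmse_ratio(model_rmse: float, baseline_rmse: float) -> float:
    """Model RMSE as a percentage of a baseline's RMSE (lower means the model is better)."""
    return 100.0 * model_rmse / baseline_rmse
```

A ratio below 100% means the model's errors are smaller than the baseline's; because RMSE squares residuals, the ratio weights large fitting errors more heavily than an absolute-error comparison would.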
161
Schaduangrat N, Nantasenamat C, Prachayasittikul V, Shoombuatong W. Meta-iAVP: A Sequence-Based Meta-Predictor for Improving the Prediction of Antiviral Peptides Using Effective Feature Representation. Int J Mol Sci 2019; 20:ijms20225743. [PMID: 31731751 PMCID: PMC6888698 DOI: 10.3390/ijms20225743] [Citation(s) in RCA: 82] [Impact Index Per Article: 13.7] [Received: 10/24/2019] [Revised: 11/07/2019] [Accepted: 11/13/2019] [Indexed: 12/31/2022]
Abstract
Despite the large-scale production and widespread distribution of vaccines and antiviral drugs, viruses remain a prominent cause of human disease. Recently, antiviral peptides (AVPs) have emerged as influential antiviral agents owing to their extraordinary advantages. With the avalanche of newly found peptide sequences in the post-genomic era, there is great demand for a sequence-based predictor that can identify AVPs in a timely manner, as this information is very useful for both basic research and drug development. In this study, we propose a novel sequence-based meta-predictor with an effective feature representation, called Meta-iAVP, for the accurate prediction of AVPs from given peptide sequences. Herein, the effective feature representation was extracted from a set of prediction scores derived from various machine learning algorithms and types of features. To the best of our knowledge, the proposed model represents the first meta-based approach for the prediction of AVPs. An overall accuracy and Matthews correlation coefficient of 95.20% and 0.90, respectively, were achieved on the independent test set of an objective benchmark dataset. Comparative analysis suggested that Meta-iAVP was superior to existing methods and therefore represents a useful tool for AVP prediction. Finally, to facilitate high-throughput prediction of AVPs, the model was deployed as the Meta-iAVP web server, freely available at http://codes.bio/meta-iavp/, where users can submit query peptide sequences to determine the likelihood that they are AVPs.
Affiliation(s)
- Nalini Schaduangrat
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
- Chanin Nantasenamat
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
- Virapong Prachayasittikul
- Department of Clinical Microbiology and Applied Technology, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
- Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
- Correspondence: Tel.: +66-2441-4371 (ext. 2715)
162
Wang X, Zhu X, Ye M, Wang Y, Li CD, Xiong Y, Wei DQ. STS-NLSP: A Network-Based Label Space Partition Method for Predicting the Specificity of Membrane Transporter Substrates Using a Hybrid Feature of Structural and Semantic Similarity. Front Bioeng Biotechnol 2019; 7:306. [PMID: 31781551 PMCID: PMC6851049 DOI: 10.3389/fbioe.2019.00306] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 08/05/2019] [Accepted: 10/17/2019] [Indexed: 12/11/2022]
Abstract
Membrane transport proteins play crucial roles in the pharmacokinetics of substrate drugs and in drug resistance in cancer, and they are vital to drug discovery, development, and anti-cancer therapeutics. However, experimental methods that profile a substrate drug against a panel of transporters to determine its specificity are labor-intensive and time-consuming. In this article, we aim to develop an in silico multi-label classification approach that predicts whether a substrate can specifically recognize one of 13 categories of drug transporters, ranging from the ATP-binding cassette to the solute carrier families, using both structural fingerprints and chemical ontology information of substrates. The data-driven network-based label space partition (NLSP) method was used to construct the model, based on a hybrid similarity-based feature that integrates 2D fingerprint and semantic similarity. This method builds a predictor for each label cluster (possibly intersecting) detected by community detection algorithms and takes the union of the predicted label sets for a compound as the final prediction. NLSP belongs to the ensemble-of-multi-label-classifiers category in the multi-label learning field. We used Cramér's V statistic to quantify label correlations and depicted them in a heatmap. Jackknife tests and iterative stratification-based cross-validation were adopted on a benchmark dataset to evaluate the prediction performance of the proposed models in both multi-label and label-wise manners. Compared with other powerful multi-label methods (ML-kNN, MTSVM, and RAkELd), our random forest-based NLSP model (NLSP-RF) proved feasible and effective and performed satisfactorily in predicting transporter-substrate specificity. The idea behind the NLSP method is intriguing, and its power remains to be explored for multi-label learning problems in bioinformatics.
The benchmark dataset, intermediate results and python code which can fully reproduce our experiments and results are available at https://github.com/dqwei-lab/STS.
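Cramér's V, used in this work to quantify label correlations, is derived from the chi-square statistic of a contingency table: V = sqrt(chi2 / (n (min(r, c) - 1))). For two binary transporter labels, the table simply counts how often each combination of the two labels occurs across substrates. A pure-Python sketch follows, without the bias correction some implementations apply; the function names are hypothetical and this is not the code in the authors' repository.

```python
import math

def cramers_v(table: list[list[int]]) -> float:
    """Cramér's V for an r x c contingency table, without bias correction."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0])) - 1
    if n == 0 or k == 0:
        return 0.0
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / n  # count expected under independence
            if expected > 0:  # an all-zero row or column contributes nothing
                chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * k))

def label_contingency(a: list[int], b: list[int]) -> list[list[int]]:
    """2 x 2 co-occurrence counts of two binary labels across the same compounds."""
    table = [[0, 0], [0, 0]]
    for x, y in zip(a, b):
        table[x][y] += 1
    return table
```

Computing V for every pair of the 13 labels gives a symmetric matrix in [0, 1] that can be drawn as a heatmap, with 1 meaning the two labels always co-occur (or always exclude each other) and 0 meaning they are independent.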
Affiliation(s)
- Xiangeng Wang
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China; Peng Cheng Laboratory, Shenzhen, China
- Xiaolei Zhu
- School of Sciences, Anhui Agricultural University, Hefei, China
- Mingzhi Ye
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
- Yanjing Wang
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
- Cheng-Dong Li
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
- Yi Xiong
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
- Dong-Qing Wei
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China; Peng Cheng Laboratory, Shenzhen, China
163
4mCpred-EL: An Ensemble Learning Framework for Identification of DNA N4-methylcytosine Sites in the Mouse Genome. Cells 2019; 8:cells8111332. [PMID: 31661923 PMCID: PMC6912380 DOI: 10.3390/cells8111332] [Citation(s) in RCA: 77] [Impact Index Per Article: 12.8] [Received: 08/21/2019] [Revised: 10/21/2019] [Accepted: 10/24/2019] [Indexed: 12/24/2022]
Abstract
DNA N4-methylcytosine (4mC) is a key epigenetic alteration that plays essential roles in DNA replication, differentiation, the cell cycle, and gene expression. To better understand the biological functions of 4mC, it is crucial to gain knowledge of its genomic distribution. In recent years, a few computational studies, in particular machine learning (ML) approaches, have been applied to 4mC site prediction. Although ML-based methods are promising for 4mC identification in other species, none are available for detecting 4mC sites in the mouse genome. Our novel computational approach, called 4mCpred-EL, is the first method for identifying 4mC sites in the mouse genome; it uses four different ML algorithms with a wide range of seven feature encodings. Subsequently, the probabilistic values predicted from these feature encodings are used as a feature vector and fed once again into ML algorithms, whose corresponding models are integrated via ensemble learning. Our benchmarking results demonstrated that 4mCpred-EL achieved accuracy and MCC values of 0.795 and 0.591, respectively, significantly outperforming seven other classifiers by 1.5–5.9% and 3.2–11.7%, respectively. Additionally, 4mCpred-EL attained an overall accuracy of 79.80% in the independent evaluation, which is 1.8–5.1% higher than that of the seven other classifiers. We provide a user-friendly web server, 4mCpred-EL, which could serve as a pre-screening tool for identifying potential 4mC sites in the mouse genome.
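The accuracy and Matthews correlation coefficient (MCC) quoted here, and in several other entries in this list, both derive from the confusion-matrix counts of true/false positives and negatives. A minimal sketch follows; the function name is hypothetical, and returning 0 when a margin of the confusion matrix is empty is a common convention rather than something the abstract specifies.

```python
import math

def accuracy_and_mcc(tp: int, tn: int, fp: int, fn: int) -> tuple[float, float]:
    """Accuracy and Matthews correlation coefficient from confusion-matrix counts."""
    total = tp + tn + fp + fn
    acc = (tp + tn) / total
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any row or column sum is zero; report 0 in that case
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc
```

Unlike accuracy, MCC stays informative when positive and negative sites are imbalanced, which is why these papers report both.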
164
Jin H, Titus A, Liu Y, Wang Y, Han Z. Fault Diagnosis of Rotary Parts of a Heavy-Duty Horizontal Lathe Based on Wavelet Packet Transform and Support Vector Machine. SENSORS 2019; 19:s19194069. [PMID: 31547146 PMCID: PMC6806313 DOI: 10.3390/s19194069] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 07/19/2019] [Revised: 09/10/2019] [Accepted: 09/13/2019] [Indexed: 01/26/2023]
Abstract
The spindle box is responsible for power transmission, supporting the rotating parts, and ensuring the rotary accuracy of the workpiece in a heavy-duty machine tool. Its assembly quality is crucial for a reliable power supply and stable operation of the machine tool under large loads and cutting forces. Accurate diagnosis of assembly faults is therefore of great significance for improving assembly efficiency and ensuring outgoing quality. In this paper, the common fault types and characteristics of the spindle box of a heavy-duty horizontal lathe are first analyzed, and the original vibration signals of various fault types are collected. A wavelet packet transform is used to decompose each signal into different frequency bands, and the nodes in the bands containing the characteristic frequencies are reconstructed. Power spectrum analysis is then carried out on the reconstructed signal so that the fault features in the signal are clearly expressed. The structure of the feature vector used for fault diagnosis is analyzed, and the feature vector is extracted from the collected signals. Finally, an intelligent pattern-recognition method based on a support vector machine (SVM) is used to classify the fault types. The results show that the proposed method can identify the fault types quickly and accurately.
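The decomposition step can be illustrated with a Haar wavelet packet transform, which, unlike an ordinary wavelet transform, splits every node (not only the approximation branch) into low- and high-pass halves. This is a stdlib-only sketch assuming the Haar filter and a signal length divisible by 2**level; the abstract does not name the wavelet used, and the per-band energies below are a simplified, hypothetical stand-in for the paper's node reconstruction and power spectrum analysis.

```python
import math
from typing import List, Tuple

SQRT2 = math.sqrt(2.0)


def haar_split(x: List[float]) -> Tuple[List[float], List[float]]:
    """One orthonormal Haar analysis step: (low-pass, high-pass) halves."""
    low = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return low, high


def wavelet_packet(x: List[float], level: int) -> List[List[float]]:
    """Full wavelet packet tree: every node is split at every level.
    Returns the 2**level terminal nodes in natural order, which is not
    the same as ascending frequency order."""
    if len(x) % (2 ** level) != 0:
        raise ValueError("signal length must be divisible by 2**level")
    nodes = [list(x)]
    for _ in range(level):
        nodes = [half for node in nodes for half in haar_split(node)]
    return nodes


def band_energies(x: List[float], level: int) -> List[float]:
    """Feature vector: energy (sum of squared coefficients) of each terminal node."""
    return [sum(c * c for c in node) for node in wavelet_packet(x, level)]


fs = 64  # hypothetical sampling rate in Hz
signal = [math.sin(2 * math.pi * 8 * n / fs) for n in range(64)]  # 8 Hz test tone
feats = band_energies(signal, level=3)
print([round(e, 3) for e in feats])
```

Because each Haar step is orthonormal, the band energies sum to the energy of the original signal, which is a quick sanity check on the decomposition. A vector like `feats` is the kind of input that would then be passed to an SVM classifier; that step is omitted here.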
Affiliation(s)
- Hongyu Jin
- School of Mechatronics Engineering, Harbin Institute of Technology, Harbin 150001, China
- Avitus Titus
- School of Mechatronics Engineering, Harbin Institute of Technology, Harbin 150001, China
- Department of Engineering Sciences and Technology, Sokoine University of Agriculture, Morogoro 255, Tanzania
- Yulong Liu
- School of Mechatronics Engineering, Harbin Institute of Technology, Harbin 150001, China
- Yang Wang
- School of Mechatronics Engineering, Harbin Institute of Technology, Harbin 150001, China
- Zhenyu Han
- School of Mechatronics Engineering, Harbin Institute of Technology, Harbin 150001, China
- Correspondence:
165
Arif M, Ali F, Ahmad S, Kabir M, Ali Z, Hayat M. Pred-BVP-Unb: Fast prediction of bacteriophage Virion proteins using un-biased multi-perspective properties with recursive feature elimination. Genomics 2019; 112:1565-1574. [PMID: 31526842 DOI: 10.1016/j.ygeno.2019.09.006] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Received: 07/19/2019] [Revised: 08/27/2019] [Accepted: 09/11/2019] [Indexed: 10/26/2022]
Abstract
Bacteriophage virion proteins (BVPs) are structural proteins of bacteriophages, the viruses that infect bacteria, and they have a great impact on various biological functions of bacteria. They are widely used in genetic engineering and phage therapy applications. Identifying BVPs with conventional methods is slow and expensive. A bioinformatics predictor is therefore urgently needed to accelerate the identification of BVPs within a huge volume of proteins. However, the performance of available prediction tools is inadequate owing to poor feature representation and severe class imbalance. In the present study, we propose an intelligent model, called Pred-BVP-Unb, for discriminating BVPs that employs three sequence-driven descriptors: Bi-PSSM evolutionary information, composition & translation, and split amino acid composition. The class imbalance was handled with the synthetic minority oversampling technique (SMOTE). The essential attributes were selected by recursive feature elimination, a robust selection algorithm. Finally, the optimal feature space was used to train a support vector machine classifier with a radial basis function kernel. Our predictor outperforms existing approaches in the literature, achieving the highest accuracies of 92.54% and 83.06% on the benchmark and independent datasets, respectively. We expect that the Pred-BVP-Unb tool can provide useful hints for designing antibacterial drugs and help expedite the large-scale discovery of new bacteriophage virion proteins. The source code and all datasets are publicly available at https://github.com/Muhammad-Arif-NUST/BVP_Pred_Unb.
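The oversampling step this abstract names, SMOTE, creates synthetic minority-class samples by interpolating between a minority sample and one of its k nearest minority-class neighbours. The following is a minimal stdlib sketch of that standard technique, assuming numeric feature vectors and Euclidean distance; it is not the authors' code (their repository is at the GitHub URL given in the abstract), and the recursive feature elimination and SVM steps are not shown.

```python
import math
import random
from typing import List, Optional

Vector = List[float]


def k_nearest(i: int, data: List[Vector], k: int) -> List[int]:
    """Indices of the k nearest neighbours of data[i] (excluding itself)."""
    dists = [(math.dist(data[i], data[j]), j) for j in range(len(data)) if j != i]
    dists.sort()
    return [j for _, j in dists[:k]]


def smote(minority: List[Vector], n_synthetic: int, k: int = 5,
          seed: Optional[int] = None) -> List[Vector]:
    """Generate n_synthetic samples; each lies on the line segment between a
    randomly chosen minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    neighbours = [k_nearest(i, minority, k) for i in range(len(minority))]
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        j = rng.choice(neighbours[i])
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append([a + gap * (b - a) for a, b in zip(minority[i], minority[j])])
    return synthetic


minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]  # toy minority class
new = smote(minority, n_synthetic=6, k=2, seed=0)
print(new)
```

Interpolating only between minority neighbours keeps the synthetic points inside the region already occupied by the minority class, which is why SMOTE is usually preferred over simply duplicating minority samples.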
Affiliation(s)
- Muhammad Arif
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
- Department of Computer Science, Abdul Wali Khan University Mardan, KP, Pakistan
- Farman Ali
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
- Saeed Ahmad
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
- Muhammad Kabir
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
- Zakir Ali
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
- Maqsood Hayat
- Department of Computer Science, Abdul Wali Khan University Mardan, KP, Pakistan
166
A Hierarchical Self-Adaptive Method for Post-Disturbance Transient Stability Assessment of Power Systems Using an Integrated CNN-Based Ensemble Classifier. ENERGIES 2019. [DOI: 10.3390/en12173217] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 11/16/2022]
Abstract
Data-driven approaches using synchronous phasor measurements play an important role in transient stability assessment (TSA). For post-disturbance TSA, there is no definitive conclusion about how long the response time should be. Furthermore, previous studies have seldom considered the confidence level of prediction results or the specific stability degree. Because transient stability problems can develop very quickly and cause tremendous economic losses, there is an urgent need for faster response, credible and accurate prediction results, and a specific stability degree. This paper proposes a hierarchical self-adaptive method using an integrated convolutional neural network (CNN)-based ensemble classifier to solve these problems. First, a set of classifiers is organized sequentially at different response times to construct the layers of the proposed method. Second, confidence-integrated decision-making rules are defined. Cases predicted as credibly stable or unstable are sent to the stable or unstable regression model built at the corresponding decision time. The simulation results show that the proposed method not only balances the accuracy and speed of transient stability prediction but also predicts the stability degree with very low errors, allowing more time for emergency control and providing instructive guidance for it.
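The confidence-integrated decision rule can be pictured as a cascade: each layer corresponds to a longer post-disturbance response time, and a case is released as soon as one layer's prediction is credible. This stdlib sketch assumes a simple credibility test (the predicted probability is at least a threshold away from indecision, i.e. `max(p, 1 - p) >= threshold`); the threshold, the `toy_clf` rule, and the layer timings are hypothetical, and the paper's CNN ensembles and stability-degree regression models are not reproduced.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical layer classifier: takes the post-disturbance measurements available
# at its response time and returns P(stable). Stands in for a CNN-based ensemble.
Classifier = Callable[[Sequence[float]], float]
Layer = Tuple[float, int, Classifier]  # (response_time_s, samples_needed, classifier)


@dataclass
class Decision:
    label: Optional[str]            # "stable", "unstable", or None
    layer: Optional[int]            # index of the deciding layer
    response_time: Optional[float]  # response time of that layer


def assess(measurements: List[float], layers: List[Layer],
           threshold: float = 0.9) -> Decision:
    """Walk the layers in order of increasing response time and stop at the
    first credible prediction (assumed rule: max(p, 1 - p) >= threshold)."""
    for idx, (t, n_samples, clf) in enumerate(layers):
        if len(measurements) < n_samples:
            break  # not enough data observed yet for this layer
        p = clf(measurements[:n_samples])
        if p >= threshold:
            return Decision("stable", idx, t)
        if 1.0 - p >= threshold:
            return Decision("unstable", idx, t)
    return Decision(None, None, None)  # no layer was confident enough


def toy_clf(window: Sequence[float]) -> float:
    """Hypothetical rule: larger swings in a monitored angle -> less likely stable."""
    swing = max(window) - min(window)
    return max(0.0, min(1.0, 1.0 - swing / 2.0))


layers = [(0.1, 3, toy_clf), (0.2, 6, toy_clf), (0.3, 9, toy_clf)]
calm = [0.0, 0.05, 0.1, 0.1, 0.12, 0.1, 0.1, 0.1, 0.1]
diverging = [0.0, 0.5, 1.0, 1.5, 1.9, 2.2, 2.6, 3.0, 3.5]
print(assess(calm, layers))
print(assess(diverging, layers))
```

The calm case is released by the earliest layer, while the diverging case is ambiguous at the first layer and is only classified once more data arrives; in the paper, the deciding layer would then hand the case to its stable or unstable regression model.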
167
Khatun S, Hasan M, Kurata H. Efficient computational model for identification of antitubercular peptides by integrating amino acid patterns and properties. FEBS Lett 2019; 593:3029-3039. [PMID: 31297788 DOI: 10.1002/1873-3468.13536] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Received: 05/21/2019] [Revised: 06/25/2019] [Accepted: 07/05/2019] [Indexed: 12/30/2022]
Abstract
Tuberculosis (TB), caused by Mycobacterium tuberculosis, is a leading cause of death. Recently, anti-TB peptides have provided an alternative approach to combating antibiotic tolerance. We have developed an effective computational predictor, identification of antitubercular peptides (iAntiTB), that integrates multiple feature vectors derived from amino acid sequences via random forest (RF) and support vector machine (SVM) classifiers. iAntiTB combines the RF and SVM scores via linear regression to enhance prediction accuracy. To make the predictor robust and accurate, we prepared two datasets with different types of negative samples. iAntiTB achieved area under the ROC curve (AUC) values of 0.896 and 0.946 on the training sets of the first and second datasets, respectively, and outperformed the other existing predictors.
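Combining two classifier scores "via linear regression" amounts to fitting y ≈ w0 + w1·s_RF + w2·s_SVM by least squares. The stdlib sketch below solves the 3×3 normal equations with Gaussian elimination. The score values are hypothetical, and the target is synthetic so the recovered weights can be checked; in a classifier-fusion setting the target would presumably be the class label (1 for an antitubercular peptide, 0 otherwise), and the exact regression setup used in iAntiTB may differ.

```python
from typing import List, Sequence


def fit_score_fusion(rf: Sequence[float], svm: Sequence[float],
                     y: Sequence[float]) -> List[float]:
    """Least-squares weights [w0, w1, w2] for y ~ w0 + w1*rf + w2*svm,
    solved from the normal equations (X^T X) w = X^T y."""
    X = [[1.0, a, b] for a, b in zip(rf, svm)]
    A = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
    v = [sum(r[i] * t for r, t in zip(X, y)) for i in range(3)]
    M = [A[i] + [v[i]] for i in range(3)]  # augmented matrix [A | v]
    # Forward elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("scores are collinear; weights are not identifiable")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    # Back substitution.
    w = [0.0] * 3
    for i in reversed(range(3)):
        w[i] = (M[i][3] - sum(M[i][j] * w[j] for j in range(i + 1, 3))) / M[i][i]
    return w


def fused_score(w: Sequence[float], rf_score: float, svm_score: float) -> float:
    """Combined score for one sample under the fitted weights."""
    return w[0] + w[1] * rf_score + w[2] * svm_score


rf = [0.1, 0.4, 0.7, 0.9, 0.3]   # hypothetical RF scores
svm = [0.2, 0.1, 0.8, 0.6, 0.5]  # hypothetical SVM scores
y = [0.1 + 0.6 * a + 0.3 * b for a, b in zip(rf, svm)]  # synthetic target
w = fit_score_fusion(rf, svm, y)
print([round(v, 6) for v in w])
```

Learning the weights, rather than simply averaging the two scores, lets the fusion lean on whichever classifier is more informative on the training data.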
Affiliation(s)
- Shamima Khatun
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Iizuka, Fukuoka, Japan
- Mehedi Hasan
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Iizuka, Fukuoka, Japan
- Hiroyuki Kurata
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Iizuka, Fukuoka, Japan
- Biomedical Informatics R&D Center, Kyushu Institute of Technology, Iizuka, Fukuoka, Japan
168
AtbPpred: A Robust Sequence-Based Prediction of Anti-Tubercular Peptides Using Extremely Randomized Trees. Comput Struct Biotechnol J 2019; 17:972-981. [PMID: 31372196 PMCID: PMC6658830 DOI: 10.1016/j.csbj.2019.06.024] [Citation(s) in RCA: 72] [Impact Index Per Article: 12.0] [Received: 02/21/2019] [Revised: 06/27/2019] [Accepted: 06/28/2019] [Indexed: 01/01/2023] [Open access]
Abstract
Mycobacterium tuberculosis is one of the most dangerous human pathogens. It is the etiological agent of tuberculosis (TB), infecting almost one-third of the world's population. Owing to the high incidence of multidrug-resistant and extensively drug-resistant TB, there is an urgent need for novel and effective alternative therapies. Peptide-based therapy has several advantages, such as diverse mechanisms of action, low immunogenicity, and selective affinity for bacterial cell envelopes. However, identifying anti-tubercular peptides (AtbPs) experimentally is laborious and expensive; hence, an efficient computational method is needed to predict AtbPs prior to in vitro and in vivo experiments. To this end, we developed a two-layer machine learning (ML)-based predictor called AtbPpred for the identification of AtbPs. In the first layer, we applied a two-step feature selection procedure to identify the optimal feature set for each of nine different feature encodings, and developed the corresponding models using extremely randomized trees (ERT). In the second layer, the AtbP probabilities predicted by these nine models were used as input features to ERT to develop the final predictor. AtbPpred achieved average accuracies of 88.3% and 87.3% in cross-validation and independent evaluation, respectively, which were ~8.7% and 10.0% higher than those of the state-of-the-art method. Furthermore, we established a user-friendly webserver, currently available at http://thegleelab.org/AtbPpred. We anticipate that this predictor could be useful for the high-throughput prediction of AtbPs and could also provide mechanistic insights into their functions.
We developed a novel computational framework for the identification of anti-tubercular peptides using extremely randomized trees.
AtbPpred displayed superior performance compared with the existing method on both benchmark and independent datasets.
We constructed a user-friendly web server that implements the proposed AtbPpred method.
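The extremely randomized tree (ERT) used in both AtbPpred layers differs from a random forest chiefly in how it splits a node: instead of searching for the best cut-point, it draws one threshold uniformly between a feature's minimum and maximum within the node, does this for K randomly chosen non-constant features, and keeps the best-scoring candidate. This stdlib sketch scores candidates by Gini impurity decrease; K, the toy data, and the function names are illustrative assumptions, not AtbPpred's implementation.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts: Dict[int, int] = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def extra_trees_split(X: List[List[float]], y: List[int], k: int,
                      rng: random.Random) -> Optional[Tuple[int, float, float]]:
    """Best of up to k random (feature, threshold) candidates.
    Returns (feature, threshold, impurity_decrease), or None if no feature varies."""
    n_features = len(X[0])
    candidates = [f for f in range(n_features)
                  if min(r[f] for r in X) < max(r[f] for r in X)]  # skip constant features
    if not candidates:
        return None
    parent = gini(y)
    best: Optional[Tuple[int, float, float]] = None
    for f in rng.sample(candidates, min(k, len(candidates))):
        lo = min(r[f] for r in X)
        hi = max(r[f] for r in X)
        t = rng.uniform(lo, hi)  # random cut-point: the "extremely randomized" part
        left = [yi for r, yi in zip(X, y) if r[f] < t]
        right = [yi for r, yi in zip(X, y) if r[f] >= t]
        child = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        gain = parent - child
        if best is None or gain > best[2]:
            best = (f, t, gain)
    return best


X = [[0.0, 5.0], [1.0, 5.0], [2.0, 5.0], [3.0, 5.0]]  # feature 1 is constant
y = [0, 0, 1, 1]
print(extra_trees_split(X, y, k=2, rng=random.Random(0)))
```

A full ERT applies this split recursively until nodes are pure or small, and an ensemble averages the class probabilities of many such trees; the extra randomness in the thresholds trades a little bias for lower variance and faster training.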
169
Hasan MM, Manavalan B, Khatun MS, Kurata H. Prediction of S-nitrosylation sites by integrating support vector machines and random forest. Mol Omics 2019; 15:451-458. [DOI: 10.1039/c9mo00098d] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Indexed: 12/31/2022]
Abstract
Cysteine S-nitrosylation is a type of reversible post-translational modification of proteins, which controls diverse biological processes.
Affiliation(s)
- Md. Mehedi Hasan
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Iizuka, Japan
- Japan Society for the Promotion of Science
- Mst. Shamima Khatun
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Iizuka, Japan
- Hiroyuki Kurata
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Iizuka, Japan
- Biomedical Informatics R&D Center