1
Li Y, Yu ND, Ye XL, Jiang MC, Chen XQ. Construction of lung cancer serum markers based on ReliefF feature selection. Comput Methods Biomech Biomed Engin 2024; 27:1215-1223. [PMID: 37489703 DOI: 10.1080/10255842.2023.2235045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2023] [Accepted: 07/03/2023] [Indexed: 07/26/2023]
Abstract
Serum miRNAs are readily available clinical samples for cancer screening. Identifying early serum markers of lung cancer (LC) is essential for early diagnosis and clinical treatment. Expression data of serum miRNAs from lung adenocarcinoma (LUAD) patients and healthy individuals were downloaded from the Gene Expression Omnibus (GEO). These data were normalized and subjected to differential expression analysis to obtain differentially expressed miRNAs (DEmiRNAs). The DEmiRNAs were then subjected to ReliefF feature selection, and subsets closely related to cancer were screened as candidate feature miRNAs. Gaussian Naive Bayes (NB), Support Vector Machine (SVM), and Random Forest (RF) classifiers were constructed from these candidate feature miRNAs, and the best diagnostic signature was built by combining NB with incremental feature selection (IFS). The samples were then subjected to principal component analysis (PCA) based on the miRNAs with optimal predictive performance. Finally, peripheral serum miRNAs from 64 LUAD patients and 59 normal individuals were extracted for qRT-PCR analysis to validate the diagnostic model's performance in clinical detection. According to area under the curve (AUC) and accuracy values, the NB classifier composed of miR-5100 and miR-663a showed the best diagnostic performance. The PCA results also showed that the 2-miRNA diagnostic signature could effectively distinguish cancer patients from healthy individuals. qRT-PCR results from clinical serum samples showed that miR-5100 and miR-663a expression was markedly higher in tumor samples than in normal samples. The AUC of the 2-miRNA diagnostic signature was 0.968. In summary, we identified serum markers (miR-5100 and miR-663a) for early LUAD screening, providing ideas for developing early LUAD diagnostic models.
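As an illustration of the ReliefF step named in this abstract, the following is a minimal sketch for a two-class dataset. It is not the authors' pipeline: the function name `relieff`, the choice of `k` nearest hits and misses, and the range-scaled Manhattan distance are all assumptions made for the example. A feature gains weight when it differs between a sample and its nearest neighbours of the other class, and loses weight when it differs among neighbours of the same class.

```python
def relieff(X, y, k=2):
    """Return one ReliefF-style weight per feature (two-class case, hypothetical helper)."""
    n, d = len(X), len(X[0])

    # Feature ranges, used to scale every per-feature difference into [0, 1].
    spans = []
    for j in range(d):
        column = [row[j] for row in X]
        spans.append((max(column) - min(column)) or 1.0)

    def diff(j, a, b):
        return abs(a[j] - b[j]) / spans[j]

    def dist(a, b):
        return sum(diff(j, a, b) for j in range(d))

    weights = [0.0] * d
    for i in range(n):
        others = [m for m in range(n) if m != i]
        # k nearest neighbours of the same class (hits) and of the other class (misses).
        hits = sorted((m for m in others if y[m] == y[i]),
                      key=lambda m: dist(X[i], X[m]))[:k]
        misses = sorted((m for m in others if y[m] != y[i]),
                        key=lambda m: dist(X[i], X[m]))[:k]
        for j in range(d):
            hit_term = sum(diff(j, X[i], X[m]) for m in hits) / len(hits)
            miss_term = sum(diff(j, X[i], X[m]) for m in misses) / len(misses)
            weights[j] += (miss_term - hit_term) / n
    return weights


# Toy data (invented): feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [0.9, 0.8], [1.0, 0.2], [0.8, 0.5]]
y = [0, 0, 0, 1, 1, 1]
weights = relieff(X, y, k=2)
```

Ranking features by descending weight gives the kind of ordered list that an incremental feature selection step can then consume. The sketch assumes every class has at least two samples.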
Affiliation(s)
- Yong Li
- Department of Respiration Medicine, Fujian Medical University Union Hospital, Fuzhou, Fujian, China
- Nan-Ding Yu
- Department of Respiration Medicine, Fujian Medical University Union Hospital, Fuzhou, Fujian, China
- Xiang-Li Ye
- Department of Respiration Medicine, Fujian Medical University Union Hospital, Fuzhou, Fujian, China
- Mei-Chen Jiang
- Department of Pathology, Fujian Medical University Union Hospital, Fuzhou, Fujian, China
- Xiang-Qi Chen
- Department of Respiration Medicine, Fujian Medical University Union Hospital, Fuzhou, Fujian, China
2
Shang J, Peng C, Tang X, Sun Y. PhaVIP: Phage VIrion Protein classification based on chaos game representation and Vision Transformer. Bioinformatics 2023; 39:i30-i39. [PMID: 37387136 DOI: 10.1093/bioinformatics/btad229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/01/2023]
Abstract
MOTIVATION: As viruses that mainly infect bacteria, phages are key players across a wide range of ecosystems. Analyzing phage proteins is indispensable for understanding phages' functions and roles in microbiomes. High-throughput sequencing enables us to obtain phages from different microbiomes at low cost. However, compared with the fast accumulation of newly identified phages, phage protein classification remains difficult. In particular, a fundamental need is to annotate virion proteins, the structural proteins such as major tail and baseplate proteins. Although there are experimental methods for virion protein identification, they are too expensive or time-consuming, leaving a large number of proteins unclassified. Thus, there is great demand for a computational method for fast and accurate phage virion protein (PVP) classification.
RESULTS: In this work, we adapted the state-of-the-art image classification model, Vision Transformer, to conduct virion protein classification. By encoding protein sequences into unique images using chaos game representation, we can leverage Vision Transformer to learn both local and global features from sequence "images". Our method, PhaVIP, has two main functions: classifying PVP and non-PVP sequences and annotating the types of PVP, such as capsid and tail. We tested PhaVIP on several datasets of increasing difficulty and benchmarked it against alternative tools. The experimental results show that PhaVIP has superior performance. After validating the performance of PhaVIP, we investigated two applications that can use its output: phage taxonomy classification and phage host prediction. The results showed the benefit of using classified proteins over all proteins.
AVAILABILITY AND IMPLEMENTATION: The web server of PhaVIP is available via: https://phage.ee.cityu.edu.hk/phavip. The source code of PhaVIP is available via: https://github.com/KennthShang/PhaVIP.
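To make the idea of turning a sequence into an "image" concrete, here is a generic chaos-game sketch. The abstract does not describe how PhaVIP lays out amino acids, so the layout used here (20 residues placed evenly on the unit circle, with the point moving a fixed fraction toward the current residue's vertex) is purely an assumption for illustration, as are the names `cgr_points` and `cgr_image`.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumption: the 20 standard residues sit evenly spaced on the unit circle.
VERTICES = {
    aa: (math.cos(2 * math.pi * i / 20), math.sin(2 * math.pi * i / 20))
    for i, aa in enumerate(AMINO_ACIDS)
}


def cgr_points(sequence, ratio=0.5):
    """Walk the chaos game: move `ratio` of the way toward each residue's vertex."""
    x, y = 0.0, 0.0
    points = []
    for residue in sequence:
        if residue not in VERTICES:
            continue  # skip non-standard symbols such as X
        vx, vy = VERTICES[residue]
        x += ratio * (vx - x)
        y += ratio * (vy - y)
        points.append((x, y))
    return points


def cgr_image(sequence, size=32, ratio=0.5):
    """Bin the chaos-game points into a size x size grid of visit counts."""
    grid = [[0] * size for _ in range(size)]
    for x, y in cgr_points(sequence, ratio):
        col = min(size - 1, int((x + 1) / 2 * size))
        row = min(size - 1, int((1 - y) / 2 * size))
        grid[row][col] += 1
    return grid


image = cgr_image("MKTAYIAKQR", size=8)
```

The resulting grid can be treated as a single-channel image. Because each point depends on all preceding residues, nearby pixels reflect shared sequence suffixes, which is what makes the representation attractive for image models.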
Affiliation(s)
- Jiayu Shang
- Department of Electrical Engineering, City University of Hong Kong, Hong Kong (SAR), China
- Cheng Peng
- Department of Electrical Engineering, City University of Hong Kong, Hong Kong (SAR), China
- Xubo Tang
- Department of Electrical Engineering, City University of Hong Kong, Hong Kong (SAR), China
- Yanni Sun
- Department of Electrical Engineering, City University of Hong Kong, Hong Kong (SAR), China
3
Zhu J, Luo J, Ma Y. Screening of serum exosome markers for colorectal cancer based on Boruta and multi-cluster feature selection algorithms. Mol Cell Toxicol 2023. [DOI: 10.1007/s13273-023-00348-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/28/2023]
4
Prediction of Phage Virion Proteins Using Machine Learning Methods. Molecules 2023; 28:molecules28052238. [PMID: 36903484 PMCID: PMC10004995 DOI: 10.3390/molecules28052238] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/02/2023] [Revised: 01/27/2023] [Accepted: 02/20/2023] [Indexed: 03/04/2023]
Abstract
Antimicrobial resistance (AMR) is a major problem and an immediate alternative to antibiotics is the need of the hour. Research on the possible alternative products to tackle bacterial infections is ongoing worldwide. One of the most promising alternatives to antibiotics is the use of bacteriophages (phage) or phage-driven antibacterial drugs to cure bacterial infections caused by AMR bacteria. Phage-driven proteins, including holins, endolysins, and exopolysaccharides, have shown great potential in the development of antibacterial drugs. Likewise, phage virion proteins (PVPs) might also play an important role in the development of antibacterial drugs. Here, we have developed a machine learning-based prediction method to predict PVPs using phage protein sequences. We have employed well-known basic and ensemble machine learning methods with protein sequence composition features for the prediction of PVPs. We found that the gradient boosting classifier (GBC) method achieved the best accuracy of 80% on the training dataset and an accuracy of 83% on the independent dataset. The performance on the independent dataset is better than other existing methods. A user-friendly web server developed by us is freely available to all users for the prediction of PVPs from phage protein sequences. The web server might facilitate the large-scale prediction of PVPs and hypothesis-driven experimental study design.
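The "protein sequence composition features" mentioned in this abstract are typically vectors such as amino acid composition. The sketch below computes that 20-dimensional vector; the function name `amino_acid_composition` and the decision to ignore non-standard symbols are assumptions for the example, and the abstract does not say exactly which composition features the authors used.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each of the 20 standard residues in `sequence`."""
    # Assumption: symbols outside the 20 standard residues are dropped.
    residues = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    counts = Counter(residues)
    return [counts[aa] / len(residues) for aa in AMINO_ACIDS]


features = amino_acid_composition("MKTAYIAKQRQISFVK")
```

A vector like this, one per protein, is the kind of fixed-length input that a gradient boosting classifier or any other tabular learner can be trained on.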
5
Bajiya N, Dhall A, Aggarwal S, Raghava GPS. Advances in the field of phage-based therapy with special emphasis on computational resources. Brief Bioinform 2023; 24:6961791. [PMID: 36575815 DOI: 10.1093/bib/bbac574] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/28/2022] [Revised: 11/07/2022] [Accepted: 11/25/2022] [Indexed: 12/29/2022]
Abstract
In the current era, one of the major challenges is to manage the treatment of drug/antibiotic-resistant strains of bacteria. Phage therapy, a century-old technique, may serve as an alternative to antibiotics in treating bacterial infections caused by drug-resistant strains of bacteria. In this review, a systematic attempt has been made to summarize phage-based therapy in depth. The review is divided into two sections: general information and computer-aided phage therapy (CAPT). Under general information, we cover the history of phage therapy, the mechanism of action, the status of phage-based products (approved and in clinical trials) and the challenges. The review emphasizes CAPT, where we cover primary phage-associated resources, phage prediction methods and pipelines. It covers a wide range of databases and resources, including viral genomes and proteins, phage receptors, host genomes of phages, phage-host interactions and lytic proteins. In the post-genomic era, identifying the most suitable phage for lysing a drug-resistant strain of bacterium is crucial for developing alternative treatments for drug-resistant bacteria, and this remains a challenging problem. Thus, we compile all phage-associated prediction methods, including the prediction of phages for a bacterial strain, the host for a phage and the identification of interacting phage-host pairs. Most of these methods have been developed using machine learning and deep learning techniques. This review also discusses recent advances in the field of CAPT, where we briefly describe computational tools available for predicting phage virions, the life cycle of phages and prophage identification. Finally, we describe the advantages, challenges and opportunities of phage-based therapy.
Affiliation(s)
- Nisha Bajiya
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India
- Anjali Dhall
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India
- Suchet Aggarwal
- Department of Computer Science and Engineering, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India
- Gajendra P S Raghava
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India
6
Fang Z, Feng T, Zhou H, Chen M. DeePVP: Identification and classification of phage virion proteins using deep learning. Gigascience 2022; 11:giac076. [PMID: 35950840 PMCID: PMC9366990 DOI: 10.1093/gigascience/giac076] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 02/08/2022] [Revised: 06/08/2022] [Accepted: 07/11/2022] [Indexed: 01/04/2023]
Abstract
BACKGROUND: Many biological properties of phages are determined by phage virion proteins (PVPs), and the poor annotation of PVPs is a bottleneck for many areas of viral research, such as viral phylogenetic analysis, viral host identification, and antibacterial drug design. Because of the high diversity of PVP sequences, the PVP annotation of a phage genome remains a particularly challenging bioinformatic task.
FINDINGS: Based on deep learning, we developed DeePVP. The main module of DeePVP aims to discriminate PVPs from non-PVPs within a phage genome, while the extended module of DeePVP can further classify predicted PVPs into the 10 major classes of PVPs. Compared with the present state-of-the-art tools, the main module of DeePVP performs better, with a 9.05% higher F1-score in the PVP identification task. Moreover, the overall accuracy of the extended module of DeePVP in the PVP classification task is approximately 3.72% higher than that of PhANNs. Two application cases show that the predictions of DeePVP are more reliable and can better reveal the compact PVP-enriched region than the current state-of-the-art tools. Particularly, in the Escherichia phage phiEC1 genome, a novel PVP-enriched region that is conserved in many other Escherichia phage genomes was identified, indicating that DeePVP will be a useful tool for the analysis of phage genomic structures.
CONCLUSIONS: DeePVP outperforms state-of-the-art tools. The program is optimized in both a virtual machine with graphical user interface and a docker so that the tool can be easily run by noncomputer professionals. DeePVP is freely available at https://github.com/fangzcbio/DeePVP/.
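For readers less familiar with the F1-score used to compare tools in this abstract, the sketch below computes precision, recall and F1 for a binary PVP versus non-PVP labelling. The function name and the toy labels are invented for illustration and are unrelated to the DeePVP benchmark data.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy labels: 1 = PVP, 0 = non-PVP.
precision, recall, f1 = precision_recall_f1([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
```

Returning 0.0 when a denominator is zero is a convention chosen here so that the function never raises on degenerate inputs.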
Affiliation(s)
- Zhencheng Fang
- Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou 510280, China
- Tao Feng
- Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou 510280, China
- Hongwei Zhou
- Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou 510280, China
- Muxuan Chen
- Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou 510280, China
7
Huang X, Chen X, Chen X, Wang W. Screening of Serum miRNAs as Diagnostic Biomarkers for Lung Cancer Using the Minimal-Redundancy-Maximal-Relevance Algorithm and Random Forest Classifier Based on a Public Database. Public Health Genomics 2022; 25:1-9. [PMID: 35917800 DOI: 10.1159/000525316] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2022] [Accepted: 05/12/2022] [Indexed: 11/19/2022]
Abstract
BACKGROUND: Lung cancer is one of the deadliest cancers, and early diagnosis can substantially improve patient survival. We aimed to screen serum miRNAs as diagnostic biomarkers for patients with lung cancer.
METHODS: A total of 416 significantly differentially expressed miRNAs were obtained using the limma package, and a feature ranking was then derived by the minimal-redundancy-maximal-relevance method. An incremental feature selection algorithm with a random forest (RF) classifier was used to choose the top 5 miRNA combination with the optimum predictive performance. The performance of the RF classifier built on the top 5 miRNAs was analyzed using the receiver operating characteristic (ROC) curve. Afterward, the classification effect of the 5-miRNA combination was validated through principal component analysis and hierarchical clustering analysis. Expression of the top 5 miRNAs in lung cancer patients and normal people was compared using the GSE137140 dataset and validated by qPCR. Hierarchical clustering analysis was used to analyze the similarity of the expression profiles of the 5 miRNAs. ROC analysis was undertaken on each miRNA.
RESULTS: We finally obtained the top 5 miRNAs, with a Matthews correlation coefficient of 0.988 and an area under the curve (AUC) of 0.996. The 5 feature miRNAs were capable of distinguishing most cancer patients from normal people. Furthermore, except for miR-6875-5p, which was lowly expressed in lung cancer tissue, the other 4 miRNAs were all highly expressed in cancer patients. Performance analysis revealed that their AUC values were 0.92, 0.96, 0.94, 0.95, and 0.93, respectively.
CONCLUSION: Overall, the 5 feature miRNAs screened here are expected to be effective biomarkers for lung cancer.
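The Matthews correlation coefficient reported in this abstract is computed from the four cells of a binary confusion matrix. A minimal sketch follows; the function name is hypothetical and the counts in the usage line are invented, not taken from the study.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Convention assumed here: return 0 when any marginal total is empty.
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Invented counts for illustration only.
score = matthews_corrcoef(tp=48, tn=47, fp=3, fn=2)
```

Unlike accuracy, the coefficient stays near zero for a classifier that always predicts the majority class, which is why it is often reported alongside AUC on imbalanced biomarker datasets.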
Affiliation(s)
- Xiaoyan Huang
- Medical Oncology, 900 Hospital of the Joint Logistics Team, Fuzhou, China
- Xiong Chen
- Medical Oncology, 900 Hospital of the Joint Logistics Team, Fuzhou, China
- Xi Chen
- Medical Oncology, 900 Hospital of the Joint Logistics Team, Fuzhou, China
- Wenling Wang
- Medical Oncology, 900 Hospital of the Joint Logistics Team, Fuzhou, China
8
Sun Y, Zhang Q, Yang Q, Yao M, Xu F, Chen W. Screening of Gene Expression Markers for Corona Virus Disease 2019 Through Boruta_MCFS Feature Selection. Front Public Health 2022; 10:901602. [PMID: 35812497 PMCID: PMC9258782 DOI: 10.3389/fpubh.2022.901602] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2022] [Accepted: 05/17/2022] [Indexed: 11/25/2022]
Abstract
Since the SARS-CoV-2 virus was first reported in Wuhan, China in December 2019, Corona Virus Disease 2019 (COVID-19) has become a global pandemic. Accurate diagnosis of COVID-19 is central to preventing the disease. However, owing to the limitations of detection technology, test results cannot be entirely free of false positives or false negatives. Improving the precision of test results requires identifying more biomarkers for COVID-19. Based on expression data from COVID-19-positive and -negative samples, we first screened feature genes using the ReliefF, minimal-redundancy-maximum-relevancy, and Boruta_MCFS methods. Thereafter, 36 optimal feature genes were selected through incremental feature selection based on the random forest classifier, and their enriched biological functions and signaling pathways were revealed by Gene Ontology and Kyoto Encyclopedia of Genes and Genomes analyses. Protein-protein interaction network analysis was also performed on these feature genes, and the enriched biological functions and signaling pathways of the main submodules were analyzed. In addition, dimensionality reduction analysis was used to verify whether these 36 feature genes could effectively distinguish positive samples from negative ones. From these results, we inferred that the 36 feature genes selected via Boruta_MCFS could serve as biomarkers for COVID-19.
Affiliation(s)
- Yanbao Sun
- Department of Radiology, Affiliated Hospital of Jiaxing University, Jiaxing, China
- Qi Zhang
- Department of Respiration in Affiliated Hospital of Jiaxing University/The First Hospital of Jiaxing, Jiaxing, China
- Qi Yang
- Department of Respiration in Affiliated Hospital of Jiaxing University/The First Hospital of Jiaxing, Jiaxing, China
- Ming Yao
- Center for Pain Medicine in Affiliated Hospital of Jiaxing University/The First Hospital of Jiaxing, Jiaxing, China
- Fang Xu
- The Xiuzhou Kang'an Hospital of Jiaxing, Jiaxing, China
- Wenyu Chen
- Department of Respiration in Affiliated Hospital of Jiaxing University/The First Hospital of Jiaxing, Jiaxing, China
- *Correspondence: Wenyu Chen
9
Chu Y, Guo S, Cui D, Fu X, Ma Y. DeephageTP: a convolutional neural network framework for identifying phage-specific proteins from metagenomic sequencing data. PeerJ 2022; 10:e13404. [PMID: 35698617 PMCID: PMC9188312 DOI: 10.7717/peerj.13404] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2021] [Accepted: 04/18/2022] [Indexed: 01/14/2023]
Abstract
Bacteriophages (phages) are the most abundant and diverse biological entities on Earth. Owing to the lack of universal gene markers and database representatives, about 50-90% of phage genes cannot be assigned functions. This makes it challenging to identify phage genomes and annotate the functions of phage genes efficiently by large-scale homology search, especially for novel phages. Portal (portal protein), TerL (large terminase subunit protein), and TerS (small terminase subunit protein) are three proteins specific to Caudovirales phages. Here, we developed a CNN (convolutional neural network)-based framework, DeephageTP, to identify these three proteins from metagenomic data. The framework takes one-hot encodings of the original protein sequences as input and automatically extracts predictive features during modeling. To overcome the false-positive problem, a cutoff-loss-value strategy is introduced based on the distributions of the loss values of protein sequences within the same category. The proposed model with a set of cutoff-loss-values achieves high precision in identifying TerL and Portal sequences (94% and 90%, respectively) from the mimic metagenomic dataset. Finally, we tested the efficacy of the framework using three real metagenomic datasets, and the results showed that, compared with conventional alignment-based methods, our framework had a particular advantage in identifying novel phage-specific Portal and TerL protein sequences with remote homology to their counterparts in the training datasets. In summary, our study develops, for the first time, a CNN-based framework for identifying phage-specific protein sequences with high complexity and low conservation, and this framework will help find novel phages in metagenomic sequencing data. DeephageTP is available at https://github.com/chuym726/DeephageTP.
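The one-hot input described in this abstract can be pictured with the short sketch below. The alphabet order, the fixed `length` with zero-padding and truncation, and the all-zero row for unknown residues are assumptions for the example; the abstract does not state how DeephageTP handles these details.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence, length):
    """Encode a protein as a `length` x 20 matrix of 0/1 values."""
    matrix = []
    for position in range(length):
        row = [0] * len(AMINO_ACIDS)
        # Positions past the end of the sequence stay all-zero (padding).
        if position < len(sequence):
            column = INDEX.get(sequence[position].upper())
            if column is not None:  # unknown residues also stay all-zero
                row[column] = 1
        matrix.append(row)
    return matrix


encoded = one_hot_encode("MKT", length=5)
```

Each row has at most one non-zero entry, so a convolutional layer sliding along the first axis sees short residue patterns directly rather than hand-crafted features.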
Affiliation(s)
- Yunmeng Chu
- Shenzhen Key Laboratory of Synthetic Genomics, Guangdong Provincial Key Laboratory of Synthetic Genomics, CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese, Shenzhen, Guangdong, P.R. China; Department of Bioengineering and Biotechnology, Huaqiao University, Xiamen, Fujian, P.R. China
- Shun Guo
- Shenzhen Key Laboratory of Synthetic Genomics, Guangdong Provincial Key Laboratory of Synthetic Genomics, CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese, Shenzhen, Guangdong, P.R. China
- Dachao Cui
- Shenzhen Key Laboratory of Synthetic Genomics, Guangdong Provincial Key Laboratory of Synthetic Genomics, CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese, Shenzhen, Guangdong, P.R. China
- Xiongfei Fu
- Shenzhen Key Laboratory of Synthetic Genomics, Guangdong Provincial Key Laboratory of Synthetic Genomics, CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese, Shenzhen, Guangdong, P.R. China
- Yingfei Ma
- Shenzhen Key Laboratory of Synthetic Genomics, Guangdong Provincial Key Laboratory of Synthetic Genomics, CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese, Shenzhen, Guangdong, P.R. China
10
Ma Z, Zhu T, Wang H, Wang B, Fu L, Yu G. Investigation of serum markers of esophageal squamous cell carcinoma based on machine learning methods. J Biochem 2022; 172:29-36. [PMID: 35415740 DOI: 10.1093/jb/mvac030] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/25/2021] [Accepted: 03/19/2022] [Indexed: 11/15/2022]
Abstract
Esophageal squamous cell carcinoma (ESCC) is a malignant tumor with high mortality in humans, and effective and convenient early diagnosis methods are lacking. By analyzing serum miRNA expression data from ESCC tumor samples and normal samples with max-relevance and min-redundancy (mRMR) feature selection and the incremental feature selection (IFS) method, we obtained a random forest classifier built from 5 feature miRNAs. The receiver operating characteristic (ROC) curve showed that the model was able to distinguish the samples. Principal component analysis (PCA) and hierarchical cluster analysis of the samples showed that the 5 feature miRNAs could distinguish ESCC patients from healthy individuals well. miR-663a, miR-5100 and miR-221-3p were all expressed at higher levels in ESCC patients than in healthy individuals. In contrast, miR-6763-5p and miR-7111-5p were both expressed at lower levels in ESCC patients than in healthy individuals. In addition, collected clinical serum samples were used for qRT-PCR analysis, which showed that the expression trends of the 5 feature miRNAs followed a pattern similar to that in the training set. These findings indicate that the 5 feature miRNAs may be serum tumor markers of ESCC. This study offers new insights for the early diagnosis of ESCC.
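The incremental feature selection (IFS) step named in this abstract can be sketched as follows: starting from a ranked feature list (for example from mRMR), grow the subset one feature at a time and keep the size that scores best. To stay self-contained, the sketch swaps in a leave-one-out nearest-centroid classifier for the random forest used in the study; that substitution, the function names and the toy data are all assumptions.

```python
def nearest_centroid_loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier on `features`."""
    correct = 0
    for i in range(len(X)):
        sums, counts = {}, {}
        for j, (row, label) in enumerate(zip(X, y)):
            if j == i:
                continue  # hold out sample i
            vector = sums.setdefault(label, [0.0] * len(features))
            for k, f in enumerate(features):
                vector[k] += row[f]
            counts[label] = counts.get(label, 0) + 1

        def squared_distance(label):
            return sum((X[i][f] - sums[label][k] / counts[label]) ** 2
                       for k, f in enumerate(features))

        predicted = min(sums, key=squared_distance)
        correct += predicted == y[i]
    return correct / len(X)


def incremental_feature_selection(X, y, ranked_features):
    """Evaluate the top-1, top-2, ... ranked features and keep the best subset."""
    best_subset, best_score = [], -1.0
    for size in range(1, len(ranked_features) + 1):
        subset = ranked_features[:size]
        score = nearest_centroid_loo_accuracy(X, y, subset)
        if score > best_score:  # ties keep the smaller subset
            best_subset, best_score = subset, score
    return best_subset, best_score


# Toy data (invented): feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [0.9, 0.8], [1.0, 0.2], [0.8, 0.5]]
y = [0, 0, 0, 1, 1, 1]
subset, score = incremental_feature_selection(X, y, ranked_features=[0, 1])
```

Breaking ties in favour of the smaller subset mirrors the usual motivation for IFS in biomarker work: a short panel is cheaper to validate by qRT-PCR than a long one.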
Affiliation(s)
- Zhifeng Ma
- Department of Thoracic Surgery, Shaoxing People's Hospital, Shaoxing, Zhejiang Province, China, 312000
- Ting Zhu
- Department of Thoracic Surgery, Shaoxing People's Hospital, Shaoxing, Zhejiang Province, China, 312000
- Haiyong Wang
- Department of Thoracic Surgery, Shaoxing People's Hospital, Shaoxing, Zhejiang Province, China, 312000
- Bin Wang
- Department of Thoracic Surgery, Shaoxing People's Hospital, Shaoxing, Zhejiang Province, China, 312000
- Linhai Fu
- Department of Thoracic Surgery, Shaoxing People's Hospital, Shaoxing, Zhejiang Province, China, 312000
- Guangmao Yu
- Department of Thoracic Surgery, Shaoxing People's Hospital, Shaoxing, Zhejiang Province, China, 312000
11
Ahmad S, Charoenkwan P, Quinn JMW, Moni MA, Hasan MM, Lio' P, Shoombuatong W. SCORPION is a stacking-based ensemble learning framework for accurate prediction of phage virion proteins. Sci Rep 2022; 12:4106. [PMID: 35260777 PMCID: PMC8904530 DOI: 10.1038/s41598-022-08173-5] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 01/24/2022] [Accepted: 03/03/2022] [Indexed: 12/30/2022]
Abstract
Fast and accurate identification of phage virion proteins (PVPs) would greatly facilitate antibacterial drug discovery and development. Although several research efforts based on machine learning (ML) methods have been made for in silico identification of PVPs, these methods have certain limitations. Therefore, in this study, we propose a new computational approach, termed SCORPION (StaCking-based Predictor fOR Phage VIrion PrOteiNs), to accurately identify PVPs using only protein primary sequences. Specifically, we explored 13 different feature descriptors covering different aspects (i.e., compositional information, composition-transition-distribution information, position-specific information and physicochemical properties) with 10 popular ML algorithms to construct a pool of optimal baseline models. These optimal baseline models were then used to generate probabilistic features (PFs), which were treated as a new feature vector. Finally, we used a two-step feature selection strategy to determine the optimal PF feature vector and used it to develop a stacked model (SCORPION). Both tenfold cross-validation and independent test results indicate that SCORPION achieves better predictive performance than its constituent baseline models and existing methods. We anticipate SCORPION will serve as a useful tool for the cost-effective and large-scale screening of new PVPs. The source codes and datasets for this work are available for downloading in the GitHub repository (https://github.com/saeed344/SCORPION).
Affiliation(s)
- Saeed Ahmad
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
- Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, 50200, Thailand
- Julian M W Quinn
- Bone Biology Division, Garvan Institute of Medical Research, 384 Victoria Street, Darlinghurst, NSW, 2010, Australia
- Mohammad Ali Moni
- Faculty of Health and Behavioural Sciences, School of Health and Rehabilitation Sciences, The University of Queensland, St Lucia, QLD, 4072, Australia
- Md Mehedi Hasan
- Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, School of Medicine, Tulane Center for Biomedical Informatics and Genomics, Tulane University, New Orleans, LA, 70112, USA
- Pietro Lio'
- Department of Computer Science and Technology, University of Cambridge, Cambridge, CB3 0FD, UK
- Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
12
Wessolly M, Mairinger E, Borchert S, Bankfalvi A, Mach P, Schmid KW, Kimmig R, Buderath P, Mairinger FD. CAF-Associated Paracrine Signaling Worsens Outcome and Potentially Contributes to Chemoresistance in Epithelial Ovarian Cancer. Front Oncol 2022; 12:798680. [PMID: 35311102 PMCID: PMC8927667 DOI: 10.3389/fonc.2022.798680] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 10/20/2021] [Accepted: 02/07/2022] [Indexed: 01/06/2023]
Abstract
Background: High-grade serous ovarian cancer (HGSOC) is the predominant and deadliest form of ovarian cancer. Some of its histological subtypes can be distinguished by the frequent occurrence of cancer-associated myofibroblasts (CAFs) and desmoplastic stroma reaction (DSR). In this study, we explore the relationship between therapy outcome and the activity of CAF-associated signaling pathways in a homogeneous HGSOC patient collective. Furthermore, we validate these findings in a general epithelial ovarian cancer (EOC) cohort.
Methods: The investigation cohort consisted of 24 HGSOC patients. All of them were treated with platinum-based components, and clinical follow-up was available. The validation cohort comprised 303 patients. Sequencing data (whole transcriptome) and clinical data were extracted from The Cancer Genome Atlas (TCGA). RNA of HGSOC patients was isolated using a Maxwell RSC instrument and the appropriate RNA isolation kit. For digital expression analysis, a custom-designed gene panel was employed. All genes were linked to various DSR- and CAF-associated pathways. Expression analysis was performed on the NanoString nCounter platform. Finally, data were explored using the R programming environment (v. 4.0.3).
Results: In total, 15 CAF-associated genes were associated with patients’ survival. More specifically, 6 genes (MMP13, CGA, EPHA3, PSMD9, PITX2, PHLPP1) were linked to poor therapy outcome. Though a variety of different pathways appeared to be associated with therapy failure, many were related to CAF paracrine signaling, including MAPK, Ras and TGF-β pathways. Similar results were obtained from the validation cohort.
Discussion: In this study, we successfully linked CAF-associated pathways, as shown by increased Ras, MAPK and PI3K-Akt signaling, to chemotherapy failure in HGSOC and in EOC in general. As platinum-based chemotherapy has been the state-of-the-art therapy for HGSOC for decades, it is necessary to uncover the reasons behind resistance development and poor outcome. In this work, CAF-associated signaling is shown to compromise therapy response. In the validation cohort, CAF-associated signaling is also associated with therapy failure in general EOC, possibly hinting at a conserved mechanism. Therefore, it may be helpful to stratify HGSOC patients by CAF activity and consider alternative treatment options.
Affiliation(s)
- Michael Wessolly
  - Institute of Pathology, University Hospital Essen, Essen, Germany
  - *Correspondence: Michael Wessolly,
- Elena Mairinger
  - Institute of Pathology, University Hospital Essen, Essen, Germany
- Sabrina Borchert
  - Institute of Pathology, University Hospital Essen, Essen, Germany
- Agnes Bankfalvi
  - Institute of Pathology, University Hospital Essen, Essen, Germany
- Pawel Mach
  - Department of Gynecology and Obstetrics, University Hospital Essen, Essen, Germany
- Rainer Kimmig
  - Department of Gynecology and Obstetrics, University Hospital Essen, Essen, Germany
- Paul Buderath
  - Department of Gynecology and Obstetrics, University Hospital Essen, Essen, Germany
|
13
|
Kabir M, Nantasenamat C, Kanthawong S, Charoenkwan P, Shoombuatong W. Large-scale comparative review and assessment of computational methods for phage virion proteins identification. EXCLI JOURNAL 2022; 21:11-29. [PMID: 35145365 PMCID: PMC8822302 DOI: 10.17179/excli2021-4411] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/12/2021] [Accepted: 11/29/2021] [Indexed: 12/11/2022]
Abstract
Phage virion proteins (PVPs) are effective at recognizing and binding to host cell receptors while having no deleterious effects on human or animal cells. Understanding their functional mechanisms is regarded as a critical goal that will aid in rational antibacterial drug discovery and development. Although high-throughput experimental methods for identifying PVPs are considered the gold standard for exploring crucial PVP features, these procedures are frequently time-consuming and labor-intensive. Thus far, more than ten sequence-based predictors have been established for the in silico identification of PVPs in conjunction with traditional experimental approaches. As a result, a revised and more thorough assessment is extremely desirable. With this purpose in mind, we first conduct a thorough survey and evaluation of 13 state-of-the-art PVP predictors. These predictors can be classified into three groups according to the type of machine learning (ML) algorithm employed (i.e., traditional ML-based methods, ensemble-based methods and deep learning-based methods). Subsequently, we explore which factors are important for building more accurate and stable predictors, including training/independent datasets, feature encoding algorithms, feature selection methods, core algorithms, performance evaluation metrics/strategies and web servers. Finally, we provide insights and future perspectives for the design and development of new and more effective computational approaches for the detection and characterization of PVPs.
Affiliation(s)
- Muhammad Kabir
  - School of Systems and Technology, Department of Computer Science, University of Management and Technology, Lahore, Pakistan, 54770
- Chanin Nantasenamat
  - Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand, 10700
- Sakawrat Kanthawong
  - Department of Microbiology, Faculty of Medicine, Khon Kaen University, Khon Kaen, Thailand, 40002
- Phasit Charoenkwan
  - Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, Thailand, 50200
- Watshara Shoombuatong
  - Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand, 10700
|
14
|
Zou H. Identifying blood-brain barrier peptides by using amino acids physicochemical properties and features fusion method. Pept Sci (Hoboken) 2021. [DOI: 10.1002/pep2.24247] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/22/2022]
Affiliation(s)
- Hongliang Zou
  - School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang, China
|
15
|
Zou H, Yang F, Yin Z. Identifying N7-methylguanosine sites by integrating multiple features. Biopolymers 2021; 113:e23480. [PMID: 34709657 DOI: 10.1002/bip.23480] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/27/2021] [Revised: 10/12/2021] [Accepted: 10/14/2021] [Indexed: 11/10/2022]
Abstract
Recent studies reported that N7-methylguanosine (m7G) plays a vital role in gene expression regulation. As a consequence, determining the distribution of m7G is a crucial step towards further understanding its biological functions. Although biological experimental approaches are capable of accurately locating m7G sites, they are labor-intensive, costly, and time-consuming. Therefore, it is necessary to develop more effective and robust computational methods to replace, or at least complement current experimental methods. In this study, we developed a novel sequence-based computational tool to identify RNA m7G sites. In this model, 22 kinds of dinucleotide physicochemical (PC) properties were employed to encode the RNA sequence. Three types of descriptors, including auto-covariance, cross-covariance, and discrete wavelet transform were adopted to extract effective features from the PC matrix. The least absolute shrinkage and selection operator (LASSO) algorithm was utilized to reduce the influence of irrelevant or redundant features. Finally, these selected features were fed into a support vector machine (SVM) for distinguishing m7G from non-m7G sites. The proposed method significantly outperforms existing predictors across all evaluation metrics. It indicates that the approach is effective in identifying RNA m7G sites.
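The auto-covariance descriptor named in this abstract can be illustrated with a minimal sketch. It assumes the common definition in which, for one physicochemical property track, the lag-k feature is the mean product of mean-centred values k positions apart. The function name, the toy property values and the normalisation by (n - lag) are assumptions for illustration and are not taken from the paper.

```python
from typing import List


def auto_covariance(values: List[float], max_lag: int) -> List[float]:
    """Return auto-covariance features for lags 1..max_lag of one property track."""
    n = len(values)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the sequence length")
    mean = sum(values) / n
    features = []
    for lag in range(1, max_lag + 1):
        # Sum products of mean-centred values that are `lag` positions apart.
        total = sum((values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag))
        features.append(total / (n - lag))
    return features


# Hypothetical property values for consecutive dinucleotides in a short window.
track = [1.0, 2.0, 3.0, 4.0]
ac = auto_covariance(track, max_lag=2)
print(ac)
```

In a full pipeline one such vector would be computed per property (22 in the abstract) and the vectors concatenated before feature selection.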
Affiliation(s)
- Hongliang Zou
  - School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang, China
- Fan Yang
  - School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang, China
- Zhijian Yin
  - School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang, China
|
16
|
Identification of Novel COVID-19 Biomarkers by Multiple Feature Selection Strategies. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2021; 2021:2203636. [PMID: 34603483 PMCID: PMC8485143 DOI: 10.1155/2021/2203636] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/11/2021] [Accepted: 08/20/2021] [Indexed: 12/13/2022]
Abstract
Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has resulted in a global pandemic since its first report in December 2019. So far, SARS-CoV-2 nucleic acid detection has been regarded as the gold standard of COVID-19 diagnosis. However, this detection method often yields false negatives, leading to missed COVID-19 diagnoses. Therefore, new biomarkers are urgently needed to increase the accuracy of COVID-19 diagnosis. To explore new biomarkers of COVID-19 in this study, expression profiles were first retrieved from the GEO database. On this basis, 500 feature genes were screened by the minimum-redundancy maximum-relevancy (mRMR) feature selection method. Afterwards, the incremental feature selection (IFS) method was used to choose the best-performing classifier among support vector machine (SVM) classifiers built on different sets of feature genes. The corresponding 66 feature genes were set as the optimal feature genes. Lastly, the optimal feature genes were subjected to GO functional enrichment analysis, principal component analysis (PCA), and protein-protein interaction (PPI) network analysis. Overall, the 66 feature genes could effectively classify positive and negative COVID-19 samples and may serve as new biomarkers of the disease.
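The incremental feature selection step can be sketched as follows: given features already ranked (for example by mRMR), a classifier is evaluated on the top 1, top 2, ... features and the smallest best-scoring prefix is kept. This is only a sketch under stated assumptions. A leave-one-out nearest-centroid classifier stands in for the SVM used in the study, and the toy data, function names and tie-breaking rule are hypothetical.

```python
from typing import List, Sequence, Tuple


def loo_accuracy(X: Sequence[Sequence[float]], y: Sequence[int], cols: Sequence[int]) -> float:
    """Leave-one-out accuracy of a nearest-centroid classifier on the chosen columns."""
    correct = 0
    for held in range(len(X)):
        centroids = {}
        for label in set(y):
            rows = [X[i] for i in range(len(X)) if i != held and y[i] == label]
            if rows:
                centroids[label] = [sum(r[c] for r in rows) / len(rows) for c in cols]
        point = [X[held][c] for c in cols]
        # Predict the label whose centroid is closest in squared Euclidean distance.
        pred = min(
            centroids,
            key=lambda lab: sum((p - q) ** 2 for p, q in zip(point, centroids[lab])),
        )
        correct += pred == y[held]
    return correct / len(X)


def incremental_feature_selection(
    X: Sequence[Sequence[float]], y: Sequence[int], ranked: List[int]
) -> Tuple[List[int], float]:
    """Evaluate the top-k ranked features for every k and keep the smallest best prefix."""
    best_k, best_acc = 0, -1.0
    for k in range(1, len(ranked) + 1):
        acc = loo_accuracy(X, y, ranked[:k])
        if acc > best_acc:  # strict comparison keeps the smaller subset on ties
            best_k, best_acc = k, acc
    return ranked[:best_k], best_acc


# Toy data: column 0 separates the classes, column 1 is noise.
X = [[0.0, 5.0], [0.1, -5.0], [0.2, 4.0], [1.0, -4.0], [1.1, 5.0], [0.9, -5.0]]
y = [0, 0, 0, 1, 1, 1]
selected, accuracy = incremental_feature_selection(X, y, ranked=[0, 1])
print(selected, accuracy)
```

Swapping in another classifier only requires replacing `loo_accuracy` with a different scoring function of the same shape.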
|
17
|
Liu T, Chen J, Zhang Q, Hippe K, Hunt C, Le T, Cao R, Tang H. The Development of Machine Learning Methods in discriminating Secretory Proteins of Malaria Parasite. Curr Med Chem 2021; 29:807-821. [PMID: 34636289 DOI: 10.2174/0929867328666211005140625] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/24/2021] [Revised: 07/28/2021] [Accepted: 08/15/2021] [Indexed: 11/22/2022]
Abstract
Malaria caused by Plasmodium falciparum is one of the major infectious diseases in the world. It is essential to exploit an effective method to predict secretory proteins of malaria parasites to develop effective cures and treatment. Biochemical assays can provide details for accurate identification of the secretory proteins, but these methods are expensive and time-consuming. In this paper, we summarized the machine learning-based identification algorithms and compared the construction strategies between different computational methods. Also, we discussed the use of machine learning to improve the ability of algorithms to identify proteins secreted by malaria parasites.
Affiliation(s)
- Ting Liu
  - School of Basic Medical Sciences, Southwest Medical University, Luzhou, China
- Jiamao Chen
  - School of Basic Medical Sciences, Southwest Medical University, Luzhou, China
- Qian Zhang
  - School of Basic Medical Sciences, Southwest Medical University, Luzhou, China
- Kyle Hippe
  - Department of Computer Science, Pacific Lutheran University, United States
- Cassandra Hunt
  - Department of Computer Science, Pacific Lutheran University, United States
- Thu Le
  - Department of Computer Science, Pacific Lutheran University, United States
- Renzhi Cao
  - Department of Computer Science, Pacific Lutheran University, United States
- Hua Tang
  - School of Basic Medical Sciences, Southwest Medical University, Luzhou, China
|
18
|
Ramos-Vivas J, Elexpuru-Zabaleta M, Samano ML, Barrera AP, Forbes-Hernández TY, Giampieri F, Battino M. Phages and Enzybiotics in Food Biopreservation. Molecules 2021; 26:5138. [PMID: 34500572 PMCID: PMC8433972 DOI: 10.3390/molecules26175138] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 07/13/2021] [Revised: 08/10/2021] [Accepted: 08/20/2021] [Indexed: 12/27/2022]
Abstract
Presently, biopreservation through protective bacterial cultures and their antimicrobial products or using antibacterial compounds derived from plants are proposed as feasible strategies to maintain the long shelf-life of products. Another emerging category of food biopreservatives are bacteriophages or their antibacterial enzymes called "phage lysins" or "enzybiotics", which can be used directly as antibacterial agents due to their ability to act on the membranes of bacteria and destroy them. Bacteriophages are an alternative to antimicrobials in the fight against bacteria, mainly because they have a practically unique host range that gives them great specificity. In addition to their potential ability to specifically control strains of pathogenic bacteria, their use does not generate a negative environmental impact as in the case of antibiotics. Both phages and their enzymes can favor a reduction in antibiotic use, which is desirable given the alarming increase in resistance to antibiotics used not only in human medicine but also in veterinary medicine, agriculture, and in general all processes of manufacturing, preservation, and distribution of food. We present here an overview of the scientific background of phages and enzybiotics in the food industry, as well as food applications of these biopreservatives.
Affiliation(s)
- José Ramos-Vivas
  - Research Group on Foods, Nutritional Biochemistry and Health, Universidad Europea del Atlántico, 39011 Santander, Spain
  - Department of Project Management, Universidad Internacional Iberoamericana, Campeche 24560, Mexico
- María Elexpuru-Zabaleta
  - Research Group on Foods, Nutritional Biochemistry and Health, Universidad Europea del Atlántico, 39011 Santander, Spain
- María Luisa Samano
  - Research Group on Foods, Nutritional Biochemistry and Health, Universidad Europea del Atlántico, 39011 Santander, Spain
  - Department of Project Management, Universidad Internacional Iberoamericana, Campeche 24560, Mexico
- Alina Pascual Barrera
  - Department of Project Management, Universidad Internacional Iberoamericana, Campeche 24560, Mexico
- Francesca Giampieri
  - Department of Clinical Sciences, Polytechnic University of Marche, 60131 Ancona, Italy
  - Department of Biochemistry, Faculty of Sciences, King Abdulaziz University, Jeddah 21589, Saudi Arabia
  - Correspondence: (F.G.); (M.B.); Tel.: +339-071-220-4136 (F.G.); +339-071-220-4646 (M.B.)
- Maurizio Battino
  - Department of Clinical Sciences, Polytechnic University of Marche, 60131 Ancona, Italy
  - International Research Center for Food Nutrition and Safety, Jiangsu University, Zhenjiang 212013, China
  - Correspondence: (F.G.); (M.B.); Tel.: +339-071-220-4136 (F.G.); +339-071-220-4646 (M.B.)
|
19
|
iPVP-MCV: A Multi-Classifier Voting Model for the Accurate Identification of Phage Virion Proteins. Symmetry (Basel) 2021. [DOI: 10.3390/sym13081506] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/13/2022]
Abstract
The classic structure of a bacteriophage is commonly characterized by complex symmetry. The head of the structure features icosahedral symmetry, whereas the tail features helical symmetry. The phage virion protein (PVP), a type of bacteriophage structural protein, is an essential material of the infectious viral particles and is responsible for multiple biological functions. Accurate identification of PVPs is of great significance for comprehending the interaction between phages and host bacteria and developing new antimicrobial drugs or antibiotics. However, traditional experimental approaches for identifying PVPs are often time-consuming and laborious. Therefore, the development of computational methods that can efficiently and accurately identify PVPs is desired. In this study, we proposed a multi-classifier voting model called iPVP-MCV to enhance the predictive performance of PVPs based on their amino acid sequences. First, three types of evolutionary features were extracted from the position-specific scoring matrix (PSSM) profiles to represent PVPs and non-PVPs. Then, a set of baseline models were trained based on the support vector machine (SVM) algorithm combined with each type of feature descriptors. Finally, the outputs of these baseline models were integrated to construct the proposed method iPVP-MCV by using the majority voting strategy. Our results demonstrated that the proposed iPVP-MCV model was superior to existing methods when performing the rigorous independent dataset test.
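The final integration step described here, majority voting over baseline model outputs, can be sketched in a few lines. The sketch assumes each baseline model has already produced one hard label per sample. The three prediction lists are invented for illustration and do not come from iPVP-MCV.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine hard labels; predictions[m][i] is the label model m gives sample i."""
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("every model must predict the same number of samples")
    voted = []
    for i in range(n_samples):
        votes = Counter(model[i] for model in predictions)
        # With an odd number of binary voters there is always a strict majority.
        voted.append(votes.most_common(1)[0][0])
    return voted


# Hypothetical outputs of three baseline models (1 = PVP, 0 = non-PVP).
model_a = [1, 0, 1, 0]
model_b = [1, 1, 0, 0]
model_c = [0, 1, 1, 0]
ensemble = majority_vote([model_a, model_b, model_c])
print(ensemble)
```

An odd number of voters avoids ties for binary labels, which is one reason three baseline models is a convenient choice.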
|
20
|
Nami Y, Imeni N, Panahi B. Application of machine learning in bacteriophage research. BMC Microbiol 2021; 21:193. [PMID: 34174831 PMCID: PMC8235560 DOI: 10.1186/s12866-021-02256-5] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 03/13/2021] [Accepted: 06/08/2021] [Indexed: 12/20/2022]
Abstract
Phages are one of the key components in the structure, dynamics, and interactions of microbial communities in different bins. They have a clear impact on human health and the food industry. Characterizing bacteriophages with in vitro approaches is time-consuming, costly, and laborious. On the other hand, with the advent of new high-throughput sequencing technology, the development of a powerful computational framework to characterize newly identified bacteriophages is inevitable for future research. Machine learning includes powerful techniques that enable the analysis of complex datasets for knowledge discovery and pattern recognition. In this study, we conducted a comprehensive review of machine learning methods, using different types of features, that have been applied to various aspects of bacteriophage research, including automated curation, identification, classification, host species recognition, virion protein identification, and life cycle prediction. Moreover, potential limitations and advantages of the developed frameworks are discussed.
Affiliation(s)
- Yousef Nami
  - Department of Food Biotechnology, Branch for Northwest & West Region, Agricultural Biotechnology Research Institute of Iran, Agricultural Research, Education and Extension Organization (AREEO), Tabriz, Iran
- Nazila Imeni
  - Young Researchers and Elite Club, Marand Branch, Islamic Azad University, Marand, Iran
- Bahman Panahi
  - Department of Genomics, Branch for Northwest & West Region, Agricultural Biotechnology Research Institute of Iran, Agricultural Research, Education and Extension Organization (AREEO), Tabriz, Iran
|
21
|
Ao C, Zou Q, Yu L. RFhy-m2G: Identification of RNA N2-methylguanosine modification sites based on random forest and hybrid features. Methods 2021; 203:32-39. [PMID: 34033879 DOI: 10.1016/j.ymeth.2021.05.016] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Received: 04/18/2021] [Revised: 05/04/2021] [Accepted: 05/20/2021] [Indexed: 12/31/2022]
Abstract
N2-methylguanosine (m2G) is a post-transcriptional modification of RNA that is found in eukaryotes and archaea. The biological functions of m2G discovered so far are to control and stabilize the three-dimensional structure of tRNA and to act as a dynamic barrier to reverse transcription. To discover additional biological functions of m2G, it is necessary to develop time-saving and labor-saving computational tools to identify m2G sites. In this paper, a novel predictor based on hybrid features and a random forest, RFhy-m2G, was developed to identify m2G modification sites in three species. The hybrid feature fuses three feature types: ENAC, PseDNC, and NPPS. These cover primary sequence-derived properties, physicochemical properties, and position-specific properties. Because the hybrid features contain redundancy, MRMD2.0 is used for optimal feature selection. Feature analysis shows that the optimal hybrid features still contain all three kinds of properties, and that the hybrid features identify m2G modification sites more accurately and improve prediction performance. In five-fold cross-validation and independent testing, the prediction model reached accuracies of 0.9982 and 0.9417, respectively. The robustness of the predictor is demonstrated by comparisons with other predictors.
Affiliation(s)
- Chunyan Ao
  - School of Computer Science and Technology, Xidian University, Xi'an, China; Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Liang Yu
  - School of Computer Science and Technology, Xidian University, Xi'an, China
|
22
|
Jing XY, Li FM. Predicting Cell Wall Lytic Enzymes Using Combined Features. Front Bioeng Biotechnol 2021; 8:627335. [PMID: 33585423 PMCID: PMC7874139 DOI: 10.3389/fbioe.2020.627335] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/09/2020] [Accepted: 12/04/2020] [Indexed: 11/13/2022]
Abstract
Due to the overuse of antibiotics, there is concern that existing antibiotics will become ineffective against pathogens as antibiotic-resistant strains rise rapidly. The use of cell wall lytic enzymes to destroy bacteria has become a viable alternative to avoid the crisis of antimicrobial resistance. In this paper, an improved method for cell wall lytic enzyme prediction is proposed. The amino acid composition (AAC), the dipeptide composition (DC), the position-specific score matrix auto-covariance (PSSM-AC), and the auto-covariance average chemical shift (acACS) were selected to predict cell wall lytic enzymes with a support vector machine (SVM). To overcome the imbalanced data classification problem, the synthetic minority over-sampling technique (SMOTE) was used to balance the dataset. To remove redundant or irrelevant features, the F-score was used to select features. With the jackknife test and the optimized combination feature AAC+DC+acACS+PSSM-AC, the Sn, Sp, MCC, and Acc were 99.35%, 99.02%, 0.98, and 99.19%, respectively. The Sn, Sp, MCC, and Acc of our predictive model for cell wall lytic enzymes were higher than those of existing methods. This improved method may be helpful for protein function prediction.
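The four metrics reported in this abstract (Sn, Sp, MCC, Acc) follow directly from a binary confusion matrix. The sketch below uses their standard definitions. The counts in the example are invented for illustration and are not the study's data.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Sensitivity, specificity, accuracy and Matthews correlation coefficient."""
    sn = tp / (tp + fn)                      # fraction of positives recovered
    sp = tn / (tn + fp)                      # fraction of negatives recovered
    acc = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


# Hypothetical confusion-matrix counts.
metrics = binary_metrics(tp=90, fn=10, tn=80, fp=20)
print(metrics)
```

MCC is the most informative of the four on imbalanced data, because it uses all four cells of the confusion matrix.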
Affiliation(s)
- Xiao-Yang Jing
  - College of Science, Inner Mongolia Agricultural University, Hohhot, China
- Feng-Min Li
  - College of Science, Inner Mongolia Agricultural University, Hohhot, China
|
23
|
Fang Z, Zhou H. VirionFinder: Identification of Complete and Partial Prokaryote Virus Virion Protein From Virome Data Using the Sequence and Biochemical Properties of Amino Acids. Front Microbiol 2021; 12:615711. [PMID: 33613485 PMCID: PMC7894196 DOI: 10.3389/fmicb.2021.615711] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 10/09/2020] [Accepted: 01/04/2021] [Indexed: 01/22/2023]
Abstract
Viruses are some of the most abundant biological entities on Earth, and prokaryote viruses are the dominant members of the viral community. Because of the diversity of prokaryote viruses, many genes from newly discovered prokaryote viruses cannot be functionally annotated by searching current databases; therefore, the development of an alignment-free algorithm for functional annotation of prokaryote virus proteins is important for understanding the viral community. The identification of prokaryote virus virion proteins (PVVPs) is a critical step for many viral analyses, such as species classification, phylogenetic analysis and the exploration of how prokaryote viruses interact with their hosts. Although a series of PVVP prediction tools have been developed, their performance is still not satisfactory. Moreover, viral metagenomic data contain fragmented sequences, leading to the existence of some incomplete genes. Therefore, a tool that can identify partial prokaryote virus virion proteins is also needed. In this work, we present a novel algorithm, called VirionFinder, to distinguish complete and partial PVVPs from non-prokaryote virus virion proteins (non-PVVPs). VirionFinder uses the sequence and biochemical properties of the 20 amino acids as the mathematical model to encode protein sequences and uses a deep learning technique to identify whether a given protein is a PVVP. Compared with state-of-the-art tools on artificial benchmark datasets, the results show that, at the same specificity (Sp), the sensitivity (Sn) of VirionFinder is approximately 10-34% higher than the Sn of these tools on both complete and partial proteins. When evaluating related tools on real virome data, the recognition rate of PVVP-like sequences by VirionFinder is also much higher than that of the other tools.
We expect that VirionFinder will be a powerful tool for identifying novel virion proteins from both complete prokaryote virus genomes and viral metagenomic data. VirionFinder is freely available at https://github.com/zhenchengfang/VirionFinder.
Affiliation(s)
- Zhencheng Fang
  - Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, China
  - Center for Quantitative Biology, Peking University, Beijing, China
- Hongwei Zhou
  - Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, China
  - State Key Laboratory of Organ Failure Research, Southern Medical University, Guangzhou, China
|
24
|
Meng C, Wu J, Guo F, Dong B, Xu L. CWLy-pred: A novel cell wall lytic enzyme identifier based on an improved MRMD feature selection method. Genomics 2020; 112:4715-4721. [DOI: 10.1016/j.ygeno.2020.08.015] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 07/10/2020] [Revised: 08/04/2020] [Accepted: 08/13/2020] [Indexed: 10/25/2022]
|
25
|
Chen W, Nie F, Ding H. Recent Advances of Computational Methods for Identifying Bacteriophage Virion Proteins. Protein Pept Lett 2020; 27:259-264. [PMID: 30968770 DOI: 10.2174/0929866526666190410124642] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 02/20/2019] [Revised: 03/07/2019] [Accepted: 04/01/2019] [Indexed: 01/09/2023]
Abstract
Phage virion proteins (PVPs) are essential components of bacteriophages and participate in a series of biological processes. Accurate identification of phage virion proteins helps in understanding the mechanism of interaction between a phage and its host bacteria. Since experimental methods are labor-intensive and time-consuming, many computational approaches have been proposed in the past few years to identify phage virion proteins. To help researchers select appropriate methods, a comprehensive review and comparison of existing computational methods for identifying phage virion proteins is needed. In this review, we summarize the existing computational methods for identifying phage virion proteins and assess their performance on an independent dataset. Finally, challenges and future perspectives for identifying phage virion proteins are presented. Taken together, we hope that this review can provide guidance for research on phage virion proteins.
Affiliation(s)
- Wei Chen
  - Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611730, China
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
  - Center for Genomics and Computational Biology, School of Life Sciences, North China University of Science and Technology, Tangshan 063000, China
- Fulei Nie
  - Center for Genomics and Computational Biology, School of Life Sciences, North China University of Science and Technology, Tangshan 063000, China
- Hui Ding
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
|
26
|
Meta-iPVP: a sequence-based meta-predictor for improving the prediction of phage virion proteins using effective feature representation. J Comput Aided Mol Des 2020; 34:1105-1116. [DOI: 10.1007/s10822-020-00323-z] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 05/04/2020] [Accepted: 06/10/2020] [Indexed: 12/11/2022]
|
27
|
Feature selection for hierarchical classification via joint semantic and structural information of labels. Knowl Based Syst 2020. [DOI: 10.1016/j.knosys.2020.105655] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 01/25/2023]
|
28
|
Meng C, Zhang J, Ye X, Guo F, Zou Q. Review and comparative analysis of machine learning-based phage virion protein identification methods. BIOCHIMICA ET BIOPHYSICA ACTA-PROTEINS AND PROTEOMICS 2020; 1868:140406. [PMID: 32135196 DOI: 10.1016/j.bbapap.2020.140406] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 01/01/2020] [Revised: 02/14/2020] [Accepted: 02/27/2020] [Indexed: 02/01/2023]
Abstract
Phage virion protein (PVP) identification plays a key role in elucidating relationships between phages and hosts. Moreover, PVP identification can facilitate the design of related biochemical entities. Recently, several machine learning approaches have emerged for this purpose and have shown their potential. In this study, the proposed PVP identifiers are systematically reviewed, and the related algorithms and tools are comprehensively analyzed. We summarize the common framework of these PVP identifiers and construct our own novel identifiers based on that framework. Furthermore, we compare the performance of all PVP identifiers using a training dataset and an independent dataset. The comparison of the pros and cons of these identifiers shows that g-gap DPC (dipeptide composition) features are capable of representing the characteristics of PVPs. Moreover, SVM (support vector machine) proves to be the more effective classifier for distinguishing PVPs from non-PVPs.
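The g-gap dipeptide composition feature highlighted in this abstract can be sketched as follows. It assumes the usual definition: count residue pairs separated by g intervening positions and normalise by the number of counted pairs, giving a 400-dimensional vector. The function name, the toy sequence and the choice to skip pairs containing non-standard residues are assumptions for illustration.

```python
from itertools import product
from typing import Dict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def g_gap_dpc(sequence: str, gap: int) -> Dict[str, float]:
    """Frequencies of residue pairs separated by `gap` intervening positions."""
    counts = {"".join(pair): 0 for pair in product(AMINO_ACIDS, repeat=2)}
    total = 0
    for i in range(len(sequence) - gap - 1):
        pair = sequence[i] + sequence[i + gap + 1]
        if pair in counts:  # pairs with non-standard residues are skipped
            counts[pair] += 1
            total += 1
    return {pair: (n / total if total else 0.0) for pair, n in counts.items()}


# Toy sequence; gap=0 reduces to the ordinary dipeptide composition.
adjacent = g_gap_dpc("ACAC", gap=0)
one_gap = g_gap_dpc("ACAC", gap=1)
print(adjacent["AC"], adjacent["CA"], one_gap["AA"], one_gap["CC"])
```

Vectors for several gap values can be concatenated when longer-range residue correlations are of interest.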
Affiliation(s)
- Chaolu Meng
  - College of Intelligence and Computing, Tianjin University, Tianjin, China; College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot, China
- Jun Zhang
  - Rehabilitation Department, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, China
- Xiucai Ye
  - Department of Computer Science, University of Tsukuba, Tsukuba, Science City, Japan
- Fei Guo
  - College of Intelligence and Computing, Tianjin University, Tianjin, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China; Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
|
29
|
Charoenkwan P, Kanthawong S, Schaduangrat N, Yana J, Shoombuatong W. PVPred-SCM: Improved Prediction and Analysis of Phage Virion Proteins Using a Scoring Card Method. Cells 2020; 9:E353. [PMID: 32028709 PMCID: PMC7072630 DOI: 10.3390/cells9020353] [Citation(s) in RCA: 39] [Impact Index Per Article: 7.8] [Received: 12/26/2019] [Revised: 01/20/2020] [Accepted: 01/27/2020] [Indexed: 12/16/2022]
Abstract
Although existing methods have been successful in predicting phage (or bacteriophage) virion proteins (PVPs) using various types of protein features and complex classifiers, such as support vector machine and naïve Bayes, these two classifiers do not allow interpretability. However, the characterization and analysis of PVPs might be of great significance to understanding the molecular mechanisms of bacteriophage genetics and the development of antibacterial drugs. Hence, we herein proposed a novel method (PVPred-SCM) based on the scoring card method (SCM) in conjunction with dipeptide composition to identify and characterize PVPs. In PVPred-SCM, the propensity scores of 400 dipeptides were calculated using the statistical discrimination approach. A rigorous independent validation test showed that PVPred-SCM utilizing only dipeptide composition yielded an accuracy of 77.56%, indicating that PVPred-SCM performed well relative to the state-of-the-art method utilizing a number of protein features. Furthermore, the propensity scores of dipeptides were used to provide insights into the biochemical and biophysical properties of PVPs. Upon comparison, it was found that PVPred-SCM was superior to the existing methods considering its simplicity, interpretability, and implementation. Finally, in an effort to facilitate high-throughput prediction of PVPs, we provided a user-friendly web server for estimating the likelihood that query sequences are PVPs. It is anticipated that PVPred-SCM will become a useful tool, or at least a complement to existing methods, for predicting and analyzing PVPs.
Affiliation(s)
- Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai 50200, Thailand
- Sakawrat Kanthawong
- Department of Microbiology, Faculty of Medicine, Khon Kaen University, Khon Kaen 40002, Thailand
- Nalini Schaduangrat
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
- Janchai Yana
- Department of Chemistry, Faculty of Science and Technology, Chiang Mai Rajabhat University, Chiang Mai 50300, Thailand
- Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
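The idea of scoring a sequence with dipeptide propensity scores, as described in the abstract above, can be sketched as follows. This is only an illustration of a statistical starting point: each score is the difference between the mean dipeptide frequency in positive and negative training sequences, rescaled to the range 0 to 1000. The rescaling, the function names, and the omission of any later score optimisation are assumptions of this sketch, not details confirmed by the abstract.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(seq):
    """Fraction of each of the 400 adjacent dipeptides in one sequence."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    total = sum(counts.values())
    return {d: (c / total if total else 0.0) for d, c in counts.items()}


def mean_composition(seqs):
    """Average dipeptide composition over a non-empty list of sequences."""
    comps = [dipeptide_composition(s) for s in seqs]
    return {d: sum(c[d] for c in comps) / len(comps) for d in DIPEPTIDES}


def initial_propensity_scores(positives, negatives):
    """Positive-minus-negative frequency difference, rescaled to 0..1000."""
    pos = mean_composition(positives)
    neg = mean_composition(negatives)
    diff = {d: pos[d] - neg[d] for d in DIPEPTIDES}
    lo, hi = min(diff.values()), max(diff.values())
    span = (hi - lo) or 1.0  # avoid division by zero if all differences match
    return {d: 1000.0 * (diff[d] - lo) / span for d in DIPEPTIDES}


def scm_score(seq, scores):
    """Composition-weighted sum of propensity scores for one sequence."""
    comp = dipeptide_composition(seq)
    return sum(comp[d] * scores[d] for d in DIPEPTIDES)
```

A sequence would then be called positive when `scm_score` exceeds a threshold chosen on training data. Because each dipeptide carries one readable score, the table of scores itself can be inspected, which is the interpretability argument the abstract makes.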
|
30
|
Basith S, Manavalan B, Shin TH, Lee G. SDM6A: A Web-Based Integrative Machine-Learning Framework for Predicting 6mA Sites in the Rice Genome. Mol Ther Nucleic Acids 2019; 18:131-141. [PMID: 31542696 PMCID: PMC6796762 DOI: 10.1016/j.omtn.2019.08.011] [Citation(s) in RCA: 110] [Impact Index Per Article: 18.3] [Received: 06/12/2019] [Revised: 07/30/2019] [Accepted: 08/08/2019] [Indexed: 12/19/2022]
Abstract
DNA N6-adenine methylation (6mA) is an epigenetic modification in prokaryotes and eukaryotes. Identifying 6mA sites in the rice genome is important in rice epigenetics and breeding, but the non-random distribution and biological functions of these sites remain unclear. Several machine-learning tools can identify 6mA sites but show limited prediction accuracy, which limits their usability in epigenetic research. Here, we developed a novel computational predictor, called the Sequence-based DNA N6-methyladenine predictor (SDM6A), which is a two-layer ensemble approach for identifying 6mA sites in the rice genome. Unlike existing methods, which are based on single models with basic features, SDM6A explores various features, and five encoding methods were identified as appropriate for this problem. Subsequently, an optimal feature set was identified from the encodings, and corresponding models were developed individually using support vector machine and extremely randomized tree. First, all five single models were integrated via an ensemble approach to define the class for each classifier. Second, the two classifiers were integrated to generate a final prediction. SDM6A achieved robust performance on cross-validation and independent evaluation, with average accuracy and Matthews correlation coefficient (MCC) of 88.2% and 0.764, respectively. Corresponding metrics were 4.7%-11.0% and 2.3%-5.5% higher than those of existing methods, respectively. A user-friendly, publicly accessible web server (http://thegleelab.org/SDM6A) was implemented to predict novel putative 6mA sites in the rice genome.
Affiliation(s)
- Shaherin Basith
- Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea
- Tae Hwan Shin
- Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea
- Gwang Lee
- Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea
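The abstract above reports performance as accuracy and Matthews correlation coefficient (MCC). As a reminder of how the two metrics relate to a binary confusion matrix, here is a minimal sketch using their standard definitions; the function names are illustrative and no counts from the paper are used.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Returns 0.0 when the denominator is zero, a common convention for
    degenerate confusion matrices (for example, only one predicted class).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

Unlike accuracy, MCC uses all four cells of the confusion matrix and ranges from -1 to 1, so it stays informative when the two classes are imbalanced.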
|
31
|
Mairinger F, Bankfalvi A, Schmid KW, Mairinger E, Mach P, Walter RF, Borchert S, Kasimir-Bauer S, Kimmig R, Buderath P. Digital Immune-Related Gene Expression Signatures In High-Grade Serous Ovarian Carcinoma: Developing Prediction Models For Platinum Response. Cancer Manag Res 2019; 11:9571-9583. [PMID: 31814759 PMCID: PMC6858803 DOI: 10.2147/cmar.s219872] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 06/18/2019] [Accepted: 09/28/2019] [Indexed: 12/18/2022]
Abstract
Purpose: Response to platinum-based therapy is a major prognostic factor in high-grade serous ovarian cancer (HGSOC). While the exact mechanisms of platinum resistance remain unclear, evidence is accumulating for a connection between the organism’s immune response and response to platinum. However, predictive tools are missing. This study was performed to examine the putative role of the genetic tumor immune microenvironment in mediating differential chemotherapy response in HGSOC patients. Patients and methods: Expression profiling of 770 immune-related genes was performed in tumor tissues from 23 HGSOC cases. Tumors were screened for prognostic and predictive biomarkers using the NanoString nCounter platform for digital gene expression analysis with the accompanying PanCancer Immune Profiling panel. As a validation cohort, gene expression data (RNA-Seq) of 303 patients with epithelial ovarian carcinoma (EOC) were retrieved from The Cancer Genome Atlas (TCGA) database. Different scoring systems were computed for prediction of risk of resistance to cisplatin, disease-free survival (DFS) and overall survival (OS). Results: Validated on the TCGA dataset, the developed scores identified 11 significantly differentially expressed genes (p < 0.01) associated with platinum response. HSD11B1 was highly significantly associated with lower risk of recurrence, and 7 targets were found to influence OS time highly significantly (p < 0.01). Conclusion: Our results suggest that response to platinum-based therapy and DFS in HGSOC are associated with distinct gene-expression patterns related to the tumor immune system. We generated predictive scoring systems which proved valid when applied to a set of 303 EOC patients.
Affiliation(s)
- Fabian Mairinger
- Institute for Pathology, University Hospital Essen, Essen, Germany
- Agnes Bankfalvi
- Institute for Pathology, University Hospital Essen, Essen, Germany
- Elena Mairinger
- Ruhrlandklinik, West German Lung Centre, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
- Pawel Mach
- Department of Gynecology and Obstetrics, University Hospital Essen, Essen, Germany
- Robert Fh Walter
- Ruhrlandklinik, West German Lung Centre, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
- Sabrina Borchert
- Institute for Pathology, University Hospital Essen, Essen, Germany
- Sabine Kasimir-Bauer
- Department of Gynecology and Obstetrics, University Hospital Essen, Essen, Germany
- Rainer Kimmig
- Department of Gynecology and Obstetrics, University Hospital Essen, Essen, Germany
- Paul Buderath
- Department of Gynecology and Obstetrics, University Hospital Essen, Essen, Germany
|
32
|
Le NQK. Fertility-GRU: Identifying Fertility-Related Proteins by Incorporating Deep-Gated Recurrent Units and Original Position-Specific Scoring Matrix Profiles. J Proteome Res 2019. [DOI: 10.1021/acs.jproteome.9b00411] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/13/2022]
Affiliation(s)
- Nguyen Quoc Khanh Le
- Medical Humanities Research Cluster, School of Humanities, Nanyang Technological University, 48 Nanyang Ave, Singapore 639798
|
33
|
Deaton J, Yu FB, Quake SR. Mini-Metagenomics and Nucleotide Composition Aid the Identification and Host Association of Novel Bacteriophage Sequences. Adv Biosyst 2019; 3:e1900108. [PMID: 32648690 DOI: 10.1002/adbi.201900108] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 05/12/2019] [Revised: 07/10/2019] [Indexed: 11/07/2022]
Abstract
A broad spectrum of metagenomic and single-cell sequencing techniques has become popular for dissecting environmental microbial diversity, leading to the characterization of thousands of novel microbial lineages. In addition to recovering bacterial and archaeal genomes, metagenomic assembly can also produce genomes of viruses that infect microbial cells. Because of their diversity, lack of marker genes, and small genome size, identifying novel bacteriophage sequences from metagenomic data is often challenging, especially when the objective is to establish phage-host relationships. The present work describes a computational approach that uses supervised learning to classify metagenomic contigs as phage or non-phage and to assign phage taxonomy based on tetranucleotide frequencies. Furthermore, the method assigns phage-host relationships using co-occurrence statistics derived from a recently developed mini-metagenomic experimental technique. This work evaluates method performance at identifying viral contigs and predicting taxonomic classification using publicly available references. Then, using two mini-metagenomic datasets, over 100 novel phage contigs from hot spring samples of Yellowstone National Park are identified and assigned to putative microbial hosts. Results of this work demonstrate the value of combining viral sequence identification with mini-metagenomic experimental methods to understand the microbial ecosystem.
Affiliation(s)
- Jonathan Deaton
- Department of Bioengineering, Stanford University, 443 Via Ortega, Stanford, CA, 94305, USA
- Feiqiao Brian Yu
- Department of Bioengineering, Stanford University, 443 Via Ortega, Stanford, CA, 94305, USA; Chan Zuckerberg Biohub, 499 Illinois St, San Francisco, CA, 94158, USA
- Stephen R Quake
- Department of Bioengineering, Stanford University, 443 Via Ortega, Stanford, CA, 94305, USA; Chan Zuckerberg Biohub, 499 Illinois St, San Francisco, CA, 94158, USA
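The tetranucleotide-frequency features mentioned in the abstract above can be computed with a few lines of code. The sketch below counts overlapping 4-mers over the alphabet A, C, G, T and normalises them to frequencies; skipping windows that contain ambiguity codes, and not merging reverse complements, are assumptions of this sketch rather than details given in the abstract.

```python
from itertools import product

# All 256 tetranucleotides in a fixed order so vectors are comparable.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_frequencies(contig):
    """Return a 256-dimensional vector of overlapping 4-mer frequencies."""
    seq = contig.upper()
    counts = dict.fromkeys(TETRAMERS, 0)
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:  # windows with N or other ambiguity codes are skipped
            counts[kmer] += 1
    total = sum(counts.values())
    if total == 0:
        # Contig shorter than 4 bases, or no unambiguous window.
        return [0.0] * len(TETRAMERS)
    return [counts[k] / total for k in TETRAMERS]
```

Each contig then becomes a fixed-length numeric vector regardless of its length, which is what allows a supervised classifier to be trained on contigs of very different sizes.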
|
34
|
Le NQK. Fertility-GRU: Identifying Fertility-Related Proteins by Incorporating Deep-Gated Recurrent Units and Original Position-Specific Scoring Matrix Profiles. J Proteome Res 2019; 18:3503-3511. [DOI: 10.1021/acs.jproteome.9b00411] [Citation(s) in RCA: 39] [Impact Index Per Article: 6.5] [Indexed: 12/20/2022]
Affiliation(s)
- Nguyen Quoc Khanh Le
- Medical Humanities Research Cluster, School of Humanities, Nanyang Technological University, 48 Nanyang Ave, Singapore 639798
|
35
|
AtbPpred: A Robust Sequence-Based Prediction of Anti-Tubercular Peptides Using Extremely Randomized Trees. Comput Struct Biotechnol J 2019; 17:972-981. [PMID: 31372196 PMCID: PMC6658830 DOI: 10.1016/j.csbj.2019.06.024] [Citation(s) in RCA: 72] [Impact Index Per Article: 12.0] [Received: 02/21/2019] [Revised: 06/27/2019] [Accepted: 06/28/2019] [Indexed: 01/01/2023]
Abstract
Mycobacterium tuberculosis is one of the most dangerous pathogens in humans. It acts as an etiological agent of tuberculosis (TB), infecting almost one-third of the world's population. Owing to the high incidence of multidrug-resistant TB and extensively drug-resistant TB, there is an urgent need for novel and effective alternative therapies. Peptide-based therapy has several advantages, such as diverse mechanisms of action, low immunogenicity, and selective affinity to bacterial cell envelopes. However, the identification of anti-tubercular peptides (AtbPs) via experimentation is laborious and expensive; hence, the development of an efficient computational method is necessary for the prediction of AtbPs prior to both in vitro and in vivo experiments. To this end, we developed a two-layer machine learning (ML)-based predictor called AtbPpred for the identification of AtbPs. In the first layer, we applied a two-step feature selection procedure and identified the optimal feature set individually for nine different feature encodings, whose corresponding models were developed using extremely randomized tree (ERT). In the second layer, the predicted probabilities of AtbPs from these nine models were used as input features to ERT to develop the final predictor. AtbPpred achieved average accuracies of 88.3% and 87.3% during cross-validation and independent evaluation, respectively, which were ~8.7% and 10.0% higher than those of the state-of-the-art method. Furthermore, we established a user-friendly web server, which is currently available at http://thegleelab.org/AtbPpred. We anticipate that this predictor could be useful in the high-throughput prediction of AtbPs and also provide mechanistic insights into their functions.
We developed a novel computational framework for the identification of anti-tubercular peptides using extremely randomized tree. AtbPpred displayed superior performance compared to the existing method on both benchmark and independent datasets. We constructed a user-friendly web server that implements the proposed AtbPpred method.
|
36
|
Wei HH, Yang W, Tang H, Lin H. The Development of Machine Learning Methods in Cell-Penetrating Peptides Identification: A Brief Review. Curr Drug Metab 2019; 20:217-223. [DOI: 10.2174/1389200219666181010114750] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 01/19/2018] [Revised: 05/21/2018] [Accepted: 08/02/2018] [Indexed: 11/22/2022]
Abstract
Background: Cell-penetrating peptides (CPPs) are important short peptides that facilitate cellular intake or uptake of various molecules. CPPs can transport drug molecules through the plasma membrane and send these molecules to different cellular organelles. Thus, CPP identification and related mechanisms have been extensively explored. In order to reveal the penetration mechanisms of a large number of CPPs, it is necessary to develop convenient and fast methods for CPP identification. Methods: Biochemical experiments can provide precise details for accurately identifying CPPs, but these methods are expensive and laborious. To overcome these disadvantages, several computational methods have been developed to identify CPPs. We reviewed the development of machine learning methods in CPP identification. This review provides an insight into CPP identification. Results: We summarized the machine learning-based CPP identification methods and compared the construction strategies of 11 different computational methods. Furthermore, we pointed out the limitations and difficulties in predicting CPPs. Conclusion: In this review, the latest studies on CPP identification using machine learning methods were reported. We also discussed the future development direction of CPP recognition with computational methods.
Affiliation(s)
- Huan-Huan Wei
- Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Wuritu Yang
- Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Hua Tang
- Department of Pathophysiology, Southwest Medical University, Luzhou, China
- Hao Lin
- Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
|
37
|
mACPpred: A Support Vector Machine-Based Meta-Predictor for Identification of Anticancer Peptides. Int J Mol Sci 2019; 20:1964. [PMID: 31013619 PMCID: PMC6514805 DOI: 10.3390/ijms20081964] [Citation(s) in RCA: 137] [Impact Index Per Article: 22.8] [Received: 03/15/2019] [Revised: 04/08/2019] [Accepted: 04/18/2019] [Indexed: 12/24/2022]
Abstract
Anticancer peptides (ACPs) are promising therapeutic agents for targeting and killing cancer cells. The accurate prediction of ACPs from given peptide sequences remains an open problem in the field of immunoinformatics. Recently, machine learning algorithms have emerged as a promising tool for helping experimental scientists predict ACPs. However, the performance of existing methods still needs to be improved. In this study, we present a novel approach for the accurate prediction of ACPs, which involves the following two steps: (i) We applied a two-step feature selection protocol to seven feature encodings that cover various aspects of sequence information (composition-based, physicochemical properties and profiles) and obtained their corresponding optimal feature-based models. The resultant predicted probabilities of ACPs were further utilized as feature vectors. (ii) The predicted probability feature vectors were in turn used as input to a support vector machine to develop the final prediction model, called mACPpred. Cross-validation analysis showed that the proposed predictor performs significantly better than individual feature encodings. Furthermore, mACPpred significantly outperformed the existing methods compared in this study when objectively evaluated on an independent dataset.
|
38
|
Wang X, Li H, Gao P, Liu Y, Zeng W. Combining Support Vector Machine with Dual g-gap Dipeptides to Discriminate between Acidic and Alkaline Enzymes. Lett Org Chem 2019. [DOI: 10.2174/1570178615666180925125912] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 11/22/2022]
Abstract
The catalytic activity of an enzyme differs from that of an inorganic catalyst. In a high-temperature, overly acidic, or overly alkaline environment, the structure of the enzyme is destroyed and the enzyme loses its activity. Although biochemical experiments can measure the optimal pH environment of an enzyme, these methods are inefficient and costly. To solve these problems, a computational model can be established to determine the optimal acidic or alkaline environment of the enzyme. In this paper, we first introduced a new feature called dual g-gap dipeptide composition to formulate enzyme samples. Subsequently, the best features were selected by using the F value calculated from analysis of variance. Finally, a support vector machine was utilized to build a prediction model for distinguishing acidic from alkaline enzymes. An overall accuracy of 95.9% was achieved with jackknife cross-validation, which indicates that our method is effective and efficient for acidic and alkaline enzyme prediction. The feature proposed in this paper could also be applied in other fields of bioinformatics.
Affiliation(s)
- Xianfang Wang
- School of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China
- Hongfei Li
- School of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China
- Peng Gao
- School of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China
- Yifeng Liu
- School of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China
- Wenjing Zeng
- TianJiabing Middle School of Chengdu, Chengdu 610011, China
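The feature-selection step described in the abstract above ranks features by the F value from analysis of variance. A minimal sketch of that ranking is given below, using the standard one-way ANOVA F statistic (between-class mean square divided by within-class mean square). The function names and the toy layout of `samples` and `labels` are assumptions of this sketch, and the SVM training that would follow is not shown.

```python
def anova_f_value(groups):
    """One-way ANOVA F statistic for one feature.

    `groups` holds the feature's values split by class, e.g.
    [[values in acidic enzymes], [values in alkaline enzymes]].
    Assumes at least two classes and more samples than classes.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        # No within-class spread: perfectly separating or constant feature.
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_features(samples, labels):
    """Return feature indices sorted by decreasing F value."""
    classes = sorted(set(labels))
    n_features = len(samples[0])
    scores = []
    for j in range(n_features):
        groups = [[s[j] for s, y in zip(samples, labels) if y == c] for c in classes]
        scores.append(anova_f_value(groups))
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)
```

A typical use is to keep the top-ranked features, train the classifier on them, and increase the number of retained features step by step until cross-validated accuracy stops improving.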
|
39
|
An Automatic HEp-2 Specimen Analysis System Based on an Active Contours Model and an SVM Classification. Appl Sci (Basel) 2019. [DOI: 10.3390/app9020307] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 01/18/2023]
Abstract
The antinuclear antibody (ANA) test is widely used for screening, diagnosing, and monitoring autoimmune diseases. The most common method for detecting ANA is indirect immunofluorescence (IIF), performed using human epithelial type 2 (HEp-2) cells as the substrate antigen. The evaluation of ANA consists of an analysis of fluorescence intensity and staining patterns. This paper presents a complete and fully automatic system able to characterize IIF images. The fluorescence intensity classification was obtained by performing an image preprocessing phase and implementing a Support Vector Machine (SVM) classifier. The cell identification problem has been addressed by developing a flexible segmentation method based on the Hough transform for ellipses and on an active contours model. In order to classify the HEp-2 cells, six SVM classifiers and one k-nearest neighbors (KNN) classifier were developed. The system was tested on a public database consisting of 2080 IIF images. Unlike almost all work presented on this topic, the proposed system automatically addresses all phases of the HEp-2 image analysis process. All results have been evaluated by comparing them with some of the most representative state-of-the-art work, demonstrating the effectiveness of the system in the characterization of HEp-2 images.
|