1
Li Q, Zhou W, Wang D, Wang S, Li Q. Prediction of Anticancer Peptides Using a Low-Dimensional Feature Model. Front Bioeng Biotechnol 2020; 8:892. [PMID: 32903381; PMCID: PMC7434836; DOI: 10.3389/fbioe.2020.00892] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 05/25/2020] [Accepted: 07/10/2020] [Indexed: 01/09/2023] [Open Access]
Abstract
Cancer remains a severe health problem worldwide. Cancer therapy traditionally relies on radiotherapy or anticancer drugs to kill cancer cells, but these methods are expensive and have side effects that can cause great harm to patients. With the discovery of anticancer peptides (ACPs), significant progress has been made in tumor therapy, so accurately identifying anticancer peptides is valuable. Although biochemical experiments can accomplish this task, they are expensive and time-consuming. To promote the application of anticancer peptides in cancer therapy, machine learning can be used to recognize them from feature vectors extracted from their sequences. In practice, however, machine learning models trained on high-dimensional features often perform poorly. To address this problem, this paper puts forward a 19-dimensional feature model based on anticancer peptide sequences, which has lower dimensionality and better performance than some existing methods. In addition, this paper isolates a model with a low number of dimensions and acceptable performance. The few features identified in this study may represent the important features of anticancer peptides.
Affiliation(s)
- Qingwen Li
  - College of Animal Science and Technology, Northeast Agricultural University, Harbin, China
- Wenyang Zhou
  - Center for Bioinformatics, School of Life Sciences and Technology, Harbin Institute of Technology, Harbin, China
- Donghua Wang
  - Department of General Surgery, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, China
- Sui Wang
  - Key Laboratory of Soybean Biology in Chinese Ministry of Education, Northeast Agricultural University, Harbin, China
  - State Key Laboratory of Tree Genetics and Breeding, Northeast Forestry University, Harbin, China
- Qingyuan Li
  - Forestry and Fruit Tree Research Institute, Wuhan Academy of Agricultural Sciences, Wuhan, China
2
Wang C, Zhang Y, Han S. Its2vec: Fungal Species Identification Using Sequence Embedding and Random Forest Classification. Biomed Res Int 2020; 2020:2468789. [PMID: 32566672; PMCID: PMC7275950; DOI: 10.1155/2020/2468789] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 03/03/2020] [Revised: 03/20/2020] [Accepted: 03/25/2020] [Indexed: 12/19/2022]
Abstract
Fungi play essential roles in many ecological processes, and taxonomic classification is fundamental for microbial community characterization and vital for the study and preservation of fungal biodiversity. To cope with massive fungal barcode data, tools that can handle extensive volumes of barcode sequences, especially the internal transcribed spacer (ITS) region, are necessary. However, high variation in the ITS region and the computational requirements of processing high-dimensional features remain challenging for existing predictors. In this study, we developed Its2vec, a bioinformatics tool for the classification of fungal ITS barcodes to the species level. An ITS database covering more than 25,000 species in a broad range of fungal taxa was assembled. For dimensionality reduction, a word embedding algorithm was used to represent an ITS sequence as a dense low-dimensional vector. A random forest-based classifier was built for species identification. Benchmarking results showed that our model achieved an accuracy comparable to that of several state-of-the-art predictors and, more importantly, that it could handle large datasets and greatly reduce dimensionality. We expect the Its2vec model to be helpful for fungal species identification and, thus, for revealing microbial community structures and deepening our understanding of their functional mechanisms.
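The idea of representing a sequence as a dense vector through word embeddings can be illustrated with a minimal sketch. It assumes that "words" are overlapping k-mers and that a sequence vector is the average of its k-mer vectors; the abstract does not state the k-mer length, the embedding algorithm's settings, or the pooling rule, so `kmer_tokens`, `sequence_vector`, and the toy embedding table below are hypothetical illustrations rather than the Its2vec implementation.

```python
from typing import Dict, List


def kmer_tokens(seq: str, k: int = 3) -> List[str]:
    """Split a DNA sequence into overlapping k-mers, treated as 'words'."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def sequence_vector(seq: str, embeddings: Dict[str, List[float]], k: int = 3) -> List[float]:
    """Average the vectors of all k-mers that have an embedding (assumed pooling rule)."""
    vectors = [embeddings[t] for t in kmer_tokens(seq, k) if t in embeddings]
    if not vectors:
        raise ValueError("no k-mer of the sequence has an embedding")
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


# Toy 2-dimensional table for illustration only; a real table would be
# learned from a large corpus of ITS sequences.
toy_embeddings = {"ACG": [1.0, 0.0], "CGT": [0.0, 1.0], "GTA": [1.0, 1.0]}

tokens = kmer_tokens("ACGTA")
vector = sequence_vector("ACGTA", toy_embeddings)
print(tokens)
print(vector)
```

The resulting fixed-length vectors could then be passed to any standard classifier, such as a random forest, with one vector per sequence.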
Affiliation(s)
- Chao Wang
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
- Ying Zhang
  - Department of Pharmacy, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin 150088, China
- Shuguang Han
  - Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
3
Wang C, Zhang J, Wang X, Han K, Guo M. Pathogenic Gene Prediction Algorithm Based on Heterogeneous Information Fusion. Front Genet 2020; 11:5. [PMID: 32117433; PMCID: PMC7010852; DOI: 10.3389/fgene.2020.00005] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 11/26/2019] [Accepted: 01/06/2020] [Indexed: 12/23/2022] [Open Access]
Abstract
Complex diseases seriously affect people's physical and mental health, and the discovery of disease-causing genes has become a target of research. With the emergence of bioinformatics and the rapid development of biotechnology, researchers have proposed many gene prioritization algorithms that mine pathogenic genes from large amounts of biological data, overcoming the long experimental periods and high costs inherent in traditional biomedical methods. However, because the currently known gene-disease association matrix is still very sparse and lacks evidence that particular genes and diseases are unrelated, the predictive performance of gene prioritization algorithms is limited. Based on the hypothesis that functionally related gene mutations may lead to similar disease phenotypes, this paper proposes a PU induction matrix completion algorithm based on heterogeneous information fusion (PUIMCHIF) to predict candidate genes involved in the pathogenicity of human diseases. On the one hand, PUIMCHIF uses different compact feature learning methods to extract features of genes and diseases from multiple data sources, compensating for data sparsity. On the other hand, based on the prior knowledge that most of the unknown gene-disease associations are unrelated, we use the PU-Learning strategy to treat the unknown unlabeled data as negative examples for biased learning. The experimental results of the PUIMCHIF algorithm on the three indexes of precision, recall, and mean percentile ranking (MPR) were significantly better than those of other algorithms. In the top 100 global prediction analysis of multiple genes and multiple diseases, the probability of recovering true gene associations using PUIMCHIF reached 50% and the MPR value was 10.94%. The PUIMCHIF algorithm prioritizes genes better than other methods, such as IMC and CATAPULT.
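The mean percentile ranking (MPR) reported above can be illustrated with a short sketch. It assumes a common definition: for each held-out gene-disease pair, the true gene's position in the score-ranked candidate list is converted to a percentile (0% is the top of the list), and the percentiles are averaged, so lower is better. The abstract does not give the exact formula, and the function names and toy scores below are hypothetical.

```python
from typing import Dict, List, Tuple


def percentile_rank(scores: Dict[str, float], gene: str) -> float:
    """Percentile position of `gene` when candidates are sorted by descending score."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    if len(ranked) < 2:
        return 0.0
    return 100.0 * ranked.index(gene) / (len(ranked) - 1)


def mean_percentile_ranking(
    predictions: Dict[str, Dict[str, float]],
    held_out: List[Tuple[str, str]],
) -> float:
    """Average percentile rank over held-out (disease, gene) pairs."""
    ranks = [percentile_rank(predictions[disease], gene) for disease, gene in held_out]
    return sum(ranks) / len(ranks)


# Toy prediction scores, invented for illustration only.
toy_predictions = {
    "disease_1": {"gene_a": 0.9, "gene_b": 0.5, "gene_c": 0.1},
    "disease_2": {"gene_a": 0.2, "gene_b": 0.8, "gene_c": 0.4},
}
toy_held_out = [("disease_1", "gene_a"), ("disease_2", "gene_c")]

print(mean_percentile_ranking(toy_predictions, toy_held_out))
```

In this toy case the first true gene is ranked first (0%) and the second is ranked in the middle of three candidates (50%), so the mean is 25%.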
Affiliation(s)
- Chunyu Wang
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Jie Zhang
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Xueping Wang
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Ke Han
  - School of Computer and Information Engineering, Harbin University of Commerce, Harbin, China
- Maozu Guo
  - School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China
  - Beijing Key Laboratory of Intelligent Processing for Building Big Data, Beijing University of Civil Engineering and Architecture, Beijing, China
4
Huang Q, Zhang J, Wei L, Guo F, Zou Q. 6mA-RicePred: A Method for Identifying DNA N6-Methyladenine Sites in the Rice Genome Based on Feature Fusion. Front Plant Sci 2020; 11:4. [PMID: 32076430; PMCID: PMC7006724; DOI: 10.3389/fpls.2020.00004] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 12/04/2019] [Accepted: 01/06/2020] [Indexed: 06/01/2023]
Abstract
MOTIVATION: The biological function of N6-methyladenine DNA (6mA) in plants is largely unknown. Rice is one of the most important crops worldwide and is a model species for molecular and genetic studies. There are few methods for 6mA site recognition in the rice genome, and an effective computational method is needed. RESULTS: In this paper, we propose a new computational method called 6mA-Pred to identify 6mA sites in the rice genome. 6mA-Pred employs a feature fusion method to combine advantageous features from other methods and thus obtain a new feature to identify 6mA sites. This method achieved an accuracy of 87.27% in the identification of 6mA sites with 10-fold cross-validation and achieved an accuracy of 85.6% in independent test sets.
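Feature fusion and 10-fold cross-validation can be sketched as follows. The sketch assumes the simplest form of fusion, concatenating per-sample vectors from several encodings, and a plain interleaved fold assignment; the abstract does not describe which encodings are fused or how folds were drawn, so `fuse_features`, `k_fold_indices`, and the toy vectors are hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple


def fuse_features(*feature_sets: Sequence[Sequence[float]]) -> List[List[float]]:
    """Concatenate, sample by sample, the vectors produced by several encodings."""
    fused = []
    for vectors in zip(*feature_sets):
        row: List[float] = []
        for vector in vectors:
            row.extend(vector)
        fused.append(row)
    return fused


def k_fold_indices(n_samples: int, k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; each sample is tested exactly once."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Two toy encodings of the same two samples, invented for illustration.
encoding_a = [[1.0, 2.0], [3.0, 4.0]]
encoding_b = [[5.0], [6.0]]
print(fuse_features(encoding_a, encoding_b))

for train, test in k_fold_indices(20, k=10):
    print(len(train), test)
```

A cross-validated accuracy would then be the average of the per-fold accuracies obtained by training on each `train` list and evaluating on the matching `test` list.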
Affiliation(s)
- Qianfei Huang
  - College of Intelligence and Computing, Tianjin University, Tianjin, China
- Jun Zhang
  - Rehabilitation Department, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, China
- Leyi Wei
  - College of Intelligence and Computing, Tianjin University, Tianjin, China
- Fei Guo
  - College of Intelligence and Computing, Tianjin University, Tianjin, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
5
Ouyang J, Sun Y, Li W, Zhang W, Wang D, Liu X, Lin Y, Lian B, Xie L. dbPHCC: a database of prognostic biomarkers for hepatocellular carcinoma that provides online prognostic modeling. Biochim Biophys Acta Gen Subj 2016; 1860:2688-95. [PMID: 26940364; DOI: 10.1016/j.bbagen.2016.02.017] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 12/01/2015] [Revised: 01/27/2016] [Accepted: 02/26/2016] [Indexed: 12/12/2022]
Abstract
BACKGROUND: Hepatocellular carcinoma (HCC) is one of the most common malignant cancers and has a poor prognosis. Over the decades, more and more biomarkers have been found to affect HCC prognosis, but these studies were scattered and used no unified identifiers. Therefore, we built the database of prognostic biomarkers and models for hepatocellular carcinoma (dbPHCC). METHODS: dbPHCC focuses on biomarkers that were related to HCC prognosis by traditional experiments rather than high-throughput technology. All of the prognostic biomarkers were manually selected from literature published in PubMed from 2002 to 2014. dbPHCC collects comprehensive information on candidate biomarkers and HCC prognosis. RESULTS: dbPHCC mainly contains 567 biomarkers: 323 proteins, 154 genes, and 90 microRNAs. For each biomarker, the reference information, experimental conditions, and prognostic information are shown. Based on two available patient cohort data sets, an exemplified prognostic model was constructed using 15 phosphotransferases in dbPHCC. The web interface not only provides a full range of browsing and searching functions but also provides online analysis tools. dbPHCC is available at http://lifecenter.sgst.cn/dbphcc/. CONCLUSIONS: dbPHCC provides a comprehensive and convenient search and analysis platform for HCC prognosis research. GENERAL SIGNIFICANCE: dbPHCC is the first database to focus on experimentally verified individual biomarkers that are related to HCC prognosis. Prognostic markers in dbPHCC have the potential to be therapeutic drug targets and may help in designing new treatments to improve the survival of HCC patients. This article is part of a Special Issue entitled "System Genetics", Guest Editors: Dr. Yudong Cai and Dr. Tao Huang.
Affiliation(s)
- Jian Ouyang
  - Biomedical Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
- Ying Sun
  - Biomedical Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
- Wei Li
  - Shanghai Center for Bioinformation Technology, Shanghai Academy of Science and Technology, Shanghai 201203, China
- Wen Zhang
  - Department of Cardiothoracic Surgery, The First Affiliated Hospital of People's Liberation Army General Hospital, Beijing 100048, China
- Dandan Wang
  - Biomedical Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
- Xiangqiong Liu
  - Biomedical Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
- Yong Lin
  - Biomedical Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
- Baofeng Lian
  - Shanghai Center for Bioinformation Technology, Shanghai Academy of Science and Technology, Shanghai 201203, China
  - Shanghai Jiaotong University Affiliated First People's Hospital, Shanghai 200240, China
- Lu Xie
  - Shanghai Center for Bioinformation Technology, Shanghai Academy of Science and Technology, Shanghai 201203, China
6
Weiss A, Berndsen RH, Ding X, Ho CM, Dyson PJ, van den Bergh H, Griffioen AW, Nowak-Sliwinska P. A streamlined search technology for identification of synergistic drug combinations. Sci Rep 2015; 5:14508. [PMID: 26416286; PMCID: PMC4586442; DOI: 10.1038/srep14508] [Citation(s) in RCA: 60] [Impact Index Per Article: 6.0] [Received: 04/02/2015] [Accepted: 08/27/2015] [Indexed: 01/08/2023] [Open Access]
Abstract
A major key to improving cancer therapy is the combination of drugs. Mixing drugs that already exist on the market may offer an attractive alternative. Here we report on a new model-based streamlined feedback system control (s-FSC) method, based on a design of experiment approach, for rapidly finding optimal drug mixtures with minimal experimental effort. We tested combinations in an in vitro assay for the viability of a renal cell adenocarcinoma (RCC) cell line, 786-O. An iterative cycle of in vitro testing and s-FSC analysis was repeated a few times until an optimal low-dose combination was reached. Starting with ten drugs that target parallel pathways known to play a role in the development and progression of RCC, we identified the best overall drug combination, a mixture of four drugs (axitinib, erlotinib, dasatinib and AZD4547) at low doses, which inhibited cell viability by 90%. Removing AZD4547 from the optimized drug combination resulted in 80% inhibition of cell viability while still maintaining the synergistic interaction. These optimized drug combinations were significantly more potent than monotherapies of all individual drugs (p < 0.001, CI < 0.3).
Affiliation(s)
- Andrea Weiss
  - Institute of Chemical Sciences and Engineering, Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland
  - Angiogenesis Laboratory, Department of Medical Oncology, VU University Medical Center, Amsterdam, The Netherlands
- Robert H Berndsen
  - Institute of Chemical Sciences and Engineering, Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland
- Xianting Ding
  - Med-X Research Institute, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
- Chih-Ming Ho
  - Department of Mechanical and Aerospace Engineering, University of California, Los Angeles, USA
- Paul J Dyson
  - Institute of Chemical Sciences and Engineering, Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland
- Hubert van den Bergh
  - Institute of Chemical Sciences and Engineering, Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland
- Arjan W Griffioen
  - Angiogenesis Laboratory, Department of Medical Oncology, VU University Medical Center, Amsterdam, The Netherlands
- Patrycja Nowak-Sliwinska
  - Institute of Chemical Sciences and Engineering, Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland
  - Angiogenesis Laboratory, Department of Medical Oncology, VU University Medical Center, Amsterdam, The Netherlands