1
Classical and Deep Learning Paradigms for Detection and Validation of Key Genes of Risky Outcomes of HCV. Algorithms 2020. [DOI: 10.3390/a13030073] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 12/18/2022]
Abstract
Hepatitis C virus (HCV) is one of the most dangerous viruses worldwide. It is the foremost cause of hepatic cirrhosis and hepatocellular carcinoma (HCC). Detecting new key genes that play a role in the growth of HCC in HCV patients using machine learning techniques paves the way for producing accurate antivirals. This work has two phases: detecting the up/downregulated genes using classical univariate and multivariate feature selection methods, and validating the retrieved list of genes using in silico classifiers. However, classification algorithms in the medical domain frequently suffer from a shortage of training cases. Therefore, a deep neural network approach is proposed here to validate the significance of the retrieved genes in classifying the HCV-infected samples from the disinfected ones. The validation model is based on the artificial generation of new examples from the retrieved genes' expressions using sparse autoencoders. Subsequently, the generated gene expression data are used to train conventional classifiers. In the first phase, Principal Component Analysis (PCA), a multivariate approach, yielded a better retrieval of significant genes: the list of genes retrieved using PCA contained a higher number of HCC biomarkers than the lists retrieved by the univariate methods. In the second phase, the classification accuracy can reveal the relevance of the extracted key genes in classifying the HCV-infected and disinfected samples.
2
Belciug S. Logistic regression paradigm for training a single-hidden layer feedforward neural network. Application to gene expression datasets for cancer research. J Biomed Inform 2019; 102:103373. [PMID: 31901506 DOI: 10.1016/j.jbi.2019.103373] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 08/29/2019] [Revised: 12/27/2019] [Accepted: 12/30/2019] [Indexed: 02/06/2023]
Abstract
OBJECTIVE The speed of the diagnosis process is vital in pursuing the trial of curing cancer. During the last decade, precision medicine evolved by detecting different types of cancer through microarrays (MA) of deoxyribonucleic acid (DNA) processed by machine learning (ML) algorithms. Personalized diagnosis, followed by personalized treatment, should imply personalized hyperparameters of the ML. The goal of this paper is to propose a novel adaptive ML method that embeds knowledge into the architecture of the algorithm and also filters the features in order to reduce their number, increase computational speed, and decrease computational cost and time. MATERIALS AND METHODS fLogSLFN is a novel two-fold theoretically effective ML that can be used in two-class decision problems that embeds the logistic regression in such a manner that the hidden nodes of a single-hidden layer feedforward neural network (SLFN) are problem dependent. A filtering module based on the significance of each attribute is embedded in order to avoid the 'curse of dimensionality' phenomenon. The proposed model has been tested on three publicly available high-dimensional cancer datasets that contain gene expressions provided by complementary DNA (cDNA) array, and DNA microarray. The proposed novel method filtered logistic SLFN (fLogSLFN) has been also compared and statistically benchmarked to four ML algorithms: extreme learning machine (ELM), radial basis function network (RBF), single-hidden layer feedforward neural network trained by the backpropagation algorithm (BPNN), logistic regression with the LASSO penalty, and the adaptive single-hidden layer feedforward network (aSLFN). MAIN FINDINGS The experimental results showed that the fLogSLFN is competitive to the other state-of-the-art models, obtaining accuracies between 64.70% and 98.66% depending on the dataset it had been applied on. 
CONCLUSIONS In contrast to other state-of-the-art ML algorithms, the fLogSLFN is capable of embedding the knowledge extracted from the data into its architecture, making it problem dependent. The filtering module increases its computational speed while decreasing computational cost and time. The statistical analysis revealed that filtering the features preserves performance, making the algorithm more efficient.
Affiliation(s)
- Smaranda Belciug
- Department of Computer Science, Faculty of Sciences, University of Craiova, Craiova 200585, Romania.
3
Yarbakht M, Nikkhah M, Moshaii A, Weber K, Matthäus C, Cialla-May D, Popp J. Simultaneous isolation and detection of single breast cancer cells using surface-enhanced Raman spectroscopy. Talanta 2018; 186:44-52. [PMID: 29784385 DOI: 10.1016/j.talanta.2018.04.009] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 01/08/2018] [Revised: 04/02/2018] [Accepted: 04/03/2018] [Indexed: 02/07/2023]
Abstract
Cancer is one of the most dangerous and deadly diseases worldwide. Cancer that is diagnosed at an early stage is more likely to be treated successfully. Treatment of advanced cancer is very difficult, and survival rates are generally much lower. Therefore, much research has focused on developing non-invasive methods for detecting cancer and monitoring its progress. In this contribution, we present a novel strategy for selective isolation and detection of breast cancer cell lines (MCF-7 and BT-20) based on surface-enhanced Raman scattering (SERS). A simplified protocol based on cell-aptamer interaction has been developed in which core-shell (Au@Fe3O4) nanoparticles (CSNs) were functionalized with a mucin 1 (MUC1) specific aptamer (Apt1) to capture cells through the interaction between Apt1 and the overexpressed protein (MUC1) on the surface of the tumor cells. Meanwhile, a SERS nano-tag, synthesized by conjugating Apt1 to the surface of BSA-coated gold nanoparticles functionalized with 4-mercaptopyridine (4-Mpy), was used to detect the isolated cells. In conclusion, the proposed strategy can be extended to isolate and detect cells more precisely by simultaneously detecting different kinds of biomarkers on the surface of cancer cells.
Affiliation(s)
- Melina Yarbakht
- Department of Nanobiotechnology, Tarbiat Modares University, P.O. Box 14115-175, Tehran, Iran
- Maryam Nikkhah
- Department of Nanobiotechnology, Tarbiat Modares University, P.O. Box 14115-175, Tehran, Iran
- Ahmad Moshaii
- Department of Physics, Tarbiat Modares University, P.O. Box 14115-175, Tehran, Iran
- Karina Weber
- Leibniz Institute of Photonic Technology (IPHT), Albert-Einstein-Str. 9, 07745 Jena, Germany; Friedrich-Schiller University, Institute of Physical Chemistry and Abbe Center of Photonics, Helmholtzweg 4, Jena 07743, Germany
- Christian Matthäus
- Leibniz Institute of Photonic Technology (IPHT), Albert-Einstein-Str. 9, 07745 Jena, Germany; Friedrich-Schiller University, Institute of Physical Chemistry and Abbe Center of Photonics, Helmholtzweg 4, Jena 07743, Germany
- Dana Cialla-May
- Leibniz Institute of Photonic Technology (IPHT), Albert-Einstein-Str. 9, 07745 Jena, Germany; Friedrich-Schiller University, Institute of Physical Chemistry and Abbe Center of Photonics, Helmholtzweg 4, Jena 07743, Germany
- Jürgen Popp
- Leibniz Institute of Photonic Technology (IPHT), Albert-Einstein-Str. 9, 07745 Jena, Germany; Friedrich-Schiller University, Institute of Physical Chemistry and Abbe Center of Photonics, Helmholtzweg 4, Jena 07743, Germany
4
Feature Genes Selection Using Supervised Locally Linear Embedding and Correlation Coefficient for Microarray Classification. Computational and Mathematical Methods in Medicine 2018; 2018:5490513. [PMID: 29666661 PMCID: PMC5831962 DOI: 10.1155/2018/5490513] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 09/27/2017] [Revised: 12/17/2017] [Accepted: 12/21/2017] [Indexed: 11/17/2022]
Abstract
The selection of feature genes with high recognition ability from the gene expression profiles has gained great significance in biology. However, most of the existing methods have a high time complexity and poor classification performance. Motivated by this, an effective feature selection method, called supervised locally linear embedding and Spearman's rank correlation coefficient (SLLE-SC2), is proposed which is based on the concept of locally linear embedding and correlation coefficient algorithms. Supervised locally linear embedding takes into account class label information and improves the classification performance. Furthermore, Spearman's rank correlation coefficient is used to remove the coexpression genes. The experiment results obtained on four public tumor microarray datasets illustrate that our method is valid and feasible.
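The redundancy-removal step mentioned in this abstract can be illustrated with a small sketch. It is not the authors' SLLE-SC2 implementation: the greedy filter, the `threshold` value, the function names, and the toy expression values are all assumptions made for illustration. Only the use of Spearman's rank correlation to drop co-expressed genes comes from the abstract.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def drop_coexpressed(expression, ranked_genes, threshold=0.9):
    """Greedy filter (hypothetical): keep a gene only if |rho| with every kept gene is below threshold."""
    kept = []
    for gene in ranked_genes:
        if all(abs(spearman(expression[gene], expression[k])) < threshold for k in kept):
            kept.append(gene)
    return kept


# Toy expression profiles (made-up values), listed in an assumed relevance order.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.7],  # rises monotonically with geneA
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],
}
selected = drop_coexpressed(expression, ["geneA", "geneB", "geneC"])
```

Here `geneB` is dropped because its ranks match those of `geneA` exactly, while `geneC` is only weakly correlated with `geneA` and is kept.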
5
Moteghaed NY, Maghooli K, Garshasbi M. Improving Classification of Cancer and Mining Biomarkers from Gene Expression Profiles Using Hybrid Optimization Algorithms and Fuzzy Support Vector Machine. Journal of Medical Signals and Sensors 2018; 8. [PMID: 29535919 PMCID: PMC5840891] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/01/2022]
Abstract
BACKGROUND Gene expression data are characteristically high dimensional with a small sample size in contrast to the feature size and variability inherent in biological processes that contribute to difficulties in analysis. Selection of highly discriminative features decreases the computational cost and complexity of the classifier and improves its reliability for prediction of a new class of samples. METHODS The present study used hybrid particle swarm optimization and genetic algorithms for gene selection and a fuzzy support vector machine (SVM) as the classifier. Fuzzy logic is used to infer the importance of each sample in the training phase and decrease the outlier sensitivity of the system to increase the ability to generalize the classifier. A decision-tree algorithm was applied to the most frequent genes to develop a set of rules for each type of cancer. This improved the abilities of the algorithm by finding the best parameters for the classifier during the training phase without the need for trial-and-error by the user. The proposed approach was tested on four benchmark gene expression profiles. RESULTS Good results have been demonstrated for the proposed algorithm. The classification accuracy for leukemia data is 100%, for colon cancer is 96.67% and for breast cancer is 98%. The results show that the best kernel used in training the SVM classifier is the radial basis function. CONCLUSIONS The experimental results show that the proposed algorithm can decrease the dimensionality of the dataset, determine the most informative gene subset, and improve classification accuracy using the optimal parameters of the classifier with no user interface.
Affiliation(s)
- Niloofar Yousefi Moteghaed
- Department of Biomedical Engineering, Islamic Azad University, Science and Research Branch, Tehran, Iran
- Keivan Maghooli
- Department of Biomedical Engineering, Islamic Azad University, Science and Research Branch, Tehran, Iran. Address for correspondence: Dr. Keivan Maghooli, Department of Biomedical Engineering, Islamic Azad University, Science and Research Branch, Tehran, Iran.
- Masoud Garshasbi
- Department of Medical Genetics, Faculty of Medical Sciences, Tarbiat Modares University, Tehran, Iran
6
Wang C, Gevertz JL. Finding causative genes from high-dimensional data: an appraisal of statistical and machine learning approaches. Stat Appl Genet Mol Biol 2017; 15:321-47. [PMID: 27226102 DOI: 10.1515/sagmb-2015-0072] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/15/2022]
Abstract
Modern biological experiments often involve high-dimensional data with thousands or more variables. A challenging problem is to identify the key variables that are related to a specific disease. Confounding this task is the vast number of statistical methods available for variable selection. For this reason, we set out to develop a framework to investigate the variable selection capability of statistical methods that are commonly applied to analyze high-dimensional biological datasets. Specifically, we designed six simulated cancers (based on benchmark colon and prostate cancer data) where we know precisely which genes cause a dataset to be classified as cancerous or normal - we call these causative genes. We found that not one statistical method tested could identify all the causative genes for all of the simulated cancers, even though increasing the sample size does improve the variable selection capabilities in most cases. Furthermore, certain statistical tools can classify our simulated data with a low error rate, yet the variables being used for classification are not necessarily the causative genes.
7
Shi SH, Zhang W, Jiang J, Sun L. Identification of altered pathways in breast cancer based on individualized pathway aberrance score. Oncol Lett 2017; 14:1287-1294. [PMID: 28789343 PMCID: PMC5529805 DOI: 10.3892/ol.2017.6292] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 07/30/2015] [Accepted: 12/20/2016] [Indexed: 11/06/2022] Open
Abstract
The objective of the present study was to identify altered pathways in breast cancer based on the individualized pathway aberrance score (iPAS) method combined with the normal reference (nRef). There were 4 steps to identify altered pathways using the iPAS method: Data preprocessing conducted by the robust multi-array average (RMA) algorithm; gene-level statistics based on average Z; pathway-level statistics according to iPAS; and a significance test dependent on 1 sample Wilcoxon test. The altered pathways were validated by calculating the changed percentage of each pathway in tumor samples and comparing them with pathways from differentially expressed genes (DEGs). A total of 688 altered pathways with P<0.01 were identified, including kinesin (KIF)- and polo-like kinase (PLK)-mediated events. When the percentage of change reached 50%, 310 pathways were involved in the total 688 altered pathways, which may validate the present results. In addition, there were 324 DEGs and 155 common genes between DEGs and pathway genes. DEGs and common genes were enriched in the same 9 significant terms, which also were members of altered pathways. The iPAS method was suitable for identifying altered pathways in breast cancer. Altered pathways (such as KIF and PLK mediated events) were important for understanding breast cancer mechanisms and for the future application of customized therapeutic decisions.
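The gene-level and pathway-level steps listed above can be sketched as follows, assuming that "average Z" means standardizing each tumor gene against the normal reference and averaging over the genes of a pathway. The function names, gene labels, and expression values are hypothetical, and the RMA preprocessing and the one-sample Wilcoxon significance test are deliberately left out.

```python
from statistics import mean, stdev


def gene_z_scores(tumor_sample, normal_reference):
    """Standardize each gene of one tumor sample against the normal reference (nRef)."""
    z = {}
    for gene, value in tumor_sample.items():
        ref = normal_reference[gene]
        z[gene] = (value - mean(ref)) / stdev(ref)
    return z


def pathway_score(z_scores, pathway_genes):
    """Average Z over the pathway's genes that are present in the sample."""
    present = [z_scores[g] for g in pathway_genes if g in z_scores]
    return mean(present)


# Toy data (made-up values): three normal samples per gene, one tumor sample.
normal_reference = {
    "gene1": [4.0, 5.0, 6.0],
    "gene2": [2.0, 3.0, 4.0],
    "gene3": [9.0, 10.0, 11.0],
}
tumor_sample = {"gene1": 8.0, "gene2": 6.0, "gene3": 10.0}

z = gene_z_scores(tumor_sample, normal_reference)
score = pathway_score(z, ["gene1", "gene2"])  # hypothetical two-gene pathway
```

In this toy case both pathway genes sit three reference standard deviations above the normal mean, so the pathway score is 3.0, while `gene3` is unchanged and scores 0.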
Affiliation(s)
- Sheng-Hong Shi
- Department of Breast Surgery, Ningbo No. 2 Hospital, Ningbo, Zhejiang 315000, P.R. China
- Wei Zhang
- Department of Breast Surgery, Ningbo No. 2 Hospital, Ningbo, Zhejiang 315000, P.R. China
- Jing Jiang
- Department of Breast Surgery, Ningbo No. 2 Hospital, Ningbo, Zhejiang 315000, P.R. China
- Long Sun
- Department of Breast Surgery, Ningbo No. 2 Hospital, Ningbo, Zhejiang 315000, P.R. China
8
Urda D, Luque-Baena RM, Franco L, Jerez JM, Sanchez-Marono N. Machine learning models to search relevant genetic signatures in clinical context. 2017 International Joint Conference on Neural Networks (IJCNN) 2017:1649-1656. [DOI: 10.1109/ijcnn.2017.7966049] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/04/2025]
9
Lee H, Shin M. Mining pathway associations for disease-related pathway activity analysis based on gene expression and methylation data. BioData Min 2017; 10:3. [PMID: 28168005 PMCID: PMC5286825 DOI: 10.1186/s13040-017-0127-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 05/03/2016] [Accepted: 01/26/2017] [Indexed: 12/19/2022] Open
Abstract
BACKGROUND The problem of discovering genetic markers as disease signatures is of great significance for the successful diagnosis, treatment, and prognosis of complex diseases. Even if many earlier studies worked on identifying disease markers from a variety of biological resources, they mostly focused on the markers of genes or gene-sets (i.e., pathways). However, these markers may not be enough to explain biological interactions between genetic variables that are related to diseases. Thus, in this study, our aim is to investigate distinctive associations among active pathways (i.e., pathway-sets) shown each in case and control samples which can be observed from gene expression and/or methylation data. RESULTS The pathway-sets are obtained by identifying a set of associated pathways that are often active together over a significant number of class samples. For this purpose, gene expression or methylation profiles are first analyzed to identify significant (active) pathways via gene-set enrichment analysis. Then, regarding these active pathways, an association rule mining approach is applied to examine interesting pathway-sets in each class of samples (case or control). By doing so, the sets of associated pathways often working together in activity profiles are finally chosen as our distinctive signature of each class. The identified pathway-sets are aggregated into a pathway activity network (PAN), which facilitates the visualization of differential pathway associations between case and control samples. From our experiments with two publicly available datasets, we could find interesting PAN structures as the distinctive signatures of breast cancer and uterine leiomyoma cancer, respectively. CONCLUSIONS Our pathway-set markers were shown to be superior or very comparable to other genetic markers (such as genes or gene-sets) in disease classification. 
Furthermore, the PAN structure, which can be constructed from the identified markers of pathway-sets, could provide deeper insights into distinctive associations between pathway activities in case and control samples.
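The idea of pathways that are "often active together over a significant number of class samples" can be illustrated with a frequent-itemset count. This is only a sketch of the support-counting step, not the authors' association rule mining pipeline: the function name, the `min_support` value, the restriction to pairs, and the pathway labels P1 to P4 are assumptions.

```python
from collections import Counter
from itertools import combinations


def frequent_pathway_sets(active_by_sample, size=2, min_support=0.5):
    """Return pathway combinations whose co-activity support reaches min_support.

    active_by_sample: one set of active pathway names per sample of a class.
    Support is the fraction of samples in which all pathways of a set are active.
    """
    counts = Counter()
    for active in active_by_sample:
        for combo in combinations(sorted(active), size):
            counts[combo] += 1
    n = len(active_by_sample)
    return {combo: c / n for combo, c in counts.items() if c / n >= min_support}


# Toy activity profiles for the case class (made-up data).
case_samples = [
    {"P1", "P2", "P3"},
    {"P1", "P2"},
    {"P1", "P2", "P4"},
    {"P2", "P3"},
    {"P1", "P4"},
]
pathway_sets = frequent_pathway_sets(case_samples)
```

Only the pair (P1, P2) is co-active in at least half of the toy samples. Running the same count on control samples and comparing the resulting sets would give the kind of class-specific edges that a pathway activity network aggregates.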
Affiliation(s)
- Hyeonjeong Lee
- Bio-Intelligence & Data Mining Laboratory, Graduate School of Electronics Engineering, Kyungpook National University, 80, Daehak-ro, Buk-gu, Daegu, 41566 Republic of Korea
- Miyoung Shin
- School of Electronics Engineering, Kyungpook National University, 80, Daehak-ro, Buk-gu, Daegu, 41566 Republic of Korea
10
Liu J, Hua P, Hui L, Zhang LL, Hu Z, Zhu YW. Identification of hub genes and pathways associated with hepatocellular carcinoma based on network strategy. Exp Ther Med 2016; 12:2109-2119. [PMID: 27703495 PMCID: PMC5039750 DOI: 10.3892/etm.2016.3599] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 05/19/2015] [Accepted: 07/05/2016] [Indexed: 12/11/2022] Open
Abstract
The objective of this study was to identify hub genes and pathways associated with hepatocellular carcinoma (HCC) by centrality analysis of a co-expression network. A co-expression network based on differentially expressed (DE) genes of HCC was constructed using the Differentially Co-expressed Genes and Links (DCGL) package. Centrality analyses, for centrality of degree, clustering coefficient, closeness, stress and betweenness for the co-expression network were performed to identify hub genes, and the hub genes were combined together to overcome inconsistent results. Enrichment analyses were conducted using Gene Ontology and Kyoto Encyclopedia of Genes and Genomes databases. Finally, validation of hub genes was conducted utilizing reverse transcription-polymerase chain reaction (RT-PCR) analysis. In total, 260 DE genes between normal controls and HCC patients were obtained and a co-expression network with 154 nodes and 326 edges was constructed. From this, 13 hub genes were identified according to degree, clustering coefficient, closeness, stress and betweenness centrality analysis. It was found that reelin (RELN), potassium voltage-gated channel subfamily J member 10 (KCNJ10) and neural cell adhesion molecule 1 (NCAM1) were common hub genes across the five centralities, and the results of RT-PCR analysis for RELN, KCNJ10 and NCAM1 were consistent with the centrality analyses. Pathway enrichment analysis of DE genes showed that cell cycle, metabolism of xenobiotics by cytochrome P450 and p53 signaling pathway were the most significant pathways. This study may contribute to understanding the molecular pathogenesis of HCC and provide potential biomarkers for its early detection and effective therapies.
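A minimal sketch of centrality-based hub detection follows, covering only two of the five centralities named above (degree and closeness) on an unweighted toy graph. The graph, the `top_k` cut-off, and the choice to take the intersection of the top-ranked sets as "common" hubs are assumptions for illustration; the study itself used the DCGL package and its own selection criteria.

```python
from collections import deque


def degree_centrality(graph):
    """Number of neighbours per node (unnormalized degree)."""
    return {node: len(neighbours) for node, neighbours in graph.items()}


def closeness_centrality(graph):
    """Reachable-node count divided by the sum of shortest-path lengths (BFS)."""
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nb in graph[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result


def top_k(scores, k):
    """Top-k nodes by score, ties broken alphabetically for determinism."""
    return set(sorted(scores, key=lambda n: (-scores[n], n))[:k])


# Toy undirected co-expression network as an adjacency map (made-up edges).
graph = {
    "g1": {"g2", "g3", "g4"},
    "g2": {"g1", "g3"},
    "g3": {"g1", "g2"},
    "g4": {"g1", "g5"},
    "g5": {"g4"},
}
hubs = top_k(degree_centrality(graph), 2) & top_k(closeness_centrality(graph), 2)
```

On this toy network only `g1` ranks in the top two under both measures, which mirrors how a gene can be reported as a common hub across several centralities.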
Affiliation(s)
- Jun Liu
- Department of Radiology, Wuxi Second Hospital Affiliated to Nanjing Medical University, Wuxi, Jiangsu 214002, P.R. China
- Ping Hua
- Department of Internal Medicine, Wuxi Second Hospital Affiliated to Nanjing Medical University, Wuxi, Jiangsu 214002, P.R. China
- Li Hui
- Department of Internal Medicine, Wuxi Second Hospital Affiliated to Nanjing Medical University, Wuxi, Jiangsu 214002, P.R. China
- Li-Li Zhang
- Department of Internal Medicine, Wuxi Second Hospital Affiliated to Nanjing Medical University, Wuxi, Jiangsu 214002, P.R. China
- Zhen Hu
- Department of Internal Medicine, Wuxi Second Hospital Affiliated to Nanjing Medical University, Wuxi, Jiangsu 214002, P.R. China
- Ying-Wei Zhu
- Department of Internal Medicine, Wuxi Second Hospital Affiliated to Nanjing Medical University, Wuxi, Jiangsu 214002, P.R. China
11
Li RH, Zhang AM, Li S, Li TY, Wang LJ, Zhang HR, Li P, Jia XJ, Zhang T, Peng XY, Liu MD, Wang X, Lang Y, Xue WL, Liu J, Wang YY. Multiple differential expression networks identify key genes in rectal cancer. Cancer Biomark 2016; 16:435-44. [PMID: 27062700 DOI: 10.3233/cbm-160582] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 12/15/2022]
Abstract
BACKGROUND Rectal cancer is an important contributor to cancer mortality. OBJECTIVE The objective of this paper is to identify key genes across three phenotypes (fungating, polypoid and polypoid & small-ulcer) of rectal cancer based on multiple differential expression networks (DENs). METHODS Differential interactions and non-differential interactions were evaluated according to the Spearman correlation coefficient (SCC) and were selected to construct DENs. Topological analysis was performed to explore hub genes in the largest components of the DENs. Key genes were defined as the intersections between nodes of the DENs and rectal cancer associated genes from Genecards. Finally, we used the hub genes to classify phenotypes of rectal cancer with support vector machines (SVM). RESULTS We obtained 19 hub genes and a total of 12 common key genes from the three largest components of the DENs, and EGFR was the common element. The SVM results revealed that the hub genes could classify phenotypes, and validated the feasibility of DEN methods. CONCLUSIONS We have successfully identified significant genes (such as EGFR and UBC) across the fungating, polypoid and polypoid & small-ulcer phenotypes of rectal cancer. They might be potential biomarkers for classification, detection and therapy of this cancer.
Affiliation(s)
- Ri-Heng Li
- Department of Gastrointestinal Surgery, Affiliated Hospital of Hebei University, Baoding, Hebei, China
- Ai-Min Zhang
- Department of Gastrointestinal Surgery, Affiliated Hospital of Hebei University, Baoding, Hebei, China
- Shuang Li
- Department of Blood Transfusion, Neimenggu Xinganleague People's Hospital, Wulanhaote, Inner Mongolia, China
- Tian-Yang Li
- Clinical Medical College of Hebei University, Baoding, Hebei, China
- Lian-Jing Wang
- Clinical Medical College of Hebei University, Baoding, Hebei, China
- Hao-Ran Zhang
- Clinical Medical College of Hebei University, Baoding, Hebei, China
- Ping Li
- Department of Gastrointestinal Surgery, Affiliated Hospital of Hebei University, Baoding, Hebei, China
- Xiong-Jie Jia
- Department of Gastrointestinal Surgery, Affiliated Hospital of Hebei University, Baoding, Hebei, China
- Tao Zhang
- Department of Gastrointestinal Surgery, Affiliated Hospital of Hebei University, Baoding, Hebei, China
- Xin-Yu Peng
- Department of Gastrointestinal Surgery, Affiliated Hospital of Hebei University, Baoding, Hebei, China
- Min-Di Liu
- Department of Gastrointestinal Surgery, Affiliated Hospital of Hebei University, Baoding, Hebei, China
- Xu Wang
- Department of Gastrointestinal Surgery, Affiliated Hospital of Hebei University, Baoding, Hebei, China
- Yan Lang
- Department of Gastrointestinal Surgery, Affiliated Hospital of Hebei University, Baoding, Hebei, China
- Wei-Lan Xue
- Department of Gastrointestinal Surgery, Affiliated Hospital of Hebei University, Baoding, Hebei, China
- Jing Liu
- Department of Gastrointestinal Surgery, Affiliated Hospital of Hebei University, Baoding, Hebei, China
- Yan-Yan Wang
- Department of Gastrointestinal Surgery, Affiliated Hospital of Hebei University, Baoding, Hebei, China
12
Wang YZ, Qiu SC. Prediction of key genes in ovarian cancer treated with decitabine based on network strategy. Oncol Rep 2016; 35:3548-58. [PMID: 27035425 DOI: 10.3892/or.2016.4697] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 12/17/2015] [Accepted: 01/26/2016] [Indexed: 11/06/2022] Open
Abstract
The objective of the present study was to predict key genes in ovarian cancer before and after treatment with decitabine using a network approach and to reveal the molecular mechanism. Pathogenic networks of ovarian cancer before and after treatment were identified based on known pathogenic genes (seed genes) and differentially expressed genes (DEGs) detected by the Significance Analysis of Microarrays (SAM) method. A weight was assigned to each gene in the pathogenic network and then candidate genes were evaluated. Topological properties (degree, betweenness, closeness and stress) of candidate genes were analyzed to identify pathogenic genes with higher confidence. Pathway enrichment analysis for candidate and seed genes was conducted. Validation of candidate gene expression in ovarian cancer was performed by reverse transcriptase-polymerase chain reaction (RT-PCR) assays. There were 73 nodes and 147 interactions in the pathogenic network before treatment, compared with 47 nodes and 66 interactions after treatment. A total of 32 candidate genes were identified in the before-treatment group of ovarian cancer, of which 16 remained candidate genes after treatment and the others were silenced. We obtained 5 key genes (PIK3R2, CCNB1, IL2, IL1B and CDC6) for decitabine treatment that were validated by RT-PCR. In conclusion, these 5 validated key genes provide insight into the molecular mechanisms of decitabine treatment and may be potential pathogenic biomarkers for the therapy of ovarian cancer.
Affiliation(s)
- Yu-Zhen Wang
- Department of Pharmacy, Sir Run Run Shaw Hospital, School of Medicine, Zhejiang University, Hangzhou, Zhejiang 310016, P.R. China
- Sheng-Chun Qiu
- Department of Nursing, Zhejiang Provincial People's Hospital, Xiacheng, Hangzhou, Zhejiang 310014, P.R. China
13
Sui S, Wang X, Zheng H, Guo H, Chen T, Ji DM. Gene set enrichment and topological analyses based on interaction networks in pediatric acute lymphoblastic leukemia. Oncol Lett 2016; 10:3354-3362. [PMID: 26788135 PMCID: PMC4665311 DOI: 10.3892/ol.2015.3761] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 10/10/2014] [Accepted: 07/16/2015] [Indexed: 01/23/2023] Open
Abstract
Pediatric acute lymphoblastic leukemia (ALL) accounts for over one-quarter of all pediatric cancers. Interacting genes and proteins within the larger human gene interaction network are rarely examined in studies of pediatric ALL. In the present study, interaction networks were constructed using the empirical Bayesian approach and the Search Tool for the Retrieval of Interacting Genes/proteins database, based on the differentially expressed (DE) genes in pediatric ALL, which were identified using the RankProd package. Enrichment analysis of the interaction network was performed using the network-based methods EnrichNet and PathExpand, which were compared with the traditional Expression Analysis Systematic Explorer (EASE) method. In total, 398 DE genes were identified in pediatric ALL, and LIF was the most significantly DE gene. The co-expression network consisted of 272 nodes, which indicated genes and proteins, and 602 edges, which indicated the number of interactions adjacent to the node. PathExpand detected more pathways or processes closely associated with pediatric ALL than the EASE method. There were 294 nodes and 1,588 edges in the protein-protein interaction network, with the processes of hematopoietic cell lineage and porphyrin metabolism demonstrating a close association with pediatric ALL. Network enrichment analysis based on the PathExpand algorithm was shown to be more powerful than the EASE method for the analysis of interaction networks in pediatric ALL. LIF and MLLT11 were identified as the most significantly DE genes in pediatric ALL. The process of hematopoietic cell lineage was the pathway most significantly associated with pediatric ALL.
Affiliation(s)
- Shuxiang Sui
- Department of Pediatrics, Shandong Dongying People's Hospital, Dongying, Shandong 257091, P.R. China
- Xin Wang
- Department of Pediatrics, Shandong Dongying People's Hospital, Dongying, Shandong 257091, P.R. China
- Hua Zheng
- Department of Pediatrics, Shandong Dongying People's Hospital, Dongying, Shandong 257091, P.R. China
- Hua Guo
- Department of Pediatrics, Shandong Dongying People's Hospital, Dongying, Shandong 257091, P.R. China
- Tong Chen
- Department of Pediatrics, Shandong Dongying People's Hospital, Dongying, Shandong 257091, P.R. China
- Dong-Mei Ji
- Department of Pediatrics, Shandong Dongying People's Hospital, Dongying, Shandong 257091, P.R. China
14
Nguyen T, Khosravi A, Creighton D, Nahavandi S. Hierarchical gene selection and genetic fuzzy system for cancer microarray data classification. PLoS One 2015; 10:e0120364. [PMID: 25823003 PMCID: PMC4378968 DOI: 10.1371/journal.pone.0120364] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Received: 11/20/2014] [Accepted: 02/08/2015] [Indexed: 11/19/2022] Open
Abstract
This paper introduces a novel approach to gene selection based on a substantial modification of the analytic hierarchy process (AHP). The modified AHP systematically integrates the outcomes of individual filter methods to select the most informative genes for microarray classification. Five individual ranking methods (t-test, entropy, receiver operating characteristic (ROC) curve, Wilcoxon, and signal-to-noise ratio) are employed to rank genes. These ranked genes then serve as inputs to the modified AHP. Additionally, this paper proposes a method that uses a fuzzy standard additive model (FSAM) for cancer classification based on the genes selected by AHP. Traditional FSAM learning is a hybrid process comprising unsupervised structure learning and supervised parameter tuning. A genetic algorithm (GA) is incorporated between the unsupervised and supervised training stages to optimize the number of fuzzy rules. Integrating the GA enables FSAM to deal with the high-dimensional, low-sample nature of microarray data and thus enhances the efficiency of the classification. Experiments are carried out on numerous microarray datasets. Results demonstrate that AHP-based gene selection outperforms the single ranking methods. Furthermore, the AHP-FSAM combination achieves high accuracy in microarray data classification compared with various competing classifiers. The proposed approach is therefore useful to medical practitioners and clinicians as a decision support system that can be implemented in real medical practice.
Affiliation(s)
- Thanh Nguyen (corresponding author): Centre for Intelligent Systems Research (CISR), Deakin University, Geelong Waurn Ponds Campus, Victoria, 3216, Australia
- Abbas Khosravi: Centre for Intelligent Systems Research (CISR), Deakin University, Geelong Waurn Ponds Campus, Victoria, 3216, Australia
- Douglas Creighton: Centre for Intelligent Systems Research (CISR), Deakin University, Geelong Waurn Ponds Campus, Victoria, 3216, Australia
- Saeid Nahavandi: Centre for Intelligent Systems Research (CISR), Deakin University, Geelong Waurn Ponds Campus, Victoria, 3216, Australia
|
15
|
Raedler D, Ballenberger N, Klucker E, Böck A, Otto R, Prazeres da Costa O, Holst O, Illig T, Buch T, von Mutius E, Schaub B. Identification of novel immune phenotypes for allergic and nonallergic childhood asthma. J Allergy Clin Immunol 2015; 135:81-91. [DOI: 10.1016/j.jaci.2014.07.046] [Citation(s) in RCA: 91] [Impact Index Per Article: 9.1] [Received: 01/16/2014] [Revised: 07/16/2014] [Accepted: 07/22/2014] [Indexed: 10/24/2022]
|
16
|
Moteghaed NY, Maghooli K, Pirhadi S, Garshasbi M. Biomarker Discovery Based on Hybrid Optimization Algorithm and Artificial Neural Networks on Microarray Data for Cancer Classification. JOURNAL OF MEDICAL SIGNALS & SENSORS 2015; 5:88-96. [PMID: 26120567 PMCID: PMC4460670] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2014] [Accepted: 02/03/2015] [Indexed: 11/06/2022]
Abstract
Advances in high-throughput, microarray-based gene profiling have made it possible to monitor the expression values of thousands of genes simultaneously. Detailed examination of changes in gene expression levels can help physicians diagnose efficiently, classify tumors and cancer types, and choose effective treatments. The main purpose of this paper is to find genes that correctly classify groups of cancers using hybrid optimization algorithms. A hybrid of particle swarm optimization and a genetic algorithm is used for gene selection, and an artificial neural network (ANN) is adopted as the classifier. We improved the algorithm's classification ability by finding a small group of biomarkers together with the best classifier parameters. The proposed approach is tested on three benchmark gene expression data sets: blood (acute myeloid leukemia, acute lymphoblastic leukemia), colon, and breast. We used 10-fold cross-validation to estimate accuracy and a decision tree algorithm to find relations between the biomarkers from a biological point of view. To test the ability of the trained ANN models to categorize the cancers, we analyzed additional blinded samples that were not used in training. Experimental results show that the proposed method reduces the dimension of the data set, identifies the most informative gene subset, and improves classification accuracy with the best parameters for each dataset.
Affiliation(s)
- Niloofar Yousefi Moteghaed: Department of Biomedical Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran
- Keivan Maghooli (corresponding author): Department of Biomedical Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran
- Shiva Pirhadi: Department of Biomedical Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran
- Masoud Garshasbi: Department of Medical Genetics, Faculty of Medical Sciences, Tarbiat Modares University, Tehran, Iran
|
17
|
A comparative analysis of swarm intelligence techniques for feature selection in cancer classification. ScientificWorldJournal 2014; 2014:693831. [PMID: 25157377 PMCID: PMC4137534 DOI: 10.1155/2014/693831] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 03/20/2014] [Accepted: 06/18/2014] [Indexed: 11/17/2022]
Abstract
Feature selection in cancer classification is a central area of research in bioinformatics and is used to select informative genes from the thousands of genes on a microarray. The genes are ranked based on t-statistics, signal-to-noise ratio (SNR), and F-test values. The swarm intelligence (SI) technique then finds the informative genes among the top-m ranked genes, and these selected genes are used for classification. In this paper, shuffled frog leaping with Lévy flight (SFLLF) is proposed for feature selection. In SFLLF, the Lévy flight is included to avoid premature convergence of the shuffled frog leaping (SFL) algorithm. The SI techniques particle swarm optimization (PSO), cuckoo search (CS), SFL, and SFLLF are used for feature selection to identify informative genes for classification. The k-nearest neighbour (k-NN) technique is used to classify the samples. The proposed work is applied to 10 different benchmark datasets and compared with the other SI techniques. The experimental results show that the k-NN classifier with SFLLF feature selection outperforms PSO, CS, and SFL.
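The ranking step described above can be illustrated with a minimal sketch. It assumes the common two-class definition SNR = (mean_a - mean_b) / (sd_a + sd_b); the abstract does not state which definition the authors used, and the function names and toy data here are hypothetical.

```python
import math
from statistics import mean, stdev

def snr_score(class_a, class_b):
    """Signal-to-noise ratio of one gene: (mean_a - mean_b) / (sd_a + sd_b)."""
    diff = mean(class_a) - mean(class_b)
    denom = stdev(class_a) + stdev(class_b)
    if denom == 0:
        # Assumption: no within-class spread means perfect or no separation.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / denom

def top_m_genes(expr, labels, m):
    """Rank genes by |SNR| and return the names of the top m.

    expr   : dict mapping gene name -> list of expression values (one per sample)
    labels : list of 0/1 class labels aligned with the samples
    """
    scores = {}
    for gene, values in expr.items():
        a = [v for v, y in zip(values, labels) if y == 0]
        b = [v for v, y in zip(values, labels) if y == 1]
        scores[gene] = abs(snr_score(a, b))
    return sorted(scores, key=scores.get, reverse=True)[:m]

# Toy data (hypothetical): three genes measured on six samples.
labels = [0, 0, 0, 1, 1, 1]
expr = {
    "g1": [1.0, 1.2, 0.8, 5.0, 5.2, 4.8],  # strong class separation
    "g2": [2.0, 3.0, 4.0, 2.5, 3.5, 4.5],  # weak class separation
    "g3": [1.0, 5.0, 3.0, 1.1, 5.1, 3.1],  # almost no class separation
}
print(top_m_genes(expr, labels, 2))
```

A search method such as PSO or SFLLF would then pick a subset from these top-m candidates; that search step is not shown here.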
|
18
|
Han F, Sun W, Ling QH. A novel strategy for gene selection of microarray data based on gene-to-class sensitivity information. PLoS One 2014; 9:e97530. [PMID: 24844313 PMCID: PMC4028211 DOI: 10.1371/journal.pone.0097530] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Received: 12/06/2013] [Accepted: 04/21/2014] [Indexed: 11/19/2022]
Abstract
To obtain predictive genes with lower redundancy and better interpretability, a hybrid gene selection method encoding prior information is proposed in this paper. First, the prior information, referred to as gene-to-class sensitivity (GCS), is extracted for all genes in the microarray data by a single-hidden-layer feedforward neural network (SLFN). Then, to select more representative and less redundant genes, all genes are grouped into clusters by the K-means method, and genes with low sensitivity are filtered out according to their GCS values. Finally, a modified binary particle swarm optimization (BPSO) encoding the GCS information is proposed to perform further gene selection from the remaining genes. Because it takes the GCS information into account, the proposed method selects genes that are highly correlated with the sample classes. Thus, the low-redundancy gene subsets obtained by the proposed method also help improve classification accuracy on microarray data. Experimental results on several open microarray datasets verify the effectiveness and efficiency of the proposed approach.
Affiliation(s)
- Fei Han: School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang, China
- Wei Sun: School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang, China
- Qing-Hua Ling: School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang, China; School of Computer Science and Engineering, Jiangsu University of Science and Technology, Zhenjiang, China
|
19
|
Wang X. Identification of Marker Genes for Cancer Based on Microarrays Using a Computational Biology Approach. Curr Bioinform 2014; 9:140-146. [PMID: 24683388 DOI: 10.2174/1574893608999140109115649] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/22/2022]
Abstract
Rapid advances in gene expression microarray technology have enabled the discovery of molecular markers used for cancer diagnosis, prognosis, and prediction. One computational challenge in using microarray data analysis to create cancer classifiers is how to deal effectively with data that combine a high number of attributes (p) with a low number of instances (n). Gene selection and classifier construction are the two key issues in this area. In this article, we review the major methods for computational identification of cancer marker genes. We conclude that simple methods should be preferred to complicated ones for their interpretability and applicability.
Affiliation(s)
- Xiaosheng Wang: Biometric Research Branch, National Cancer Institute, National Institutes of Health, Rockville, MD 20852, U.S.A
|
20
|
Recognition of multiple imbalanced cancer types based on DNA microarray data using ensemble classifiers. BIOMED RESEARCH INTERNATIONAL 2013; 2013:239628. [PMID: 24078908 PMCID: PMC3770038 DOI: 10.1155/2013/239628] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Received: 04/07/2013] [Revised: 07/08/2013] [Accepted: 07/17/2013] [Indexed: 11/24/2022]
Abstract
DNA microarray technology can measure the activities of tens of thousands of genes simultaneously, which provides an efficient way to diagnose cancer at the molecular level. Although this strategy has attracted significant research attention, most studies neglect an important problem, namely, that most DNA microarray datasets are skewed, which causes traditional learning algorithms to produce inaccurate results. Some studies have considered this problem, yet they focus only on the binary-class case. In this paper, we address the multiclass imbalanced classification problem, as encountered in cancer DNA microarray data, by using ensemble learning. We used a one-against-all coding strategy to transform the multiclass problem into multiple binary-class problems, each of which applies feature subspace, an evolved version of random subspace that generates multiple diverse training subsets. Next, we introduced one of two correction techniques, namely decision threshold adjustment or random undersampling, into each training subset to alleviate the damage of class imbalance. Specifically, a support vector machine was used as the base classifier, and a novel voting rule called counter voting was presented for making the final decision. Experimental results on eight skewed multiclass cancer microarray datasets indicate that, unlike many traditional classification approaches, our methods are insensitive to class imbalance.
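Random undersampling, one of the two correction techniques named above, can be sketched as follows. This is a generic illustration, not the authors' implementation: it trims every class to the size of the smallest one, whereas the paper applies the correction inside each one-against-all binary training subset. The function name, seed parameter, and toy data are assumptions.

```python
import random
from collections import defaultdict

def random_undersample(samples, labels, seed=0):
    """Randomly drop samples so that every class keeps as many as the smallest class."""
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    n_min = min(len(xs) for xs in by_class.values())
    kept_x, kept_y = [], []
    for y, xs in by_class.items():
        for x in rng.sample(xs, n_min):  # sample without replacement
            kept_x.append(x)
            kept_y.append(y)
    return kept_x, kept_y

# Toy data (hypothetical): sample indices stand in for expression profiles.
samples = list(range(12))
labels = ["A"] * 7 + ["B"] * 3 + ["C"] * 2
xs, ys = random_undersample(samples, labels, seed=1)
print(sorted(zip(ys, xs)))
```

Undersampling discards majority-class data, which is why ensemble schemes typically repeat it over several training subsets rather than relying on a single draw.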
|
21
|
Giugno R, Pulvirenti A, Cascione L, Pigola G, Ferro A. MIDClass: microarray data classification by association rules and gene expression intervals. PLoS One 2013; 8:e69873. [PMID: 23936357 PMCID: PMC3735555 DOI: 10.1371/journal.pone.0069873] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Received: 01/28/2013] [Accepted: 06/13/2013] [Indexed: 11/18/2022]
Abstract
We present a new classification method for expression profiling data, called MIDClass (Microarray Interval Discriminant CLASSifier), based on association rules. It classifies expression profiles by exploiting the idea that transcript expression intervals better discriminate subtypes within the same class. An extensive experimental analysis shows the effectiveness of MIDClass compared with the most prominent classification approaches.
Affiliation(s)
- Rosalba Giugno (corresponding author): Department of Clinical and Molecular Biomedicine, University of Catania, Catania, Italy
- Alfredo Pulvirenti (corresponding author): Department of Clinical and Molecular Biomedicine, University of Catania, Catania, Italy
- Luciano Cascione: Department of Molecular Virology, Immunology and Medical Genetics, Comprehensive Cancer Center, The Ohio State University, Columbus, Ohio, United States of America
- Giuseppe Pigola: Department of Clinical and Molecular Biomedicine, University of Catania, Catania, Italy
- Alfredo Ferro: Department of Clinical and Molecular Biomedicine, University of Catania, Catania, Italy
|
22
|
Mao Z, Cai W, Shao X. Selecting significant genes by randomization test for cancer classification using gene expression data. J Biomed Inform 2013; 46:594-601. [DOI: 10.1016/j.jbi.2013.03.009] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Received: 07/03/2012] [Revised: 01/30/2013] [Accepted: 03/28/2013] [Indexed: 12/30/2022]
|
23
|
Improved shrunken centroid classifiers for high-dimensional class-imbalanced data. BMC Bioinformatics 2013; 14:64. [PMID: 23433084 PMCID: PMC3687811 DOI: 10.1186/1471-2105-14-64] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Received: 08/16/2012] [Accepted: 01/31/2013] [Indexed: 11/21/2022]
Abstract
Background: PAM, a nearest shrunken centroid (NSC) method, is a popular classification method for high-dimensional data. ALP and AHP are NSC algorithms that were proposed to improve upon PAM. The NSC methods base their classification rules on shrunken centroids; in practice, the amount of shrinkage is estimated by minimizing the overall cross-validated (CV) error rate.
Results: We show that when data are class-imbalanced, the three NSC classifiers are biased towards the majority class. The bias is larger when the number of variables or the class imbalance is larger and/or the differences between classes are smaller. To diminish the class-imbalance problem of the NSC classifiers, we propose to estimate the amount of shrinkage by maximizing the CV geometric mean of the class-specific predictive accuracies (g-means).
Conclusions: The results obtained on simulated and real high-dimensional class-imbalanced data show that our approach outperforms the currently used strategy based on minimizing the overall error rate when NSC classifiers are biased towards the majority class. The number of variables included in the NSC classifiers when using our approach is much smaller than with the original approach. This result is supported by experiments on simulated and real high-dimensional class-imbalanced data.
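To make the g-means criterion concrete, the sketch below computes the geometric mean of the class-specific accuracies and compares it with overall accuracy on a toy imbalanced example. The function names and data are hypothetical; the cross-validation loop and the shrinkage search itself are not shown.

```python
def overall_accuracy(y_true, y_pred):
    """Fraction of samples predicted correctly, regardless of class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def g_means(y_true, y_pred):
    """Geometric mean of the class-specific predictive accuracies."""
    classes = sorted(set(y_true))
    product = 1.0
    for c in classes:
        idx = [i for i, t in enumerate(y_true) if t == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        product *= correct / len(idx)
    return product ** (1.0 / len(classes))

# Toy imbalanced problem (hypothetical): 8 majority and 2 minority samples.
y_true = [0] * 8 + [1] * 2
always_majority = [0] * 10             # ignores the minority class entirely
balanced = [0] * 6 + [1] * 2 + [1] * 2  # 6 of 8 majority and 2 of 2 minority correct

for name, pred in [("always_majority", always_majority), ("balanced", balanced)]:
    print(name, overall_accuracy(y_true, pred), g_means(y_true, pred))
```

Both predictors reach the same overall accuracy, but only the second has a nonzero g-means, which is why selecting a tuning parameter by g-means penalizes classifiers that collapse onto the majority class.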
|
24
|
Interval-valued analysis for discriminative gene selection and tissue sample classification using microarray data. Genomics 2013; 101:38-48. [DOI: 10.1016/j.ygeno.2012.09.004] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 04/10/2012] [Revised: 09/08/2012] [Accepted: 09/10/2012] [Indexed: 11/18/2022]
|
25
|
Touw WG, Bayjanov JR, Overmars L, Backus L, Boekhorst J, Wels M, van Hijum SAFT. Data mining in the Life Sciences with Random Forest: a walk in the park or lost in the jungle? Brief Bioinform 2012; 14:315-26. [PMID: 22786785 PMCID: PMC3659301 DOI: 10.1093/bib/bbs034] [Citation(s) in RCA: 220] [Impact Index Per Article: 16.9] [Indexed: 12/13/2022]
Abstract
In the Life Sciences, 'omics' data is increasingly generated by different high-throughput technologies. Often only the integration of these data allows uncovering biological insights that can be experimentally validated or mechanistically modelled, i.e. sophisticated computational approaches are required to extract the complex non-linear trends present in omics data. Classification techniques allow training a model based on variables (e.g. SNPs in genetic association studies) to separate different classes (e.g. healthy subjects versus patients). Random Forest (RF) is a versatile classification algorithm suited to the analysis of these large data sets. In the Life Sciences, RF is popular because RF classification models have high prediction accuracy and provide information on the importance of variables for classification. For omics data, variables or conditional relations between variables are typically important for a subset of samples of the same class. For example, within a class of cancer patients, certain SNP combinations may be important for a subset of patients who have a specific subtype of cancer, but not for a different subset of patients. These conditional relationships can in principle be uncovered from the data with RF, as the algorithm implicitly takes them into account when creating the classification model. This review details some RF properties that, to the best of our knowledge, are rarely or never used, and that allow maximizing the biological insights that can be extracted from complex omics data sets using RF.
|
26
|
Jurman G, Riccadonna S, Visintainer R, Furlanello C. Algebraic comparison of partial lists in bioinformatics. PLoS One 2012; 7:e36540. [PMID: 22615778 PMCID: PMC3355159 DOI: 10.1371/journal.pone.0036540] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Received: 06/10/2011] [Accepted: 04/06/2012] [Indexed: 12/20/2022]
Abstract
The outcome of a functional genomics pipeline is usually a partial list of genomic features, ranked by their relevance in modelling biological phenotype in terms of a classification or regression model. Due to resampling protocols or to a meta-analysis comparison, it is often the case that sets of alternative feature lists (possibly of different lengths) are obtained, instead of just one list. Here we introduce a method, based on permutations, for studying the variability between lists ("list stability") in the case of lists of unequal length. We provide algorithms evaluating stability for lists embedded in the full feature set or just limited to the features occurring in the partial lists. The method is demonstrated by finding and comparing gene profiles on a large prostate cancer dataset, consisting of two cohorts of patients from different countries, for a total of 455 samples.
|
28
|
Robust two-gene classifiers for cancer prediction. Genomics 2011; 99:90-5. [PMID: 22138042 DOI: 10.1016/j.ygeno.2011.11.003] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 09/06/2011] [Revised: 11/04/2011] [Accepted: 11/09/2011] [Indexed: 11/23/2022]
Abstract
Two-gene classifiers have attracted broad interest for their simplicity and practicality. Most existing two-gene classification algorithms rely on exhaustive search, which makes them slow. In this study, we proposed two new two-gene classification algorithms that use a simple univariate gene selection strategy and construct simple classification rules based on optimal cut-points for the two selected genes. We detected the optimal cut-point with the information entropy principle. We applied the two-gene classification models to eleven cancer gene expression datasets and compared their classification performance with that of established two-gene classification models, such as the top-scoring pairs model and the greedy pairs model, as well as standard methods including Diagonal Linear Discriminant Analysis, k-Nearest Neighbor, Support Vector Machine, and Random Forest. These comparisons indicated that the performance of our two-gene classifiers was comparable to or better than that of the compared models.
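One plausible reading of the entropy-based cut-point step is sketched below: for a single gene, choose the threshold that minimizes the weighted class entropy of the two resulting groups. The abstract does not give the exact criterion, so this formulation, the function names, and the toy data are assumptions.

```python
import math

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    h = 0.0
    for c in set(labels):
        p = labels.count(c) / n
        h -= p * math.log2(p)
    return h

def best_cut_point(values, labels):
    """Return (cut, weighted_entropy) for the split that minimizes class entropy.

    Candidate cuts are midpoints between consecutive distinct sorted values.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_cut, best_h = None, float("inf")
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate identical values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        h = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if h < best_h:
            best_cut, best_h = (lo + hi) / 2, h
    return best_cut, best_h

# Toy data (hypothetical): one gene, six samples, two classes.
values = [1.0, 1.5, 2.0, 4.0, 4.5, 5.0]
labels = [0, 0, 0, 1, 1, 1]
print(best_cut_point(values, labels))
```

A two-gene rule could then combine the cut-points found for two separately selected genes; how the paper combines them is not specified in the abstract.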
|