Reference Citation Analysis

For: Zhang JG, Deng HW. Gene selection for classification of microarray data based on the Bayes error. BMC Bioinformatics 2007;8:370. [PMID: 17915022 PMCID: PMC2089123 DOI: 10.1186/1471-2105-8-370] [Citation(s) in RCA: 63] [Impact Index Per Article: 3.5] [Received: 03/21/2007] [Accepted: 10/03/2007] [Indexed: 11/10/2022]



Cited by Other Article(s)

Sagkrioti E, Biz GM, Takan I, Asfa S, Nikitaki Z, Zanni V, Kars RH, Hellweg CE, Azzam EI, Logotheti S, Pavlopoulou A, Georgakilas AG. Radiation Type- and Dose-Specific Transcriptional Responses across Healthy and Diseased Mammalian Tissues. Antioxidants (Basel) 2022;11:2286. [PMID: 36421472 PMCID: PMC9687520 DOI: 10.3390/antiox11112286] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 08/17/2022] [Revised: 11/12/2022] [Accepted: 11/15/2022] [Indexed: 08/30/2023]

Abstract

Ionizing radiation (IR) is a genuine genotoxic agent and a major modality in cancer treatment. IR disrupts DNA sequences and exerts mutagenic and/or cytotoxic effects that not only alter critical cellular functions but also affect tissues proximal and distal to the irradiated site. Unveiling the molecular events governing the diverse effects of IR at the cellular and organismal levels is relevant for both radiotherapy and radiation protection. Herein, we address changes in the expression of mammalian genes induced by exposing a wide range of tissues to various radiation types with distinct biophysical characteristics. First, we constructed a publicly available database, termed RadBioBase, which will be updated at regular intervals. RadBioBase includes comprehensive transcriptomes of mammalian cells across healthy and diseased tissues that respond to a range of radiation types and doses. Pertinent information was derived from a hybrid analysis based on stringent literature mining and transcriptomic studies. An integrative bioinformatics methodology, including functional enrichment analysis and machine learning techniques, was employed to unveil the characteristic biological pathways related to specific radiation types and their association with various diseases. We found that the effects of high linear energy transfer (LET) radiation on cell transcriptomes differ significantly from those caused by low-LET radiation and are consistent with immunomodulation, inflammation, oxidative stress responses and cell death. The transcriptome changes also depend on dose: low doses (up to 0.5 Gy) are associated with cytokine cascades, whereas higher doses are associated with ROS metabolism. We additionally identified distinct gene signatures for different types of radiation.
Overall, our data suggest that different radiation types and doses can trigger distinct trajectories of cell-intrinsic and cell-extrinsic pathways that hold promise to be manipulated toward improving radiotherapy efficiency and reducing systemic radiotoxicities.
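The abstract does not say which statistic the functional enrichment analysis used. One common choice is a one-sided hypergeometric test of how many genes from a query list fall in an annotated pathway. The sketch below assumes that test; the function name `hypergeom_enrichment_p` and its arguments are illustrative and are not part of RadBioBase.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided enrichment p-value P(X >= k), X ~ Hypergeometric(N, K, n).

    Assumed test, for illustration only.
    N: genes in the background universe
    K: background genes annotated to the pathway
    n: genes in the query list (e.g., radiation-responsive genes)
    k: query genes that are annotated to the pathway
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    # Sum the probabilities of observing k or more annotated genes in the query list.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


# Illustrative counts only (not taken from the study):
# 12 of 200 query genes fall in a 150-gene pathway out of a 20,000-gene background.
print(hypergeom_enrichment_p(k=12, n=200, K=150, N=20000))
```

In practice the resulting p-values would also be corrected for the many pathways tested at once.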


Affiliation(s)

Eftychia Sagkrioti DNA Damage Laboratory, Physics Department, School of Applied Mathematical and Physical Sciences, National Technical University of Athens (NTUA), Zografou, 15780 Athens, Greece; Biology Department, National and Kapodistrian University of Athens (NKUA), 15784 Athens, Greece
Gökay Mehmet Biz Department of Technical Programs, Izmir Vocational School, Dokuz Eylül University, Buca, Izmir 35380, Turkey
Işıl Takan Izmir Biomedicine and Genome Center (IBG), Balcova, Izmir 35340, Turkey; Izmir International Biomedicine and Genome Institute, Dokuz Eylül University, Balcova, Izmir 35340, Turkey
Seyedehsadaf Asfa Izmir Biomedicine and Genome Center (IBG), Balcova, Izmir 35340, Turkey; Izmir International Biomedicine and Genome Institute, Dokuz Eylül University, Balcova, Izmir 35340, Turkey
Zacharenia Nikitaki DNA Damage Laboratory, Physics Department, School of Applied Mathematical and Physical Sciences, National Technical University of Athens (NTUA), Zografou, 15780 Athens, Greece
Vassiliki Zanni DNA Damage Laboratory, Physics Department, School of Applied Mathematical and Physical Sciences, National Technical University of Athens (NTUA), Zografou, 15780 Athens, Greece
Rumeysa Hanife Kars Department of Biomedical Engineering, Istanbul Medipol University, Istanbul 34810, Turkey
Christine E. Hellweg German Aerospace Center (DLR), Institute of Aerospace Medicine, Radiation Biology, Linder Höhe, D-51147 Köln, Germany
Edouard I. Azzam Canadian Nuclear Laboratories, Chalk River, ON K0J 1J0, Canada
Stella Logotheti DNA Damage Laboratory, Physics Department, School of Applied Mathematical and Physical Sciences, National Technical University of Athens (NTUA), Zografou, 15780 Athens, Greece
Athanasia Pavlopoulou Izmir Biomedicine and Genome Center (IBG), Balcova, Izmir 35340, Turkey; Izmir International Biomedicine and Genome Institute, Dokuz Eylül University, Balcova, Izmir 35340, Turkey
Alexandros G. Georgakilas DNA Damage Laboratory, Physics Department, School of Applied Mathematical and Physical Sciences, National Technical University of Athens (NTUA), Zografou, 15780 Athens, Greece


Elitist random swapped particle swarm optimization embedded with variable k-nearest neighbour classification: a new PSO variant applied to gene identification. Soft Comput 2022. [DOI: 10.1007/s00500-022-07515-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/10/2022]

Jayanthi S, Rene Robin CR. Analysis of Microarray Data by Empirical Wavelet Transform for Cancer Classification Using Block by Block Method. Journal of Medical Imaging and Health Informatics 2021. [DOI: 10.1166/jmihi.2021.3318] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/23/2022]

Gupta M, Gupta B. A novel gene expression test method of minimizing breast cancer risk in reduced cost and time by improving SVM-RFE gene selection method combined with LASSO. J Integr Bioinform 2020;18:139-153. [PMID: 34171941 PMCID: PMC7856389 DOI: 10.1515/jib-2019-0110] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 12/31/2019] [Accepted: 11/12/2020] [Indexed: 01/26/2023]

Nguyen TTH, Nguyen PV, Tran QV, Vo NX, Vo TQ. Cancer classification from microarray data for genomic disorder research using optimal discriminant independent component analysis and kernel extreme learning machine. International Journal for Numerical Methods in Biomedical Engineering 2020;36:e3372. [PMID: 32453470 DOI: 10.1002/cnm.3372] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 02/10/2020] [Revised: 05/08/2020] [Accepted: 05/13/2020] [Indexed: 06/11/2023]

Kilicarslan S, Adem K, Celik M. Diagnosis and classification of cancer using hybrid model based on ReliefF and convolutional neural network. Med Hypotheses 2020;137:109577. [DOI: 10.1016/j.mehy.2020.109577] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 12/04/2019] [Revised: 01/04/2020] [Accepted: 01/16/2020] [Indexed: 10/25/2022]

A Wrapper Feature Subset Selection Method Based on Randomized Search and Multilayer Structure. BioMed Research International 2019;2019:9864213. [PMID: 31828154 PMCID: PMC6885241 DOI: 10.1155/2019/9864213] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 05/08/2019] [Revised: 08/10/2019] [Accepted: 08/27/2019] [Indexed: 12/11/2022]

Abstract

The identification of discriminative features from information-rich data for clinical diagnosis is crucial in biomedical science. In this context, many machine-learning techniques have been widely applied and have achieved remarkable results. However, disease, especially cancer, is often caused by a group of features with complex interactions. Unlike traditional feature selection methods, which focus only on finding individual discriminative features, the multilayer feature subset selection method (MLFSSM) proposed herein employs randomized search and a multilayer structure to select a discriminative subset. In each layer, many feature subsets are generated to ensure diversity of combinations, and the weight of each feature is evaluated from the performance of the subsets that contain it. A feature's weight increases if, compared with the other features in the current layer, it is selected into more subsets with better performance. In this manner, the feature weights are revised layer by layer, their precision is continually improved, and better subsets are repeatedly constructed from the features with higher weights. Finally, the topmost feature subset of the last layer is returned. Experimental results on five public gene datasets showed that the subsets selected by MLFSSM were more discriminative than those selected by traditional feature selection methods, including LVW (a feature subset selection method that uses the Las Vegas algorithm as its randomized search strategy), GAANN (a feature subset selection method based on a genetic algorithm (GA)), and support vector machine recursive feature elimination (SVM-RFE). Furthermore, MLFSSM showed higher classification performance than several state-of-the-art methods that select feature pairs or groups, including top scoring pair (TSP), k-top scoring pairs (k-TSP), and the relative simplicity-based direct classifier (RS-DC).
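The layer-by-layer weighting described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' MLFSSM code: `score` is a hypothetical stand-in for the classifier-based evaluation of a subset (assumed non-negative), features are sampled in proportion to their current weights, a feature's new weight is the mean score of the subsets it appeared in, and each layer keeps the higher-weighted half of the pool.

```python
import random
from typing import Callable, Dict, FrozenSet, List


def weighted_subset(pool: List[int], weights: Dict[int, float], k: int,
                    rng: random.Random) -> FrozenSet[int]:
    # Weighted sampling without replacement (Efraimidis-Spirakis keys u ** (1/w)).
    keyed = sorted(pool, key=lambda f: rng.random() ** (1.0 / weights[f]), reverse=True)
    return frozenset(keyed[:k])


def layered_subset_search(n_features: int,
                          score: Callable[[FrozenSet[int]], float],
                          n_layers: int = 3,
                          subsets_per_layer: int = 200,
                          subset_size: int = 4,
                          keep_fraction: float = 0.5,
                          seed: int = 0) -> FrozenSet[int]:
    """Hypothetical multilayer randomized subset search; score must be >= 0."""
    rng = random.Random(seed)
    pool = list(range(n_features))
    weights = {f: 1.0 for f in pool}  # first layer: every feature equally likely
    best: FrozenSet[int] = frozenset()
    for _ in range(n_layers):
        size = min(subset_size, len(pool))
        # Generate many candidate subsets to keep the combinations diverse.
        candidates = [weighted_subset(pool, weights, size, rng)
                      for _ in range(subsets_per_layer)]
        scored = sorted(((score(s), s) for s in candidates),
                        key=lambda t: t[0], reverse=True)
        best = scored[0][1]  # topmost subset of the current layer
        # Re-weight each feature by the mean score of the subsets it appeared in.
        totals = {f: 0.0 for f in pool}
        counts = {f: 0 for f in pool}
        for sc, s in scored:
            for f in s:
                totals[f] += sc
                counts[f] += 1
        weights = {f: totals[f] / counts[f] + 1e-6 if counts[f] else 1e-6 for f in pool}
        # The next layer searches only among the higher-weighted features.
        keep = max(size, int(len(pool) * keep_fraction))
        pool = sorted(pool, key=lambda f: weights[f], reverse=True)[:keep]
    return best
```

With a toy score such as "fraction of the subset drawn from features {0, 1, 2, 3}", the search concentrates on those four features within a few layers. A real run would replace the toy score with, for example, cross-validated classifier accuracy on the expression matrix.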


Nagpal A, Singh V. Feature selection from high dimensional data based on iterative qualitative mutual information. Journal of Intelligent & Fuzzy Systems 2019. [DOI: 10.3233/jifs-181665] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 11/15/2022]

Dif N, Elberrichi Z. An Enhanced Recursive Firefly Algorithm for Informative Gene Selection. International Journal of Swarm Intelligence Research 2019. [DOI: 10.4018/ijsir.2019040102] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 11/09/2022]

Yan Y, Dai T, Yang M, Du X, Zhang Y, Zhang Y. Classifying Incomplete Gene-Expression Data: Ensemble Learning with Non-Pre-Imputation Feature Filtering and Best-First Search Technique. Int J Mol Sci 2018;19:ijms19113398. [PMID: 30380746 PMCID: PMC6274900 DOI: 10.3390/ijms19113398] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/11/2018] [Revised: 10/20/2018] [Accepted: 10/23/2018] [Indexed: 01/09/2023]

WiFi Indoor Localization with CSI Fingerprinting-Based Random Forest. Sensors 2018;18:s18092869. [PMID: 30200285 PMCID: PMC6164737 DOI: 10.3390/s18092869] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.9] [Received: 07/03/2018] [Revised: 08/27/2018] [Accepted: 08/29/2018] [Indexed: 11/17/2022]

Anand D, Pandey B, Pandey DK. Facioscapulohumeral Muscular Dystrophy Diagnosis Using Hierarchical Clustering Algorithm and K-Nearest Neighbor Based Methodology. International Journal of E-Health and Medical Communications 2017. [DOI: 10.4018/ijehmc.2017040103] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/09/2022]

A Novel Hybrid Feature Selection Model for Classification of Neuromuscular Dystrophies Using Bhattacharyya Coefficient, Genetic Algorithm and Radial Basis Function Based Support Vector Machine. Interdiscip Sci 2016;10:244-250. [PMID: 27637476 DOI: 10.1007/s12539-016-0183-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/07/2016] [Revised: 08/07/2016] [Accepted: 08/30/2016] [Indexed: 10/21/2022]

Banerjee S, Anura A, Chakrabarty J, Sengupta S, Chatterjee J. Identification and functional assessment of novel gene sets towards better understanding of dysplasia associated oral carcinogenesis. Gene Reports 2016. [DOI: 10.1016/j.genrep.2016.04.007] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 12/11/2022]

Moosa JM, Shakur R, Kaykobad M, Rahman MS. Gene selection for cancer classification with the help of bees. BMC Med Genomics 2016;9 Suppl 2:47. [PMID: 27510562 PMCID: PMC4980787 DOI: 10.1186/s12920-016-0204-7] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Indexed: 11/17/2023]

Mundra PA, Rajapakse JC. Gene and sample selection using T-score with sample selection. J Biomed Inform 2016;59:31-41. [DOI: 10.1016/j.jbi.2015.11.003] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 11/10/2014] [Revised: 10/13/2015] [Accepted: 11/04/2015] [Indexed: 10/22/2022]

Johnson GR, Li J, Shariff A, Rohde GK, Murphy RF. Automated Learning of Subcellular Variation among Punctate Protein Patterns and a Generative Model of Their Relation to Microtubules. PLoS Comput Biol 2015;11:e1004614. [PMID: 26624011 PMCID: PMC4704559 DOI: 10.1371/journal.pcbi.1004614] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 05/06/2015] [Accepted: 10/19/2015] [Indexed: 12/23/2022]

Hybrid Classification Techniques for Microarray Data. National Academy Science Letters-India 2015. [DOI: 10.1007/s40009-015-0390-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 01/14/2023]

Sachnev V, Saraswathi S, Niaz R, Kloczkowski A, Suresh S. Multi-class BCGA-ELM based classifier that identifies biomarkers associated with hallmarks of cancer. BMC Bioinformatics 2015;16:166. [PMID: 25986937 PMCID: PMC4448565 DOI: 10.1186/s12859-015-0565-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 02/02/2015] [Accepted: 03/31/2015] [Indexed: 12/05/2022]

Abstract

Background

Traditional cancer treatments have centered on cytotoxic drugs and general-purpose chemotherapy that may not be tailored to specific cancers. Identifying molecular markers related to different types of cancer might lead to the discovery of patient- and disease-specific drugs. This study uses microarray gene expression cancer data to identify biomarkers indicative of different types of cancer. Our aim is to provide a multi-class cancer classifier that can simultaneously differentiate between cancers and identify type-specific biomarkers by applying the Binary Coded Genetic Algorithm (BCGA) and a neural-network-based Extreme Learning Machine (ELM) algorithm.

Results

BCGA and ELM are combined to select a subset of genes present in the Global Cancer Mapping (GCM) data set. According to the literature, this set of candidate genes contains over 52 biomarkers related to multiple cancers, including APOA1, VEGFC, YWHAZ, B2M, EIF2S1, CCR9 and many other genes that have been associated with the hallmarks of cancer. BCGA-ELM was tested on several cancer data sets and the results were compared with those of other classification methods; BCGA-ELM matches or exceeds the other algorithms in accuracy. We were also able to show that over 50% of the genes selected by BCGA-ELM on the GCM data are cancer-related biomarkers.

Conclusions

We were able to simultaneously differentiate between 14 different types of cancers, using only 92 genes, to achieve a multi-class classification accuracy of 95.4% which is between 21.6% and 38% higher than other results in the literature for multi-class cancer classification. Our findings suggest that computational algorithms such as BCGA-ELM can facilitate biomarker-driven integrated cancer research that can lead to a detailed understanding of the complexities of cancer.
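In a binary-coded GA for gene selection, each chromosome can be read as a bit mask over the candidate genes (1 = gene included). The sketch below shows only that search loop, under stated assumptions: the ELM classifier is replaced by a caller-supplied `fitness` function, and the operators (binary tournament selection, one-point crossover, bit-flip mutation, single-individual elitism) are generic choices, not the settings reported by Sachnev et al. The name `binary_ga` is hypothetical.

```python
import random
from typing import Callable, List, Optional, Tuple

Mask = Tuple[int, ...]


def binary_ga(n_bits: int,
              fitness: Callable[[Mask], float],
              pop_size: int = 30,
              generations: int = 60,
              mutation_rate: Optional[float] = None,
              seed: int = 0) -> Tuple[Mask, float]:
    rng = random.Random(seed)
    p_mut = mutation_rate if mutation_rate is not None else 1.0 / n_bits
    # Each chromosome is a bit mask: bit g = 1 means gene g is in the subset.
    pop: List[Mask] = [tuple(rng.randint(0, 1) for _ in range(n_bits))
                       for _ in range(pop_size)]

    def tournament(scored: List[Tuple[float, Mask]]) -> Mask:
        a, b = rng.sample(scored, 2)  # binary tournament selection
        return a[1] if a[0] >= b[0] else b[1]

    for _ in range(generations):
        scored = sorted(((fitness(m), m) for m in pop), key=lambda t: t[0], reverse=True)
        next_pop: List[Mask] = [scored[0][1]]  # elitism: keep the best mask
        while len(next_pop) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            cut = rng.randrange(1, n_bits)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation toggles each bit with probability p_mut.
            child = tuple(bit ^ 1 if rng.random() < p_mut else bit for bit in child)
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=fitness)
    return best, fitness(best)
```

In a real pipeline, `fitness` might be the cross-validated accuracy of a classifier trained on the masked genes, minus a small penalty per selected gene to favour compact signatures.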

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-015-0565-5) contains supplementary material, which is available to authorized users.


Dessì N, Pes B, Cannas LM. An Evolutionary Approach for Balancing Effectiveness and Representation Level in Gene Selection. Journal of Information Technology Research 2015. [DOI: 10.4018/jitr.2015040102] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]

García V, Salvador Sánchez J. Mapping microarray gene expression data into dissimilarity spaces for tumor classification. Inf Sci (N Y) 2015. [DOI: 10.1016/j.ins.2014.09.064] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Indexed: 10/24/2022]

Classification of Microarray Data Using Kernel Fuzzy Inference System. International Scholarly Research Notices 2014;2014:769159. [PMID: 27433543 PMCID: PMC4897118 DOI: 10.1155/2014/769159] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 03/28/2014] [Revised: 05/28/2014] [Accepted: 06/12/2014] [Indexed: 12/02/2022]

A discrete wavelet based feature extraction and hybrid classification technique for microarray data analysis. ScientificWorldJournal 2014;2014:195470. [PMID: 25162043 PMCID: PMC4138760 DOI: 10.1155/2014/195470] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Received: 01/22/2014] [Revised: 06/20/2014] [Accepted: 07/02/2014] [Indexed: 11/18/2022]

Han F, Sun W, Ling QH. A novel strategy for gene selection of microarray data based on gene-to-class sensitivity information. PLoS One 2014;9:e97530. [PMID: 24844313 PMCID: PMC4028211 DOI: 10.1371/journal.pone.0097530] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Received: 12/06/2013] [Accepted: 04/21/2014] [Indexed: 11/19/2022]

Cai H, Ruan P, Ng M, Akutsu T. Feature weight estimation for gene selection: a local hyperlinear learning approach. BMC Bioinformatics 2014;15:70. [PMID: 24625071 PMCID: PMC4007530 DOI: 10.1186/1471-2105-15-70] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Received: 10/15/2013] [Accepted: 03/06/2014] [Indexed: 11/10/2022]

Wang H, Zhang H, Dai Z, Chen MS, Yuan Z. TSG: a new algorithm for binary and multi-class cancer classification and informative genes selection. BMC Med Genomics 2013;6 Suppl 1:S3. [PMID: 23445528 PMCID: PMC3552704 DOI: 10.1186/1755-8794-6-s1-s3] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Indexed: 12/29/2022]

Abstract

BACKGROUND

One of the challenges in classifying cancer tissue samples from gene expression data is to establish an effective method for selecting a parsimonious set of informative genes. The Top Scoring Pair (TSP), k-Top Scoring Pairs (k-TSP), Support Vector Machine (SVM), and Prediction Analysis of Microarrays (PAM) classifiers are four popular methods with comparable performance on multiple cancer datasets. SVM and PAM tend to use a large number of genes, whereas TSP and k-TSP always use an even number of genes. In addition, k-TSP selects distinct gene pairs by simply combining the top-ranking pairs, without considering that the gene set with the best discriminative power may not consist of those combined pairs. The k-TSP algorithm also requires the user to specify an upper bound on the number of gene pairs. Here we introduce a computational algorithm that addresses these problems: the Chi-square-statistic-based Top Scoring Genes (Chi-TSG) classifier, abbreviated TSG.
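For reference, the pair score that the TSP family is built on compares, for genes i and j, how often gene i is expressed below gene j in each class: score(i, j) = |P(X_i < X_j | class A) − P(X_i < X_j | class B)|. The sketch below computes that score for two classes and returns the best pair; `tsp_top_pair` is a hypothetical name, and the secondary tie-breaking score used in the published TSP classifier is not reproduced (the first pair found wins ties).

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def tsp_top_pair(X: Sequence[Sequence[float]],
                 y: Sequence[int]) -> Tuple[Tuple[int, int], float]:
    """Return the gene pair (i, j) with the largest TSP score, and that score.

    Score(i, j) = |P(X_i < X_j | class A) - P(X_i < X_j | class B)|,
    estimated from within-sample comparisons in each class.
    """
    classes = sorted(set(y))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    groups: List[List[Sequence[float]]] = [
        [row for row, label in zip(X, y) if label == c] for c in classes
    ]
    best_pair, best_score = (0, 1), -1.0
    for i, j in combinations(range(len(X[0])), 2):
        # Fraction of samples in each class where gene i is expressed below gene j.
        p = [sum(row[i] < row[j] for row in g) / len(g) for g in groups]
        score = abs(p[0] - p[1])
        if score > best_score:  # first pair wins ties
            best_pair, best_score = (i, j), score
    return best_pair, best_score
```

Because the score depends only on within-sample orderings, it is unchanged by any monotone transformation of the data. A new sample is then assigned to the class in which X_i < X_j was more frequent whenever x_i < x_j holds for that sample.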

RESULTS

The TSG classifier starts with the top two genes and sequentially adds genes to the candidate gene set to perform informative gene selection. The algorithm automatically reports the total number of informative genes selected by cross-validation. We provide the algorithm for both binary and multi-class cancer classification, and applied it to 9 binary and 10 multi-class gene expression datasets involving human cancers. The TSG classifier outperforms the TSP family classifiers by a large margin on most of the 19 datasets. In addition to improved accuracy, TSG shares all the advantages of the TSP family classifiers: easy interpretation, invariance to monotone transformations, frequent selection of a small number of informative genes that allows follow-up studies, and resistance to sampling variation owing to within-sample operations.
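The "start from the top two genes, add one gene at a time, let cross-validation choose how many to keep" procedure can be sketched as generic sequential forward selection. This is not the chi-square scoring of TSG, which the abstract does not spell out: `set_score` (the gene-set score used to pick the next gene) and `cv_accuracy` (the cross-validated accuracy used to choose the final size) are hypothetical callables supplied by the caller.

```python
from typing import Callable, FrozenSet, List


def forward_select(n_genes: int,
                   set_score: Callable[[FrozenSet[int]], float],
                   cv_accuracy: Callable[[FrozenSet[int]], float],
                   start: FrozenSet[int],
                   max_genes: int) -> FrozenSet[int]:
    current = frozenset(start)  # e.g., the top-scoring gene pair
    path: List[FrozenSet[int]] = [current]
    while len(current) < max_genes:
        candidates = [g for g in range(n_genes) if g not in current]
        if not candidates:
            break
        # Add the single gene whose inclusion gives the best gene-set score.
        best_gene = max(candidates, key=lambda g: set_score(current | {g}))
        current = current | {best_gene}
        path.append(current)
    # Report the set along the path with the best cross-validated accuracy;
    # on ties, prefer the smaller set.
    return max(path, key=lambda s: (cv_accuracy(s), -len(s)))
```

Recording the whole path and choosing its size afterwards mirrors the abstract's point that the number of informative genes is reported automatically rather than fixed in advance by the user.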

CONCLUSIONS

Redefining the scores for gene set and the classification rules in TSP family classifiers by incorporating the sample size information can lead to better selection of informative genes and classification accuracy. The resulting TSG classifier offers a useful tool for cancer classification based on numerical molecular data.


Zhang H, Wang H, Dai Z, Chen MS, Yuan Z. Improving accuracy for cancer classification with a new algorithm for genes selection. BMC Bioinformatics 2012;13:298. [PMID: 23148517 PMCID: PMC3562261 DOI: 10.1186/1471-2105-13-298] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.5] [Received: 04/09/2012] [Accepted: 09/24/2012] [Indexed: 12/21/2022]

Abstract

Background

Although the classification of cancer tissue samples based on gene expression data has advanced considerably in recent years, improving accuracy remains a major challenge. One challenge is to establish an effective method for selecting a parsimonious set of relevant genes. So far, most gene selection methods in the literature screen individual genes or pairs of genes without considering possible interactions among genes. Here we introduce a new computational method named the Binary Matrix Shuffling Filter (BMSF). It not only overcomes the difficulties associated with the search schemes of traditional wrapper methods and the overfitting problem in high-dimensional search spaces, but also takes potential gene interactions into account during gene selection. Coupled with a Support Vector Machine (SVM) for implementation, this method often selects a very small number of genes, which eases model interpretation.

Results

We applied our method to 9 two-class gene expression datasets involving human cancers. During gene selection, the set of genes kept in the model was recursively refined and repeatedly updated according to the effect of each gene on the contributions of the other genes to cancer classification. The small number of informative genes selected from each dataset led to significantly improved leave-one-out cross-validation (LOOCV) classification accuracy across all 9 datasets for multiple classifiers. The selected genes also generalize well: multiple commonly used classifiers achieved LOOCV accuracy equivalent to or much higher than that reported in the literature.
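The core idea of evaluating each gene while other genes are randomly present or absent can be sketched as follows. Assumptions, all labeled here and in the code: each iteration builds a random binary matrix whose columns (genes) each contain exactly half ones, each row (a random gene subset) is scored by a caller-supplied `evaluate` function standing in for SVM cross-validated accuracy, and a gene is kept when rows that include it score higher on average than rows that exclude it. The published BMSF uses a formal statistical comparison and its own stopping rule; `bmsf_sketch` is a hypothetical name.

```python
import random
from statistics import mean
from typing import Callable, FrozenSet, Iterable


def bmsf_sketch(genes: Iterable[int],
                evaluate: Callable[[FrozenSet[int]], float],
                n_rows: int = 100,
                max_iter: int = 10,
                seed: int = 0) -> FrozenSet[int]:
    """Hypothetical binary-matrix filter; n_rows must be >= 2."""
    rng = random.Random(seed)
    current = list(genes)
    for _ in range(max_iter):
        if len(current) <= 1:
            break
        # Binary matrix: each column holds exactly half ones, shuffled independently,
        # so each gene appears in half of the random sub-models.
        columns = {}
        for g in current:
            col = [1] * (n_rows // 2) + [0] * (n_rows - n_rows // 2)
            rng.shuffle(col)
            columns[g] = col
        rows = [frozenset(g for g in current if columns[g][r]) for r in range(n_rows)]
        scores = [evaluate(s) for s in rows]  # assumed: e.g., CV accuracy of a sub-model
        kept = []
        for g in current:
            included = [sc for sc, bit in zip(scores, columns[g]) if bit]
            excluded = [sc for sc, bit in zip(scores, columns[g]) if not bit]
            # Keep a gene only if sub-models containing it do better on average,
            # which averages its effect over many combinations of the other genes.
            if mean(included) > mean(excluded):
                kept.append(g)
        if not kept or set(kept) == set(current):
            break
        current = kept
    return frozenset(current)
```

Because every gene is judged across many random combinations of the others, its retained contribution is in effect adjusted for the joint effect of the remaining genes, which is the property the abstract emphasizes.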

Conclusions

A gene's contribution to binary cancer classification is better evaluated after adjusting for the joint effect of a large number of other genes. A computationally efficient scheme is provided to search the extensive feature space, including possible interactions among many genes. The performance of the algorithm on 9 datasets suggests that the accuracy of cancer classification can be improved by a large margin when the joint effects of many genes are considered.


Lazar C, Taminau J, Meganck S, Steenhoff D, Coletta A, Molter C, de Schaetzen V, Duque R, Bersini H, Nowé A. A survey on filter techniques for feature selection in gene expression microarray analysis. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2012;9:1106-19. [PMID: 22350210 DOI: 10.1109/tcbb.2012.33] [Citation(s) in RCA: 219] [Impact Index Per Article: 16.8] [Indexed: 05/20/2023]

Zhang JG, Li J, Tang W, Deng HW. Fusing Gene Interaction to Improve Disease Discrimination on Classification Analysis. Advances in Genetics 2012;1:1000102. [PMID: 23814698 DOI: 10.4172/age.1000102] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/09/2022]

Cancer classification based on microarray gene expression data using a principal component accumulation method. Sci China Chem 2011. [DOI: 10.1007/s11426-011-4263-5] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.7] [Indexed: 11/25/2022]

Dagliyan O, Uney-Yuksektepe F, Kavakli IH, Turkay M. Optimization based tumor classification from microarray gene expression data. PLoS One 2011;6:e14579. [PMID: 21326602 PMCID: PMC3033885 DOI: 10.1371/journal.pone.0014579] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.9] [Received: 07/21/2010] [Accepted: 12/23/2010] [Indexed: 11/20/2022]

Abstract

BACKGROUND

An important use of microarray data is classifying tumor types with respect to genes that are up- or down-regulated in specific cancer types. A number of algorithms have been proposed for such classification, and they usually require parameter optimization to obtain accurate results on a given type of data. It is also critical to find an optimal set of markers among the up- or down-regulated genes that can be used clinically to build assays for diagnosing specific cancer types or following their progression. In this paper, we employ a mixed-integer-programming-based classification algorithm, the hyper-box enclosure method (HBE), to classify several cancer types with a minimal set of predictor genes. This optimization-based method, which is a user-friendly and efficient classifier, may allow clinicians to diagnose and follow the progression of certain cancer types.

METHODOLOGY/PRINCIPAL FINDINGS

We apply the HBE algorithm to well-known data sets, such as leukemia, prostate cancer, diffuse large B-cell lymphoma (DLBCL) and small round blue cell tumors (SRBCT), to find predictor genes that can be used robustly and with high accuracy for diagnosis and prognosis. Our approach does not require any modification or parameter optimization for each data set. In addition, the information gain attribute evaluator, the Relief attribute evaluator and correlation-based feature selection are employed for gene selection. The results are compared with those of other studies, and the biological roles of the selected genes in the corresponding cancer types are described.
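Information gain, one of the gene-ranking criteria listed above, measures how much knowing a gene's (discretized) expression level reduces the entropy of the class label: IG = H(Y) − H(Y | X). WEKA's evaluator discretizes numeric attributes with its own procedure, which is not reproduced here; as a simplification, the sketch below scores a gene by the best single expression threshold. The function names are hypothetical.

```python
from collections import Counter
from math import log2
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def best_split_info_gain(values: Sequence[float], labels: Sequence[Hashable]) -> float:
    """Information gain of one gene, binarized at its best threshold (a simplification)."""
    base = entropy(labels)
    pairs = sorted(zip(values, labels), key=lambda t: t[0])
    best = 0.0
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # only split between distinct expression values
        left = [lab for _, lab in pairs[:k]]
        right = [lab for _, lab in pairs[k:]]
        conditional = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        best = max(best, base - conditional)
    return best
```

Ranking all genes by this score and keeping the top few gives a simple filter step ahead of a classifier such as HBE.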

CONCLUSIONS/SIGNIFICANCE

Overall, our algorithm performed better than the other algorithms reported in the literature and the classifiers in the WEKA data-mining package. Because it requires no parameter optimization and consistently achieves very high prediction rates on different types of data sets, the HBE method is an effective and consistent tool for predicting cancer type with a small number of gene markers.
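To make the hyper-box idea concrete, the sketch below encloses each class's training samples in one axis-aligned box (the per-gene minimum and maximum) and assigns a new sample to the class whose box contains it or lies closest. This is a deliberately simplified stand-in: HBE itself builds its boxes by solving a mixed-integer program, which this sketch does not attempt, and `fit_boxes`, `box_distance` and `predict` are hypothetical names.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

Box = Tuple[List[float], List[float]]  # (lower corner, upper corner)


def fit_boxes(X: Sequence[Sequence[float]], y: Sequence[Hashable]) -> Dict[Hashable, Box]:
    # One bounding box per class: per-gene min and max over that class's samples.
    boxes: Dict[Hashable, Box] = {}
    for row, label in zip(X, y):
        if label not in boxes:
            boxes[label] = (list(row), list(row))
            continue
        lo, hi = boxes[label]
        for d, v in enumerate(row):
            lo[d] = min(lo[d], v)
            hi[d] = max(hi[d], v)
    return boxes


def box_distance(x: Sequence[float], box: Box) -> float:
    lo, hi = box
    # Euclidean distance from x to the box; 0 when x lies inside it.
    return sum(max(l - v, 0.0, v - h) ** 2 for v, l, h in zip(x, lo, hi)) ** 0.5


def predict(x: Sequence[float], boxes: Dict[Hashable, Box]) -> Hashable:
    # Ties (e.g., x inside overlapping boxes) go to the class fitted first.
    return min(boxes, key=lambda label: box_distance(x, boxes[label]))
```

With only a handful of selected genes, each box is easy to read as a set of per-gene expression ranges, which is part of what makes box-based classifiers attractive for small marker panels.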


Gene and sample selection for cancer classification with support vectors based t-statistic. Neurocomputing 2010. [DOI: 10.1016/j.neucom.2010.02.025] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.3] [Indexed: 11/18/2022]

Mundra P, Rajapakse J. SVM-RFE With MRMR Filter for Gene Selection. IEEE Trans Nanobioscience 2010;9:31-7. [DOI: 10.1109/tnb.2009.2035284] [Citation(s) in RCA: 218] [Impact Index Per Article: 14.5] [Indexed: 11/08/2022]

Chuang LY, Ke CH, Chang HW, Yang CH. A Two-Stage Feature Selection Method for Gene Expression Data. OMICS: A Journal of Integrative Biology 2009;13:127-37. [DOI: 10.1089/omi.2008.0083] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.5] [Indexed: 11/13/2022]

The Impact of Gene Selection on Imbalanced Microarray Expression Data. Bioinformatics and Computational Biology 2009. [DOI: 10.1007/978-3-642-00727-9_25] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 02/12/2023]

Gadgil M. A Population Proportion approach for ranking differentially expressed genes. BMC Bioinformatics 2008;9:380. [PMID: 18801167 PMCID: PMC2566584 DOI: 10.1186/1471-2105-9-380] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 06/12/2008] [Accepted: 09/18/2008] [Indexed: 11/14/2022]

Su Z, Hong H, Fang H, Shi L, Perkins R, Tong W. Very Important Pool (VIP) genes--an application for microarray-based molecular signatures. BMC Bioinformatics 2008;9 Suppl 9:S9. [PMID: 18793473 PMCID: PMC2537560 DOI: 10.1186/1471-2105-9-s9-s9] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/10/2022]

Abstract

BACKGROUND

Advances in DNA microarray technology suggest that molecular signatures derived from microarray data will eventually be used in clinical settings and personalized medicine. Deriving biomarkers is a large step beyond hypothesis generation and demands considerably more stringent accuracy in identifying informative gene subsets that differentiate phenotypes. Because microarray data inherently contain far fewer samples and replicates than genes, informative genes must be identified before a classifier is constructed. However, improving the ability to identify differentiating genes remains a challenge in bioinformatics.

RESULTS

A new hybrid gene selection approach was investigated and tested on nine publicly available microarray datasets. The method identifies a Very Important Pool (VIP) of genes from the broad patterns of gene expression data. It follows a bagging sampling principle: the arrays are repeatedly re-sampled, the most informative genes are identified in each re-sampled set, and the frequency with which each gene is selected across these repetitions is used to identify the VIP genes. The putative informative genes are selected using two methods, the t-statistic and discriminatory analysis. With the t-statistic, informative genes are identified by their p-values. In the discriminatory analysis, disjoint Principal Component Analyses (PCAs) are conducted for each class of samples, and genes with high discrimination power (DP) are identified. The VIP gene selection approach was compared with the p-value ranking approach. The genes identified by the VIP method but not by the p-value ranking approach are also related to the disease investigated. More importantly, these genes are part of the pathways derived from the genes shared by both the VIP and p-value ranking methods. Moreover, binary classifiers built from these genes are statistically equivalent to those built from the top 50 p-value-ranked genes in distinguishing different types of samples.
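The resampling-and-frequency idea behind the VIP approach can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it covers only the t-statistic branch (the disjoint-PCA discrimination power branch is omitted), draws a stratified bootstrap of each class in every bag, ranks genes by the absolute pooled-variance t statistic, and keeps the genes whose selection frequency reaches a threshold. Within one bag every gene has the same degrees of freedom, so ordering by |t| matches ordering by two-sided p-value. The function names `pooled_t` and `vip_genes` and the parameters `top_k`, `n_bags` and `min_freq` are hypothetical.

```python
import math
import random
from collections import Counter


def pooled_t(a, b):
    """Two-sample Student t statistic with pooled variance (assumes len(a) + len(b) > 2)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    sp2 = ss / (na + nb - 2)                  # pooled variance
    se = math.sqrt(sp2 * (1 / na + 1 / nb))   # standard error of the mean difference
    if se == 0:
        # Degenerate resample with no spread: treat any mean difference as maximal.
        return 0.0 if ma == mb else math.copysign(math.inf, ma - mb)
    return (ma - mb) / se


def vip_genes(X, y, top_k=10, n_bags=200, min_freq=0.5, seed=0):
    """Hypothetical VIP-style selection.

    X: list of samples, each a list of expression values (one per gene).
    y: list of 0/1 class labels, one per sample.
    Returns the sorted indices of genes that fall in the per-bag top_k
    in at least a fraction min_freq of the bags.
    """
    rng = random.Random(seed)
    n_genes = len(X[0])
    idx0 = [i for i, label in enumerate(y) if label == 0]
    idx1 = [i for i, label in enumerate(y) if label == 1]
    counts = Counter()
    for _ in range(n_bags):
        # Stratified bootstrap (an assumption): resample each class with replacement.
        bag0 = [rng.choice(idx0) for _ in idx0]
        bag1 = [rng.choice(idx1) for _ in idx1]
        # Same degrees of freedom for every gene here, so |t| order equals p-value order.
        scores = sorted(
            ((abs(pooled_t([X[i][g] for i in bag0], [X[i][g] for i in bag1])), g)
             for g in range(n_genes)),
            reverse=True,
        )
        for _, g in scores[:top_k]:
            counts[g] += 1  # selection frequency across bags
    return sorted(g for g, c in counts.items() if c / n_bags >= min_freq)
```

On a toy matrix in which only the first two genes are shifted between classes, `vip_genes(X, y, top_k=2)` would be expected to return `[0, 1]`. The frequency threshold is what filters out genes that rank highly only by chance in a few resamples.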

CONCLUSION

The VIP gene selection approach could identify additional subsets of informative genes that would not always be selected by the p-value ranking method. These genes are likely to be additional true positives because they are part of pathways identified by the p-value ranking method and are expected to be related to the relevant biology. These additional genes derived from the VIP method therefore potentially provide valuable biological insights.

Jiang W, Li X, Rao S, Wang L, Du L, Li C, Wu C, Wang H, Wang Y, Yang B. Constructing disease-specific gene networks using pair-wise relevance metric: application to colon cancer identifies interleukin 8, desmin and enolase 1 as the central elements. BMC Systems Biology 2008;2:72. [PMID: 18691435 PMCID: PMC2535780 DOI: 10.1186/1752-0509-2-72] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.9] [Received: 12/12/2007] [Accepted: 08/10/2008] [Indexed: 12/11/2022]

Abstract

Background

With the advance of large-scale omics technologies, it is now feasible to reverse-engineer the underlying genetic networks that describe the complex interplay of molecular elements leading to complex diseases. Current networking approaches mainly focus on building genetic networks at large without probing the interaction mechanisms specific to a physiological or disease condition. The aim of this study was thus to develop a novel networking approach based on the relevance concept, which is well suited to revealing the integrative effects of multiple genes in the underlying genetic circuit of complex diseases.

Results

The approach started by identifying multiple disease pathways, collectively called a gene forest, consisting of the genes extracted from a decision forest constructed by supervised learning of genome-wide transcriptional profiles from patients and normal samples. Based on these newly identified disease mechanisms, a novel pair-wise relevance metric, the adjusted frequency value, was used to define the degree of genetic relationship between two molecular determinants. We applied the proposed method to a publicly available microarray dataset for colon cancer. The results demonstrated that the colon cancer-specific gene network captured the most important genetic interactions in several cellular processes, such as proliferation, apoptosis, differentiation, mitogenesis and immunity, which are known to be pivotal for tumourigenesis. Further analysis of the network's topological architecture identified three known hub cancer genes [interleukin 8 (IL8) (p ≈ 0), desmin (DES) (p = 2.71 × 10⁻⁶) and enolase 1 (ENO1) (p = 4.19 × 10⁻⁵)], while two novel hub genes [RNA binding motif protein 9 (RBM9) (p = 1.50 × 10⁻⁴) and ribosomal protein L30 (RPL30) (p = 1.50 × 10⁻⁴)] may define new central elements in the gene network specific to colon cancer. Gene Ontology (GO) based analysis of the colon cancer-specific gene network and of the sub-network consisting of three-way gene interactions suggested that tumourigenesis in colon cancer results from dysfunction in protein biosynthesis and in categories associated with the ribonucleoprotein complex, which is well supported by multiple lines of experimental evidence.
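The step from a gene forest to a gene network and its hubs can be sketched roughly as below. This is a simplified stand-in, not the paper's metric: each tree is represented only by the set of genes it uses, a pair's weight is the plain fraction of trees in which the two genes co-occur (the adjustment that defines the paper's adjusted frequency value is not reproduced here), edges are kept above a hypothetical `min_freq` cutoff, and hubs are ranked by node degree without the significance testing behind the reported p-values. The function names, the input format and the gene labels `G1` to `G5` are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def pair_frequencies(trees):
    """Fraction of trees in which each unordered gene pair co-occurs.

    trees: list of gene lists, one per decision tree (hypothetical input format).
    """
    counts = Counter()
    for genes in trees:
        for a, b in combinations(sorted(set(genes)), 2):
            counts[(a, b)] += 1
    return {pair: c / len(trees) for pair, c in counts.items()}


def build_network(freqs, min_freq):
    """Undirected adjacency sets, keeping only pairs with frequency >= min_freq."""
    adj = defaultdict(set)
    for (a, b), f in freqs.items():
        if f >= min_freq:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def hub_genes(adj, top_n):
    """Genes ranked by degree (ties broken alphabetically); returns the top_n."""
    return sorted(adj, key=lambda g: (-len(adj[g]), g))[:top_n]


# Toy gene forest with hypothetical gene labels.
trees = [["G1", "G2", "G3"], ["G1", "G2"], ["G1", "G3", "G4"], ["G1", "G5"]]
freqs = pair_frequencies(trees)
print(hub_genes(build_network(freqs, 0.5), 1))  # ['G1']
```

Lowering the cutoff admits weaker pairs and reshapes the degree ranking, so in practice the threshold (or a weighted degree) would need justification rather than being fixed arbitrarily.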

Conclusion

This study demonstrated that IL8, DES and ENO1 act as the central elements in colon cancer susceptibility, and that protein biosynthesis and ribosome-associated function categories largely account for colon cancer tumourigenesis. Thus, the newly developed relevance-based networking approach offers a powerful means to reverse-engineer disease-specific networks and is a promising tool for the systematic dissection of complex diseases.

Roberts PC. Gene expression microarray data analysis demystified. Biotechnology Annual Review 2008;14:29-61. [PMID: 18606359 DOI: 10.1016/s1387-2656(08)00002-1] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.2] [Indexed: 12/13/2022]

Support Vector Based T-Score for Gene Ranking. ACTA ACUST UNITED AC 2008. [DOI: 10.1007/978-3-540-88436-1_13] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 02/16/2023]