Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

Total Articles: 64 (from Reference Citation Analysis)
Article PDFs: 20
Cited by > 0: 48
Searched Name: Gene selection

Number	Citation Analysis
51	A centroid-based gene selection method for microarray data classification. J Theor Biol 2016;400:32-41. [PMID: 27056739 DOI: 10.1016/j.jtbi.2016.03.034] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Received: 12/30/2015] [Revised: 03/08/2016] [Accepted: 03/24/2016] [Indexed: 11/28/2022] Abstract: For classification problems based on microarray data, the data typically contains a large number of irrelevant and redundant features. In this paper, a new gene selection method is proposed to choose the best subset of features for microarray data with the irrelevant and redundant features removed. We formulate the selection problem as an L1-regularized optimization problem, based on a newly defined linear discriminant analysis criterion. Instead of calculating the mean of the samples, a kernel-based approach is used to estimate the class centroid to define both the between-class separability and the within-class compactness for the criterion. Theoretical analysis indicates that the global optimal solution of the L1-regularized criterion can be reached under a general condition, from which an efficient algorithm is derived for the feature selection problem with time complexity linear in the number of features and the number of samples. The experimental results on ten publicly available microarray datasets demonstrate that the proposed method performs effectively and competitively compared with state-of-the-art methods. Key Words: Class centroid; Classification; Gene selection; L1 regularization; Microarray data
52	The feature selection bias problem in relation to high-dimensional gene data. Artif Intell Med 2015;66:63-71. [PMID: 26674595 DOI: 10.1016/j.artmed.2015.11.001] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Received: 09/26/2014] [Revised: 09/14/2015] [Accepted: 11/03/2015] [Indexed: 10/22/2022] Abstract: OBJECTIVE: Feature selection is a technique widely used in data mining. The aim is to select the best subset of features relevant to the problem being considered. In this paper, we consider feature selection for the classification of gene datasets. Gene data is usually composed of just a few dozen objects described by thousands of features. For this kind of data, it is easy to find a model that fits the learning data. However, it is not easy to find one that will evaluate new data as well as it evaluates the learning data. This overfitting issue is well known as regards classification and regression, but it also applies to feature selection. METHODS AND MATERIALS: We address this problem and investigate its importance in an empirical study of four feature selection methods applied to seven high-dimensional gene datasets. We chose datasets that are well studied in the literature: colon cancer, leukemia and breast cancer. All the datasets are characterized by a significant number of features and the presence of exactly two decision classes. The feature selection methods used are ReliefF, minimum redundancy maximum relevance, support vector machine-recursive feature elimination and relaxed linear separability. RESULTS: Our main result reveals the existence of positive feature selection bias in all 28 experiments (7 datasets and 4 feature selection methods). Bias was calculated as the difference between validation and test accuracies and ranges from 2.6% to as much as 41.67%. The validation accuracy (biased accuracy) was calculated on the same dataset on which the feature selection was performed. The test accuracy was calculated for data that was not used for feature selection (by so-called external cross-validation). CONCLUSIONS: This work provides evidence that using the same dataset for feature selection and learning is not appropriate. We recommend using cross-validation for feature selection in order to reduce selection bias. Key Words: Convex and piecewise linear classifier; Feature selection bias; Gene selection; Microarray data; Support vector machine
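The bias defined in entry 52 (accuracy with features chosen on the full dataset minus accuracy with features chosen inside each cross-validation fold) can be reproduced on pure noise. The following is a minimal sketch, not the study's protocol: the mean-difference filter, the nearest-centroid classifier, leave-one-out splitting, the synthetic Gaussian data, and all function names are assumptions chosen to keep the example self-contained.

```python
import random

def select_features(rows, labels, k):
    """Rank features by absolute difference of class means; return the top-k indices."""
    pos = [r for r, y in zip(rows, labels) if y == 1]
    neg = [r for r, y in zip(rows, labels) if y == 0]

    def gap(j):
        return abs(sum(r[j] for r in pos) / len(pos) - sum(r[j] for r in neg) / len(neg))

    return sorted(range(len(rows[0])), key=gap, reverse=True)[:k]

def nearest_centroid_predict(train_rows, train_labels, x, features):
    """Predict the class whose centroid (restricted to `features`) is closest to x."""
    best_label, best_dist = None, float("inf")
    for label in (0, 1):
        members = [r for r, y in zip(train_rows, train_labels) if y == label]
        dist = sum((x[j] - sum(r[j] for r in members) / len(members)) ** 2 for j in features)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def loo_accuracy(rows, labels, k, select_inside_fold):
    """Leave-one-out accuracy; features are chosen per fold (external) or once on all data (biased)."""
    fixed = None if select_inside_fold else select_features(rows, labels, k)
    hits = 0
    for i in range(len(rows)):
        train_rows = rows[:i] + rows[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        feats = select_features(train_rows, train_labels, k) if select_inside_fold else fixed
        hits += nearest_centroid_predict(train_rows, train_labels, rows[i], feats) == labels[i]
    return hits / len(rows)

# Synthetic data with no real signal: 24 samples, 2000 noise "genes", two balanced classes.
rng = random.Random(0)
labels = [0, 1] * 12
rows = [[rng.gauss(0, 1) for _ in range(2000)] for _ in labels]

biased = loo_accuracy(rows, labels, 10, select_inside_fold=False)
external = loo_accuracy(rows, labels, 10, select_inside_fold=True)
print(f"biased={biased:.2f} external={external:.2f} bias={biased - external:.2f}")
```

Because the labels carry no information, any accuracy well above chance in the biased variant comes only from selecting features on the samples that are later used for evaluation.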
53	Gene set differential analysis of time course expression profiles via sparse estimation in functional logistic model with application to time-dependent biomarker detection. Biostatistics 2015;17:235-48. [PMID: 26420796 DOI: 10.1093/biostatistics/kxv037] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 12/18/2014] [Accepted: 08/31/2015] [Indexed: 12/17/2022] [Open] Abstract: High-throughput time course expression profiles have become available in the last decade due to developments in measurement techniques and devices. Functional data analysis, which treats smoothed curves instead of originally observed discrete data, is effective for time course expression profiles in terms of dimension reduction, robustness, and applicability to data measured at small and irregularly spaced time points. However, the statistical method of differential analysis for time course expression profiles has not been well established. We propose a functional logistic model based on elastic net regularization (F-Logistic) in order to identify the genes with dynamic alterations in case/control studies. We employ a mixed model as a smoothing method to obtain functional data; then F-Logistic is applied to time course profiles measured at small and irregularly spaced time points. We evaluate the performance of F-Logistic in comparison with another functional data approach, i.e. the functional ANOVA test (F-ANOVA), by applying the methods to real and synthetic time course data sets. The real data sets consist of the time course gene expression profiles for long-term effects of recombinant interferon β on disease progression in multiple sclerosis. F-Logistic distinguishes dynamic alterations, which cannot be found by competing approaches such as F-ANOVA, in case/control studies based on time course expression profiles. F-Logistic is effective for time-dependent biomarker detection, diagnosis, and therapy. Key Words: Functional data analysis; Gene expression; Gene selection; Regularization; Time series
54	A genetic filter for cancer classification on gene expression data. Biomed Mater Eng 2015;26 Suppl 1:S1993-2002. [PMID: 26405975 DOI: 10.3233/bme-151503] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 11/15/2022] Abstract: We present a new genetic filter to identify a predictive gene subset for cancer-type classification on gene expression profiles. This approach seeks not only to maximize the correlation between selected genes and cancer types but also to minimize the inter-correlation among selected genes. The proposed genetic filter was tested on well-known leukemia datasets, and significant improvement over previous work was obtained. Key Words: Gene selection; cancer classification; filter method; gene expression data; genetic algorithm
55	Systematic identification of multiple tumor types in microarray data based on hybrid differential evolution algorithm. Technol Health Care 2015:THC--1-THC1080. [PMID: 26444806 DOI: 10.3233/thc1080] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022] Abstract: Correct classification and prediction of tumor cells are essential for microarrays to construct a diagnostic system. Differential evolution (DE) is a powerful optimization algorithm, which has been widely used in many areas. However, the standard DE and most of its variants search in continuous space and cannot solve binary optimization problems directly. In this paper, a hybrid framework based on the binary DE algorithm and a silhouette filter is proposed to improve the searching ability to classify breast and leukemia cancers in microarray data for biomarker discovery. The study focuses on using the hybrid DE algorithm for gene selection and silhouette statistics as a discriminant function to classify multiple tumor types in microarray data. Distance metrics on silhouette statistics are also discussed with a view to high classification accuracy. Experimental results show that the hybrid method is effective in discriminating breast and leukemia cancer subtypes and in finding potential biomarkers for cancer diagnosis. Key Words: Gene selection; cancer classification; hybrid differential evolution; silhouette statistics
56	Interval-value Based Particle Swarm Optimization algorithm for cancer-type specific gene selection and sample classification. GENOMICS DATA 2015;5:46-50. [PMID: 26484222 PMCID: PMC4583628 DOI: 10.1016/j.gdata.2015.04.027] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 04/08/2015] [Revised: 04/27/2015] [Accepted: 04/29/2015] [Indexed: 11/26/2022] Abstract: Microarray technology allows simultaneous measurement of the expression levels of thousands of genes within a biological tissue sample. The fundamental power of microarrays lies in the ability to conduct parallel surveys of gene expression. The classification of tissue samples based on gene expression data is an important problem in the medical diagnosis of diseases such as cancer. In gene expression data, the number of genes is usually very high compared to the number of data samples. Thus the difficulty is that the data are of high dimensionality while the sample size is small. This research work addresses the problem by classifying the resultant dataset using existing algorithms such as Support Vector Machine (SVM), K-nearest neighbor (KNN) and Interval Valued Classification (IVC), and the improvised Interval Value based Particle Swarm Optimization (IVPSO) algorithm. The results show that the IVPSO algorithm outperformed the other algorithms under several performance evaluation functions. Key Words: Gene selection; Interval-value based Particle Swarm Optimization classification; Interval-value classification; Microarray; Particle swarm optimization; Tissue sample classification
57	Improving PLS-RFE based gene selection for microarray data classification. Comput Biol Med 2015;62:14-24. [PMID: 25912984 DOI: 10.1016/j.compbiomed.2015.04.011] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 11/29/2014] [Revised: 04/07/2015] [Accepted: 04/08/2015] [Indexed: 10/23/2022] Abstract: Gene selection plays a crucial role in constructing efficient classifiers for microarray data classification, since microarray data is characterized by high dimensionality and small sample sizes and contains irrelevant and redundant genes. In practical use, partial least squares-based gene selection approaches can obtain gene subsets of good quality, but are considerably time-consuming. In this paper, we propose to integrate partial least squares based recursive feature elimination (PLS-RFE) with two feature elimination schemes, simulated annealing and square root, respectively, to speed up the feature selection process. Inspired by the strategy of the annealing schedule, the two proposed approaches eliminate a number of features rather than the one least informative feature during each iteration, and the number of removed features decreases as the iteration proceeds. To verify the effectiveness and efficiency of the proposed approaches, we perform extensive experiments on six publicly available microarray datasets with three typical classifiers, including Naïve Bayes, K-Nearest-Neighbor and Support Vector Machine, and compare our approaches with ReliefF, PLS and PLS-RFE feature selectors in terms of classification accuracy and running time. Experimental results demonstrate that the two proposed approaches accelerate the feature selection process impressively without degrading the classification accuracy and obtain more compact feature subsets for both two-category and multi-category problems. Further experimental comparisons in feature subset consistency show that the proposed approach with the simulated annealing scheme not only has better time performance, but also obtains slightly better feature subset consistency than the one with the square root scheme. Key Words: Annealing schedule; Classification; Gene selection; Partial least squares; Recursive feature elimination; Sequential backward selection
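Entry 57 describes removing several features per iteration, with the count shrinking as elimination proceeds. The abstract does not give the exact schedules, so the sketch below assumes one plausible reading of the square root scheme: drop floor(sqrt(remaining)) features per round. The function name and the stopping rule are hypothetical, and the PLS-based ranking that decides which features to drop is not shown.

```python
import math

def sqrt_elimination_schedule(n_features, n_target):
    """Return how many features to drop in each round.

    Assumption (not taken from the paper): each round drops
    floor(sqrt(remaining)) features, capped so the final round
    lands exactly on n_target.
    """
    remaining = n_features
    steps = []
    while remaining > n_target:
        drop = min(max(1, math.isqrt(remaining)), remaining - n_target)
        steps.append(drop)
        remaining -= drop
    return steps

# Going from 2000 genes down to 50 takes far fewer rounds than one-at-a-time elimination.
schedule = sqrt_elimination_schedule(2000, 50)
print(len(schedule), "rounds instead of", 2000 - 50)
```

Each round would still require refitting the ranking model on the surviving features, so the saving comes from the reduced number of refits.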
58	Genetic Bee Colony (GBC) algorithm: A new gene selection method for microarray cancer classification. Comput Biol Chem 2015;56:49-60. [PMID: 25880524 DOI: 10.1016/j.compbiolchem.2015.03.001] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.8] [Received: 11/06/2014] [Revised: 03/15/2015] [Accepted: 03/15/2015] [Indexed: 01/06/2023] Abstract: Naturally inspired evolutionary algorithms have proven effective for solving feature selection and classification problems. Artificial Bee Colony (ABC) is a relatively new swarm intelligence method. In this paper, we propose a new hybrid gene selection method, namely the Genetic Bee Colony (GBC) algorithm. The proposed algorithm combines the use of a Genetic Algorithm (GA) with the Artificial Bee Colony (ABC) algorithm. The goal is to integrate the advantages of both algorithms. The proposed algorithm is applied to a microarray gene expression profile in order to select the most predictive and informative genes for cancer classification. In order to test the accuracy of the proposed algorithm, extensive experiments were conducted. Three binary microarray datasets are used: colon, leukemia, and lung. In addition, another three multi-class microarray datasets are used: SRBCT, lymphoma, and leukemia. Results of the GBC algorithm are compared with our recently proposed technique: mRMR combined with the Artificial Bee Colony algorithm (mRMR-ABC). We also compared the combination of mRMR with GA (mRMR-GA) and Particle Swarm Optimization (mRMR-PSO) algorithms. In addition, we compared the GBC algorithm with other related algorithms that have been recently published in the literature, using all benchmark datasets. The GBC algorithm shows superior performance as it achieved the highest classification accuracy along with the lowest average number of selected genes. This proves that the GBC algorithm is a promising approach for solving the gene selection problem in both binary and multi-class cancer classification. Key Words: ABC; Artificial Bee Colony; Cancer classification; Feature selection; Filter method; Gene expression profile; Gene selection; MRMR; Microarray
59	A fast gene selection method for multi-cancer classification using multiple support vector data description. J Biomed Inform 2014;53:381-9. [PMID: 25549938 DOI: 10.1016/j.jbi.2014.12.009] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Received: 07/04/2014] [Revised: 12/14/2014] [Accepted: 12/18/2014] [Indexed: 01/31/2023] Abstract: For cancer classification problems based on gene expression, the data usually has only a few dozen samples but thousands to tens of thousands of genes, which could include a large number of irrelevant genes. A robust feature selection algorithm is required to remove irrelevant genes and choose the informative ones. Support vector data description (SVDD) has been applied to gene selection for many years. However, SVDD cannot address problems with multiple classes since it only considers the target class. In addition, applying SVDD to gene selection is time-consuming. This paper proposes a novel fast feature selection method based on multiple SVDD and applies it to multi-class microarray data. A recursive feature elimination (RFE) scheme is introduced to iteratively remove irrelevant features, so the proposed method is called multiple SVDD-RFE (MSVDD-RFE). To make full use of all classes for a given task, MSVDD-RFE independently selects a relevant gene subset for each class. The final selected gene subset is the union of these relevant gene subsets. The effectiveness and accuracy of MSVDD-RFE are validated by experiments on five publicly available microarray datasets. Our proposed method is faster and more effective than other methods. Key Words: Gene expression data; Gene selection; Multi-class classification; Support vector data description; Support vector machine
60	Computerized system for recognition of autism on the basis of gene expression microarray data. Comput Biol Med 2014;56:82-8. [PMID: 25464350 DOI: 10.1016/j.compbiomed.2014.11.004] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 06/13/2014] [Revised: 10/24/2014] [Accepted: 11/02/2014] [Indexed: 12/28/2022] Abstract: The aim of this paper is to provide a means to recognize a case of autism using gene expression microarrays. The crucial task is to discover the most important genes which are strictly associated with autism. The paper presents an application of different methods of gene selection, to select the most representative input attributes for an ensemble of classifiers. The set of classifiers is responsible for distinguishing autism data from the reference class. Simultaneous application of a few gene selection methods enables analysis of the ill-conditioned gene expression matrix from different points of view. The results of selection combined with a genetic algorithm and SVM classifier have shown increased accuracy of autism recognition. Early recognition of autism is extremely important for treatment of children and increases the probability of their recovery and return to normal social communication. The results of this research can find practical application in early recognition of autism on the basis of gene expression microarray analysis. Key Words: Ensemble of classifiers; Gene expression microarray; Gene selection; Random forest; SVM
61	Meta-analysis based variable selection for gene expression data. Biometrics 2014;70:872-80. [PMID: 25196635 DOI: 10.1111/biom.12213] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 08/01/2013] [Revised: 05/01/2014] [Accepted: 05/01/2014] [Indexed: 11/28/2022] Abstract: Recent advances in biotechnology and its wide applications have led to the generation of many high-dimensional gene expression data sets that can be used to address similar biological questions. Meta-analysis plays an important role in summarizing and synthesizing scientific evidence from multiple studies. When the dimensions of datasets are high, it is desirable to incorporate variable selection into meta-analysis to improve model interpretation and prediction. To our knowledge, all existing methods conduct variable selection with meta-analyzed data in an "all-in-or-all-out" fashion, that is, a gene is either selected in all of the studies or not selected in any study. However, because data heterogeneity commonly exists in meta-analyzed data, including choices of biospecimens, study population, and measurement sensitivity, it is possible that a gene is important in some studies while unimportant in others. In this article, we propose a novel method called meta-lasso for variable selection with high-dimensional meta-analyzed data. Through a hierarchical decomposition on regression coefficients, our method not only borrows strength across multiple data sets to boost the power to identify important genes, but also keeps the selection flexibility among data sets to take into account data heterogeneity. We show that our method possesses gene selection consistency, that is, when the sample size of each data set is large, with high probability, our method can identify all important genes and remove all unimportant genes. Simulation studies demonstrate a good performance of our method. We applied our meta-lasso method to a meta-analysis of five cardiovascular studies. The analysis results are clinically meaningful. Key Words: Gene selection; High dimension; Meta-analysis; Weak oracle property
62	Genes selection comparative study in microarray data analysis. Bioinformation 2014;9:1019-22. [PMID: 24497729 PMCID: PMC3910358 DOI: 10.6026/97320630091019] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 11/23/2013] [Accepted: 12/16/2013] [Indexed: 11/23/2022] [Open] Abstract: In response to the rapid development of DNA microarray technologies, many algorithms for selecting differentially expressed genes have been developed, and various comparison studies of these algorithms have been done. However, it is not clear how these methods compare with each other, especially when different development tools are used. Here, we considered three commonly used approaches for selecting differentially expressed genes, namely Fold Change, T-test and SAM, using Bioinformatics Matlab Toolbox and R/BioConductor. We used two datasets, generated with Affymetrix technology, to present the results of the methods and software used in the gene selection process. The results, in terms of sensitivity and specificity, indicate that SAM performs better than Fold Change and T-test when using R/BioConductor, while no practical differences were observed between the three gene selection methods when using Bioinformatics Matlab Toolbox. In light of our results, the ROC curve shows that, on the one hand, R/BioConductor using SAM is favored for microarray selection compared to the other methods and, on the other hand, the results of the three studied gene selection methods using Bioinformatics Matlab Toolbox remain comparable for the two datasets used. Key Words: Bioinformatics Matlab Toolbox; Comparative Study; Gene selection; Microarray data; R/BioConductor
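Two of the three statistics compared in entry 62, fold change and the t-test, are simple enough to sketch per gene. The version below is an illustration only and is not the Matlab or R/BioConductor code used in the study: it assumes positive expression values on a linear scale, uses the Welch (unequal-variance) form of the t statistic as one common choice, and omits SAM. The function names are hypothetical.

```python
import math
from statistics import mean, variance

def log2_fold_change(treated, control):
    """log2 of the ratio of group means; assumes positive, linear-scale expression values."""
    return math.log2(mean(treated) / mean(control))

def welch_t(treated, control):
    """Welch t statistic; needs at least two samples per group and non-zero pooled spread."""
    se = math.sqrt(variance(treated) / len(treated) + variance(control) / len(control))
    return (mean(treated) - mean(control)) / se

# One gene measured in three treated and three control samples (made-up numbers).
treated = [4.0, 5.0, 6.0]
control = [1.0, 2.0, 3.0]
print(log2_fold_change(treated, control), welch_t(treated, control))
```

Fold change ignores within-group variability while the t statistic scales the difference by it, which is one reason the two rankings can disagree on small samples.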
63	Weighted Area Under the Receiver Operating Characteristic Curve and Its Application to Gene Selection. J R Stat Soc Ser C Appl Stat 2010;59:673-692. [PMID: 25125706 PMCID: PMC4129959 DOI: 10.1111/j.1467-9876.2010.00713.x] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 11/28/2022] Abstract: Partial area under the ROC curve (PAUC) has been proposed for gene selection in Pepe et al. (2003) and thereafter applied in real data analysis. It was noticed from empirical studies that this measure has several key weaknesses, such as an inability to reflect nonuniform weighting of different decision thresholds, resulting in large numbers of ties. We propose the weighted area under the ROC curve (WAUC) in this paper to address the problems associated with PAUC. Our proposed measure enjoys a greater flexibility to describe the discrimination accuracy of genes. Nonparametric and parametric estimation methods are introduced, including PAUC as a special case, along with theoretical properties of the estimators. We also provide a simple variance formula, yielding a novel variance estimator for nonparametric estimation of PAUC, which has proven challenging in previous work. The proposed methods permit sensitivity analyses, whereby the impact of differing weight functions on gene rankings may be assessed and results may be synthesized across weights. Simulations and re-analysis of two well-known microarray datasets illustrate the practical utility of WAUC. Key Words: Empirical distribution; Gene selection; Location-scale model; Partial area under the curve; Random threshold; Weighted area under the curve. Grants: P01 CA142538 (NCI NIH HHS); P30 AI050410 (NIAID NIH HHS); R01 CA094893 (NCI NIH HHS)
64	A hybrid approach for biomarker discovery from microarray gene expression data for cancer classification. Cancer Inform 2007;2:301-11. [PMID: 19458773 PMCID: PMC2675487] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/04/2022] [Open] Abstract: Microarrays allow researchers to monitor the gene expression patterns for tens of thousands of genes across a wide range of cellular responses, phenotypes and conditions. Selecting a small subset of discriminative genes from thousands of genes is important for accurate classification of diseases and phenotypes. Many methods have been proposed to find subsets of genes with maximum relevance and minimum redundancy, which can distinguish accurately between samples with different labels. Finding the minimum subset of relevant genes is often referred to as biomarker discovery. Two main approaches, filter and wrapper techniques, have been applied to biomarker discovery. In this paper, we conducted a comparative study of different biomarker discovery methods, including six filter methods and three wrapper methods. We then proposed a hybrid approach, FR-Wrapper, for biomarker discovery. The aim of this approach is to find an optimum balance between the precision of the biomarker discovery and the computation cost, by taking advantage of both the filter method's efficiency and the wrapper method's high accuracy. Our hybrid approach applies Fisher's ratio, a simple method that is easy to understand and implement, to filter out most of the irrelevant genes; then a wrapper method is employed to reduce the redundancy. The performance of the FR-Wrapper approach is evaluated over four widely used microarray datasets. Analysis of experimental results reveals that the hybrid approach can achieve the goal of maximum relevance with minimum redundancy. Key Words: Biomarker discovery; Cancer classification; Gene expression; Gene selection; Microarray
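The filter stage described in entry 64 can be illustrated with a small two-class example. The abstract does not state which form of Fisher's ratio FR-Wrapper uses, so this sketch assumes the common definition (squared difference of class means divided by the sum of class variances); the function names and toy data are hypothetical, and the wrapper stage that follows the filter is not shown.

```python
from statistics import mean, pvariance

def fisher_ratio(class_a, class_b):
    """Assumed form: (mean_a - mean_b)^2 / (var_a + var_b) for one gene."""
    gap = (mean(class_a) - mean(class_b)) ** 2
    spread = pvariance(class_a) + pvariance(class_b)
    if spread == 0:
        # Degenerate case: no within-class variation at all.
        return float("inf") if gap > 0 else 0.0
    return gap / spread

def top_genes_by_fisher_ratio(samples_a, samples_b, k):
    """samples_a / samples_b: lists of samples, each a list of per-gene values.

    Returns the indices of the k genes with the largest Fisher's ratio.
    """
    n_genes = len(samples_a[0])
    scores = [
        fisher_ratio([s[g] for s in samples_a], [s[g] for s in samples_b])
        for g in range(n_genes)
    ]
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)[:k]

# Toy data: gene 0 separates the classes, genes 1 and 2 do not.
samples_a = [[1.0, 5.0, 0.0], [1.2, 3.0, 0.1], [0.8, 4.0, -0.1]]
samples_b = [[3.0, 4.0, 0.0], [3.2, 5.0, 0.1], [2.8, 3.0, -0.1]]
print(top_genes_by_fisher_ratio(samples_a, samples_b, 1))
```

A per-gene score like this is cheap to compute but ignores redundancy between genes, which is the gap the wrapper stage is meant to close.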