251
Anand D, Pandey B, Pandey DK. Facioscapulohumeral Muscular Dystrophy Diagnosis Using Hierarchical Clustering Algorithm and K-Nearest Neighbor Based Methodology. INTERNATIONAL JOURNAL OF E-HEALTH AND MEDICAL COMMUNICATIONS 2017. [DOI: 10.4018/ijehmc.2017040103] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/09/2022]
Abstract
The genetic diagnosis of neuromuscular disorder is an active area of research. Microarrays are used to detect the changes in genes for the accurate diagnosis. Unfortunately, the number of genes in gene expression data is very large as compared to number of samples. The number of genes needs to be reduced for correct diagnosis. In the present paper, the authors have made an intelligent integrated model for clustering and diagnosis of neuromuscular diseases. Wilcoxon signed rank test is used to preselect the genes. K-means and hierarchical clustering algorithms with different distance metric are employed to cluster the genes. Three classifiers namely linear discriminant analysis, quadratic discriminant analysis and k-nearest neighbor are used. For the employment of integrated techniques, a balanced facioscapulohumeral muscular dystrophy dataset is taken. A comparative analysis of the above integrated algorithms is presented which demonstrate that the integration of cosine distance metric hierarchical clustering algorithm with k-nearest neighbor has given the best performance measures.
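The classification step described above can be sketched as a k-nearest-neighbor vote. This is a minimal illustration, not the authors' pipeline: the abstract pairs the cosine metric with hierarchical clustering, so reusing cosine distance inside the kNN vote is an assumption made here, and the function names and toy expression values are hypothetical.

```python
import math
from collections import Counter

def cosine_distance(u, v):
    """Return 1 - cosine similarity for two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # convention chosen here for all-zero profiles
    return 1.0 - dot / (norm_u * norm_v)

def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training samples closest to the query."""
    ranked = sorted(zip(train_x, train_y),
                    key=lambda pair: cosine_distance(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy profiles: two preselected genes per sample.
train_x = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.8]]
train_y = ["affected", "affected", "control", "control"]
print(knn_predict(train_x, train_y, [0.8, 0.1], k=3))
```

In a fuller version, the gene preselection and clustering steps would run first and the classifier would only see the reduced gene set.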
Affiliation(s)
- Divya Anand
  - Department of Computer Science and Engineering, Lovely Professional University, Phagwara, India
- Babita Pandey
  - Department of Computer Applications, Lovely Professional University, Phagwara, India

252
Profile forward regression screening for ultra-high dimensional semiparametric varying coefficient partially linear models. J MULTIVARIATE ANAL 2017. [DOI: 10.1016/j.jmva.2016.12.006] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Indexed: 11/24/2022]

253
Xu XL, Ren CX, Wu RC, Yan H. Sliced Inverse Regression With Adaptive Spectral Sparsity for Dimension Reduction. IEEE TRANSACTIONS ON CYBERNETICS 2017; 47:759-771. [PMID: 27076475 DOI: 10.1109/tcyb.2016.2526630] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]
Abstract
Dimension reduction is an important topic in pattern analysis and machine learning, and it has wide applications in feature representation and pattern classification. In the past two decades, sliced inverse regression (SIR) has attracted much research efforts due to its effectiveness and efficacy in dimension reduction. However, two drawbacks limit further applications of SIR. First, the computation complexity of SIR is usually high in the situation of high-dimensional data. Second, sparsity of projection subspace is not well mined for improving the feature selection and model interpretation abilities. This paper proposes to compute the SIR projection vectors in the spectral space, then an approximated regression solution can be obtained with a faster speed. Moreover, the adaptive lasso is used to attain a sparse and globally optimal solution, which is important in variable selection. To complete the robust pattern classification task with corruptions, a correntropy-based and class-wise regression model is designed in this paper. It takes a smooth penalty instead of sparsity constraint in the regression coefficients, and it can be conducted in class-wise, thus it is more flexible in practice. Extensive experiments are conducted by using some real and benchmark data sets, e.g., high-dimensional facial images and gene microarray data, to evaluate the new algorithms. The new proposals attain competitive results and are compared with other state-of-the-art methods.
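For orientation, the classical SIR estimator that this abstract builds on can be written as a generalized eigenproblem. This is the textbook formulation, not the spectral-space variant the paper proposes; the slice notation ($H$ slices of the response, slice proportions $\hat{p}_h$, slice means $\bar{x}_h$) is introduced here for illustration.

```latex
\hat{M} = \sum_{h=1}^{H} \hat{p}_h \,(\bar{x}_h - \bar{x})(\bar{x}_h - \bar{x})^{\top},
\qquad
\hat{M}\,\beta_j = \lambda_j\,\hat{\Sigma}_x\,\beta_j, \quad j = 1, \dots, d
```

Here $\hat{\Sigma}_x$ is the sample covariance of the predictors and the leading $d$ eigenvectors $\beta_j$ span the estimated projection subspace. Solving this system directly is what becomes costly when the number of predictors is large.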

254
Rohart F, Eslami A, Matigian N, Bougeard S, Lê Cao KA. MINT: a multivariate integrative method to identify reproducible molecular signatures across independent experiments and platforms. BMC Bioinformatics 2017; 18:128. [PMID: 28241739 PMCID: PMC5327533 DOI: 10.1186/s12859-017-1553-8] [Citation(s) in RCA: 49] [Impact Index Per Article: 6.1] [Received: 09/23/2016] [Accepted: 02/16/2017] [Indexed: 12/12/2022] Open
Abstract
Background: Molecular signatures identified from high-throughput transcriptomic studies often have poor reliability and fail to reproduce across studies. One solution is to combine independent studies into a single integrative analysis, additionally increasing sample size. However, the different protocols and technological platforms across transcriptomic studies produce unwanted systematic variation that strongly confounds the integrative analysis results. When studies aim to discriminate an outcome of interest, the common approach is a sequential two-step procedure; unwanted systematic variation removal techniques are applied prior to classification methods. Results: To limit the risk of overfitting and over-optimistic results of a two-step procedure, we developed a novel multivariate integration method, MINT, that simultaneously accounts for unwanted systematic variation and identifies predictive gene signatures with greater reproducibility and accuracy. In two biological examples on the classification of three human cell types and four subtypes of breast cancer, we combined high-dimensional microarray and RNA-seq data sets and MINT identified highly reproducible and relevant gene signatures predictive of a given phenotype. MINT led to superior classification and prediction accuracy compared to the existing sequential two-step procedures. Conclusions: MINT is a powerful approach and the first of its kind to solve the integrative classification framework in a single step by combining multiple independent studies. MINT is computationally fast as part of the mixOmics R CRAN package, available at http://www.mixOmics.org/mixMINT/ and http://cran.r-project.org/web/packages/mixOmics/.
Affiliation(s)
- Florian Rohart
  - The University of Queensland Diamantina Institute, The University of Queensland, Translational Research Institute, Brisbane, 4102, QLD, Australia
- Aida Eslami
  - Centre for Heart Lung Innovation, University of British Columbia, Vancouver, BC V6Z 1Y6, Canada
- Nicholas Matigian
  - The University of Queensland Diamantina Institute, The University of Queensland, Translational Research Institute, Brisbane, 4102, QLD, Australia
- Stéphanie Bougeard
  - French agency for food, environmental and occupational health safety (Anses), Department of Epidemiology, Ploufragan, 22440, France
- Kim-Anh Lê Cao
  - The University of Queensland Diamantina Institute, The University of Queensland, Translational Research Institute, Brisbane, 4102, QLD, Australia

255
Nearest shrunken centroids via alternative genewise shrinkages. PLoS One 2017; 12:e0171068. [PMID: 28199352 PMCID: PMC5310887 DOI: 10.1371/journal.pone.0171068] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 08/04/2016] [Accepted: 01/16/2017] [Indexed: 11/22/2022] Open
Abstract
Nearest shrunken centroids (NSC) is a popular classification method for microarray data. NSC calculates centroids for each class and “shrinks” the centroids toward 0 using soft thresholding. Future observations are then assigned to the class with the minimum distance between the observation and the (shrunken) centroid. Under certain conditions the soft shrinkage used by NSC is equivalent to a LASSO penalty. However, this penalty can produce biased estimates when the true coefficients are large. In addition, NSC ignores the fact that multiple measures of the same gene are likely to be related to one another. We consider several alternative genewise shrinkage methods to address the aforementioned shortcomings of NSC. Three alternative penalties were considered: the smoothly clipped absolute deviation (SCAD), the adaptive LASSO (ADA), and the minimax concave penalty (MCP). We also showed that NSC can be performed in a genewise manner. Classification methods were derived for each alternative shrinkage method or alternative genewise penalty, and the performance of each new classification method was compared with that of conventional NSC on several simulated and real microarray data sets. Moreover, we applied the geometric mean approach for the alternative penalty functions. In general the alternative (genewise) penalties required fewer genes than NSC. The geometric mean of the class-specific prediction accuracies was improved, as well as the overall predictive accuracy in some cases. These results indicate that these alternative penalties should be considered when using NSC.
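The soft-thresholding idea behind NSC can be sketched in a few lines. This is a simplified illustration under stated assumptions: the offsets of each class centroid from the overall mean are shrunk directly, without the per-gene standardization used in the full NSC method, and the function names, `delta` value, and toy data are hypothetical.

```python
def soft_threshold(value, delta):
    """Move value toward 0 by delta; values within delta of 0 become 0."""
    if value > delta:
        return value - delta
    if value < -delta:
        return value + delta
    return 0.0

def shrunken_centroids(samples, labels, delta):
    """Class centroids whose offsets from the overall mean are soft-thresholded."""
    n_genes = len(samples[0])
    overall = [sum(s[g] for s in samples) / len(samples) for g in range(n_genes)]
    centroids = {}
    for cls in sorted(set(labels)):
        rows = [s for s, y in zip(samples, labels) if y == cls]
        mean = [sum(r[g] for r in rows) / len(rows) for g in range(n_genes)]
        centroids[cls] = [overall[g] + soft_threshold(mean[g] - overall[g], delta)
                          for g in range(n_genes)]
    return centroids

def classify(x, centroids):
    """Assign x to the class with the smallest squared Euclidean distance."""
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

# Hypothetical toy data: gene 0 separates the classes, gene 1 is noise.
samples = [[2.0, 0.1], [2.2, -0.1], [-2.0, 0.0], [-2.2, 0.0]]
labels = ["A", "A", "B", "B"]
cents = shrunken_centroids(samples, labels, delta=0.5)
print(cents, classify([1.0, 0.3], cents))
```

A gene whose offset is shrunk to zero in every class no longer influences the classification, which is how the shrinkage doubles as gene selection. The alternative penalties named in the abstract (SCAD, adaptive LASSO, MCP) would replace `soft_threshold` with a different shrinkage rule.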

256
Gene selection for microarray cancer classification using a new evolutionary method employing artificial intelligence concepts. Genomics 2017; 109:91-107. [PMID: 28159597 DOI: 10.1016/j.ygeno.2017.01.004] [Citation(s) in RCA: 51] [Impact Index Per Article: 6.4] [Received: 08/05/2016] [Revised: 01/09/2017] [Accepted: 01/24/2017] [Indexed: 12/25/2022]
Abstract
Gene selection is a demanding task for microarray data analysis. The diverse complexity of different cancers makes this issue still challenging. In this study, a novel evolutionary method based on genetic algorithms and artificial intelligence is proposed to identify predictive genes for cancer classification. A filter method was first applied to reduce the dimensionality of feature space followed by employing an integer-coded genetic algorithm with dynamic-length genotype, intelligent parameter settings, and modified operators. The algorithmic behaviors including convergence trends, mutation and crossover rate changes, and running time were studied, conceptually discussed, and shown to be coherent with literature findings. Two well-known filter methods, Laplacian and Fisher score, were examined considering similarities, the quality of selected genes, and their influences on the evolutionary approach. Several statistical tests concerning choice of classifier, choice of dataset, and choice of filter method were performed, and they revealed some significant differences between the performance of different classifiers and filter methods over datasets. The proposed method was benchmarked upon five popular high-dimensional cancer datasets; for each, top explored genes were reported. Comparing the experimental results with several state-of-the-art methods revealed that the proposed method outperforms previous methods in DLBCL dataset.
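The filter stage mentioned above can be illustrated with the Fisher score, one of the two filters the abstract names. The sketch below uses the common textbook definition (between-class scatter over within-class scatter, per gene); the function names and toy matrix are hypothetical, and the genetic-algorithm stage that follows the filter is not shown.

```python
def fisher_score(values, labels):
    """Between-class over within-class variance for a single gene."""
    overall = sum(values) / len(values)
    between = 0.0
    within = 0.0
    for cls in set(labels):
        group = [v for v, y in zip(values, labels) if y == cls]
        mean = sum(group) / len(group)
        variance = sum((v - mean) ** 2 for v in group) / len(group)
        between += len(group) * (mean - overall) ** 2
        within += len(group) * variance
    return between / within if within > 0 else float("inf")

def top_genes(matrix, labels, m):
    """matrix[i][g] is sample i, gene g; return indices of the m best genes."""
    n_genes = len(matrix[0])
    scores = [fisher_score([row[g] for row in matrix], labels)
              for g in range(n_genes)]
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)[:m]

# Hypothetical toy matrix: gene 0 separates the classes, gene 1 does not.
matrix = [[1.0, 1.0], [2.0, 5.0], [5.0, 2.0], [6.0, 6.0]]
labels = [0, 0, 1, 1]
print(top_genes(matrix, labels, 1))
```

Only the genes returned by such a filter would be passed on to the evolutionary search, which keeps the search space manageable.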

257
Faisal S, Tutz G. Missing value imputation for gene expression data by tailored nearest neighbors. Stat Appl Genet Mol Biol 2017; 16:95-106. [DOI: 10.1515/sagmb-2015-0098] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Indexed: 11/15/2022]
Abstract
High dimensional data like gene expression and RNA-sequences often contain missing values. The subsequent analysis and results based on these incomplete data can suffer strongly from the presence of these missing values. Several approaches to imputation of missing values in gene expression data have been developed but the task is difficult due to the high dimensionality (number of genes) of the data. Here an imputation procedure is proposed that uses weighted nearest neighbors. Instead of using nearest neighbors defined by a distance that includes all genes the distance is computed for genes that are apt to contribute to the accuracy of imputed values. The method aims at avoiding the curse of dimensionality, which typically occurs if local methods as nearest neighbors are applied in high dimensional settings. The proposed weighted nearest neighbors algorithm is compared to existing missing value imputation techniques like mean imputation, KNNimpute and the recently proposed imputation by random forests. We use RNA-sequence and microarray data from studies on human cancer to compare the performance of the methods. The results from simulations as well as real studies show that the weighted distance procedure can successfully handle missing values for high dimensional data structures where the number of predictors is larger than the number of samples. The method typically outperforms the considered competitors.
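The idea of imputing from weighted neighbors, with distances restricted to a chosen subset of genes, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: the Gaussian kernel, the `bandwidth` parameter, and the way `use_cols` is supplied are all hypothetical choices, and missing values are represented as `None`.

```python
import math

def impute(data, row, col, use_cols, bandwidth=1.0):
    """Weighted average of column `col` over the rows where it is observed.

    Distances use only `use_cols`, a stand-in for the genes judged relevant
    to the value being imputed.
    """
    num = 0.0
    den = 0.0
    for i, other in enumerate(data):
        if i == row or other[col] is None:
            continue
        shared = [c for c in use_cols
                  if data[row][c] is not None and other[c] is not None]
        if not shared:
            continue
        dist = math.sqrt(sum((data[row][c] - other[c]) ** 2 for c in shared)
                         / len(shared))
        weight = math.exp(-(dist / bandwidth) ** 2)  # Gaussian kernel (assumed)
        num += weight * other[col]
        den += weight
    return num / den if den > 0 else None

# Hypothetical toy data: rows are samples, the last value of row 0 is missing.
data = [[1.0, 2.0, None],
        [1.0, 2.0, 10.0],
        [5.0, 6.0, 20.0]]
print(impute(data, 0, 2, use_cols=[0, 1]))
```

Because row 1 matches row 0 on the selected columns and row 2 is far away, the imputed value lands very close to 10. Restricting `use_cols` to a few informative genes is what keeps the distance meaningful when the number of genes is large.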

258
Aziz R, Verma C, Srivastava N. Dimension reduction methods for microarray data: a review. AIMS BIOENGINEERING 2017. [DOI: 10.3934/bioeng.2017.2.179] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Indexed: 11/18/2022] Open

259
A Meta-Review of Feature Selection Techniques in the Context of Microarray Data. BIOINFORMATICS AND BIOMEDICAL ENGINEERING 2017. [DOI: 10.1007/978-3-319-56148-6_3] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 12/18/2022]

260
Arun Kumar C, Sooraj M, Ramakrishnan S. A Comparative Performance Evaluation of Supervised Feature Selection Algorithms on Microarray Datasets. Procedia Computer Science 2017. [DOI: 10.1016/j.procs.2017.09.127] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Indexed: 11/16/2022]

261
Lottaz C, Gronwald W, Spang R, Engelmann JC. High-Dimensional Profiling for Computational Diagnosis. Methods Mol Biol 2017; 1526:205-229. [PMID: 27896744 DOI: 10.1007/978-1-4939-6613-4_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/06/2023]
Abstract
New technologies allow for high-dimensional profiling of patients. For instance, genome-wide gene expression analysis in tumors or in blood is feasible with microarrays, if all transcripts are known, or even without this restriction using high-throughput RNA sequencing. Other technologies like NMR finger printing allow for high-dimensional profiling of metabolites in blood or urine. Such technologies for high-dimensional patient profiling represent novel possibilities for molecular diagnostics. In clinical profiling studies, researchers aim to predict disease type, survival, or treatment response for new patients using high-dimensional profiles. In this process, they encounter a series of obstacles and pitfalls. We review fundamental issues from machine learning and recommend a procedure for the computational aspects of a clinical profiling study.
Affiliation(s)
- Claudio Lottaz
  - Institute of Functional Genomics, University of Regensburg, Regensburg, Germany
- Wolfram Gronwald
  - Institute of Functional Genomics, University of Regensburg, Regensburg, Germany
- Rainer Spang
  - Institute of Functional Genomics, University of Regensburg, Regensburg, Germany
- Julia C Engelmann
  - Institute of Functional Genomics, University of Regensburg, Regensburg, Germany

262
Aziz R, Verma C, Srivastava N. Dimension reduction methods for microarray data: a review. AIMS BIOENGINEERING 2017. [DOI: 10.3934/bioeng.2017.1.179] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Indexed: 11/18/2022] Open

263
Gangeh MJ, Zarkoob H, Ghodsi A. Fast and Scalable Feature Selection for Gene Expression Data Using Hilbert-Schmidt Independence Criterion. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2017; 14:167-181. [PMID: 28182548 DOI: 10.1109/tcbb.2016.2631164] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Indexed: 06/06/2023]
Abstract
GOAL In computational biology, selecting a small subset of informative genes from microarray data continues to be a challenge due to the presence of thousands of genes. This paper aims at quantifying the dependence between gene expression data and the response variables and to identifying a subset of the most informative genes using a fast and scalable multivariate algorithm. METHODS A novel algorithm for feature selection from gene expression data was developed. The algorithm was based on the Hilbert-Schmidt independence criterion (HSIC), and was partly motivated by singular value decomposition (SVD). RESULTS The algorithm is computationally fast and scalable to large datasets. Moreover, it can be applied to problems with any type of response variables including, biclass, multiclass, and continuous response variables. The performance of the proposed algorithm in terms of accuracy, stability of the selected genes, speed, and scalability was evaluated using both synthetic and real-world datasets. The simulation results demonstrated that the proposed algorithm effectively and efficiently extracted stable genes with high predictive capability, in particular for datasets with multiclass response variables. CONCLUSION/SIGNIFICANCE The proposed method does not require the whole microarray dataset to be stored in memory, and thus can easily be scaled to large datasets. This capability is an important attribute in big data analytics, where data can be large and massively distributed.
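For reference, the dependence measure named in this abstract has a standard empirical estimator. The form below is the usual biased HSIC estimate from $n$ paired samples; the kernel matrices $K$ (on expression profiles) and $L$ (on responses) are notation introduced here, and the paper's own scalable algorithm may organize the computation differently.

```latex
\widehat{\mathrm{HSIC}}(X, Y) = \frac{1}{(n-1)^2}\,\operatorname{tr}(K H L H),
\qquad
H = I_n - \tfrac{1}{n}\,\mathbf{1}\mathbf{1}^{\top},
\quad K_{ij} = k(x_i, x_j), \quad L_{ij} = l(y_i, y_j)
```

Larger values indicate stronger dependence between the selected genes and the response. Because the choice of the kernel $l$ is free, the same criterion covers biclass, multiclass, and continuous responses.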

264
Yuan Y, Shi Y, Li C, Kim J, Cai W, Han Z, Feng DD. DeepGene: an advanced cancer type classifier based on deep learning and somatic point mutations. BMC Bioinformatics 2016; 17:476. [PMID: 28155641 PMCID: PMC5259816 DOI: 10.1186/s12859-016-1334-9] [Citation(s) in RCA: 62] [Impact Index Per Article: 6.9] [Indexed: 02/07/2023] Open
Abstract
BACKGROUND With the developments of DNA sequencing technology, large amounts of sequencing data have become available in recent years and provide unprecedented opportunities for advanced association studies between somatic point mutations and cancer types/subtypes, which may contribute to more accurate somatic point mutation based cancer classification (SMCC). However in existing SMCC methods, issues like high data sparsity, small volume of sample size, and the application of simple linear classifiers, are major obstacles in improving the classification performance. RESULTS To address the obstacles in existing SMCC studies, we propose DeepGene, an advanced deep neural network (DNN) based classifier, that consists of three steps: firstly, the clustered gene filtering (CGF) concentrates the gene data by mutation occurrence frequency, filtering out the majority of irrelevant genes; secondly, the indexed sparsity reduction (ISR) converts the gene data into indexes of its non-zero elements, thereby significantly suppressing the impact of data sparsity; finally, the data after CGF and ISR is fed into a DNN classifier, which extracts high-level features for accurate classification. Experimental results on our curated TCGA-DeepGene dataset, which is a reformulated subset of the TCGA dataset containing 12 selected types of cancer, show that CGF, ISR and DNN all contribute in improving the overall classification performance. We further compare DeepGene with three widely adopted classifiers and demonstrate that DeepGene has at least 24% performance improvement in terms of testing accuracy. CONCLUSIONS Based on deep learning and somatic point mutation data, we devise DeepGene, an advanced cancer type classifier, which addresses the obstacles in existing SMCC studies. 
Experiments indicate that DeepGene outperforms three widely adopted existing classifiers, which is mainly attributed to its deep learning module that is able to extract the high level features between combinatorial somatic point mutations and cancer types.
Affiliation(s)
- Yuchen Yuan
  - School of Information Technologies, The University of Sydney, Darlington, NSW, 2008, Australia; Key Laboratory of Systems Biomedicine, Shanghai Center for Systems Biomedicine, Shanghai Jiaotong University, Shanghai, 200240, China
- Yi Shi
  - Key Laboratory of Systems Biomedicine, Shanghai Center for Systems Biomedicine, Shanghai Jiaotong University, Shanghai, 200240, China
- Changyang Li
  - School of Information Technologies, The University of Sydney, Darlington, NSW, 2008, Australia
- Jinman Kim
  - School of Information Technologies, The University of Sydney, Darlington, NSW, 2008, Australia
- Weidong Cai
  - School of Information Technologies, The University of Sydney, Darlington, NSW, 2008, Australia
- Zeguang Han
  - Key Laboratory of Systems Biomedicine, Shanghai Center for Systems Biomedicine, Shanghai Jiaotong University, Shanghai, 200240, China
- David Dagan Feng
  - School of Information Technologies, The University of Sydney, Darlington, NSW, 2008, Australia; Key Laboratory of Systems Biomedicine, Shanghai Center for Systems Biomedicine, Shanghai Jiaotong University, Shanghai, 200240, China

265
Ganesh Kumar P, Kavitha MS, Ahn BC. Automated Detection of Cancer Associated Genes Using a Combined Fuzzy-Rough-Set-Based F-Information and Water Swirl Algorithm of Human Gene Expression Data. PLoS One 2016; 11:e0167504. [PMID: 27936033 PMCID: PMC5148587 DOI: 10.1371/journal.pone.0167504] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/22/2016] [Accepted: 11/15/2016] [Indexed: 11/22/2022] Open
Abstract
This study describes a novel approach to reducing the challenges of highly nonlinear multiclass gene expression values for cancer diagnosis. To build a fruitful system for cancer diagnosis, in this study, we introduced two levels of gene selection such as filtering and embedding for selection of potential genes and the most relevant genes associated with cancer, respectively. The filter procedure was implemented by developing a fuzzy rough set (FR)-based method for redefining the criterion function of f-information (FI) to identify the potential genes without discretizing the continuous gene expression values. The embedded procedure is implemented by means of a water swirl algorithm (WSA), which attempts to optimize the rule set and membership function required to classify samples using a fuzzy-rule-based multiclassification system (FRBMS). Two novel update equations are proposed in WSA, which have better exploration and exploitation abilities while designing a self-learning FRBMS. The efficiency of our new approach was evaluated on 13 multicategory and 9 binary datasets of cancer gene expression. Additionally, the performance of the proposed FRFI-WSA method in designing an FRBMS was compared with existing methods for gene selection and optimization such as genetic algorithm (GA), particle swarm optimization (PSO), and artificial bee colony algorithm (ABC) on all the datasets. In the global cancer map with repeated measurements (GCM_RM) dataset, the FRFI-WSA showed the smallest number of 16 most relevant genes associated with cancer using a minimal number of 26 compact rules with the highest classification accuracy (96.45%). In addition, the statistical validation used in this study revealed that the biological relevance of the most relevant genes associated with cancer and their linguistics detected by the proposed FRFI-WSA approach are better than those in the other methods. 
The simple interpretable rules with most relevant genes and effectively classified samples suggest that the proposed FRFI-WSA approach is reliable for classification of an individual’s cancer gene expression data with high precision and therefore it could be helpful for clinicians as a clinical decision support system.
Affiliation(s)
- Muthu Subash Kavitha
  - Department of Computer Vision and Image Processing, School of Electronics Engineering, Kyungpook National University, Daegu, South Korea
- Byeong-Cheol Ahn
  - Department of Nuclear Medicine, Kyungpook National University School of Medicine and Hospital, Daegu, South Korea

266
Yang A, Jiang X, Liu P, Lin J. Sparse Bayesian multinomial probit regression model with correlation prior for high-dimensional data classification. Stat Probab Lett 2016. [DOI: 10.1016/j.spl.2016.08.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]

267
Zeng B, Wen XM, Zhu L. A link-free sparse group variable selection method for single-index model. J Appl Stat 2016. [DOI: 10.1080/02664763.2016.1254731] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 10/20/2022]
Affiliation(s)
- Bilin Zeng
  - Department of Mathematics, California State University, Bakersfield, CA, USA
- Xuerong Meggie Wen
  - Department of Mathematics and Statistics, Missouri University of Science and Technology, Rolla, MO, USA
- Lixing Zhu
  - Department of Mathematics, Hong Kong Baptist University, Hong Kong, People's Republic of China

268
Devi Arockia Vanitha C, Devaraj D, Venkatesulu M. Multiclass cancer diagnosis in microarray gene expression profile using mutual information and Support Vector Machine. INTELL DATA ANAL 2016. [DOI: 10.3233/ida-150203] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 01/08/2023]
Affiliation(s)
- D. Devaraj
  - Department of Electrical and Electronics Engineering, Kalasalingam Academy of Research and Education, Krishnankoil, Tamil Nadu, India
- M. Venkatesulu
  - Department of Computer Applications, Kalasalingam Academy of Research and Education, Krishnankoil, Tamil Nadu, India

270
Ammar M, Bouaziz S, Alimi AM, Abraham A. Multi-agent architecture for Multi‐objective optimization of Flexible Neural Tree. Neurocomputing 2016. [DOI: 10.1016/j.neucom.2016.06.019] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/16/2022]

271
Abstract
Developing improved approaches for diagnosis, treatment, and prevention of diseases is a major goal of biomedical research. Therefore, the discovery of biomarker signatures from high-throughput "omics" data is an active research topic in the field of bioinformatics and systems medicine. A major issue is the low reproducibility and the limited biological interpretability of candidate biomarker signatures identified from high-throughput data. This impedes the use of discovered biomarker signatures into clinical applications. Currently, much focus is placed on developing strategies to improve reproducibility and interpretability. Researchers have fruitfully started to incorporate prior knowledge derived from pathways and molecular networks into the process of biomarker identification. In this chapter, after giving a general introduction to the problem of disease classification and biomarker discovery, we will review two types of network-assisted approaches: (1) approaches inferring activity scores for specific pathways which are subsequently used for classification and (2) approaches identifying subnetworks or modules of molecular networks by differential network analysis which can serve as biomarker signatures.

272
Intraoperative Diagnosis Support Tool for Serous Ovarian Tumors Based on Microarray Data Using Multicategory Machine Learning. Int J Gynecol Cancer 2016; 26:104-13. [PMID: 26512784 DOI: 10.1097/igc.0000000000000566] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 12/13/2022] Open
Abstract
OBJECTIVES Serous borderline ovarian tumors (SBOTs) are a subtype of serous ovarian carcinoma with atypical proliferation. Frozen-section diagnosis has been used as an intraoperative diagnosis tool in supporting the fertility-sparing surgery by diagnosing SBOTs with accuracy of 48% to 79%. Using DNA microarray technology, we designed multicategory classification models to support frozen-section diagnosis within 30 minutes. MATERIALS AND METHODS We systematically evaluated 6 machine learning algorithms and 3 feature selection methods using 5-fold cross-validation and a grid search on microarray data obtained from the National Center for Biotechnology Information. To validate the models and selected biomarkers, expression profiles were analyzed in tissue samples obtained from the Yonsei University College of Medicine. RESULTS The best accuracy of the optimal machine learning model was 97.3%. In addition, 5 features, including the expression of the putative biomarkers SNTN and AOX1, were selected to differentiate between normal, SBOT, and serous ovarian carcinoma groups. Different expression levels of SNTN and AOX1 were validated by real-time quantitative reverse-transcription polymerase chain reaction, Western blotting, and immunohistochemistry. A multinomial logistic regression model using SNTN and AOX1 alone was used to construct a simple-to-use equation that gave a diagnostic test accuracy of 91.9%. CONCLUSIONS We identified 2 biomarkers, SNTN and AOX1, that are likely involved in the pathogenesis and progression of ovarian tumors. An accurate diagnosis of ovarian tumor subclasses by application of the equation in conjunction with expression analysis of SNTN and AOX1 would offer a new accurate diagnosis tool in conjunction with frozen-section diagnosis within 30 minutes.

273
Khondoker M, Dobson R, Skirrow C, Simmons A, Stahl D. A comparison of machine learning methods for classification using simulation with multiple real data examples from mental health studies. Stat Methods Med Res 2016; 25:1804-1823. [PMID: 24047600 PMCID: PMC5081132 DOI: 10.1177/0962280213502437] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Indexed: 11/16/2022]
Abstract
BACKGROUND Recent literature on the comparison of machine learning methods has raised questions about the neutrality, unbiasedness and utility of many comparative studies. Reporting of results on favourable datasets and sampling error in the estimated performance measures based on single samples are thought to be the major sources of bias in such comparisons. Better performance in one or a few instances does not necessarily imply so on an average or on a population level and simulation studies may be a better alternative for objectively comparing the performances of machine learning algorithms. METHODS We compare the classification performance of a number of important and widely used machine learning algorithms, namely the Random Forests (RF), Support Vector Machines (SVM), Linear Discriminant Analysis (LDA) and k-Nearest Neighbour (kNN). Using massively parallel processing on high-performance supercomputers, we compare the generalisation errors at various combinations of levels of several factors: number of features, training sample size, biological variation, experimental variation, effect size, replication and correlation between features. RESULTS For smaller number of correlated features, number of features not exceeding approximately half the sample size, LDA was found to be the method of choice in terms of average generalisation errors as well as stability (precision) of error estimates. SVM (with RBF kernel) outperforms LDA as well as RF and kNN by a clear margin as the feature set gets larger provided the sample size is not too small (at least 20). The performance of kNN also improves as the number of features grows and outplays that of LDA and RF unless the data variability is too high and/or effect sizes are too small. RF was found to outperform only kNN in some instances where the data are more variable and have smaller effect sizes, in which cases it also provide more stable error estimates than kNN and LDA. 
Applications to a number of real datasets supported the findings from the simulation study.
Affiliation(s)
- Mizanur Khondoker
- King's College London, Institute of Psychiatry, Department of Biostatistics, London, UK; King's College London, Institute of Psychiatry, NIHR Biomedical Research Centre for Mental Health at the South London and Maudsley NHS Foundation Trust, London, UK
| | - Richard Dobson
- King's College London, Institute of Psychiatry, NIHR Biomedical Research Centre for Mental Health at the South London and Maudsley NHS Foundation Trust, London, UK; King's College London, Institute of Psychiatry, NIHR Biomedical Research Unit for Dementia at the South London and Maudsley NHS Foundation Trust, London, UK
| | - Caroline Skirrow
- King's College London, Institute of Psychiatry, MRC Social, Genetic and Developmental Psychiatry Centre, UK
| | - Andrew Simmons
- King's College London, Institute of Psychiatry, NIHR Biomedical Research Centre for Mental Health at the South London and Maudsley NHS Foundation Trust, London, UK; King's College London, Institute of Psychiatry, NIHR Biomedical Research Unit for Dementia at the South London and Maudsley NHS Foundation Trust, London, UK
| | - Daniel Stahl
- King's College London, Institute of Psychiatry, Department of Biostatistics, London, UK
|
274
|
|
275
|
Dong K, Zhao H, Tong T, Wan X. NBLDA: negative binomial linear discriminant analysis for RNA-Seq data. BMC Bioinformatics 2016; 17:369. [PMID: 27623864 PMCID: PMC5022247 DOI: 10.1186/s12859-016-1208-1] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/11/2015] [Accepted: 08/24/2016] [Indexed: 12/16/2022] Open
Abstract
BACKGROUND RNA-sequencing (RNA-Seq) has become a powerful technology to characterize gene expression profiles because it is more accurate and comprehensive than microarrays. Although statistical methods that have been developed for microarray data can be applied to RNA-Seq data, they are not ideal due to the discrete nature of RNA-Seq data. The Poisson distribution and negative binomial distribution are commonly used to model count data. Recently, Witten (Annals Appl Stat 5:2493-2518, 2011) proposed a Poisson linear discriminant analysis for RNA-Seq data. The Poisson assumption may not be as appropriate as the negative binomial distribution when biological replicates are available and in the presence of overdispersion (i.e., when the variance is larger than or equal to the mean). However, it is more complicated to model negative binomial variables because they involve a dispersion parameter that needs to be estimated. RESULTS In this paper, we propose a negative binomial linear discriminant analysis for RNA-Seq data. By Bayes' rule, we construct the classifier by fitting a negative binomial model, and propose some plug-in rules to estimate the unknown parameters in the classifier. The relationship between the negative binomial classifier and the Poisson classifier is explored, with a numerical investigation of the impact of dispersion on the discriminant score. Simulation results show the superiority of our proposed method. We also analyze two real RNA-Seq data sets to demonstrate the advantages of our method in real-world applications. CONCLUSIONS We have developed a new classifier using the negative binomial model for RNA-seq data classification. Our simulation results show that our proposed classifier has a better performance than existing works. The proposed classifier can serve as an effective tool for classifying RNA-seq data. 
Based on the comparison results, we have provided some guidelines for scientists to decide which method should be used in the discriminant analysis of RNA-Seq data. R code is available at http://www.comp.hkbu.edu.hk/~xwan/NBLDA.R or https://github.com/yangchadam/NBLDA.
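The classifier described in this abstract can be illustrated with a minimal Python sketch of a Bayes rule built on a negative binomial likelihood. This is not the authors' implementation (their R code is at the URLs above): the function names, the mean/dispersion parameterization (variance = mu + phi * mu^2), and the assumption that class means, dispersions and priors are already known are all illustrative choices, whereas the paper proposes plug-in estimators for them.

```python
import math

def nb_log_pmf(x, mu, phi):
    """Log pmf of a negative binomial with mean mu and dispersion phi.

    Assumed parameterization: variance = mu + phi * mu**2, so phi > 0
    means the variance exceeds the mean (overdispersion).
    """
    r = 1.0 / phi  # "size" parameter
    return (math.lgamma(x + r) - math.lgamma(r) - math.lgamma(x + 1)
            + r * math.log(r / (r + mu)) + x * math.log(mu / (r + mu)))

def poisson_log_pmf(x, mu):
    """Log pmf of a Poisson with mean mu, for comparison."""
    return x * math.log(mu) - mu - math.lgamma(x + 1)

def classify(counts, class_means, phi, priors):
    """Return the index of the class with the highest Bayes score.

    counts:      read counts for one sample, one entry per gene
    class_means: per-class list of per-gene means (hypothetical inputs)
    phi:         per-gene dispersions (hypothetical inputs)
    priors:      class prior probabilities
    Genes are treated as independent given the class.
    """
    scores = []
    for means, prior in zip(class_means, priors):
        score = math.log(prior)
        score += sum(nb_log_pmf(x, m, p) for x, m, p in zip(counts, means, phi))
        scores.append(score)
    return max(range(len(scores)), key=scores.__getitem__)
```

As the dispersion shrinks toward zero, `nb_log_pmf` approaches `poisson_log_pmf`, which is one way to see the relationship between the negative binomial and Poisson classifiers that the abstract mentions.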
Affiliation(s)
- Kai Dong
- Department of Mathematics, Hong Kong Baptist University, Kowloon Tong, Hong Kong
| | - Hongyu Zhao
- Department of Biostatistics, Yale University, New Haven, 06510, CT, USA
| | - Tiejun Tong
- Department of Mathematics, Hong Kong Baptist University, Kowloon Tong, Hong Kong
| | - Xiang Wan
- Department of Computer Science and Institute of Computational and Theoretical Studies, Hong Kong Baptist University, Kowloon Tong, Hong Kong.
|
276
|
Gaynanova I, Booth JG, Wells MT. Simultaneous Sparse Estimation of Canonical Vectors in the p ≫ N Setting. J Am Stat Assoc 2016. [DOI: 10.1080/01621459.2015.1034318] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/23/2022]
|
277
|
Feng L, Zou C, Wang Z. Multivariate-Sign-Based High-Dimensional Tests for the Two-Sample Location Problem. J Am Stat Assoc 2016. [DOI: 10.1080/01621459.2015.1035380] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/23/2022]
|
278
|
Abstract
BACKGROUND Development of biologically relevant models from gene expression data, notably microarray data, has become a topic of great interest in the field of bioinformatics and clinical genetics and oncology. Only a small number of genes, compared to the total number explored, possess a significant correlation with a certain phenotype. Gene selection enables researchers to obtain substantial insight into the genetic nature of the disease and the mechanisms responsible for it. Besides improvement of the performance of cancer classification, it can also cut down the time and cost of medical diagnoses. METHODS This study presents a modified Artificial Bee Colony Algorithm (ABC) to select the minimum number of genes that are deemed to be significant for cancer along with improvement of predictive accuracy. The search equation of ABC is believed to be good at exploration but poor at exploitation. To overcome this limitation we have modified the ABC algorithm by incorporating the concept of pheromones, which is one of the major components of the Ant Colony Optimization (ACO) algorithm, and a new operation in which successive bees communicate to share their findings. RESULTS The proposed algorithm is evaluated using a suite of ten publicly available datasets after the parameters are tuned scientifically with one of the datasets. Obtained results are compared to other works that used the same datasets. The performance of the proposed method is found to be superior. CONCLUSION The method presented in this paper can provide a subset of genes leading to more accurate classification results while the number of selected genes is smaller. Additionally, the proposed modified Artificial Bee Colony Algorithm could conceivably be applied to problems in other areas as well.
Affiliation(s)
| | - Rameen Shakur
- Wellcome Trust - Medical Research Council Cambridge Stem Cell Institute, University of Cambridge, Cambridge, UK
| | - Mohammad Kaykobad
- A ℓEDA Group, Department of CSE, BUET, Dhaka-1205, Dhaka, Bangladesh
|
279
|
Yan RJ, Gong HQ, Zhang PM, Liang PJ. Coding Properties of Mouse Retinal Ganglion Cells with Dual-Peak Patterns with Respect to Stimulus Intervals. Front Comput Neurosci 2016; 10:75. [PMID: 27486396 PMCID: PMC4949255 DOI: 10.3389/fncom.2016.00075] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/18/2016] [Accepted: 07/05/2016] [Indexed: 11/16/2022] Open
Abstract
How visual information is encoded in spikes of retinal ganglion cells (RGCs) is an essential question in visual neuroscience. In the present study, we investigated the coding properties of mouse RGCs with dual-peak patterns with respect to visual stimulus intervals. We first analyzed the response properties, and observed that the latencies and spike counts of the two response peaks in the dual-peak pattern exhibited systematic changes with the preceding light-OFF interval. We then applied linear discriminant analysis (LDA) to assess the relative contributions of response characteristics of both peaks in information coding regarding the preceding stimulus interval. It was found that for each peak, the discrimination results were far better than chance level based on either latency or spike count, and were further improved by using the combination of the two parameters. Furthermore, the best discrimination results were obtained when latencies and spike counts of both peaks were considered in combination. In addition, the correct rate for stimulation discrimination was higher when RGC population activity was considered than when a single neuron's activity was considered, and the correct rate increased with the group size. These results suggest that rate coding, temporal coding, and population coding are all involved in encoding the different stimulus-interval patterns, and the two response peaks in the dual-peak pattern carry complementary information about stimulus interval.
Affiliation(s)
- Ru-Jia Yan
- School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
| | - Hai-Qing Gong
- School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
| | - Pu-Ming Zhang
- School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
| | - Pei-Ji Liang
- School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
|
280
|
Faria AWC, da Silva AM, de Souza Rodrigues T, Costa MA, Braga AP. A Ranking Approach for Probe Selection and Classification of Microarray Data with Artificial Neural Networks. J Comput Biol 2016; 22:953-61. [PMID: 26418055 DOI: 10.1089/cmb.2013.0125] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
Abstract
Acute leukemia classification into its myeloid and lymphoblastic subtypes is usually accomplished according to the morphology of the tumor. Nevertheless, the subtypes may have similar histopathological appearance, making screening procedures difficult. In addition, approximately one-third of acute myeloid leukemias are characterized by aberrant cytoplasmic localization of nucleophosmin (NPMc(+)), of which the majority have a normal karyotype. This work is based on two publicly available DNA microarray datasets, used to differentiate leukemia subtypes. The datasets were split into training and test sets, and feature selection methods were applied. Artificial neural network classifiers were developed to compare the feature selection methods. For the first dataset, 50 genes selected using the best classifier were able to classify all patients in the test set. For the second dataset, five genes yielded 97.5% accuracy in the test set.
Affiliation(s)
| | | | - Thiago de Souza Rodrigues
- Computer Department, Federal Center of Technological Education of Minas Gerais, Belo Horizonte, MG, Brazil
| | - Marcelo Azevedo Costa
- Graduate Program in Electrical Engineering, Federal University of Minas Gerais, Belo Horizonte, MG, Brazil
| | - Antonio Padua Braga
- Graduate Program in Electrical Engineering, Federal University of Minas Gerais, Belo Horizonte, MG, Brazil
|
281
|
Akkiprik M, Nicorici D, Cogdell D, Jia YJ, Hategan A, Tabus I, Yli-Harja O, Y D, Sahin A, Zhang W. Dissection of Signaling Pathways in Fourteen Breast Cancer Cell Lines Using Reverse-Phase Protein Lysate Microarray. Technol Cancer Res Treat 2016; 5:543-51. [PMID: 17121430 DOI: 10.1177/153303460600500601] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/16/2023] Open
Abstract
Signal transduction pathways play a crucial role in breast cancer development, progression, and response to different therapies. A major problem in breast cancer therapy is the heterogeneity among different tumor types and cell lines commonly used in preclinical studies. To characterize the signaling pathways of some of the commonly used breast cancer cell lines and dissect the relationship among a number of pathways and some key genetic and molecular events in breast cancer development, such as p53 mutation, ErbB2 expression, and estrogen receptor (ER)/progesterone receptor (PR) status, we performed pathway profiling of 14 breast cancer cell lines by measuring the expression and phosphorylation status of 40 different cell signaling proteins with 53 specific antibodies using a protein lysate array. Cluster analysis of the expression data showed that there was close clustering of phosphatidylinositol 3-kinase, Akt, mammalian target of rapamycin (mTOR), Src, and platelet-derived growth factor receptor β (PDGFRβ) in all of the cell lines. The most differentially expressed proteins between ER- and PR-positive and ER- and PR-negative breast cells were mTOR, Akt (pThr308), PDGFRβ, PDGFRβ (pTyr751), panSrc, Akt (pSer473), insulin-like growth factor-binding protein 5 (IGFBP5), Src (pTyr418), mTOR (pSer2448), and IGFBP2. Many apoptotic proteins, such as apoptosis-inducing factor, IGFBP3, bad, bax, and cleaved caspase 9, were overexpressed in mutant p53-carrying breast cancer cells. Hexokinase isoenzyme 1, ND2, and c-kit were the most differentially expressed proteins in high and low ErbB2-expressing breast cancer cells. This study demonstrated that ER/PR status, ErbB2 expression, and p53 status are major molecules that impact downstream signaling pathways.
Affiliation(s)
- M Akkiprik
- Department of Pathology, Unit 85, The University of Texas, M. D. Anderson Cancer Center, 1515 Holcombe Boulevard, Houston, TX 77030, USA
|
282
|
Guerrier S, Mili N, Molinari R, Orso S, Avella-Medina M, Ma Y. A Predictive Based Regression Algorithm for Gene Network Selection. Front Genet 2016; 7:97. [PMID: 27379155 PMCID: PMC4908120 DOI: 10.3389/fgene.2016.00097] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/29/2016] [Accepted: 05/16/2016] [Indexed: 11/13/2022] Open
Abstract
Gene selection has become a common task in most gene expression studies. The objective of such research is often to identify the smallest possible set of genes that can still achieve good predictive performance. To do so, many of the recently proposed classification methods require some form of dimension reduction of the problem, which ultimately provides a single model as an output and, in most cases, relies on the likelihood function in order to achieve variable selection. We propose a new prediction-based objective function that can be tailored to the requirements of practitioners and can be used to assess and interpret a given problem. Based on cross-validation techniques and the idea of importance sampling, our proposal scans low-dimensional models under the assumption of sparsity and, for each of them, estimates their objective function to assess their predictive power in order to select among them. Two applications on cancer data sets and a simulation study show that the proposal compares favorably with competing alternatives such as, for example, Elastic Net and Support Vector Machine. Indeed, the proposed method not only selects smaller models for better, or at least comparable, classification errors but also provides a set of selected models instead of a single one, allowing the construction of a network of possible models for a target prediction accuracy level.
Affiliation(s)
- Stéphane Guerrier
- Department of Statistics, University of Illinois at Urbana-Champaign, Champaign, IL, USA
| | - Nabil Mili
- Research Center for Statistics, Geneva School of Economics and Management, University of Geneva, Geneva, Switzerland
| | - Roberto Molinari
- Research Center for Statistics, Geneva School of Economics and Management, University of Geneva, Geneva, Switzerland
| | - Samuel Orso
- Research Center for Statistics, Geneva School of Economics and Management, University of Geneva, Geneva, Switzerland
| | - Marco Avella-Medina
- Research Center for Statistics, Geneva School of Economics and Management, University of Geneva, Geneva, Switzerland
| | - Yanyuan Ma
- Department of Statistics, University of South Carolina, Columbia, SC, USA
|
283
|
Yang A, Jiang X, Shu L, Lin J. Bayesian variable selection with sparse and correlation priors for high-dimensional data analysis. Comput Stat 2016. [DOI: 10.1007/s00180-016-0665-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]
|
284
|
Liu ZP. Identifying network-based biomarkers of complex diseases from high-throughput data. Biomark Med 2016; 10:633-50. [PMID: 26786840 DOI: 10.2217/bmm-2015-0035] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/16/2022] Open
Abstract
In this work, we review the main available computational methods of identifying biomarkers of complex diseases from high-throughput data. The emerging omics techniques provide powerful alternatives to measure thousands of molecules in cells in parallel. The generated genomic, transcriptomic, proteomic, metabolomic and phenomic data provide comprehensive molecular and cellular information for detecting critical signals that serve as biomarkers by classifying disease phenotypic states. Networks are often employed to organize these profiles in the identification of biomarkers to deal with complex diseases in diagnosis, prognosis and therapy as well as mechanism deciphering from systematic perspectives. Here, we summarize some representative network-based bioinformatics methods in order to highlight the importance of computational strategies in biomarker discovery.
Affiliation(s)
- Zhi-Ping Liu
- Department of Biomedical Engineering, School of Control Science & Engineering, Shandong University, Jinan, Shandong 250061, China
|
285
|
|
286
|
Li B, Zhou Y, Zhao M, Hou B, Zhang D, Wang Q, Huang Y. Visible and Near-Infrared Hyper-Spectral Imaging for the Identification of the Type of Wax on Pears. J FOOD PROCESS PRES 2016. [DOI: 10.1111/jfpp.12749] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Affiliation(s)
- Baicheng Li
- Ministry of Education Optical Instrument and Systems Engineering Center, and Shanghai Key Laboratory of Modern Optical System; University of Shanghai for Science and Technology; No.516 Jungong Road Shanghai 200093 China
| | - Yao Zhou
- Ministry of Education Optical Instrument and Systems Engineering Center, and Shanghai Key Laboratory of Modern Optical System; University of Shanghai for Science and Technology; No.516 Jungong Road Shanghai 200093 China
| | - Mantong Zhao
- Ministry of Education Optical Instrument and Systems Engineering Center, and Shanghai Key Laboratory of Modern Optical System; University of Shanghai for Science and Technology; No.516 Jungong Road Shanghai 200093 China
| | - Baolu Hou
- Ministry of Education Optical Instrument and Systems Engineering Center, and Shanghai Key Laboratory of Modern Optical System; University of Shanghai for Science and Technology; No.516 Jungong Road Shanghai 200093 China
| | - Dawei Zhang
- Ministry of Education Optical Instrument and Systems Engineering Center, and Shanghai Key Laboratory of Modern Optical System; University of Shanghai for Science and Technology; No.516 Jungong Road Shanghai 200093 China
| | - Qi Wang
- Ministry of Education Optical Instrument and Systems Engineering Center, and Shanghai Key Laboratory of Modern Optical System; University of Shanghai for Science and Technology; No.516 Jungong Road Shanghai 200093 China
| | - Yuanshen Huang
- Ministry of Education Optical Instrument and Systems Engineering Center, and Shanghai Key Laboratory of Modern Optical System; University of Shanghai for Science and Technology; No.516 Jungong Road Shanghai 200093 China
|
287
|
Fan J, Feng Y, Jiang J, Tong X. Feature Augmentation via Nonparametrics and Selection (FANS) in High-Dimensional Classification. J Am Stat Assoc 2016; 111:275-287. [PMID: 27185970 DOI: 10.1080/01621459.2015.1005212] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/24/2022]
Abstract
We propose a high dimensional classification method that involves nonparametric feature augmentation. Knowing that marginal density ratios are the most powerful univariate classifiers, we use the ratio estimates to transform the original feature measurements. Subsequently, penalized logistic regression is invoked, taking as input the newly transformed or augmented features. This procedure trains models equipped with local complexity and global simplicity, thereby avoiding the curse of dimensionality while creating a flexible nonlinear decision boundary. The resulting method is called Feature Augmentation via Nonparametrics and Selection (FANS). We motivate FANS by generalizing the Naive Bayes model, writing the log ratio of joint densities as a linear combination of those of marginal densities. It is related to generalized additive models, but has better interpretability and computability. Risk bounds are developed for FANS. In numerical analysis, FANS is compared with competing methods, so as to provide a guideline on its best application domain. Real data analysis demonstrates that FANS performs very competitively on benchmark email spam and gene expression data sets. Moreover, FANS is implemented by an extremely fast algorithm through parallel computing.
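The feature augmentation step described in this abstract can be sketched in Python as follows. This is only an illustration under stated assumptions, not the FANS algorithm itself: it uses a fixed-bandwidth Gaussian kernel density estimate for each marginal, the names `kde` and `fit_log_ratio_transform`, the `bandwidth` default and the `eps` guard are all hypothetical, and the penalized logistic regression that the abstract applies to the transformed features is omitted.

```python
import math

def kde(sample, bandwidth):
    """Return a Gaussian kernel density estimate of `sample` as a function."""
    norm = len(sample) * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample) / norm

    return density

def fit_log_ratio_transform(X, y, bandwidth=0.5, eps=1e-12):
    """Fit per-feature marginal densities for class 1 and class 0.

    Returns a function mapping a raw row to its augmented features,
    where feature j becomes log(f_j(x_j) / g_j(x_j)), with f_j estimated
    from class-1 rows and g_j from class-0 rows.
    `bandwidth` and `eps` are illustrative defaults, not values from the paper.
    """
    n_features = len(X[0])
    estimators = []
    for j in range(n_features):
        f = kde([row[j] for row, label in zip(X, y) if label == 1], bandwidth)
        g = kde([row[j] for row, label in zip(X, y) if label == 0], bandwidth)
        estimators.append((f, g))

    def transform(row):
        # eps keeps the logarithm finite where an estimated density underflows to 0
        return [math.log(f(v) + eps) - math.log(g(v) + eps)
                for v, (f, g) in zip(row, estimators)]

    return transform

# Toy data: feature 0 separates the classes, feature 1 does not.
X = [[0.1, 5.0], [0.3, 4.8], [2.0, 5.1], [2.2, 4.9]]
y = [0, 0, 1, 1]
transform = fit_log_ratio_transform(X, y)
z = transform([2.1, 5.0])  # large positive first entry, near-zero second entry
```

An informative feature yields a log ratio far from zero, while an uninformative one stays near zero, which is what lets a sparse linear model on the transformed features select among them.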
Affiliation(s)
- Jianqing Fan
- Jianqing Fan is Frederick L. Moore Professor of Finance, Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ, 08544
| | - Yang Feng
- Yang Feng is Assistant Professor, Department of Statistics, Columbia University, New York, NY, 10027
| | - Jiancheng Jiang
- Jiancheng Jiang is Associate Professor, Department of Mathematics and Statistics, University of North Carolina at Charlotte, Charlotte, NC, 28223
| | - Xin Tong
- Xin Tong is Assistant Professor, Department of Data Sciences and Operations, University of Southern California, Los Angeles, CA, 90089
|
288
|
Wen J, Luo K, Liu H, Liu S, Lin G, Hu Y, Zhang X, Wang G, Chen Y, Chen Z, Li Y, Lin T, Xie X, Liu M, Wang H, Yang H, Fu J. MiRNA Expression Analysis of Pretreatment Biopsies Predicts the Pathological Response of Esophageal Squamous Cell Carcinomas to Neoadjuvant Chemoradiotherapy. Ann Surg 2016; 263:942-8. [PMID: 26445467 DOI: 10.1097/sla.0000000000001489] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/05/2023]
Abstract
OBJECTIVE To identify miRNA markers useful for esophageal squamous cell carcinoma (ESCC) neoadjuvant chemoradiotherapy (neo-CRT) response prediction. SUMMARY Neo-CRT followed by surgery improves ESCC patients' survival compared with surgery alone. However, CRT outcomes are heterogeneous, and no current methods can predict CRT responses. METHODS Differentially expressed miRNAs between ESCC pathological responders and nonresponders after neo-CRT were identified by miRNA profiling and verified by real-time quantitative polymerase chain reaction (qPCR) of 27 ESCCs in the training set. Several class prediction algorithms were used to build the response-classifying models with the qPCR data. Predictive powers of the models were further assessed with a second set of 79 ESCCs. RESULTS Ten miRNAs with greater than a 1.5-fold change between pathological responders and nonresponders were identified and verified, respectively. A support vector machine (SVM) prediction model, composed of 4 miRNAs (miR-145-5p, miR-152, miR-193b-3p, and miR-376a-3p), was developed. It provided overall accuracies of 100% and 87.3% for discriminating pathological responders and nonresponders in the training and external validation sets, respectively. In multivariate analysis, the subgroup determined by the SVM model was the only independent factor significantly associated with neo-CRT response in the external validation sets. CONCLUSIONS Combined qPCR of the 4 miRNAs provides the possibility of ESCC neo-CRT response prediction, which may facilitate individualized ESCC treatment. Further prospective validation in larger independent cohorts is necessary to fully assess its predictive power.
Affiliation(s)
- Jing Wen
- State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University Cancer Center, Guangzhou, China; Guangdong Esophageal Cancer Institute, Guangzhou, China; Department of Thoracic Oncology, Sun Yat-sen University Cancer Center, Guangzhou, China; Department of Radiotherapy, Sun Yat-sen University Cancer Center, Guangzhou, China; Guangzhou Haige Communications Group Incorporated Company, Guangzhou, China; School of Electronic & Information Engineering, South China University of Technology, Guangzhou, China; Department of Thoracic Surgery, Cancer Hospital of Shantou University Medical College, Shantou, China; Department of Radiotherapy, Cancer Hospital of Shantou University Medical College, Shantou, China; Department of Anesthesiology, Sun Yat-sen University Cancer Center, Guangzhou, China
|
289
|
Chételat D, Wells MT. Improved second order estimation in the singular multivariate normal model. J MULTIVARIATE ANAL 2016. [DOI: 10.1016/j.jmva.2016.01.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/22/2022]
|
290
|
Borup R, Thuesen LL, Andersen CY, Nyboe-Andersen A, Ziebe S, Winther O, Grøndahl ML. Competence Classification of Cumulus and Granulosa Cell Transcriptome in Embryos Matched by Morphology and Female Age. PLoS One 2016; 11:e0153562. [PMID: 27128483 PMCID: PMC4851390 DOI: 10.1371/journal.pone.0153562] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/16/2015] [Accepted: 03/31/2016] [Indexed: 12/23/2022] Open
Abstract
Objective By focussing on differences in the mural granulosa cell (MGC) and cumulus cell (CC) transcriptomes from follicles resulting in competent (live birth) and non-competent (no pregnancy) oocytes, the study aims to define a competence classifier expression profile in the two cellular compartments. Design: A case-control study. Setting: University based facilities for clinical services and research. Patients: MGC and CC samples from 60 women undergoing IVF treatment following the long GnRH-agonist protocol were collected. Samples from 16 oocytes where live birth was achieved and 16 age- and embryo morphology matched incompetent oocytes were included in the study. Methods MGC and CC were isolated immediately after oocyte retrieval. From the 16 competent and non-competent follicles, mRNA was extracted and expression profiles were generated on the Human Gene 1.0 ST Affymetrix array. Live birth prediction analysis was performed using machine learning algorithms (support vector machines) with performance estimation by leave-one-out cross validation and independent validation on an external data set. Results We defined a signature of 30 genes expressed in CC predictive of live birth. This live birth prediction model had an accuracy of 81%, a sensitivity of 0.83, a specificity of 0.80, a positive predictive value of 0.77, and a negative predictive value of 0.86. Receiver operating characteristic analysis found an area under the curve of 0.86, significantly greater than random chance. When applied on 3 external data sets with the end-point outcome measure of blastocyst formation, the signature resulted in 62%, 75% and 88% accuracy, respectively. The genes in the classifier are primarily connected to apoptosis and involvement in formation of extracellular matrix. We were not able to define a robust MGC classifier signature that could classify live birth with accuracy above random chance level.
Conclusion We have developed a cumulus cell classifier, which showed a promising performance on external data. This suggests that the gene signature at least partly includes genes that relate to competence in the developing blastocyst.
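The performance measures quoted in this abstract are all simple functions of a confusion matrix, as the short Python sketch below shows. The counts used here are hypothetical: the source does not give the confusion matrix, and these numbers were chosen only because they round to the figures quoted above.

```python
def classification_metrics(tp, fn, tn, fp):
    """Compute standard binary classification measures from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,      # fraction of all samples classified correctly
        "sensitivity": tp / (tp + fn),      # true positive rate
        "specificity": tn / (tn + fp),      # true negative rate
        "ppv": tp / (tp + fp),              # positive predictive value
        "npv": tn / (tn + fn),              # negative predictive value
    }

# Hypothetical counts (not taken from the paper).
metrics = classification_metrics(tp=10, fn=2, tn=12, fp=3)
rounded = {name: round(value, 2) for name, value in metrics.items()}
print(rounded)
# → {'accuracy': 0.81, 'sensitivity': 0.83, 'specificity': 0.8, 'ppv': 0.77, 'npv': 0.86}
```

Under leave-one-out cross validation, each count would come from predicting one held-out sample with a model trained on the rest, so the table is accumulated one sample at a time.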
Affiliation(s)
- Rehannah Borup
- Center for Genomic Medicine, University Hospital of Copenhagen, Rigshospitalet, Copenhagen, Denmark
| | - Lea Langhoff Thuesen
- Fertility Clinic, University Hospital of Copenhagen, Rigshospitalet, Copenhagen, Denmark
| | - Claus Yding Andersen
- Laboratory of Reproductive Biology, University Hospital of Copenhagen, Rigshospitalet, Copenhagen, Denmark
| | - Anders Nyboe-Andersen
- Fertility Clinic, University Hospital of Copenhagen, Rigshospitalet, Copenhagen, Denmark
| | - Søren Ziebe
- Fertility Clinic, University Hospital of Copenhagen, Rigshospitalet, Copenhagen, Denmark
| | - Ole Winther
- Bioinformatics Center, Department of Biology and Biotech Research and Innovation Centre, University of Copenhagen, Copenhagen, Denmark
| | - Marie Louise Grøndahl
- Fertility Clinic, University Hospital of Copenhagen, Herlev Hospital, Copenhagen, Denmark
|
291
|
Pineda AL, Ogoe HA, Balasubramanian JB, Rangel Escareño C, Visweswaran S, Herman JG, Gopalakrishnan V. On Predicting lung cancer subtypes using 'omic' data from tumor and tumor-adjacent histologically-normal tissue. BMC Cancer 2016; 16:184. [PMID: 26944944 PMCID: PMC4778315 DOI: 10.1186/s12885-016-2223-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/13/2015] [Accepted: 02/28/2016] [Indexed: 12/18/2022] Open
Abstract
BACKGROUND Adenocarcinoma (ADC) and squamous cell carcinoma (SCC) are the most prevalent histological types among lung cancers. Distinguishing between these subtypes is critically important because they have different implications for prognosis and treatment. Normally, histopathological analyses are used to distinguish between the two, where the tissue samples are collected based on small endoscopic samples or needle aspirations. However, the lack of cell architecture in these small tissue samples hampers the process of distinguishing between the two subtypes. Molecular profiling can also be used to discriminate between the two lung cancer subtypes, on condition that the biopsy is composed of at least 50% tumor cells. However, for some cases, the tissue composition of a biopsy might be a mix of tumor and tumor-adjacent histologically normal tissue (TAHN). When this happens, a new biopsy is required, with associated cost, risks and discomfort to the patient. To avoid this problem, we hypothesize that a computational method can distinguish between lung cancer subtypes given tumor and TAHN tissue. METHODS Using publicly available datasets for gene expression and DNA methylation, we applied four classification tasks, depending on the possible combinations of tumor and TAHN tissue. First, we used a feature selector (ReliefF/Limma) to select relevant variables, which were then used to build a simple naïve Bayes classification model. Then, we evaluated the classification performance of our models by measuring the area under the receiver operating characteristic curve (AUC). Finally, we analyzed the relevance of the selected genes using hierarchical clustering and IPA® software for gene functional analysis. RESULTS All Bayesian models achieved high classification performance (AUC > 0.94), which was confirmed by hierarchical cluster analysis. From the genes selected, 25 (93%) were found to be related to cancer (19 were associated with ADC or SCC), confirming the biological relevance of our method. CONCLUSIONS The results from this study confirm that computational methods using tumor and TAHN tissue can serve as a prognostic tool for lung cancer subtype classification. Our study complements results from other studies where TAHN tissue has been used as a prognostic tool for prostate cancer. The clinical implications of this finding could greatly benefit lung cancer patients.
Affiliation(s)
- Arturo López Pineda
- Department of Biomedical Informatics, University of Pittsburgh School of Medicine, 5607 Baum Boulevard, 15206, Pittsburgh, PA, USA.
| | - Henry Ato Ogoe
- Department of Biomedical Informatics, University of Pittsburgh School of Medicine, 5607 Baum Boulevard, 15206, Pittsburgh, PA, USA.
| | - Jeya Balaji Balasubramanian
- Department of Biomedical Informatics, University of Pittsburgh School of Medicine, 5607 Baum Boulevard, 15206, Pittsburgh, PA, USA.
| | - Claudia Rangel Escareño
- Department of Computational Genomics, National Institute of Genomic Medicine, Periferico Sur No. 4809, Col. Arenal Tepepan, Tlalpan, 14610, Mexico City, Mexico.
| | - Shyam Visweswaran
- Department of Biomedical Informatics, University of Pittsburgh School of Medicine, 5607 Baum Boulevard, 15206, Pittsburgh, PA, USA.
| | - James Gordon Herman
- Division of Hematology/Oncology, Department of Medicine, University of Pittsburgh School of Medicine, UPMC Cancer Pavilion, 5150 Centre Avenue, 15232, Pittsburgh, PA, USA.
| | - Vanathi Gopalakrishnan
- Department of Biomedical Informatics, University of Pittsburgh School of Medicine, 5607 Baum Boulevard, 15206, Pittsburgh, PA, USA.
| |
|
292
|
Durussel J, Haile DW, Mooses K, Daskalaki E, Beattie W, Mooses M, Mekonen W, Ongaro N, Anjila E, Patel RK, Padmanabhan N, McBride MW, McClure JD, Pitsiladis YP. Blood transcriptional signature of recombinant human erythropoietin administration and implications for antidoping strategies. Physiol Genomics 2016; 48:202-9. [DOI: 10.1152/physiolgenomics.00108.2015] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Received: 10/28/2015] [Accepted: 01/07/2016] [Indexed: 01/18/2023] Open
Abstract
Recombinant human erythropoietin (rHuEPO) is frequently abused by athletes as a performance-enhancing drug, despite being prohibited by the World Anti-Doping Agency. Although the methods to detect blood doping, including rHuEPO injections, have improved in recent years, they remain imperfect. In a proof-of-principle study, we identified, replicated, and validated the whole blood transcriptional signature of rHuEPO in endurance-trained Caucasian males at sea level (n = 18) and Kenyan endurance runners at moderate altitude (n = 20), all of whom received rHuEPO injections for 4 wk. Transcriptional profiling shows that hundreds of transcripts were altered by rHuEPO in both cohorts. The main regulated expression pattern, observed in all participants, was characterized by a “rebound” effect with a profound upregulation during rHuEPO and a subsequent downregulation up to 4 wk postadministration. The functions of the identified genes were mainly related to the functional and structural properties of the red blood cell. Of the genes identified to be differentially expressed during and post-rHuEPO, we further confirmed a whole blood 34-transcript signature that can distinguish between samples collected pre-, during, and post-rHuEPO administration. By providing biomarkers that can reveal rHuEPO use, our findings represent an advance in the development of new methods for the detection of blood doping.
Affiliation(s)
- Jérôme Durussel
- Institute of Cardiovascular and Medical Sciences, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, United Kingdom
| | | | - Kerli Mooses
- Faculty of Sport and Exercise Sciences, University of Tartu, Tartu, Estonia
| | - Evangelia Daskalaki
- Strathclyde Institute of Pharmacy and Biomedical Sciences, University of Strathclyde, Glasgow, United Kingdom
| | - Wendy Beattie
- Institute of Cardiovascular and Medical Sciences, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, United Kingdom
| | - Martin Mooses
- Faculty of Sport and Exercise Sciences, University of Tartu, Tartu, Estonia
| | - Wondyefraw Mekonen
- Department of Medical Physiology, Addis Ababa University, Addis Ababa, Ethiopia
| | - Neford Ongaro
- Department of Medical Physiology, School of Medicine, College of Health Sciences, Moi University, Eldoret, Kenya
| | - Edwin Anjila
- Department of Medical Physiology, School of Medicine, College of Health Sciences, Moi University, Eldoret, Kenya
| | - Rajan K. Patel
- Institute of Cardiovascular and Medical Sciences, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, United Kingdom
| | - Neal Padmanabhan
- Institute of Cardiovascular and Medical Sciences, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, United Kingdom
| | - Martin W. McBride
- Institute of Cardiovascular and Medical Sciences, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, United Kingdom
| | - John D. McClure
- Institute of Cardiovascular and Medical Sciences, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, United Kingdom
| | - Yannis P. Pitsiladis
- FIMS Reference Collaborating Centre of Sports Medicine for Anti-Doping Research, University of Brighton, Eastbourne, United Kingdom
| |
|
293
|
Grinberg NF, Lovatt A, Hegarty M, Lovatt A, Skøt KP, Kelly R, Blackmore T, Thorogood D, King RD, Armstead I, Powell W, Skøt L. Implementation of Genomic Prediction in Lolium perenne (L.) Breeding Populations. FRONTIERS IN PLANT SCIENCE 2016; 7:133. [PMID: 26904088 PMCID: PMC4751346 DOI: 10.3389/fpls.2016.00133] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.6] [Received: 09/11/2015] [Accepted: 01/25/2016] [Indexed: 05/23/2023]
Abstract
Perennial ryegrass (Lolium perenne L.) is one of the most widely grown forage grasses in temperate agriculture. In order to maintain and increase its usage as forage in livestock agriculture, there is a continued need for improvement in biomass yield, quality, disease resistance, and seed yield. Genetic gain for traits such as biomass yield has been relatively modest. This has been attributed to its long breeding cycle, and the necessity to use population based breeding methods. Thanks to recent advances in genotyping techniques there is increasing interest in genomic selection (GS), from which genomically estimated breeding values are derived. In this paper we compare the classical RRBLUP model with state-of-the-art machine learning techniques that should lend themselves easily to use in GS and demonstrate their application to predicting quantitative traits in a breeding population of L. perenne. Prediction accuracies varied from 0 to 0.59 depending on trait, prediction model and composition of the training population. The BLUP model produced the highest prediction accuracies for most traits and training populations. Forage quality traits had the highest accuracies compared to yield related traits. There appeared to be no clear pattern to the effect of the training population composition on the prediction accuracies. The heritability of the forage quality traits was generally higher than for the yield related traits, and could partly explain the difference in accuracy. Some population structure was evident in the breeding populations, and probably contributed to the varying effects of training population on the predictions. The average linkage disequilibrium between adjacent markers ranged from 0.121 to 0.215. Higher marker density and a larger training population closely related to the test population are likely to improve the prediction accuracy.
Affiliation(s)
| | - Alan Lovatt
- Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth, UK
| | - Matt Hegarty
- Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth, UK
| | - Andi Lovatt
- Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth, UK
| | - Kirsten P. Skøt
- Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth, UK
| | - Rhys Kelly
- Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth, UK
| | - Tina Blackmore
- Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth, UK
| | - Danny Thorogood
- Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth, UK
| | - Ross D. King
- Manchester Institute of Biotechnology, University of Manchester, Manchester, UK
| | - Ian Armstead
- Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth, UK
| | - Wayne Powell
- Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth, UK
- CGIAR Consortium, CGIAR Consortium Office, Montpellier, France
| | - Leif Skøt
- Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth, UK
| |
|
294
|
Wylie D, Beaudenon-Huibregtse S, Haynes BC, Giordano TJ, Labourier E. Molecular classification of thyroid lesions by combined testing for miRNA gene expression and somatic gene alterations. JOURNAL OF PATHOLOGY CLINICAL RESEARCH 2016; 2:93-103. [PMID: 27499919 PMCID: PMC4907059 DOI: 10.1002/cjp2.38] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.6] [Received: 11/12/2015] [Accepted: 12/31/2015] [Indexed: 12/20/2022]
Abstract
Multiple molecular markers contribute to the pathogenesis of thyroid cancer and can provide valuable information to improve disease diagnosis and patient management. We performed a comprehensive evaluation of miRNA gene expression in diverse thyroid lesions (n = 534) and developed predictive models for the classification of thyroid nodules, alone or in combination with genotyping. Expression profiling by reverse transcription-quantitative polymerase chain reaction in surgical specimens (n = 257) identified specific miRNAs differentially expressed in 17 histopathological categories. Eight supervised machine learning algorithms were trained to discriminate benign from malignant lesions and evaluated for accuracy and robustness. The selected models showed invariant area under the receiver operating characteristic curve (AUC) in cross-validation (0.89), optimal AUC (0.94) in an independent set of preoperative thyroid nodule aspirates (n = 235), and classified 92% of benign lesions as low risk/negative and 92% of malignant lesions as high risk/positive. Surgical and preoperative specimens were further tested for the presence of 17 validated oncogenic gene alterations in the BRAF, RAS, RET or PAX8 genes. The miRNA-based classifiers complemented and significantly improved the diagnostic performance of the 17-mutation panel (p < 0.001 for McNemar's tests). In a subset of resected tissues (n = 54) and in an independent set of thyroid nodules with indeterminate cytology (n = 42), the optimized ThyraMIR Thyroid miRNA Classifier increased diagnostic sensitivity by 30-39% and correctly classified 100% of benign nodules negative by the 17-mutation panel. In contrast, testing with broad targeted next-generation sequencing panels decreased diagnostic specificity by detecting additional mutations of unknown clinical significance in 19-39% of benign lesions. Our results demonstrate that, independent of mutational status, miRNA expression profiles are strongly associated with altered molecular pathways underlying thyroid tumorigenesis. Combined testing for miRNA gene expression and well-established somatic gene alterations is a novel diagnostic strategy that can improve the preoperative diagnosis and surgical management of patients with indeterminate thyroid nodules.
Affiliation(s)
| | | | | | - Thomas J Giordano
- Department of Pathology, University of Michigan Health System, Ann Arbor, Michigan, USA
| | | |
|
295
|
A modified local quadratic approximation algorithm for penalized optimization problems. Comput Stat Data Anal 2016. [DOI: 10.1016/j.csda.2015.08.019] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Indexed: 11/19/2022]
|
296
|
Ding J, Wen C, Li G, Chua CS. Locality sensitive batch feature extraction for high-dimensional data. Neurocomputing 2016. [DOI: 10.1016/j.neucom.2015.07.076] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Indexed: 11/28/2022]
|
297
|
|
298
|
Bonilla-Huerta E, Hernández-Montiel A, Caporal RM, López MA. Hybrid Framework Using Multiple-Filters and an Embedded Approach for an Efficient Selection and Classification of Microarray Data. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2016; 13:12-26. [PMID: 26336138 DOI: 10.1109/tcbb.2015.2474384] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Indexed: 06/05/2023]
Abstract
A hybrid framework composed of two stages for gene selection and classification of DNA microarray data is proposed. At the first stage, five traditional statistical methods are combined for preliminary gene selection (Multiple Fusion Filter). Then, different relevant gene subsets are selected by using an embedded Genetic Algorithm (GA), Tabu Search (TS), and Support Vector Machine (SVM). A gene subset, consisting of the most relevant genes, is obtained from this process by analyzing the frequency of each gene in the different gene subsets. Finally, the most frequent genes are evaluated by the embedded approach to obtain a final, small, relevant gene subset with high performance. The proposed method is tested on four DNA microarray datasets. The simulation study shows that the proposed approach works better than other methods reported in the literature.
|
299
|
Dong K, Pang H, Tong T, Genton MG. Shrinkage-based diagonal Hotelling’s tests for high-dimensional small sample size data. J MULTIVARIATE ANAL 2016. [DOI: 10.1016/j.jmva.2015.08.022] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Indexed: 11/17/2022]
|
300
|
Hadjerci O, Hafiane A, Conte D, Makris P, Vieyres P, Delbos A. Computer-aided detection system for nerve identification using ultrasound images: A comparative study. INFORMATICS IN MEDICINE UNLOCKED 2016. [DOI: 10.1016/j.imu.2016.06.003] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 10/21/2022] Open
|