1
Hou MX, Liu JX, Gao YL, Shang J, Wu SS, Yuan SS. A New Model of Identifying Differentially Expressed Genes via Weighted Network Analysis Based on Dimensionality Reduction Method. Curr Bioinform 2019. [DOI: 10.2174/1574893614666181220094235] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/20/2022]
Abstract
Background:
As a method to identify Differentially Expressed Genes (DEGs), Non-Negative Matrix Factorization (NMF) has been widely praised in bioinformatics. Although NMF makes DEGs easy to identify, it provides little information about how these DEGs are associated with one another.
Objective:
Network analysis methods can be used to analyze gene correlations, but in high-dimensional gene association analysis they introduce substantial data redundancy and complexity. Dimensionality reduction is therefore worth considering.
Methods:
In this paper, we provide a new framework that combines the merits of both: NMF is applied to select DEGs for dimensionality reduction, and Weighted Gene Co-Expression Network Analysis (WGCNA) is then used to cluster the DEGs into modules of similar function. As a novel model, the combination of NMF and WGCNA accomplishes the analysis of DEGs for cholangiocarcinoma (CHOL).
Results:
Several hub genes among the DEGs are highlighted in the co-expression network. Candidate pathways and genes are also discovered in the module most relevant to CHOL.
Conclusion:
The experiments indicate that our framework is effective, and the work also provides useful clues for research on CHOL.
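The WGCNA step can be illustrated with a minimal sketch, assuming an unsigned network in which the adjacency between two genes is their absolute Pearson correlation raised to a soft-thresholding power beta (beta = 6 here is an assumed value, not the authors' setting). Whole-network connectivity, the row sum of the adjacency matrix, is used as a simple stand-in for hub-gene ranking; module detection by hierarchical clustering is omitted, and the function names are hypothetical.

```python
import numpy as np


def soft_threshold_adjacency(expr, beta=6):
    """Unsigned WGCNA-style adjacency |cor(i, j)|**beta.

    expr: samples x genes matrix (for example, expression of the selected DEGs).
    """
    corr = np.corrcoef(expr, rowvar=False)  # genes x genes Pearson correlation
    adj = np.abs(corr) ** beta
    np.fill_diagonal(adj, 0.0)  # ignore self-connections
    return adj


def connectivity(adj):
    """Whole-network connectivity k_i = sum_j a_ij."""
    return adj.sum(axis=1)


# Toy data: gene 0 is the sum of two uncorrelated genes, so it is linked to both.
g1 = np.array([1.0, -1.0, 1.0, -1.0])
g2 = np.array([1.0, 1.0, -1.0, -1.0])
expr = np.column_stack([g1 + g2, g1, g2])

k = connectivity(soft_threshold_adjacency(expr, beta=6))
hub = int(np.argmax(k))
print(k, hub)  # gene 0 has the highest connectivity
```

Raising correlations to a power keeps strong links while suppressing weak ones, which is why the toy hub stands out even though its correlations are only about 0.71.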
Affiliation(s)
- Mi-Xiao Hou
  - School of Information Science and Engineering, Qufu Normal University, Rizhao, 276826, China
- Jin-Xing Liu
  - School of Information Science and Engineering, Qufu Normal University, Rizhao, 276826, China
- Ying-Lian Gao
  - Qufu Normal University Library, Qufu Normal University, Rizhao, 276826, China
- Junliang Shang
  - School of Information Science and Engineering, Qufu Normal University, Rizhao, 276826, China
- Sha-Sha Wu
  - School of Information Science and Engineering, Qufu Normal University, Rizhao, 276826, China
- Sha-Sha Yuan
  - School of Information Science and Engineering, Qufu Normal University, Rizhao, 276826, China
2
Wang J, Liu JX, Zheng CH, Wang YX, Kong XZ, Wen CG. A Mixed-Norm Laplacian Regularized Low-Rank Representation Method for Tumor Samples Clustering. IEEE/ACM Trans Comput Biol Bioinform 2019; 16:172-182. [PMID: 29990217 DOI: 10.1109/tcbb.2017.2769647] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 06/08/2023]
Abstract
Tumor sample clustering based on biomolecular data is an active topic in cancer class discovery. How to extract valuable information from high-dimensional genomic data has become an urgent problem in tumor sample clustering. In this paper, we introduce manifold regularization into the low-rank representation model and present a novel method named Mixed-norm Laplacian regularized Low-Rank Representation (MLLRR) to identify differentially expressed genes for tumor clustering based on gene expression data. Then, to improve the accuracy and stability of tumor clustering, we establish a clustering model based on Penalized Matrix Decomposition (PMD) and propose a novel clustering method named MLLRR-PMD. In this method, cancer clustering involves three steps. First, the gene expression data matrix is decomposed by MLLRR into a low-rank representation matrix and a sparse matrix. Second, the differentially expressed genes are identified from the sparse matrix. Finally, PMD is applied to cluster the samples based on the differentially expressed genes. Experimental results on simulated data and real genomic data show that MLLRR improves robustness to outliers and performs remarkably well in extracting differentially expressed genes.
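The second step, identifying differentially expressed genes from the sparse matrix, is commonly done by scoring each gene by the magnitude of its row in that matrix. The sketch below assumes the sparse matrix S is laid out genes x samples and uses the row-wise L1 norm as the score; both the layout and the scoring rule are assumptions for illustration, and the MLLRR decomposition itself is not implemented here.

```python
import numpy as np


def rank_genes_by_sparse_matrix(S, top_k):
    """Rank genes (rows of S) by the L1 norm of their sparse perturbations.

    S: genes x samples sparse matrix from a low-rank + sparse decomposition.
    Returns the indices of the top_k genes and all scores.
    """
    scores = np.abs(S).sum(axis=1)                    # one score per gene
    order = np.argsort(scores, kind="stable")[::-1]   # largest score first
    return order[:top_k], scores


S = np.array([
    [0.0, 0.0, 0.0],
    [5.0, -1.0, 0.0],
    [0.0, 0.5, 0.0],
    [-3.0, 3.0, 3.0],
])
top, scores = rank_genes_by_sparse_matrix(S, top_k=2)
print(top.tolist())  # [3, 1]
```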
3
Li H, Li SJ, Shang J, Liu JX, Zheng CH. A Dynamic Scale-Free Network Particle Swarm Optimization for Extracting Features on Multi-Omics Data. J Comput Biol 2018; 26:769-781. [PMID: 30495971 DOI: 10.1089/cmb.2018.0185] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/17/2022]
Abstract
Mining meaningful and comprehensive molecular characterizations of cancers from The Cancer Genome Atlas (TCGA) data has become a bottleneck in bioinformatics. Meanwhile, recent progress in cancer analysis shows that multi-omics data can detect cancer-related genes effectively and systematically at all levels. In this study, we propose an improved particle swarm optimization with a dynamic scale-free network, named DSFPSO, to extract features from multi-omics data. DSFPSO uses a dynamic scale-free network as its population structure and adopts diverse velocity-updating strategies that fully account for the heterogeneity of particles and their neighbors. DSFPSO is evaluated and compared with several state-of-the-art feature extraction approaches on two public data sets from TCGA. The results show that DSFPSO can effectively extract cancer-associated genes.
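A rough sketch of particle swarm optimization on a scale-free population structure is shown below. It is a simplification, not DSFPSO: the neighborhood graph is a static Barabási–Albert graph built once by preferential attachment (DSFPSO's network is dynamic), every particle uses the standard local-best velocity update with the widely used coefficients w = 0.7298 and c1 = c2 = 1.49618 (in place of DSFPSO's diverse strategies), and the toy objective is the sphere function rather than a feature-extraction criterion. All names are hypothetical.

```python
import numpy as np


def barabasi_albert(n, m, rng):
    """Undirected scale-free graph via preferential attachment (adjacency sets)."""
    nbrs = {i: set() for i in range(n)}
    for i in range(m + 1):                  # seed: complete graph on m + 1 nodes
        for j in range(i):
            nbrs[i].add(j)
            nbrs[j].add(i)
    pool = [v for v in range(m + 1) for _ in range(m)]  # each node listed once per degree
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:             # degree-proportional sampling
            targets.add(pool[rng.integers(len(pool))])
        for t in targets:
            nbrs[new].add(t)
            nbrs[t].add(new)
            pool.extend([new, t])
    return nbrs


def local_best_pso(f, dim, nbrs, iters=100, w=0.7298, c1=1.49618, c2=1.49618, seed=0):
    """Minimize f; each particle follows the best personal best in its neighborhood."""
    rng = np.random.default_rng(seed)
    n = len(nbrs)
    x = rng.uniform(-5.0, 5.0, size=(n, dim))
    v = np.zeros((n, dim))
    pbest = x.copy()
    pval = np.array([f(p) for p in x])
    initial_best = pval.min()
    for _ in range(iters):
        for i in range(n):
            group = list(nbrs[i]) + [i]
            j = group[int(np.argmin(pval[group]))]   # neighborhood best
            r1, r2 = rng.random(dim), rng.random(dim)
            v[i] = w * v[i] + c1 * r1 * (pbest[i] - x[i]) + c2 * r2 * (pbest[j] - x[i])
            x[i] = x[i] + v[i]
            val = f(x[i])
            if val < pval[i]:
                pval[i], pbest[i] = val, x[i].copy()
    best = int(np.argmin(pval))
    return initial_best, pval[best], pbest[best]


def sphere(p):
    return float(np.sum(p ** 2))


graph = barabasi_albert(20, 2, np.random.default_rng(1))
start, best_val, best_pos = local_best_pso(sphere, dim=2, nbrs=graph)
print(start, best_val)
```

Because hubs in a scale-free graph have many neighbors, good solutions found near a hub spread quickly, while low-degree particles keep exploring; this is the intuition behind using such a topology.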
Affiliation(s)
- Huiyu Li
  - School of Information Science and Engineering, Qufu Normal University, Rizhao, China
- Sheng-Jun Li
  - School of Information Science and Engineering, Qufu Normal University, Rizhao, China
- Junliang Shang
  - School of Information Science and Engineering, Qufu Normal University, Rizhao, China
  - School of Statistics, Qufu Normal University, Qufu, China
- Jin-Xing Liu
  - School of Information Science and Engineering, Qufu Normal University, Rizhao, China
- Chun-Hou Zheng
  - School of Computer Science and Technology, Anhui University, Hefei, China
4
Liu J, Cheng Y, Wang X, Zhang L, Liu H. An Optimal Mean Based Block Robust Feature Extraction Method to Identify Colorectal Cancer Genes with Integrated Data. Sci Rep 2017; 7:8584. [PMID: 28819308 PMCID: PMC5561268 DOI: 10.1038/s41598-017-08881-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 04/07/2017] [Accepted: 07/14/2017] [Indexed: 12/25/2022]
Abstract
Diagnosing colorectal cancer at an early stage is urgent. Some feature genes that are important to colorectal cancer development have been identified. However, for early-stage colorectal cancer, less is known about the identity of the specific cancer genes associated with advanced clinical stage. In this paper, we propose a feature extraction method named Optimal Mean based Block Robust Feature Extraction (OMBRFE) to identify feature genes associated with advanced clinical-stage colorectal cancer using integrated colorectal cancer data. First, based on the optimal mean and the L2,1-norm, a novel feature extraction method called Optimal Mean based Robust Feature Extraction (OMRFE) is proposed to identify feature genes. Then OMBRFE, which introduces the block idea into OMRFE, is put forward to process integrated colorectal cancer data that include multiple types of genomic data: copy number alterations, somatic mutations, methylation expression alterations, and gene expression changes. Experimental results demonstrate that OMBRFE is more effective than previous methods in identifying feature genes. Moreover, the genes identified by OMBRFE are verified to be closely associated with advanced clinical-stage colorectal cancer.
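The "optimal mean" idea can be illustrated in its simplest setting. Assuming samples are stored as rows, the center m that minimizes the L2,1 norm of the centered data, sum_i ||x_i - m||_2, is the geometric median, which Weiszfeld's iteration computes; unlike the sample mean, it is barely moved by an outlying sample. This is only a sketch of why a learned mean pairs naturally with an L2,1 loss: OMRFE also learns a projection and is not reproduced here, and the function names are hypothetical.

```python
import numpy as np


def l21_norm(R):
    """Sum of the Euclidean norms of the rows of R (one row per sample)."""
    return float(np.linalg.norm(R, axis=1).sum())


def l21_optimal_mean(X, iters=1000, tol=1e-12, eps=1e-12):
    """Weiszfeld iteration for argmin_m sum_i ||x_i - m||_2 (the geometric median)."""
    m = X.mean(axis=0)                       # start from the ordinary mean
    for _ in range(iters):
        d = np.linalg.norm(X - m, axis=1)
        w = 1.0 / np.maximum(d, eps)         # reweight each sample by 1 / distance
        m_new = (w[:, None] * X).sum(axis=0) / w.sum()
        converged = np.linalg.norm(m_new - m) < tol
        m = m_new
        if converged:
            break
    return m


# Four points on a unit square plus one gross outlier.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [100.0, 100.0]])
m_l21 = l21_optimal_mean(X)
print(X.mean(axis=0), m_l21)
```

Here the sample mean is dragged to (20.4, 20.4), whereas the L2,1-optimal center stays near the square at about (0.789, 0.789).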
Affiliation(s)
- Jian Liu
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
- Yuhu Cheng
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
- Xuesong Wang
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
- Lin Zhang
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
- Hui Liu
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
5
Wang YX, Gao YL, Liu JX, Kong XZ, Li HJ. Robust Principal Component Analysis Regularized by Truncated Nuclear Norm for Identifying Differentially Expressed Genes. IEEE Trans Nanobioscience 2017; 16:447-454. [PMID: 28692983 DOI: 10.1109/tnb.2017.2723439] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 11/09/2022]
Abstract
Identifying differentially expressed genes among thousands of genes is a challenging task. Robust principal component analysis (RPCA) is an efficient method for identifying differentially expressed genes. RPCA uses the nuclear norm to approximate the rank function. However, theoretical studies have shown that the nuclear norm minimizes all singular values simultaneously, so it may not be the best approximation of the rank function. The truncated nuclear norm, defined as the sum of the smallest min(m, n) - r singular values of an m x n matrix (that is, all singular values except the r largest), may approximate the rank function better than the nuclear norm. In this paper, a novel method is proposed by replacing the nuclear norm in RPCA with the truncated nuclear norm; it is named robust principal component analysis regularized by truncated nuclear norm (TRPCA). The method decomposes the observation matrix of genomic data into a low-rank matrix and a sparse matrix. Because significant genes can be considered sparse signals, the differentially expressed genes are viewed as sparse perturbation signals. Thus, the differentially expressed genes can be identified from the sparse matrix. Experimental results on The Cancer Genome Atlas data show that TRPCA outperforms other state-of-the-art methods in identifying differentially expressed genes.
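The difference between the two norms can be checked numerically. The sketch below assumes NumPy and computes both from the singular values: for an exactly rank-r matrix the truncated nuclear norm (with the same r) is zero up to rounding, whereas the nuclear norm is not, which is the sense in which it tracks the rank more closely. Function names are hypothetical.

```python
import numpy as np


def nuclear_norm(X):
    """Sum of all singular values."""
    return float(np.linalg.svd(X, compute_uv=False).sum())


def truncated_nuclear_norm(X, r):
    """Sum of all singular values except the r largest (svd returns them in descending order)."""
    s = np.linalg.svd(X, compute_uv=False)
    return float(s[r:].sum())


rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2)) @ rng.normal(size=(2, 5))   # rank-2 matrix
print(nuclear_norm(X), truncated_nuclear_norm(X, r=2))
```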
6
Kong XZ, Liu JX, Zheng CH, Hou MX, Wang J. Robust and Efficient Biomolecular Clustering of Tumor Based on p-Norm Singular Value Decomposition. IEEE Trans Nanobioscience 2017; 16:341-348. [PMID: 28541216 DOI: 10.1109/tnb.2017.2705983] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/11/2022]
Abstract
High dimensionality has become a typical feature of biomolecular data. In this paper, a novel dimensionality reduction method named p-norm singular value decomposition (PSVD) is proposed to seek a low-rank approximation of biomolecular data. To enhance robustness to outliers, the Lp-norm is used as the error function and the Schatten p-norm as the regularization function in the optimization model. To evaluate PSVD, K-means clustering is then applied to cluster tumors based on the low-rank approximation matrix. Extensive experiments are carried out on five gene expression data sets, including two benchmark data sets and three higher-dimensional data sets from The Cancer Genome Atlas. The results demonstrate that the PSVD-based method outperforms many existing methods. In particular, experiments show that the proposed method processes higher-dimensional data more efficiently, with good robustness, stability, and running time.
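The evaluation pipeline (low-rank approximation followed by K-means) can be sketched as below. This is a simplification under stated assumptions: an ordinary truncated SVD stands in for PSVD (no Lp error or Schatten p-norm), samples are stored as rows, K-means uses a deterministic farthest-point initialization, and the toy data are made up for illustration.

```python
import numpy as np


def truncated_svd_scores(X, rank):
    """Coordinates of the samples (rows of X) in the best rank-`rank` subspace."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :rank] * s[:rank]


def kmeans(Z, n_clusters, iters=100):
    """Lloyd's K-means with deterministic farthest-point initialization."""
    centers = [Z[0]]
    for _ in range(1, n_clusters):
        d2 = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[int(np.argmax(d2))])
    centers = np.array(centers, dtype=float)
    labels = np.zeros(len(Z), dtype=int)
    for _ in range(iters):
        d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(d2, axis=1)
        new_centers = np.array([
            Z[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Two groups of three samples over four genes.
X = np.array([
    [5.0, 5.0, 0.0, 0.0], [5.1, 4.9, 0.0, 0.0], [4.9, 5.2, 0.1, 0.0],
    [0.0, 0.0, 5.0, 5.0], [0.0, 0.1, 5.1, 4.9], [0.2, 0.0, 4.8, 5.1],
])
labels = kmeans(truncated_svd_scores(X, rank=2), n_clusters=2)
print(labels.tolist())  # [0, 0, 0, 1, 1, 1]
```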
7
Wang D, Liu JX, Gao YL, Zheng CH, Xu Y. Characteristic Gene Selection Based on Robust Graph Regularized Non-Negative Matrix Factorization. IEEE/ACM Trans Comput Biol Bioinform 2016; 13:1059-1067. [PMID: 26672047 DOI: 10.1109/tcbb.2015.2505294] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Indexed: 06/05/2023]
Abstract
Many methods have been proposed for gene selection and the analysis of gene expression data. Nonetheless, there is still considerable room to improve the explicitness and reliability of gene selection. To this end, this paper proposes a novel method named robust graph regularized non-negative matrix factorization for characteristic gene selection from gene expression data. The method has two main aspects. First, it enforces L2,1-norm minimization on the error function, which makes it robust to outliers and noise in the data points. Second, it assumes that the samples lie on a low-dimensional manifold embedded in a high-dimensional ambient space, and it reveals the geometric structure embedded in the original data. To demonstrate the validity of the proposed method, we apply it to gene expression data sets involving various human normal and tumor tissue samples, and the results demonstrate that the method is effective and feasible.
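A compact sketch of graph-regularized NMF with an L2,1 error term is given below. It was derived under stated assumptions and is not taken from the paper: X is genes x samples and is factored as X ~ U V^T with nonnegative U and V; the L2,1 loss sum_j ||x_j - U v_j||_2 over samples is handled by iterative reweighting with a diagonal matrix Q whose entries are 1 / (2 ||x_j - U v_j||_2); and the manifold term is lam * tr(V^T (D - W) V) with a symmetric 0/1 k-nearest-neighbour graph W over samples. The multiplicative updates follow from splitting the gradient of this reweighted objective into its positive and negative parts; all names and parameter values are hypothetical.

```python
import numpy as np


def knn_graph(X, k=2):
    """Symmetric 0/1 k-nearest-neighbour graph over the columns (samples) of X."""
    n = X.shape[1]
    d2 = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)  # n x n squared distances
    W = np.zeros((n, n))
    for j in range(n):
        W[j, np.argsort(d2[j])[1:k + 1]] = 1.0   # skip the sample itself
    return np.maximum(W, W.T)


def robust_graph_nmf(X, rank, lam=0.1, iters=200, seed=0, eps=1e-10):
    """X (genes x samples, nonnegative) ~ U @ V.T with an L2,1 loss plus a graph term."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U, V = rng.random((m, rank)), rng.random((n, rank))
    W = knn_graph(X)
    D = np.diag(W.sum(axis=1))
    history = []
    for _ in range(iters):
        res = np.linalg.norm(X - U @ V.T, axis=0)            # per-sample residual norms
        history.append(res.sum() + lam * np.trace(V.T @ (D - W) @ V))
        Q = np.diag(1.0 / (2.0 * res + eps))                 # reweighting for the L2,1 loss
        U *= (X @ Q @ V) / (U @ V.T @ Q @ V + eps)
        V *= (Q @ X.T @ U + lam * W @ V) / (Q @ V @ U.T @ U + lam * D @ V + eps)
    return U, V, history


X = np.random.default_rng(42).random((20, 10))
U, V, history = robust_graph_nmf(X, rank=3)
print(history[0], history[-1])
```

Samples with large residuals receive small weights in Q, so a single outlying sample cannot dominate the fit as it would under a squared Frobenius loss.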
8
Wang YX, Liu JX, Gao YL, Zheng CH, Shang JL. Differentially expressed genes selection via Laplacian regularized low-rank representation method. Comput Biol Chem 2016; 65:185-192. [PMID: 27693191 DOI: 10.1016/j.compbiolchem.2016.09.014] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 08/31/2016] [Revised: 09/22/2016] [Accepted: 09/22/2016] [Indexed: 10/20/2022]
Abstract
With the rapid development of DNA microarray technology and next-generation sequencing technology, a large amount of genomic data has been generated. How to extract more differentially expressed genes from genomic data has therefore become urgent. Because Low-Rank Representation (LRR) performs well in studying low-dimensional subspace structures, it has attracted considerable attention in recent years. However, it does not take the intrinsic geometric structure of the data into consideration. In this paper, a new method named Laplacian regularized Low-Rank Representation (LLRR), which introduces graph regularization into LRR, is proposed and applied to genomic data. By taking full advantage of graph regularization, LLRR can capture the intrinsic non-linear geometric information in the data. LLRR decomposes the observation matrix of genomic data into a low-rank matrix and a sparse matrix by solving an optimization problem. Because significant genes can be considered sparse signals, the differentially expressed genes are viewed as sparse perturbation signals. Therefore, the differentially expressed genes can be selected according to the sparse matrix. Finally, we use a GO tool to analyze the selected genes and compare the P-values with those of other methods. The results on simulated data and two real genomic data sets show that this method outperforms several other methods in differentially expressed gene selection.
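Graph regularization of this kind rests on a standard identity: for a symmetric weight matrix W with degree matrix D and Laplacian L = D - W, tr(Z^T L Z) = 1/2 sum_ij W_ij ||z_i - z_j||^2, so the penalty is small when strongly connected items have similar representations z_i. The sketch below checks this numerically; the heat-kernel k-nearest-neighbour weights, the row-wise layout of Z, and the parameter values are assumptions, not the settings of LLRR.

```python
import numpy as np


def heat_kernel_graph(Z, k=3, t=1.0):
    """Symmetric kNN graph over rows of Z with heat-kernel weights exp(-||zi - zj||^2 / t)."""
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    n = Z.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        nn = np.argsort(d2[i])[1:k + 1]        # nearest neighbours, excluding i itself
        W[i, nn] = np.exp(-d2[i, nn] / t)
    return np.maximum(W, W.T)                  # symmetrize


def laplacian_penalty(Z, W):
    """tr(Z^T L Z) with L = D - W."""
    L = np.diag(W.sum(axis=1)) - W
    return float(np.trace(Z.T @ L @ Z))


def pairwise_penalty(Z, W):
    """0.5 * sum_ij W_ij ||z_i - z_j||^2, the same quantity written pairwise."""
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    return float(0.5 * (W * d2).sum())


Z = np.random.default_rng(0).normal(size=(8, 3))
W = heat_kernel_graph(Z)
print(laplacian_penalty(Z, W), pairwise_penalty(Z, W))
```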
Affiliation(s)
- Ya-Xuan Wang
  - School of Information Science and Engineering, Qufu Normal University, Rizhao, 276826, China
- Jin-Xing Liu
  - School of Information Science and Engineering, Qufu Normal University, Rizhao, 276826, China
- Ying-Lian Gao
  - Library of Qufu Normal University, Qufu Normal University, Rizhao, 276826, China
- Chun-Hou Zheng
  - School of Information Science and Engineering, Qufu Normal University, Rizhao, 276826, China
- Jun-Liang Shang
  - School of Information Science and Engineering, Qufu Normal University, Rizhao, 276826, China
9
An NMF-L2,1-Norm Constraint Method for Characteristic Gene Selection. PLoS One 2016; 11:e0158494. [PMID: 27428058 PMCID: PMC4948826 DOI: 10.1371/journal.pone.0158494] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 08/21/2015] [Accepted: 06/16/2016] [Indexed: 11/30/2022]
Abstract
Recent research has demonstrated that characteristic gene selection based on gene expression data still faces considerable challenges. This is primarily because gene expression data are typically high-dimensional, negative, non-sparse and noisy. However, existing methods for data analysis can cope with only some of these challenges. In this paper, we address all of these challenges with a unified method: nonnegative matrix factorization via the L2,1-norm (NMF-L2,1). Because L2,1-norm minimization is applied to both the error function and the regularization term, our method is robust to outliers and noise in the data and generates sparse results. Applying our method to plant and tumor gene expression data demonstrates that NMF-L2,1 can extract more characteristic genes than other state-of-the-art methods.
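Why an L2,1 penalty produces sparse results can be seen from its proximal operator, which shrinks each row of a matrix as a group and sets rows with a small norm exactly to zero. The sketch assumes NumPy and uses hypothetical function names; it illustrates the penalty only, and the optimization actually used by NMF-L2,1 may differ.

```python
import numpy as np


def l21_norm(M):
    """Sum over rows of the row-wise Euclidean norms."""
    return float(np.linalg.norm(M, axis=1).sum())


def prox_l21(Y, tau):
    """argmin_M 0.5 * ||M - Y||_F^2 + tau * ||M||_{2,1}: row-wise group soft-thresholding."""
    norms = np.linalg.norm(Y, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return scale * Y


Y = np.array([[3.0, 4.0], [0.3, 0.4], [0.0, 0.0]])
M = prox_l21(Y, tau=1.0)
print(M)  # first row shrunk to [2.4, 3.2]; the other rows become exactly zero
```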
10
Liu J, Liu JX, Gao YL, Kong XZ, Wang XS, Wang D. A P-Norm Robust Feature Extraction Method for Identifying Differentially Expressed Genes. PLoS One 2015. [PMID: 26201006 PMCID: PMC4511795 DOI: 10.1371/journal.pone.0133124] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Indexed: 02/03/2023]
Abstract
In current molecular biology, it is increasingly important to identify, from gene expression data, differentially expressed genes closely correlated with a key biological process. In this paper, a novel p-norm robust feature extraction method based on the Schatten p-norm and the Lp-norm is proposed to identify differentially expressed genes. In our method, the Schatten p-norm is used as the regularization function to obtain a low-rank matrix, and the Lp-norm is used as the error function to improve robustness to outliers in the gene expression data. Results on simulated data show that our method achieves higher identification accuracy than competing methods. Numerous experiments on real gene expression data sets demonstrate that our method identifies more differentially expressed genes than the others. Moreover, we confirmed that the identified genes are closely correlated with the corresponding gene expression data.
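Both norms are easy to compute. The sketch below assumes NumPy. The Schatten p-norm is the lp norm of the singular values, so p = 1 gives the nuclear norm and p = 2 the Frobenius norm, while 0 < p < 1 gives a non-convex quasi-norm that shrinks large singular values less and pushes small ones toward zero more strongly. The entrywise Lp term is written here as ||E||_p^p, a common choice for an error function; whether the paper uses exactly this power is an assumption.

```python
import numpy as np


def schatten_p(X, p):
    """Schatten p-norm: (sum_i sigma_i^p)^(1/p) over the singular values of X."""
    s = np.linalg.svd(X, compute_uv=False)
    return float((s ** p).sum() ** (1.0 / p))


def lp_error(E, p):
    """Entrywise ||E||_p^p = sum_ij |e_ij|^p."""
    return float((np.abs(E) ** p).sum())


X = np.random.default_rng(0).normal(size=(5, 4))
print(schatten_p(X, 1.0), np.linalg.norm(X, "nuc"))   # equal
print(schatten_p(X, 2.0), np.linalg.norm(X, "fro"))   # equal
print(schatten_p(X, 0.5))                             # quasi-norm for p < 1
```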
Affiliation(s)
- Jian Liu
  - School of Information Science and Engineering, Qufu Normal University, Rizhao, 276826, Shandong, China
  - School of Information and Electrical Engineering, China University of Mining and Technology, Xuzhou, 221000, Jiangsu, China
- Jin-Xing Liu
  - School of Information Science and Engineering, Qufu Normal University, Rizhao, 276826, Shandong, China
  - Bio-Computing Research Center, Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen, 518055, Guangdong, China
- Ying-Lian Gao
  - Library of Qufu Normal University, Qufu Normal University, Rizhao, 276826, Shandong, China
- Xiang-Zhen Kong
  - School of Information Science and Engineering, Qufu Normal University, Rizhao, 276826, Shandong, China
- Xue-Song Wang
  - School of Information and Electrical Engineering, China University of Mining and Technology, Xuzhou, 221000, Jiangsu, China
  - The Key Laboratory of Complex Systems and Intelligence Science, Institute of Automation Chinese Academy of Sciences, Beijing, 100000, China
- Dong Wang
  - School of Information Science and Engineering, Qufu Normal University, Rizhao, 276826, Shandong, China
11
Liu JX, Xu Y, Zheng CH, Kong H, Lai ZH. RPCA-Based Tumor Classification Using Gene Expression Data. IEEE/ACM Trans Comput Biol Bioinform 2015; 12:964-970. [PMID: 26357336 DOI: 10.1109/tcbb.2014.2383375] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Indexed: 06/05/2023]
Abstract
Microarray techniques have been used to delineate cancer groups and to identify candidate genes for cancer prognosis. Because such problems can be viewed as classification problems, various classification methods have been applied to analyze and interpret gene expression data. In this paper, we propose a novel method based on robust principal component analysis (RPCA) to classify tumor samples from gene expression data. First, RPCA is used to highlight the characteristic genes associated with a specific biological process. Then, RPCA and RPCA+LDA (robust principal component analysis combined with linear discriminant analysis) are used to identify the features. Finally, a support vector machine (SVM) is applied to classify the tumor samples based on the identified features. Experiments on seven data sets demonstrate that our methods are effective and feasible for tumor classification.
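The LDA stage can be sketched for the two-class case. Assuming NumPy, samples as rows, and features already extracted (the RPCA and SVM stages are not reproduced), Fisher's discriminant direction is w proportional to Sw^{-1} (mu1 - mu0), where Sw is the within-class scatter matrix; a small ridge term (an assumption) keeps Sw invertible when features outnumber samples, as is typical for expression data. The midpoint threshold is a simple illustrative decision rule, not the SVM used in the paper.

```python
import numpy as np


def fisher_direction(X0, X1, ridge=1e-6):
    """Two-class Fisher LDA direction w ~ Sw^{-1} (mu1 - mu0), normalized to unit length."""
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)  # within-class scatter
    w = np.linalg.solve(Sw + ridge * np.eye(Sw.shape[0]), mu1 - mu0)
    return w / np.linalg.norm(w)


X0 = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
X1 = X0 + 3.0
w = fisher_direction(X0, X1)
threshold = 0.5 * ((X0 @ w).mean() + (X1 @ w).mean())   # midpoint decision rule
print(w, threshold)
```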
12
Liu JX, Liu J, Gao YL, Mi JX, Ma CX, Wang D. A class-information-based penalized matrix decomposition for identifying plants core genes responding to abiotic stresses. PLoS One 2014; 9:e106097. [PMID: 25180509 PMCID: PMC4152128 DOI: 10.1371/journal.pone.0106097] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 04/21/2014] [Accepted: 07/29/2014] [Indexed: 12/03/2022]
Abstract
Sparse methods have a significant advantage in making gene expression data more interpretable and comprehensible. Many sparse methods, such as penalized matrix decomposition (PMD) and sparse principal component analysis (SPCA), have been applied to extract plant core genes. Supervised algorithms, especially the support vector machine-recursive feature elimination (SVM-RFE) method, usually perform well in gene selection. In this paper, we introduce class information via the total scatter matrix and put forward a class-information-based penalized matrix decomposition (CIPMD) method to improve the gene identification performance of PMD-based methods. First, the total scatter matrix is obtained from the different samples of the gene expression data. Second, a new data matrix is constructed by decomposing the total scatter matrix. Third, the new data matrix is decomposed by PMD to obtain sparse eigensamples. Finally, the core genes are identified from the nonzero entries of the eigensamples. Results on simulated data show that CIPMD reaches higher identification accuracy than conventional gene identification methods. Moreover, results on real gene expression data demonstrate that CIPMD identifies more core genes closely related to abiotic stresses than the other methods.
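The four steps can be sketched as follows, under several assumptions: X is genes x samples; the total scatter matrix is St = Xc Xc^T for the row-centered data Xc; the new data matrix Y is obtained from the eigendecomposition St = V Lambda V^T as Y = V Lambda^(1/2), so that Y Y^T = St; and the PMD step is a simplified rank-one version that soft-thresholds with a fixed threshold instead of the L1-bound search of the original PMD algorithm. Genes with nonzero loadings are reported as core genes. Names, thresholds, and toy data are hypothetical.

```python
import numpy as np


def soft(a, delta):
    """Elementwise soft-thresholding."""
    return np.sign(a) * np.maximum(np.abs(a) - delta, 0.0)


def unit(a):
    n = np.linalg.norm(a)
    return a / n if n > 0 else a


def scatter_data_matrix(X):
    """Y with Y @ Y.T equal to the total scatter matrix of X (genes x samples)."""
    Xc = X - X.mean(axis=1, keepdims=True)
    St = Xc @ Xc.T
    evals, evecs = np.linalg.eigh(St)
    return evecs * np.sqrt(np.clip(evals, 0.0, None))   # clip tiny negative round-off


def sparse_rank_one(Y, delta_u, delta_v=0.0, iters=50):
    """Simplified rank-one PMD: alternate thresholded, normalized power steps."""
    v = np.linalg.svd(Y)[2][0]           # start from the leading right singular vector
    u = np.zeros(Y.shape[0])
    for _ in range(iters):
        u = unit(soft(Y @ v, delta_u))
        v = unit(soft(Y.T @ u, delta_v))
    return u, v


rng = np.random.default_rng(0)
pattern = np.array([3.0, -3.0, 3.0, -3.0, 3.0, -3.0])
X = np.vstack([
    np.outer([1.0, 0.9, 1.1], pattern),          # three responsive genes
    rng.normal(scale=0.1, size=(5, 6)),          # five background genes
])
u, _ = sparse_rank_one(scatter_data_matrix(X), delta_u=1.0)
print(np.flatnonzero(u).tolist())  # [0, 1, 2]
```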
Affiliation(s)
- Jin-Xing Liu
  - School of Information Science and Engineering, Qufu Normal University, Rizhao, Shandong, China
  - Bio-Computing Research Center, Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen, Guangdong, China
- Jian Liu
  - School of Communication, Qufu Normal University, Rizhao, Shandong, China
- Ying-Lian Gao
  - Library of Qufu Normal University, Qufu Normal University, Rizhao, Shandong, China
- Jian-Xun Mi
  - College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, China
  - Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, China
- Chun-Xia Ma
  - School of Information Science and Engineering, Qufu Normal University, Rizhao, Shandong, China
- Dong Wang
  - School of Information Science and Engineering, Qufu Normal University, Rizhao, Shandong, China
13
Liu JX, Gao YL, Xu Y, Zheng CH, You J. Differential Expression Analysis on RNA-Seq Count Data Based on Penalized Matrix Decomposition. IEEE Trans Nanobioscience 2014; 13:12-8. [DOI: 10.1109/tnb.2013.2296978] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Indexed: 11/07/2022]
14
Liu JX, Wang YT, Zheng CH, Sha W, Mi JX, Xu Y. Robust PCA based method for discovering differentially expressed genes. BMC Bioinformatics 2013; 14 Suppl 8:S3. [PMID: 23815087 PMCID: PMC3654929 DOI: 10.1186/1471-2105-14-s8-s3] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.3] [Indexed: 12/21/2022]
Abstract
How to identify a set of genes relevant to a key biological process is an important issue in current molecular biology. In this paper, we propose a novel method to discover differentially expressed genes based on robust principal component analysis (RPCA). In our method, the differentially and non-differentially expressed genes are treated as perturbation signals S and a low-rank matrix A, respectively. The perturbation signals S can be recovered from the gene expression data using RPCA. To discover differentially expressed genes associated with specific biological processes or functions, the scheme is as follows. First, the expression data matrix D is decomposed by RPCA into the sum of two matrices, D = A + S. Second, the differentially expressed genes are identified from matrix S. Finally, the differentially expressed genes are evaluated with Gene Ontology-based tools. A large number of experiments on hypothetical and real gene expression data are also provided, and the results show that our method is efficient and effective.
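The decomposition D = A + S is commonly computed with the inexact augmented Lagrange multiplier (IALM) method, which alternates elementwise soft-thresholding for S with singular value thresholding for A. The sketch below follows that standard scheme with common default parameters (lam = 1/sqrt(max(m, n)), a growth factor of 1.5 for mu); these settings, the function name, and the synthetic test matrix are assumptions rather than details from the paper. Differentially expressed genes would then be read off the rows of S.

```python
import numpy as np


def rpca_ialm(D, lam=None, rho=1.5, tol=1e-7, max_iter=500):
    """Split D into low-rank A and sparse S: min ||A||_* + lam*||S||_1 s.t. D = A + S."""
    m, n = D.shape
    lam = 1.0 / np.sqrt(max(m, n)) if lam is None else lam
    norm_two = np.linalg.norm(D, 2)                      # largest singular value
    Y = D / max(norm_two, np.abs(D).max() / lam)         # dual variable initialization
    mu = 1.25 / norm_two
    mu_max = mu * 1e7
    A = np.zeros_like(D)
    S = np.zeros_like(D)
    d_norm = np.linalg.norm(D, "fro")
    for _ in range(max_iter):
        # S-step: elementwise soft-thresholding
        T = D - A + Y / mu
        S = np.sign(T) * np.maximum(np.abs(T) - lam / mu, 0.0)
        # A-step: singular value thresholding
        U, sig, Vt = np.linalg.svd(D - S + Y / mu, full_matrices=False)
        A = (U * np.maximum(sig - 1.0 / mu, 0.0)) @ Vt
        Z = D - A - S
        Y = Y + mu * Z                                    # dual ascent on the constraint
        mu = min(mu * rho, mu_max)
        if np.linalg.norm(Z, "fro") / d_norm < tol:
            break
    return A, S


rng = np.random.default_rng(0)
L0 = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 100))          # rank-2 part
mask = rng.random((100, 100)) < 0.05
S0 = np.where(mask, rng.choice([-5.0, 5.0], size=(100, 100)), 0.0)  # sparse spikes
A, S = rpca_ialm(L0 + S0)
print(np.linalg.norm(A - L0) / np.linalg.norm(L0))
```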
Affiliation(s)
- Jin-Xing Liu
  - Bio-Computing Research Center, Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen, China
15
Characteristic gene selection via weighting principal components by singular values. PLoS One 2012; 7:e38873. [PMID: 22808018 PMCID: PMC3393749 DOI: 10.1371/journal.pone.0038873] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Received: 12/19/2011] [Accepted: 05/13/2012] [Indexed: 12/22/2022]
Abstract
Conventional gene selection methods based on principal component analysis (PCA) use only the first principal component (PC) of PCA or sparse PCA to select characteristic genes. These methods implicitly assume that the first PC plays the dominant role in gene selection. However, in many cases this assumption does not hold, so conventional PCA-based methods often give poor selection results. To improve PCA-based gene selection, we put forward a gene selection method that weights PCs by singular values (WPCS). Because different PCs have different importance, the singular values are used as weights that represent the influence of each PC on gene selection. ROC curves and AUC statistics on artificial data show that our method outperforms state-of-the-art methods. Moreover, experimental results on real gene expression data sets show that our method can extract more characteristic genes responding to abiotic stresses than conventional gene selection methods.
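One way to read "weighting PCs by singular values" is to score gene i as sum_k sigma_k |U[i, k]| over the leading K principal components, where X = U Sigma V^T is the SVD of the preprocessed genes x samples matrix. This exact scoring formula, the choice of K, and the names below are assumptions for illustration; the toy example shows a gene that a first-PC-only score misses entirely.

```python
import numpy as np


def weighted_pc_scores(X, n_components):
    """Gene scores sum_k sigma_k * |U[i, k]| over the leading principal components.

    X: genes x samples, assumed already centered/normalized as required.
    """
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return np.abs(U[:, :n_components]) @ s[:n_components]


X = np.array([
    [3.0, 0.0],   # gene 0 loads on the first PC
    [0.0, 2.0],   # gene 1 loads only on the second PC
    [0.0, 0.0],
    [0.0, 0.0],
])
print(weighted_pc_scores(X, 1))  # approximately [3, 0, 0, 0]: gene 1 is missed
print(weighted_pc_scores(X, 2))  # approximately [3, 2, 0, 0]: gene 1 now receives weight 2
```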