1
|
Yassi M, Moattar MH, Parry M, Chatterjee A. Enhancing Robust and Stable Feature Selection Through the Integration of Ranking Methods and Wrapper Techniques in Genetic Data Classification. Methods Mol Biol 2025; 2880:243-254. [PMID: 39900763 DOI: 10.1007/978-1-0716-4276-4_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2025]
Abstract
High-dimensional data expands the spatial dimension, leading to increased computational complexity and reduced generalization performance. Microarray data classification, such as diagnosing diseases like cancer, involves complex dimensions due to their genetic and biological information. To address this issue, dimension reduction is essential for these data sets. The main goal of this chapter is to provide a method for dimension reduction and classification of genetic data sets. The proposed approach comprises multiple stages. Initially, various feature ranking methods are combined to improve the robustness and stability of the feature selection process. A hybrid ranking method, which incorporates gene interactions, is integrated with a wrapper method. Subsequently, a support vector machine (SVM) is employed for classification. To address class imbalance in the training data, a solution is implemented before feeding the data into the SVM classifier. The experimental outcomes of the proposed approach, tested on five microarray databases, indicate robust feature selection with a metric ranging from 0.70 to 0.88. Additionally, the classification accuracy falls within the range of 91-96%.
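The abstract above describes combining several feature ranking methods to stabilise selection. The following is a minimal sketch of one common way to do that, mean-rank fusion, and is not claimed to be the chapter's actual procedure. The ranker names, gene names, and scores are all hypothetical.

```python
from statistics import mean


def ranks_from_scores(scores):
    """Map each feature to its rank (1 = best), assuming higher scores are better."""
    ordered = sorted(scores, key=lambda f: (-scores[f], f))
    return {feature: position for position, feature in enumerate(ordered, start=1)}


def fuse_rankings(score_tables):
    """Average each feature's rank across all rankers, then sort by that average."""
    rank_tables = [ranks_from_scores(table) for table in score_tables]
    features = list(rank_tables[0])
    mean_rank = {f: mean(table[f] for table in rank_tables) for f in features}
    # Ties in mean rank are broken alphabetically so the output is deterministic.
    return sorted(features, key=lambda f: (mean_rank[f], f))


# Hypothetical scores from three rankers for four genes (made-up numbers).
info_gain = {"g1": 0.9, "g2": 0.2, "g3": 0.5, "g4": 0.1}
fisher = {"g1": 2.0, "g2": 0.3, "g3": 2.5, "g4": 0.2}
t_stat = {"g1": 4.1, "g2": 1.0, "g3": 3.0, "g4": 1.2}

fused = fuse_rankings([info_gain, fisher, t_stat])
print(fused)
```

Averaging ranks rather than raw scores keeps rankers with different score scales from dominating the fused list.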
Affiliation(s)
- Maryam Yassi
- Department of Mathematics and Statistics, University of Otago, Dunedin, New Zealand
- Department of Pathology, Dunedin School of Medicine, University of Otago, Dunedin, New Zealand
| | | | - Matthew Parry
- Department of Mathematics and Statistics, University of Otago, Dunedin, New Zealand
- Te Pūnaha Matatini Centre of Research Excellence, University of Auckland, Auckland, New Zealand
| | - Aniruddha Chatterjee
- Department of Pathology, Dunedin School of Medicine, University of Otago, Dunedin, New Zealand.
- UPES University, Dehradun, India.
|
2
|
A new hybrid algorithm for three-stage gene selection based on whale optimization. Sci Rep 2023; 13:3783. [PMID: 36882446 PMCID: PMC9992521 DOI: 10.1038/s41598-023-30862-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2022] [Accepted: 03/02/2023] [Indexed: 03/09/2023] Open
Abstract
In biomedical data mining, the gene dimension is often much larger than the sample size. To solve this problem, we need to use a feature selection algorithm to select feature gene subsets with a strong correlation with phenotype to ensure the accuracy of subsequent analysis. This paper presents a new three-stage hybrid feature gene selection method that combines a variance filter, an extremely randomized tree, and the whale optimization algorithm. First, a variance filter is used to reduce the dimension of the feature gene space, and an extremely randomized tree is used to further reduce the feature gene set. Finally, the whale optimization algorithm is used to select the optimal feature gene subset. We evaluate the proposed method with three different classifiers on seven published gene expression profile datasets and compare it with other advanced feature selection algorithms. The results show that the proposed method has significant advantages across a variety of evaluation indicators.
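As an illustration of the first stage only, the sketch below applies a variance filter to a small expression table. The tree-based and whale optimization stages are not reproduced. The threshold, gene names, and values are assumptions chosen for the example, not taken from the paper.

```python
from statistics import pvariance


def variance_filter(expression, threshold):
    """Keep genes whose population variance across samples exceeds the threshold.

    expression maps a gene name to its list of expression values, one per sample.
    """
    return [gene for gene, values in expression.items() if pvariance(values) > threshold]


# Hypothetical expression values for three genes across four samples.
expression = {
    "geneA": [1.0, 1.0, 1.0, 1.0],  # constant, carries no information
    "geneB": [0.0, 2.0, 0.0, 2.0],  # varies strongly
    "geneC": [1.0, 1.1, 0.9, 1.0],  # varies only slightly
}

kept = variance_filter(expression, threshold=0.1)
print(kept)
```

A filter like this is cheap because it ignores the class labels, which is why it is typically placed before more expensive search stages.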
|
3
|
Liu J, Feng H, Tang Y, Zhang L, Qu C, Zeng X, Peng X. A novel hybrid algorithm based on Harris Hawks for tumor feature gene selection. PeerJ Comput Sci 2023; 9:e1229. [PMID: 37346505 PMCID: PMC10280456 DOI: 10.7717/peerj-cs.1229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2022] [Accepted: 01/09/2023] [Indexed: 06/23/2023]
Abstract
Background: Gene expression data are often used to classify cancer genes. In such high-dimensional datasets, however, only a few feature genes are closely related to tumors. Therefore, it is important to accurately select a subset of feature genes with high contributions to cancer classification. Methods: In this article, a new three-stage hybrid gene selection method is proposed that combines a variance filter, extremely randomized tree and Harris Hawks (VEH). In the first stage, we evaluated each gene in the dataset through the variance filter and selected the feature genes that meet the variance threshold. In the second stage, we use extremely randomized tree to further eliminate irrelevant genes. Finally, we used the Harris Hawks algorithm to select the gene subset from the previous two stages to obtain the optimal feature gene subset. Results: We evaluated the proposed method using three different classifiers on eight published microarray gene expression datasets. The results showed a 100% classification accuracy for VEH in gastric cancer, acute lymphoblastic leukemia and ovarian cancer, and an average classification accuracy of 95.33% across a variety of other cancers. Compared with other advanced feature selection algorithms, VEH has obvious advantages when measured by many evaluation criteria.
Affiliation(s)
- Junjian Liu
- Department of Statistics, Hunan Normal University College of Mathematics and Statistics, Changsha, Hunan, China
| | - Huicong Feng
- Department of Pathology and Pathophysiology, Hunan Normal University School of Medicine, Changsha, Hunan, China
| | - Yifan Tang
- Department of Pathology and Pathophysiology, Hunan Normal University School of Medicine, Changsha, Hunan, China
| | - Lupeng Zhang
- Department of Biochemistry and Molecular Biology, Jishou University School of Medicine, Jishou, Hunan, China
| | - Chiwen Qu
- Department of Statistics, Hunan Normal University College of Mathematics and Statistics, Changsha, Hunan, China
| | - Xiaomin Zeng
- Department of Epidemiology and Health Statistics, Xiangya Public Health School, Central South University, Changsha, Hunan, China
| | - Xiaoning Peng
- Department of Statistics, Hunan Normal University College of Mathematics and Statistics, Changsha, Hunan, China
- Department of Pathology and Pathophysiology, Hunan Normal University School of Medicine, Changsha, Hunan, China
|
4
|
Vahmiyan M, Kheirabadi M, Akbari E. Feature selection methods in microarray gene expression data: a systematic mapping study. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-07661-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/07/2022]
|
5
|
Performance Analysis of Deep Learning Models for Binary Classification of Cancer Gene Expression Data. JOURNAL OF HEALTHCARE ENGINEERING 2022; 2022:1122536. [PMID: 35310177 PMCID: PMC8926523 DOI: 10.1155/2022/1122536] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/11/2022] [Accepted: 02/09/2022] [Indexed: 11/23/2022]
Abstract
The classification of patients as cancer or normal by applying computational methods to their gene expression profiles is an extremely important task. Recently, deep learning models, mainly multilayer perceptrons and convolutional neural networks, have gained popularity for this type of dataset. This paper aims to analyze the performance of deep learning models on different types of cancer gene expression datasets, as no such consolidated work is available. For this purpose, three deep learning models, two feature selection methods, and four cancer gene expression datasets have been considered, resulting in a total of 24 different combinations to be analyzed. Of the four datasets, two are imbalanced and two are balanced in terms of the number of normal and cancer samples. Experimental results show that the deep learning models performed well in terms of true positive rate, precision, F1-score, and accuracy.
|
6
|
Dhal P, Azad C. A multi-objective feature selection method using Newton’s law based PSO with GWO. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2021.107394] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 11/28/2022]
|
7
|
Al-Rajab M, Lu J, Xu Q. A framework model using multifilter feature selection to enhance colon cancer classification. PLoS One 2021; 16:e0249094. [PMID: 33861766 PMCID: PMC8691854 DOI: 10.1371/journal.pone.0249094] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 10/16/2020] [Accepted: 03/11/2021] [Indexed: 11/18/2022] Open
Abstract
Gene expression profiles can be utilized in the diagnosis of critical diseases such as cancer. The selection of biomarker genes from these profiles is crucial for cancer detection. This paper presents a framework proposing a two-stage multifilter hybrid model of feature selection for colon cancer classification. Colon cancer is now extremely common among cancer types. There is a need to find a fast and accurate method to detect the tissues and to enhance the diagnostic process and drug discovery. This paper reports on a study whose objective has been to improve the diagnosis of cancer of the colon through a two-stage, multifilter model of feature selection. The model described deals with feature selection using a combination of Information Gain and a Genetic Algorithm. The next stage is to filter and rank the genes identified through this method using the minimum Redundancy Maximum Relevance (mRMR) technique. The final phase is to further analyze the data using correlated machine learning algorithms. This two-stage approach, which involves the selection of genes before classification techniques are used, improves success rates for the identification of cancer cells. Decision Tree, K-Nearest Neighbor, and Naïve Bayes classifiers showed promisingly accurate results using the developed hybrid framework model. It is concluded that the proposed method achieves higher accuracy than the existing methods reported in the literature. This study can be used as a clue to enhance treatment and drug discovery for colon cancer.
Affiliation(s)
- Murad Al-Rajab
- School of Computing and Engineering, University of Huddersfield, Huddersfield, United Kingdom
| | - Joan Lu
- School of Computing and Engineering, University of Huddersfield, Huddersfield, United Kingdom
| | - Qiang Xu
- School of Computing and Engineering, University of Huddersfield, Huddersfield, United Kingdom
|
8
|
Pasupa K, Rathasamuth W, Tongsima S. Discovery of significant porcine SNPs for swine breed identification by a hybrid of information gain, genetic algorithm, and frequency feature selection technique. BMC Bioinformatics 2020; 21:216. [PMID: 32456608 PMCID: PMC7251909 DOI: 10.1186/s12859-020-3471-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 08/05/2019] [Accepted: 03/25/2020] [Indexed: 11/21/2022] Open
Abstract
Background: The number of porcine Single Nucleotide Polymorphisms (SNPs) used in genetic association studies is very large, which suits statistical testing. However, the breed classification problem requires a much smaller set of porcine-classifying SNPs (PCSNPs) that can accurately classify pigs into different breeds. This study attempted to find such PCSNPs by using several combinations of feature selection and classification methods. We experimented with different combinations of feature selection methods, including information gain, conventional as well as modified genetic algorithms, and our developed frequency feature selection method, in combination with a common classification method, Support Vector Machine, to evaluate each method's performance. Experiments were conducted on a comprehensive data set containing SNPs from native pigs from America, Europe, Africa, and Asia including Chinese breeds, Vietnamese breeds, and hybrid breeds from Thailand. Results: The best combination of feature selection methods (information gain, modified genetic algorithm, and frequency feature selection hybrid) was able to reduce the number of possible PCSNPs to only 1.62% (164 PCSNPs) of the total number of SNPs (10,210 SNPs) while maintaining a high classification accuracy (95.12%). Moreover, this PCSNP set performed nearly identically to bigger data sets and even the entire data set. In addition, most PCSNPs were well-matched to a set of 94 genes in the PANTHER pathway, conforming to a suggestion by the Porcine Genomic Sequencing Initiative. Conclusions: The best hybrid method provided a sufficiently small number of porcine SNPs that accurately classified swine breeds.
Affiliation(s)
- Kitsuchart Pasupa
- Faculty of Information Technology, King Mongkut's Institute of Technology Ladkrabang, Bangkok, 10520, Thailand.
| | - Wanthanee Rathasamuth
- Faculty of Information Technology, King Mongkut's Institute of Technology Ladkrabang, Bangkok, 10520, Thailand
| | - Sissades Tongsima
- National Biobank of Thailand, National Science and Technology Development Agency, Khong Luang, 12120, Thailand
|
9
|
Zhao Q, Zhang Y. Ensemble Method of Feature Selection and Reverse Construction of Gene Logical Network Based on Information Entropy. INT J PATTERN RECOGN 2019. [DOI: 10.1142/s0218001420590041] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 11/18/2022]
Abstract
In this paper, we propose a novel ensemble gene selection method to obtain a gene subset. Then we provide a reverse construction method for a gene network derived from expression profile data of the gene subset. The uncertainty coefficient based on information entropy is used to define the existence of logical relations among these genes. If the uncertainty coefficient between some genes exceeds predefined thresholds, the gene nodes are connected by directed edges. Thus, a gene network is generated, which we define as a gene logical network. This method is applied to breast cancer data including a control group and an experimental group, with comparisons of the 2nd-order logic type distribution, average degree, and average path length of the networks. It is found that the structures of the different networks are quite distinct. By comparing the degree difference between the control group and the experimental group, the key genes are identified. By defining the dynamic evolution rules of state transition based on the logical regulation among the key genes in the network, the dynamic behaviors of normal breast cells and cancer cells at different stages are simulated numerically. According to a literature search, some of the key genes are highly related to the development of breast cancer. The study may provide useful insight into the biological mechanism of the formation and development of cancer.
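To make the thresholding step concrete, here is a minimal sketch using the standard entropy-based uncertainty coefficient, U(X|Y) = (H(X) - H(X|Y)) / H(X). Whether the paper uses exactly this form is an assumption. The edge direction, the threshold, and the binarised gene states are also hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete states."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def uncertainty_coefficient(x, y):
    """U(X|Y) = (H(X) - H(X|Y)) / H(X), with H(X|Y) = H(X, Y) - H(Y)."""
    h_x = entropy(x)
    if h_x == 0:
        return 0.0  # convention used here when X is constant
    h_x_given_y = entropy(list(zip(x, y))) - entropy(y)
    return (h_x - h_x_given_y) / h_x


def logical_edges(genes, threshold):
    """Connect a -> b when knowing a removes enough uncertainty about b (assumed direction)."""
    edges = []
    for a in genes:
        for b in genes:
            if a != b and uncertainty_coefficient(genes[b], genes[a]) > threshold:
                edges.append((a, b))
    return edges


# Hypothetical binarised expression states for three genes across four samples.
genes = {"g1": [0, 0, 1, 1], "g2": [0, 0, 1, 1], "g3": [0, 1, 0, 1]}
edges = logical_edges(genes, threshold=0.5)
print(edges)
```

In this toy data g1 and g2 determine each other fully, while g3 is independent of both, so only the g1/g2 pair is linked.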
Affiliation(s)
- Qingfeng Zhao
- College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao, Shandong 266590, P. R. China
- Shandong Province Key Laboratory of Wisdom Mine Information Technology, Shandong University of Science and Technology, Qingdao 266590, P. R. China
| | - Yulin Zhang
- College of Mathematics and Systems Science, Shandong University of Science and Technology, Qingdao, Shandong 266590, P. R. China
|
10
|
Hong CF, Chen YC, Chen WC, Tu KC, Tsai MH, Chan YK, Yu SS. Construction of diagnosis system and gene regulatory networks based on microarray analysis. J Biomed Inform 2018; 81:61-73. [PMID: 29550394 DOI: 10.1016/j.jbi.2018.03.008] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 10/02/2017] [Revised: 01/30/2018] [Accepted: 03/12/2018] [Indexed: 01/02/2023]
Abstract
A microarray analysis generally contains expression data for thousands of genes, but most of them are irrelevant to the disease of interest, which complicates the analysis of genes related to specific diseases. Therefore, filtering out a few essential genes as well as their regulatory networks is critical, and a disease can be easily diagnosed based on the expression profiles of just a few critical genes. In this study, a target gene screening (TGS) system, which is a microarray-based information system that integrates F-statistics, pattern recognition matching, a two-layer K-means classifier, a Parameter Detection Genetic Algorithm (PDGA), a genetic-based gene selector (GBG selector) and the association rule, was developed to screen out a small subset of genes that can discriminate malignant stages of cancers. During the first stage, F-statistics, pattern recognition matching, and a two-layer K-means classifier were applied in the system to filter out the 20 critical genes most relevant to ovarian cancer from 9600 genes, and the PDGA was used to decide the fittest values of the parameters for these critical genes. Among the 20 critical genes, 15 are associated with cancer progression. In the second stage, we further employed a GBG selector and the association rule to screen out seven target gene sets, each with only four to six genes, and each of which can precisely identify the malignancy stage of ovarian cancer based on their expression profiles. We further deduced the gene regulatory networks of the 20 critical genes by applying the Pearson correlation coefficient to evaluate the correlation between the expression of each gene at the same stages and at different stages. Correlations between gene pairs were calculated, and then three regulatory networks were deduced. These correlations were further confirmed by Ingenuity pathway analysis. The prognostic significance of the genes identified via regulatory networks was examined using online tools, and most represented biomarker candidates. In summary, our proposed system provides a new strategy to identify critical genes or biomarkers, as well as their regulatory networks, from microarray data.
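The pairwise correlation step can be sketched as follows. This is only an illustration of the Pearson coefficient applied to gene pairs. The cutoff, gene names, and expression values are assumptions, and the rest of the TGS system is not reproduced.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length value lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (spread_x * spread_y)


def correlated_pairs(profiles, cutoff):
    """Return gene pairs whose absolute correlation reaches the (assumed) cutoff."""
    pairs = []
    for (gene_1, x), (gene_2, y) in combinations(profiles.items(), 2):
        r = pearson(x, y)
        if abs(r) >= cutoff:
            pairs.append((gene_1, gene_2, round(r, 3)))
    return pairs


# Hypothetical expression profiles of three genes across four samples.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [5.0, 1.0, 4.0, 2.0],
}

pairs = correlated_pairs(profiles, cutoff=0.9)
print(pairs)
```

Pairs that pass the cutoff would become candidate edges in a correlation network; confirming them biologically is a separate step.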
Affiliation(s)
- Chun-Fu Hong
- Department of Long-Term Care, National Quemoy University, Kinmen County 892, Taiwan, ROC
| | - Ying-Chen Chen
- Institute of Genomics and Bioinformatics, National Chung Hsing University, Taichung City 402, Taiwan, ROC
| | - Wei-Chun Chen
- Department of Management Information System, National Chung Hsing University, Taichung City 402, Taiwan, ROC
| | - Keng-Chang Tu
- Department of Computer Science and Engineering, National Chung Hsing University, Taichung City 402, Taiwan, ROC
| | - Meng-Hsiun Tsai
- Department of Management Information System, National Chung Hsing University, Taichung City 402, Taiwan, ROC.
| | - Yung-Kuan Chan
- Department of Management Information System, National Chung Hsing University, Taichung City 402, Taiwan, ROC.
| | - Shyr Shen Yu
- Department of Computer Science and Engineering, National Chung Hsing University, Taichung City 402, Taiwan, ROC
|
11
|
Yassi M, Shams Davodly E, Mojtabanezhad Shariatpanahi A, Heidari M, Dayyani M, Heravi-Moussavi A, Moattar MH, Kerachian MA. DMRFusion: A differentially methylated region detection tool based on the ranked fusion method. Genomics 2018; 110:366-374. [PMID: 29309841 DOI: 10.1016/j.ygeno.2017.12.006] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 09/11/2017] [Revised: 11/05/2017] [Accepted: 12/11/2017] [Indexed: 12/11/2022]
Abstract
DNA methylation is an important epigenetic modification involved in many biological processes and diseases. Computational analysis of differentially methylated regions (DMRs) could explore the underlying reasons for methylation. DMRFusion is presented as a useful tool for comprehensive DNA methylation analysis of DMRs on methylation sequencing data. This tool is designed based on the integration of several ranking methods: information gain, between versus within class scatter ratio, Fisher ratio, Z-score and Welch's t-test. In this study, DMRFusion applied to reduced representation bisulfite sequencing (RRBS) data in chronic lymphocytic leukemia displayed 30 nominated regions and CpG sites, with a maximum methylation difference detected in the hypermethylation DMRs. We found that DMRFusion is able to process methylation sequencing data in an efficient and accurate manner and to provide annotation and visualization for DMRs with a high fold difference score (p-value and FDR<0.05 and type I error: 0.04).
Affiliation(s)
- Maryam Yassi
- Cancer Genetics Research Unit, Reza Radiotherapy and Oncology Center, Mashhad, Iran
| | - Ehsan Shams Davodly
- Cancer Genetics Research Unit, Reza Radiotherapy and Oncology Center, Mashhad, Iran
| | | | - Mehdi Heidari
- Cancer Genetics Research Unit, Reza Radiotherapy and Oncology Center, Mashhad, Iran
| | - Mahdieh Dayyani
- Cancer Genetics Research Unit, Reza Radiotherapy and Oncology Center, Mashhad, Iran
| | - Alireza Heravi-Moussavi
- Canada's Michael Smith Genome Sciences Center, BC Cancer Agency, Vancouver, British Columbia, Canada
| | | | - Mohammad Amin Kerachian
- Cancer Genetics Research Unit, Reza Radiotherapy and Oncology Center, Mashhad, Iran; Cancer Genetics Research Center, Mashhad University of Medical Sciences, Mashhad, Iran; Department of Medical Genetics, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran.
|
12
|
Shahrjooihaghighi A, Frigui H, Zhang X, Wei X, Shi B, Trabelsi A. An Ensemble Feature Selection Method for Biomarker Discovery. PROCEEDINGS OF THE ... IEEE INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND INFORMATION TECHNOLOGY. IEEE INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND INFORMATION TECHNOLOGY 2017; 2017:416-421. [PMID: 30887013 PMCID: PMC6420823 DOI: 10.1109/isspit.2017.8388679] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Indexed: 04/12/2023]
Abstract
Feature selection in Liquid Chromatography-Mass Spectrometry (LC-MS)-based metabolomics data (biomarker discovery) has become an important topic for machine learning researchers. The high dimensionality and small sample size of LC-MS data make feature selection a challenging task. The goal of biomarker discovery is to select the few most discriminative features among a large number of irrelevant ones. To improve the reliability of the discovered biomarkers, we use an ensemble-based approach. Ensemble learning can improve the accuracy of feature selection by combining multiple algorithms that have complementary information. In this paper, we propose an ensemble approach to combine the results of filter-based feature selection methods. To evaluate the proposed approach, we compared it to two commonly used methods, t-test and PLS-DA, using a real data set.
Affiliation(s)
- Aliasghar Shahrjooihaghighi
- Department of Computer Engineering and Computer Science, University of Louisville, Louisville, KY 40292, USA
| | - Hichem Frigui
- Department of Computer Engineering and Computer Science, University of Louisville, Louisville, KY 40292, USA
| | - Xiang Zhang
- Department of Chemistry, University of Louisville, Louisville, KY 40292, USA
| | - Xiaoli Wei
- Department of Chemistry, University of Louisville, Louisville, KY 40292, USA
| | - Biyun Shi
- Department of Chemistry, University of Louisville, Louisville, KY 40292, USA
| | - Ameni Trabelsi
- Department of Computer Engineering and Computer Science, University of Louisville, Louisville, KY 40292, USA
|
13
|
Du W, Cao Z, Song T, Li Y, Liang Y. A feature selection method based on multiple kernel learning with expression profiles of different types. BioData Min 2017; 10:4. [PMID: 28184251 PMCID: PMC5288949 DOI: 10.1186/s13040-017-0124-x] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 11/05/2015] [Accepted: 01/11/2017] [Indexed: 11/28/2022] Open
Abstract
Background: With the development of high-throughput technology, researchers can acquire large amounts of expression data of different types from several public databases. Because most of these data have a small number of samples and hundreds or thousands of features, extracting informative features from expression data effectively and robustly using feature selection techniques is challenging and crucial. So far, many feature selection approaches have been proposed and applied to analyse expression data of different types. However, most of these methods are limited to measuring performance on a single type of expression data by classification accuracy or error rate. Results: In this article, we propose a hybrid feature selection method based on Multiple Kernel Learning (MKL) and evaluate the performance on expression datasets of different types. Firstly, the relevance between features and classifying samples is measured by using the optimizing function of MKL. In this step, an iterative gradient descent process is used to perform the optimization on both the parameters of the Support Vector Machine (SVM) and the kernel confidence. Then, a set of relevant features is selected by sorting the optimizing function of each feature. Furthermore, we apply an embedded scheme of forward selection to detect compact feature subsets from the relevant feature set. Conclusions: We not only compare the classification accuracy with other methods, but also compare the stability, similarity and consistency of different algorithms. The proposed method has a satisfactory capability of feature selection for analysing expression datasets of different types using different performance measurements.
Affiliation(s)
- Wei Du
- College of Computer Science and Technology, Key Laboratory of Symbol Computation and Knowledge Engineering of the Ministry of Education, Jilin University, Changchun, 130012 China
| | - Zhongbo Cao
- College of Computer Science and Technology, Key Laboratory of Symbol Computation and Knowledge Engineering of the Ministry of Education, Jilin University, Changchun, 130012 China; School of Management Science and Information Engineering, Jilin University of Finance and Economics, Changchun, 130012 China
| | - Tianci Song
- College of Computer Science and Technology, Key Laboratory of Symbol Computation and Knowledge Engineering of the Ministry of Education, Jilin University, Changchun, 130012 China
| | - Ying Li
- College of Computer Science and Technology, Key Laboratory of Symbol Computation and Knowledge Engineering of the Ministry of Education, Jilin University, Changchun, 130012 China
| | - Yanchun Liang
- College of Computer Science and Technology, Key Laboratory of Symbol Computation and Knowledge Engineering of the Ministry of Education, Jilin University, Changchun, 130012 China; Zhuhai Laboratory of Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Zhuhai College of Jilin University, Zhuhai, 519041 China
|
14
|
Colak D, Alaiya AA, Kaya N, Muiya NP, AlHarazi O, Shinwari Z, Andres E, Dzimiri N. Integrated Left Ventricular Global Transcriptome and Proteome Profiling in Human End-Stage Dilated Cardiomyopathy. PLoS One 2016; 11:e0162669. [PMID: 27711126 PMCID: PMC5053516 DOI: 10.1371/journal.pone.0162669] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Received: 06/26/2016] [Accepted: 08/28/2016] [Indexed: 01/30/2023] Open
Abstract
Aims: The disease pathways leading to idiopathic dilated cardiomyopathy (DCM) are still elusive. The present study investigated integrated global transcriptional and translational changes in human DCM for disease biomarker discovery. Methods: We used identical myocardial tissues from five DCM hearts compared to five non-failing (NF) donor hearts for both transcriptome profiling using the ABI high-density oligonucleotide microarrays and proteome expression with One-Dimensional Nano Acquity liquid chromatography coupled with tandem mass spectrometry on the Synapt G2 system. Results: We identified 1262 differentially expressed genes (DEGs) and 269 proteins (DEPs) between DCM cases and healthy controls. Among the most significantly upregulated (>5-fold) proteins were GRK5, APOA2, IGHG3, ANXA6, HSP90AA1, and ATP5C1 (p < 0.01). On the other hand, the most significantly downregulated proteins were GSTM5, COX17, CAV1 and ANXA3. At least ten entities were concomitantly upregulated on the two analysis platforms: GOT1, ALDH4A1, PDHB, BDH1, SLC2A11, HSP90AA1, HSP90AB1, H2AFV, HSPA5 and NDUFV1. Gene ontology analyses of DEGs and DEPs revealed significant overlap with enrichment of genes/proteins related to metabolic process, biosynthetic process, cellular component organization, oxidative phosphorylation, alterations in glycolysis and ATP synthesis, Alzheimer’s disease, chemokine-mediated inflammation and cytokine signalling pathways. Conclusion: The concomitant use of transcriptome and proteome expression to evaluate global changes in DCM has led to the identification of sixteen commonly altered entities as well as novel genes, proteins and pathways whose cardiac functions have yet to be deciphered. This data should contribute towards better management of the disease.
Affiliation(s)
- Dilek Colak
- Biostatistics, Epidemiology and Scientific Computing Department, King Faisal Specialist Hospital and Research Centre, Riyadh, 11211, Saudi Arabia
- Ayodele A. Alaiya
- Proteomics Unit, Stem Cell Tissue Re-Engineering Program, King Faisal Specialist Hospital and Research Centre, Riyadh, 11211, Saudi Arabia
- Namik Kaya
- Genetics Department, King Faisal Specialist Hospital and Research Centre, Riyadh, 11211, Saudi Arabia
- Nzioka P. Muiya
- Genetics Department, King Faisal Specialist Hospital and Research Centre, Riyadh, 11211, Saudi Arabia
- Olfat AlHarazi
- Biostatistics, Epidemiology and Scientific Computing Department, King Faisal Specialist Hospital and Research Centre, Riyadh, 11211, Saudi Arabia
- Zakia Shinwari
- Proteomics Unit, Stem Cell Tissue Re-Engineering Program, King Faisal Specialist Hospital and Research Centre, Riyadh, 11211, Saudi Arabia
- Editha Andres
- Genetics Department, King Faisal Specialist Hospital and Research Centre, Riyadh, 11211, Saudi Arabia
- Nduna Dzimiri
- Genetics Department, King Faisal Specialist Hospital and Research Centre, Riyadh, 11211, Saudi Arabia
|
15
|
Yassi M, Moattar MH. Robust and stable feature selection by integrating ranking methods and wrapper technique in genetic data classification. Biochem Biophys Res Commun 2014; 446:850-6. [DOI: 10.1016/j.bbrc.2014.02.146] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 02/19/2014] [Accepted: 02/27/2014] [Indexed: 10/25/2022]
|
16
|
|
17
|
Lazar C, Taminau J, Meganck S, Steenhoff D, Coletta A, Molter C, de Schaetzen V, Duque R, Bersini H, Nowé A. A survey on filter techniques for feature selection in gene expression microarray analysis. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2012; 9:1106-19. [PMID: 22350210 DOI: 10.1109/tcbb.2012.33] [Citation(s) in RCA: 219] [Impact Index Per Article: 16.8] [Indexed: 05/20/2023]
Abstract
A plenitude of feature selection (FS) methods is available in the literature, most of them arising from the need to analyze data of very high dimension, usually hundreds or thousands of variables. Such data sets are now available in various application areas like combinatorial chemistry, text mining, multivariate imaging, or bioinformatics. As a generally accepted rule, these methods are grouped into filters, wrappers, and embedded methods. More recently, a new group of methods has been added to the general framework of FS: ensemble techniques. The focus in this survey is on filter feature selection methods for informative feature discovery in gene expression microarray (GEM) analysis, which is also known as differentially expressed genes (DEGs) discovery, gene prioritization, or biomarker discovery. We present them in a unified framework, using standardized notations in order to reveal their technical details and to highlight their common characteristics as well as their particularities.
Affiliation(s)
- Cosmin Lazar
- Computational Modeling Group, Department of Computer Science, Vrije Universiteit Brussel, Pleinlaan 2, Brussels 1050, Belgium.
|
18
|
Hiissa J, Elo LL, Huhtinen K, Perheentupa A, Poutanen M, Aittokallio T. Resampling reveals sample-level differential expression in clinical genome-wide studies. OMICS: A Journal of Integrative Biology 2009; 13:381-96. [PMID: 19663710 DOI: 10.1089/omi.2009.0027] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 11/13/2022]
Abstract
Genome-scale molecular profiling of clinical sample material often results in heterogeneous datasets beyond the capability of standard statistical procedures. Statistical tests for differential expression, in particular, rely upon the assumption that the sample groups being compared are relatively homogeneous. Such an assumption rarely holds in clinical materials, which leads to detection of secondary findings (false positives) or loss of significant targets (false negatives). Here, we introduce a resampling-based procedure, named ReScore, which aggregates individual changes across all the samples while preserving their clinical classes, and thereby provides multiple sets of markers that can effectively characterize distinct sample subsets. When applied to a public leukemia microarray study, the procedure could accurately reveal hidden subgroup structures associated with underlying genotypic abnormalities. The procedure improved both the sensitivity and specificity of the findings, and helped us to identify several disease subtype-specific genes that have remained undetected in the conventional analyses. In our endometriosis study, we were able to accurately distinguish between various sources of systematic variation, linked, for example, to tissue-specificity and disease-related factors, many of which would have been missed with standard approaches. The generic procedure should also benefit other global profiling experiments, such as those using mass spectrometry-based proteomic assays.
Affiliation(s)
- Jukka Hiissa
- Biomathematics Research Group, Department of Mathematics, University of Turku, Turku, Finland
|
19
|
Colak D, Kaya N, Al-Zahrani J, Al Bakheet A, Muiya P, Andres E, Quackenbush J, Dzimiri N. Left ventricular global transcriptional profiling in human end-stage dilated cardiomyopathy. Genomics 2009; 94:20-31. [PMID: 19332114 PMCID: PMC4152850 DOI: 10.1016/j.ygeno.2009.03.003] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.5] [Received: 09/09/2008] [Revised: 02/17/2009] [Accepted: 03/17/2009] [Indexed: 02/07/2023]
Abstract
We employed ABI high-density oligonucleotide microarrays containing 31,700 sixty-mer probes (representing 27,868 annotated human genes) to determine differential gene expression in idiopathic dilated cardiomyopathy (DCM). We identified 626 up-regulated and 636 down-regulated genes in DCM compared to controls. Most significant changes occurred in the tricarboxylic acid cycle, angiogenesis, and apoptotic signaling pathways, among which 32 apoptosis- and 13 MAPK activity-related genes were altered. Inorganic cation transporter, catalytic activities, energy metabolism and electron transport-related processes were among the most critically influenced pathways. Among the up-regulated genes were HTRA1 (6.9-fold), PDCD8(AIFM1) (5.2) and PRDX2 (4.4) and the down-regulated genes were NR4A2 (4.8), MX1 (4.3), LGALS9 (4), IFNA13 (4), UNC5D (3.6) and HDAC2 (3) (p<0.05), all of which have no clearly defined cardiac-related function yet. Gene ontology and enrichment analysis also revealed significant alterations in mitochondrial oxidative phosphorylation, metabolism and Alzheimer's disease pathways. Concordance was also confirmed for a significant number of genes and pathways in an independent validation microarray dataset. Furthermore, verification by real-time RT-PCR showed a high degree of consistency with the microarray results. Our data demonstrate an association of DCM with alterations in various cellular events and multiple yet undeciphered genes that may contribute to heart muscle disease pathways.
Affiliation(s)
- Dilek Colak
- Department of Biostatistics, Epidemiology and Scientific Computing, King Faisal Specialist Hospital and Research Centre, Riyadh, Saudi Arabia
- Namik Kaya
- Department of Genetics, King Faisal Specialist Hospital and Research Centre, P.O. Box 3354, Riyadh 11211, Saudi Arabia
- Jawaher Al-Zahrani
- Department of Genetics, King Faisal Specialist Hospital and Research Centre, P.O. Box 3354, Riyadh 11211, Saudi Arabia
- Albandary Al Bakheet
- Department of Genetics, King Faisal Specialist Hospital and Research Centre, P.O. Box 3354, Riyadh 11211, Saudi Arabia
- Paul Muiya
- Department of Genetics, King Faisal Specialist Hospital and Research Centre, P.O. Box 3354, Riyadh 11211, Saudi Arabia
- Editha Andres
- Department of Genetics, King Faisal Specialist Hospital and Research Centre, P.O. Box 3354, Riyadh 11211, Saudi Arabia
- John Quackenbush
- Department of Biostatistics and Computational Biology; Dana-Farber Cancer Institute, Boston, MA, USA
- Nduna Dzimiri
- Department of Genetics, King Faisal Specialist Hospital and Research Centre, P.O. Box 3354, Riyadh 11211, Saudi Arabia
|