1
Yaqoob A, Verma NK, Aziz RM, Shah MA. RNA-Seq analysis for breast cancer detection: a study on paired tissue samples using hybrid optimization and deep learning techniques. J Cancer Res Clin Oncol 2024; 150:455. [PMID: 39390265 PMCID: PMC11467072 DOI: 10.1007/s00432-024-05968-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2024] [Accepted: 09/21/2024] [Indexed: 10/12/2024]
Abstract
PROBLEM Breast cancer is a leading global health issue, contributing to high mortality rates among women. The challenge of early detection is exacerbated by the high dimensionality and complexity of gene expression data, which complicates the classification process. AIM This study aims to develop an advanced deep learning model that can accurately detect breast cancer using RNA-Seq gene expression data while effectively addressing the challenges posed by the data's high dimensionality and complexity. METHODS We introduce a novel hybrid gene selection approach that combines the Harris Hawk Optimization (HHO) and Whale Optimization (WO) algorithms with deep learning to improve feature selection and classification accuracy. The model's performance was compared with four conventional optimization algorithms integrated with deep learning: Genetic Algorithm (GA), Artificial Bee Colony (ABC), Cuckoo Search (CS), and Particle Swarm Optimization (PSO). RNA-Seq data were collected from 66 paired samples of normal and cancerous tissue from breast cancer patients at the Jawaharlal Nehru Cancer Hospital & Research Centre, Bhopal, India. Sequencing was performed by Biokart Genomics Lab, Bengaluru, India. RESULTS The proposed model achieved a mean classification accuracy of 99.0%, consistently outperforming the GA, ABC, CS, and PSO methods. The dataset comprised 55 female breast cancer patients, including both early and advanced stages, along with age-matched healthy controls. CONCLUSION Our findings demonstrate that the hybrid gene selection approach using HHO and WO, combined with deep learning, is a powerful and accurate tool for breast cancer detection. This approach shows promise for early detection and could facilitate personalized treatment strategies, ultimately improving patient outcomes.
Affiliation(s)
- Abrar Yaqoob
- School of Advanced Science and Language, VIT Bhopal University, Kothrikalan, Sehore, Bhopal, 466114, India.
- Navneet Kumar Verma
- School of Advanced Science and Language, VIT Bhopal University, Kothrikalan, Sehore, Bhopal, 466114, India
- Rabia Musheer Aziz
- Planning Department, State Planning Institute (New Division), Lucknow, Uttar Pradesh, 226001, India
- Mohd Asif Shah
- Department of Economics, Kardan University, Parwane Du, 1001, Kabul, Afghanistan.
- Division of Research and Development, Lovely Professional University, Phagwara, Punjab, 144001, India.
- Centre of Research Impact and Outcome, Chitkara University Institute of Engineering and Technology, Chitkara University, Rajpura, Punjab, 140401, India.
2
Yaqoob A, Verma NK, Aziz RM. Optimizing Gene Selection and Cancer Classification with Hybrid Sine Cosine and Cuckoo Search Algorithm. J Med Syst 2024; 48:10. [PMID: 38193948 DOI: 10.1007/s10916-023-02031-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Accepted: 12/28/2023] [Indexed: 01/10/2024]
Abstract
Gene expression datasets offer a wide range of information about various biological processes. However, identifying the important genes in high-dimensional biological data is difficult because of the many redundant and irrelevant ones. Numerous Feature Selection (FS) techniques have been developed to overcome this obstacle, and improving their efficacy and precision is crucial for identifying significant genes in complex biological data. In this work, we present a novel gene selection approach called the Sine Cosine and Cuckoo Search Algorithm (SCACSA). This hybrid method is designed to work with the well-known Support Vector Machine (SVM) classifier. Using a breast cancer dataset, we carefully assess the hybrid gene selection algorithm's performance and compare it with other feature selection methods. To improve the quality of the feature set, we first apply minimum Redundancy Maximum Relevance (mRMR) as a filtering step. The hybrid SCACSA method is then used to refine and optimize the gene selection. Finally, we classify the dataset with the SVM classifier using the selected genes. Given the pivotal role gene selection plays in unraveling complex biological datasets, SCACSA stands out as an invaluable tool for the classification of cancer datasets. The findings help medical practitioners make well-informed decisions about cancer diagnosis and provide them with a valuable tool for navigating the complex world of gene expression data.
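The mRMR filter step mentioned in this abstract ranks genes by their relevance to the class label and penalizes redundancy with genes already chosen. The following is a minimal illustration, not the authors' implementation. It assumes the expression values have already been discretized. It uses mutual information for both relevance and redundancy, and applies the common difference (MID) criterion. The function names `mutual_information` and `mrmr` and the toy data are hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(x, y):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(x)
    count_x, count_y = Counter(x), Counter(y)
    count_xy = Counter(zip(x, y))
    # Sum of p(a, b) * log(p(a, b) / (p(a) * p(b))) over observed value pairs.
    return sum((c / n) * log(c * n / (count_x[a] * count_y[b]))
               for (a, b), c in count_xy.items())


def mrmr(features, labels, k):
    """Greedy mRMR with the difference (MID) criterion.

    features: list of columns, one per gene, each holding discretized values.
    labels:   class label of every sample.
    Returns the indices of the k selected features, in selection order.
    """
    relevance = [mutual_information(f, labels) for f in features]
    selected, remaining = [], list(range(len(features)))
    while remaining and len(selected) < k:
        def score(i):
            if not selected:
                return relevance[i]
            # Mean mutual information with the genes already chosen.
            redundancy = sum(mutual_information(features[i], features[j])
                             for j in selected) / len(selected)
            return relevance[i] - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: gene 1 duplicates gene 0; gene 2 is weakly informative but not redundant.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
genes = [
    [0, 0, 0, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 1, 1, 0, 1],
]
print(mrmr(genes, labels, 2))  # [0, 2]
```

In the toy data, gene 1 is an exact copy of gene 0. Once gene 0 is chosen, gene 1's redundancy outweighs its relevance, so the greedy step prefers the weaker but non-redundant gene 2. In a pipeline like the one described, the retained genes would then be passed to the wrapper search (SCACSA) and the SVM.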
Affiliation(s)
- Abrar Yaqoob
- School of Advanced Sciences and Languages, VIT Bhopal University, Kothrikalan, Sehore, 466114, India.
- Navneet Kumar Verma
- School of Advanced Sciences and Languages, VIT Bhopal University, Kothrikalan, Sehore, 466114, India
- Rabia Musheer Aziz
- School of Advanced Sciences and Languages, VIT Bhopal University, Kothrikalan, Sehore, 466114, India
3
McKearnan SB, Vock DM, Marai GE, Canahuate G, Fuller CD, Wolfson J. Feature selection for support vector regression using a genetic algorithm. Biostatistics 2023; 24:295-308. [PMID: 34494086 PMCID: PMC10102886 DOI: 10.1093/biostatistics/kxab022] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/16/2020] [Revised: 02/12/2021] [Accepted: 03/21/2021] [Indexed: 11/13/2022]
Abstract
Support vector regression (SVR) is particularly beneficial when the outcome and predictors are nonlinearly related. However, when many covariates are available, the method's flexibility can lead to overfitting and an overall loss in predictive accuracy. To overcome this drawback, we develop a feature selection method for SVR based on a genetic algorithm that iteratively searches across potential subsets of covariates to find those that yield the best performance according to a user-defined fitness function. We evaluate the performance of our feature selection method for SVR, comparing it to alternate methods including LASSO and random forest, in a simulation study. We find that our method yields higher predictive accuracy than SVR without feature selection. Our method outperforms LASSO when the relationship between covariates and outcome is nonlinear. Random forest performs equivalently to our method in some scenarios, but more poorly when covariates are correlated. We apply our method to predict donor kidney function 1 year after transplant using data from the United Network for Organ Sharing national registry.
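The genetic-algorithm search described here can be sketched as follows. This is a generic illustration, not the authors' code. Individuals are binary masks over covariates. The operators (binary tournament selection, uniform crossover, bit-flip mutation, elitism) and all parameter values are assumptions. In the paper the fitness would come from fitting SVR on the selected covariates; here `toy_fitness` is a hypothetical stand-in so the example runs on its own.

```python
import random


def ga_feature_selection(n_features, fitness, pop_size=30, generations=60,
                         crossover_rate=0.9, seed=0):
    """Search over binary feature masks with a simple generational GA.

    fitness: user-defined function mapping a tuple of 0/1 flags (1 = keep the
             covariate) to a score where higher is better.
    Returns (best_mask, best_score, history); history[g] is the best score
    seen up to and including generation g.
    """
    rng = random.Random(seed)
    mutation_rate = 1.0 / n_features  # about one bit flip per child
    pop = [tuple(rng.randint(0, 1) for _ in range(n_features))
           for _ in range(pop_size)]
    best_mask, best_score = pop[0], fitness(pop[0])
    history = []

    for _ in range(generations):
        scored = [(ind, fitness(ind)) for ind in pop]
        for ind, s in scored:
            if s > best_score:
                best_mask, best_score = ind, s
        history.append(best_score)

        def tournament():
            # Binary tournament: keep the fitter of two random individuals.
            (a, fa), (b, fb) = rng.sample(scored, 2)
            return a if fa >= fb else b

        children = [best_mask]  # elitism: the best mask always survives
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < crossover_rate:
                # Uniform crossover: each bit is copied from a random parent.
                child = [x if rng.random() < 0.5 else y for x, y in zip(p1, p2)]
            else:
                child = list(p1)
            # Bit-flip mutation.
            child = [1 - bit if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(tuple(child))
        pop = children
    return best_mask, best_score, history


# Hypothetical toy fitness: reward covariates 1, 3 and 5, penalize all others.
informative = {1, 3, 5}


def toy_fitness(mask):
    return sum(1 if i in informative else -1 for i, bit in enumerate(mask) if bit)


mask, score, history = ga_feature_selection(8, toy_fitness)
print(mask, score)
```

The best mask is copied unchanged into each new generation, so the recorded best score never decreases. Replacing `toy_fitness` with a negated cross-validated error of an SVR fit would give a fitness closer to the one the abstract describes, at a much higher computational cost.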
Affiliation(s)
- Shannon B McKearnan
- Division of Biostatistics, University of Minnesota, A460 Mayo Building, MMC 303, 420 Delaware St. SE, Minneapolis, MN 55414, USA
- David M Vock
- Division of Biostatistics, University of Minnesota, A460 Mayo Building, MMC 303, 420 Delaware St. SE, Minneapolis, MN 55414, USA
- G Elisabeta Marai
- Department of Computer Science, The University of Illinois at Chicago, Chicago, IL 60612, USA
- Guadalupe Canahuate
- Department of Electrical and Computer Engineering, The University of Iowa, Iowa City, IA 52242, USA
- Clifton D Fuller
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Julian Wolfson
- Division of Biostatistics, University of Minnesota, Minneapolis, MN 55414, USA
4
Kwan B, Fuhrer T, Montemayor D, Fink JC, He J, Hsu CY, Messer K, Nelson RG, Pu M, Ricardo AC, Rincon-Choles H, Shah VO, Ye H, Zhang J, Sharma K, Natarajan L. A generalized covariate-adjusted top-scoring pair algorithm with applications to diabetic kidney disease stage classification in the Chronic Renal Insufficiency Cohort (CRIC) Study. BMC Bioinformatics 2023; 24:57. [PMID: 36803209 PMCID: PMC9942303 DOI: 10.1186/s12859-023-05171-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2022] [Accepted: 02/02/2023] [Indexed: 02/22/2023]
Abstract
BACKGROUND The growing amount of high-dimensional biomolecular data has spawned new statistical and computational models for risk prediction and disease classification. Yet many of these methods do not yield biologically interpretable models, despite offering high classification accuracy. An exception, the top-scoring pair (TSP) algorithm, derives parameter-free, biologically interpretable single-pair decision rules that are accurate and robust in disease classification. However, standard TSP methods do not accommodate covariates that could heavily influence feature selection for the top-scoring pair. Herein, we propose a covariate-adjusted TSP method, which uses residuals from a regression of features on the covariates to identify top-scoring pairs. We conduct simulations and a data application to investigate our method, and compare it with existing classifiers, LASSO and random forests. RESULTS Our simulations found that features that were highly correlated with clinical variables had a high likelihood of being selected as top-scoring pairs in the standard TSP setting. However, through residualization, our covariate-adjusted TSP was able to identify new top-scoring pairs that were largely uncorrelated with clinical variables. In the data application, using patients with diabetes (n = 977) selected for metabolomic profiling in the Chronic Renal Insufficiency Cohort (CRIC) study, the standard TSP algorithm identified (valine-betaine, dimethyl-arg) as the top-scoring metabolite pair for classifying diabetic kidney disease (DKD) severity, whereas the covariate-adjusted TSP method identified the pair (pipazethate, octaethylene glycol) as top-scoring. Valine-betaine and dimethyl-arg had, respectively, ≥ 0.4 absolute correlation with urine albumin and serum creatinine, known prognosticators of DKD.
Thus, without covariate adjustment, the top-scoring pair largely reflected known markers of disease severity, whereas covariate-adjusted TSP uncovered features free from confounding and identified independent prognostic markers of DKD severity. Furthermore, TSP-based methods achieved classification accuracy for DKD competitive with that of LASSO and random forests, while providing more parsimonious models. CONCLUSIONS We extended TSP-based methods to account for covariates via a simple, easy-to-implement residualizing process. Our covariate-adjusted TSP method identified metabolite features, uncorrelated with clinical covariates, that discriminate DKD severity stage based on the relative ordering of two features, and thus provide insights for future studies on order reversals in early vs advanced disease states.
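The covariate-adjusted TSP described above has two ingredients. First, each feature is residualized on the covariates. Second, feature pairs are scored by how differently their ordering behaves in the two classes. Both can be sketched in Python with NumPy. This is a minimal illustration under assumptions (ordinary least squares with an intercept, a binary 0/1 outcome, no tie-breaking rule), not the authors' implementation; the function names and toy data are hypothetical.

```python
import itertools

import numpy as np


def residualize(X, C):
    """Regress every feature (column of X, shape n x p) on the covariates C
    (shape n x q, or n) plus an intercept by ordinary least squares and
    return the n x p matrix of residuals."""
    design = np.column_stack([np.ones(len(C)), C])
    beta, *_ = np.linalg.lstsq(design, X, rcond=None)
    return X - design @ beta


def top_scoring_pair(X, y):
    """Standard TSP: the pair (i, j) maximizing
    |P(X_i < X_j | y = 1) - P(X_i < X_j | y = 0)|."""
    y = np.asarray(y)
    best_pair, best_score = None, -1.0
    for i, j in itertools.combinations(range(X.shape[1]), 2):
        less = X[:, i] < X[:, j]
        score = abs(less[y == 1].mean() - less[y == 0].mean())
        if score > best_score:
            best_pair, best_score = (i, j), float(score)
    return best_pair, best_score


def covariate_adjusted_tsp(X, C, y):
    """Run the same pair search on covariate residuals instead of raw features."""
    return top_scoring_pair(residualize(np.asarray(X, dtype=float), C), y)


C = np.array([0.0, 1.0, 2.0, 3.0])  # one covariate
X = np.column_stack([2 * C + 1, [1.0, 1.0, 3.0, 3.0], [2.0, 2.0, 2.0, 2.0]])
y = [0, 0, 1, 1]
print(top_scoring_pair(X, y))  # ((1, 2), 1.0)
```

With `y = [0, 0, 1, 1]`, feature 1 is below feature 2 in every class-0 sample and in no class-1 sample, so the raw pair (1, 2) reaches the maximum score of 1. The covariate-adjusted version runs the same search on `residualize(X, C)`. A feature that is an exact linear function of the covariates (feature 0 here) is therefore reduced to numerical noise and carries no signal.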
Grants
- U01 DK061028 NIDDK NIH HHS
- U01 DK060963 NIDDK NIH HHS
- R01DK118736, 1R01DK110541-01A1, U01DK060990, U01DK060984, U01DK061022, U01DK061021, U01DK061028, U01DK060980, U01DK060963, U01DK060902, U24DK060990 NIDDK NIH HHS
- R01 DK110541 NIDDK NIH HHS
- U01 DK060902 NIDDK NIH HHS
- U01 DK060990 NIDDK NIH HHS
- U01 DK060984 NIDDK NIH HHS
- U01 DK061021 NIDDK NIH HHS
- U24 DK060990 NIDDK NIH HHS
- U01 DK060980 NIDDK NIH HHS
- R01 DK118736 NIDDK NIH HHS
- U01 DK061022 NIDDK NIH HHS
- National Science Foundation Graduate Research Fellowship Program
- Intramural Research Program of the National Institute of Diabetes and Digestive and Kidney Diseases
- National Institute of Diabetes and Digestive and Kidney Diseases
Affiliation(s)
- Brian Kwan
- Division of Biostatistics and Bioinformatics, Herbert Wertheim School of Public Health, University of California, San Diego, La Jolla, CA, USA
- Moores Cancer Center, University of California, San Diego, La Jolla, CA, USA
- Tobias Fuhrer
- Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
- Daniel Montemayor
- Division of Nephrology, Department of Medicine, University of Texas Health San Antonio, San Antonio, TX, USA
- Center for Renal Precision Medicine, University of Texas Health San Antonio, San Antonio, TX, USA
- Jeffery C Fink
- Department of Medicine, University of Maryland, Baltimore School of Medicine, Baltimore, MD, USA
- Jiang He
- Department of Epidemiology, Tulane University School of Public Health and Tropical Medicine and Tulane University Translational Science Institute, New Orleans, LA, USA
- Chi-Yuan Hsu
- Division of Nephrology, University of California, San Francisco School of Medicine, San Francisco, CA, USA
- Karen Messer
- Division of Biostatistics and Bioinformatics, Herbert Wertheim School of Public Health, University of California, San Diego, La Jolla, CA, USA
- Moores Cancer Center, University of California, San Diego, La Jolla, CA, USA
- Robert G Nelson
- Chronic Kidney Disease Section, National Institute of Diabetes and Digestive and Kidney Diseases, Phoenix, AZ, USA
- Minya Pu
- Moores Cancer Center, University of California, San Diego, La Jolla, CA, USA
- Ana C Ricardo
- Department of Medicine, University of Illinois, Chicago, IL, USA
- Hernan Rincon-Choles
- Department of Nephrology, Glickman Urological and Kidney Institute, Cleveland Clinic Foundation, Cleveland, OH, USA
- Vallabh O Shah
- University of New Mexico Health Sciences Center, Albuquerque, NM, USA
- Hongping Ye
- Division of Nephrology, Department of Medicine, University of Texas Health San Antonio, San Antonio, TX, USA
- Center for Renal Precision Medicine, University of Texas Health San Antonio, San Antonio, TX, USA
- Jing Zhang
- Moores Cancer Center, University of California, San Diego, La Jolla, CA, USA
- Kumar Sharma
- Division of Nephrology, Department of Medicine, University of Texas Health San Antonio, San Antonio, TX, USA
- Center for Renal Precision Medicine, University of Texas Health San Antonio, San Antonio, TX, USA
- Loki Natarajan
- Division of Biostatistics and Bioinformatics, Herbert Wertheim School of Public Health, University of California, San Diego, La Jolla, CA, USA
- Moores Cancer Center, University of California, San Diego, La Jolla, CA, USA
5
A survey on gene expression data analysis using deep learning methods for cancer diagnosis. Progress in Biophysics and Molecular Biology 2023; 177:1-13. [PMID: 35988771 DOI: 10.1016/j.pbiomolbio.2022.08.004] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 05/27/2022] [Revised: 08/09/2022] [Accepted: 08/12/2022] [Indexed: 02/07/2023]
Abstract
Gene expression data are biological data from which meaningful hidden information can be extracted. This gene information is used for disease diagnosis, especially in cancer treatment, based on variations in gene expression levels. DNA microarray is an efficient method for gene expression classification and for predicting specific types of cancer. Owing to the abundance of computing power, deep learning (DL) has become a widespread technique in the healthcare sector. Gene expression datasets have a limited number of samples but a large number of features. Data augmentation, a technique for generating synthetic samples to increase the diversity of the data, is therefore needed to overcome the dimensionality problem. Deep learning methods are designed to learn and extract features from raw input data in the form of multidimensional arrays. This paper reviews existing research on deep learning techniques such as Feed Forward Neural Networks (FFN), Convolutional Neural Networks (CNN), Autoencoders (AE), and Recurrent Neural Networks (RNN) for the classification and prediction of cancer and its types through gene expression data analysis.
6
Bayrak T, Çetin Z, Saygılı Eİ, Ogul H. Identifying the tumor location-associated candidate genes in development of new drugs for colorectal cancer using machine-learning-based approach. Med Biol Eng Comput 2022; 60:2877-2897. [DOI: 10.1007/s11517-022-02641-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2021] [Accepted: 07/28/2022] [Indexed: 02/07/2023]
7
Transcranial Magnetic Stimulation Indices of Cortical Excitability Enhance the Prediction of Response to Pharmacotherapy in Late-Life Depression. Biological Psychiatry. Cognitive Neuroscience and Neuroimaging 2022; 7:265-275. [PMID: 34311121 PMCID: PMC8783923 DOI: 10.1016/j.bpsc.2021.07.005] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/20/2021] [Revised: 06/16/2021] [Accepted: 07/14/2021] [Indexed: 11/23/2022]
Abstract
BACKGROUND Older adults with late-life depression (LLD) often experience an incomplete response, or no response, to first-line pharmacotherapy. The treatment of LLD could be improved using objective biological measures to predict response. Transcranial magnetic stimulation (TMS) can be used to measure cortical excitability, inhibition, and plasticity, which have been implicated in LLD pathophysiology and associated with brain stimulation treatment outcomes in younger adults with depression. TMS measures have not yet been investigated as predictors of treatment outcomes in LLD or of pharmacotherapy outcomes in adults of any age with depression. METHODS We assessed whether pretreatment single-pulse and paired-pulse TMS measures, combined with clinical and demographic measures, predict venlafaxine treatment response in 76 outpatients with LLD. We compared the predictive performance of machine learning models including or excluding TMS predictors. RESULTS Two single-pulse TMS measures predicted venlafaxine response: cortical excitability (neuronal membrane excitability) and the variability of cortical excitability (dynamic fluctuations in excitability levels). In cross-validation, models using a combination of these TMS predictors, clinical markers of treatment resistance, and age classified patients with 73% ± 11% balanced accuracy (average correct classification rate of responders and nonresponders; permutation testing, p < .005); these models significantly outperformed (corrected t test, p = .025) models using clinical and demographic predictors alone (60% ± 10% balanced accuracy). CONCLUSIONS These preliminary findings suggest that single-pulse TMS measures of cortical excitability may be useful predictors of response to pharmacotherapy in LLD. Future studies are needed to confirm these findings and determine whether combining TMS predictors with other biomarkers further improves the accuracy of predicting LLD treatment outcome.
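Balanced accuracy, as defined in this abstract, is the average of the correct-classification rates of responders and nonresponders. The following minimal Python sketch shows why it differs from plain accuracy when the classes are imbalanced. The function name and the toy labels are illustrative and do not come from the study.

```python
def balanced_accuracy(y_true, y_pred):
    """Average over classes of the fraction of that class predicted correctly."""
    rates = []
    for c in sorted(set(y_true)):
        members = [i for i, t in enumerate(y_true) if t == c]
        correct = sum(1 for i in members if y_pred[i] == c)
        rates.append(correct / len(members))
    return sum(rates) / len(rates)


# Toy labels: 1 = responder, 0 = nonresponder.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1, 1, 1]
print(balanced_accuracy(y_true, y_pred))  # 0.625
```

Here 6 of 10 predictions are correct, so plain accuracy is 0.6. The per-class rates are 3/4 for responders and 3/6 for nonresponders, which gives a balanced accuracy of 0.625.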
8
Sathya M, Jeyaselvi M, Joshi S, Pandey E, Pareek PK, Jamal SS, Kumar V, Atiglah HK. Cancer Categorization Using Genetic Algorithm to Identify Biomarker Genes. Journal of Healthcare Engineering 2022; 2022:5821938. [PMID: 35242297 PMCID: PMC8888099 DOI: 10.1155/2022/5821938] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/24/2021] [Accepted: 12/14/2021] [Indexed: 11/18/2022]
Abstract
Microarray gene expression data contain a large number of genes expressed at varying levels. Because only a few genes are critically significant, it is challenging to analyze and categorize datasets that span the whole gene space. Discovering biomarker genes is essential to aid in the diagnosis of cancer and, consequently, to suggest individualized treatment. First, the parallelized minimum redundancy and maximum relevance ensemble (mRMRe) is used to choose the top m informative genes from a large pool of candidates. These genes are then input into a Genetic Algorithm (GA), which heuristically computes the ideal set of genes using the Mahalanobis Distance (MD) as a distance metric. The proposed approach (mRMRe-GA) is applied to four microarray datasets, with the Support Vector Machine (SVM) serving as the classifier. Leave-One-Out Cross-Validation (LOOCV) is used to assess classifier performance. The proposed mRMRe-GA strategy is compared with other approaches and is shown to enhance classification accuracy while using fewer genes than previous methods. Keywords: Microarray, Gene Expression Data, GA, Feature Selection, SVM, Cancer Classification.
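LOOCV, as used in this abstract to assess the classifier, holds out each sample once. It trains on the remaining n-1 samples and scores the prediction on the held-out one. The following is illustrative only: it substitutes a 1-nearest-neighbour classifier (Euclidean distance) for the SVM so that it runs with the standard library alone. The function names are hypothetical.

```python
from math import dist


def nearest_neighbour(train_X, train_y, x):
    """Placeholder 1-NN classifier (Euclidean); stands in for the SVM."""
    best = min(range(len(train_X)), key=lambda i: dist(train_X[i], x))
    return train_y[best]


def loocv_accuracy(X, y, classify=nearest_neighbour):
    """Leave-one-out CV: train on n-1 samples, predict the held-out one,
    repeat for every sample, and return the fraction predicted correctly."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if classify(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


print(loocv_accuracy([[0, 0], [0, 1], [5, 5], [5, 6]], [0, 0, 1, 1]))  # 1.0
```

One general caveat applies when gene selection (here, mRMRe followed by the GA) is run on all samples before LOOCV. In that case the held-out sample has already influenced which genes are used, and the accuracy estimate tends to be optimistic. A stricter evaluation repeats the selection inside each leave-one-out fold.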
Affiliation(s)
- M. Sathya
- Department of Information Science and Engineering, AMC Engineering College, Bengaluru, Karnataka 560083, India
- M. Jeyaselvi
- Department of Computer Science and Engineering, SRM Institute of Science and Technology, Chennai, India
- Shubham Joshi
- Department of Computer Engineering, SVKM'S NMIMS MPSTME Shirpur, Maharashtra 425405, India
- Ekta Pandey
- Applied Science Department, Bundelkhand Institute of Engineering and Technology, Jhansi, Uttar Pradesh, India
- Piyush Kumar Pareek
- Department of Computer Science & Engineering & Head of IPR Cell, Nitte Meenakshi Institute of Technology, Bengaluru, India
- Sajjad Shaukat Jamal
- Department of Mathematics, College of Science, King Khalid University, Abha, Saudi Arabia
- Vinay Kumar
- Department of Computer Engineering and Application, GLA University, Mathura, India
- Henry Kwame Atiglah
- Department of Electrical and Electronics Engineering, Tamale Technical University, Tamale, Ghana
9
Musa IH, Afolabi LO, Zamit I, Musa TH, Musa HH, Tassang A, Akintunde TY, Li W. Artificial Intelligence and Machine Learning in Cancer Research: A Systematic and Thematic Analysis of the Top 100 Cited Articles Indexed in Scopus Database. Cancer Control 2022; 29:10732748221095946. [PMID: 35688650 PMCID: PMC9189515 DOI: 10.1177/10732748221095946] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 01/03/2023]
Abstract
INTRODUCTION Cancer is a major public health problem and a leading global cause of death, and its screening, diagnosis, prediction, survival estimation, treatment, and control remain major challenges. The rise of Artificial Intelligence (AI) and Machine Learning (ML) techniques and their applications in various fields have brought immense value by providing insights that support cancer control. METHODS A systematic and thematic analysis was performed on the Scopus database to identify the top 100 cited articles on AI and ML in cancer research. Data were analyzed using RStudio and VOSviewer v1.6.6. RESULTS The top 100 articles on AI and ML in cancer received 33 920 citations in total, ranging from 108 to 5758 per article. Doi Kunio from the USA was the most cited author (total number of citations, TNC = 663). Of the 43 contributing countries, the USA accounted for 30% of the top 100 cited articles and China for 10%. Among the 57 peer-reviewed journals, Expert Systems with Applications published 8% of the articles. The results highlight technological advancement through AI and ML via the widespread use of Artificial Neural Networks (ANNs), deep learning or machine learning techniques, mammography-based models, Convolutional Neural Networks (SC-CNN), and text mining techniques in the prediction, diagnosis, and prevention of various types of cancer towards cancer control. CONCLUSIONS This bibliometric study provides a detailed overview of the most cited empirical evidence on AI and ML adoption in cancer research, which could help in designing future research. These innovations guarantee greater speed in the detection and control of cancer through AI and ML, improving the patient experience.
Affiliation(s)
- Ibrahim H. Musa
- Department of Software Engineering, School of Computer Science and Engineering, Southeast University, Nanjing, China
- Key Laboratory of Computer Network and Information Integration, Ministry of Education, Southeast University, Nanjing, China
- Lukman O. Afolabi
- Guangdong Immune Cell Therapy Engineering and Technology Research Center, Center for Protein and Cell-Based Drugs, Institute of Biomedicine and Biotechnology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- University of Chinese Academy of Sciences, Beijing, China
- Ibrahim Zamit
- University of Chinese Academy of Sciences, Beijing, China
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong, China
- Taha H. Musa
- Biomedical Research Institute, Darfur University College, Nyala, South Darfur, Sudan
- Key Laboratory of Environmental Medicine Engineering, Ministry of Education, Department of Epidemiology and Health Statistics, School of Public Health, Southeast University, Nanjing, Jiangsu Province, China
- Hassan H. Musa
- Faculty of Medical Laboratory Sciences, University of Khartoum, Khartoum, Sudan
- Andrew Tassang
- Faculty of Health Sciences, University of Buea, Cameroon
- Buea Regional Hospital, Annex, Cameroon
- Tosin Y. Akintunde
- Department of Sociology, School of Public Administration, Hohai University, Nanjing, China
- Wei Li
- Department of Quality Management, Children’s Hospital of Nanjing Medical University, Nanjing, China
10
Gene Selection for Microarray Cancer Classification based on Manta Rays Foraging Optimization and Support Vector Machines. Arabian Journal for Science and Engineering 2021. [DOI: 10.1007/s13369-021-06102-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 12/01/2022]
11
Bayrak T, Ogul H. Computer-aided diagnosis of sleep apnea using gene expression. Health and Technology 2021. [DOI: 10.1007/s12553-021-00557-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
12
Agrawal S, Ransom RF, Saraswathi S, Garcia-Gonzalo E, Webb A, Fernandez-Martinez JL, Popovic M, Guess AJ, Kloczkowski A, Benndorf R, Sadee W, Smoyer WE. Sulfatase 2 Is Associated with Steroid Resistance in Childhood Nephrotic Syndrome. J Clin Med 2021; 10:523. [PMID: 33540508 PMCID: PMC7867139 DOI: 10.3390/jcm10030523] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/30/2020] [Revised: 01/20/2021] [Accepted: 01/23/2021] [Indexed: 01/17/2023]
Abstract
Glucocorticoid (GC) resistance complicates the treatment of ~10-20% of children with nephrotic syndrome (NS), yet the molecular basis for resistance remains unclear. We used RNAseq analysis and in silico algorithm-based approaches on peripheral blood leukocytes from 12 children, both at initial NS presentation and after ~7 weeks of GC therapy, to identify a 12-gene panel able to differentiate steroid-resistant NS (SRNS) from steroid-sensitive NS (SSNS). Among this panel, one biologically relevant candidate, sulfatase 2 (SULF2), was validated and analyzed further in up to a total of 66 children. Both SULF2 leukocyte expression and plasma arylsulfatase activity Post/Pre therapy ratios were greater in SSNS than in SRNS. However, neither plasma SULF2 endosulfatase activity (measured by VEGF binding activity) nor plasma VEGF levels distinguished SSNS from SRNS, despite VEGF's reported role as a downstream mediator of SULF2's effects in glomeruli. Experimental studies of NS-related injury in both rat glomeruli and cultured podocytes also revealed decreased SULF2 expression, which was partially reversible by GC treatment of podocytes. These findings together suggest that SULF2 levels and activity are associated with GC resistance in NS, and that SULF2 may play a protective role in NS via the modulation of downstream mediators distinct from VEGF.
Affiliation(s)
- Shipra Agrawal
- Center for Clinical and Translational Research, Abigail Wexner Research Institute at Nationwide Children’s Hospital, Columbus, OH 43205, USA
- Department of Pediatrics, The Ohio State University College of Medicine, Columbus, OH 43210, USA
- Richard F. Ransom
- Center for Clinical and Translational Research, Abigail Wexner Research Institute at Nationwide Children’s Hospital, Columbus, OH 43205, USA
- Department of Pediatrics, The Ohio State University College of Medicine, Columbus, OH 43210, USA
- Saras Saraswathi
- Battelle Center for Mathematical Medicine at Abigail Wexner Research Institute at Nationwide Children’s Hospital, Columbus, OH 43205, USA
- Amy Webb
- Department of Biomedical Informatics, The Ohio State University College of Medicine, Columbus, OH 43210, USA
- Milan Popovic
- Center for Clinical and Translational Research, Abigail Wexner Research Institute at Nationwide Children’s Hospital, Columbus, OH 43205, USA
- Adam J. Guess
- Center for Clinical and Translational Research, Abigail Wexner Research Institute at Nationwide Children’s Hospital, Columbus, OH 43205, USA
- Andrzej Kloczkowski
- Department of Pediatrics, The Ohio State University College of Medicine, Columbus, OH 43210, USA
- Battelle Center for Mathematical Medicine at Abigail Wexner Research Institute at Nationwide Children’s Hospital, Columbus, OH 43205, USA
- Rainer Benndorf
- Center for Clinical and Translational Research, Abigail Wexner Research Institute at Nationwide Children’s Hospital, Columbus, OH 43205, USA
- Department of Pediatrics, The Ohio State University College of Medicine, Columbus, OH 43210, USA
- Wolfgang Sadee
- Department of Cancer Biology and Genetics, Center for Pharmacogenomics, The Ohio State University College of Medicine, Columbus, OH 43210, USA
- William E. Smoyer
- Center for Clinical and Translational Research, Abigail Wexner Research Institute at Nationwide Children’s Hospital, Columbus, OH 43205, USA
- Department of Pediatrics, The Ohio State University College of Medicine, Columbus, OH 43210, USA
13
Panda M. Elephant search optimization combined with deep neural network for microarray data analysis. Journal of King Saud University - Computer and Information Sciences 2020. [DOI: 10.1016/j.jksuci.2017.12.002] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 10/18/2022]
|
14
MotieGhader H, Masoudi-Sobhanzadeh Y, Ashtiani SH, Masoudi-Nejad A. mRNA and microRNA selection for breast cancer molecular subtype stratification using meta-heuristic based algorithms. Genomics 2020; 112:3207-3217. [DOI: 10.1016/j.ygeno.2020.06.014] [Received: 04/02/2020] [Revised: 05/13/2020] [Accepted: 06/02/2020]
15
Pozzoli S, Soliman A, Bahri L, Branca RM, Girdzijauskas S, Brambilla M. Domain expertise-agnostic feature selection for the analysis of breast cancer data. Artif Intell Med 2020; 108:101928. [PMID: 32972658 DOI: 10.1016/j.artmed.2020.101928] [Received: 12/10/2019] [Revised: 06/04/2020] [Accepted: 07/06/2020]
Abstract
Progress in proteomics has enabled biologists to accurately measure the amount of protein in a tumor. This work is based on a breast cancer data set produced by the proteomic analysis of a cohort of tumors carried out at Karolinska Institutet. While evidence suggests that an anomaly in protein content is related to the cancerous nature of tumors, the proteins that could be markers of cancer types and subtypes, and their underlying interactions, are not completely known. This work sheds light on the potential of unsupervised learning for analyzing such data sets, namely for detecting distinctive proteins that identify cancer subtypes in the absence of domain expertise. In the analyzed data set, the number of samples (tumors) is significantly lower than the number of features (proteins); consequently, the input data can be regarded as high-dimensional. High-dimensional data are now widespread, and a great deal of effort has gone into their analysis by means of feature selection, but this is still largely based on prior specialist knowledge, which in this case is incomplete. There is a growing need for unsupervised feature selection, which raises the issues of how to generate promising subsets of features among all possible combinations and how to evaluate the quality of these subsets in the absence of specialist knowledge. We propose a new wrapper method that generates subsets of features via spectral clustering and evaluates them via modularity. We conduct experiments to test the effectiveness of the new method in the analysis of the breast cancer data in a domain expertise-agnostic context. Furthermore, we show that the method can be successfully augmented by incorporating an external source of data on known protein complexes.
Our approach reveals a large number of subsets of features that are better at clustering the samples than the state-of-the-art classification in terms of modularity and shows a potential to be useful for future proteomics research.
Affiliation(s)
- Susanna Pozzoli
- KTH Royal Institute of Technology, Stockholm, Sweden; Politecnico di Milano, Milan, Italy.
- Leila Bahri
- KTH Royal Institute of Technology, Stockholm, Sweden
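Pozzoli et al. evaluate candidate feature subsets by the modularity of the clustering they induce. As a hedged illustration only (not the authors' code), the standard Newman-Girvan modularity of a partition of an undirected, unweighted graph without self-loops can be computed as below; the function name and the edge-list and dict input formats are assumptions of this sketch, and the abstract does not describe how the authors build their sample graph.

```python
from itertools import product


def modularity(edges, communities):
    """Newman-Girvan modularity Q of an undirected, unweighted simple graph.

    edges: iterable of (u, v) pairs, each undirected edge listed once.
    communities: dict mapping every node to a community label.
    """
    edges = list(edges)
    m = len(edges)
    degree = {}
    adjacency = set()
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
        adjacency.add((u, v))
        adjacency.add((v, u))
    nodes = list(degree)
    q = 0.0
    # Sum A_ij - k_i * k_j / (2m) over ordered node pairs in the same community.
    for i, j in product(nodes, nodes):
        if communities[i] != communities[j]:
            continue
        a_ij = 1.0 if (i, j) in adjacency else 0.0
        q += a_ij - degree[i] * degree[j] / (2.0 * m)
    return q / (2.0 * m)
```

Two disconnected triangles split into their natural communities give Q = 0.5, while placing every node in a single community gives Q = 0; higher values indicate a partition with more within-community edges than expected by chance.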
16
Jalayeri S, Abdolrazzagh-Nezhad M. Chemical reaction optimization to disease diagnosis by optimizing hyper-planes classifiers. Soft comput 2019. [DOI: 10.1007/s00500-019-03869-9]
17
Brankovic A, Hosseini M, Piroddi L. A Distributed Feature Selection Algorithm Based on Distance Correlation with an Application to Microarrays. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2019; 16:1802-1815. [PMID: 29993889 DOI: 10.1109/tcbb.2018.2833482]
Abstract
DNA microarray datasets are characterized by a large number of features with very few samples, which is a typical cause of overfitting and poor generalization in the classification task. Here, we introduce a novel feature selection (FS) approach which employs the distance correlation (dCor) as a criterion for evaluating the dependence of the class on a given feature subset. The dCor index provides a reliable dependence measure among random vectors of arbitrary dimension, without any assumption on their distribution. Moreover, it is sensitive to the presence of redundant terms. The proposed FS method is based on a probabilistic representation of the feature subset model, which is progressively refined by a repeated process of model extraction and evaluation. A key element of the approach is a distributed optimization scheme based on a vertical partitioning of the dataset, which alleviates the negative effects of its unbalanced dimensions. The proposed method has been tested on several microarray datasets, resulting in quite compact and accurate models obtained at a reasonable computational cost.
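The distance correlation (dCor) criterion used by Brankovic et al. can be illustrated with the standard sample (V-statistic) estimator of Székely et al. This is a minimal pure-Python sketch with hypothetical function names; it is O(n²) in memory and does not reproduce the paper's probabilistic subset model or its distributed, vertically partitioned optimization scheme.

```python
import math


def _as_vector(value):
    # Accept scalars or sequences, so each observation may be univariate or multivariate.
    return tuple(value) if isinstance(value, (list, tuple)) else (value,)


def _double_centered(points):
    """Pairwise Euclidean distances, double-centred (row, column and grand means removed)."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    row_mean = [sum(row) / n for row in d]  # the matrix is symmetric, so column means match
    grand = sum(row_mean) / n
    return [[d[i][j] - row_mean[i] - row_mean[j] + grand for j in range(n)]
            for i in range(n)]


def distance_correlation(x, y):
    """Sample distance correlation of two paired samples of equal length."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired samples of equal length >= 2")
    a = _double_centered([_as_vector(v) for v in x])
    b = _double_centered([_as_vector(v) for v in y])
    n = len(x)

    def mean_product(p, q):
        return sum(p[i][j] * q[i][j] for i in range(n) for j in range(n)) / n ** 2

    dcov2 = mean_product(a, b)
    denom = math.sqrt(mean_product(a, a) * mean_product(b, b))
    if denom == 0.0:
        return 0.0  # a constant sample has zero distance variance
    return math.sqrt(max(dcov2, 0.0) / denom)
```

For x = [-2, -1, 0, 1, 2] and y = x², the Pearson correlation is exactly zero, yet this estimator returns a dCor of roughly 0.5; detecting such nonlinear dependence, for vectors of any dimension, is what makes dCor attractive as a subset-evaluation criterion.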
18
BILEN M, ISIK AH, YIGIT T. The Use of Artificial Neural Networks Optimized with Fire Fly Algorithm in Cancer Diagnosis. GAZI UNIVERSITY JOURNAL OF SCIENCE 2019. [DOI: 10.35378/gujs.471859]
19
Silva JCF, Teixeira RM, Silva FF, Brommonschenkel SH, Fontes EPB. Machine learning approaches and their current application in plant molecular biology: A systematic review. PLANT SCIENCE : AN INTERNATIONAL JOURNAL OF EXPERIMENTAL PLANT BIOLOGY 2019; 284:37-47. [PMID: 31084877 DOI: 10.1016/j.plantsci.2019.03.020] [Received: 12/18/2018] [Revised: 02/28/2019] [Accepted: 03/26/2019]
Abstract
Machine learning (ML) is a field of artificial intelligence that has rapidly emerged in molecular biology, allowing Big Data concepts to be exploited in plant genomics. In this context, the main challenges are how to analyze massive datasets and extract new knowledge at all levels of cellular systems research. ML techniques allow complex interactions to be inferred in several biological systems. Despite this potential, ML has been underused because of its complex computational algorithms and terminology. A systematic review that disentangles ML approaches is therefore relevant for plant scientists and is the subject of this study. We present the main steps of ML development (from data selection to evaluation of classification/prediction models), with a discussion of functional genomics focused mainly on pathogen effector genes in plant immunity. We also consider how to access public source databases within an ML framework to advance plant molecular biology, and introduce powerful new tools such as deep learning.
Affiliation(s)
- Jose Cleydson F Silva
- National Institute of Science and Technology in Plant-Pest Interactions, Bioagro, Universidade Federal de Viçosa, Av. PH Rolfs s/n, Centro, Viçosa, MG, 36570-000, Brazil; Department of Biochemistry and Molecular Biology/Bioagro, Universidade Federal de Viçosa, Viçosa, MG, Brazil
- Ruan M Teixeira
- National Institute of Science and Technology in Plant-Pest Interactions, Bioagro, Universidade Federal de Viçosa, Av. PH Rolfs s/n, Centro, Viçosa, MG, 36570-000, Brazil; Department of Biochemistry and Molecular Biology/Bioagro, Universidade Federal de Viçosa, Viçosa, MG, Brazil
- Fabyano F Silva
- Department of Animal Science, Universidade Federal de Viçosa, Viçosa, MG, Brazil
- Sergio H Brommonschenkel
- National Institute of Science and Technology in Plant-Pest Interactions, Bioagro, Universidade Federal de Viçosa, Av. PH Rolfs s/n, Centro, Viçosa, MG, 36570-000, Brazil; Plant Pathology Department/Bioagro, Universidade Federal de Viçosa, Viçosa, MG, Brazil
- Elizabeth P B Fontes
- National Institute of Science and Technology in Plant-Pest Interactions, Bioagro, Universidade Federal de Viçosa, Av. PH Rolfs s/n, Centro, Viçosa, MG, 36570-000, Brazil; Department of Biochemistry and Molecular Biology/Bioagro, Universidade Federal de Viçosa, Viçosa, MG, Brazil.
20
Jansi Rani M, Devaraj D. Two-Stage Hybrid Gene Selection Using Mutual Information and Genetic Algorithm for Cancer Data Classification. J Med Syst 2019; 43:235. [PMID: 31209677 DOI: 10.1007/s10916-019-1372-8] [Received: 03/22/2019] [Accepted: 06/05/2019]
Abstract
Cancer is a deadly disease that requires complex and costly treatment. Microarray data classification plays an important role in cancer treatment, and an efficient gene selection technique that selects the most promising genes is necessary for cancer classification. Here, we propose a two-stage MI-GA gene selection algorithm for selecting informative genes in cancer data classification. In the first stage, mutual information based gene selection is applied, which keeps only the genes that carry high information about the cancer. The genes with high mutual information values are given as input to the second stage, in which genetic algorithm based gene selection identifies the optimal set of genes required for accurate classification. A Support Vector Machine (SVM) is used for classification. The proposed MI-GA gene selection approach is applied to colon, lung, and ovarian cancer datasets, and the results show that it yields higher classification accuracy than existing methods.
Affiliation(s)
- M Jansi Rani
- School of Computing, Kalasalingam Academy of Research and Education, Krishnankoil, Virudhunagar, India.
- D Devaraj
- School of Electronics & Electrical Technology, Kalasalingam Academy of Research and Education, Krishnankoil, Virudhunagar, India
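The two-stage MI-GA procedure in the abstract above can be sketched as follows. This is an assumption-laden illustration, not the authors' implementation: features are assumed to be already discretized, the GA operators (tournament selection, one-point crossover, bit-flip mutation, elitism) are generic choices, and the `fitness` callback is a hypothetical stand-in for the SVM accuracy the authors use.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information, in bits, between two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def mi_filter(features, labels, k):
    """Stage 1: indices of the k features sharing the most information with the labels."""
    scores = [mutual_information(f, labels) for f in features]
    ranked = sorted(range(len(features)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def ga_select(n_genes, fitness, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Stage 2: elitist genetic algorithm over 0/1 masks of the filtered genes.

    fitness(mask) -> float, higher is better (e.g. a cross-validated accuracy).
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    best = max(pop, key=fitness)

    def tournament():
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [best[:]]                      # elitism keeps the best mask so far
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n_genes) if n_genes > 1 else 0
            child = p1[:cut] + p2[cut:]           # one-point crossover
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit flips
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best
```

Because the best mask is always copied into the next generation, the returned mask is never worse than the best one in the initial population; in a real pipeline `ga_select` would run only over the genes that `mi_filter` kept.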
21
Wu CH, Chen TC, Hsieh YC, Tsao HL. A hybrid rule mining approach for cardiovascular disease detection in traditional Chinese medicine. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2019. [DOI: 10.3233/jifs-169864]
Affiliation(s)
- Chun-Hui Wu
- Department of Information Management, National Formosa University, Huwei Township, Yunlin County, Taiwan
- Ta-Cheng Chen
- Department of Information Management, National Formosa University, Huwei Township, Yunlin County, Taiwan
- Department of M-Commerce and Multimedia Applications, Asia University, Wufeng, Taichung, Taiwan
- Yi-Chih Hsieh
- Department of Industrial Management, National Formosa University, Huwei Township, Yunlin County, Taiwan
- Huei-Ling Tsao
- Department of Information Management, National Formosa University, Huwei Township, Yunlin County, Taiwan
22
Zhao D, Liu H, Zheng Y, He Y, Lu D, Lyu C. Whale optimized mixed kernel function of support vector machine for colorectal cancer diagnosis. J Biomed Inform 2019; 92:103124. [PMID: 30796977 DOI: 10.1016/j.jbi.2019.103124] [Received: 10/17/2018] [Revised: 01/15/2019] [Accepted: 02/04/2019]
Abstract
Microarray technology is a prevalent method for the classification and prediction of colorectal cancer (CRC). Nevertheless, microarray data suffer from the curse of dimensionality when feature genes of the disease are selected from imbalanced samples, which lowers prediction accuracy. Hence, it is vital to build models that avoid these problems and predict CRC more accurately. In this paper, we use an ensemble model to classify samples into healthy and CRC groups and improve prediction performance. The proposed model is composed of three functional modules. The first module removes redundant genes: the main feature genes are selected with the minimum redundancy maximum relevance (mRMR) method, which reduces the dimensionality of the features and thereby improves the prediction results. The second module addresses the problem of imbalanced data with the hybrid sampling algorithm RUSBoost. The third module optimizes the classification algorithm: a mixed kernel function (MKF) based support vector machine (SVM) classifies an unknown sample as a healthy individual or a CRC patient, and the Whale Optimization Algorithm (WOA) is applied to find the optimal parameters of the proposed MKF-SVM. The final results show that the proposed model achieves higher G-means than other comparable models, indicating that the RUSBoost-wrapped WOA + MKF-SVM model can improve the predictive performance for colorectal cancer on imbalanced data.
Affiliation(s)
- Dandan Zhao
- School of Information Science and Engineering, Shandong Normal University, Jinan City, China; Shandong Provincial Key Laboratory for Novel Distributed Computer Software Technology, Jinan City, China
- Hong Liu
- School of Information Science and Engineering, Shandong Normal University, Jinan City, China; Shandong Provincial Key Laboratory for Novel Distributed Computer Software Technology, Jinan City, China.
- Yuanjie Zheng
- School of Information Science and Engineering, Shandong Normal University, Jinan City, China; Shandong Provincial Key Laboratory for Novel Distributed Computer Software Technology, Jinan City, China
- Yanlin He
- School of Information Science and Engineering, Shandong Normal University, Jinan City, China; Shandong Provincial Key Laboratory for Novel Distributed Computer Software Technology, Jinan City, China
- Dianjie Lu
- School of Information Science and Engineering, Shandong Normal University, Jinan City, China; Shandong Provincial Key Laboratory for Novel Distributed Computer Software Technology, Jinan City, China
- Chen Lyu
- School of Information Science and Engineering, Shandong Normal University, Jinan City, China; Shandong Provincial Key Laboratory for Novel Distributed Computer Software Technology, Jinan City, China
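Zhao et al. tune the MKF-SVM parameters with the Whale Optimization Algorithm. The sketch below is a generic, hedged implementation of the commonly described WOA update rules (shrinking encirclement of the best solution, search around a randomly chosen whale, and a logarithmic-spiral move), not the authors' code. The clipping to box bounds, the default parameters, and the toy sphere objective in the test are assumptions; in the paper's setting the objective would be a cross-validated error of the classifier as a function of its kernel parameters.

```python
import math
import random


def whale_optimize(objective, bounds, n_whales=30, iterations=200, b=1.0, seed=0):
    """Minimise `objective` over a box with a basic Whale Optimization Algorithm.

    bounds: list of (low, high) pairs, one per dimension.
    Returns (best_position, best_value).
    """
    rng = random.Random(seed)

    def clip(position):
        # Keep every coordinate inside its (low, high) bound.
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(position, bounds)]

    whales = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_whales)]
    best = list(min(whales, key=objective))
    best_val = objective(best)

    for t in range(iterations):
        a = 2.0 - 2.0 * t / iterations            # decreases linearly from 2 towards 0
        for i, x in enumerate(whales):
            A = 2.0 * a * rng.random() - a
            C = 2.0 * rng.random()
            if rng.random() < 0.5:
                # |A| < 1: encircle the best whale; otherwise search around a random whale.
                leader = best if abs(A) < 1.0 else rng.choice(whales)
                new = [lv - A * abs(C * lv - xv) for lv, xv in zip(leader, x)]
            else:
                # Bubble-net attack: logarithmic spiral around the best whale.
                ell = rng.uniform(-1.0, 1.0)
                new = [abs(bv - xv) * math.exp(b * ell) * math.cos(2.0 * math.pi * ell) + bv
                       for bv, xv in zip(best, x)]
            whales[i] = clip(new)
            value = objective(whales[i])
            if value < best_val:
                best, best_val = list(whales[i]), value
    return best, best_val
```

For example, `whale_optimize(lambda p: sum(v * v for v in p), [(-5.0, 5.0)] * 2)` drives the best value close to zero on this toy problem. Because `a` shrinks over time, the search gradually shifts from exploration (|A| ≥ 1) to exploitation around the best whale.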
23
Kurbanoglu S, Bakirhan NK, Gumustas M, Ozkan SA. Modern Assay Techniques for Cancer Drugs: Electroanalytical and Liquid Chromatography Methods. Crit Rev Anal Chem 2019; 49:306-323. [PMID: 30595027 DOI: 10.1080/10408347.2018.1527206]
Abstract
In recent decades, the number of patients receiving chemotherapy has increased considerably. Developing rapid, precise, and reliable methods to analyze cancer drugs in their dosage forms and in animal or human biological samples is therefore very important. Among all analytical methods, electrochemical methods hold an important position because of their specificity in the biological recognition process, fast response, reliability, and lack of need for a pretreatment process. Chromatographic methods are also used in a wide range of analytical applications for the analysis of anticancer drugs. The power of chromatography comes from its ability to separate a mixture of analytes and determine their concentrations. Chromatographic techniques can mainly be divided into gas, liquid, and supercritical fluid chromatography. Within this framework, this review aims to provide the basic principles of electroanalytical and high-performance liquid chromatography methods for the analysis of cancer drugs. In addition, selected applications of electrochemistry-related techniques and high-performance liquid chromatography for the determination of anticancer pharmaceuticals published in the last five years are discussed.
Affiliation(s)
- Sevinc Kurbanoglu
- Faculty of Pharmacy, Department of Analytical Chemistry, Ankara University, Ankara, Turkey
- Nurgul K Bakirhan
- Faculty of Science and Art, Department of Chemistry, Hitit University, Çorum, Turkey
- Mehmet Gumustas
- Department of Forensic Toxicology, Ankara University Institute of Forensic Sciences, Ankara, Turkey
- Sibel A Ozkan
- Faculty of Pharmacy, Department of Analytical Chemistry, Ankara University, Ankara, Turkey
24
A reliable method for colorectal cancer prediction based on feature selection and support vector machine. Med Biol Eng Comput 2018; 57:901-912. [PMID: 30478811 DOI: 10.1007/s11517-018-1930-0] [Received: 12/16/2017] [Accepted: 11/17/2018]
Abstract
Colorectal cancer (CRC) is a common cancer responsible for approximately 600,000 deaths per year worldwide. It is therefore very important to identify related factors and detect the cancer accurately; however, timely and accurate prediction of the disease is challenging. In this study, we build an integrated model based on logistic regression (LR) and support vector machine (SVM) to classify CRC samples as cancer or normal. From various factors (human location, age, gender, BMI, cancer tumor type, tumor grade, and DNA), we use logistic regression to select the most significant factors (p < 0.05) as the main features, and with these features we design a grid-search SVM model using different kernel types (linear, radial basis function (RBF), sigmoid, and polynomial). The logistic regression results indicate that Firmicutes (AUC 0.918), Bacteroidetes (AUC 0.856), body mass index (BMI) (AUC 0.777), age (AUC 0.710), and their combination (AUC 0.942) are effective for CRC detection. The best kernel type is RBF, which achieves an accuracy of 90.1% when k = 5 and 91.2% when k = 10. This study provides a new method for colorectal cancer prediction based on independent risk factors. Graphical abstract: Flow chart depicting the method adopted in the study. LR (logistic regression) and ROC curves are used to select independent features as input to the SVM. SVM kernel selection aims to find the best kernel function for classification by comparing linear, RBF, sigmoid, and polynomial kernels, and the result shows that the best kernel is RBF. The classification performance of LR + RF, LR + NB, LR + KNN, and LR + ANN models is compared with that of LR + SVM. After these steps, cancer and healthy individuals can be classified, and the best model is selected.
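The accuracies reported "when k = 5" and "when k = 10" most likely refer to k-fold cross-validation, although the abstract does not say so explicitly; treat that reading as an assumption. The sketch below shows the generic procedure in plain Python: `fit_predict` is a hypothetical callback standing in for the grid-searched SVM, and shuffling with a fixed seed is an assumption (the abstract does not state whether folds were shuffled or stratified).

```python
import random


def k_fold_indices(n_samples, k, seed=None):
    """Split range(n_samples) into k disjoint folds whose sizes differ by at most one."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    indices = list(range(n_samples))
    if seed is not None:
        random.Random(seed).shuffle(indices)   # optional shuffling before splitting
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for f in range(k):
        size = base + (1 if f < extra else 0)  # the first `extra` folds get one more sample
        folds.append(indices[start:start + size])
        start += size
    return folds


def cross_validated_accuracy(n_samples, k, fit_predict, labels, seed=0):
    """Mean test-fold accuracy; fit_predict(train_idx, test_idx) returns predictions."""
    accuracies = []
    for test_idx in k_fold_indices(n_samples, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(n_samples) if i not in held_out]
        predictions = fit_predict(train_idx, test_idx)
        hits = sum(p == labels[i] for p, i in zip(predictions, test_idx))
        accuracies.append(hits / len(test_idx))
    return sum(accuracies) / k
```

Every sample is used for testing exactly once, and a larger k trains each model on a larger share of the data, which is one common explanation for slightly higher accuracy at k = 10 than at k = 5.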
25
Alshamlan HM. DQB: A novel dynamic quantitive classification model using artificial bee colony algorithm with application on gene expression profiles. Saudi J Biol Sci 2018; 25:932-946. [PMID: 30108444 PMCID: PMC6087852 DOI: 10.1016/j.sjbs.2018.01.017] [Received: 12/27/2017] [Revised: 01/30/2018] [Accepted: 01/31/2018]
Abstract
In the medical domain, it is very important to develop rule-based classification models, because they produce comprehensible models that account for their predictions. Moreover, it is desirable to know not only the classification decisions but also what leads to them. In this paper, we propose a novel dynamic quantitative rule-based classification model, namely DQB, which integrates quantitative association rule mining and the Artificial Bee Colony (ABC) algorithm to give users better understandability and interpretability through an accurate class quantitative association rule-based classifier. As far as we know, this is the first attempt to apply the ABC algorithm to mining quantitative rule-based classifier models, and the first attempt to use quantitative rule-based classification models for classifying microarray gene expression profiles. In this research we also developed a new dynamic local search strategy, named DLS, which improves the local search of the ABC algorithm. The performance of the proposed model was compared with well-known quantitative-based classification methods and bio-inspired meta-heuristic classification algorithms on six gene expression profiles from binary and multi-class cancer datasets. The results show that DQB achieves a considerable increase in classification accuracy compared with other algorithms in the literature and provides an interpretable model for biologists. This confirms the significance of the proposed algorithm in constructing a rule-based classifier and shows that these rules capture high-quality, meaningful knowledge extracted from the training set: all subsets of quantitative rules report close to 100% classification accuracy with a minimum number of genes.
Remarkably, to the best of our knowledge, several new genes were discovered that have not been reported in any past study. Regarding applicability, based on the results acquired from the microarray gene expression analysis, we conclude that DQB can be adopted in different real-world applications with some modifications.
Affiliation(s)
- Hala M Alshamlan
- Information Technology Department, King Saud University, Riyadh, Saudi Arabia; Department of Mechanical Engineering, Massachusetts Institute of Technology, Cambridge, MA, United States
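DQB mines quantitative (interval-based) class association rules. The sketch below shows only how one such rule can be scored with the standard association-rule measures of support and confidence; it is an illustrative assumption, since the abstract does not state the fitness function DQB optimizes, and it implements neither the ABC search nor the DLS local search. The function name and the dict-of-intervals rule format are hypothetical.

```python
def rule_support_confidence(samples, labels, antecedent, target_class):
    """Score the quantitative rule
        AND over g of (low_g <= x[g] <= high_g)  =>  class == target_class.

    antecedent: dict mapping a gene index g to its (low_g, high_g) interval.
    Returns (support, confidence): the fraction of all samples matching both sides,
    and the fraction of antecedent-covered samples that have the target class.
    """
    covered = [i for i, x in enumerate(samples)
               if all(low <= x[g] <= high for g, (low, high) in antecedent.items())]
    hits = sum(1 for i in covered if labels[i] == target_class)
    support = hits / len(samples) if samples else 0.0
    confidence = hits / len(covered) if covered else 0.0
    return support, confidence
```

A search procedure such as ABC would then move the interval bounds `(low_g, high_g)` to trade off support against confidence; narrower intervals usually raise confidence at the cost of support.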
26
Alshamlan HM. Co-ABC: Correlation artificial bee colony algorithm for biomarker gene discovery using gene expression profile. Saudi J Biol Sci 2018; 25:895-903. [PMID: 30108438 PMCID: PMC6088113 DOI: 10.1016/j.sjbs.2017.12.012] [Received: 11/06/2017] [Revised: 12/20/2017] [Accepted: 12/26/2017]
Abstract
In this paper, we propose a new hybrid method based on the correlation-based feature selection method and the Artificial Bee Colony algorithm, namely Co-ABC, to select a small number of relevant genes for accurate classification of gene expression profiles. Co-ABC consists of three fully cooperating stages. The first stage filters noisy and redundant genes in high-dimensional domains by applying the Correlation-based Feature Selection (CFS) filter method. In the second stage, the Artificial Bee Colony (ABC) algorithm is used to select informative and meaningful genes. In the third stage, a Support Vector Machine (SVM) classifier is applied to the genes preselected in the second stage. The overall performance of Co-ABC was evaluated on six gene expression profiles from binary and multi-class cancer datasets. In addition, to demonstrate its efficiency, we compared Co-ABC with previously known related methods, two of which were re-implemented with the same parameters for a fair comparison: Co-GA, which combines CFS with a genetic algorithm (GA), and Co-PSO, which combines CFS with a particle swarm optimization (PSO) algorithm. The experimental results show that Co-ABC achieves accurate classification using a small number of predictive genes, which indicates that Co-ABC is an efficient approach for biomarker gene discovery from cancer gene expression profiles.
27
Shahid A, Choi JH, Rana AUHS, Kim HS. Least Squares Neural Network-Based Wireless E-Nose System Using an SnO₂ Sensor Array. SENSORS 2018; 18:s18051446. [PMID: 29734783 PMCID: PMC5982671 DOI: 10.3390/s18051446] [Received: 03/26/2018] [Revised: 05/02/2018] [Accepted: 05/03/2018]
Abstract
Over the last few decades, the development of the electronic nose (E-nose) for detection and quantification of dangerous and odorless gases, such as methane (CH4) and carbon monoxide (CO), using an array of SnO2 gas sensors has attracted considerable attention. This paper addresses sensor cross sensitivity by developing a classifier and estimator using an artificial neural network (ANN) and least squares regression (LSR), respectively. Initially, the ANN was implemented using a feedforward pattern recognition algorithm to learn the collective behavior of an array as the signature of a particular gas. In the second phase, the classified gas was quantified by minimizing the mean square error using LSR. The combined approach produced 98.7% recognition probability, with 95.5 and 94.4% estimated gas concentration accuracies for CH4 and CO, respectively. The classifier and estimator parameters were deployed in a remote microcontroller for the actualization of a wireless E-nose system.
Affiliation(s)
- Areej Shahid
- Division of Electronics and Electrical Engineering, Dongguk University-Seoul, Seoul 04620, Korea.
- Jong-Hyeok Choi
- Division of Electronics and Electrical Engineering, Dongguk University-Seoul, Seoul 04620, Korea.
- Hyun-Seok Kim
- Division of Electronics and Electrical Engineering, Dongguk University-Seoul, Seoul 04620, Korea.
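The second phase of the E-nose pipeline quantifies the classified gas by least squares regression, that is, by minimizing the mean squared error of a calibration model. The sketch below shows the simplest such model, a single-input linear calibration line fitted in closed form; whether the authors regress on one sensor or the whole array, and whether they transform the sensor responses first, is not specified in the abstract, so the one-variable linear form and the function names are assumptions.

```python
def fit_line(xs, ys):
    """Closed-form ordinary least squares fit of ys = intercept + slope * xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("calibration inputs must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mean_squared_error(xs, ys, intercept, slope):
    """The quantity that the least squares fit minimises."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)
```

Once fitted on calibration measurements of a known gas, `intercept + slope * response` gives the estimated concentration for a new sensor response.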
28
Curcumin in Advancing Treatment for Gynecological Cancers with Developed Drug- and Radiotherapy-Associated Resistance. Rev Physiol Biochem Pharmacol 2018; 176:107-129. [DOI: 10.1007/112_2018_11]
29
Pashaei E, Ozen M, Aydin N. Biomarker discovery based on BBHA and AdaboostM1 on microarray data for cancer classification. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2017; 2016:3080-3083. [PMID: 28268962 DOI: 10.1109/embc.2016.7591380]
Abstract
In this paper, a new approach based on the Binary Black Hole Algorithm (BBHA) and Adaptive Boosting version M1 (AdaboostM1) is proposed for finding genes that can correctly classify groups of cancers. In this approach, BBHA performs gene selection, and AdaboostM1 with 10-fold cross-validation is adopted as the classifier. In addition, to find the relations between the biomarkers from a biological point of view, the decision tree algorithm C4.5 is used. The proposed approach is tested on three benchmark microarray datasets. The experimental results show that the proposed method selects the most informative gene subsets, reducing the dimension of the data set and improving classification accuracy compared with several recent studies.
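AdaboostM1, the classifier named in the abstract, builds its ensemble by reweighting training samples after each weak learner. The sketch below shows one standard AdaBoost.M1 reweighting round; it does not implement BBHA and is not the authors' code. The function name and the treatment of a zero-error round are assumptions of this sketch.

```python
import math


def adaboost_m1_reweight(weights, correct):
    """One AdaBoost.M1 boosting round.

    weights: current sample weights summing to 1.
    correct: booleans, True where the weak learner classified the sample correctly.
    Returns (new_weights, vote_weight), or None when the weighted error exceeds 1/2,
    the condition under which AdaBoost.M1 stops adding learners.
    """
    error = sum(w for w, ok in zip(weights, correct) if not ok)
    if error > 0.5:
        return None
    if error == 0.0:
        # Assumption of this sketch: a perfect learner leaves the weights unchanged
        # and receives an infinite vote (implementations handle this case differently).
        return list(weights), math.inf
    beta = error / (1.0 - error)
    # Down-weight correctly classified samples by beta, then renormalise.
    updated = [w * beta if ok else w for w, ok in zip(weights, correct)]
    total = sum(updated)
    return [w / total for w in updated], math.log(1.0 / beta)
```

After each round the misclassified samples carry exactly half of the total weight, so the next weak learner is forced to concentrate on them, and learners with lower error receive a larger vote weight log(1/β).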
30
Das A, Das S. Feature weighting and selection with a Pareto-optimal trade-off between relevancy and redundancy. Pattern Recognit Lett 2017. [DOI: 10.1016/j.patrec.2017.01.004]
31
Motieghader H, Najafi A, Sadeghi B, Masoudi-Nejad A. A hybrid gene selection algorithm for microarray cancer classification using genetic algorithm and learning automata. INFORMATICS IN MEDICINE UNLOCKED 2017. [DOI: 10.1016/j.imu.2017.10.004]
32
Dwivedi AK. Artificial neural network model for effective cancer classification using microarray gene expression data. Neural Comput Appl 2016. [DOI: 10.1007/s00521-016-2701-1]
33
Sardana M, Agrawal R, Kaur B. A hybrid of clustering and quantum genetic algorithm for relevant genes selection for cancer microarray data. INTERNATIONAL JOURNAL OF KNOWLEDGE-BASED AND INTELLIGENT ENGINEERING SYSTEMS 2016. [DOI: 10.3233/kes-160341]
Affiliation(s)
- R.K. Agrawal
- School of Computer and Systems Sciences, Jawaharlal Nehru University, New Delhi, India
- Baljeet Kaur
- Hansraj College, University of Delhi, Delhi, India
34
Gabere MN, Hussein MA, Aziz MA. Filtered selection coupled with support vector machines generate a functionally relevant prediction model for colorectal cancer. Onco Targets Ther 2016; 9:3313-25. [PMID: 27330311 PMCID: PMC4898422 DOI: 10.2147/ott.s98910]
Abstract
Purpose There has been considerable interest in using whole-genome expression profiles for the classification of colorectal cancer (CRC). The selection of important features is a crucial step before training a classifier. Methods In this study, we built a model that uses support vector machine (SVM) to classify cancer and normal samples using Affymetrix exon microarray data obtained from 90 samples of 48 patients diagnosed with CRC. From the 22,011 genes, we selected the 20, 30, 50, 100, 200, 300, and 500 genes most relevant to CRC using the minimum-redundancy–maximum-relevance (mRMR) technique. With these gene sets, an SVM model was designed using four different kernel types (linear, polynomial, radial basis function [RBF], and sigmoid). Results The best model, which used 30 genes and RBF kernel, outperformed other combinations; it had an accuracy of 84% for both ten fold and leave-one-out cross validations in discriminating the cancer samples from the normal samples. With this 30 genes set from mRMR, six classifiers were trained using random forest (RF), Bayes net (BN), multilayer perceptron (MLP), naïve Bayes (NB), reduced error pruning tree (REPT), and SVM. Two hybrids, mRMR + SVM and mRMR + BN, were the best models when tested on other datasets, and they achieved a prediction accuracy of 95.27% and 91.99%, respectively, compared to other mRMR hybrid models (mRMR + RF, mRMR + NB, mRMR + REPT, and mRMR + MLP). Ingenuity pathway analysis was used to analyze the functions of the 30 genes selected for this model and their potential association with CRC: CDH3, CEACAM7, CLDN1, IL8, IL6R, MMP1, MMP7, and TGFB1 were predicted to be CRC biomarkers. Conclusion This model could be used to further develop a diagnostic tool for predicting CRC based on gene expression data from patient samples.
Affiliation(s)
- Musa Nur Gabere
- Department of Bioinformatics, King Abdullah International Medical Research Center/King Saud bin Abdulaziz University for Health Sciences, Riyadh, Saudi Arabia
- Mohamed Aly Hussein
- Department of Bioinformatics, King Abdullah International Medical Research Center/King Saud bin Abdulaziz University for Health Sciences, Riyadh, Saudi Arabia
- Mohammad Azhar Aziz
- Colorectal Cancer Research Program, Department of Medical Genomics, King Abdullah International Medical Research Center, Riyadh, Saudi Arabia
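The mRMR filter that Gabere et al. apply before training the SVM (and that entry 39 pairs with ABC) ranks genes greedily: at each step it adds the gene whose mutual information with the class label, minus its mean mutual information with the genes already chosen, is largest. The Python sketch below illustrates only that difference criterion, as a minimal sketch under assumptions: expression values are first discretized into three levels at the mean plus or minus 0.5 standard deviations, and the function names `discretize`, `mutual_information`, and `mrmr_select` are hypothetical, not taken from the authors' code.

```python
import numpy as np


def discretize(X, alpha=0.5):
    """Map each column of X to {-1, 0, 1} using mean +/- alpha * std thresholds (assumed scheme)."""
    mu = X.mean(axis=0)
    sd = X.std(axis=0)
    return np.where(X > mu + alpha * sd, 1, np.where(X < mu - alpha * sd, -1, 0))


def mutual_information(a, b):
    """Mutual information (in nats) between two 1-D arrays of discrete values."""
    _, a_idx = np.unique(a, return_inverse=True)
    _, b_idx = np.unique(b, return_inverse=True)
    joint = np.zeros((a_idx.max() + 1, b_idx.max() + 1))
    np.add.at(joint, (a_idx, b_idx), 1.0)  # contingency table of value pairs
    joint /= joint.sum()
    outer = joint.sum(axis=1, keepdims=True) @ joint.sum(axis=0, keepdims=True)
    nz = joint > 0  # skip empty cells, whose p * log(p) contribution is 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / outer[nz])))


def mrmr_select(X, y, k):
    """Greedy mRMR: maximize relevance I(gene; y) minus mean redundancy with selected genes."""
    n_genes = X.shape[1]
    relevance = np.array([mutual_information(X[:, j], y) for j in range(n_genes)])
    selected = [int(np.argmax(relevance))]  # start with the single most relevant gene
    while len(selected) < min(k, n_genes):
        best_j, best_score = None, -np.inf
        for j in range(n_genes):
            if j in selected:
                continue
            redundancy = np.mean([mutual_information(X[:, j], X[:, s]) for s in selected])
            score = relevance[j] - redundancy
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected
```

A call such as `mrmr_select(discretize(expr), labels, 30)`, where `expr` is a hypothetical samples-by-genes matrix and `labels` the tumor/normal labels, returns 30 column indices. This naive version recomputes every pairwise mutual information, so it would need caching to scale to the 22,011 genes in the study.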

35
Alshamlan HM, Badr GH, Alohali YA. ABC-SVM: Artificial Bee Colony and SVM Method for Microarray Gene Selection and Multi Class Cancer Classification. Int J Mach Learn Comput 2016. [DOI: 10.18178/ijmlc.2016.6.3.596] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.4] [Indexed: 11/15/2022]

36
Mass spectrometry cancer data classification using wavelets and genetic algorithm. FEBS Lett 2015; 589:3879-86. [DOI: 10.1016/j.febslet.2015.11.019] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 11/12/2014] [Revised: 11/12/2015] [Accepted: 11/16/2015] [Indexed: 11/18/2022]

37
Chen F, Chen P, Li D, Cheng F. Kernel-like impurity detection according to colour band spectral image using GA/SVM. Imaging Sci J 2015. [DOI: 10.1179/1743131x15y.0000000022] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/31/2022]

38
Environmental mold and mycotoxin exposures elicit specific cytokine and chemokine responses. PLoS One 2015; 10:e0126926. [PMID: 26010737 PMCID: PMC4444319 DOI: 10.1371/journal.pone.0126926] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 12/20/2014] [Accepted: 04/09/2015] [Indexed: 12/22/2022]
Abstract
Background Molds can cause respiratory symptoms and asthma. We sought to use isolated peripheral blood mononuclear cells (PBMCs) to understand changes in cytokine and chemokine levels in response to mold and mycotoxin exposures and to link these levels with respiratory symptoms in humans. We did this by utilizing an ex vivo assay approach to differentiate mold-exposed patients and unexposed controls. While circulating plasma chemokine and cytokine levels from these two groups might be similar, we hypothesized that by challenging their isolated white blood cells with mold or mold extracts, we would see a differential chemokine and cytokine release. Methods and Findings Peripheral blood mononuclear cells (PBMCs) were isolated from blood from 33 patients with a history of mold exposures and from 17 controls. Cultured PBMCs were incubated with the most prominent Stachybotrys chartarum mycotoxin, satratoxin G, or with aqueous mold extract, ionomycin, or media, each with or without PMA. Additional PBMCs were exposed to spores of Aspergillus niger, Cladosporium herbarum and Penicillium chrysogenum. After 18 hours, cytokines and chemokines released into the culture medium were measured by multiplex assay. Clinical histories, physical examinations and pulmonary function tests were also conducted. After ex vivo PBMC exposures to molds or mycotoxins, the chemokine and cytokine profiles from patients with a history of mold exposure were significantly different from those of unexposed controls. In contrast, biomarker profiles from cells exposed to media alone showed no difference between the patients and controls. Conclusions These findings demonstrate that chronic mold exposures induced changes in inflammatory and immune system responses to specific mold and mycotoxin challenges. These responses can differentiate mold-exposed patients from unexposed controls. 
This strategy may be a powerful approach to document immune system responsiveness to molds and other inflammation-inducing environmental agents.

39
Alshamlan H, Badr G, Alohali Y. mRMR-ABC: A Hybrid Gene Selection Algorithm for Cancer Classification Using Microarray Gene Expression Profiling. Biomed Res Int 2015; 2015:604910. [PMID: 25961028 PMCID: PMC4414228 DOI: 10.1155/2015/604910] [Citation(s) in RCA: 71] [Impact Index Per Article: 7.1] [Received: 02/03/2015] [Revised: 03/15/2015] [Accepted: 03/15/2015] [Indexed: 01/02/2023]
Abstract
An artificial bee colony (ABC) is a relatively recent swarm intelligence optimization approach. In this paper, we present the first attempt at applying the ABC algorithm to the analysis of microarray gene expression profiles. In addition, we combine the minimum redundancy maximum relevance (mRMR) feature selection algorithm with the ABC algorithm, yielding mRMR-ABC, to select informative genes from microarray profiles. The new approach uses a support vector machine (SVM) algorithm to measure the classification accuracy of the selected genes. We evaluate the performance of the proposed mRMR-ABC algorithm by conducting extensive experiments on six binary and multiclass gene expression microarray datasets. Furthermore, we compare our proposed mRMR-ABC algorithm with previously known techniques. For a fair comparison, we reimplemented two of these techniques using the same parameters: mRMR combined with a genetic algorithm (mRMR-GA) and mRMR combined with a particle swarm optimization algorithm (mRMR-PSO). The experimental results show that the proposed mRMR-ABC algorithm achieves accurate classification performance using a small number of predictive genes on both binary and multiclass datasets, compared with previously suggested methods. This indicates that mRMR-ABC is a promising approach for solving gene selection and cancer classification problems.

40
Alshamlan HM, Badr GH, Alohali YA. Genetic Bee Colony (GBC) algorithm: A new gene selection method for microarray cancer classification. Comput Biol Chem 2015; 56:49-60. [PMID: 25880524 DOI: 10.1016/j.compbiolchem.2015.03.001] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.0] [Received: 11/06/2014] [Revised: 03/15/2015] [Accepted: 03/15/2015] [Indexed: 01/06/2023]
Abstract
Naturally inspired evolutionary algorithms have proven effective for solving feature selection and classification problems. Artificial Bee Colony (ABC) is a relatively new swarm intelligence method. In this paper, we propose a new hybrid gene selection method, the Genetic Bee Colony (GBC) algorithm, which combines a Genetic Algorithm (GA) with the Artificial Bee Colony (ABC) algorithm. The goal is to integrate the advantages of both algorithms. The proposed algorithm is applied to microarray gene expression profiles to select the most predictive and informative genes for cancer classification. To test the accuracy of the proposed algorithm, extensive experiments were conducted on three binary microarray datasets (colon, leukemia, and lung) and three multi-class microarray datasets (SRBCT, lymphoma, and leukemia). Results of the GBC algorithm are compared with our recently proposed technique: mRMR combined with the Artificial Bee Colony algorithm (mRMR-ABC). We also compared it with mRMR combined with GA (mRMR-GA) and with Particle Swarm Optimization (mRMR-PSO). In addition, we compared the GBC algorithm with other related algorithms recently published in the literature, using all benchmark datasets. The GBC algorithm shows superior performance, achieving the highest classification accuracy along with the lowest average number of selected genes. This indicates that the GBC algorithm is a promising approach for solving the gene selection problem in both binary and multi-class cancer classification.
Affiliation(s)
- Hala M Alshamlan
- Computer Science Department, King Saud University, Riyadh, Saudi Arabia.
- Ghada H Badr
- Computer Science Department, King Saud University, Riyadh, Saudi Arabia; IRI - The City of Scientific Research and Technological Applications, Alexandria, Egypt.
- Yousef A Alohali
- Computer Science Department, King Saud University, Riyadh, Saudi Arabia.

41
Yan XB, Xiong WQ, Hu L, Zhao K. Cancer prediction based on radical basis function neural network with particle swarm optimization. Asian Pac J Cancer Prev 2014; 15:7775-80. [PMID: 25292062 DOI: 10.7314/apjcp.2014.15.18.7775] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/10/2022]
Abstract
This paper addresses cancer prediction based on a radial basis function neural network optimized by particle swarm optimization. The hazard that cancer poses to people is increasing, and cancer is often difficult to cure. Computational methods can predict the occurrence of cancer, so that people can take timely and effective preventive measures. In this paper, the occurrence of cancer is predicted by means of a radial basis function neural network optimized by particle swarm optimization. The neural network parameters to be optimized include the weight vector between the hidden layer and the output layer, and the threshold of the output-layer neurons. The experimental data were obtained from the Wisconsin breast cancer database. A total of 12 experiments were performed, each with a different setting for the required reliability of the experimental results. The findings show that the method substantially improves the accuracy, reliability, and stability of cancer prediction.
Affiliation(s)
- Xiao-Bo Yan
- College of Computer Science and Technology, Jilin University, Changchun, Jilin, China
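Yan et al. use PSO to tune only the hidden-to-output weights and the output threshold of the RBF network. The following Python sketch shows that idea under stated assumptions: Gaussian hidden units with fixed, hand-picked centers and a shared width, a mean-squared-error loss against ±1 targets, and a textbook global-best PSO with an inertia weight. The helper names and constants (`w=0.7`, `c1=c2=1.5`, 30 particles, 200 iterations) are illustrative choices, not values reported in the paper.

```python
import numpy as np


def rbf_features(X, centers, width):
    """Gaussian RBF hidden-layer activations, shape (n_samples, n_centers)."""
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * width ** 2))


def make_output_loss(H, targets):
    """MSE of a linear output neuron (H @ weights - threshold) against the targets."""
    def loss(params):
        weights, threshold = params[:-1], params[-1]
        return float(np.mean((H @ weights - threshold - targets) ** 2))
    return loss


def pso_minimize(loss, dim, n_particles=30, n_iter=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO with inertia weight; returns (best position, best loss)."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-1.0, 1.0, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    pbest = pos.copy()                                   # each particle's best position
    pbest_loss = np.array([loss(p) for p in pos])
    gbest = pbest[np.argmin(pbest_loss)].copy()          # best position in the swarm
    for _ in range(n_iter):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Inertia, plus random pulls toward the personal and global bests.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        cur_loss = np.array([loss(p) for p in pos])
        improved = cur_loss < pbest_loss
        pbest[improved] = pos[improved]
        pbest_loss[improved] = cur_loss[improved]
        gbest = pbest[np.argmin(pbest_loss)].copy()
    return gbest, float(pbest_loss.min())
```

With `H = rbf_features(X, centers, width)`, calling `pso_minimize(make_output_loss(H, 2.0 * y - 1.0), dim=H.shape[1] + 1)` returns the output weights followed by the threshold, and a sample is classified as positive when `H @ weights - threshold > 0`. Because the output is linear in these parameters, the same weights could also be found in closed form by least squares; PSO becomes more useful when centers or widths are searched too, or when the loss is not differentiable.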

42
Wang H, Xing F, Su H, Stromberg A, Yang L. Novel image markers for non-small cell lung cancer classification and survival prediction. BMC Bioinformatics 2014; 15:310. [PMID: 25240495 PMCID: PMC4287550 DOI: 10.1186/1471-2105-15-310] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.9] [Received: 02/03/2014] [Accepted: 08/14/2014] [Indexed: 02/05/2023]
Abstract
Background Non-small cell lung cancer (NSCLC), the most common type of lung cancer, is a serious disease and a major cause of death for both men and women. Computer-aided diagnosis and survival prediction of NSCLC are of great importance in assisting diagnosis and personalized therapy planning for lung cancer patients. Results In this paper, we propose an integrated framework for NSCLC computer-aided diagnosis and survival analysis using novel image markers. The entire biomedical imaging informatics framework consists of cell detection, segmentation, classification, discovery of image markers, and survival analysis. A robust seed detection-guided cell segmentation algorithm is proposed to accurately segment each individual cell in digital images. Based on the cell segmentation results, an extensive set of cellular morphological features is extracted using efficient feature descriptors. Next, eight different classification techniques that can handle high-dimensional data are evaluated and compared for computer-aided diagnosis. The results show that random forest and AdaBoost offer the best classification performance for NSCLC. Finally, a Cox proportional hazards model is fitted by component-wise likelihood-based boosting. Significant image markers are discovered using bootstrap analysis, and the survival prediction performance of the model is also evaluated. Conclusions The proposed model has been applied to a lung cancer dataset that contains 122 cases with complete clinical information. The classification performance exhibits high correlations between the discovered image markers and the subtypes of NSCLC. The survival analysis demonstrates the strong predictive power of the statistical model built from the discovered image markers.
Affiliation(s)
- Lin Yang
- J. Crayton Pruitt Family Department of Biomedical Engineering, University of Florida, 1275 Center Drive, 32611 Gainesville, FL, USA.

43
Afsari B, Braga-Neto UM, Geman D. Rank discriminants for predicting phenotypes from RNA expression. Ann Appl Stat 2014. [DOI: 10.1214/14-aoas738] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Indexed: 11/19/2022]

44
Abel L, Kutschki S, Turewicz M, Eisenacher M, Stoutjesdijk J, Meyer HE, Woitalla D, May C. Autoimmune profiling with protein microarrays in clinical applications. Biochim Biophys Acta 2014; 1844:977-87. [PMID: 24607371 DOI: 10.1016/j.bbapap.2014.02.023] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 07/31/2013] [Revised: 02/18/2014] [Accepted: 02/27/2014] [Indexed: 02/05/2023]
Abstract
In recent years, knowledge about immune-related disorders has substantially increased, especially in the field of central nervous system (CNS) disorders. Recent innovations in protein-related microarray technology have enabled the analysis of interactions between numerous samples and up to 20,000 targets. Antibodies directed against ion channels, receptors and other synaptic proteins have been identified, and their causative roles in different disorders have been established. Knowledge about immunological disorders is likely to expand further as more antibody targets are discovered. Therefore, protein microarrays may become an established tool for routine diagnostic procedures in the future. The identification of relevant target proteins requires the development of new strategies to handle and process vast quantities of data so that these data can be evaluated and correlated with relevant clinical issues, such as disease progression, clinical manifestations and prognostic factors. This review will mainly focus on new protein array technologies, which allow the processing of a large number of samples, and their various applications, with a deeper insight into their potential use as diagnostic tools in neurodegenerative diseases and other diseases. This article is part of a Special Issue entitled: Biomarkers: A Proteomic Challenge.
Affiliation(s)
- Laura Abel
- Department of Medical Proteomics/Bioanalytics, Medizinisches Proteom-Center, Ruhr-Universität Bochum, 44801 Bochum, Germany
- Simone Kutschki
- Department of Medical Proteomics/Bioanalytics, Medizinisches Proteom-Center, Ruhr-Universität Bochum, 44801 Bochum, Germany
- Michael Turewicz
- Department of Medical Proteomics/Bioanalytics, Medizinisches Proteom-Center, Ruhr-Universität Bochum, 44801 Bochum, Germany
- Martin Eisenacher
- Department of Medical Proteomics/Bioanalytics, Medizinisches Proteom-Center, Ruhr-Universität Bochum, 44801 Bochum, Germany
- Jale Stoutjesdijk
- Department of Medical Proteomics/Bioanalytics, Medizinisches Proteom-Center, Ruhr-Universität Bochum, 44801 Bochum, Germany
- Helmut E Meyer
- Department of Medical Proteomics/Bioanalytics, Medizinisches Proteom-Center, Ruhr-Universität Bochum, 44801 Bochum, Germany; Leibniz-Institut für Analytische Wissenschaften - ISAS - e.V., Dortmund, Germany
- Dirk Woitalla
- St. Josef Hospital, Ruhr-University Bochum, 44780 Bochum, Germany; St. Josef-Krankenhaus Kupferdreh, Heidbergweg 22-24, 45257 Essen, Germany
- Caroline May
- Department of Medical Proteomics/Bioanalytics, Medizinisches Proteom-Center, Ruhr-Universität Bochum, 44801 Bochum, Germany.

45
Clustering in Conjunction with Quantum Genetic Algorithm for Relevant Genes Selection for Cancer Microarray Data. ACTA ACUST UNITED AC 2013. [DOI: 10.1007/978-3-642-40319-4_37] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3]

46
Pahikkala T, Okser S, Airola A, Salakoski T, Aittokallio T. Wrapper-based selection of genetic features in genome-wide association studies through fast matrix operations. Algorithms Mol Biol 2012; 7:11. [PMID: 22551170 PMCID: PMC3606421 DOI: 10.1186/1748-7188-7-11] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Received: 06/08/2011] [Accepted: 04/23/2012] [Indexed: 12/22/2022]
Abstract
BACKGROUND Through the wealth of information contained within them, genome-wide association studies (GWAS) have the potential to provide researchers with a systematic means of associating genetic variants with a wide variety of disease phenotypes. Due to the limitations of approaches that have analyzed single variants one at a time, it has been proposed that the genetic basis of these disorders could be determined through detailed analysis of the genetic variants themselves and in conjunction with one another. The construction of models that account for these subsets of variants requires methodologies that generate predictions based on the total risk of a particular group of polymorphisms. However, due to the excessive number of variants, constructing these types of models has so far been computationally infeasible. RESULTS We have implemented an algorithm, known as greedy RLS, that we use to perform the first known wrapper-based feature selection on the genome-wide level. The running time of greedy RLS grows linearly in the number of training examples, the number of features in the original data set, and the number of selected features. This speed is achieved through computational short-cuts based on matrix calculus. Since the memory consumption in present-day computers can form an even tighter bottleneck than running time, we also developed a space efficient variation of greedy RLS which trades running time for memory. These approaches are then compared to traditional wrapper-based feature selection implementations based on support vector machines (SVM) to reveal the relative speed-up and to assess the feasibility of the new algorithm. As a proof of concept, we apply greedy RLS to the Hypertension - UK National Blood Service WTCCC dataset and select the most predictive variants using 3-fold external cross-validation in less than 26 minutes on a high-end desktop. 
On this dataset, we also show that greedy RLS has a better classification performance on independent test data than a classifier trained using features selected by a statistical p-value-based filter, which is currently the most popular approach for constructing predictive models in GWAS. CONCLUSIONS Greedy RLS is the first known implementation of a machine learning based method with the capability to conduct a wrapper-based feature selection on an entire GWAS containing several thousand examples and over 400,000 variants. In our experiments, greedy RLS selected a highly predictive subset of genetic variants in a fraction of the time spent by wrapper-based selection methods used together with SVM classifiers. The proposed algorithms are freely available as part of the RLScore software library at http://users.utu.fi/aatapa/RLScore/.
Affiliation(s)
- Tapio Pahikkala
- Department of Information Technology, University of Turku, Turku, Finland
- Turku Centre for Computer Science, Turku, Finland
- Sebastian Okser
- Department of Information Technology, University of Turku, Turku, Finland
- Turku Centre for Computer Science, Turku, Finland
- Antti Airola
- Department of Information Technology, University of Turku, Turku, Finland
- Turku Centre for Computer Science, Turku, Finland
- Tapio Salakoski
- Department of Information Technology, University of Turku, Turku, Finland
- Turku Centre for Computer Science, Turku, Finland
- Tero Aittokallio
- Turku Centre for Computer Science, Turku, Finland
- Department of Mathematics, University of Turku, Turku, Finland
- Data Mining and Modeling group, Turku Centre for Biotechnology, Turku, Finland
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
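The selection criterion behind greedy RLS can be illustrated without the matrix shortcuts that make the authors' version fast. This naive Python sketch assumes regularized least squares (ridge regression) without an intercept, scores each candidate feature set by its leave-one-out squared error computed from the hat matrix, and adds one feature per round. The function names are hypothetical, and this is not the RLScore implementation.

```python
import numpy as np


def rls_loo_error(X, y, lam):
    """Mean leave-one-out squared error of ridge regression (no intercept).

    Uses the closed-form shortcut e_i = (y_i - yhat_i) / (1 - H_ii),
    with hat matrix H = K (K + lam I)^-1 and linear kernel K = X X^T.
    """
    n = X.shape[0]
    K = X @ X.T
    # K and (K + lam I)^-1 commute, so solve() yields the same H as K @ inv(K + lam I).
    H = np.linalg.solve(K + lam * np.eye(n), K)
    loo_residuals = (y - H @ y) / (1.0 - np.diag(H))
    return float(np.mean(loo_residuals ** 2))


def greedy_rls_naive(X, y, k, lam=1.0):
    """Forward selection: each round adds the feature that minimizes the LOO error."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(min(k, X.shape[1])):
        errors = [rls_loo_error(X[:, selected + [j]], y, lam) for j in remaining]
        best = remaining[int(np.argmin(errors))]
        selected.append(best)
        remaining.remove(best)
    return selected
```

Pahikkala et al. report that their shortcuts make the running time grow linearly in the number of examples, features, and selected features; this sketch instead performs an n-by-n solve for every candidate in every round, which is the price of its brevity and makes it impractical at GWAS scale.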

47
Chuang LY, Yang CH, Lin MC, Yang CH. CpGPAP: CpG island predictor analysis platform. BMC Genet 2012; 13:13. [PMID: 22385986 PMCID: PMC3313849 DOI: 10.1186/1471-2156-13-13] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 08/20/2011] [Accepted: 03/02/2012] [Indexed: 12/31/2022]
Abstract
Background Genomic islands play an important role in medical, methylation and biological studies. To explore these regions, we propose a CpG island prediction analysis platform for genome sequence exploration (CpGPAP). Results CpGPAP is a web-based application that provides a user-friendly interface for predicting CpG islands in genome sequences or in user-input sequences. The prediction algorithms supported in CpGPAP include complementary particle swarm optimization (CPSO), a complementary genetic algorithm (CGA), and other methods found in the literature (CpGPlot, CpGProD and CpGIS). The CpGPAP platform is easy to use and has three main features: (1) selection of the prediction algorithm; (2) graphic visualization of results; and (3) application of related tools and dataset downloads. These features allow the user to easily view CpG island results and download the relevant island data. CpGPAP is freely available at http://bio.kuas.edu.tw/CpGPAP/. Conclusions The platform's supported algorithms (CPSO and CGA) provide higher sensitivity and a higher correlation coefficient than CpGPlot, CpGProD, CpGIS, and CpGcluster over an entire chromosome.
Affiliation(s)
- Li-Yeh Chuang
- Department of Chemical Engineering, Institute of Biotechnology and Chemical Engineering, I-Shou University, Kaohsiung, Taiwan

48
Li X, Peng S, Chen J, Lü B, Zhang H, Lai M. SVM-T-RFE: a novel gene selection algorithm for identifying metastasis-related genes in colorectal cancer using gene expression profiles. Biochem Biophys Res Commun 2012; 419:148-53. [PMID: 22306013 DOI: 10.1016/j.bbrc.2012.01.087] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.2] [Received: 01/04/2012] [Accepted: 01/18/2012] [Indexed: 11/16/2022]
Abstract
Although metastasis is the principal cause of death for colorectal cancer (CRC) patients, the molecular mechanisms underlying CRC metastasis are still not fully understood. In an attempt to identify metastasis-related genes in CRC, we obtained gene expression profiles of 55 early stage primary CRCs, 56 late stage primary CRCs, and 34 metastatic CRCs from the expression project in Oncology (http://www.intgen.org/expo/). We developed a novel gene selection algorithm (SVM-T-RFE), which extends the support vector machine recursive feature elimination (SVM-RFE) algorithm by incorporating the T-statistic. We achieved the highest classification accuracy (100%) with smaller gene subsets (10 and 6 genes, respectively) when classifying between early and late stage primary CRCs, and between metastatic CRCs and late stage primary CRCs. We also compared the performance of the SVM-T-RFE and SVM-RFE gene selection algorithms on another large-scale CRC dataset and on five public microarray datasets. SVM-T-RFE outperformed the SVM-RFE algorithm, identifying more differentially expressed genes and achieving the highest prediction accuracy with an equal or smaller number of selected genes. A fraction of the selected genes have been reported to be associated with CRC development or metastasis.
Affiliation(s)
- Xiaobo Li
- Department of Pathology, School of Medicine, Zhejiang University, Hangzhou 310058, People's Republic of China.

49

50
Saraswathi S, Sundaram S, Sundararajan N, Zimmermann M, Nilsen-Hamilton M. ICGA-PSO-ELM approach for accurate multiclass cancer classification resulting in reduced gene sets in which genes encoding secreted proteins are highly represented. IEEE/ACM Trans Comput Biol Bioinform 2011; 8:452-63. [PMID: 21233525 DOI: 10.1109/tcbb.2010.13] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.6] [Indexed: 05/30/2023]
Abstract
A combination of Integer-Coded Genetic Algorithm (ICGA) and Particle Swarm Optimization (PSO), coupled with the neural-network-based Extreme Learning Machine (ELM), is used for gene selection and cancer classification. ICGA is used with PSO-ELM to select an optimal set of genes, which is then used to build a classifier, yielding an algorithm (ICGA-PSO-ELM) that can handle sparse data and sample imbalance. We evaluate the performance of ICGA-PSO-ELM and compare our results with existing methods in the literature. An investigation into the functions of the selected genes, using a systems biology approach, revealed that many of the identified genes are involved in cell signaling and proliferation. An analysis of these gene sets shows a larger representation of genes that encode secreted proteins than found in randomly selected gene sets. Secreted proteins constitute a major means by which cells interact with their surroundings. Mounting biological evidence has identified the tumor microenvironment as a critical factor that determines tumor survival and growth. Thus, the genes identified by this study that encode secreted proteins might provide important insights into the nature of the critical biological features in the microenvironment of each tumor type that allow these cells to thrive and proliferate.