1
Hiskens MI, Mengistu TS, Li KM, Fenning AS. Systematic Review of the Diagnostic and Clinical Utility of Salivary microRNAs in Traumatic Brain Injury (TBI). Int J Mol Sci 2022; 23:13160. [PMID: 36361944 PMCID: PMC9654991 DOI: 10.3390/ijms232113160] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 09/20/2022] [Revised: 10/18/2022] [Accepted: 10/27/2022] [Indexed: 07/29/2023] Open
Abstract
Research in traumatic brain injury (TBI) is an urgent priority, as there are currently no TBI biomarkers to assess the severity of injury, to predict outcomes, and to monitor recovery. Small non-coding RNAs (sncRNAs) including microRNAs can be measured in saliva following TBI and have been investigated as potential diagnostic markers. The aim of this systematic review was to investigate the diagnostic or prognostic ability of microRNAs extracted from saliva in human subjects. PubMed, Embase, Scopus, PsycINFO and Web of Science were searched for studies that examined the association of saliva microRNAs with TBI. Original studies of any design involving the diagnostic capacity of salivary microRNAs for TBI were selected for data extraction. Nine studies met the inclusion criteria, with a heterogeneous population involving athletes and hospital patients, children and adults. The studies identified a total of 188 differentially expressed microRNAs, with 30 detected in multiple studies. MicroRNAs detected in multiple studies showed bidirectional expression changes. The study designs and methods were too heterogeneous to permit meta-analysis. Early data indicate that salivary microRNAs may assist with TBI diagnosis. Further research with consistent methods and larger patient populations is required to evaluate the diagnostic and prognostic potential of saliva microRNAs.
Affiliation(s)
- Matthew I. Hiskens
- Mackay Institute of Research and Innovation, Mackay Hospital and Health Service, 475 Bridge Road, Mackay, QLD 4740, Australia
- School of Health, Medical and Applied Sciences, Central Queensland University, Bruce Highway, Rockhampton, QLD 4702, Australia
- Tesfaye S. Mengistu
- Mackay Institute of Research and Innovation, Mackay Hospital and Health Service, 475 Bridge Road, Mackay, QLD 4740, Australia
- Faculty of Medicine, School of Public Health, University of Queensland, 266 Herston Road, Herston, QLD 4006, Australia
- Katy M. Li
- School of Health, Medical and Applied Sciences, Central Queensland University, Bruce Highway, Rockhampton, QLD 4702, Australia
- Andrew S. Fenning
- School of Health, Medical and Applied Sciences, Central Queensland University, Bruce Highway, Rockhampton, QLD 4702, Australia
2
Combining DNA methylation and RNA sequencing data of cancer for supervised knowledge extraction. BioData Min 2018; 11:22. [PMID: 30386434 PMCID: PMC6203208 DOI: 10.1186/s13040-018-0184-6] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 10/27/2017] [Accepted: 10/11/2018] [Indexed: 11/26/2022] Open
Abstract
Background: In the Next Generation Sequencing (NGS) era a large amount of biological data is being sequenced, analyzed, and stored in many public databases, whose interoperability is often required to improve accessibility. The combination of heterogeneous NGS genomic data is an open challenge: the analysis of data from different experiments is a fundamental practice for the study of diseases. In this work, we propose to combine DNA methylation and RNA sequencing NGS experiments at gene level for supervised knowledge extraction in cancer. Methods: We retrieve DNA methylation and RNA sequencing datasets from The Cancer Genome Atlas (TCGA), focusing on the Breast Invasive Carcinoma (BRCA), the Thyroid Carcinoma (THCA), and the Kidney Renal Papillary Cell Carcinoma (KIRP). We combine the RNA sequencing gene expression values with the gene methylation quantity, a new measure that we define to represent the methylation quantity associated with a gene. Additionally, we propose to analyze the combined data through tree- and rule-based classification algorithms (C4.5, Random Forest, RIPPER, and CAMUR). Results: We extract more than 15,000 classification models (composed of gene sets), which distinguish the tumoral samples from the normal ones with an average accuracy of 95%. From the integrated experiments we obtain about 5000 classification models that consider both the gene measures related to the RNA sequencing and the DNA methylation experiments. Conclusions: We compare the sets of genes obtained from the classifications on RNA sequencing and DNA methylation data with the genes obtained from the integration of the two experiments. The comparison results in several genes that are in common among the single experiments and the integrated ones (733 for BRCA, 35 for KIRP, and 861 for THCA) and 509 genes that are in common among the different experiments. Finally, we investigate the possible relationships among the different analyzed tumors by extracting a core set of 13 genes that appear in all tumors. A preliminary functional analysis confirms the relation of part of those genes (5 out of 13 and 279 out of 509) with cancer, suggesting that further studies should focus on the newly identified ones. Electronic supplementary material: The online version of this article (10.1186/s13040-018-0184-6) contains supplementary material, which is available to authorized users.
3
Kok MGM, de Ronde MWJ, Moerland PD, Ruijter JM, Creemers EE, Pinto-Sietsma SJ. Small sample sizes in high-throughput miRNA screens: A common pitfall for the identification of miRNA biomarkers. Biomolecular Detection and Quantification 2017; 15:1-5. [PMID: 29276692 PMCID: PMC5737945 DOI: 10.1016/j.bdq.2017.11.002] [Citation(s) in RCA: 63] [Impact Index Per Article: 7.9] [Received: 08/22/2017] [Revised: 11/02/2017] [Accepted: 11/27/2017] [Indexed: 02/08/2023]
Abstract
Since the discovery of microRNAs (miRNAs), circulating miRNAs have been proposed as biomarkers for disease. Consequently, many groups have tried to identify circulating miRNA biomarkers for various types of diseases including cardiovascular disease and cancer. However, the replicability of these experiments has been disappointingly low. In order to identify circulating miRNA candidate biomarkers, in general, first an unbiased high-throughput screen is performed in which a large number of miRNAs is detected and quantified in the circulation. Because these are costly experiments, many of such studies have been performed using a low number of study subjects (small sample size). Due to lack of power in small sample size experiments, true effects are often missed and many of the detected effects are wrong. Therefore, it is important to have a good estimate of the appropriate sample size for a miRNA high-throughput screen. In this review, we discuss the effects of small sample sizes in high-throughput screens for circulating miRNAs. Using data from a miRNA high-throughput experiment on isolated monocytes, we illustrate that the implementation of power calculations in a high-throughput miRNA discovery experiment will avoid unnecessarily large and expensive experiments, while still having enough power to be able to detect clinically important differences.
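The kind of power calculation the abstract recommends can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes a two-sided, two-group comparison of mean expression, uses the normal approximation to the t-test, and applies a Bonferroni correction for the number of miRNAs screened. The function name `samples_per_group` and all parameter values are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def samples_per_group(delta, sigma, alpha=0.05, power=0.80, n_tests=1):
    """Approximate per-group sample size for a two-group comparison of means.

    Normal approximation: n = 2 * ((z_alpha + z_beta) * sigma / delta) ** 2,
    where delta is the smallest difference worth detecting and sigma is the
    within-group standard deviation (both on the same scale, e.g. log2).
    Assumption: Bonferroni correction, i.e. alpha is divided by n_tests.
    """
    if delta <= 0 or sigma <= 0:
        raise ValueError("delta and sigma must be positive")
    adjusted_alpha = alpha / n_tests
    z_alpha = NormalDist().inv_cdf(1 - adjusted_alpha / 2)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)


if __name__ == "__main__":
    # One miRNA, difference of one standard deviation.
    print(samples_per_group(delta=1.0, sigma=1.0))
    # Same effect, but screening 100 miRNAs (hypothetical panel size).
    print(samples_per_group(delta=1.0, sigma=1.0, n_tests=100))
```

The second call requires more subjects per group than the first, which illustrates why a screen of many miRNAs with only a handful of subjects tends to be underpowered.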
Affiliation(s)
- M G M Kok
- Departments of Vascular Medicine, University of Amsterdam, Amsterdam, The Netherlands
- M W J de Ronde
- Departments of Vascular Medicine, University of Amsterdam, Amsterdam, The Netherlands; Departments of Clinical Epidemiology, Biostatistics and Bioinformatics, University of Amsterdam, Amsterdam, The Netherlands
- P D Moerland
- Departments of Clinical Epidemiology, Biostatistics and Bioinformatics, University of Amsterdam, Amsterdam, The Netherlands
- J M Ruijter
- Departments of Anatomy, Embryology and Physiology, University of Amsterdam, Amsterdam, The Netherlands
- E E Creemers
- Departments of Experimental Cardiology, Academic Medical Center, University of Amsterdam, Amsterdam, The Netherlands
- S J Pinto-Sietsma
- Departments of Vascular Medicine, University of Amsterdam, Amsterdam, The Netherlands; Departments of Clinical Epidemiology, Biostatistics and Bioinformatics, University of Amsterdam, Amsterdam, The Netherlands
4
Common Subcluster Mining in Microarray Data for Molecular Biomarker Discovery. Interdiscip Sci 2017; 11:348-359. [PMID: 29022249 DOI: 10.1007/s12539-017-0262-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 03/07/2017] [Revised: 08/07/2017] [Accepted: 09/12/2017] [Indexed: 12/11/2022]
Abstract
Molecular biomarkers can facilitate the detection of cancer at an early stage, which is otherwise difficult with conventional biomarkers. Gene expression data from microarray experiments on both normal and diseased cell samples provide enormous scope to explore genetic relations of disease using computational techniques. Varied expression patterns of thousands of genes under different cell conditions, along with inherent experimental error, make the task of isolating disease-related genes challenging. In this paper, we present a data mining method, common subcluster mining (CSM), to discover highly perturbed genes under diseased conditions from differential expression patterns. The method builds a heap by superposing near-centroid clusters from gene expression data of normal samples and extracts its core part. It thus isolates the genes exhibiting the most stable state across normal samples, which constitute a reference set for each centroid. It performs the same operation on datasets from corresponding diseased samples and isolates the genes showing drastic changes in their expression patterns. The method thus finds the disease-sensitive gene sets when applied to datasets of lung cancer, prostate cancer, pancreatic cancer, breast cancer, leukemia and pulmonary arterial hypertension. In the majority of cases, a few new genes are found over and above some previously reported ones. Genes with distinct deviations in diseased samples are prospective candidates for molecular biomarkers of the respective disease.
5
Borup R, Thuesen LL, Andersen CY, Nyboe-Andersen A, Ziebe S, Winther O, Grøndahl ML. Competence Classification of Cumulus and Granulosa Cell Transcriptome in Embryos Matched by Morphology and Female Age. PLoS One 2016; 11:e0153562. [PMID: 27128483 PMCID: PMC4851390 DOI: 10.1371/journal.pone.0153562] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 12/16/2015] [Accepted: 03/31/2016] [Indexed: 12/23/2022] Open
Abstract
Objective: By focusing on differences in the mural granulosa cell (MGC) and cumulus cell (CC) transcriptomes from follicles resulting in competent (live birth) and non-competent (no pregnancy) oocytes, the study aims to define a competence classifier expression profile in the two cellular compartments. Design: A case-control study. Setting: University based facilities for clinical services and research. Patients: MGC and CC samples from 60 women undergoing IVF treatment following the long GnRH-agonist protocol were collected. Samples from 16 oocytes where live birth was achieved and 16 age- and embryo morphology matched incompetent oocytes were included in the study. Methods: MGC and CC were isolated immediately after oocyte retrieval. From the 16 competent and non-competent follicles, mRNA was extracted and expression profiles were generated on the Human Gene 1.0 ST Affymetrix array. Live birth prediction analysis was performed using machine learning algorithms (support vector machines), with performance estimated by leave-one-out cross-validation and independent validation on an external data set. Results: We defined a signature of 30 genes expressed in CC predictive of live birth. This live birth prediction model had an accuracy of 81%, a sensitivity of 0.83, a specificity of 0.80, a positive predictive value of 0.77, and a negative predictive value of 0.86. Receiver operating characteristic analysis found an area under the curve of 0.86, significantly greater than random chance. When applied to 3 external data sets with the end-point outcome measure of blastocyst formation, the signature resulted in 62%, 75% and 88% accuracy, respectively. The genes in the classifier are primarily connected to apoptosis and involvement in formation of extracellular matrix. We were not able to define a robust MGC classifier signature that could classify live birth with accuracy above random chance level. Conclusion: We have developed a cumulus cell classifier, which showed a promising performance on external data. This suggests that the gene signature at least partly includes genes that relate to competence in the developing blastocyst.
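The performance measures quoted in this abstract (accuracy, sensitivity, specificity, positive and negative predictive value) all derive from the four cells of a confusion matrix. The sketch below shows the standard definitions; the function name `diagnostic_metrics` and the counts in the usage example are hypothetical and are not taken from the study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute standard binary-classifier metrics from confusion-matrix counts.

    tp/fn: positive cases (e.g. live birth) predicted correctly/incorrectly.
    tn/fp: negative cases predicted correctly/incorrectly.
    """
    if min(tp, fp, tn, fn) < 0:
        raise ValueError("counts must be non-negative")
    if tp + fn == 0 or tn + fp == 0 or tp + fp == 0 or tn + fn == 0:
        raise ValueError("each metric needs a non-zero denominator")
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    for name, value in diagnostic_metrics(tp=8, fp=1, tn=9, fn=2).items():
        print(f"{name}: {value:.2f}")
```

In a leave-one-out design, each sample is predicted once by a model trained on all the others, and those held-out predictions are what would populate the four counts.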
Affiliation(s)
- Rehannah Borup
- Center for Genomic Medicine, University Hospital of Copenhagen, Rigshospitalet, Copenhagen, Denmark
- Lea Langhoff Thuesen
- Fertility Clinic, University Hospital of Copenhagen, Rigshospitalet, Copenhagen, Denmark
- Claus Yding Andersen
- Laboratory of Reproductive Biology, University Hospital of Copenhagen, Rigshospitalet, Copenhagen, Denmark
- Anders Nyboe-Andersen
- Fertility Clinic, University Hospital of Copenhagen, Rigshospitalet, Copenhagen, Denmark
- Søren Ziebe
- Fertility Clinic, University Hospital of Copenhagen, Rigshospitalet, Copenhagen, Denmark
- Ole Winther
- Bioinformatics Center, Department of Biology and Biotech Research and Innovation Centre, University of Copenhagen, Copenhagen, Denmark
- Marie Louise Grøndahl
- Fertility Clinic, University Hospital of Copenhagen, Herlev Hospital, Copenhagen, Denmark
6
Lin E, Tsai SJ. Genome-wide microarray analysis of gene expression profiling in major depression and antidepressant therapy. Prog Neuropsychopharmacol Biol Psychiatry 2016; 64:334-40. [PMID: 25708651 DOI: 10.1016/j.pnpbp.2015.02.008] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.4] [Received: 01/19/2015] [Revised: 02/13/2015] [Accepted: 02/15/2015] [Indexed: 12/21/2022]
Abstract
Major depressive disorder (MDD) is a serious health concern worldwide. Currently there are no predictive tests for the effectiveness of any particular antidepressant in an individual patient. Thus, doctors must prescribe antidepressants based on educated guesses. With the recent advent of scientific research, genome-wide gene expression microarray studies are widely utilized to analyze hundreds of thousands of biomarkers by high-throughput technologies. In addition to the candidate-gene approach, the genome-wide approach has recently been employed to investigate the determinants of MDD as well as antidepressant response to therapy. In this review, we mainly focused on gene expression studies with genome-wide approaches using RNA derived from peripheral blood cells. Furthermore, we reviewed their limitations and future directions with respect to the genome-wide gene expression profiling in MDD pathogenesis as well as in antidepressant therapy.
Affiliation(s)
- Eugene Lin
- Institute of Clinical Medical Science, China Medical University, Taichung, Taiwan; Vita Genomics, Inc., Taipei, Taiwan
- Shih-Jen Tsai
- Department of Psychiatry, Taipei Veterans General Hospital, Taipei, Taiwan; Division of Psychiatry, National Yang-Ming University, Taipei, Taiwan.
7
Madahian B, Deng LY, Homayouni R. Development of a literature informed Bayesian machine learning method for feature extraction and classification. BMC Bioinformatics 2015. [PMCID: PMC4625107 DOI: 10.1186/1471-2105-16-s15-p9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/24/2022] Open
8
Madahian B, Roy S, Bowman D, Deng LY, Homayouni R. A Bayesian approach for inducing sparsity in generalized linear models with multi-category response. BMC Bioinformatics 2015; 16 Suppl 13:S13. [PMID: 26423345 PMCID: PMC4597416 DOI: 10.1186/1471-2105-16-s13-s13] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND The dimension and complexity of high-throughput gene expression data create many challenges for downstream analysis. Several approaches exist to reduce the number of variables with respect to small sample sizes. In this study, we utilized the Generalized Double Pareto (GDP) prior to induce sparsity in a Bayesian Generalized Linear Model (GLM) setting. The approach was evaluated using a publicly available microarray dataset containing 99 samples corresponding to four different prostate cancer subtypes. RESULTS A hierarchical Sparse Bayesian GLM using GDP prior (SBGG) was developed to take into account the progressive nature of the response variable. We obtained an average overall classification accuracy between 82.5% and 94%, which was higher than Support Vector Machine, Random Forest or a Sparse Bayesian GLM using double exponential priors. Additionally, SBGG outperforms the other 3 methods in correctly identifying pre-metastatic stages of cancer progression, which can prove extremely valuable for therapeutic and diagnostic purposes. Importantly, using Geneset Cohesion Analysis Tool, we found that the top 100 genes produced by SBGG had an average functional cohesion p-value of 2.0E-4 compared to 0.007 to 0.131 produced by the other methods. CONCLUSIONS Using GDP in a Bayesian GLM model applied to cancer progression data results in better subclass prediction. In particular, the method identifies pre-metastatic stages of prostate cancer with substantially better accuracy and produces more functionally relevant gene sets.
9
Rosenbaum JT, Choi D, Wilson DJ, Grossniklaus HE, Harrington CA, Sibley CH, Dailey RA, Ng JD, Steele EA, Czyz CN, Foster JA, Tse D, Alabiad C, Dubovy S, Parekh PK, Harris GJ, Kazim M, Patel PJ, White VA, Dolman PJ, Korn BS, Kikkawa DO, Edward DP, Alkatan HM, al-Hussain H, Yeatts RP, Selva D, Stauffer P, Planck SR. Orbital pseudotumor can be a localized form of granulomatosis with polyangiitis as revealed by gene expression profiling. Exp Mol Pathol 2015; 99:271-8. [PMID: 26163757 PMCID: PMC4591186 DOI: 10.1016/j.yexmp.2015.07.002] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Received: 07/01/2015] [Accepted: 07/02/2015] [Indexed: 01/05/2023]
Abstract
Biopsies and ANCA testing for limited forms of granulomatosis with polyangiitis (GPA) are frequently non-diagnostic. We characterized gene expression in GPA and other causes of orbital inflammation. We tested the hypothesis that a sub-set of patients with non-specific orbital inflammation (NSOI, also known as pseudotumor) mimics a limited form of GPA. Formalin-fixed, paraffin-embedded orbital biopsies were obtained from controls (n=20) and patients with GPA (n=6), NSOI (n=25), sarcoidosis (n=7), or thyroid eye disease (TED) (n=20) and were divided into discovery and validation sets. Transcripts in the tissues were quantified using Affymetrix U133 Plus 2.0 microarrays. Distinct gene expression profiles for controls and subjects with GPA, TED, or sarcoidosis were evident by principal coordinate analyses. Compared with healthy controls, 285 probe sets had elevated signals in subjects with GPA and 1472 were decreased (>1.5-fold difference, false discovery rate adjusted p<0.05). The immunoglobulin family of genes had the most dramatic increase in expression. Although gene expression in GPA could be readily distinguished from gene expression in TED, sarcoidosis, or controls, a comparison of gene expression in GPA versus NSOI found no statistically significant differences. Thus, forms of orbital inflammation can be distinguished based on gene expression. NSOI/pseudotumor is heterogeneous but often may be an unrecognized, localized form of GPA.
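The selection rule described in this abstract (more than a 1.5-fold difference with a false discovery rate adjusted p < 0.05) can be sketched as below. The abstract does not name the FDR procedure, so the Benjamini-Hochberg adjustment used here is an assumption, and the function names and example values are hypothetical. Fold changes are taken as disease/control ratios.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_probe_sets(fold_changes, p_values, min_fold=1.5, max_fdr=0.05):
    """Split probe-set indices into (elevated, decreased) lists.

    A probe set is kept only if its adjusted p-value is below max_fdr and
    its disease/control ratio is above min_fold or below 1 / min_fold.
    """
    adjusted = benjamini_hochberg(p_values)
    elevated, decreased = [], []
    for idx, (ratio, q) in enumerate(zip(fold_changes, adjusted)):
        if q >= max_fdr:
            continue
        if ratio > min_fold:
            elevated.append(idx)
        elif ratio < 1 / min_fold:
            decreased.append(idx)
    return elevated, decreased


if __name__ == "__main__":
    # Hypothetical ratios and raw p-values for four probe sets.
    ratios = [2.0, 0.5, 1.2, 0.4]
    raw_p = [0.01, 0.04, 0.03, 0.005]
    print(select_probe_sets(ratios, raw_p))
```

Counting the two returned lists separately corresponds to reporting elevated and decreased probe sets as separate totals, as the abstract does.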
Affiliation(s)
- James T Rosenbaum
- Casey Eye Institute, Oregon Health & Science University, Portland, OR 97239, USA; Department of Medicine, Oregon Health & Science University, Portland, OR 97239, USA; Devers Eye Institute, Legacy Health Systems, Portland, OR 97210, USA.
- Dongseok Choi
- Casey Eye Institute, Oregon Health & Science University, Portland, OR 97239, USA; Department of Medicine, Oregon Health & Science University, Portland, OR 97239, USA; Department of Public Health and Preventive Medicine, Oregon Health & Science University, Portland, OR 97239, USA.
- David J Wilson
- Casey Eye Institute, Oregon Health & Science University, Portland, OR 97239, USA.
- Christina A Harrington
- Integrated Genomics Laboratory, Oregon Health & Science University, Portland, OR 97239, USA.
- Cailin H Sibley
- Department of Medicine, Oregon Health & Science University, Portland, OR 97239, USA.
- Roger A Dailey
- Casey Eye Institute, Oregon Health & Science University, Portland, OR 97239, USA.
- John D Ng
- Casey Eye Institute, Oregon Health & Science University, Portland, OR 97239, USA.
- Eric A Steele
- Casey Eye Institute, Oregon Health & Science University, Portland, OR 97239, USA.
- Craig N Czyz
- Division of Ophthalmology, Ohio University, Columbus, OH 43228, USA.
- Jill A Foster
- Department of Ophthalmology, The Ohio State University, Columbus, OH 43215, USA.
- David Tse
- Department of Ophthalmology, University of Miami, FL 33101, USA.
- Chris Alabiad
- Department of Ophthalmology, University of Miami, FL 33101, USA.
- Sander Dubovy
- Department of Ophthalmology, University of Miami, FL 33101, USA.
- Gerald J Harris
- Department of Ophthalmology, Medical College of Wisconsin, Milwaukee, WI 53226, USA.
- Michael Kazim
- Department of Ophthalmology, Columbia University, New York, NY 10032, USA.
- Payal J Patel
- Department of Ophthalmology, Columbia University, New York, NY 10032, USA.
- Valerie A White
- Department of Ophthalmology and Visual Sciences, University of British Columbia, Vancouver, British Columbia V5Z 3N9, Canada.
- Peter J Dolman
- Department of Ophthalmology and Visual Sciences, University of British Columbia, Vancouver, British Columbia V5Z 3N9, Canada.
- Bobby S Korn
- Department of Ophthalmology, University of California, San Diego, CA 92037, USA.
- Don O Kikkawa
- Department of Ophthalmology, University of California, San Diego, CA 92037, USA.
- Deepak P Edward
- Research Department, King Khaled Eye Specialist Hospital, Riyadh 11462, Saudi Arabia.
- Hind M Alkatan
- Research Department, King Khaled Eye Specialist Hospital, Riyadh 11462, Saudi Arabia.
- Hailah al-Hussain
- Research Department, King Khaled Eye Specialist Hospital, Riyadh 11462, Saudi Arabia.
- R Patrick Yeatts
- Department of Ophthalmology, Wake Forest University, Winston-Salem, NC 27103, USA.
- Dinesh Selva
- Ophthalmology Network, Royal Adelaide Hospital, Adelaide 5000, Australia.
- Patrick Stauffer
- Casey Eye Institute, Oregon Health & Science University, Portland, OR 97239, USA.
- Stephen R Planck
- Casey Eye Institute, Oregon Health & Science University, Portland, OR 97239, USA; Department of Medicine, Oregon Health & Science University, Portland, OR 97239, USA; Devers Eye Institute, Legacy Health Systems, Portland, OR 97210, USA.
10
Factors affecting the accuracy of a class prediction model in gene expression data. BMC Bioinformatics 2015; 16:199. [PMID: 26093633 PMCID: PMC4475623 DOI: 10.1186/s12859-015-0610-4] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 01/08/2015] [Accepted: 04/30/2015] [Indexed: 01/12/2023] Open
Abstract
BACKGROUND Class prediction models have been shown to have varying performances in clinical gene expression datasets. Previous evaluation studies, mostly done in the field of cancer, showed that the accuracy of class prediction models differs from dataset to dataset and depends on the type of classification function. While a substantial amount of information is known about the characteristics of classification functions, little has been done to determine which characteristics of gene expression data have an impact on the performance of a classifier. This study aims to empirically identify data characteristics that affect the predictive accuracy of classification models, outside of the field of cancer. RESULTS Datasets from twenty-five studies meeting predefined inclusion and exclusion criteria were downloaded. Nine classification functions were chosen, falling within the categories: discriminant analyses or Bayes classifiers, tree-based, regularization and shrinkage, and nearest neighbors methods. Consequently, nine class prediction models were built for each dataset using the same procedure and their performances were evaluated by calculating their accuracies. The characteristics of each experiment were recorded (i.e., observed disease, medical question, tissue/cell types and sample size) together with characteristics of the gene expression data, namely the number of differentially expressed genes, the fold changes and the within-class correlations. Their effects on the accuracy of a class prediction model were statistically assessed by random effects logistic regression. The number of differentially expressed genes and the average fold change had a significant impact on the accuracy of a classification model and individually explained up to 72% and 57% of the variation in prediction accuracy, respectively. Multivariable random effects logistic regression with forward selection yielded the two aforementioned study factors and the within-class correlation as factors affecting the accuracy of classification functions, explaining 91.5% of the between-study variation. CONCLUSIONS We evaluated study- and data-related factors that might explain the varying performances of classification functions in non-cancerous datasets. Our results showed that the number of differentially expressed genes, the fold change, and the correlation in gene expression data significantly affect the accuracy of class prediction models.
11