Reference Citation Analysis

For: Wall ME, Dyck PA, Brettin TS. SVDMAN--singular value decomposition analysis of microarray data. Bioinformatics 2001;17:566-8. [PMID: 11395437 DOI: 10.1093/bioinformatics/17.6.566] [Citation(s) in RCA: 43] [Impact Index Per Article: 1.9] [Indexed: 11/15/2022]

Cited by Other Article(s)

Abdelwahab MM, Al-Karawi KA, Semary HE. Deep Learning-Based Prediction of Alzheimer's Disease Using Microarray Gene Expression Data. Biomedicines 2023;11:3304. [PMID: 38137524 PMCID: PMC10741889 DOI: 10.3390/biomedicines11123304] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2023] [Revised: 12/02/2023] [Accepted: 12/04/2023] [Indexed: 12/24/2023]

Abstract

Alzheimer's disease (AD) is a genetically complex disorder, and microarray technology provides valuable insights into it. However, the high dimensionality of microarray datasets and their small sample sizes pose challenges. Gene selection techniques have emerged as a promising solution to this challenge, potentially revolutionizing AD diagnosis. This study investigates deep learning techniques, specifically neural networks, for predicting AD from microarray gene expression data. The goal is to develop a reliable predictive model for early detection and diagnosis, potentially improving patient care and intervention strategies. The study employed gene selection techniques, including Singular Value Decomposition (SVD) and Principal Component Analysis (PCA), to pinpoint pertinent genes within microarray datasets. A Convolutional Neural Network (CNN) served as the classifier for AD prediction, using a seven-layer architecture with diverse configurations to process the dataset. On the AD dataset, the PCA-CNN model yielded an accuracy of 96.60% and a loss of 0.3503, and the SVD-CNN model attained an accuracy of 97.08% and a loss of 0.2466. These results highlight the potential of the method for gene dimension reduction and for improving classification accuracy by selecting a subset of pertinent genes. Integrating gene selection methodologies with deep learning architectures presents a promising framework for improving AD prediction and promoting precision medicine in neurodegenerative disorders. Ongoing research aims to generalize this approach to diverse applications, explore alternative gene selection techniques, and investigate a variety of deep learning architectures.
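
The SVD-based dimension reduction mentioned in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name `svd_reduce`, the random stand-in data, and the choice of `k` are all hypothetical, and the CNN classifier is omitted.

```python
import numpy as np

def svd_reduce(X, k):
    """Project samples (rows of X) onto the top-k right singular vectors."""
    Xc = X - X.mean(axis=0)                      # center each gene (column)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :k] * s[:k]                    # sample coordinates in the reduced space
    explained = (s[:k] ** 2).sum() / (s ** 2).sum()  # fraction of variance retained
    return scores, Vt[:k], explained

# Hypothetical stand-in for an expression matrix: 20 samples x 500 genes.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 500))
scores, components, explained = svd_reduce(X, k=5)
```

The reduced `scores` matrix (one row per sample, `k` columns) is the kind of input a downstream classifier would receive in place of the full gene matrix.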

Mitrović K, Petrušić I, Radojičić A, Daković M, Savić A. Migraine with aura detection and subtype classification using machine learning algorithms and morphometric magnetic resonance imaging data. Front Neurol 2023;14:1106612. [PMID: 37441607 PMCID: PMC10333052 DOI: 10.3389/fneur.2023.1106612] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 11/23/2022] [Accepted: 05/22/2023] [Indexed: 07/15/2023]

Abstract

Introduction

Migraine with aura (MwA) is a neurological condition that manifests as moderate to severe headaches associated with transient visual and somatosensory symptoms, as well as higher cortical dysfunctions. Because about 5% of the world's population suffers from this condition and its manifestations are numerous and varied, it is important to find new and advanced techniques for detecting different phenotypes, which in turn can allow better diagnosis, classification, and biomarker validation, resulting in tailored treatments for MwA patients.

Methods

This research aimed to test different machine learning techniques to distinguish healthy people from those suffering from MwA, as well as people with simple MwA and those experiencing complex MwA. Magnetic resonance imaging (MRI) post-processed data (cortical thickness, cortical surface area, cortical volume, cortical mean Gaussian curvature, and cortical folding index) was collected from 78 subjects [46 MwA patients (22 simple MwA and 24 complex MwA) and 32 healthy controls] with 340 different features used for the algorithm training.

Results

The results show that an algorithm based on post-processed MRI data yields a high classification accuracy (97%) of MwA patients and precise distinction between simple MwA and complex MwA with an accuracy of 98%. Additionally, the sets of features relevant to the classification were identified. The feature importance ranking indicates the thickness of the left temporal pole, right lingual gyrus, and left pars opercularis as the most prominent markers for MwA classification, while the thickness of left pericalcarine gyrus and left pars opercularis are proposed as the two most important features for the simple and complex MwA classification.

Discussion

This method shows significant potential in the validation of MwA diagnosis and subtype classification, which can tackle and challenge the current treatments of MwA.

Characterizing Spatiotemporal Transcriptome of the Human Brain Via Low-Rank Tensor Decomposition. STATISTICS IN BIOSCIENCES 2022. [DOI: 10.1007/s12561-021-09331-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/25/2022]

Cortez AJ, Kujawa KA, Wilk AM, Sojka DR, Syrkis JP, Olbryt M, Lisowska KM. Evaluation of the Role of ITGBL1 in Ovarian Cancer. Cancers (Basel) 2020;12:E2676. [PMID: 32961775 PMCID: PMC7563769 DOI: 10.3390/cancers12092676] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 08/21/2020] [Revised: 09/15/2020] [Accepted: 09/16/2020] [Indexed: 12/27/2022]

Wang YY, Cui C, Qi L, Yan H, Zhao XM. DrPOCS: Drug Repositioning Based on Projection Onto Convex Sets. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2019;16:154-162. [PMID: 29993698 DOI: 10.1109/tcbb.2018.2830384] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 06/08/2023]

Girdhar K, Gruebele M, Chemla YR. The Behavioral Space of Zebrafish Locomotion and Its Neural Network Analog. PLoS One 2015;10:e0128668. [PMID: 26132396 PMCID: PMC4489106 DOI: 10.1371/journal.pone.0128668] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 03/19/2015] [Accepted: 04/30/2015] [Indexed: 11/18/2022]

Deng WY, Zheng QH, Wang ZM. Projection vector machine. Neurocomputing 2013. [DOI: 10.1016/j.neucom.2013.04.028] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 10/26/2022]

Baralis E, Cerquitelli T, Chiusano S, D'elia V, Molinari R, Susta D. Early prediction of the highest workload in incremental cardiopulmonary tests. ACM T INTEL SYST TEC 2013. [DOI: 10.1145/2508037.2508051] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 10/26/2022]

Shabalin AA, Nobel AB. Reconstruction of a low-rank matrix in the presence of Gaussian noise. J MULTIVARIATE ANAL 2013. [DOI: 10.1016/j.jmva.2013.03.005] [Citation(s) in RCA: 79] [Impact Index Per Article: 7.2] [Indexed: 11/25/2022]

Ramsköld D, Luo S, Wang YC, Li R, Deng Q, Faridani OR, Daniels GA, Khrebtukova I, Loring JF, Laurent LC, Schroth GP, Sandberg R. Full-length mRNA-Seq from single-cell levels of RNA and individual circulating tumor cells. Nat Biotechnol 2013;30:777-82. [PMID: 22820318 PMCID: PMC3467340 DOI: 10.1038/nbt.2282] [Citation(s) in RCA: 1075] [Impact Index Per Article: 97.7] [Received: 11/14/2011] [Accepted: 05/22/2012] [Indexed: 12/17/2022]

miRNA-mRNA correlation-network modules in human prostate cancer and the differences between primary and metastatic tumor subtypes. PLoS One 2012;7:e40130. [PMID: 22768240 PMCID: PMC3387006 DOI: 10.1371/journal.pone.0040130] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.1] [Received: 03/19/2012] [Accepted: 06/01/2012] [Indexed: 11/19/2022]

Abstract

Recent studies have shown the contribution of miRNAs to cancer pathogenesis. Prostate cancer is the most commonly diagnosed cancer in men. Unlike other major types of cancer, no single gene has been identified as being mutated in the majority of prostate tumors. This implies that the expression profiles of genes, including the non-coding miRNAs, may vary substantially across individual cases of this cancer. The within-class variability makes it possible to reconstruct or infer disease-specific miRNA-mRNA correlation and regulatory modular networks using high-dimensional microarray data of prostate tumor samples. Furthermore, since miRNAs and tumor suppressor genes are usually tissue specific, miRNA-mRNA modules could potentially differ between primary prostate cancer (PPC) and metastatic prostate cancer (MPC). We herein performed an in silico analysis to explore the miRNA-mRNA correlation network modules in the two tumor subtypes. Our analysis identified 5 miRNA-mRNA module pairs (MPs) each for PPC and MPC. Each MP includes one positive-connection (correlation) module and one negative-connection (correlation) module. The number of miRNAs in each module varies from 2 to 8, and the number of mRNAs (genes) from 6 to 622. The modules discovered for PPC are more informative than those for MPC in terms of the implicated biological insights. In particular, one negative-connection module in PPC fits well with the popularly recognized miRNA-mediated post-transcriptional regulation theory. That is, the 3′UTR sequences of the involved mRNAs (∼620) are enriched with the target site motifs of the 7 modular miRNAs, hsa-miR-106b, -191, -19b, -92a, -92b, -93, and -141. About 330 GO terms and KEGG pathways, including the TGF-beta signaling pathway, which maintains tissue homeostasis and plays a crucial role in suppressing the proliferation of cancer cells, are over-represented (adj. p < 0.05) in the modular gene list. These computationally identified modules provide remarkable biological evidence for the interference of miRNAs in the development of prostate cancers and warrant additional follow-up in independent laboratory studies.

Zhuang J, Widschwendter M, Teschendorff AE. A comparison of feature selection and classification methods in DNA methylation studies using the Illumina Infinium platform. BMC Bioinformatics 2012;13:59. [PMID: 22524302 PMCID: PMC3364843 DOI: 10.1186/1471-2105-13-59] [Citation(s) in RCA: 87] [Impact Index Per Article: 7.3] [Received: 12/20/2011] [Accepted: 04/24/2012] [Indexed: 02/07/2023]

Abstract

Background

The 27k Illumina Infinium Methylation Beadchip is a popular high-throughput technology that allows the methylation state of over 27,000 CpGs to be assayed. While feature selection and classification methods have been comprehensively explored in the context of gene expression data, relatively little is known as to how best to perform feature selection or classification in the context of Illumina Infinium methylation data. Given the rising importance of epigenomics in cancer and other complex genetic diseases, and in view of the upcoming epigenome wide association studies, it is critical to identify the statistical methods that offer improved inference in this novel context.

Results

Using a total of 7 large Illumina Infinium 27k Methylation data sets, encompassing over 1,000 samples from a wide range of tissues, we here provide an evaluation of popular feature selection, dimensional reduction and classification methods on DNA methylation data. Specifically, we evaluate the effects of variance filtering, supervised principal components (SPCA) and the choice of DNA methylation quantification measure on downstream statistical inference. We show that for relatively large sample sizes feature selection using test statistics is similar for M and β-values, but that in the limit of small sample sizes, M-values allow more reliable identification of true positives. We also show that the effect of variance filtering on feature selection is study-specific and dependent on the phenotype of interest and tissue type profiled. Specifically, we find that variance filtering improves the detection of true positives in studies with large effect sizes, but that it may lead to worse performance in studies with smaller yet significant effect sizes. In contrast, supervised principal components improves the statistical power, especially in studies with small effect sizes. We also demonstrate that classification using the Elastic Net and Support Vector Machine (SVM) clearly outperforms competing methods like LASSO and SPCA. Finally, in unsupervised modelling of cancer diagnosis, we find that non-negative matrix factorisation (NMF) clearly outperforms principal components analysis.

Conclusions

Our results highlight the importance of tailoring the feature selection and classification methodology to the sample size and biological context of the DNA methylation study. The Elastic Net emerges as a powerful classification algorithm for large-scale DNA methylation studies, while NMF does well in the unsupervised context. The insights presented here will be useful to any study embarking on large-scale DNA methylation profiling using Illumina Infinium beadarrays.
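
As a rough illustration of the two quantification measures compared in this abstract, the sketch below converts between β-values and M-values using the commonly used logit relation M = log2(β / (1 − β)). The abstract does not state this formula, so it is an assumption here, as are the function names and the clipping constant `eps`.

```python
import numpy as np

def beta_to_m(beta, eps=1e-6):
    """Convert methylation beta-values in [0, 1] to M-values (log2 odds)."""
    # Clip away exact 0 and 1 so the log stays finite (eps is an assumed guard).
    beta = np.clip(np.asarray(beta, dtype=float), eps, 1.0 - eps)
    return np.log2(beta / (1.0 - beta))

def m_to_beta(m):
    """Inverse transform: M-values back to beta-values."""
    m = np.asarray(m, dtype=float)
    return 2.0 ** m / (2.0 ** m + 1.0)
```

For example, a β-value of 0.5 maps to an M-value of 0, and a β-value of 0.8 maps to an M-value of about 2, since 0.8 / 0.2 = 4.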

Shukla S, Kavak E, Gregory M, Imashimizu M, Shutinoski B, Kashlev M, Oberdoerffer P, Sandberg R, Oberdoerffer S. CTCF-promoted RNA polymerase II pausing links DNA methylation to splicing. Nature 2012;479:74-9. [PMID: 21964334 DOI: 10.1038/nature10442] [Citation(s) in RCA: 718] [Impact Index Per Article: 59.8] [Received: 03/24/2011] [Revised: 11/03/2011] [Accepted: 08/12/2011] [Indexed: 12/17/2022]

Construction of protein interaction networks based on the label-free quantitative proteomics. Methods Mol Biol 2011;781:71-85. [PMID: 21877278 DOI: 10.1007/978-1-61779-276-2_5] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 12/28/2022]

Sone H, Akanuma H, Fukuda T. Oxygenomics in environmental stress. Redox Rep 2010;15:98-114. [PMID: 20594413 DOI: 10.1179/174329210x12650506623843] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Indexed: 01/07/2023]

svdPPCS: an effective singular value decomposition-based method for conserved and divergent co-expression gene module identification. BMC Bioinformatics 2010;11:338. [PMID: 20565989 PMCID: PMC2905369 DOI: 10.1186/1471-2105-11-338] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 03/01/2010] [Accepted: 06/22/2010] [Indexed: 12/25/2022]

Abstract

Background

Comparative analysis of gene expression profiling of multiple biological categories, such as different species of organisms or different kinds of tissue, promises to enhance the fundamental understanding of the universality as well as the specialization of mechanisms and related biological themes. Grouping genes with a similar expression pattern or exhibiting co-expression together is a starting point in understanding and analyzing gene expression data. In recent literature, gene module level analysis is advocated in order to understand biological network design and system behaviors in disease and life processes; however, practical difficulties often lie in the implementation of existing methods.

Results

Using the singular value decomposition (SVD) technique, we developed a new computational tool, named svdPPCS (SVD-based Pattern Pairing and Chart Splitting), to identify conserved and divergent co-expression modules of two sets of microarray experiments. In the proposed method, gene modules are identified by splitting the two-way chart coordinated with a pair of left singular vectors factorized from the gene expression matrices of the two biological categories. Importantly, the cutoffs are determined by a data-driven algorithm using the well-defined statistic, SVD-p. The implementation was illustrated on two time series microarray data sets generated from samples of accessory gland (ACG) and Malpighian tubule (MT) tissues of the W¹¹⁸ line of Drosophila melanogaster. Two conserved modules and six divergent modules, each of which has a unique characteristic profile across tissue kinds and aging processes, were identified. The number of genes contained in these modules ranged from five to a few hundred. Three to over a hundred GO terms were over-represented in individual modules with FDR < 0.1. One divergent module suggested a tissue-specific relationship between the expression of mitochondrion-related genes and the aging process. This finding, together with others, may be of biological significance. The validity of the proposed SVD-based method was further verified by a simulation study, as well as by comparisons with regression analysis and cubic spline regression analysis plus PAM-based clustering.

Conclusions

svdPPCS is a novel computational tool for the comparative analysis of transcriptional profiling. It especially fits the comparison of time series data of related organisms or different tissues of the same organism under equivalent or similar experimental conditions. The general scheme can be directly extended to the comparisons of multiple data sets. It also can be applied to the integration of data sets from different platforms and of different sources.

Zhu D. Semi-supervised gene shaving method for predicting low variation biological pathways from genome-wide data. BMC Bioinformatics 2009;10 Suppl 1:S54. [PMID: 19208157 PMCID: PMC2648790 DOI: 10.1186/1471-2105-10-s1-s54] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/30/2022]

Liu Q, Zhang Y, Xu Y, Ye X. Fuzzy Kernel Clustering of RNA Secondary Structure Ensemble Using a Novel Similarity Metric. J Biomol Struct Dyn 2008;25:685-96. [DOI: 10.1080/07391102.2008.10507214] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 10/28/2022]

Probabilistic assembly of human protein interaction networks from label-free quantitative proteomics. Proc Natl Acad Sci U S A 2008;105:1454-9. [PMID: 18218781 DOI: 10.1073/pnas.0706983105] [Citation(s) in RCA: 196] [Impact Index Per Article: 12.3] [Indexed: 02/07/2023]

Wu H, Yuan M, Kaech SM, Halloran ME. A statistical analysis of memory CD8 T cell differentiation: An application of a hierarchical state space model to a short time course microarray experiment. Ann Appl Stat 2007. [DOI: 10.1214/07-aoas118] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/19/2022]

Fujibuchi W, Kato T. Classification of heterogeneous microarray data by maximum entropy kernel. BMC Bioinformatics 2007;8:267. [PMID: 17651507 PMCID: PMC1994960 DOI: 10.1186/1471-2105-8-267] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 01/16/2007] [Accepted: 07/26/2007] [Indexed: 11/10/2022]

Mamtani MR, Thakre TP, Kalkonde MY, Amin MA, Kalkonde YV, Amin AP, Kulkarni H. A simple method to combine multiple molecular biomarkers for dichotomous diagnostic classification. BMC Bioinformatics 2006;7:442. [PMID: 17032455 PMCID: PMC1618410 DOI: 10.1186/1471-2105-7-442] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Received: 05/13/2006] [Accepted: 10/10/2006] [Indexed: 11/29/2022]

Inoue LYT, Neira M, Nelson C, Gleave M, Etzioni R. Cluster-based network model for time-course gene expression data. Biostatistics 2006;8:507-25. [PMID: 16980695 DOI: 10.1093/biostatistics/kxl026] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.3] [Indexed: 11/14/2022]

Pascual-Montano A, Carmona-Saez P, Chagoyen M, Tirado F, Carazo JM, Pascual-Marqui RD. bioNMF: a versatile tool for non-negative matrix factorization in biology. BMC Bioinformatics 2006;7:366. [PMID: 16875499 PMCID: PMC1550731 DOI: 10.1186/1471-2105-7-366] [Citation(s) in RCA: 66] [Impact Index Per Article: 3.7] [Received: 05/18/2006] [Accepted: 07/28/2006] [Indexed: 12/02/2022]

Sen TZ, Kloczkowski A, Jernigan RL. Functional clustering of yeast proteins from the protein-protein interaction network. BMC Bioinformatics 2006;7:355. [PMID: 16863590 PMCID: PMC1557866 DOI: 10.1186/1471-2105-7-355] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.1] [Received: 02/23/2006] [Accepted: 07/24/2006] [Indexed: 12/27/2022]

Dabrowski M, Adach A, Aerts S, Moreau Y, Kaminska B. Identification of conserved modes of expression profiles during hippocampal development and neuronal differentiation in vitro. J Neurochem 2006;97 Suppl 1:87-91. [PMID: 16635255 DOI: 10.1111/j.1471-4159.2005.03537.x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 12/01/2022]

Roden JC, King BW, Trout D, Mortazavi A, Wold BJ, Hart CE. Mining gene expression data by interpreting principal components. BMC Bioinformatics 2006;7:194. [PMID: 16600052 PMCID: PMC1501050 DOI: 10.1186/1471-2105-7-194] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.1] [Received: 07/03/2005] [Accepted: 04/07/2006] [Indexed: 12/04/2022]

Abstract

Background

There are many methods for analyzing microarray data that group together genes having similar patterns of expression over all conditions tested. However, in many instances the biologically important goal is to identify relatively small sets of genes that share coherent expression across only some conditions, rather than all or most conditions as required in traditional clustering; e.g. genes that are highly up-regulated and/or down-regulated similarly across only a subset of conditions. Equally important is the need to learn which conditions are the decisive ones in forming such gene sets of interest, and how they relate to diverse conditional covariates, such as disease diagnosis or prognosis.

Results

We present a method for automatically identifying such candidate sets of biologically relevant genes using a combination of principal components analysis and information theoretic metrics. To enable easy use of our methods, we have developed a data analysis package that facilitates visualization and subsequent data mining of the independent sources of significant variation present in gene microarray expression datasets (or in any other similarly structured high-dimensional dataset). We applied these tools to two public datasets, and highlight sets of genes most affected by specific subsets of conditions (e.g. tissues, treatments, samples, etc.). Statistically significant associations for highlighted gene sets were shown via global analysis for Gene Ontology term enrichment. Together with covariate associations, the tool provides a basis for building testable hypotheses about the biological or experimental causes of observed variation.

Conclusion

We provide an unsupervised data mining technique for diverse microarray expression datasets that is distinct from major methods now in routine use. In test uses, this method, based on publicly available gene annotations, appears to identify numerous sets of biologically relevant genes. It has proven especially valuable in instances where there are many diverse conditions (tens to hundreds of different tissues or cell types), a situation in which many clustering and ordering algorithms become problematic. This approach also shows promise in other topic domains such as multi-spectral imaging datasets.

Carter GW, Rupp S, Fink GR, Galitski T. Disentangling information flow in the Ras-cAMP signaling network. Genome Res 2006;16:520-6. [PMID: 16533914 PMCID: PMC1457029 DOI: 10.1101/gr.4473506] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Indexed: 11/25/2022]

Hu J, Wright FA, Zou F. Estimation of Expression Indexes for Oligonucleotide Arrays Using the Singular Value Decomposition. J Am Stat Assoc 2006. [DOI: 10.1198/016214505000000989] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/21/2022]

Cios KJ, Mamitsuka H, Nagashima T, Tadeusiewicz R. Computational intelligence in solving bioinformatics problems. Artif Intell Med 2005;35:1-8. [PMID: 16095889 DOI: 10.1016/j.artmed.2005.07.001] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.0] [Indexed: 11/25/2022]

Liang Y, Kelemen A. Associating phenotypes with molecular events: recent statistical advances and challenges underpinning microarray experiments. Funct Integr Genomics 2005;6:1-13. [PMID: 16292543 DOI: 10.1007/s10142-005-0006-z] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.1] [Received: 04/18/2005] [Revised: 06/22/2005] [Accepted: 08/16/2005] [Indexed: 10/25/2022]

Hand DJ, Heard NA. Finding groups in gene expression data. J Biomed Biotechnol 2005;2005:215-25. [PMID: 16046827 PMCID: PMC1184051 DOI: 10.1155/jbb.2005.215] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.0] [Received: 06/11/2004] [Revised: 08/24/2004] [Accepted: 08/24/2004] [Indexed: 11/18/2022]

Teramoto KI, Tada M, Tamoto E, Abe M, Kawakami A, Komuro K, Matsunaga A, Shindoh G, Takada M, Murakawa K, Kanai M, Kobayashi N, Fujiwara Y, Nishimura N, Shirata K, Takahishi T, Ishizu A, Ikeda H, Hamada JI, Kondo S, Katoh H, Moriuchi T, Yoshiki T. Prediction of lymphatic invasion/lymph node metastasis, recurrence, and survival in patients with gastric cancer by cDNA array-based expression profiling. J Surg Res 2005;124:225-36. [PMID: 15820252 DOI: 10.1016/j.jss.2004.10.003] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Received: 06/08/2004] [Indexed: 01/21/2023]

Liang Y, Tayo B, Cai X, Kelemen A. Differential and trajectory methods for time course gene expression data. Bioinformatics 2005;21:3009-16. [PMID: 15886280 PMCID: PMC2574001 DOI: 10.1093/bioinformatics/bti465] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.0] [Indexed: 11/13/2022]

Abstract

Motivation

The issue of high dimensionality in microarray data has been, and remains, a hot topic in statistical and computational analysis. Efficient gene filtering and differentiation approaches can reduce the dimensions of data, help to remove redundant genes and noise, and highlight the most relevant genes that are major players in the development of certain diseases or the effect of drug treatment. The purpose of this study is to investigate the efficiency of parametric (including Bayesian and non-Bayesian, linear and non-linear), non-parametric and semi-parametric gene filtering methods by applying them to time course microarray data from multiple sclerosis patients being treated with interferon-beta-1a. The analysis of variance with bootstrapping (parametric), class dispersion (semi-parametric) and Pareto (non-parametric) with permutation methods are presented and compared for filtering and finding differentially expressed genes. The Bayesian linear correlated model, the Bayesian non-linear model and the non-Bayesian mixed effects model with bootstrap were also developed to characterize the differential expression patterns. Furthermore, trajectory-clustering approaches were developed in order to investigate the dynamic patterns and inter-dependency of drug treatment effects on gene expression.

Results

Results show that the presented methods performed significantly differently, but all were adequate in capturing a small number of genes potentially relevant to the disease. The parametric methods, such as the mixed model and the two Bayesian approaches, proved to be more conservative. This may be because these methods are based on overall variation in expression across all time points. The semi-parametric (class dispersion) and non-parametric (Pareto) methods were appropriate in capturing variation in expression from time point to time point, thereby making them more suitable for investigating significant monotonic changes and trajectories of changes in gene expression in time course microarray data. Also, the non-linear Bayesian model proved to be less conservative than the linear Bayesian correlated growth model in filtering out the redundant genes, although the linear model showed a better fit than the non-linear model (smaller DIC). We also report the trajectories of significant genes, since we have been able to isolate trajectories of genes whose regulation appears to be inter-dependent.

Cavalieri D, De Filippo C. Bioinformatic methods for integrating whole-genome expression results into cellular networks. Drug Discov Today 2005;10:727-34. [PMID: 15896686 DOI: 10.1016/s1359-6446(05)03433-1] [Citation(s) in RCA: 36] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]

Chiappetta P, Roubaud MC, Torrésani B. Blind Source Separation and the Analysis of Microarray Data. J Comput Biol 2004;11:1090-109. [PMID: 15662200 DOI: 10.1089/cmb.2004.11.1090] [Citation(s) in RCA: 69] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

Liu B, Cui Q, Jiang T, Ma S. A combinational feature selection and ensemble neural network method for classification of gene expression data. BMC Bioinformatics 2004;5:136. [PMID: 15450124 PMCID: PMC522806 DOI: 10.1186/1471-2105-5-136] [Citation(s) in RCA: 83] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/05/2004] [Accepted: 09/27/2004] [Indexed: 02/08/2023] Open

Challacombe JF, Rechtsteiner A, Gottardo R, Rocha LM, Browne EP, Shenk T, Altherr MR, Brettin TS. Evaluation of the host transcriptional response to human cytomegalovirus infection. Physiol Genomics 2004;18:51-62. [PMID: 15069167 DOI: 10.1152/physiolgenomics.00155.2003] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/24/2022] Open

Simek K, Kimmel M. A note on estimation of dynamics of multiple gene expression based on singular value decomposition. Math Biosci 2003;182:183-99. [PMID: 12591624 DOI: 10.1016/s0025-5564(02)00185-2] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022]

Hörnquist M, Hertz J, Wahde M. Effective dimensionality of large-scale expression data using principal component analysis. Biosystems 2002;65:147-56. [PMID: 12069725 DOI: 10.1016/s0303-2647(02)00011-4] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/27/2022]

Fogolari F, Tessari S, Molinari H. Singular value decomposition analysis of protein sequence alignment score data. Proteins 2002;46:161-70. [PMID: 11807944 DOI: 10.1002/prot.10032] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]

Landgrebe J, Wurst W, Welzl G. Permutation-validated principal components analysis of microarray data. Genome Biol 2002;3:RESEARCH0019. [PMID: 11983060 PMCID: PMC115254 DOI: 10.1186/gb-2002-3-4-research0019] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/12/2001] [Revised: 01/31/2002] [Accepted: 02/15/2002] [Indexed: 11/11/2022] Open

Abstract

BACKGROUND

In microarray data analysis, the comparison of gene-expression profiles with respect to different conditions and the selection of biologically interesting genes are crucial tasks. Multivariate statistical methods have been applied to analyze these large datasets. Less work has been published concerning the assessment of the reliability of gene-selection procedures. Here we describe a method to assess reliability in multivariate microarray data analysis using permutation-validated principal components analysis (PCA). The approach is designed for microarray data with a group structure.

RESULTS

We used PCA to detect the major sources of variance underlying the hybridization conditions followed by gene selection based on PCA-derived and permutation-based test statistics. We validated our method by applying it to well characterized yeast cell-cycle data and to two datasets from our laboratory. We could describe the major sources of variance, select informative genes and visualize the relationship of genes and arrays. We observed differences in the level of the explained variance and the interpretability of the selected genes.

CONCLUSIONS

Combining data visualization and permutation-based gene selection, permutation-validated PCA enables one to illustrate gene-expression variance between several conditions and to select genes by taking into account the relationship of between-group to within-group variance of genes. The method can be used to extract the leading sources of variance from microarray data, to visualize relationships between genes and hybridizations and to select informative genes in a statistically reliable manner. This selection accounts for the level of reproducibility of replicates or group structure as well as gene-specific scatter. Visualization of the data can support a straightforward biological interpretation.
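
As a rough illustration of the permutation-based selection idea mentioned above, the following minimal Python sketch scores a single gene by its between-group to within-group sum-of-squares ratio and estimates a p-value by shuffling the group labels. It is not the authors' implementation and it omits the PCA step entirely; the function names (`variance_ratio`, `permutation_p_value`), the number of permutations, and the toy expression values are all assumptions made for illustration.

```python
import random
from statistics import mean


def variance_ratio(values, labels):
    """Between-group over within-group sum of squares for one gene."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    grand_mean = mean(values)
    between = sum(len(vs) * (mean(vs) - grand_mean) ** 2 for vs in groups.values())
    within = sum((v - mean(vs)) ** 2 for vs in groups.values() for v in vs)
    # A gene with no within-group scatter gets an infinite ratio.
    return between / within if within > 0 else float("inf")


def permutation_p_value(values, labels, n_perm=2000, seed=0):
    """Share of label shufflings whose ratio is at least the observed one."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = variance_ratio(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if variance_ratio(values, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy data: two conditions with four replicate arrays each.
labels = ["a"] * 4 + ["b"] * 4
separated_gene = [1.0, 1.1, 0.9, 1.2, 3.0, 3.1, 2.9, 3.2]
mixed_gene = [1.0, 3.0, 0.9, 3.1, 1.1, 2.9, 1.2, 3.2]

p_separated = permutation_p_value(separated_gene, labels)
p_mixed = permutation_p_value(mixed_gene, labels)
print(p_separated, p_mixed)
```

In this toy setup the gene whose values separate cleanly by condition should receive a small p-value, while the gene whose scatter is unrelated to the grouping should not. With only four replicates per group the smallest attainable p-value is limited by the number of distinct label arrangements, which is one reason replicate structure matters for this kind of selection.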

Current Awareness on Comparative and Functional Genomics. Comp Funct Genomics 2001. [PMCID: PMC2447222 DOI: 10.1002/cfg.60] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/02/2022] Open