1
Qiao W, Xie T, Lu J, Jia T. Development of machine learning models for the prediction of the skin sensitization potential of cosmetic compounds. PeerJ 2024; 12:e18672. [PMID: 39686995 PMCID: PMC11648681 DOI: 10.7717/peerj.18672] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2024] [Accepted: 11/19/2024] [Indexed: 12/18/2024] Open
Abstract
Background: To enhance the accuracy of allergen detection in cosmetic compounds, we developed a co-culture system that combines HaCaT keratinocytes (transfected with a luciferase plasmid driven by the AKR1C2 promoter) and THP-1 cells for machine learning applications. Methods: Following chemical exposure, cytotoxicity was assessed using CCK-8 to determine appropriate stimulation concentrations. RNA-Seq was subsequently employed to analyze THP-1 cells, followed by differentially expressed gene (DEG) analysis and weighted gene co-expression network analysis (WGCNA). Using two data preprocessing methods and three feature extraction techniques, we constructed and validated models with eight machine learning algorithms. Results: Our results demonstrated the effectiveness of this integrated approach. The best performing models were random forest (RF) and voom-based diagonal quadratic discriminant analysis (voomDQDA), both achieving 100% accuracy. Support vector machine (SVM) and voom-based nearest shrunken centroids (voomNSC) showed excellent performance with 96.7% test accuracy, followed by voom-based diagonal linear discriminant analysis (voomDLDA) at 95.2%. Nearest shrunken centroids (NSC), Poisson linear discriminant analysis (PLDA) and negative binomial linear discriminant analysis (NBLDA) achieved 90.5% and 90.2% accuracy, respectively. K-nearest neighbors (KNN) showed the lowest accuracy at 85.7%. Conclusion: This study highlights the potential of integrating co-culture systems, RNA-Seq, and machine learning to develop more accurate and comprehensive in vitro methods for skin sensitization testing. Our findings contribute to the advancement of cosmetic safety assessments, potentially reducing the reliance on animal testing.
Affiliation(s)
- Wu Qiao
  - Pigeon Manufacturing (Shanghai) Co., Ltd., Shanghai, China
- Tong Xie
  - Pigeon Manufacturing (Shanghai) Co., Ltd., Shanghai, China
- Jing Lu
  - Pigeon Manufacturing (Shanghai) Co., Ltd., Shanghai, China
- Tinghan Jia
  - Pigeon Manufacturing (Shanghai) Co., Ltd., Shanghai, China

2
Huang S, Pang L, Wei C. Identification of a Four-Gene Signature With Prognostic Significance in Endometrial Cancer Using Weighted-Gene Correlation Network Analysis. Front Genet 2021; 12:678780. [PMID: 34616422 PMCID: PMC8488359 DOI: 10.3389/fgene.2021.678780] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 03/10/2021] [Accepted: 08/30/2021] [Indexed: 01/01/2023] Open
Abstract
Endometrial hyperplasia (EH) is a precursor for endometrial cancer (EC). However, biomarkers for the progression from EH to EC and standard prognostic biomarkers for EC have not been identified. In this study, we aimed to identify key genes with prognostic significance for the progression from EH to EC. Weighted-gene correlation network analysis (WGCNA) was used to identify hub genes utilizing microarray data (GSE106191) downloaded from the Gene Expression Omnibus database. Differentially expressed genes (DEGs) were identified from the Uterine Corpus Endometrial Carcinoma (UCEC) dataset of The Cancer Genome Atlas database. The Limma-Voom R package was applied to detect differentially expressed genes (DEGs; mRNAs) between cancer and normal samples. Genes with |log2 (fold change [FC])| > 1.0 and p < 0.05 were considered as DEGs. Univariate and multivariate Cox regression and survival analyses were performed to identify potential prognostic genes using hub genes overlapping in the two datasets. All analyses were conducted using R Bioconductor and related packages. Through WGCNA and overlapping genes in hub modules with DEGs in the UCEC dataset, we identified 42 hub genes. The results of the univariate and multivariate Cox regression analyses revealed that four hub genes, BUB1B, NDC80, TPX2, and TTK, were independently associated with the prognosis of EC (Hazard ratio [95% confidence interval]: 0.591 [0.382–0.912], p = 0.017; 0.605 [0.371–0.986], p = 0.044; 1.678 [1.132–2.488], p = 0.01; 2.428 [1.372–4.29], p = 0.02, respectively). A nomogram was established with a risk score calculated using the four genes’ coefficients in the multivariate analysis, and tumor grade and stage had a favorable predictive value for the prognosis of EC. The survival analysis showed that the high-risk group had an unfavorable prognosis compared with the low-risk group (p < 0.0001). 
The receiver operating characteristic curves also indicated that the risk model had a potential predictive value of prognosis with area under the curve 0.807 at 2 years, 0.783 at 3 years, and 0.786 at 5 years. We established a four-gene signature with prognostic significance in EC using WGCNA and established a nomogram to predict the prognosis of EC.
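The DEG criterion stated in this abstract (|log2 FC| > 1.0 and p < 0.05) can be sketched as a simple filter. This is a minimal illustration only: the function name `is_deg`, the gene labels, and all numeric values are hypothetical, and the study itself applied the Limma-Voom R package rather than this code.

```python
import math

def is_deg(fold_change, p_value, lfc_cutoff=1.0, p_cutoff=0.05):
    """Return True when |log2(fold change)| > lfc_cutoff and p < p_cutoff."""
    return abs(math.log2(fold_change)) > lfc_cutoff and p_value < p_cutoff

# Hypothetical (gene, fold change, p-value) rows, invented for illustration.
rows = [
    ("GENE_A", 4.0, 0.001),   # log2 FC = 2, small p: passes
    ("GENE_B", 1.5, 0.001),   # log2 FC is about 0.58: fails the fold-change cutoff
    ("GENE_C", 0.2, 0.2),     # large |log2 FC| but p >= 0.05: fails
    ("GENE_D", 0.25, 0.01),   # log2 FC = -2, small p: passes (down-regulated)
]

degs = [gene for gene, fc, p in rows if is_deg(fc, p)]
```

Taking the absolute value of the log fold change keeps both up-regulated and down-regulated genes, which is why `GENE_D` passes alongside `GENE_A`.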
Affiliation(s)
- Shijin Huang
  - Department of Obstetrics and Gynecology, The First Affiliated Hospital of Guangxi Medical University, Nanning, China
- Lihong Pang
  - Department of Obstetrics and Gynecology, The First Affiliated Hospital of Guangxi Medical University, Nanning, China
- Changqiang Wei
  - Department of Obstetrics and Gynecology, The First Affiliated Hospital of Guangxi Medical University, Nanning, China

3
Anene CA, Khan F, Bewicke-Copley F, Maniati E, Wang J. ACSNI: An unsupervised machine-learning tool for prediction of tissue-specific pathway components using gene expression profiles. Patterns (New York, N.Y.) 2021; 2:100270. [PMID: 34179848 PMCID: PMC8212143 DOI: 10.1016/j.patter.2021.100270] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/18/2021] [Revised: 03/10/2021] [Accepted: 04/28/2021] [Indexed: 11/01/2022]
Abstract
Determining the tissue- and disease-specific circuit of biological pathways remains a fundamental goal of molecular biology. Many components of these biological pathways still remain unknown, hindering the full and accurate characterization of biological processes of interest. Here we describe ACSNI, an algorithm that combines prior knowledge of biological processes with a deep neural network to effectively decompose gene expression profiles (GEPs) into multi-variable pathway activities and identify unknown pathway components. Experiments on public GEP data show that ACSNI predicts cogent components of mTOR, ATF2, and HOTAIRM1 signaling that recapitulate regulatory information from genetic perturbation and transcription factor binding datasets. Our framework provides a fast and easy-to-use method to identify components of signaling pathways as a tool for molecular mechanism discovery and to prioritize genes for designing future targeted experiments (https://github.com/caanene1/ACSNI).
Affiliation(s)
- Chinedu Anthony Anene
  - Centre for Cancer Genomics and Computational Biology, Barts Cancer Institute, Queen Mary University of London, London EC1M 6BQ, UK
- Faraz Khan
  - Centre for Cancer Genomics and Computational Biology, Barts Cancer Institute, Queen Mary University of London, London EC1M 6BQ, UK
- Findlay Bewicke-Copley
  - Centre for Cancer Genomics and Computational Biology, Barts Cancer Institute, Queen Mary University of London, London EC1M 6BQ, UK
- Eleni Maniati
  - Centre for Cancer Genomics and Computational Biology, Barts Cancer Institute, Queen Mary University of London, London EC1M 6BQ, UK
- Jun Wang
  - Centre for Cancer Genomics and Computational Biology, Barts Cancer Institute, Queen Mary University of London, London EC1M 6BQ, UK

4
Zhao T, Khadka VS, Deng Y. Identification of lncRNA biomarkers for lung cancer through integrative cross-platform data analyses. Aging (Albany NY) 2020; 12:14506-14527. [PMID: 32675385 PMCID: PMC7425463 DOI: 10.18632/aging.103496] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 01/09/2020] [Accepted: 06/01/2020] [Indexed: 02/07/2023]
Abstract
This study was designed to identify lncRNA biomarker candidates using lung cancer data from RNA-Seq and microarray platforms separately. Lung cancer datasets were obtained from the Gene Expression Omnibus (GEO, n = 287) and The Cancer Genome Atlas (TCGA, n = 216) repositories; only common lncRNAs were used. Differentially expressed (DE) lncRNAs in tumors with respect to normal were selected from the Affymetrix and TCGA datasets. A training model consisting of the top 20 DE Affymetrix lncRNAs was used for validation in the TCGA and Agilent datasets. A second similar training model was generated using the TCGA dataset. First, a model trained on the top 20 DE lncRNAs from Affymetrix and validated using TCGA and Agilent achieved high prediction accuracy for both training (98.5% AUC for Affymetrix) and validation (99.2% AUC for TCGA and 92.8% AUC for Agilent). A similar model trained on the top 20 DE lncRNAs from TCGA and validated using Affymetrix and Agilent also achieved high prediction accuracy for both training (97.7% AUC for TCGA) and validation (96.5% AUC for Affymetrix and 80.9% AUC for Agilent). Eight lncRNAs overlapped between these two lists.
Affiliation(s)
- Tianying Zhao
  - Department of Quantitative Health Sciences, University of Hawaii John A. Burns School of Medicine, The University of Hawaii at Manoa, Honolulu, HI 96813, USA
  - Department of Molecular Biosciences and Bioengineering, The University of Hawaii at Manoa College of Tropical Agriculture and Human Resources, Agricultural Sciences 218, Honolulu, HI 96822, USA
- Vedbar Singh Khadka
  - Department of Quantitative Health Sciences, University of Hawaii John A. Burns School of Medicine, The University of Hawaii at Manoa, Honolulu, HI 96813, USA
- Youping Deng
  - Department of Quantitative Health Sciences, University of Hawaii John A. Burns School of Medicine, The University of Hawaii at Manoa, Honolulu, HI 96813, USA

5
Kimmel JC, Hwang AB, Scaramozza A, Marshall WF, Brack AS. Aging induces aberrant state transition kinetics in murine muscle stem cells. Development 2020; 147:dev183855. [PMID: 32198156 PMCID: PMC7225128 DOI: 10.1242/dev.183855] [Citation(s) in RCA: 47] [Impact Index Per Article: 9.4] [Received: 08/16/2019] [Accepted: 02/17/2020] [Indexed: 12/12/2022]
Abstract
Murine muscle stem cells (MuSCs) experience a transition from quiescence to activation that is required for regeneration, but it remains unknown if the trajectory and dynamics of activation change with age. Here, we use time-lapse imaging and single cell RNA-seq to measure activation trajectories and rates in young and aged MuSCs. We find that the activation trajectory is conserved in aged cells, and we develop effective machine-learning classifiers for cell age. Using cell-behavior analysis and RNA velocity, we find that activation kinetics are delayed in aged MuSCs, suggesting that changes in stem cell dynamics may contribute to impaired stem cell function with age. Intriguingly, we also find that stem cell activation appears to be a random walk-like process, with frequent reversals, rather than a continuous linear progression. These results support a view of the aged stem cell phenotype as a combination of differences in the location of stable cell states and differences in transition rates between them.
Affiliation(s)
- Jacob C Kimmel
  - Eli and Edythe Broad Center for Regenerative Medicine, University of California, San Francisco, 35 Medical Center Way, San Francisco, CA 94143, USA
  - Center for Cellular Construction, University of California, San Francisco, San Francisco, CA 94143, USA
  - Biochemistry & Biophysics, University of California, San Francisco, San Francisco, CA 94143, USA
- Ara B Hwang
  - Eli and Edythe Broad Center for Regenerative Medicine, University of California, San Francisco, 35 Medical Center Way, San Francisco, CA 94143, USA
- Annarita Scaramozza
  - Eli and Edythe Broad Center for Regenerative Medicine, University of California, San Francisco, 35 Medical Center Way, San Francisco, CA 94143, USA
- Wallace F Marshall
  - Center for Cellular Construction, University of California, San Francisco, San Francisco, CA 94143, USA
  - Biochemistry & Biophysics, University of California, San Francisco, San Francisco, CA 94143, USA
- Andrew S Brack
  - Eli and Edythe Broad Center for Regenerative Medicine, University of California, San Francisco, 35 Medical Center Way, San Francisco, CA 94143, USA

6
|
Kimmel JC, Penland L, Rubinstein ND, Hendrickson DG, Kelley DR, Rosenthal AZ. Murine single-cell RNA-seq reveals cell-identity- and tissue-specific trajectories of aging. Genome Res 2019; 29:2088-2103. [PMID: 31754020 PMCID: PMC6886498 DOI: 10.1101/gr.253880.119] [Citation(s) in RCA: 108] [Impact Index Per Article: 18.0] [Received: 06/20/2019] [Accepted: 10/21/2019] [Indexed: 01/08/2023]
Abstract
Aging is a pleiotropic process affecting many aspects of mammalian physiology. Mammals are composed of distinct cell type identities and tissue environments, but the influence of these cell identities and environments on the trajectory of aging in individual cells remains unclear. Here, we performed single-cell RNA-seq on >50,000 individual cells across three tissues in young and old mice to allow for direct comparison of aging phenotypes across cell types. We found transcriptional features of aging common across many cell types, as well as features of aging unique to each type. Leveraging matrix factorization and optimal transport methods, we found that both cell identities and tissue environments exert influence on the trajectory and magnitude of aging, with cell identity influence predominating. These results suggest that aging manifests with unique directionality and magnitude across the diverse cell identities in mammals.
Affiliation(s)
- Jacob C Kimmel
  - Calico Life Sciences, South San Francisco, California 94080, USA
- Lolita Penland
  - Calico Life Sciences, South San Francisco, California 94080, USA
- David R Kelley
  - Calico Life Sciences, South San Francisco, California 94080, USA
- Adam Z Rosenthal
  - Calico Life Sciences, South San Francisco, California 94080, USA

7
Hassan YI, He JW, Lepp D, Zhou T. Understanding the Bacterial Response to Mycotoxins: The Transcriptomic Analysis of Deoxynivalenol-Induced Changes in Devosia mutans 17-2-E-8. Front Pharmacol 2019; 10:1098. [PMID: 31798443 PMCID: PMC6868067 DOI: 10.3389/fphar.2019.01098] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 10/31/2018] [Accepted: 08/26/2019] [Indexed: 12/17/2022] Open
Abstract
Deoxynivalenol (DON) is a major fusarium toxin widely detected in cereal grains. The inadvertent exposure to this fungal secondary-metabolite gives rise to a myriad of adverse health effects including appetite loss, emesis, and suppression of the immune system. While most of the attention this mycotoxin has gained in the past four decades was related to its eukaryotic toxicity (monogastric animals and plants more precisely), recent studies have begun to reveal its negative influence on prokaryotes. Recently presented evidence indicates that DON can negatively affect many bacterial species, raising the possibility of DON-induced imbalances within the microbiota of the human and animal gut, in addition to other environmental niches. This in turn has led to a greater interest in understanding bacterial responses toward DON, and the involved mechanism(s) and metabolic pathways, in order to build a more comprehensive picture of DON-induced changes in both prokaryotes and eukaryotes alike. This study reveals the transcriptomic profiling of Devosia mutans strain 17-2-E-8 after the inclusion of DON within its growth medium. The results highlight three adaptive mechanisms involved in the response of D. mutans 17-2-E-8 to this mycotoxin, which include: (a) activation of adenosine 5’-triphosphate-binding cassette transporters; (b) engagement of a toxin-specific pyrroloquinoline quinone-dependent detoxification pathway; and finally (c) the upregulation of auxiliary coping proteins such as porins, glutathione S-transferases, and phosphotransferases. Some of the identified mechanisms are universal in nature and are shared with other bacterial genera and species.
Affiliation(s)
- Yousef I Hassan
  - Guelph Research and Development Centre, Agriculture and Agri-Food Canada, Guelph, ON, Canada
- Jian Wei He
  - Guelph Research and Development Centre, Agriculture and Agri-Food Canada, Guelph, ON, Canada
- Dion Lepp
  - Guelph Research and Development Centre, Agriculture and Agri-Food Canada, Guelph, ON, Canada
- Ting Zhou
  - Guelph Research and Development Centre, Agriculture and Agri-Food Canada, Guelph, ON, Canada

8
Rivas MJ, Saura M, Pérez-Figueroa A, Panova M, Johansson T, André C, Caballero A, Rolán-Alvarez E, Johannesson K, Quesada H. Population genomics of parallel evolution in gene expression and gene sequence during ecological adaptation. Sci Rep 2018; 8:16147. [PMID: 30385764 PMCID: PMC6212547 DOI: 10.1038/s41598-018-33897-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 04/30/2018] [Accepted: 10/08/2018] [Indexed: 11/17/2022] Open
Abstract
Natural selection often produces parallel phenotypic changes in response to a similar adaptive challenge. However, the extent to which parallel gene expression differences and genomic divergence underlie parallel phenotypic traits and whether they are decoupled or not remains largely unexplored. We performed a population genomic study of parallel ecological adaptation among replicate ecotype pairs of the rough periwinkle (Littorina saxatilis) at a regional geographical scale (NW Spain). We show that genomic changes underlying parallel phenotypic divergence followed a complex pattern of both repeatable differences and of differences unique to specific ecotype pairs, in which parallel changes in expression or sequence are restricted to a limited set of genes. Yet, the majority of divergent genes were divergent either for gene expression or coding sequence, but not for both simultaneously. Overall, our findings suggest that divergent selection significantly contributed to the process of parallel molecular differentiation among ecotype pairs, and that changes in expression and gene sequence underlying phenotypic divergence could, at least to a certain extent, be considered decoupled processes.
Affiliation(s)
- María José Rivas
  - Departamento de Bioquímica, Genética e Inmunología, Universidad de Vigo, 36310, Vigo, Spain
- María Saura
  - Departamento de Bioquímica, Genética e Inmunología, Universidad de Vigo, 36310, Vigo, Spain
- Andrés Pérez-Figueroa
  - Departamento de Bioquímica, Genética e Inmunología, Universidad de Vigo, 36310, Vigo, Spain
- Marina Panova
  - Department of Marine Sciences, Tjärnö, University of Gothenburg, SE-452 96, Strömstad, Sweden
- Tomas Johansson
  - Department of Biology, University of Lund, SE-223 62, Lund, Sweden
- Carl André
  - Department of Marine Sciences, Tjärnö, University of Gothenburg, SE-452 96, Strömstad, Sweden
- Armando Caballero
  - Departamento de Bioquímica, Genética e Inmunología, Universidad de Vigo, 36310, Vigo, Spain
- Emilio Rolán-Alvarez
  - Departamento de Bioquímica, Genética e Inmunología, Universidad de Vigo, 36310, Vigo, Spain
- Kerstin Johannesson
  - Department of Marine Sciences, Tjärnö, University of Gothenburg, SE-452 96, Strömstad, Sweden
- Humberto Quesada
  - Departamento de Bioquímica, Genética e Inmunología, Universidad de Vigo, 36310, Vigo, Spain

9
Mohorianu I, Bretman A, Smith DT, Fowler EK, Dalmay T, Chapman T. Comparison of alternative approaches for analysing multi-level RNA-seq data. PLoS One 2017; 12:e0182694. [PMID: 28792517 PMCID: PMC5549751 DOI: 10.1371/journal.pone.0182694] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 05/17/2017] [Accepted: 07/21/2017] [Indexed: 11/19/2022] Open
Abstract
RNA sequencing (RNA-seq) is widely used for RNA quantification in the environmental, biological and medical sciences. It enables the description of genome-wide patterns of expression and the identification of regulatory interactions and networks. The aim of RNA-seq data analyses is to achieve rigorous quantification of genes/transcripts to allow a reliable prediction of differential expression (DE), despite variation in levels of noise and inherent biases in sequencing data. This can be especially challenging for datasets in which gene expression differences are subtle, as in the behavioural transcriptomics test dataset from D. melanogaster that we used here. We investigated the power of existing approaches for quality checking mRNA-seq data and explored additional, quantitative quality checks. To accommodate nested, multi-level experimental designs, we incorporated sample layout into our analyses. We employed a subsampling without replacement-based normalization and an identification of DE that accounted for the hierarchy and amplitude of effect sizes within samples, then evaluated the resulting differential expression call in comparison to existing approaches. In a final step to test for broader applicability, we applied our approaches to a published set of H. sapiens mRNA-seq samples. The dataset-tailored methods improved sample comparability and delivered a robust prediction of subtle gene expression changes. The proposed approaches have the potential to improve key steps in the analysis of RNA-seq data by incorporating the structure and characteristics of biological experiments.
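The idea of a subsampling-without-replacement normalization mentioned in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: it assumes every sample is drawn down to the depth of the shallowest sample, and the function name `subsample_counts`, the sample names, and the read counts are all hypothetical.

```python
import random
from collections import Counter

def subsample_counts(counts, target_total, seed=0):
    """Draw target_total reads without replacement from a gene -> count mapping."""
    total = sum(counts.values())
    if target_total > total:
        raise ValueError("target_total exceeds the number of reads in the sample")
    # Expand to one entry per read so that each read can be drawn at most once.
    reads = [gene for gene, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)
    sampled = Counter(rng.sample(reads, target_total))
    return {gene: sampled.get(gene, 0) for gene in counts}

# Hypothetical read counts for two samples sequenced at different depths.
samples = {
    "s1": {"geneA": 60, "geneB": 40},
    "s2": {"geneA": 30, "geneB": 20},
}

# Assumption: normalize every sample to the depth of the shallowest one.
depth = min(sum(c.values()) for c in samples.values())
normalized = {name: subsample_counts(c, depth) for name, c in samples.items()}
```

Because reads are drawn without replacement, a subsampled count can never exceed the observed count, and a sample already at the target depth is returned unchanged.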
Affiliation(s)
- Irina Mohorianu
  - School of Biological Sciences, University of East Anglia, Norwich Research Park, Norwich, United Kingdom
  - School of Computing Sciences, University of East Anglia, Norwich Research Park, Norwich, United Kingdom
- Amanda Bretman
  - School of Biological Sciences, University of East Anglia, Norwich Research Park, Norwich, United Kingdom
  - School of Biology, University of Leeds, Leeds, LS2 9JT, United Kingdom
- Damian T. Smith
  - School of Biological Sciences, University of East Anglia, Norwich Research Park, Norwich, United Kingdom
- Emily K. Fowler
  - School of Biological Sciences, University of East Anglia, Norwich Research Park, Norwich, United Kingdom
- Tamas Dalmay
  - School of Biological Sciences, University of East Anglia, Norwich Research Park, Norwich, United Kingdom
- Tracey Chapman
  - School of Biological Sciences, University of East Anglia, Norwich Research Park, Norwich, United Kingdom

10
Ma T, Liang F, Oesterreich S, Tseng GC. A Joint Bayesian Model for Integrating Microarray and RNA Sequencing Transcriptomic Data. J Comput Biol 2017; 24:647-662. [PMID: 28541721 PMCID: PMC5510692 DOI: 10.1089/cmb.2017.0056] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Indexed: 01/10/2023] Open
Abstract
As sequencing costs have continued to drop over the past decade, RNA sequencing (RNA-seq) has replaced microarrays as the standard high-throughput experimental tool for analyzing transcriptomic profiles. As more and more datasets are generated and accumulated in the public domain, meta-analysis that combines multiple transcriptomic studies to increase statistical power has gained popularity. In this article, we propose a Bayesian hierarchical model to jointly integrate microarray and RNA-seq studies. Since systematic fold change differences across RNA-seq and microarray for detecting differentially expressed genes have been previously reported, we replicated this finding in several real datasets and showed that incorporation of a normalization procedure to account for the bias improves the detection accuracy and power. We compared our method with the popular two-stage Fisher's method using simulations and two real applications in a histological subtype (invasive lobular carcinoma) of breast cancer comparing PR+ versus PR- and early-stage versus late-stage patients. The results showed improved detection power and more significant and interpretable pathways enriched in the detected biomarkers from the proposed Bayesian model.
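The combining step of Fisher's method, the baseline this abstract compares against, can be sketched as follows. For k independent p-values the statistic X = -2 Σ ln p_i follows a chi-square distribution with 2k degrees of freedom under the null, and for even degrees of freedom the upper tail has a closed form. The function name `fisher_combined_p` and the example p-values are hypothetical, and this sketch does not reproduce the per-study first stage or anything specific to the authors' comparison.

```python
import math

def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    X = -2 * sum(ln p_i) is chi-square with 2k degrees of freedom under the
    null; for even degrees of freedom the upper tail equals
    exp(-X/2) * sum_{j=0}^{k-1} (X/2)^j / j!.
    """
    k = len(p_values)
    if k == 0 or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must be non-empty and lie in (0, 1]")
    half_stat = -sum(math.log(p) for p in p_values)  # this is X / 2
    term = 1.0
    total = 1.0
    for j in range(1, k):
        term *= half_stat / j  # builds (X/2)^j / j! incrementally
        total += term
    return math.exp(-half_stat) * total

# Hypothetical per-study p-values for one gene (e.g. one microarray study
# and one RNA-seq study), invented for illustration.
combined = fisher_combined_p([0.04, 0.04])
```

With a single study the combined value reduces to the input p-value, which is a quick sanity check on the closed form.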
Affiliation(s)
- Tianzhou Ma
  - Department of Biostatistics, University of Pittsburgh, Pittsburgh, Pennsylvania
- Faming Liang
  - Department of Biostatistics, University of Florida, Gainesville, Florida
- Steffi Oesterreich
  - Department of Pharmacology and Chemical Biology, University of Pittsburgh, Pittsburgh, Pennsylvania
  - Women's Cancer Research Center, Pittsburgh, Pennsylvania
- George C. Tseng
  - Department of Biostatistics, University of Pittsburgh, Pittsburgh, Pennsylvania
  - Department of Human Genetics, University of Pittsburgh, Pittsburgh, Pennsylvania
  - Department of Computational Biology, University of Pittsburgh, Pittsburgh, Pennsylvania

11
Hackett SR, Zanotelli VRT, Xu W, Goya J, Park JO, Perlman DH, Gibney PA, Botstein D, Storey JD, Rabinowitz JD. Systems-level analysis of mechanisms regulating yeast metabolic flux. Science 2016; 354:aaf2786. [PMID: 27789812 PMCID: PMC5414049 DOI: 10.1126/science.aaf2786] [Citation(s) in RCA: 195] [Impact Index Per Article: 21.7] [Received: 02/29/2016] [Accepted: 09/23/2016] [Indexed: 07/25/2023]
Abstract
Cellular metabolic fluxes are determined by enzyme activities and metabolite abundances. Biochemical approaches reveal the impact of specific substrates or regulators on enzyme kinetics but do not capture the extent to which metabolite and enzyme concentrations vary across physiological states and, therefore, how cellular reactions are regulated. We measured enzyme and metabolite concentrations and metabolic fluxes across 25 steady-state yeast cultures. We then assessed the extent to which flux can be explained by a Michaelis-Menten relationship between enzyme, substrate, product, and potential regulator concentrations. This revealed three previously unrecognized instances of cross-pathway regulation, which we biochemically verified. One of these involved inhibition of pyruvate kinase by citrate, which accumulated and thereby curtailed glycolytic outflow in nitrogen-limited yeast. Overall, substrate concentrations were the strongest driver of the net rates of cellular metabolic reactions, with metabolite concentrations collectively having more than double the physiological impact of enzymes.
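A Michaelis-Menten relationship between flux, enzyme, substrate, and regulator concentrations can be sketched with the textbook irreversible rate law. This is an illustration only: the abstract does not give the study's rate equations, the competitive-inhibition term is an assumption chosen for simplicity, and the function name `mm_rate` and all parameter values are hypothetical.

```python
def mm_rate(enzyme, substrate, kcat, km, inhibitor=0.0, ki=float("inf")):
    """Textbook irreversible Michaelis-Menten rate.

    v = kcat * [E] * [S] / (Km_app + [S]), where a competitive inhibitor
    (an assumption made for this sketch) raises the apparent Km:
    Km_app = Km * (1 + [I] / Ki).
    """
    km_app = km * (1.0 + inhibitor / ki)
    return kcat * enzyme * substrate / (km_app + substrate)

# Hypothetical values in arbitrary units, invented for illustration.
uninhibited = mm_rate(enzyme=1.0, substrate=2.0, kcat=10.0, km=2.0)
inhibited = mm_rate(enzyme=1.0, substrate=2.0, kcat=10.0, km=2.0,
                    inhibitor=3.0, ki=3.0)
```

In this form the rate scales linearly with enzyme concentration but saturates in substrate, so flux depends on both quantities in different ways.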
Affiliation(s)
- Sean R Hackett
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
- Wenxin Xu
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA. Department of Chemistry, Princeton University, Princeton, NJ 08544, USA
- Jonathan Goya
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
- Junyoung O Park
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
- David H Perlman
  - Department of Chemistry, Princeton University, Princeton, NJ 08544, USA
- Patrick A Gibney
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA. Department of Molecular Biology, Princeton University, Princeton, NJ 08544, USA
- David Botstein
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA. Department of Molecular Biology, Princeton University, Princeton, NJ 08544, USA
- John D Storey
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA. Department of Molecular Biology, Princeton University, Princeton, NJ 08544, USA. Center for Statistics and Machine Learning, Princeton University, Princeton, NJ 08544, USA
- Joshua D Rabinowitz
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA. Department of Chemistry, Princeton University, Princeton, NJ 08544, USA

12
Izadi F, Zarrini HN, Kiani G, Jelodar NB. A comparative analytical assay of gene regulatory networks inferred using microarray and RNA-seq datasets. Bioinformation 2016; 12:340-346. [PMID: 28293077 PMCID: PMC5320930 DOI: 10.6026/97320630012340] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 07/07/2016] [Revised: 08/05/2016] [Accepted: 08/06/2016] [Indexed: 01/16/2023] Open
Abstract
A Gene Regulatory Network (GRN) is a collection of interactions between molecular regulators and their targets in cells that governs gene expression levels. The explosion of omics data generated by high-throughput genomic assays such as microarray and RNA-Seq technologies, together with the emergence of a number of pre-processing methods, demands suitable guidelines for determining the impact of transcript data platforms and normalization procedures on describing associations in GRNs. In this study, using publicly available microarray and RNA-Seq datasets and a gold standard of transcriptional interactions in Arabidopsis, we compared six GRNs derived from RNA-Seq and microarray data with different normalization procedures. We observed that the compared algorithms were highly data-specific and that networks reconstructed from RNA-Seq data showed considerable accuracy compared with the corresponding networks derived from microarrays. Topological analysis showed that GRNs inferred from the two platforms were similar in several topological features, although we observed more connectivity in the RNA-Seq-derived gene network. Taken together, transcriptional regulatory networks obtained from Robust Multiarray Averaging (RMA) and Variance-Stabilizing Transformed (VST) normalized data predicted a higher rate of true edges than the other methods used in this comparison.
Affiliation(s)
- Fereshteh Izadi
  - Plant Breeding Department, Sari Agricultural Sciences and Natural Resources, Iran
- Hamid Najafi Zarrini
  - Plant Breeding Department, Sari Agricultural Sciences and Natural Resources, Iran
- Ghaffar Kiani
  - Plant Breeding Department, Sari Agricultural Sciences and Natural Resources, Iran

13
Shukla S, Evans JR, Malik R, Feng FY, Dhanasekaran SM, Cao X, Chen G, Beer DG, Jiang H, Chinnaiyan AM. Development of a RNA-Seq Based Prognostic Signature in Lung Adenocarcinoma. J Natl Cancer Inst 2016; 109:2905970. [PMID: 27707839 DOI: 10.1093/jnci/djw200] [Citation(s) in RCA: 138] [Impact Index Per Article: 15.3] [Received: 01/26/2016] [Accepted: 08/02/2016] [Indexed: 01/08/2023] Open
Abstract
Background Precision therapy for lung cancer will require comprehensive genomic testing to identify actionable targets as well as ascertain disease prognosis. RNA-seq is a robust platform that meets these requirements, but microarray-derived prognostic signatures are not optimal for RNA-seq data. Thus, we undertook the first prognostic analysis of lung adenocarcinoma RNA-seq data and generated a prognostic signature. Methods Lung adenocarcinoma RNA-seq and clinical data from The Cancer Genome Atlas (TCGA) were divided chronologically into training (n = 255) and validation (n = 157) cohorts. In the training cohort, prognostic association was assessed by univariate Cox analysis. A prognostic signature was built with stepwise multivariable Cox analysis. Outcomes by risk group, stage, and mutation status were analyzed with Kaplan-Meier and multivariable Cox analyses. All the statistical tests were two-sided. Results In the training cohort, 96 genes had prognostic association with P values of less than or equal to 1.00 × 10⁻⁴, including five long noncoding RNAs (lncRNAs). Stepwise regression generated a four-gene signature, including one lncRNA. Signature high-risk cases had worse overall survival (OS) in the TCGA validation cohort (hazard ratio [HR] = 3.07, 95% confidence interval [CI] = 2.00 to 14.62) and a University of Michigan institutional cohort (n = 67; HR = 2.05, 95% CI = 1.18 to 4.55), and worse metastasis-free survival in the TCGA validation cohort (HR = 3.05, 95% CI = 2.31 to 13.37). The four-gene prognostic signature also statistically significantly stratified overall survival in important clinical subsets, including stage I (HR = 2.78, 95% CI = 1.91 to 11.13), EGFR wild-type (HR = 3.01, 95% CI = 1.73 to 14.98), and EGFR mutant (HR = 8.99, 95% CI = 62.23 to 141.44). The four-gene prognostic signature also stood out on top when compared with other prognostic signatures.
Conclusions Here, we present the first RNA-seq prognostic signature for lung adenocarcinoma that can provide a powerful prognostic tool for precision oncology as part of an integrated RNA-seq clinical sequencing program.
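The workflow described in this abstract (a gene-weighted risk score, a split into risk groups, and Kaplan-Meier analysis by group) can be sketched in a few lines. Everything concrete below is an assumption made for illustration: the gene names `g1` and `g2`, their coefficients, the median cut-off, and the toy follow-up data are hypothetical and are not taken from the study, whose signature contains four genes and whose cut-off rule is not stated here.

```python
from statistics import median


def risk_scores(expression, coefficients):
    """Linear risk score per sample: sum of coefficient * expression."""
    return [sum(coefficients[g] * sample[g] for g in coefficients)
            for sample in expression]


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as [(time, survival)] at each event time.

    `events[i]` is truthy for an observed event, falsy for censoring.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Everyone observed at time t (event or censored) leaves the risk set.
        at_risk -= leaving
    return curve


# Hypothetical coefficients and expression values (not from the paper).
coefficients = {"g1": 0.5, "g2": -0.2}
expression = [
    {"g1": 2.0, "g2": 1.0},
    {"g1": 0.5, "g2": 3.0},
    {"g1": 1.0, "g2": 1.0},
    {"g1": 3.0, "g2": 0.5},
]
scores = risk_scores(expression, coefficients)
cutoff = median(scores)  # assumed cut-off; the study's rule may differ
high_risk = [s > cutoff for s in scores]

# Hypothetical follow-up times (months) and event indicators.
curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
print(high_risk, curve)
```

In practice the survival curve would be computed separately for the high-risk and low-risk groups and the two compared; a dedicated survival-analysis library would normally replace these hand-written helpers.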
Affiliation(s)
- Sudhanshu Shukla
- Department of Pathology, University of Michigan, Ann Arbor, MI; Michigan Center for Translational Pathology, University of Michigan, Ann Arbor, MI
- Joseph R Evans
- Department of Radiation Oncology, University of Michigan, Ann Arbor, MI
- Rohit Malik
- Department of Pathology, University of Michigan, Ann Arbor, MI; Michigan Center for Translational Pathology, University of Michigan, Ann Arbor, MI
- Felix Y Feng
- Michigan Center for Translational Pathology, University of Michigan, Ann Arbor, MI; Department of Radiation Oncology, University of Michigan, Ann Arbor, MI; Comprehensive Cancer Center, University of Michigan, Ann Arbor, MI
- Saravana M Dhanasekaran
- Department of Pathology, University of Michigan, Ann Arbor, MI; Michigan Center for Translational Pathology, University of Michigan, Ann Arbor, MI
- Xuhong Cao
- Department of Pathology, University of Michigan, Ann Arbor, MI; Michigan Center for Translational Pathology, University of Michigan, Ann Arbor, MI
- Guoan Chen
- Department of Surgery, Section of Thoracic Surgery, University of Michigan, Ann Arbor, MI
- David G Beer
- Department of Surgery, Section of Thoracic Surgery, University of Michigan, Ann Arbor, MI
- Hui Jiang
- Department of Biostatistics, University of Michigan, Ann Arbor, MI
- Arul M Chinnaiyan
- Department of Pathology, University of Michigan, Ann Arbor, MI; Michigan Center for Translational Pathology, University of Michigan, Ann Arbor, MI; Department of Biostatistics, University of Michigan, Ann Arbor, MI; Howard Hughes Medical Institute, University of Michigan, Ann Arbor, MI
|
14
|
Zhou C, Wang M, Zhou L, Zhang Y, Liu W, Qin W, He R, Lu Y, Wang Y, Chen XZ, Tang J. Prognostic significance of PLIN1 expression in human breast cancer. Oncotarget 2016; 7:54488-54502. [PMID: 27359054 PMCID: PMC5342357 DOI: 10.18632/oncotarget.10239] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.4] [Received: 03/13/2016] [Accepted: 05/13/2016] [Indexed: 12/18/2022] Open
Abstract
Breast cancer is a heterogeneous disease associated with diverse clinical, biological and molecular features, presenting huge challenges for prognosis and treatment. Here we found that perilipin-1 (PLIN1) mRNA expression is significantly downregulated in human breast cancer. Kaplan-Meier analysis indicated that patients presenting with reduced PLIN1 expression exhibited poorer overall metastatic relapse-free survival (p = 0.03). Further analysis with Cox proportional hazards models revealed that reduced expression of PLIN1 is an independent predictor of overall survival in estrogen receptor positive (p < 0.0001, HR = 0.87, 95% CI = 0.81-0.92, N = 3,600) and luminal A-subtype (p = 0.02, HR = 0.88, 95% CI = 0.78-0.98, N = 1,469) breast cancer patients. We also demonstrated that the exogenous expression of PLIN1 in human breast cancer MCF-7 and MDA-MB-231 cells significantly inhibits cell proliferation, migration, invasion and in vivo tumorigenesis in mice. Together, these data provide novel insights into the prognostic significance of PLIN1 in human breast cancer and reveal a potentially new gene therapy target for breast cancer.
Affiliation(s)
- Cefan Zhou
- Institute of Biomedical and Pharmaceutical Sciences, and Provincial Cooperative Innovation Center, College of Bioengineering, Hubei University of Technology, Wuhan, Hubei, China
- The State Key Laboratory of Virology, College of Life Sciences, Wuhan University, Wuhan, Hubei, China
- Ming Wang
- Department of Clinical Laboratory, Renmin Hospital of Wuhan University, Wuhan, Hubei, China
- Li Zhou
- Animal Biosafety Level III Laboratory at the Center for Animal Experiment, Wuhan University, Wuhan, China
- Yi Zhang
- Institute of Biomedical and Pharmaceutical Sciences, and Provincial Cooperative Innovation Center, College of Bioengineering, Hubei University of Technology, Wuhan, Hubei, China
- Weiyong Liu
- Department of Clinical Laboratory, Tongji Hospital, Tongji Medical College of Huazhong University of Science and Technology, Wuhan, Hubei, China
- Wenying Qin
- Institute of Biomedical and Pharmaceutical Sciences, and Provincial Cooperative Innovation Center, College of Bioengineering, Hubei University of Technology, Wuhan, Hubei, China
- Rong He
- Institute of Biomedical and Pharmaceutical Sciences, and Provincial Cooperative Innovation Center, College of Bioengineering, Hubei University of Technology, Wuhan, Hubei, China
- Yang Lu
- Institute of Biomedical and Pharmaceutical Sciences, and Provincial Cooperative Innovation Center, College of Bioengineering, Hubei University of Technology, Wuhan, Hubei, China
- Yefu Wang
- The State Key Laboratory of Virology, College of Life Sciences, Wuhan University, Wuhan, Hubei, China
- Xing-Zhen Chen
- Institute of Biomedical and Pharmaceutical Sciences, and Provincial Cooperative Innovation Center, College of Bioengineering, Hubei University of Technology, Wuhan, Hubei, China
- Membrane Protein Disease Research Group, Department of Physiology, Faculty of Medicine and Dentistry, University of Alberta, Edmonton, AB, Canada
- Jingfeng Tang
- Institute of Biomedical and Pharmaceutical Sciences, and Provincial Cooperative Innovation Center, College of Bioengineering, Hubei University of Technology, Wuhan, Hubei, China
|
15
|
Scarpa JR, Jiang P, Losic B, Readhead B, Gao VD, Dudley JT, Vitaterna MH, Turek FW, Kasarskis A. Systems Genetic Analyses Highlight a TGFβ-FOXO3 Dependent Striatal Astrocyte Network Conserved across Species and Associated with Stress, Sleep, and Huntington's Disease. PLoS Genet 2016; 12:e1006137. [PMID: 27390852 PMCID: PMC4938493 DOI: 10.1371/journal.pgen.1006137] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 12/17/2015] [Accepted: 05/31/2016] [Indexed: 12/22/2022] Open
Abstract
Recent systems-based analyses have demonstrated that sleep and stress traits emerge from shared genetic and transcriptional networks, and clinical work has elucidated the emergence of sleep dysfunction and stress susceptibility as early symptoms of Huntington's disease. Understanding the biological bases of these early non-motor symptoms may reveal therapeutic targets that prevent disease onset or slow disease progression, but the molecular mechanisms underlying this complex clinical presentation remain largely unknown. In the present work, we specifically examine the relationship between these psychiatric traits and Huntington's disease (HD) by identifying striatal transcriptional networks shared by HD, stress, and sleep phenotypes. First, we utilize a systems-based approach to examine a large publicly available human transcriptomic dataset for HD (GSE3790 from GEO) in a novel way. We use weighted gene coexpression network analysis and differential connectivity analyses to identify transcriptional networks dysregulated in HD, and we use an unbiased ranking scheme that leverages both gene- and network-level information to identify a novel astrocyte-specific network as most relevant to HD caudate. We validate this result in an independent HD cohort. Next, we computationally predict FOXO3 as a regulator of this network, and use multiple publicly available in vitro and in vivo experimental datasets to validate that this astrocyte HD network is downstream of a signaling pathway important in adult neurogenesis (TGFβ-FOXO3). We also map this HD-relevant caudate subnetwork to striatal transcriptional networks in a large (n = 100) chronically stressed (B6xA/J)F2 mouse population that has been extensively phenotyped (328 stress- and sleep-related measurements), and we show that this striatal astrocyte network is correlated to sleep and stress traits, many of which are known to be altered in HD cohorts. 
We identify causal regulators of this network through Bayesian network analysis, and we highlight their relevance to motor, mood, and sleep traits through multiple in silico approaches, including an examination of their protein binding partners. Finally, we show that these causal regulators may be therapeutically viable for HD because their downstream network was partially modulated by deep brain stimulation of the subthalamic nucleus, a medical intervention thought to confer some therapeutic benefit to HD patients. In conclusion, we show that an astrocyte transcriptional network is primarily associated with HD in the caudate and provide evidence for its relationship to molecular mechanisms of neural stem cell homeostasis. Furthermore, we present a unified systems-based framework for identifying gene networks that are associated with complex non-motor traits that manifest in the earliest phases of HD. By analyzing and integrating multiple independent datasets, we identify a point of molecular convergence between sleep, stress, and HD that reflects their phenotypic comorbidity and reveals a molecular pathway involved in HD progression.
Affiliation(s)
- Joseph R. Scarpa
- Icahn Institute for Genomics and Multiscale Biology, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- Peng Jiang
- Center for Sleep and Circadian Biology, Department of Neurobiology, Northwestern University, Evanston, Illinois, United States of America
- Bojan Losic
- Icahn Institute for Genomics and Multiscale Biology, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- Ben Readhead
- Icahn Institute for Genomics and Multiscale Biology, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- Vance D. Gao
- Center for Sleep and Circadian Biology, Department of Neurobiology, Northwestern University, Evanston, Illinois, United States of America
- Joel T. Dudley
- Icahn Institute for Genomics and Multiscale Biology, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- Martha H. Vitaterna
- Center for Sleep and Circadian Biology, Department of Neurobiology, Northwestern University, Evanston, Illinois, United States of America
- Fred W. Turek
- Center for Sleep and Circadian Biology, Department of Neurobiology, Northwestern University, Evanston, Illinois, United States of America
- Andrew Kasarskis
- Icahn Institute for Genomics and Multiscale Biology, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
|
16
|
Gallagher IJ, Jacobi C, Tardif N, Rooyackers O, Fearon K. Omics/systems biology and cancer cachexia. Semin Cell Dev Biol 2016; 54:92-103. [DOI: 10.1016/j.semcdb.2015.12.022] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 12/29/2015] [Accepted: 12/30/2015] [Indexed: 10/22/2022]
|
17
|
Brereton NJB, Gonzalez E, Marleau J, Nissim WG, Labrecque M, Joly S, Pitre FE. Comparative Transcriptomic Approaches Exploring Contamination Stress Tolerance in Salix sp. Reveal the Importance for a Metaorganismal de Novo Assembly Approach for Nonmodel Plants. Plant Physiology 2016; 171:3-24. [PMID: 27002060 PMCID: PMC4854704 DOI: 10.1104/pp.16.00090] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 01/22/2016] [Accepted: 03/20/2016] [Indexed: 05/09/2023]
Abstract
Metatranscriptomic study of nonmodel organisms requires strategies that retain the highly resolved genetic information generated from model organisms while allowing for identification of the unexpected. A real-world biological application of phytoremediation, the field growth of 10 Salix cultivars on polluted soils, was used as an exemplar nonmodel and multifaceted crop response well-disposed to the study of gene expression. Sequence reads were assembled de novo to create 10 independent transcriptomes and a global transcriptome, and were mapped against the Salix purpurea 94006 reference genome. Annotation of assembled contigs was performed without a priori assumption of the originating organism. Global transcriptome construction from 3.03 billion paired-end reads revealed 606,880 unique contigs annotated from 1588 species, often common in all 10 cultivars. Comparisons between transcriptomic and metatranscriptomic methodologies provide clear evidence that nonnative RNA can mistakenly map to reference genomes, especially to conserved regions of common housekeeping genes, such as actin, α/β-tubulin, and elongation factor 1-α. In Salix, Rubisco activase transcripts were down-regulated in contaminated trees across all 10 cultivars, whereas thiamine thiazole synthase and CP12, a Calvin Cycle master regulator, were uniformly up-regulated. De novo assembly approaches, with unconstrained annotation, can improve data quality; care should be taken when exploring such plant genetics to reduce de facto data exclusion by mapping to a single reference genome alone. Salix gene expression patterns strongly suggest cultivar-wide alteration of specific photosynthetic apparatus and protection of the antenna complexes from oxidation damage in contaminated trees, providing an insight into common stress tolerance strategies in a real-world phytoremediation system.
Affiliation(s)
- Nicholas J B Brereton
- Institut de recherche en biologie végétale, University of Montreal, Montreal QC H1X 2B2, Canada (N.J.B.B., E.G., J.M., M.L., S.J., F.E.P.); and Montreal Botanical Garden, Montreal, QC H1X 2B2, Canada (W.G.N., M.L., S.J., F.E.P.)
- Emmanuel Gonzalez
- Institut de recherche en biologie végétale, University of Montreal, Montreal QC H1X 2B2, Canada (N.J.B.B., E.G., J.M., M.L., S.J., F.E.P.); and Montreal Botanical Garden, Montreal, QC H1X 2B2, Canada (W.G.N., M.L., S.J., F.E.P.)
- Julie Marleau
- Institut de recherche en biologie végétale, University of Montreal, Montreal QC H1X 2B2, Canada (N.J.B.B., E.G., J.M., M.L., S.J., F.E.P.); and Montreal Botanical Garden, Montreal, QC H1X 2B2, Canada (W.G.N., M.L., S.J., F.E.P.)
- Werther Guidi Nissim
- Institut de recherche en biologie végétale, University of Montreal, Montreal QC H1X 2B2, Canada (N.J.B.B., E.G., J.M., M.L., S.J., F.E.P.); and Montreal Botanical Garden, Montreal, QC H1X 2B2, Canada (W.G.N., M.L., S.J., F.E.P.)
- Michel Labrecque
- Institut de recherche en biologie végétale, University of Montreal, Montreal QC H1X 2B2, Canada (N.J.B.B., E.G., J.M., M.L., S.J., F.E.P.); and Montreal Botanical Garden, Montreal, QC H1X 2B2, Canada (W.G.N., M.L., S.J., F.E.P.)
- Simon Joly
- Institut de recherche en biologie végétale, University of Montreal, Montreal QC H1X 2B2, Canada (N.J.B.B., E.G., J.M., M.L., S.J., F.E.P.); and Montreal Botanical Garden, Montreal, QC H1X 2B2, Canada (W.G.N., M.L., S.J., F.E.P.)
- Frederic E Pitre
- Institut de recherche en biologie végétale, University of Montreal, Montreal QC H1X 2B2, Canada (N.J.B.B., E.G., J.M., M.L., S.J., F.E.P.); and Montreal Botanical Garden, Montreal, QC H1X 2B2, Canada (W.G.N., M.L., S.J., F.E.P.)
|
18
|
GO-PCA: An Unsupervised Method to Explore Gene Expression Data Using Prior Knowledge. PLoS One 2015; 10:e0143196. [PMID: 26575370 PMCID: PMC4648502 DOI: 10.1371/journal.pone.0143196] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Received: 07/13/2015] [Accepted: 11/01/2015] [Indexed: 01/08/2023] Open
Abstract
Method Genome-wide expression profiling is a widely used approach for characterizing heterogeneous populations of cells, tissues, biopsies, or other biological specimens. The exploratory analysis of such data typically relies on generic unsupervised methods, e.g. principal component analysis (PCA) or hierarchical clustering. However, generic methods fail to exploit prior knowledge about the molecular functions of genes. Here, I introduce GO-PCA, an unsupervised method that combines PCA with nonparametric GO enrichment analysis, in order to systematically search for sets of genes that are both strongly correlated and closely functionally related. These gene sets are then used to automatically generate expression signatures with functional labels, which collectively aim to provide a readily interpretable representation of biologically relevant similarities and differences. The robustness of the results obtained can be assessed by bootstrapping. Results I first applied GO-PCA to datasets containing diverse hematopoietic cell types from human and mouse, respectively. In both cases, GO-PCA generated a small number of signatures that represented the majority of lineages present, and whose labels reflected their respective biological characteristics. I then applied GO-PCA to human glioblastoma (GBM) data, and recovered signatures associated with four out of five previously defined GBM subtypes. My results demonstrate that GO-PCA is a powerful and versatile exploratory method that reduces an expression matrix containing thousands of genes to a much smaller set of interpretable signatures. In this way, GO-PCA aims to facilitate hypothesis generation, design of further analyses, and functional comparisons across datasets.
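The general idea of pairing principal-component loadings with a functional enrichment test can be sketched as follows. This is not the GO-PCA algorithm itself: the abstract describes a nonparametric enrichment analysis, whereas the sketch substitutes a plain hypergeometric test on a fixed number of top-loading genes. The loadings, the size of the top set, and the gene-to-term assignment are all hypothetical values chosen for illustration.

```python
from math import comb


def hypergeom_pvalue(population, annotated, selected, overlap):
    """Upper-tail probability P(X >= overlap).

    `selected` genes are drawn without replacement from `population`
    genes, of which `annotated` carry the functional term.
    """
    upper = min(annotated, selected)
    total = comb(population, selected)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical loadings of eight genes on one principal component.
loadings = {
    "CD3E": 0.61, "CD3D": 0.58, "CD2": 0.49, "ACTB": 0.05,
    "GAPDH": -0.03, "HBB": -0.12, "ALB": 0.02, "MYH7": -0.08,
}
# Hypothetical functional term and its member genes.
term_genes = {"CD3E", "CD3D", "CD2"}

# Take the genes with the largest absolute loadings (top 3, an assumption).
top_genes = sorted(loadings, key=lambda g: abs(loadings[g]), reverse=True)[:3]
overlap = len(term_genes & set(top_genes))

p_value = hypergeom_pvalue(
    population=len(loadings),
    annotated=len(term_genes & set(loadings)),
    selected=len(top_genes),
    overlap=overlap,
)
print(top_genes, overlap, p_value)
```

A term that passes such a test on a component's top genes would supply both the gene set and the functional label for a signature; choosing the cut-off adaptively rather than fixing it at three is one of the things a rank-based nonparametric test would handle differently.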
|