1
Pravalphruekul N, Piriyajitakonkij M, Phunchongharn P, Piyayotai S. De Novo Design of Molecules with Multiaction Potential from Differential Gene Expression using Variational Autoencoder. J Chem Inf Model 2023; 63:3999-4011. [PMID: 37347587 DOI: 10.1021/acs.jcim.3c00355] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/24/2023]
Abstract
The modulating effect of chemical compounds and therapeutics on gene transcription is well-reported and has been intensively studied for both clinical and research purposes. Emerging research points toward the utility of drug-induced transcriptional alterations in de novo molecular design and highlights the idea of phenotype-matching an expression signature of interest to the structures being designed. In this work, we build an autoencoder-based generative model, BiCEV, around this concept. Our generative autoencoder has demonstrably generated a set of new molecules from gene expression input with notable validity (96%), uniqueness (98%), and internal diversity (0.77). Further, we attempted to validate BiCEV by testing the model on gene-knockdown profiles and combined signatures of synergistic drug pairs. From these investigations, we found the designed structures to be consistently high in collective quality. However, when their similarities to the supposed functional equivalents as determined by shared targets were considered, the findings were somewhat mixed. In spite of this, we believe the generative model merits further development in conjunction with in vitro corroboration to lend itself to being an assistive tool for drug discovery experts, particularly to support the initial stages of hit identification and lead optimization.
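The abstract reports validity, uniqueness, and internal diversity for the generated structures. As a hedged illustration of how the last two are commonly computed (the paper's exact definitions are not given here), the sketch below assumes every valid molecule has already been converted to a canonical SMILES string and to a fingerprint stored as a set of on-bit indices. In practice a cheminformatics toolkit would supply both, and validity itself needs such a parser, so it is not computed. Internal diversity is taken as one minus the mean pairwise Tanimoto similarity, a common convention.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two fingerprint on-bit sets."""
    union = a | b
    if not union:
        return 1.0  # two empty fingerprints are treated as identical
    return len(a & b) / len(union)


def uniqueness(valid_smiles: list) -> float:
    """Fraction of valid generated structures that are distinct.

    Assumes the SMILES strings are already canonicalized, so that string
    equality implies structural identity.
    """
    return len(set(valid_smiles)) / len(valid_smiles)


def internal_diversity(fingerprints: list) -> float:
    """One minus the mean pairwise Tanimoto similarity over all distinct pairs."""
    pairs = list(combinations(fingerprints, 2))
    mean_similarity = sum(tanimoto(a, b) for a, b in pairs) / len(pairs)
    return 1.0 - mean_similarity


# Toy fingerprints (hypothetical bit sets, not real molecules).
fps = [frozenset({1, 2, 3}), frozenset({2, 3, 4}), frozenset({7, 8})]
print(round(internal_diversity(fps), 3))  # 0.833
print(uniqueness(["CCO", "CCO", "c1ccccc1", "CC"]))  # 0.75
```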
Affiliation(s)
- Nutaya Pravalphruekul
- Department of Computer Engineering, King Mongkut's University of Technology Thonburi, Bang Mod, Thung Khru, Bangkok 10140, Thailand
- Phond Phunchongharn
- Department of Computer Engineering, King Mongkut's University of Technology Thonburi, Bang Mod, Thung Khru, Bangkok 10140, Thailand
- Big Data Experience Center, Bang Mod, Thung Khru, Bangkok 10140, Thailand
- Supanida Piyayotai
- Big Data Experience Center, Bang Mod, Thung Khru, Bangkok 10140, Thailand
- Learning Institute, King Mongkut's University of Technology Thonburi, Bang Mod, Thung Khru, Bangkok 10140, Thailand
2
Pergola G, Parihar M, Sportelli L, Bharadwaj R, Borcuk C, Radulescu E, Bellantuono L, Blasi G, Chen Q, Kleinman JE, Wang Y, Sripathy SR, Maher BJ, Monaco A, Rossi F, Shin JH, Hyde TM, Bertolino A, Weinberger DR. Consensus molecular environment of schizophrenia risk genes in coexpression networks shifting across age and brain regions. Sci Adv 2023; 9:eade2812. [PMID: 37058565 PMCID: PMC10104472 DOI: 10.1126/sciadv.ade2812] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 08/06/2022] [Accepted: 03/10/2023] [Indexed: 06/19/2023]
Abstract
Schizophrenia is a neurodevelopmental brain disorder whose genetic risk is associated with shifting clinical phenomena across the life span. We investigated the convergence of putative schizophrenia risk genes in brain coexpression networks in postmortem human prefrontal cortex (DLPFC), hippocampus, caudate nucleus, and dentate gyrus granule cells, parsed by specific age periods (total N = 833). The results support an early prefrontal involvement in the biology underlying schizophrenia and reveal a dynamic interplay of regions in which age parsing explains more variance in schizophrenia risk compared to lumping all age periods together. Across multiple data sources and publications, we identify 28 genes that are the most consistently found partners in modules enriched for schizophrenia risk genes in DLPFC; twenty-three are previously unidentified associations with schizophrenia. In iPSC-derived neurons, the relationship of these genes with schizophrenia risk genes is maintained. The genetic architecture of schizophrenia is embedded in shifting coexpression patterns across brain regions and time, potentially underwriting its shifting clinical presentation.
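The abstract refers to coexpression modules "enriched for schizophrenia risk genes". A standard way to score that kind of enrichment, shown here only as an illustrative sketch and not necessarily the test the authors used, is a one-sided hypergeometric test on the overlap between a module and a risk-gene list. The gene counts in the usage line are hypothetical.

```python
from math import comb


def enrichment_p_value(n_universe: int, n_risk: int, module_size: int, overlap: int) -> float:
    """One-sided hypergeometric p-value, P(X >= overlap).

    X counts risk genes in a module of `module_size` genes drawn without
    replacement from `n_universe` genes, of which `n_risk` are risk genes.
    """
    upper = min(n_risk, module_size)
    tail = sum(
        comb(n_risk, k) * comb(n_universe - n_risk, module_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(n_universe, module_size)


# Hypothetical numbers: 25 risk genes in a 500-gene module, 300 risk genes
# among 20,000 expressed genes (about 7.5 expected by chance).
print(enrichment_p_value(20_000, 300, 500, 25))
```

Exact integer arithmetic via `math.comb` avoids the underflow that multiplying many small floating-point probabilities would cause.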
Affiliation(s)
- Giulio Pergola
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, USA
- Group of Psychiatric Neuroscience, Department of Translational Biomedicine and Neuroscience, University of Bari Aldo Moro, Bari, Italy
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Madhur Parihar
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, USA
- Leonardo Sportelli
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, USA
- Group of Psychiatric Neuroscience, Department of Translational Biomedicine and Neuroscience, University of Bari Aldo Moro, Bari, Italy
- Rahul Bharadwaj
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, USA
- Christopher Borcuk
- Group of Psychiatric Neuroscience, Department of Translational Biomedicine and Neuroscience, University of Bari Aldo Moro, Bari, Italy
- Eugenia Radulescu
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, USA
- Loredana Bellantuono
- Group of Psychiatric Neuroscience, Department of Translational Biomedicine and Neuroscience, University of Bari Aldo Moro, Bari, Italy
- Istituto Nazionale di Fisica Nucleare, Bari, Italy
- Giuseppe Blasi
- Group of Psychiatric Neuroscience, Department of Translational Biomedicine and Neuroscience, University of Bari Aldo Moro, Bari, Italy
- Azienda Ospedaliero Universitaria Consorziale Policlinico, Bari, Italy
- Qiang Chen
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, USA
- Joel E. Kleinman
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, USA
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Yanhong Wang
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, USA
- Srinidhi Rao Sripathy
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, USA
- Brady J. Maher
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, USA
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Alfonso Monaco
- Istituto Nazionale di Fisica Nucleare, Bari, Italy
- Dipartimento Interateneo di Fisica, Università degli Studi di Bari Aldo Moro, Bari, Italy
- Fabiana Rossi
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, USA
- Group of Psychiatric Neuroscience, Department of Translational Biomedicine and Neuroscience, University of Bari Aldo Moro, Bari, Italy
- Joo Heon Shin
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, USA
- Thomas M. Hyde
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, USA
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Alessandro Bertolino
- Group of Psychiatric Neuroscience, Department of Translational Biomedicine and Neuroscience, University of Bari Aldo Moro, Bari, Italy
- Azienda Ospedaliero Universitaria Consorziale Policlinico, Bari, Italy
- Daniel R. Weinberger
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, USA
- Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
3
Wang Y, Hicks SC, Hansen KD. Addressing the mean-correlation relationship in co-expression analysis. PLoS Comput Biol 2022; 18:e1009954. [PMID: 35353807 PMCID: PMC9009771 DOI: 10.1371/journal.pcbi.1009954] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 05/10/2021] [Revised: 04/14/2022] [Accepted: 02/22/2022] [Indexed: 12/13/2022]
Abstract
Estimates of correlation between pairs of genes in co-expression analysis are commonly used to construct networks among genes using gene expression data. As previously noted, the distribution of such correlations depends on the observed expression level of the involved genes, which we refer to as a mean-correlation relationship in RNA-seq data, both bulk and single-cell. This dependence introduces an unwanted technical bias in co-expression analysis whereby highly expressed genes are more likely to be highly correlated. Such a relationship is not observed in protein-protein interaction data, suggesting that it is not reflecting biology. Ignoring this bias can lead to missing potentially biologically relevant pairs of genes that are lowly expressed, such as transcription factors. To address this problem, we introduce spatial quantile normalization (SpQN), a method for normalizing local distributions in a correlation matrix. We show that spatial quantile normalization removes the mean-correlation relationship and corrects the expression bias in network reconstruction.
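To make "normalizing local distributions in a correlation matrix" concrete, here is a heavily simplified, hypothetical sketch; it is not the published SpQN algorithm, whose details differ. Genes are sorted by mean expression and cut into bins, and every block of correlations between two bins is quantile-mapped onto the distribution of one reference block, so that the correlation distribution no longer differs between expression bins. Using the highest-expression bin as the reference is an arbitrary choice made here for illustration, and each bin is assumed to hold at least two genes.

```python
def spqn_sketch(corr, means, n_bins, ref_bin=None):
    """Simplified spatial-quantile-style normalization of a correlation matrix.

    corr:  symmetric list-of-lists correlation matrix (genes x genes).
    means: mean expression per gene, used to order and bin genes.
    Each block of correlations between two expression bins is quantile-mapped
    onto the distribution of a reference diagonal block.
    """
    n = len(means)
    order = sorted(range(n), key=lambda g: means[g])  # genes from low to high expression
    size = n // n_bins
    bins = [order[k * size:(k + 1) * size] for k in range(n_bins - 1)]
    bins.append(order[(n_bins - 1) * size:])  # last bin absorbs any remainder

    def block_pairs(a, b):
        """Gene index pairs in the (a, b) block, excluding self-pairs."""
        if a == b:
            members = bins[a]
            return [(members[x], members[y])
                    for x in range(len(members)) for y in range(x + 1, len(members))]
        return [(i, j) for i in bins[a] for j in bins[b]]

    ref = n_bins - 1 if ref_bin is None else ref_bin  # assumption: top-expression bin
    ref_values = sorted(corr[i][j] for i, j in block_pairs(ref, ref))

    out = [row[:] for row in corr]
    for a in range(n_bins):
        for b in range(a, n_bins):
            ranked = sorted(block_pairs(a, b), key=lambda p: corr[p[0]][p[1]])
            m = len(ranked)
            for r, (i, j) in enumerate(ranked):
                q = (r + 0.5) / m  # mid-rank quantile within this block
                k = min(int(q * len(ref_values)), len(ref_values) - 1)
                out[i][j] = out[j][i] = ref_values[k]
    return out


# Toy 6-gene example: two bins of three genes each.
corr = [[1.0 if i == j else 0.0 for j in range(6)] for i in range(6)]
for (i, j), v in {(0, 1): 0.1, (0, 2): 0.2, (1, 2): 0.3,
                  (3, 4): 0.5, (3, 5): 0.7, (4, 5): 0.9}.items():
    corr[i][j] = corr[j][i] = v
normalized = spqn_sketch(corr, [1, 2, 3, 10, 11, 12], n_bins=2)
print(normalized[0][1], normalized[0][2], normalized[1][2])  # 0.5 0.7 0.9
```

In the toy example the low-expression block, whose correlations were systematically smaller, ends up with the same distribution as the high-expression block, while the ranking of pairs within the block is preserved.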
Affiliation(s)
- Yi Wang
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, United States of America
- Stephanie C. Hicks
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, United States of America
- Kasper D. Hansen
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, United States of America
- McKusick-Nathans Department of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, Maryland, United States of America
4
Cote AC, Young HE, Huckins LM. Comparison of confound adjustment methods in the construction of gene co-expression networks. Genome Biol 2022; 23:44. [PMID: 35115012 PMCID: PMC8812044 DOI: 10.1186/s13059-022-02606-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/07/2021] [Accepted: 01/03/2022] [Indexed: 11/23/2022]
Abstract
Adjustment for confounding sources of expression variation is an important preprocessing step in large gene expression studies, but the effect of confound adjustment on co-expression network analysis has not been well-characterized. Here, we demonstrate that the choice of confound adjustment method can have a considerable effect on the architecture of the resulting co-expression network. We compare standard and alternative confound adjustment methods and provide recommendations for their use in the construction of gene co-expression networks from bulk tissue RNA-seq datasets.
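As a concrete reference point for what confound adjustment means before network construction, the sketch below shows one simple option: regressing each gene on a known covariate (here a hypothetical batch indicator) and correlating the residuals. It illustrates the general idea only and does not reproduce any of the specific methods the authors compare; real pipelines adjust for many covariates at once.

```python
def residualize(y, x):
    """Residuals of y after least-squares regression on a single covariate x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    beta = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return [yi - my - beta * (xi - mx) for xi, yi in zip(x, y)]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    saa = sum((ai - ma) ** 2 for ai in a)
    sbb = sum((bi - mb) ** 2 for bi in b)
    return sab / (saa * sbb) ** 0.5


# Two hypothetical genes that share nothing but a batch shift of +5.
batch = [0, 0, 0, 1, 1, 1]
gene_a = [1, -1, 0, 6, 4, 5]
gene_b = [1, 1, -2, 6, 6, 3]
print(round(pearson(gene_a, gene_b), 2))  # 0.83 before adjustment
print(pearson(residualize(gene_a, batch), residualize(gene_b, batch)))  # 0.0 after
```

The strong raw correlation is produced entirely by the shared batch offset; once it is regressed out, the spurious co-expression edge disappears.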
Affiliation(s)
- Alanna C. Cote
- Pamela Sklar Division of Psychiatric Genomics, Icahn School of Medicine at Mount Sinai, New York, NY 10029 USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029 USA
- Department of Genetics and Genomics, Icahn School of Medicine at Mount Sinai, New York, NY 10029 USA
- Hannah E. Young
- Pamela Sklar Division of Psychiatric Genomics, Icahn School of Medicine at Mount Sinai, New York, NY 10029 USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029 USA
- Department of Genetics and Genomics, Icahn School of Medicine at Mount Sinai, New York, NY 10029 USA
- Laura M. Huckins
- Pamela Sklar Division of Psychiatric Genomics, Icahn School of Medicine at Mount Sinai, New York, NY 10029 USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029 USA
- Department of Genetics and Genomics, Icahn School of Medicine at Mount Sinai, New York, NY 10029 USA
- Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029 USA
- Seaver Autism Center for Research and Treatment, Icahn School of Medicine at Mount Sinai, New York, NY 10029 USA
- Mental Illness Research, Education and Clinical Centers, James J. Peters Department of Veterans Affairs Medical Center, Bronx, NY 10468 USA
5
Ponsonby AL. Reflection on modern methods: building causal evidence within high-dimensional molecular epidemiological studies of moderate size. Int J Epidemiol 2021; 50:1016-1029. [PMID: 33594409 DOI: 10.1093/ije/dyaa174] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Accepted: 08/17/2020] [Indexed: 12/29/2022]
Abstract
This commentary provides a practical perspective on epidemiological analysis within a single high-dimensional study of moderate size to consider a causal question. In this setting, non-causal confounding is important. This occurs when a factor is a determinant of outcome and the underlying association between exposure and the factor is non-causal. That is, the association arises due to chance, confounding or other bias rather than reflecting that exposure and the factor are causally related. In particular, the influence of technical processing factors must be accounted for by pre-processing measures to remove artefact or to control for these factors such as batch run. Work steps include the evaluation of alternative non-causal explanations for observed exposure-disease associations and strategies to obtain the highest level of causal inference possible within the study. A systematic approach is required to work through a question set and obtain insights on not only the exposure-disease association but also the multifactorial causal structure of the underlying data where possible. The appropriate inclusion of molecular findings will enhance the quest to better understand multifactorial disease causation in modern observational epidemiological studies.
6
Grimes T, Datta S. SeqNet: An R Package for Generating Gene-Gene Networks and Simulating RNA-Seq Data. J Stat Softw 2021; 98:10.18637/jss.v098.i12. [PMID: 34321962 PMCID: PMC8315007 DOI: 10.18637/jss.v098.i12] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/12/2022]
Abstract
Gene expression data provide an abundant resource for inferring connections in gene regulatory networks. While methodologies developed for this task have shown success, a challenge remains in comparing the performance among methods. Gold-standard datasets are scarce and limited in use. And while tools for simulating expression data are available, they are not designed to resemble the data obtained from RNA-seq experiments. SeqNet is an R package that provides tools for generating a rich variety of gene network structures and simulating RNA-seq data from them. This produces in silico RNA-seq data for benchmarking and assessing gene network inference methods. The package is available on CRAN and on GitHub at https://github.com/tgrimes/SeqNet.
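SeqNet itself is an R package, and its functions are not reproduced here. As a language-neutral illustration of the general idea of simulating expression data from a known network, the Python sketch below propagates Gaussian latent signals along the edges of a small directed network and draws Poisson counts from the resulting log-means. The model, the function names, and the defaults (`base_log_mean`, `strength`) are hypothetical; real RNA-seq counts are overdispersed, so a negative binomial or similar distribution would be more realistic, and SeqNet's own generative model may differ.

```python
import random
from math import exp


def poisson_sample(lam, rng):
    """Knuth's multiplication method; adequate for moderate lambda only."""
    threshold, k, p = exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_counts(edges, n_genes, n_samples, base_log_mean=2.0, strength=0.8, seed=0):
    """Simulate a genes x samples count matrix whose correlations follow `edges`.

    edges: (parent, child) gene-index pairs, listed so that every parent
    appears before its children (a simple directed network).
    """
    rng = random.Random(seed)
    counts = [[0] * n_samples for _ in range(n_genes)]
    for s in range(n_samples):
        latent = [rng.gauss(0.0, 0.5) for _ in range(n_genes)]  # independent noise per gene
        for parent, child in edges:
            latent[child] += strength * latent[parent]  # propagate signal along edges
        for g in range(n_genes):
            counts[g][s] = poisson_sample(exp(base_log_mean + latent[g]), rng)
    return counts


# Gene 0 regulates genes 1 and 2; gene 3 is unconnected.
counts = simulate_counts([(0, 1), (0, 2)], n_genes=4, n_samples=200, seed=1)
```

Because the network that generated the data is known, an inference method's recovered edges can be scored directly against `edges`, which is the benchmarking use case the abstract describes.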
Affiliation(s)
- Tyler Grimes
- University of Florida, Department of Biostatistics
7
Gao S, Dai Y, Rehman J. A Bayesian inference transcription factor activity model for the analysis of single-cell transcriptomes. Genome Res 2021; 31:1296-1311. [PMID: 34193535 PMCID: PMC8256867 DOI: 10.1101/gr.265595.120] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/04/2020] [Accepted: 05/26/2021] [Indexed: 01/06/2023]
Abstract
Single-cell RNA sequencing (scRNA-seq) has emerged as a powerful experimental approach to study cellular heterogeneity. One of the challenges in scRNA-seq data analysis is integrating different types of biological data to consistently recognize discrete biological functions and regulatory mechanisms of cells, such as transcription factor activities and gene regulatory networks in distinct cell populations. We have developed an approach to infer transcription factor activities from scRNA-seq data that leverages existing biological data on transcription factor binding sites. The Bayesian inference transcription factor activity model (BITFAM) integrates ChIP-seq transcription factor binding information into scRNA-seq data analysis. We show that the inferred transcription factor activities for key cell types identify regulatory transcription factors that are known to mechanistically control cell function and cell fate. The BITFAM approach not only identifies biologically meaningful transcription factor activities, but also provides valuable insights into underlying transcription factor regulatory mechanisms.
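BITFAM integrates ChIP-seq binding information into a Bayesian model of transcription factor activity; that model is not reproduced here. The sketch below swaps in a much simpler, non-Bayesian stand-in that shares only the core idea of letting binding data define each TF's target set: a TF's activity in a cell is scored as the mean z-scored expression of its targets. Gene and TF names are hypothetical.

```python
from statistics import fmean, pstdev


def tf_activity_scores(expr, targets):
    """Per-cell activity score for each TF as the mean z-score of its targets.

    expr:    dict gene -> list of expression values, one per cell.
    targets: dict TF -> set of target genes (e.g., derived from ChIP-seq peaks).
    """
    z = {}
    for gene, values in expr.items():
        mu, sd = fmean(values), pstdev(values)
        z[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    n_cells = len(next(iter(expr.values())))
    scores = {}
    for tf, genes in targets.items():
        present = sorted(g for g in genes if g in z)
        if not present:
            continue  # no measured targets: leave this TF unscored
        scores[tf] = [fmean(z[g][c] for g in present) for c in range(n_cells)]
    return scores


expr = {"g1": [0, 0, 2, 2], "g2": [1, 1, 3, 3], "g3": [5, 4, 5, 4]}
targets = {"TF_A": {"g1", "g2"}, "TF_B": {"g3"}, "TF_C": {"gX"}}
print(tf_activity_scores(expr, targets))
# {'TF_A': [-1.0, -1.0, 1.0, 1.0], 'TF_B': [1.0, -1.0, 1.0, -1.0]}
```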
Affiliation(s)
- Shang Gao
- Department of Bioengineering, University of Illinois at Chicago, Chicago, Illinois 60612, USA
- Department of Medicine, Division of Cardiology, University of Illinois at Chicago, Chicago, Illinois 60612, USA
- Department of Pharmacology and Regenerative Medicine, University of Illinois at Chicago, Chicago, Illinois 60612, USA
- Yang Dai
- Department of Bioengineering, University of Illinois at Chicago, Chicago, Illinois 60612, USA
- Jalees Rehman
- Department of Bioengineering, University of Illinois at Chicago, Chicago, Illinois 60612, USA
- Department of Medicine, Division of Cardiology, University of Illinois at Chicago, Chicago, Illinois 60612, USA
- Department of Pharmacology and Regenerative Medicine, University of Illinois at Chicago, Chicago, Illinois 60612, USA
- University of Illinois Cancer Center, Chicago, Illinois 60612, USA
8
Marzorati F, Wang C, Pavesi G, Mizzi L, Morandini P. Cleaning the Medicago Microarray Database to Improve Gene Function Analysis. Plants (Basel) 2021; 10:plants10061240. [PMID: 34207216 PMCID: PMC8234645 DOI: 10.3390/plants10061240] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/12/2021] [Revised: 04/30/2021] [Accepted: 05/11/2021] [Indexed: 06/13/2023]
Abstract
Transcriptomics studies have been facilitated by the development of microarray and RNA-Seq technologies, with thousands of expression datasets available for many species. However, the quality of data can be highly variable, making the combined analysis of different datasets difficult and unreliable. Most of the microarray data for Medicago truncatula, the barrel medic, have been stored and made publicly accessible on the web database Medicago truncatula Gene Expression atlas (MtGEA). The aim of this work is to ameliorate the quality of the MtGEA database through a general method based on logical and statistical relationships among parameters and conditions. The initial 716 columns available in the dataset were reduced to 607 by evaluating the quality of data through the sum of the expression levels over the entire transcriptome probes and Pearson correlation among hybridizations. The reduced dataset shows great improvements in the consistency of the data, with a reduction in both false positives and false negatives resulting from Pearson correlation and GO enrichment analysis among genes. The approach we used is of general validity and our intent is to extend the analysis to other plant microarray databases.
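The abstract describes filtering hybridizations by the sum of expression over all probes and by Pearson correlation among hybridizations, without giving exact cut-offs. The sketch below is one hypothetical way to implement such a filter: a column is flagged when its total expression is a robust outlier (based on the median absolute deviation) or when its median correlation with the other columns is low. Both thresholds, `min_median_r` and `max_robust_z`, are illustrative assumptions rather than values from the paper, and columns are assumed not to be constant.

```python
from statistics import median


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def flag_hybridizations(columns, min_median_r=0.8, max_robust_z=3.5):
    """Return indices of columns (hybridizations) that look like outliers.

    A column is flagged if its summed expression is far from the others
    (robust z-score based on the median absolute deviation) or if its median
    Pearson correlation with the remaining columns is below `min_median_r`.
    """
    totals = [sum(col) for col in columns]
    med = median(totals)
    mad = median(abs(t - med) for t in totals)
    flagged = set()
    for k, col in enumerate(columns):
        if mad > 0 and abs(totals[k] - med) / (1.4826 * mad) > max_robust_z:
            flagged.add(k)
        others = [pearson(col, other) for j, other in enumerate(columns) if j != k]
        if median(others) < min_median_r:
            flagged.add(k)
    return sorted(flagged)


arrays = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [1.25, 2.0, 2.75, 4.0, 5.0],
    [0.75, 2.0, 3.25, 4.0, 5.0],
    [5.0, 1.0, 4.0, 2.0, 3.0],  # scrambled profile
]
print(flag_hybridizations(arrays))  # [3]
```

Using medians for both checks keeps a single bad hybridization from dragging down the scores of the good ones.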
Affiliation(s)
- Francesca Marzorati
- Department of Environmental Science and Policy, University of Milan, Via Celoria 10, 20133 Milano, Italy
- Chu Wang
- Department of Biosciences, University of Milan, Via Celoria 26, 20133 Milano, Italy
- Giulio Pavesi
- Department of Biosciences, University of Milan, Via Celoria 26, 20133 Milano, Italy
- Luca Mizzi
- Department of Biosciences, University of Milan, Via Celoria 26, 20133 Milano, Italy
- Piero Morandini
- Department of Environmental Science and Policy, University of Milan, Via Celoria 10, 20133 Milano, Italy
9
Bhattacharya A, Hamilton AM, Furberg H, Pietzak E, Purdue MP, Troester MA, Hoadley KA, Love MI. An approach for normalization and quality control for NanoString RNA expression data. Brief Bioinform 2021; 22:bbaa163. [PMID: 32789507 PMCID: PMC8138885 DOI: 10.1093/bib/bbaa163] [Citation(s) in RCA: 59] [Impact Index Per Article: 19.7] [Received: 06/05/2020] [Revised: 06/29/2020] [Accepted: 06/30/2020] [Indexed: 01/10/2023]
Abstract
The NanoString RNA counting assay for formalin-fixed paraffin embedded samples is unique in its sensitivity, technical reproducibility and robustness for analysis of clinical and archival samples. While commercial normalization methods are provided by NanoString, they are not optimal for all settings, particularly when samples exhibit strong technical or biological variation or where housekeeping genes have variable performance across the cohort. Here, we develop and evaluate a more comprehensive normalization procedure for NanoString data with steps for quality control, selection of housekeeping targets, normalization and iterative data visualization and biological validation. The approach was evaluated using a large cohort (N = 1649) from the Carolina Breast Cancer Study, two cohorts of moderate sample size (N = 359 and 130) and a small published dataset (N = 12). The iterative process developed here eliminates technical variation (e.g. from different study phases or sites) more reliably than the three other methods, including NanoString's commercial package, without diminishing biological variation, especially in long-term longitudinal multiphase or multisite cohorts. We also find that probe sets validated for nCounter, such as the PAM50 gene signature, are impervious to batch issues. This work emphasizes that systematic quality control, normalization and visualization of NanoString nCounter data are an imperative component of study design that influences results in downstream analyses.
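The authors' procedure layers quality control, housekeeping selection, and iterative visualization on top of normalization, and it is not reproduced here. For orientation only, the sketch below shows the conventional starting point that such procedures refine: rescaling each sample by the geometric mean of its housekeeping-gene counts. Gene names are hypothetical, counts are assumed strictly positive (zeros would need a pseudocount), and using the arithmetic mean of the per-sample geometric means as the common target is an assumption.

```python
from math import exp, log


def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return exp(sum(log(v) for v in values) / len(values))


def housekeeping_normalize(counts, housekeeping):
    """Scale each sample so its housekeeping genes share a common geometric mean.

    counts:       dict gene -> list of positive counts, one per sample.
    housekeeping: names of the housekeeping genes used for scaling.
    """
    n_samples = len(next(iter(counts.values())))
    hk_means = [geometric_mean([counts[g][s] for g in housekeeping]) for s in range(n_samples)]
    target = sum(hk_means) / n_samples  # assumption: arithmetic mean of per-sample geometric means
    factors = [target / m for m in hk_means]
    return {g: [c * f for c, f in zip(values, factors)] for g, values in counts.items()}


counts = {"HK1": [100, 200], "HK2": [400, 800], "GENE": [50, 100]}
normalized = housekeeping_normalize(counts, ["HK1", "HK2"])
print([round(v, 6) for v in normalized["GENE"]])  # [75.0, 75.0]
```

This baseline is exactly what breaks down when housekeeping genes behave inconsistently across the cohort, which is the failure mode the abstract targets.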
Affiliation(s)
- Mark P Purdue
- Division of Cancer Epidemiology and Genetics, National Cancer Institute
10
Chowdhury HA, Bhattacharyya DK, Kalita JK. (Differential) Co-Expression Analysis of Gene Expression: A Survey of Best Practices. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:1154-1173. [PMID: 30668502 DOI: 10.1109/tcbb.2019.2893170] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Indexed: 06/09/2023]
Abstract
Analysis of gene expression data is widely used in transcriptomic studies to understand functions of molecules inside a cell and interactions among molecules. Differential co-expression analysis studies diseases and phenotypic variations by finding modules of genes whose co-expression patterns vary across conditions. We review the best practices in gene expression data analysis in terms of analysis of (differential) co-expression, co-expression network, differential networking, and differential connectivity considering both microarray and RNA-seq data along with comparisons. We highlight hurdles in RNA-seq data analysis using methods developed for microarrays. We include discussion of necessary tools for gene expression analysis throughout the paper. In addition, we shed light on scRNA-seq data analysis by including preprocessing and scRNA-seq in co-expression analysis along with useful tools specific to scRNA-seq. To get insights, biological interpretation and functional profiling are included. Finally, we provide guidelines for the analyst, along with research issues and challenges which should be addressed.
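Many differential co-expression approaches build on a per-pair question: does the correlation between two genes differ between conditions? A classical test for a single gene pair, shown here as a minimal sketch rather than a method specifically recommended by the survey, compares Fisher z-transformed correlations from two independent groups. The sample sizes and correlations in the usage line are hypothetical, and genome-wide use would also need multiple-testing correction.

```python
from math import atanh, erfc, sqrt


def differential_correlation_test(r1, n1, r2, n2):
    """Two-sided test of equal correlations in two independent groups.

    Uses Fisher's z-transformation; returns (z statistic, p-value).
    Requires |r| < 1 and more than 3 samples per group.
    """
    z = (atanh(r1) - atanh(r2)) / sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    p = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


# Hypothetical gene pair: r = 0.8 in 50 case samples, r = 0.2 in 50 controls.
z, p = differential_correlation_test(0.8, 50, 0.2, 50)
print(round(z, 2))  # 4.34
```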
11
Serra A, Fratello M, Cattelani L, Liampa I, Melagraki G, Kohonen P, Nymark P, Federico A, Kinaret PAS, Jagiello K, Ha MK, Choi JS, Sanabria N, Gulumian M, Puzyn T, Yoon TH, Sarimveis H, Grafström R, Afantitis A, Greco D. Transcriptomics in Toxicogenomics, Part III: Data Modelling for Risk Assessment. Nanomaterials (Basel) 2020; 10:E708. [PMID: 32276469 PMCID: PMC7221955 DOI: 10.3390/nano10040708] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 03/10/2020] [Revised: 03/25/2020] [Accepted: 03/26/2020] [Indexed: 12/30/2022]
Abstract
Transcriptomics data are relevant to address a number of challenges in Toxicogenomics (TGx). After careful planning of exposure conditions and data preprocessing, the TGx data can be used in predictive toxicology, where more advanced modelling techniques are applied. The large volume of molecular profiles produced by omics-based technologies allows the development and application of artificial intelligence (AI) methods in TGx. Indeed, the publicly available omics datasets are constantly increasing together with a plethora of different methods that are made available to facilitate their analysis, interpretation and the generation of accurate and stable predictive models. In this review, we present the state-of-the-art of data modelling applied to transcriptomics data in TGx. We show how the benchmark dose (BMD) analysis can be applied to TGx data. We review read across and adverse outcome pathways (AOP) modelling methodologies. We discuss how network-based approaches can be successfully employed to clarify the mechanism of action (MOA) or specific biomarkers of exposure. We also describe the main AI methodologies applied to TGx data to create predictive classification and regression models and we address current challenges. Finally, we present a short description of deep learning (DL) and data integration methodologies applied in these contexts. Modelling of TGx data represents a valuable tool for more accurate chemical safety assessment. This review is the third part of a three-article series on Transcriptomics in Toxicogenomics.
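The review describes applying benchmark dose (BMD) analysis to TGx data. As a minimal, assumption-laden sketch of the underlying arithmetic for a single gene, the code below fits a straight line to dose-response data and reports the dose at which the fitted response departs from control by one control standard deviation, a common choice of benchmark response. Dedicated BMD tools typically fit and compare several dose-response models, so this linear version is illustrative only; the data are hypothetical.

```python
from statistics import fmean, stdev


def linear_bmd(doses, responses, bmr_sd=1.0):
    """Benchmark dose under a straight-line dose-response fit.

    The benchmark response (BMR) is `bmr_sd` standard deviations of the
    control (dose 0) responses; the BMD is the dose at which the fitted line
    has moved that far from its intercept.
    """
    control = [r for d, r in zip(doses, responses) if d == 0]
    bmr = bmr_sd * stdev(control)
    md, mr = fmean(doses), fmean(responses)
    slope = (sum((d - md) * (r - mr) for d, r in zip(doses, responses))
             / sum((d - md) ** 2 for d in doses))
    return float("inf") if slope == 0 else bmr / abs(slope)


# Hypothetical expression of one gene at three doses, three replicates each.
doses = [0, 0, 0, 1, 1, 1, 2, 2, 2]
responses = [9, 10, 11, 11, 12, 13, 13, 14, 15]
print(linear_bmd(doses, responses))  # 0.5
```

Lower BMD values indicate genes that respond at lower exposures, which is why per-gene BMDs are often summarized across pathways in TGx risk assessment.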
Affiliation(s)
- Angela Serra
- Faculty of Medicine and Health Technology, Tampere University, FI-33014 Tampere, Finland
- BioMediTech Institute, Tampere University, FI-33014 Tampere, Finland
- Michele Fratello
- Faculty of Medicine and Health Technology, Tampere University, FI-33014 Tampere, Finland
- BioMediTech Institute, Tampere University, FI-33014 Tampere, Finland
- Luca Cattelani
- Faculty of Medicine and Health Technology, Tampere University, FI-33014 Tampere, Finland
- BioMediTech Institute, Tampere University, FI-33014 Tampere, Finland
- Irene Liampa
- School of Chemical Engineering, National Technical University of Athens, 157 80 Athens, Greece
- Georgia Melagraki
- Nanoinformatics Department, NovaMechanics Ltd., Nicosia 1065, Cyprus
- Pekka Kohonen
- Institute of Environmental Medicine, Karolinska Institutet, 171 77 Stockholm, Sweden
- Division of Toxicology, Misvik Biology, 20520 Turku, Finland
- Penny Nymark
- Institute of Environmental Medicine, Karolinska Institutet, 171 77 Stockholm, Sweden
- Division of Toxicology, Misvik Biology, 20520 Turku, Finland
- Antonio Federico
- Faculty of Medicine and Health Technology, Tampere University, FI-33014 Tampere, Finland
- BioMediTech Institute, Tampere University, FI-33014 Tampere, Finland
- Pia Anneli Sofia Kinaret
- Faculty of Medicine and Health Technology, Tampere University, FI-33014 Tampere, Finland
- BioMediTech Institute, Tampere University, FI-33014 Tampere, Finland
- Institute of Biotechnology, University of Helsinki, 00014 Helsinki, Finland
- Karolina Jagiello
- QSAR Lab Ltd., Aleja Grunwaldzka 190/102, 80-266 Gdansk, Poland
- University of Gdansk, Faculty of Chemistry, Wita Stwosza 63, 80-308 Gdansk, Poland
- My Kieu Ha
- Center for Next Generation Cytometry, Hanyang University, Seoul 04763, Korea
- Department of Chemistry, College of Natural Sciences, Hanyang University, Seoul 04763, Korea
- Institute of Next Generation Material Design, Hanyang University, Seoul 04763, Korea
- Jang-Sik Choi
- Center for Next Generation Cytometry, Hanyang University, Seoul 04763, Korea
- Department of Chemistry, College of Natural Sciences, Hanyang University, Seoul 04763, Korea
- Institute of Next Generation Material Design, Hanyang University, Seoul 04763, Korea
- Natasha Sanabria
- National Institute for Occupational Health, Johannesburg 30333, South Africa
- Mary Gulumian
- National Institute for Occupational Health, Johannesburg 30333, South Africa
- Haematology and Molecular Medicine Department, School of Pathology, University of the Witwatersrand, Johannesburg 2050, South Africa
- Tomasz Puzyn
- QSAR Lab Ltd., Aleja Grunwaldzka 190/102, 80-266 Gdansk, Poland
- University of Gdansk, Faculty of Chemistry, Wita Stwosza 63, 80-308 Gdansk, Poland
- Tae-Hyun Yoon
- Center for Next Generation Cytometry, Hanyang University, Seoul 04763, Korea
- Department of Chemistry, College of Natural Sciences, Hanyang University, Seoul 04763, Korea
- Institute of Next Generation Material Design, Hanyang University, Seoul 04763, Korea
- Haralambos Sarimveis
- School of Chemical Engineering, National Technical University of Athens, 157 80 Athens, Greece
- Roland Grafström
- Institute of Environmental Medicine, Karolinska Institutet, 171 77 Stockholm, Sweden
- Division of Toxicology, Misvik Biology, 20520 Turku, Finland
- Antreas Afantitis
- Nanoinformatics Department, NovaMechanics Ltd., Nicosia 1065, Cyprus
- Dario Greco
- Faculty of Medicine and Health Technology, Tampere University, FI-33014 Tampere, Finland
- BioMediTech Institute, Tampere University, FI-33014 Tampere, Finland
- Institute of Biotechnology, University of Helsinki, 00014 Helsinki, Finland
12
Deep learning for inferring gene relationships from single-cell expression data. Proc Natl Acad Sci U S A 2019; 116:27151-27158. [PMID: 31822622 DOI: 10.1073/pnas.1911536116] [Citation(s) in RCA: 106] [Impact Index Per Article: 21.2] [Indexed: 12/22/2022]
Abstract
Several methods were developed to mine gene-gene relationships from expression data. Examples include correlation and mutual information methods for coexpression analysis, clustering and undirected graphical models for functional assignments, and directed graphical models for pathway reconstruction. Using an encoding for gene expression data, followed by deep neural networks analysis, we present a framework that can successfully address all of these diverse tasks. We show that our method, convolutional neural network for coexpression (CNNC), improves upon prior methods in tasks ranging from predicting transcription factor targets to identifying disease-related genes to causality inference. CNNC's encoding provides insights about some of the decisions it makes and their biological basis. CNNC is flexible and can easily be extended to integrate additional types of genomics data, leading to further improvements in its performance.
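The abstract mentions an encoding of gene expression data that is then analyzed by deep neural networks. One way to picture such an encoding, offered as an assumption-based sketch rather than the exact CNNC preprocessing, is to turn a gene pair's expression across cells into a small image: a normalized two-dimensional histogram of log-scaled values, which a convolutional network could then classify (the network itself is not shown). The grid size and the log10(v + 1) scaling are illustrative choices, and the input values are hypothetical.

```python
from math import log10


def pair_histogram(x, y, bins=32):
    """Normalized joint histogram of two genes' expression across cells.

    Values are log-scaled with log10(v + 1) and binned on an equal-width grid;
    grid[i][j] is the fraction of cells whose gene-x value falls in bin i and
    gene-y value in bin j.
    """
    def bin_indices(values):
        logged = [log10(v + 1) for v in values]
        lo, hi = min(logged), max(logged)
        width = (hi - lo) / bins or 1.0  # constant input: put everything in bin 0
        return [min(int((v - lo) / width), bins - 1) for v in logged]

    grid = [[0.0] * bins for _ in range(bins)]
    for i, j in zip(bin_indices(x), bin_indices(y)):
        grid[i][j] += 1.0
    return [[count / len(x) for count in row] for row in grid]


# Hypothetical expression of two genes in six cells.
image = pair_histogram([0, 1, 3, 7, 15, 31], [0, 2, 3, 8, 14, 30], bins=4)
```

Encoding the pair as a fixed-size grid lets one network handle any number of cells, and co-expression shows up as mass concentrated along a diagonal of the image.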
13
Li J, Wang D, Wang Y. IBI: Identification of Biomarker Genes in Individual Tumor Samples. Front Genet 2019; 10:1236. [PMID: 31850079 PMCID: PMC6902017 DOI: 10.3389/fgene.2019.01236] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 10/17/2019] [Accepted: 11/07/2019] [Indexed: 12/12/2022]
Abstract
Individual patient biomarkers have an important role in personalized treatment. Although various high-throughput sequencing technologies are widely used in biological experiments, these are usually conducted only once or a few times for each patient, which makes it a challenging problem to identify biomarkers in individual patients. At present, there is a lack of effective methods to identify biomarkers in individual sample data. Here, we propose a novel method, IBI, to identify biomarkers in individual tumor samples. Experimental results from several tumor data sets showed that the proposed method could effectively find biomarker genes for individual patients, including common biomarkers related to the mechanisms of the development of cancer, which can be used to predict survival and drug response in patients. In summary, these results demonstrate that the proposed method offers a new perspective for analyzing individual samples.
Affiliation(s)
- Jie Li
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Dong Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Yadong Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
14
Schubert M, Colomé-Tatché M, Foijer F. Gene networks in cancer are biased by aneuploidies and sample impurities. Biochim Biophys Acta Gene Regul Mech 2019; 1863:194444. [PMID: 31654805 DOI: 10.1016/j.bbagrm.2019.194444] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/31/2019] [Revised: 09/05/2019] [Accepted: 10/14/2019] [Indexed: 12/14/2022]
Abstract
Gene regulatory network inference is a standard technique for obtaining structured regulatory information from, for instance, gene expression measurements. Methods performing this task have been extensively evaluated on synthetic and, to a lesser extent, real data sets. In contrast to these test evaluations, applications to gene expression data of human cancers are often limited by fewer samples and more potential regulatory links, and are biased by copy number aberrations as well as cell mixtures and sample impurities. Here, we take networks inferred from TCGA cohorts as an example to show that (1) transcription factor annotations are essential to obtain reliable networks, and (2) even for state-of-the-art methods, we expect that between 20% and 80% of edges are caused by copy number changes and cell mixtures rather than transcription factor regulation.
Affiliation(s)
- Michael Schubert
- European Research Institute for the Biology of Ageing, University of Groningen, University Medical Center Groningen, 9713 AV, Groningen, the Netherlands; Institute of Computational Biology, Helmholtz Zentrum München, Ingolstädter Landstr. 1, 85764 Neuherberg, Germany.
- Maria Colomé-Tatché
- European Research Institute for the Biology of Ageing, University of Groningen, University Medical Center Groningen, 9713 AV, Groningen, the Netherlands; Institute of Computational Biology, Helmholtz Zentrum München, Ingolstädter Landstr. 1, 85764 Neuherberg, Germany; TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
- Floris Foijer
- European Research Institute for the Biology of Ageing, University of Groningen, University Medical Center Groningen, 9713 AV, Groningen, the Netherlands.
15
Hamilton AR, Traniello IM, Ray AM, Caldwell AS, Wickline SA, Robinson GE. Division of labor in honey bees is associated with transcriptional regulatory plasticity in the brain. J Exp Biol 2019; 222:jeb200196. [PMID: 31138635 PMCID: PMC6679348 DOI: 10.1242/jeb.200196] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 01/22/2019] [Accepted: 05/16/2019] [Indexed: 12/19/2022]
Abstract
Studies in evolutionary and developmental biology show that relationships between transcription factors (TFs) and their target genes can be altered to result in novel regulatory relationships that generate phenotypic plasticity. We hypothesized that context-dependent shifts in the nervous system associated with behavior may also be linked to changes in TF-target relationships over physiological time scales. We tested this hypothesis using honey bee (Apis mellifera) division of labor as a model system by performing bioinformatic analyses of previously published brain transcriptomic profiles together with new RNAi and behavioral experiments. The bioinformatic analyses identified five TFs that exhibited strong signatures of regulatory plasticity as a function of division of labor. RNAi targeting of one of these TFs (broad complex) and a related TF that did not exhibit plasticity (fushi tarazu transcription factor 1) was administered in conjunction with automated analyses of foraging behavior in the field, laboratory assays of aggression and brood care behavior, and endocrine treatments. The results showed that changes in the regulatory relationships of these TFs were associated with behavioral state, social context and endocrine state. These findings provide the first empirical evidence that TF-target relationships in the brain are altered in conjunction with behavior and social context. They also suggest that one mechanism for this plasticity involves pleiotropic TFs high up in regulatory hierarchies producing behavior-specific transcriptional responses by activating different downstream TFs to induce discrete context-dependent transcriptional cascades. These findings provide new insights into the dynamic nature of the transcriptional regulatory architecture underlying behavior in the brain.
Affiliation(s)
- Adam R Hamilton
- Neuroscience Program, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Ian M Traniello
- Neuroscience Program, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Allyson M Ray
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Arminius S Caldwell
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Samuel A Wickline
- Department of Computation and Molecular Biophysics, School of Medicine, Washington University, St. Louis, MO 63110, USA
- Gene E Robinson
- Neuroscience Program, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Department of Entomology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
16
Pergola G, Di Carlo P, Jaffe AE, Papalino M, Chen Q, Hyde TM, Kleinman JE, Shin JH, Rampino A, Blasi G, Weinberger DR, Bertolino A. Prefrontal Coexpression of Schizophrenia Risk Genes Is Associated With Treatment Response in Patients. Biol Psychiatry 2019; 86:45-55. [PMID: 31126695 DOI: 10.1016/j.biopsych.2019.03.981] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 06/27/2018] [Revised: 03/13/2019] [Accepted: 03/14/2019] [Indexed: 10/27/2022]
Abstract
BACKGROUND Gene coexpression networks are relevant to functional and clinical translation of schizophrenia risk genes. We hypothesized that schizophrenia risk genes converge into coexpression pathways that may be associated with gene regulation mechanisms and with response to treatment in patients with schizophrenia. METHODS We identified gene coexpression networks in two prefrontal cortex postmortem RNA sequencing datasets (n = 688) and replicated them in four more datasets (n = 1295). We identified and replicated (p values < .001) a single module enriched for schizophrenia risk loci (13 risk genes in 10 loci). In silico screening of potential regulators of the schizophrenia risk module via bioinformatic analyses identified two transcription factors and three microRNAs associated with the risk module. To translate postmortem information into clinical phenotypes, we identified polymorphisms predicting coexpression and combined them to obtain an index approximating module coexpression (Polygenic Coexpression Index [PCI]). RESULTS The PCI-coexpression association was successfully replicated in two independent brain transcriptome datasets (n = 131; p values < .05). Finally, we tested the association between the PCI and short-term treatment response in two independent samples of patients with schizophrenia treated with olanzapine (n = 167). The PCI was associated with treatment response in the positive symptom domain in both clinical cohorts (p values < .05). CONCLUSIONS In summary, our findings in 1983 samples of human postmortem prefrontal cortex show that coexpression of a set of genes enriched for schizophrenia risk genes is relevant to treatment response. This coexpression pathway may be coregulated by transcription factors and microRNA associated with it.
Affiliation(s)
- Giulio Pergola
- Group of Psychiatric Neuroscience, Department of Basic Medical Sciences, Neuroscience and Sense Organs, University of Bari Aldo Moro, Bari, Italy; Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, Maryland.
- Pasquale Di Carlo
- Group of Psychiatric Neuroscience, Department of Basic Medical Sciences, Neuroscience and Sense Organs, University of Bari Aldo Moro, Bari, Italy; Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, Maryland
- Andrew E Jaffe
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, Maryland; Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland; Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland; Center for Computational Biology, Johns Hopkins University, Baltimore, Maryland
- Marco Papalino
- Group of Psychiatric Neuroscience, Department of Basic Medical Sciences, Neuroscience and Sense Organs, University of Bari Aldo Moro, Bari, Italy
- Qiang Chen
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, Maryland
- Thomas M Hyde
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, Maryland; Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, Maryland; Department of Psychiatry and Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, Maryland
- Joel E Kleinman
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, Maryland; Department of Psychiatry and Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, Maryland
- Joo Heon Shin
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, Maryland
- Antonio Rampino
- Group of Psychiatric Neuroscience, Department of Basic Medical Sciences, Neuroscience and Sense Organs, University of Bari Aldo Moro, Bari, Italy; Azienda Ospedaliero-Universitaria Consorziale Policlinico, Bari, Italy
- Giuseppe Blasi
- Group of Psychiatric Neuroscience, Department of Basic Medical Sciences, Neuroscience and Sense Organs, University of Bari Aldo Moro, Bari, Italy; Azienda Ospedaliero-Universitaria Consorziale Policlinico, Bari, Italy
- Daniel R Weinberger
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, Maryland; Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, Maryland; McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland
- Alessandro Bertolino
- Group of Psychiatric Neuroscience, Department of Basic Medical Sciences, Neuroscience and Sense Organs, University of Bari Aldo Moro, Bari, Italy; Azienda Ospedaliero-Universitaria Consorziale Policlinico, Bari, Italy.
17
Parsana P, Ruberman C, Jaffe AE, Schatz MC, Battle A, Leek JT. Addressing confounding artifacts in reconstruction of gene co-expression networks. Genome Biol 2019; 20:94. [PMID: 31097038 PMCID: PMC6521369 DOI: 10.1186/s13059-019-1700-9] [Citation(s) in RCA: 51] [Impact Index Per Article: 10.2] [Received: 03/05/2019] [Accepted: 04/24/2019] [Indexed: 11/12/2022]
Abstract
Gene co-expression networks capture biological relationships between genes and are important tools in predicting gene function and understanding disease mechanisms. We show that technical and biological artifacts in gene expression data confound commonly used network reconstruction algorithms. We demonstrate theoretically, in simulation, and empirically, that principal component correction of gene expression measurements prior to network inference can reduce false discoveries. Using data from the GTEx project in multiple tissues, we show that this approach reduces false discoveries beyond correcting only for known confounders.
Affiliation(s)
- Princy Parsana
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Claire Ruberman
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Andrew E Jaffe
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, USA
- Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Michael C Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
- Alexis Battle
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Jeffrey T Leek
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
18
Boukas L, Havrilla JM, Hickey PF, Quinlan AR, Bjornsson HT, Hansen KD. Coexpression patterns define epigenetic regulators associated with neurological dysfunction. Genome Res 2019; 29:532-542. [PMID: 30858344 PMCID: PMC6442390 DOI: 10.1101/gr.239442.118] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 05/14/2018] [Accepted: 02/07/2019] [Indexed: 01/12/2023]
Abstract
Coding variants in epigenetic regulators are emerging as causes of neurological dysfunction and cancer. However, a comprehensive effort to identify disease candidates within the human epigenetic machinery (EM) has not been performed; it is unclear whether features exist that distinguish between variation-intolerant and variation-tolerant EM genes, and between EM genes associated with neurological dysfunction versus cancer. Here, we rigorously define 295 genes with a direct role in epigenetic regulation (writers, erasers, remodelers, readers). Systematic exploration of these genes reveals that although individual enzymatic functions are always mutually exclusive, readers often also exhibit enzymatic activity (dual-function EM genes). We find that the majority of EM genes are very intolerant to loss-of-function variation, even when compared to the dosage sensitive transcription factors, and we identify 102 novel EM disease candidates. We show that this variation intolerance is driven by the protein domains encoding the epigenetic function, suggesting that disease is caused by a perturbed chromatin state. We then describe a large subset of EM genes that are coexpressed within multiple tissues. This subset is almost exclusively populated by extremely variation-intolerant genes and shows enrichment for dual-function EM genes. It is also highly enriched for genes associated with neurological dysfunction, even when accounting for dosage sensitivity, but not for cancer-associated EM genes. Finally, we show that regulatory regions near epigenetic regulators are genetically important for common neurological traits. These findings prioritize novel disease candidate EM genes and suggest that this coexpression plays a functional role in normal neurological homeostasis.
Affiliation(s)
- Leandros Boukas
- Human Genetics Training Program, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205, USA
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205, USA
- James M Havrilla
- Department of Human Genetics, University of Utah, Salt Lake City, Utah 84112, USA
- Peter F Hickey
- Molecular Medicine Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria 3052, Australia
- Department of Medical Biology, The University of Melbourne, Parkville, Victoria 3010, Australia
- Aaron R Quinlan
- Department of Human Genetics, University of Utah, Salt Lake City, Utah 84112, USA
- Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah 84108, USA
- USTAR Center for Genetic Discovery, University of Utah, Salt Lake City, Utah 84108, USA
- Hans T Bjornsson
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205, USA
- Department of Pediatrics, Johns Hopkins University School of Medicine, Baltimore, Maryland 21287, USA
- Faculty of Medicine, University of Iceland, 101 Reykjavík, Iceland
- Landspitali University Hospital, 101 Reykjavík, Iceland
- Kasper D Hansen
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205, USA
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland 21205, USA
19
Khoo SK, Read J, Franks K, Zhang G, Bizzintino J, Coleman L, McCrae C, Öberg L, Troy NM, Prastanti F, Everard J, Oo S, Borland ML, Maciewicz RA, Le Souëf PN, Laing IA, Bosco A. Upper Airway Cell Transcriptomics Identify a Major New Immunological Phenotype with Strong Clinical Correlates in Young Children with Acute Wheezing. J Immunol 2019; 202:1845-1858. [PMID: 30745463 DOI: 10.4049/jimmunol.1800178] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 02/07/2018] [Accepted: 01/08/2019] [Indexed: 01/10/2023]
Abstract
Asthma exacerbations are triggered by rhinovirus infections. We employed a systems biology approach to delineate upper-airway gene network patterns underlying asthma exacerbation phenotypes in children. Cluster analysis unveiled distinct IRF7hi versus IRF7lo molecular phenotypes, the former exhibiting robust upregulation of Th1/type I IFN responses and the latter an alternative signature marked by upregulation of cytokine and growth factor signaling and downregulation of IFN-γ. The two phenotypes also produced distinct clinical phenotypes. For IRF7lo children, the duration of symptoms prior to hospital presentation was more than twice as long when measured from initial symptoms (p = 0.011) and nearly three times as long for cough (p < 0.001), the odds ratio of admission to hospital was increased more than 4-fold (p = 0.018), and time to recurrence was shorter (p = 0.015). In summary, our findings demonstrate that asthma exacerbations in children can be divided into IRF7hi versus IRF7lo phenotypes with associated differences in clinical phenotypes.
Affiliation(s)
- Siew-Kim Khoo
- Division of Cardiovascular and Respiratory Sciences, The University of Western Australia, Perth, Western Australia 6009, Australia; Telethon Kids Institute, The University of Western Australia, Perth, Western Australia 6008, Australia
- James Read
- Telethon Kids Institute, The University of Western Australia, Perth, Western Australia 6008, Australia
- Kimberley Franks
- Division of Cardiovascular and Respiratory Sciences, The University of Western Australia, Perth, Western Australia 6009, Australia; Telethon Kids Institute, The University of Western Australia, Perth, Western Australia 6008, Australia
- Guicheng Zhang
- Division of Cardiovascular and Respiratory Sciences, The University of Western Australia, Perth, Western Australia 6009, Australia; School of Public Health, Curtin University, Perth, Western Australia 6102, Australia; Centre for Genetic Origins of Health and Disease, The University of Western Australia, Perth, Western Australia 6009, Australia and Curtin University, Perth, Western Australia 6102, Australia
- Joelene Bizzintino
- Division of Cardiovascular and Respiratory Sciences, The University of Western Australia, Perth, Western Australia 6009, Australia; Telethon Kids Institute, The University of Western Australia, Perth, Western Australia 6008, Australia
- Laura Coleman
- Division of Cardiovascular and Respiratory Sciences, The University of Western Australia, Perth, Western Australia 6009, Australia; Telethon Kids Institute, The University of Western Australia, Perth, Western Australia 6008, Australia
- Christopher McCrae
- Respiratory Inflammation and Autoimmunity, IMED Biotech Unit, AstraZeneca, Gothenburg, 431 53 Mölndal, Sweden
- Lisa Öberg
- Respiratory Inflammation and Autoimmunity, IMED Biotech Unit, AstraZeneca, Gothenburg, 431 53 Mölndal, Sweden
- Niamh M Troy
- Telethon Kids Institute, The University of Western Australia, Perth, Western Australia 6008, Australia
- Franciska Prastanti
- Division of Cardiovascular and Respiratory Sciences, The University of Western Australia, Perth, Western Australia 6009, Australia; Telethon Kids Institute, The University of Western Australia, Perth, Western Australia 6008, Australia
- Janet Everard
- Division of Cardiovascular and Respiratory Sciences, The University of Western Australia, Perth, Western Australia 6009, Australia; Telethon Kids Institute, The University of Western Australia, Perth, Western Australia 6008, Australia
- Stephen Oo
- Division of Paediatrics, School of Medicine, The University of Western Australia, Perth, Western Australia 6009, Australia
- Meredith L Borland
- Division of Paediatrics, School of Medicine, The University of Western Australia, Perth, Western Australia 6009, Australia; Perth Children's Hospital, Perth, Western Australia 6009, Australia; Division of Emergency Medicine, School of Medicine, The University of Western Australia, Perth, Western Australia 6009, Australia
- Rose A Maciewicz
- Respiratory Inflammation and Autoimmunity, IMED Biotech Unit, AstraZeneca, Gothenburg, 431 53 Mölndal, Sweden
- Peter N Le Souëf
- Telethon Kids Institute, The University of Western Australia, Perth, Western Australia 6008, Australia; Division of Paediatrics, School of Medicine, The University of Western Australia, Perth, Western Australia 6009, Australia
- Ingrid A Laing
- Division of Cardiovascular and Respiratory Sciences, The University of Western Australia, Perth, Western Australia 6009, Australia; Telethon Kids Institute, The University of Western Australia, Perth, Western Australia 6008, Australia
- Anthony Bosco
- Telethon Kids Institute, The University of Western Australia, Perth, Western Australia 6008, Australia
20
Yue Z, Neylon MT, Nguyen T, Ratliff T, Chen JY. "Super Gene Set" Causal Relationship Discovery from Functional Genomics Data. IEEE/ACM Trans Comput Biol Bioinform 2018; 15:1991-1998. [PMID: 30040650 PMCID: PMC6380687 DOI: 10.1109/tcbb.2018.2858755] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 06/08/2023]
Abstract
In this article, we present a computational framework to identify "causal relationships" among super gene sets. By "causal relationships," we refer to both stimulatory and inhibitory regulatory relationships, whether through direct or indirect mechanisms. By super gene sets, we refer to "pathways, annotated lists, and gene signatures," or PAGs. To identify causal relationships among PAGs, we extend previous work on identifying PAG-to-PAG regulatory relationships by further requiring them to be significantly enriched with gene-to-gene co-expression pairs across the two PAGs involved. This is achieved by developing a quantitative metric based on PAG-to-PAG Co-expressions (PPC), which we use to infer the likelihood that the PAG-to-PAG relationships under examination are causal, either stimulatory or inhibitory. Since true causal relationships are unknown, we approximate the overall performance of causal relationship inference by the performance of recalling known r-type PAG-to-PAG relationships from causal PAG-to-PAG inference, using a functional genomics benchmark dataset from the GEO database. We report an area under the curve (AUC) of 0.81 for both precision and recall. By applying our framework to a myeloid-derived suppressor cell (MDSC) dataset, we further demonstrate that this framework is effective in helping build multi-scale biomolecular systems models with new insights into regulatory and causal links for downstream biological interpretation.
Affiliation(s)
- Zongliang Yue
- Informatics Institute, the University of Alabama at Birmingham, Birmingham, AL 35233, USA
- Michael T. Neylon
- School of Informatics and Computing, Indiana University, Indianapolis, IN 46202, USA
- Thanh Nguyen
- Informatics Institute, the University of Alabama at Birmingham, Birmingham, AL 35233, USA
- Timothy Ratliff
- Purdue University Center for Cancer Research, West Lafayette, IN 47906, USA
- Jake Y. Chen
- Informatics Institute, the University of Alabama at Birmingham, Birmingham, AL 35233, USA
21
Volpato V, Smith J, Sandor C, Ried JS, Baud A, Handel A, Newey SE, Wessely F, Attar M, Whiteley E, Chintawar S, Verheyen A, Barta T, Lako M, Armstrong L, Muschet C, Artati A, Cusulin C, Christensen K, Patsch C, Sharma E, Nicod J, Brownjohn P, Stubbs V, Heywood WE, Gissen P, De Filippis R, Janssen K, Reinhardt P, Adamski J, Royaux I, Peeters PJ, Terstappen GC, Graf M, Livesey FJ, Akerman CJ, Mills K, Bowden R, Nicholson G, Webber C, Cader MZ, Lakics V. Reproducibility of Molecular Phenotypes after Long-Term Differentiation to Human iPSC-Derived Neurons: A Multi-Site Omics Study. Stem Cell Reports 2018; 11:897-911. [PMID: 30245212 PMCID: PMC6178242 DOI: 10.1016/j.stemcr.2018.08.013] [Citation(s) in RCA: 109] [Impact Index Per Article: 18.2] [Received: 04/24/2018] [Revised: 08/21/2018] [Accepted: 08/21/2018] [Indexed: 12/30/2022]
Abstract
Reproducibility in molecular and cellular studies is fundamental to scientific discovery. To establish the reproducibility of a well-defined long-term neuronal differentiation protocol, we repeated the cellular and molecular comparison of the same two iPSC lines across five distinct laboratories. Despite uncovering acceptable variability within individual laboratories, we detect poor cross-site reproducibility of the differential gene expression signature between these two lines. Factor analysis identifies the laboratory as the largest source of variation, along with several variation-inflating confounders such as passaging effects and progenitor storage. Single-cell transcriptomics shows substantial cellular heterogeneity that underlies inter-laboratory variability and is responsible for biases in differential gene expression inference. Factor analysis-based normalization of the combined dataset can remove the nuisance technical effects, enabling robust hypothesis-generating studies. Our study shows that multi-center collaborations can expose systematic biases and identify critical factors to be standardized when publishing novel protocols, contributing to increased cross-site reproducibility.
Affiliation(s)
- Viola Volpato
- Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford OX1 3PT UK
| | - James Smith
- Gurdon Institute, University of Cambridge, Cambridge CB2 1QN, UK
| | - Cynthia Sandor
- Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford OX1 3PT UK
| | - Janina S Ried
- Neuroscience Discovery, Biology Department, AbbVie Deutschland GmbH & Co. KG, Ludwigshafen, Germany
| | - Anna Baud
- Centre for Translational Omics, UCL Great Ormond Street Institute of Child Health, London WC1N 1EH, UK
- Adam Handel
- Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford OX1 3PT, UK
- Sarah E Newey
- Department of Pharmacology, University of Oxford, Mansfield Road, Oxford OX1 3QT, UK
- Frank Wessely
- Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford OX1 3PT, UK
- Moustafa Attar
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, OX3 7BN, UK
- Emma Whiteley
- Department of Pharmacology, University of Oxford, Mansfield Road, Oxford OX1 3QT, UK
- Satyan Chintawar
- Nuffield Department of Clinical Neuroscience, University of Oxford, Oxford OX3 9DU, UK
- An Verheyen
- Janssen Research and Development, Beerse 2340, Belgium
- Thomas Barta
- Institute of Genetic Medicine, Newcastle University, Newcastle NE1 3BZ, UK
- Majlinda Lako
- Institute of Genetic Medicine, Newcastle University, Newcastle NE1 3BZ, UK
- Lyle Armstrong
- Institute of Genetic Medicine, Newcastle University, Newcastle NE1 3BZ, UK
- Caroline Muschet
- Helmholtz Zentrum München, Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH), Neuherberg 85764, Germany
- Anna Artati
- Helmholtz Zentrum München, Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH), Neuherberg 85764, Germany
- Carlo Cusulin
- Roche Pharmaceutical Research and Early Development, Roche Innovation Center Basel, 4070 Basel, Switzerland
- Klaus Christensen
- Roche Pharmaceutical Research and Early Development, Roche Innovation Center Basel, 4070 Basel, Switzerland
- Christoph Patsch
- Roche Pharmaceutical Research and Early Development, Roche Innovation Center Basel, 4070 Basel, Switzerland
- Eshita Sharma
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, OX3 7BN, UK
- Jerome Nicod
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, OX3 7BN, UK
- Philip Brownjohn
- Gurdon Institute, University of Cambridge, Cambridge CB2 1QN, UK
- Victoria Stubbs
- Gurdon Institute, University of Cambridge, Cambridge CB2 1QN, UK
- Wendy E Heywood
- Centre for Translational Omics, UCL Great Ormond Street Institute of Child Health, London WC1N 1EH, UK
- Paul Gissen
- MRC Laboratory for Molecular Cell Biology, University College London, London WC1E 6BT, UK
- Roberta De Filippis
- Neuroscience Discovery, Biology Department, AbbVie Deutschland GmbH & Co. KG, Ludwigshafen, Germany
- Katharina Janssen
- Neuroscience Discovery, Biology Department, AbbVie Deutschland GmbH & Co. KG, Ludwigshafen, Germany
- Peter Reinhardt
- Neuroscience Discovery, Biology Department, AbbVie Deutschland GmbH & Co. KG, Ludwigshafen, Germany
- Jerzy Adamski
- Helmholtz Zentrum München, Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH), Neuherberg 85764, Germany
- Ines Royaux
- Janssen Research and Development, Beerse 2340, Belgium
- Georg C Terstappen
- Neuroscience Discovery, Biology Department, AbbVie Deutschland GmbH & Co. KG, Ludwigshafen, Germany
- Martin Graf
- Roche Pharmaceutical Research and Early Development, Roche Innovation Center Basel, 4070 Basel, Switzerland
- Colin J Akerman
- Department of Pharmacology, University of Oxford, Mansfield Road, Oxford OX1 3QT, UK
- Kevin Mills
- Centre for Translational Omics, UCL Great Ormond Street Institute of Child Health, London WC1N 1EH, UK
- Rory Bowden
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, OX3 7BN, UK
- George Nicholson
- Department of Statistics, University of Oxford, Oxford OX1 3LB, UK
- Caleb Webber
- Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford OX1 3PT, UK
- M Zameel Cader
- Nuffield Department of Clinical Neuroscience, University of Oxford, Oxford OX3 9DU, UK
- Viktor Lakics
- Neuroscience Discovery, Biology Department, AbbVie Deutschland GmbH & Co. KG, Ludwigshafen, Germany
22
Fazio L, Pergola G, Papalino M, Di Carlo P, Monda A, Gelao B, Amoroso N, Tangaro S, Rampino A, Popolizio T, Bertolino A, Blasi G. Transcriptomic context of DRD1 is associated with prefrontal activity and behavior during working memory. Proc Natl Acad Sci U S A 2018; 115:5582-5587. [PMID: 29735686 PMCID: PMC6003490 DOI: 10.1073/pnas.1717135115] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 02/04/2023]
Abstract
Dopamine D1 receptor (D1R) signaling shapes prefrontal cortex (PFC) activity during working memory (WM). Previous reports found higher WM performance associated with alleles linked to greater expression of the gene coding for D1Rs (DRD1). However, there is no evidence regarding the relationship between genetic modulation of DRD1 expression in PFC and patterns of prefrontal activity during WM. Furthermore, previous studies have not considered that D1Rs are part of a coregulated molecular environment, which may contribute to D1R-related prefrontal WM processing. Thus, we hypothesized a reciprocal link between a coregulated (i.e., coexpressed) molecular network including DRD1 and PFC activity. To explore this relationship, we used three independent postmortem prefrontal mRNA datasets (total n = 404) to characterize a coexpression network including DRD1. Then, we indexed network coexpression using a measure (polygenic coexpression index, DRD1-PCI) combining the effects of single nucleotide polymorphisms (SNPs) on coexpression. Finally, we associated the DRD1-PCI with WM performance and related brain activity in independent samples of healthy participants (total n = 371). We identified and replicated a coexpression network including DRD1, whose coexpression was correlated with DRD1-PCI. We also found that DRD1-PCI was associated with lower PFC activity and higher WM performance. Behavioral and imaging results were replicated in independent samples. These findings suggest that genetically predicted expression of DRD1 and of its coexpression partners stratifies healthy individuals in terms of WM performance and related prefrontal activity. They also highlight genes and SNPs potentially relevant to pharmacological trials aimed at testing cognitive enhancers that modulate DRD1 signaling.
Affiliation(s)
- Leonardo Fazio
- Department of Basic Medical Science, Neuroscience, and Sense Organs, University of Bari Aldo Moro, 70124 Bari, Italy
- Sezione di Neuroradiologia, Istituto di Ricovero e Cura a Carattere Scientifico "Casa Sollievo della Sofferenza," 71013 San Giovanni Rotondo, Italy
- Contributed Equally
- Giulio Pergola
- Department of Basic Medical Science, Neuroscience, and Sense Organs, University of Bari Aldo Moro, 70124 Bari, Italy
- Contributed Equally
- Marco Papalino
- Department of Basic Medical Science, Neuroscience, and Sense Organs, University of Bari Aldo Moro, 70124 Bari, Italy
- Pasquale Di Carlo
- Department of Basic Medical Science, Neuroscience, and Sense Organs, University of Bari Aldo Moro, 70124 Bari, Italy
- Anna Monda
- Department of Basic Medical Science, Neuroscience, and Sense Organs, University of Bari Aldo Moro, 70124 Bari, Italy
- Barbara Gelao
- Department of Basic Medical Science, Neuroscience, and Sense Organs, University of Bari Aldo Moro, 70124 Bari, Italy
- Nicola Amoroso
- Dipartimento Interateneo di Fisica "M. Merlin," Università degli Studi di Bari Aldo Moro, 70125 Bari, Italy
- Sezione di Bari, Istituto Nazionale di Fisica Nucleare, 70125 Bari, Italy
- Sabina Tangaro
- Sezione di Bari, Istituto Nazionale di Fisica Nucleare, 70125 Bari, Italy
- Antonio Rampino
- Department of Basic Medical Science, Neuroscience, and Sense Organs, University of Bari Aldo Moro, 70124 Bari, Italy
- Institute of Psychiatry, Bari University Hospital, 70124 Bari, Italy
- Teresa Popolizio
- Sezione di Neuroradiologia, Istituto di Ricovero e Cura a Carattere Scientifico "Casa Sollievo della Sofferenza," 71013 San Giovanni Rotondo, Italy
- Alessandro Bertolino
- Department of Basic Medical Science, Neuroscience, and Sense Organs, University of Bari Aldo Moro, 70124 Bari, Italy
- Institute of Psychiatry, Bari University Hospital, 70124 Bari, Italy
- Giuseppe Blasi
- Institute of Psychiatry, Bari University Hospital, 70124 Bari, Italy
- Department of Basic Medical Science, Neuroscience, and Sense Organs, University of Bari Aldo Moro, 70124 Bari, Italy
23
A complex network approach reveals a pivotal substructure of genes linked to schizophrenia. PLoS One 2018; 13:e0190110. [PMID: 29304112 PMCID: PMC5755767 DOI: 10.1371/journal.pone.0190110] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 07/03/2017] [Accepted: 12/10/2017] [Indexed: 12/22/2022]
Abstract
Research on brain disorders with a strong genetic component and complex heritability, such as schizophrenia, has led to the development of brain transcriptomics. This field seeks a deeper understanding of gene expression, a key factor in addressing further research questions. Our study focused on how genes are associated with each other. From this perspective, we developed a novel data-driven strategy for characterizing genetic modules, i.e., clusters of strongly interacting genes. The aim was to uncover a pivotal community of genes linked to a target gene for schizophrenia. Our approach combined network topological properties with information theory to highlight the presence of a pivotal community for a specific gene and, simultaneously, to assess the information content of partitions with the Shannon entropy based on betweenness. We analyzed the publicly available BrainCloud dataset, which contains post-mortem gene expression data, and focused on the dopamine D2 receptor, encoded by the DRD2 gene. We used four different community detection algorithms to evaluate the consistency of our approach. A pivotal DRD2 community emerged for all the procedures applied, with a considerable reduction in size compared to the initial network. The stability of the results was confirmed by a Dice index ≥80% within a range of tested parameters. The detected community was also the most informative, as it represented an optimization of the Shannon entropy. Lastly, we verified the strength of connection of the DRD2 community, which was stronger than that of any randomly selected community and stronger still than that of the Weighted Gene Co-expression Network Analysis (WGCNA) module, commonly considered the standard approach for such studies. This finding supports the conclusion that the detected community is a more connected and informative cluster of genes for DRD2, and therefore better elucidates the behavior of this module of strongly related genes. Because DRD2 plays a relevant role in schizophrenia, this more specific DRD2 community will improve the understanding of the genetic factors related to this disorder.
24
Lönnstedt IM, Nelander S. FC1000: normalized gene expression changes of systematically perturbed human cells. Stat Appl Genet Mol Biol 2017; 16:217-242. [PMID: 28862994 DOI: 10.1515/sagmb-2016-0072] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/11/2022]
Abstract
The systematic study of transcriptional responses to genetic and chemical perturbations in human cells is still in its early stages. The largest available dataset to date is the newly released L1000 compendium. With its 1.3 million gene expression profiles of treated human cells, it offers many opportunities for biomedical data mining, but also poses data normalization challenges on a new scale. We developed a novel and practical approach to obtain accurate estimates of fold-change response profiles from L1000, based on the RUV (Remove Unwanted Variation) statistical framework. Extending RUV to a big-data setting, we propose an estimation procedure in which an underlying RUV model is tuned by feedback from dataset-specific statistical measures reflecting p-value distributions and internal gene knockdown controls. Applying these metrics, termed evaluation endpoints, to disjoint data splits and integrating the results to select an optimal normalization, the procedure reduces bias and noise in the L1000 data, which in turn broadens the potential of this resource for pharmacological and functional genomic analyses. Our pipeline and normalization results are distributed as an R package (nelanderlab.org/FC1000.html).
25
Myers CT, Stong N, Mountier EI, Helbig KL, Freytag S, Sullivan JE, Ben Zeev B, Nissenkorn A, Tzadok M, Heimer G, Shinde DN, Rezazadeh A, Regan BM, Oliver KL, Ernst ME, Lippa NC, Mulhern MS, Ren Z, Poduri A, Andrade DM, Bird LM, Bahlo M, Berkovic SF, Lowenstein DH, Scheffer IE, Sadleir LG, Goldstein DB, Mefford HC, Heinzen EL. De Novo Mutations in PPP3CA Cause Severe Neurodevelopmental Disease with Seizures. Am J Hum Genet 2017; 101:516-524. [PMID: 28942967 DOI: 10.1016/j.ajhg.2017.08.013] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.0] [Received: 06/21/2017] [Accepted: 08/10/2017] [Indexed: 12/30/2022]
Abstract
Exome sequencing has readily enabled the discovery of the genetic mutations responsible for a wide range of diseases. This success has been particularly remarkable in the severe epilepsies and other neurodevelopmental diseases, for which rare, often de novo, mutations play a significant role in disease risk. Despite significant progress, the high genetic heterogeneity of these disorders often requires large sample sizes to identify a critical mass of individuals with disease-causing mutations in a single gene. By pooling genetic findings across multiple studies, we have identified six individuals with severe developmental delay (6/6), refractory seizures (5/6), and similar dysmorphic features (3/6), each harboring a de novo mutation in PPP3CA. PPP3CA encodes the alpha isoform of a subunit of calcineurin. Calcineurin is a calcium- and calmodulin-dependent serine/threonine protein phosphatase that plays a role in a wide range of biological processes, including acting as a key regulator of synaptic vesicle recycling at nerve terminals. Five individuals with de novo PPP3CA mutations were identified among 4,760 trio probands with neurodevelopmental diseases; this is highly unlikely to occur by chance (p = 1.2 × 10⁻⁸) given the size and mutability of the gene. Additionally, a sixth individual with a de novo mutation in PPP3CA was connected to this study through GeneMatcher. Based on these findings, we securely implicate PPP3CA in early-onset refractory epilepsy and further support the emerging role of synaptic dysregulation in epilepsy.
Affiliation(s)
- Candace T Myers
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA 98195, USA
- Nicholas Stong
- Institute for Genomic Medicine, Columbia University Medical Center, New York, NY 10032, USA
- Emily I Mountier
- Department of Paediatrics and Child Health, University of Otago, Wellington 6242, New Zealand
- Saskia Freytag
- The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia; Department of Medical Biology, The University of Melbourne, Parkville, VIC 3050, Australia
- Joseph E Sullivan
- Department of Neurology & Pediatrics, University of California, San Francisco, San Francisco, CA 94143, USA
- Bruria Ben Zeev
- Sheba Medical Center, Ramat Gan, Israel, Sackler School of Medicine, Tel Aviv University, Tel Aviv 6997801, Israel
- Andreea Nissenkorn
- Sheba Medical Center, Ramat Gan, Israel, Sackler School of Medicine, Tel Aviv University, Tel Aviv 6997801, Israel
- Michal Tzadok
- Sheba Medical Center, Ramat Gan, Israel, Sackler School of Medicine, Tel Aviv University, Tel Aviv 6997801, Israel
- Gali Heimer
- Sheba Medical Center, Ramat Gan, Israel, Sackler School of Medicine, Tel Aviv University, Tel Aviv 6997801, Israel
- Arezoo Rezazadeh
- Division of Neurology, Epilepsy Genetics Research Program, Toronto Western Hospital, Krembil Neuroscience Centre, University of Toronto, Toronto, ON M5T 2S8, Canada
- Brigid M Regan
- Division of Neurology, Epilepsy Genetics Research Program, Toronto Western Hospital, Krembil Neuroscience Centre, University of Toronto, Toronto, ON M5T 2S8, Canada
- Karen L Oliver
- The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia; Epilepsy Research Centre, Department of Medicine, Austin Health, The University of Melbourne, Heidelberg, VIC 3084, Australia
- Michelle E Ernst
- Institute for Genomic Medicine, Columbia University Medical Center, New York, NY 10032, USA
- Natalie C Lippa
- Institute for Genomic Medicine, Columbia University Medical Center, New York, NY 10032, USA
- Maureen S Mulhern
- Institute for Genomic Medicine, Columbia University Medical Center, New York, NY 10032, USA
- Zhong Ren
- Institute for Genomic Medicine, Columbia University Medical Center, New York, NY 10032, USA
- Annapurna Poduri
- Epilepsy Genetics Program, Department of Neurology, Boston Children's Hospital and Department of Neurology, Harvard Medical School, Boston, MA 02115, USA
- Danielle M Andrade
- Division of Neurology, Epilepsy Genetics Research Program, Toronto Western Hospital, Krembil Neuroscience Centre, University of Toronto, Toronto, ON M5T 2S8, Canada
- Lynne M Bird
- Department of Pediatrics, University of California, San Diego, San Diego, CA 92037, USA; Rady Children's Hospital, San Diego, CA 92037, USA
- Melanie Bahlo
- The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia; Department of Medical Biology, The University of Melbourne, Parkville, VIC 3050, Australia
- Samuel F Berkovic
- Epilepsy Research Centre, Department of Medicine, Austin Health, The University of Melbourne, Heidelberg, VIC 3084, Australia
- Daniel H Lowenstein
- Department of Neurology, University of California, San Francisco, San Francisco, CA 94143, USA
- Ingrid E Scheffer
- Epilepsy Research Centre, Department of Medicine, Austin Health, The University of Melbourne, Heidelberg, VIC 3084, Australia; Florey Institute of Neuroscience and Mental Health, The University of Melbourne, Parkville, VIC 3010, Australia; Department of Paediatrics, Royal Children's Hospital, The University of Melbourne, Parkville, VIC 3050, Australia
- Lynette G Sadleir
- Department of Paediatrics and Child Health, University of Otago, Wellington 6242, New Zealand
- David B Goldstein
- Institute for Genomic Medicine, Columbia University Medical Center, New York, NY 10032, USA
- Heather C Mefford
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA 98195, USA
- Erin L Heinzen
- Institute for Genomic Medicine, Columbia University Medical Center, New York, NY 10032, USA; Department of Pathology and Cell Biology, Columbia University Medical Center, New York, NY 10032, USA
26
Ficklin SP, Dunwoodie LJ, Poehlman WL, Watson C, Roche KE, Feltus FA. Discovering Condition-Specific Gene Co-Expression Patterns Using Gaussian Mixture Models: A Cancer Case Study. Sci Rep 2017; 7:8617. [PMID: 28819158 PMCID: PMC5561081 DOI: 10.1038/s41598-017-09094-4] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 04/12/2017] [Accepted: 07/21/2017] [Indexed: 01/10/2023]
Abstract
A gene co-expression network (GCN) describes associations between genes and points to genetic coordination of biochemical pathways. However, genetic correlations in a GCN are only detectable if they are present in the sampled conditions. With the increasing quantity of gene expression samples available in public repositories, there is greater potential for discovering genetic correlations across a variety of biologically interesting conditions. However, even when gene correlations are present, their discovery can be masked by noise. Noise is introduced by natural variation (intrinsic and extrinsic), systematic variation (caused by sample measurement protocols and instruments), and algorithmic and statistical variation created by the choice of data processing tools. A variety of published studies, approaches, and methods attempt to address each of these sources of variation to reduce noise. Here we describe an approach using Gaussian Mixture Models (GMMs) to address natural extrinsic (condition-specific) variation during network construction from mixed input conditions. To demonstrate its utility, we build and analyze a condition-annotated GCN from a compendium of 2,016 mixed gene expression data sets from five tumor subtypes obtained from The Cancer Genome Atlas. Our results show that GMMs help discover tumor-subtype-specific gene co-expression patterns (modules) that are significantly enriched for clinical attributes.
Affiliation(s)
- Stephen P Ficklin
- Department of Horticulture, Washington State University, Pullman, WA, 99164, USA
- Leland J Dunwoodie
- Department of Genetics & Biochemistry, Clemson University, Clemson, SC, 29631, USA
- William L Poehlman
- Department of Genetics & Biochemistry, Clemson University, Clemson, SC, 29631, USA
- Christopher Watson
- Molecular Plant Sciences Program, Washington State University, Pullman, WA, 99164, USA
- Kimberly E Roche
- Department of Genetics & Biochemistry, Clemson University, Clemson, SC, 29631, USA
- F Alex Feltus
- Department of Genetics & Biochemistry, Clemson University, Clemson, SC, 29631, USA
27
Freytag S, Burgess R, Oliver KL, Bahlo M. brain-coX: investigating and visualising gene co-expression in seven human brain transcriptomic datasets. Genome Med 2017; 9:55. [PMID: 28595657 PMCID: PMC5465565 DOI: 10.1186/s13073-017-0444-y] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 12/21/2016] [Accepted: 05/26/2017] [Indexed: 12/17/2022]
Abstract
Background: The pathogenesis of neurological and mental health disorders often involves multiple genes, complex interactions, and brain- and development-specific biological mechanisms. These characteristics make identifying disease genes for such disorders challenging, as conventional prioritisation tools are not specifically tailored to deal with the complexity of the human brain. Thus, we developed a novel web application, brain-coX, that offers gene prioritisation with accompanying visualisations based on seven gene expression datasets from the post-mortem human brain, the largest such resource ever assembled. Results: We tested whether our tool can correctly prioritise known genes from 37 brain-specific KEGG pathways and 17 psychiatric conditions. We achieved an average sensitivity of nearly 50% while reaching a specificity of approximately 75%. We also compared brain-coX's performance to that of its main competitors, Endeavour and ToppGene, focusing on the ability to discover novel associations. Using a subset of the curated SFARI autism gene collection, we show that brain-coX's prioritisations are the most similar to SFARI's own curated gene classifications. Conclusions: brain-coX is the first prioritisation and visualisation web tool targeted to the human brain and can be freely accessed via http://shiny.bioinf.wehi.edu.au/freytag.s/.
Affiliation(s)
- Saskia Freytag
- Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, 3052, Parkville, Australia; Department of Medical Biology, University of Melbourne, 1G Royal Parade, 3052, Parkville, Australia
- Rosemary Burgess
- Epilepsy Research Centre, Department of Medicine, Austin Health, University of Melbourne, 245 Burgundy Street, 3084, Heidelberg, Australia
- Karen L Oliver
- Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, 3052, Parkville, Australia; Epilepsy Research Centre, Department of Medicine, Austin Health, University of Melbourne, 245 Burgundy Street, 3084, Heidelberg, Australia
- Melanie Bahlo
- Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, 3052, Parkville, Australia; Department of Medical Biology, University of Melbourne, 1G Royal Parade, 3052, Parkville, Australia; School of Mathematics and Statistics, University of Melbourne, 3010, Parkville, Australia
28
Henden L, Freytag S, Afawi Z, Baldassari S, Berkovic SF, Bisulli F, Canafoglia L, Casari G, Crompton DE, Depienne C, Gecz J, Guerrini R, Helbig I, Hirsch E, Keren B, Klein KM, Labauge P, LeGuern E, Licchetta L, Mei D, Nava C, Pippucci T, Rudolf G, Scheffer IE, Striano P, Tinuper P, Zara F, Corbett M, Bahlo M. Identity by descent fine mapping of familial adult myoclonus epilepsy (FAME) to 2p11.2-2q11.2. Hum Genet 2016; 135:1117-25. [PMID: 27368338 DOI: 10.1007/s00439-016-1700-8] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 04/18/2016] [Accepted: 06/21/2016] [Indexed: 02/03/2023]
Abstract
Familial adult myoclonus epilepsy (FAME) is a rare autosomal dominant disorder characterized by adult onset, involuntary muscle jerks, cortical myoclonus, and occasional seizures. FAME is genetically heterogeneous, with more than 70 families reported worldwide and five potential disease loci. Efforts to identify potential causal variants have been unsuccessful in all but three families. To date, linkage analysis has been the main approach to find and narrow FAME critical regions. We propose an alternative method, pedigree-free identity-by-descent (IBD) mapping, which infers regions of the genome that individuals have inherited from a common ancestor. IBD mapping provides an alternative to linkage analysis in the presence of allelic and locus heterogeneity by detecting clusters of individuals who share a common allele. Following IBD mapping, gene prioritization based on gene co-expression analysis can be used to identify the most promising candidate genes. We performed an IBD analysis using high-density single nucleotide polymorphism (SNP) array data, followed by gene prioritization, on a FAME cohort of ten European families and one Australian/New Zealand family, eight of which had known disease loci. By identifying IBD regions common to multiple families, we were able to narrow the FAME2 locus to a 9.78-megabase interval within 2p11.2-q11.2. We provide additional evidence of a founder effect in four Italian families and of allelic heterogeneity, with at least four distinct founders responsible for FAME at the FAME2 locus. In addition, we suggest candidate disease genes using gene prioritization based on gene co-expression analysis.
Affiliation(s)
- Lyndal Henden
- Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Melbourne, VIC, 3052, Australia; Department of Medical Biology, University of Melbourne, Melbourne, VIC, 3010, Australia
- Saskia Freytag
- Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Melbourne, VIC, 3052, Australia; Department of Medical Biology, University of Melbourne, Melbourne, VIC, 3010, Australia
- Zaid Afawi
- Tel Aviv University Medical School, 69978, Tel Aviv, Israel
- Sara Baldassari
- Medical Genetics Unit, Polyclinic Sant'Orsola-Malpighi-Department of Medical and Surgical Sciences, University of Bologna, Bologna, Italy
- Samuel F Berkovic
- Epilepsy Research Centre, Department of Medicine, University of Melbourne Austin Health, Melbourne, VIC, 3084, Australia
- Francesca Bisulli
- IRCCS Istituto delle Scienze Neurologiche di Bologna, Bologna, Italy; Department of Biomedical and Neuromotor Sciences, University of Bologna, Bologna, Italy
- Laura Canafoglia
- Neurophysiopathology and Epilepsy Center, IRCCS Foundation C. Besta Neurological Institute, Milan, Italy
- Giorgio Casari
- Division of Genetics and Cell Biology, Università Vita-Salute San Raffaele, San Raffaele Scientific Institute, Milan, Italy
- Christel Depienne
- Département de Médecine translationnelle et Neurogénétique, IGBMC, CNRS UMR 7104/INSERM U964/Université de Strasbourg, Illkirch, France; Laboratoire de diagnostic génétique, Hôpitaux Universitaires de Strasbourg, Strasbourg, France
- Jozef Gecz
- Robinson Institute and School of Medicine, The University of Adelaide, Adelaide, SA, 5005, Australia; School of Biological Sciences, The University of Adelaide, Adelaide, SA, 5005, Australia
- Renzo Guerrini
- Pediatric Neurology, Neurogenetics and Neurobiology Unit and Laboratories, Neuroscience Department, A Meyer Children's Hospital, University of Florence, Florence, Italy; IRCCS Stella Maris Foundation, Pisa, Italy
- Ingo Helbig
- Department of Neuropediatrics, Christian-Albrechts-University of Kiel and University Medical Center, Kiel, Schleswig-Holstein, Germany; Departments of Brain and Cognitive Sciences, Physiology and Cell Biology, Zlotowski Center for Neuroscience, Ben-Gurion University of the Negev, Negev, Israel; Division of Neurology, The Children's Hospital of Philadelphia, Philadelphia, USA
- Edouard Hirsch
- Medical and Surgical Epilepsy Unit, Hautepierre Hospital, University of Strasbourg, Strasbourg, France
- Boris Keren
- Département de Génétique, Hôpital de la Pitié-Salpêtrière, Assistance Publique-Hôpitaux de Paris, 75013, Paris, France; Sorbonne Universités, UPMC Univ Paris 06, UMR S 1127, ICM, 75013, Paris, France
- Karl Martin Klein
- Department of Neurology, Epilepsy Center Frankfurt Rhine-Main, Center of Neurology and Neurosurgery, University Hospital, Goethe-University Frankfurt, Frankfurt, Germany; Department of Neurology, Epilepsy Center Hessen, University Hospitals Giessen and Marburg, Philipps-University Marburg, Marburg, Germany
- Pierre Labauge
- Department of Neurology, Montpellier University, Gui de Chauliac, 34295, Montpellier, Cedex 5, France
- Eric LeGuern
- Sorbonne Universités, UPMC Univ Paris 06, UMR S 1127, ICM, 75013, Paris, France; INSERM, U 1127; CNRS, UMR 7225; INSERM UMR 975; Institut du Cerveau et de la Moelle Epinière; and Département de Génétique et de Cytogénétique, Hôpital de la Pitié-Salpêtrière, Assistance Publique-Hôpitaux De Paris (AP-HP), Paris, France; Université Pierre et Marie Curie (Paris 6) (UPMC), UMRS 975, Paris, France
- Laura Licchetta
- IRCCS Istituto delle Scienze Neurologiche di Bologna, Bologna, Italy; Department of Biomedical and Neuromotor Sciences, University of Bologna, Bologna, Italy
- Davide Mei
- Pediatric Neurology, Neurogenetics and Neurobiology Unit and Laboratories, Neuroscience Department, A Meyer Children's Hospital, University of Florence, Florence, Italy
- Caroline Nava
- Département de Génétique, Hôpital de la Pitié-Salpêtrière, Assistance Publique-Hôpitaux de Paris, 75013, Paris, France; Sorbonne Universités, UPMC Univ Paris 06, UMR S 1127, ICM, 75013, Paris, France
- Tommaso Pippucci
- Medical Genetics Unit, Polyclinic Sant'Orsola-Malpighi-Department of Medical and Surgical Sciences, University of Bologna, Bologna, Italy
- Gabrielle Rudolf
- Département de Médecine translationnelle et Neurogénétique, IGBMC, CNRS UMR 7104/INSERM U964/Université de Strasbourg, Illkirch, France; Department of Neurology, Hautepierre Hospital, University of Strasbourg, Strasbourg, France
- Ingrid Eileen Scheffer
- Epilepsy Research Centre, Department of Medicine, University of Melbourne Austin Health, Melbourne, VIC, 3084, Australia; Florey Institute of Neuroscience and Mental Health, Melbourne, VIC, 3084, Australia; Department of Paediatrics, University of Melbourne, Royal Children's Hospital, Melbourne, VIC, 3052, Australia
- Pasquale Striano
- Pediatric Neurology and Muscular Diseases Unit, Department of Neurosciences, Rehabilitation, Ophthalmology, Genetics, Maternal and Child Health, Gaslini Institute, Genoa, Italy
- Paolo Tinuper
- IRCCS Istituto delle Scienze Neurologiche di Bologna, Bologna, Italy; Department of Biomedical and Neuromotor Sciences, University of Bologna, Bologna, Italy
- Federico Zara
- Laboratory of Neurogenetics, Department of Neurosciences, Gaslini Institute, Genoa, Italy
- Mark Corbett
- Robinson Institute and School of Medicine, The University of Adelaide, Adelaide, SA, 5005, Australia
- Melanie Bahlo
- Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Melbourne, VIC, 3052, Australia; Department of Medical Biology, University of Melbourne, Melbourne, VIC, 3010, Australia
29
Crow M, Paul A, Ballouz S, Huang ZJ, Gillis J. Exploiting single-cell expression to characterize co-expression replicability. Genome Biol 2016; 17:101. [PMID: 27165153 PMCID: PMC4862082 DOI: 10.1186/s13059-016-0964-6] [Citation(s) in RCA: 59] [Impact Index Per Article: 7.4] [Received: 02/29/2016] [Accepted: 04/25/2016] [Indexed: 01/25/2023]
Abstract
Background: Co-expression networks have been a useful tool for functional genomics, providing important clues about the cellular and biochemical mechanisms that are active in normal and disease processes. However, co-expression analysis is often treated as a black box, with results that are hard to trace to their basis in the data. Here, we use both published and novel single-cell RNA sequencing (RNA-seq) data to understand fundamental drivers of gene-gene connectivity and replicability in co-expression networks. Results: We perform the first major analysis of single-cell co-expression, sampling from 31 individual studies. Using neighbor voting in cross-validation, we find that single-cell network connectivity is less likely to overlap with known functions than co-expression derived from bulk data, with functional variation within cell types strongly resembling that occurring across cell types. To identify features and analysis practices that contribute to this connectivity, we perform our own single-cell RNA-seq experiment of 126 cortical interneurons in an experimental design targeted to co-expression. By assessing network replicability, semantic similarity and overall functional connectivity, we identify technical factors influencing co-expression and suggest how they can be controlled for. Many of the technical effects we identify are expression-level dependent, making expression level itself highly predictive of network topology. Through re-analysis of the BrainSpan RNA-seq data, we show that this occurs generally. Conclusions: Technical properties of single-cell RNA-seq data create confounds in co-expression networks which can be identified and explicitly controlled for in any supervised analysis. This is useful both in improving co-expression performance and in characterizing single-cell data in generally applicable terms, permitting cross-laboratory comparison within a common framework.
Electronic supplementary material: The online version of this article (doi:10.1186/s13059-016-0964-6) contains supplementary material, which is available to authorized users.
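Neighbor voting, the cross-validated guilt-by-association test mentioned in the Results, can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes a dense, symmetric, non-negative co-expression matrix and a gene set given as row indices, scores every gene by the share of its total edge weight that connects to the training members of the set, and ranks held-out members against all non-training genes with a rank-based AUROC. The function names are hypothetical.

```python
from typing import List, Sequence, Set


def auroc(scores: Sequence[float], is_positive: Sequence[bool]) -> float:
    """Rank-based AUROC (Mann-Whitney U); tied scores share their average rank."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # 1-based average rank of the tied run
        i = j + 1
    n_pos = sum(1 for p in is_positive if p)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative")
    rank_sum = sum(r for r, p in zip(ranks, is_positive) if p)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def neighbor_voting_auroc(adj: List[List[float]], members: Set[int], n_folds: int = 3) -> float:
    """Mean held-out AUROC for predicting a gene set from a co-expression network."""
    labeled = sorted(members)
    if not 2 <= n_folds <= len(labeled):
        raise ValueError("n_folds must be between 2 and the number of set members")
    n = len(adj)
    folds = [labeled[f::n_folds] for f in range(n_folds)]
    aucs = []
    for held_out in folds:
        train = set(labeled) - set(held_out)  # labels visible in this fold
        scores = []
        for i in range(n):
            degree = sum(adj[i][j] for j in range(n) if j != i)
            votes = sum(adj[i][j] for j in train if j != i)
            scores.append(votes / degree if degree > 0 else 0.0)
        # Evaluate only genes whose labels were hidden: held-out members vs. non-members.
        test_genes = [i for i in range(n) if i not in train]
        aucs.append(auroc([scores[i] for i in test_genes], [i in held_out for i in test_genes]))
    return sum(aucs) / len(aucs)


# Toy network: genes 0-2 and 3-5 form two tightly co-expressed modules.
adj = [[0.0 if i == j else (1.0 if (i < 3) == (j < 3) else 0.1) for j in range(6)]
       for i in range(6)]
print(neighbor_voting_auroc(adj, {0, 1, 2}, n_folds=3))  # set matches one module
print(neighbor_voting_auroc(adj, {0, 3}, n_folds=2))     # set straddles both modules
```

On this toy network the set that matches a module scores an AUROC of 1.0, while the set that straddles both modules scores 0.25, because each held-out member ties with two non-members and ranks below the other two.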
Affiliation(s)
- Megan Crow, Anirban Paul, Sara Ballouz, Z Josh Huang, Jesse Gillis
- Cold Spring Harbor Laboratory, One Bungtown Road, Cold Spring Harbor, NY, 11724, USA
30
Ballouz S, Gillis J. AuPairWise: A Method to Estimate RNA-Seq Replicability through Co-expression. PLoS Comput Biol 2016; 12:e1004868. [PMID: 27082953 PMCID: PMC4833304 DOI: 10.1371/journal.pcbi.1004868] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 04/22/2015] [Accepted: 03/14/2016] [Indexed: 11/23/2022]
Abstract
In addition to detecting novel transcripts and offering a higher dynamic range, a principal claim for RNA-sequencing has been greater replicability, typically measured in sample-sample correlations of gene expression levels. Through a re-analysis of ENCODE data, we show that replicability of transcript abundances will provide misleading estimates of the replicability of conditional variation in transcript abundances (i.e., most expression experiments). Heuristics which implicitly address this problem have emerged in quality control measures to obtain ‘good’ differential expression results. However, these methods involve strict filters such as discarding low-expressing genes or using technical replicates to remove discordant transcripts, and are costly or simply ad hoc. As an alternative, we model gene-level replicability of differential activity using co-expressing genes. We find that sets of housekeeping interactions provide a sensitive means of estimating the replicability of expression changes, where the two members of a co-expressing pair can be regarded as pseudo-replicates of one another. We model the effects of noise that perturbs a gene’s expression within its usual distribution of values and show that perturbing expression by only 5% within that range is readily detectable (AUROC ≈ 0.73). We have made our method available as a set of easily implemented R scripts.
RNA-sequencing has become a popular means to detect the expression levels of genes. However, quality control is still challenging, requiring both extreme measures and rules which are set in stone from extensive previous analysis. Instead of relying on these rules, we show that co-expression can be used to measure biological replicability with extremely high precision. Co-expression is a well-studied phenomenon in which two genes that are known to form a functional unit are also expressed at similar levels, and change in similar ways across conditions.
Using this concept, we can detect how well an experiment replicates by measuring how well it has retained the co-expression pattern across defined gene-pairs. We do this by measuring how easy it is to detect a sample to which some noise has been added. We show this method is a useful tool for quality control.
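The pseudo-replicate idea can be illustrated with a simplified sketch. It is not the published AuPairWise R scripts: the perturbation (an additive shift of one gene in one sample by a fraction of that gene's observed range), the use of least-squares residuals to measure how far a sample departs from each pair's co-expression, and the function names are all assumptions made for illustration. The score is the AUROC for one positive (the perturbed sample) against every other sample.

```python
from typing import Dict, List, Sequence, Tuple


def abs_ols_residuals(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Absolute residuals of the least-squares fit y ~ a + b * x, one per sample."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("partner gene has constant expression")
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    return [abs(yi - (intercept + slope * xi)) for xi, yi in zip(x, y)]


def detection_auroc(expr: Dict[str, List[float]], pairs: Sequence[Tuple[str, str]],
                    sample: int, noise_fraction: float) -> float:
    """Perturb one sample and score how clearly the co-expressed pairs flag it.

    For each hypothetical (target, partner) pair, the target's value in `sample` is
    shifted by noise_fraction * (observed range of the target); the target is then
    regressed on its partner and absolute residuals are summed per sample.
    Returns the fraction of other samples with a smaller summed residual (ties count half).
    """
    n_samples = len(next(iter(expr.values())))
    totals = [0.0] * n_samples
    for target, partner in pairs:
        y = list(expr[target])  # copy so the input matrix is left untouched
        y[sample] += noise_fraction * (max(y) - min(y))
        for i, r in enumerate(abs_ols_residuals(expr[partner], y)):
            totals[i] += r
    others = [t for i, t in enumerate(totals) if i != sample]
    wins = sum(1.0 if t < totals[sample] else 0.5 if t == totals[sample] else 0.0
               for t in others)
    return wins / len(others)


# Noiseless toy data: 20 samples, two perfectly co-expressed pairs.
x = [float(i) for i in range(20)]
expr = {"A": x, "B": list(x), "C": [2 * v + 1 for v in x], "D": list(x)}
print(detection_auroc(expr, [("A", "B"), ("C", "D")], sample=7, noise_fraction=0.05))
```

Because every toy pair lies exactly on a line, even a 5% shift makes sample 7 the largest outlier and the score is 1.0; with real, noisy expression the score would be expected to drift toward 0.5 as the shift shrinks, which is the kind of sensitivity the abstract summarizes as an AUROC.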
Affiliation(s)
- Sara Ballouz, Jesse Gillis
- Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
31
Oliver KL, Lukic V, Freytag S, Scheffer IE, Berkovic SF, Bahlo M. In silico prioritization based on coexpression can aid epileptic encephalopathy gene discovery. Neurol Genet 2016; 2:e51. [PMID: 27066588 PMCID: PMC4817907 DOI: 10.1212/nxg.0000000000000051] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 09/28/2015] [Accepted: 12/10/2015] [Indexed: 02/04/2023]
Abstract
Objective: To evaluate the performance of an in silico prioritization approach that was applied to 179 epileptic encephalopathy candidate genes in 2013 and to expand the application of this approach to the whole genome based on expression data from the Allen Human Brain Atlas. Methods: PubMed searches determined which of the 179 epileptic encephalopathy candidate genes had been validated. For each validated gene, it was noted whether it was among the 19 of 179 candidates prioritized in 2013. The in silico prioritization approach was applied genome-wide; all genes were ranked according to their coexpression strength with a reference set (i.e., 51 established epileptic encephalopathy genes) in both adult and developing human brain expression data sets. Candidate genes ranked in the top 10% for both data sets were cross-referenced with genes previously implicated in the epileptic encephalopathies due to a de novo variant. Results: Five of 6 validated epileptic encephalopathy candidate genes were among the 19 prioritized in 2013 (odds ratio = 54, 95% confidence interval [7, ∞), p = 4.5 × 10⁻⁵, Fisher exact test); one gene was a false negative. A total of 297 genes ranked in the top 10% for both the adult and developing brain data sets based on coexpression with the reference set. Of these, 9 had been previously implicated in the epileptic encephalopathies (FBXO41, PLXNA1, ACOT4, PAK6, GABBR2, YWHAG, NBEA, KNDC1, and SELRC1). Conclusions: Brain gene coexpression data can be used to assist epileptic encephalopathy gene discovery, and we propose 9 genes as strong epileptic encephalopathy candidates worthy of further investigation.
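The Results statistic can be reproduced from the counts in the abstract. A minimal sketch, assuming the 2×2 table those counts imply (5 candidates both prioritized and validated, 14 prioritized but not validated, 1 validated but not prioritized, 159 neither; 179 in total), computes the one-sided Fisher exact p-value from the hypergeometric tail and the conditional maximum-likelihood odds ratio. The function names are illustrative. The simple cross-product ratio (5 × 159)/(14 × 1) is about 56.8, so the reported 54 appears to be the conditional estimate, which is what R's `fisher.test` reports, and the infinite upper confidence bound is consistent with a one-sided ("greater") test.

```python
from math import comb, exp, log


def _margins(a: int, b: int, c: int, d: int):
    """Margins of the 2x2 table [[a, b], [c, d]] and the support of cell a."""
    row1, col1, total = a + b, a + c, a + b + c + d
    support = range(max(0, row1 + col1 - total), min(row1, col1) + 1)
    return row1, col1, total, support


def fisher_greater_p(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher exact p-value, P(X >= a), from the hypergeometric tail."""
    row1, col1, total, support = _margins(a, b, c, d)
    tail = sum(comb(col1, k) * comb(total - col1, row1 - k) for k in support if k >= a)
    return tail / comb(total, row1)


def conditional_mle_odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio psi at which E[X | margins, psi] equals the observed a.

    Uses Fisher's noncentral hypergeometric distribution and bisection on log(psi);
    only meaningful when a lies strictly inside its support (otherwise the MLE is 0 or infinite).
    """
    row1, col1, total, support = _margins(a, b, c, d)
    base = {k: comb(col1, k) * comb(total - col1, row1 - k) for k in support}

    def expected_a(psi: float) -> float:
        weights = {k: base[k] * psi ** k for k in support}
        return sum(k * w for k, w in weights.items()) / sum(weights.values())

    lo, hi = log(1e-6), log(1e6)
    for _ in range(200):
        mid = (lo + hi) / 2
        if expected_a(exp(mid)) < a:  # expected count rises monotonically with psi
            lo = mid
        else:
            hi = mid
    return exp((lo + hi) / 2)


# Assumed table: rows = prioritized / not prioritized, columns = validated / not validated.
table = (5, 14, 1, 159)
print(f"one-sided p     = {fisher_greater_p(*table):.2e}")
print(f"conditional MLE = {conditional_mle_odds_ratio(*table):.1f}")
print(f"sample OR       = {5 * 159 / (14 * 1):.1f}")
```

Run on the assumed table, the tail probability rounds to 4.5 × 10⁻⁵ and the conditional estimate rounds to 54, in line with the reported values.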
Affiliation(s)
- Karen L Oliver, Vesna Lukic, Saskia Freytag, Ingrid E Scheffer, Samuel F Berkovic, Melanie Bahlo
- Epilepsy Research Centre (K.L.O., I.E.S., S.F.B.), Department of Medicine, Austin Health, University of Melbourne, Heidelberg, Australia; Population Health and Immunity Division (V.L., S.F., M.B.), The Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia; Florey Institute (I.E.S.), Melbourne, Australia; Department of Paediatrics (I.E.S.), University of Melbourne, Royal Children's Hospital, Melbourne, Australia; and Department of Mathematics and Statistics (M.B.) and Department of Medical Biology (M.B.), University of Melbourne, Australia
32
Jaffe AE. Postmortem human brain genomics in neuropsychiatric disorders--how far can we go? Curr Opin Neurobiol 2015; 36:107-11. [PMID: 26685806 DOI: 10.1016/j.conb.2015.11.002] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 10/15/2015] [Revised: 11/17/2015] [Accepted: 11/20/2015] [Indexed: 12/24/2022]
Abstract
Large-scale collection of postmortem human brain tissue and subsequent genomic data generation has become a useful approach for better identifying etiological factors contributing to neuropsychiatric disorders. In particular, studying genetic risk variants in non-psychiatric controls can identify biological mechanisms of risk free from confounding factors related to epiphenomena of illness. While the field has begun moving towards cell type-specific analyses, homogenate brain tissue with accompanying cellular profiles can still yield useful hypotheses for more focused experiments, particularly when the dysregulated cell types are unknown. Technological advances, larger sample sizes, and focused research questions can allow postmortem human brain research to be leveraged further to identify and understand the molecular etiology of neuropsychiatric disorders.
Affiliation(s)
- Andrew E Jaffe
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD 21205, United States; Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, United States; Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, United States.