1
Park JE, Smith MA, Van Alsten SC, Walens A, Wu D, Hoadley KA, Troester MA, Love MI. Diffsig: Associating Risk Factors with Mutational Signatures. Cancer Epidemiol Biomarkers Prev 2024; 33:721-730. [PMID: 38426904 PMCID: PMC11062813 DOI: 10.1158/1055-9965.epi-23-0728] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2023] [Revised: 10/12/2023] [Accepted: 02/28/2024] [Indexed: 03/02/2024]
Abstract
BACKGROUND: Somatic mutational signatures elucidate molecular vulnerabilities to therapy, and therefore detecting signatures and classifying tumors with respect to signatures has clinical value. However, identifying the etiology of the mutational signatures remains a statistical challenge, with both small sample sizes and high variability in classification algorithms posing barriers. As a result, few signatures have been strongly linked to particular risk factors.
METHODS: Here, we develop a statistical model, Diffsig, for estimating the association of one or more continuous or categorical risk factors with DNA mutational signatures. Diffsig takes into account the uncertainty associated with assigning signatures to samples as well as multiple risk factors' simultaneous effect on observed DNA mutations.
RESULTS: We applied Diffsig to breast cancer data to assess relationships between five established breast-relevant mutational signatures and etiologic variables, confirming known mechanisms of cancer development. In simulation, our model was capable of accurately estimating expected associations in a variety of contexts.
CONCLUSIONS: Diffsig allows researchers to quantify and perform inference on the associations of risk factors with mutational signatures.
IMPACT: We expect Diffsig to provide more robust associations of risk factors with signatures to lead to better understanding of the tumor development process and improved models of tumorigenesis.
Affiliation(s)
- Ji-Eun Park
- Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Markia A. Smith
- Department of Pathology and Laboratory Medicine, School of Medicine, University of North Carolina, Chapel Hill, NC, USA
| | - Sarah C. Van Alsten
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC, USA
| | - Andrea Walens
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC, USA
| | - Di Wu
- Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Division of Oral and Craniofacial Health Sciences, Adams School of Dentistry, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Katherine A. Hoadley
- Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, NC, USA
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Melissa A. Troester
- Department of Pathology and Laboratory Medicine, School of Medicine, University of North Carolina, Chapel Hill, NC, USA
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC, USA
- Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, NC, USA
| | - Michael I. Love
- Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
2
Ryu J, Barkal S, Yu T, Jankowiak M, Zhou Y, Francoeur M, Phan QV, Li Z, Tognon M, Brown L, Love MI, Bhat V, Lettre G, Ascher DB, Cassa CA, Sherwood RI, Pinello L. Joint genotypic and phenotypic outcome modeling improves base editing variant effect quantification. Nat Genet 2024; 56:925-937. [PMID: 38658794 DOI: 10.1038/s41588-024-01726-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2023] [Accepted: 03/21/2024] [Indexed: 04/26/2024]
Abstract
CRISPR base editing screens enable analysis of disease-associated variants at scale; however, variable efficiency and precision confounds the assessment of variant-induced phenotypes. Here, we provide an integrated experimental and computational pipeline that improves estimation of variant effects in base editing screens. We use a reporter construct to measure guide RNA (gRNA) editing outcomes alongside their phenotypic consequences and introduce base editor screen analysis with activity normalization (BEAN), a Bayesian network that uses per-guide editing outcomes provided by the reporter and target site chromatin accessibility to estimate variant impacts. BEAN outperforms existing tools in variant effect quantification. We use BEAN to pinpoint common regulatory variants that alter low-density lipoprotein (LDL) uptake, implicating previously unreported genes. Additionally, through saturation base editing of LDLR, we accurately quantify missense variant pathogenicity that is consistent with measurements in UK Biobank patients and identify underlying structural mechanisms. This work provides a widely applicable approach to improve the power of base editing screens for disease-associated variant characterization.
Affiliation(s)
- Jayoung Ryu
- Molecular Pathology Unit, Krantz Family Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Gene Regulation Observatory, The Broad Institute of Harvard and MIT, Cambridge, MA, USA
| | - Sam Barkal
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - Tian Yu
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - Martin Jankowiak
- Gene Regulation Observatory, The Broad Institute of Harvard and MIT, Cambridge, MA, USA
| | - Yunzhuo Zhou
- School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Queensland, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
| | - Matthew Francoeur
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - Quang Vinh Phan
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - Zhijian Li
- Molecular Pathology Unit, Krantz Family Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
- Gene Regulation Observatory, The Broad Institute of Harvard and MIT, Cambridge, MA, USA
| | - Manuel Tognon
- Molecular Pathology Unit, Krantz Family Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
- Gene Regulation Observatory, The Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Computer Science Department, University of Verona, Verona, Italy
| | - Lara Brown
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - Michael I Love
- Department of Genetics, Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Vineel Bhat
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - Guillaume Lettre
- Montreal Heart Institute, Montréal, Quebec, Canada
- Faculté de Médecine, Université de Montréal, Montréal, Quebec, Canada
| | - David B Ascher
- School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Queensland, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
| | - Christopher A Cassa
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - Richard I Sherwood
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - Luca Pinello
- Molecular Pathology Unit, Krantz Family Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
- Gene Regulation Observatory, The Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Pathology, Harvard Medical School, Boston, MA, USA
3
Mu W, Luo T, Barrera A, Bounds LR, Klann TS, Ter Weele M, Bryois J, Crawford GE, Sullivan PF, Gersbach CA, Love MI, Li Y. Machine learning methods for predicting guide RNA effects in CRISPR epigenome editing experiments. bioRxiv 2024:2024.04.18.590188. [PMID: 38659894 PMCID: PMC11042384 DOI: 10.1101/2024.04.18.590188] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/26/2024]
Abstract
CRISPR epigenomic editing technologies enable functional interrogation of non-coding elements. However, current computational methods for guide RNA (gRNA) design do not effectively predict the power potential, molecular and cellular impact to optimize for efficient gRNAs, which are crucial for successful applications of these technologies. We present "launch-dCas9" (machine LeArning based UNified CompreHensive framework for CRISPR-dCas9) to predict gRNA impact from multiple perspectives, including cell fitness, wildtype abundance (gauging power potential), and gene expression in single cells. Our launchdCas9, built and evaluated using experiments involving >1 million gRNAs targeted across the human genome, demonstrates relatively high prediction accuracy (AUC up to 0.81) and generalizes across cell lines. Method-prioritized top gRNA(s) are 4.6-fold more likely to exert effects, compared to other gRNAs in the same cis-regulatory region. Furthermore, launchdCas9 identifies the most critical sequence-related features and functional annotations from >40 features considered. Our results establish launch-dCas9 as a promising approach to design gRNAs for CRISPR epigenomic experiments.
Affiliation(s)
- Wancen Mu
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Tianyou Luo
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Alejandro Barrera
- Center for Genomic and Computational Biology, Duke University, Durham, NC, USA
- Center for Advanced Genomic Technologies, Duke University, Durham, NC, USA
| | - Lexi R Bounds
- Center for Advanced Genomic Technologies, Duke University, Durham, NC, USA
- Department of Biomedical Engineering, Duke University, Durham, NC, USA
| | - Tyler S Klann
- Center for Genomic and Computational Biology, Duke University, Durham, NC, USA
- Center for Advanced Genomic Technologies, Duke University, Durham, NC, USA
- Department of Biomedical Engineering, Duke University, Durham, NC, USA
| | - Maria Ter Weele
- Center for Advanced Genomic Technologies, Duke University, Durham, NC, USA
- Department of Biomedical Engineering, Duke University, Durham, NC, USA
| | - Julien Bryois
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
| | - Gregory E Crawford
- Center for Genomic and Computational Biology, Duke University, Durham, NC, USA
- Center for Advanced Genomic Technologies, Duke University, Durham, NC, USA
- Department of Pediatrics, Division of Medical Genetics, Duke University Medical Center, Durham, NC, USA
| | - Patrick F Sullivan
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Department of Psychiatry, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Charles A Gersbach
- Center for Genomic and Computational Biology, Duke University, Durham, NC, USA
- Center for Advanced Genomic Technologies, Duke University, Durham, NC, USA
- Department of Biomedical Engineering, Duke University, Durham, NC, USA
| | - Michael I Love
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Yun Li
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
4
Singh NP, Wu EY, Fan J, Love MI, Patro R. Tree-based differential testing using inferential uncertainty for RNA-Seq. bioRxiv 2023:2023.12.25.573288. [PMID: 38234739 PMCID: PMC10793400 DOI: 10.1101/2023.12.25.573288] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/19/2024]
Abstract
Identifying differentially expressed transcripts poses a crucial yet challenging problem in transcriptomics. Substantial uncertainty is associated with the abundance estimates of certain transcripts which, if ignored, can lead to the exaggeration of false positives and, if included, may lead to reduced power. For a given set of RNA-Seq samples, TreeTerminus arranges transcripts in a hierarchical tree structure that encodes different layers of resolution for interpretation of the abundance of transcriptional groups, with uncertainty generally decreasing as one ascends the tree from the leaves. We introduce trenDi, which utilizes the tree structure from TreeTerminus for differential testing. The candidate nodes are determined in a data-driven manner to maximize the signal that can be extracted from the data while controlling for the uncertainty associated with estimating the transcript abundances. The identified candidate nodes can include transcripts and inner nodes, with no two nodes having an ancestor/descendant relationship. We evaluated our method on both simulated and experimental datasets, comparing its performance with other tree-based differential methods as well as with uncertainty-aware differential transcript/gene expression methods. Our method detects inner nodes that show a strong signal for differential expression, which would have been overlooked when analyzing the transcripts alone.
Affiliation(s)
- Noor Pratap Singh
- Department of Computer Science, University of Maryland, College Park
| | - Euphy Y. Wu
- Department of Biostatistics, University of North Carolina-Chapel Hill
| | - Jason Fan
- Department of Computer Science, University of Maryland, College Park
| | - Michael I. Love
- Department of Biostatistics, University of North Carolina-Chapel Hill
- Department of Genetics, University of North Carolina-Chapel Hill
| | - Rob Patro
- Department of Computer Science, University of Maryland, College Park
5
Brotman SM, Oravilahti A, Rosen JD, Alvarez M, Heinonen S, van der Kolk BW, Fernandes Silva L, Perrin HJ, Vadlamudi S, Pylant C, Deochand S, Basta PV, Valone JM, Narain MN, Stringham HM, Boehnke M, Kuusisto J, Love MI, Pietiläinen KH, Pajukanta P, Laakso M, Mohlke KL. Cell-Type Composition Affects Adipose Gene Expression Associations With Cardiometabolic Traits. Diabetes 2023; 72:1707-1718. [PMID: 37647564 PMCID: PMC10588284 DOI: 10.2337/db23-0365] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2023] [Accepted: 08/16/2023] [Indexed: 09/01/2023]
Abstract
Understanding differences in adipose gene expression between individuals with different levels of clinical traits may reveal the genes and mechanisms leading to cardiometabolic diseases. However, adipose is a heterogeneous tissue. To account for cell-type heterogeneity, we estimated cell-type proportions in 859 subcutaneous adipose tissue samples with bulk RNA sequencing (RNA-seq) using a reference single-nuclear RNA-seq data set. Cell-type proportions were associated with cardiometabolic traits; for example, higher macrophage and adipocyte proportions were associated with higher and lower BMI, respectively. We evaluated cell-type proportions and BMI as covariates in tests of association between >25,000 gene expression levels and 22 cardiometabolic traits. For >95% of genes, the optimal, or best-fit, models included BMI as a covariate, and for 79% of associations, the optimal models also included cell type. After adjusting for the optimal covariates, we identified 2,664 significant associations (P ≤ 2e-6) for 1,252 genes and 14 traits. Among genes proposed to affect cardiometabolic traits based on colocalized genome-wide association study and adipose expression quantitative trait locus signals, 25 showed a corresponding association between trait and gene expression levels. Overall, these results suggest the importance of modeling cell-type proportion when identifying gene expression associations with cardiometabolic traits.
Affiliation(s)
- Sarah M. Brotman
- Department of Genetics, The University of North Carolina, Chapel Hill, NC
| | - Anniina Oravilahti
- Institute of Clinical Medicine, Internal Medicine, University of Eastern Finland, Kuopio, Finland
| | - Jonathan D. Rosen
- Department of Genetics, The University of North Carolina, Chapel Hill, NC
| | - Marcus Alvarez
- Department of Human Genetics, David Geffen School of Medicine at the University of California, Los Angeles, Los Angeles, CA
| | - Sini Heinonen
- Obesity Research Unit, Research Program for Clinical and Molecular Metabolism, Faculty of Medicine, University of Helsinki, Helsinki, Finland
| | - Birgitta W. van der Kolk
- Obesity Research Unit, Research Program for Clinical and Molecular Metabolism, Faculty of Medicine, University of Helsinki, Helsinki, Finland
| | - Lilian Fernandes Silva
- Institute of Clinical Medicine, Internal Medicine, University of Eastern Finland, Kuopio, Finland
| | - Hannah J. Perrin
- Department of Genetics, The University of North Carolina, Chapel Hill, NC
| | | | - Cortney Pylant
- Department of Epidemiology, The University of North Carolina, Chapel Hill, NC
| | - Sonia Deochand
- Department of Epidemiology, The University of North Carolina, Chapel Hill, NC
| | - Patricia V. Basta
- Department of Epidemiology, The University of North Carolina, Chapel Hill, NC
| | - Jordan M. Valone
- Department of Genetics, The University of North Carolina, Chapel Hill, NC
- UNC Neuroscience Center, The University of North Carolina, Chapel Hill, NC
| | - Morgan N. Narain
- Department of Genetics, The University of North Carolina, Chapel Hill, NC
- Curriculum of Toxicology and Environmental Medicine, The University of North Carolina, Chapel Hill, NC
| | - Heather M. Stringham
- Department of Biostatistics and Center for Statistical Genetics, School of Public Health, University of Michigan, Ann Arbor, MI
| | - Michael Boehnke
- Department of Biostatistics and Center for Statistical Genetics, School of Public Health, University of Michigan, Ann Arbor, MI
| | - Johanna Kuusisto
- Institute of Clinical Medicine, Internal Medicine, University of Eastern Finland, Kuopio, Finland
- Department of Medicine, Kuopio University Hospital, Kuopio, Finland
| | - Michael I. Love
- Department of Genetics, The University of North Carolina, Chapel Hill, NC
- Department of Biostatistics, The University of North Carolina, Chapel Hill, NC
| | - Kirsi H. Pietiläinen
- Obesity Research Unit, Research Program for Clinical and Molecular Metabolism, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- HealthyWeightHub, Endocrinology, Abdominal Center, Helsinki University Hospital and University of Helsinki, Helsinki, Finland
| | - Päivi Pajukanta
- Department of Human Genetics, David Geffen School of Medicine at the University of California, Los Angeles, Los Angeles, CA
- Institute for Precision Health, David Geffen School of Medicine at University of California, Los Angeles, Los Angeles, CA
| | - Markku Laakso
- Institute of Clinical Medicine, Internal Medicine, University of Eastern Finland, Kuopio, Finland
- Department of Medicine, Kuopio University Hospital, Kuopio, Finland
| | - Karen L. Mohlke
- Department of Genetics, The University of North Carolina, Chapel Hill, NC
6
Brotman SM, El-Sayed Moustafa JS, Guan L, Broadaway KA, Wang D, Jackson AU, Welch R, Currin KW, Tomlinson M, Vadlamudi S, Stringham HM, Roberts AL, Lakka TA, Oravilahti A, Silva LF, Narisu N, Erdos MR, Yan T, Bonnycastle LL, Raulerson CK, Raza Y, Yan X, Parker SCJ, Kuusisto J, Pajukanta P, Tuomilehto J, Collins FS, Boehnke M, Love MI, Koistinen HA, Laakso M, Mohlke KL, Small KS, Scott LJ. Adipose tissue eQTL meta-analysis reveals the contribution of allelic heterogeneity to gene expression regulation and cardiometabolic traits. bioRxiv 2023:2023.10.26.563798. [PMID: 37961277 PMCID: PMC10634839 DOI: 10.1101/2023.10.26.563798] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]
Abstract
Complete characterization of the genetic effects on gene expression is needed to elucidate tissue biology and the etiology of complex traits. Here, we analyzed 2,344 subcutaneous adipose tissue samples and identified 34K conditionally distinct expression quantitative trait locus (eQTL) signals in 18K genes. Over half of eQTL genes exhibited at least two eQTL signals. Compared to primary signals, non-primary signals had lower effect sizes, lower minor allele frequencies, and less promoter enrichment; they corresponded to genes with higher heritability and higher tolerance for loss of function. Colocalization of eQTL with conditionally distinct genome-wide association study signals for 28 cardiometabolic traits identified 3,605 eQTL signals for 1,861 genes. Inclusion of non-primary eQTL signals increased colocalized signals by 46%. Among 30 genes with ≥2 pairs of colocalized signals, 21 showed a mediating gene dosage effect on the trait. Thus, expanded eQTL identification reveals more mechanisms underlying complex traits and improves understanding of the complexity of gene expression regulation.
Affiliation(s)
- Sarah M Brotman
- Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
| | | | - Li Guan
- Department of Computational Medicine & Bioinformatics, University of Michigan, Ann Arbor, MI, USA
| | - K Alaine Broadaway
- Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
| | - Dongmeng Wang
- Department of Twin Research and Genetic Epidemiology, King's College London, London, UK
| | - Anne U Jackson
- Department of Biostatistics and Center for Statistical Genetics, School of Public Health, University of Michigan, Ann Arbor, MI, USA
| | - Ryan Welch
- Department of Biostatistics and Center for Statistical Genetics, School of Public Health, University of Michigan, Ann Arbor, MI, USA
| | - Kevin W Currin
- Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
| | - Max Tomlinson
- Department of Twin Research and Genetic Epidemiology, King's College London, London, UK
- Department of Medical and Molecular Genetics, King's College London, London, UK
| | | | - Heather M Stringham
- Department of Biostatistics and Center for Statistical Genetics, School of Public Health, University of Michigan, Ann Arbor, MI, USA
| | - Amy L Roberts
- Department of Twin Research and Genetic Epidemiology, King's College London, London, UK
| | - Timo A Lakka
- Institute of Biomedicine, School of Medicine, University of Eastern Finland, Kuopio, Finland
- Department of Clinical Physiology and Nuclear Medicine, Kuopio University Hospital, Kuopio, Finland
- Foundation for Research in Health Exercise and Nutrition, Kuopio Research Institute of Exercise Medicine, Kuopio, Finland
| | - Anniina Oravilahti
- Institute of Clinical Medicine, Kuopio University Hospital, University of Eastern Finland, Kuopio, Finland
| | - Lilian Fernandes Silva
- Institute of Clinical Medicine, Kuopio University Hospital, University of Eastern Finland, Kuopio, Finland
| | - Narisu Narisu
- Center for Precision Health Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Michael R Erdos
- Center for Precision Health Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Tingfen Yan
- Center for Precision Health Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Lori L Bonnycastle
- Center for Precision Health Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | | | - Yasrab Raza
- Department of Twin Research and Genetic Epidemiology, King's College London, London, UK
| | - Xinyu Yan
- Department of Twin Research and Genetic Epidemiology, King's College London, London, UK
| | - Stephen C J Parker
- Department of Computational Medicine & Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
| | - Johanna Kuusisto
- Department of Medicine and Clinical Research, Kuopio University Hospital, Kuopio, Finland
| | - Päivi Pajukanta
- Department of Human Genetics and Institute for Precision Health, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
| | - Jaakko Tuomilehto
- Department of Public Health and Welfare, Finnish Institute for Health and Welfare, Helsinki, Finland
- Department of Public Health, University of Helsinki, Helsinki, Finland
- Diabetes Research Group, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Francis S Collins
- Center for Precision Health Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Michael Boehnke
- Department of Biostatistics and Center for Statistical Genetics, School of Public Health, University of Michigan, Ann Arbor, MI, USA
| | - Michael I Love
- Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
- Department of Biostatistics, University of North Carolina, Chapel Hill, NC, USA
| | - Heikki A Koistinen
- Department of Public Health and Welfare, Finnish Institute for Health and Welfare, Helsinki, Finland
- University of Helsinki and Department of Medicine, Helsinki University Hospital, Helsinki, Finland
- Minerva Foundation Institute for Medical Research, Helsinki, Finland
| | - Markku Laakso
- Institute of Clinical Medicine, Kuopio University Hospital, University of Eastern Finland, Kuopio, Finland
- Department of Medicine and Clinical Research, Kuopio University Hospital, Kuopio, Finland
| | - Karen L Mohlke
- Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
| | - Kerrin S Small
- Department of Twin Research and Genetic Epidemiology, King's College London, London, UK
| | - Laura J Scott
- Department of Biostatistics and Center for Statistical Genetics, School of Public Health, University of Michigan, Ann Arbor, MI, USA
7
McAfee JC, Lee S, Lee J, Bell JL, Krupa O, Davis J, Insigne K, Bond ML, Zhao N, Boyle AP, Phanstiel DH, Love MI, Stein JL, Ruzicka WB, Davila-Velderrain J, Kosuri S, Won H. Systematic investigation of allelic regulatory activity of schizophrenia-associated common variants. Cell Genom 2023; 3:100404. [PMID: 37868037 PMCID: PMC10589626 DOI: 10.1016/j.xgen.2023.100404] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/22/2022] [Revised: 02/23/2023] [Accepted: 08/21/2023] [Indexed: 10/24/2023]
Abstract
Genome-wide association studies (GWASs) have successfully identified 145 genomic regions that contribute to schizophrenia risk, but linkage disequilibrium makes it challenging to discern causal variants. We performed a massively parallel reporter assay (MPRA) on 5,173 fine-mapped schizophrenia GWAS variants in primary human neural progenitors and identified 439 variants with allelic regulatory effects (MPRA-positive variants). Transcription factor binding had modest predictive power, while fine-map posterior probability, enhancer overlap, and evolutionary conservation failed to predict MPRA-positive variants. Furthermore, 64% of MPRA-positive variants did not exhibit expression quantitative trait loci signatures, suggesting that MPRA could identify yet unexplored variants with regulatory potential. To predict the combinatorial effect of MPRA-positive variants on gene regulation, we propose an accessibility-by-contact model that combines MPRA-measured allelic activity with neuronal chromatin architecture.
Affiliation(s)
- Jessica C. McAfee
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Curriculum in Genetics and Molecular Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Sool Lee
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Jiseok Lee
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Jessica L. Bell
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Oleh Krupa
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Jessica Davis
- Department of Chemistry and Biochemistry, University of California, Los Angeles, Los Angeles, CA 90095, USA
- UCLA-DOE Institute for Genomics and Proteomics, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Molecular Biology Institute, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Quantitative and Computational Biology Institute, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Kimberly Insigne
- Department of Chemistry and Biochemistry, University of California, Los Angeles, Los Angeles, CA 90095, USA
- UCLA-DOE Institute for Genomics and Proteomics, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Molecular Biology Institute, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Quantitative and Computational Biology Institute, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Marielle L. Bond
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Curriculum in Genetics and Molecular Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Nanxiang Zhao
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Alan P. Boyle
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Department of Human Genetics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Douglas H. Phanstiel
- Curriculum in Genetics and Molecular Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Thurston Arthritis Research Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Department of Cell Biology and Physiology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Michael I. Love
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Jason L. Stein
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - W. Brad Ruzicka
- Laboratory for Epigenomics in Human Psychopathology, McLean Hospital, Belmont, MA 02141, USA
- Harvard Medical School, Boston, MA 02115, USA
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | | | - Sriram Kosuri
- Department of Chemistry and Biochemistry, University of California, Los Angeles, Los Angeles, CA 90095, USA
- UCLA-DOE Institute for Genomics and Proteomics, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Molecular Biology Institute, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Quantitative and Computational Biology Institute, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Hyejung Won
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
|
8
|
Cho H, Qu Y, Liu C, Tang B, Lyu R, Lin BM, Roach J, Azcarate-Peril MA, Aguiar Ribeiro A, Love MI, Divaris K, Wu D. Comprehensive evaluation of methods for differential expression analysis of metatranscriptomics data. Brief Bioinform 2023; 24:bbad279. [PMID: 37738402 PMCID: PMC10516371 DOI: 10.1093/bib/bbad279] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/07/2023] [Revised: 06/23/2023] [Accepted: 07/18/2023] [Indexed: 09/24/2023] Open
Abstract
Understanding the function of the human microbiome is important, but the development of statistical methods specifically for microbial gene expression (i.e. metatranscriptomics) is in its infancy. Many currently employed differential expression analysis methods have been designed for different data types and have not been evaluated in metatranscriptomics settings. To address this gap, we undertook a comprehensive evaluation and benchmarking of 10 differential analysis methods for metatranscriptomics data. We used a combination of real and simulated data to evaluate performance (i.e. type I error, false discovery rate and sensitivity) of the following methods: log-normal (LN), logistic-beta (LB), MAST, DESeq2, metagenomeSeq, ANCOM-BC, LEfSe, ALDEx2, Kruskal-Wallis and two-part Kruskal-Wallis. The simulation was informed by supragingival biofilm microbiome data from 300 preschool-age children enrolled in a study of childhood dental disease (early childhood caries, ECC), whereas validations were sought in two additional datasets from the ECC study and an inflammatory bowel disease study. The LB test showed the highest sensitivity in both small and large samples and reasonably controlled type I error. In contrast, MAST was hampered by inflated type I error. Upon application of the LN and LB tests in the ECC study, we found that genes C8PHV7 and C8PEV7, harbored by the lactate-producing Campylobacter gracilis, had the strongest association with childhood dental disease. This comprehensive model evaluation offers practical guidance for selection of appropriate methods for rigorous analyses of differential expression in metatranscriptomics. Selection of an optimal method increases the possibility of detecting true signals while minimizing the chance of claiming false ones.
Affiliation(s)
- Hunyong Cho
- Department of Biostatistics, University of North Carolina, Chapel Hill, NC, United States
| | - Yixiang Qu
- Department of Biostatistics, University of North Carolina, Chapel Hill, NC, United States
| | - Chuwen Liu
- Department of Biostatistics, University of North Carolina, Chapel Hill, NC, United States
| | - Boyang Tang
- Department of Statistics, University of Connecticut, Storrs, CT, United States
| | - Ruiqi Lyu
- School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
| | - Bridget M Lin
- Department of Biostatistics, University of North Carolina, Chapel Hill, NC, United States
| | - Jeffrey Roach
- Research Computing, University of North Carolina, Chapel Hill, NC, United States
| | - M Andrea Azcarate-Peril
- Department of Medicine and Nutrition, University of North Carolina, Chapel Hill, NC, United States
| | - Apoena Aguiar Ribeiro
- Division of Diagnostic Sciences, University of North Carolina, Chapel Hill, NC, United States
| | - Michael I Love
- Department of Biostatistics, University of North Carolina, Chapel Hill, NC, United States
- Department of Genetics, University of North Carolina, Chapel Hill, NC, United States
| | - Kimon Divaris
- Division of Pediatric and Public Health, University of North Carolina, Chapel Hill, NC, United States
- Department of Epidemiology, University of North Carolina, Chapel Hill, NC, United States
| | - Di Wu
- Department of Biostatistics, University of North Carolina, Chapel Hill, NC, United States
- Division of Oral and Craniofacial Health Sciences, Adams School of Dentistry, University of North Carolina, Chapel Hill, NC, United States
- Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, NC, United States
|
9
|
Ryu J, Barkal S, Yu T, Jankowiak M, Zhou Y, Francoeur M, Phan QV, Li Z, Tognon M, Brown L, Love MI, Lettre G, Ascher DB, Cassa CA, Sherwood RI, Pinello L. Joint genotypic and phenotypic outcome modeling improves base editing variant effect quantification. medRxiv 2023:2023.09.08.23295253. [PMID: 37732177 PMCID: PMC10508837 DOI: 10.1101/2023.09.08.23295253] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 09/22/2023]
Abstract
CRISPR base editing screens are powerful tools for studying disease-associated variants at scale. However, the efficiency and precision of base editing perturbations vary, confounding the assessment of variant-induced phenotypic effects. Here, we provide an integrated pipeline that improves the estimation of variant impact in base editing screens. We perform high-throughput ABE8e-SpRY base editing screens with an integrated reporter construct to measure the editing efficiency and outcomes of each gRNA alongside their phenotypic consequences. We introduce BEAN, a Bayesian network that accounts for per-guide editing outcomes and target site chromatin accessibility to estimate variant impacts. We show this pipeline attains superior performance compared to existing tools in variant classification and effect size quantification. We use BEAN to pinpoint common variants that alter LDL uptake, implicating novel genes. Additionally, through saturation base editing of LDLR, we enable accurate quantitative prediction of the effects of missense variants on LDL-C levels, which aligns with measurements in UK Biobank individuals, and identify structural mechanisms underlying variant pathogenicity. This work provides a widely applicable approach to improve the power of base editor screens for disease-associated variant characterization.
Affiliation(s)
- Jayoung Ryu
- Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
| | - Sam Barkal
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
| | - Tian Yu
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
| | | | - Yunzhuo Zhou
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
| | - Matthew Francoeur
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
| | - Quang Vinh Phan
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
| | - Zhijian Li
- Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
| | - Manuel Tognon
- Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Computer Science Department, University of Verona, Verona, Italy
| | - Lara Brown
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
| | - Michael I. Love
- Department of Genetics, Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC
| | - Guillaume Lettre
- Montreal Heart Institute, Montréal, QC H1T 1C8, Canada
- Faculté de Médecine, Université de Montréal, Montréal, QC H3T 1J4, Canada
| | - David B. Ascher
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
| | - Christopher A. Cassa
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
| | - Richard I. Sherwood
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
| | - Luca Pinello
- Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Pathology, Harvard Medical School, Boston, MA, USA
|
10
|
Aygün N, Krupa O, Mory J, Le B, Valone J, Liang D, Love MI, Stein JL. Genetics of cell-type-specific post-transcriptional gene regulation during human neurogenesis. bioRxiv 2023:2023.08.30.555019. [PMID: 37693528 PMCID: PMC10491258 DOI: 10.1101/2023.08.30.555019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/12/2023]
Abstract
The function of some genetic variants associated with brain-relevant traits has been explained through colocalization with expression quantitative trait loci (eQTL) conducted in bulk post-mortem adult brain tissue. However, many brain-trait associated loci have unknown cellular or molecular function. These genetic variants may exert context-specific function on different molecular phenotypes including post-transcriptional changes. Here, we identified genetic regulation of RNA-editing and alternative polyadenylation (APA) within a cell-type-specific population of human neural progenitors and neurons. More RNA-editing and isoforms utilizing longer polyadenylation sequences were observed in neurons, likely due to higher expression of genes encoding the proteins mediating these post-transcriptional events. We also detected hundreds of cell-type-specific editing quantitative trait loci (edQTLs) and alternative polyadenylation QTLs (apaQTLs). We found colocalizations of a neuron edQTL in CCDC88A with educational attainment and a progenitor apaQTL in EP300 with schizophrenia, suggesting that genetically mediated post-transcriptional regulation during brain development leads to differences in brain function.
Affiliation(s)
- Nil Aygün
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Oleh Krupa
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Jessica Mory
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Brandon Le
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Jordan Valone
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Dan Liang
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Michael I. Love
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Jason L. Stein
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Lead contact
|
11
|
Bond ML, Davis ES, Quiroga IY, Dey A, Kiran M, Love MI, Won H, Phanstiel DH. Chromatin loop dynamics during cellular differentiation are associated with changes to both anchor and internal regulatory features. Genome Res 2023; 33:1258-1268. [PMID: 37699658 PMCID: PMC10547260 DOI: 10.1101/gr.277397.122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2022] [Accepted: 07/07/2023] [Indexed: 09/14/2023]
Abstract
Three-dimensional (3D) chromatin structure has been shown to play a role in regulating gene transcription during biological transitions. Although our understanding of loop formation and maintenance is rapidly improving, much less is known about the mechanisms driving changes in looping and the impact of differential looping on gene transcription. One limitation has been a lack of well-powered differential looping data sets. To address this, we conducted a deeply sequenced Hi-C time course of megakaryocyte development comprising four biological replicates and 6 billion reads per time point. Statistical analysis revealed 1503 differential loops. Gained loop anchors were enriched for AP-1 occupancy and were characterized by large increases in histone H3K27ac (over 11-fold) but relatively small increases in CTCF and RAD21 binding (1.26- and 1.23-fold, respectively). Linear modeling revealed that changes in histone H3K27ac, chromatin accessibility, and JUN binding were better correlated with changes in looping than RAD21 and almost as well correlated as CTCF. Changes to epigenetic features between, rather than at, boundaries were highly predictive of changes in looping. Together these data suggest that although CTCF and RAD21 may be the core machinery dictating where loops form, other features (both at the anchors and within the loop boundaries) may play a larger role than previously anticipated in determining the relative loop strength across cell types and conditions.
Affiliation(s)
- Marielle L Bond
- Curriculum in Genetics and Molecular Biology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA
| | - Eric S Davis
- Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA
| | - Ivana Y Quiroga
- Thurston Arthritis Research Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA
| | - Anubha Dey
- Department of Systems and Computational Biology, University of Hyderabad, Hyderabad 500046, Telangana, India
| | - Manjari Kiran
- Department of Systems and Computational Biology, University of Hyderabad, Hyderabad 500046, Telangana, India
| | - Michael I Love
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27514, USA
- Lineberger Comprehensive Cancer Center, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA
| | - Hyejung Won
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27514, USA
- Neuroscience Center, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA
| | - Douglas H Phanstiel
- Curriculum in Genetics and Molecular Biology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA
- Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA
- Thurston Arthritis Research Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA
- Lineberger Comprehensive Cancer Center, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA
- Department of Cell Biology and Physiology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA
|
12
|
Chen Y, Sim A, Wan YK, Yeo K, Lee JJX, Ling MH, Love MI, Göke J. Context-aware transcript quantification from long-read RNA-seq data with Bambu. Nat Methods 2023; 20:1187-1195. [PMID: 37308696 PMCID: PMC10448944 DOI: 10.1038/s41592-023-01908-w] [Citation(s) in RCA: 20] [Impact Index Per Article: 20.0] [Received: 12/22/2021] [Accepted: 05/08/2023] [Indexed: 06/14/2023]
Abstract
Most approaches to transcript quantification rely on fixed reference annotations; however, the transcriptome is dynamic and, depending on the context, such static annotations contain inactive isoforms for some genes, whereas they are incomplete for others. Here we present Bambu, a method that performs machine-learning-based transcript discovery to enable quantification specific to the context of interest using long-read RNA-sequencing. To identify novel transcripts, Bambu estimates the novel discovery rate, which replaces arbitrary per-sample thresholds with a single, interpretable, precision-calibrated parameter. Bambu retains the full-length and unique read counts, enabling accurate quantification in the presence of inactive isoforms. Compared to existing methods for transcript discovery, Bambu achieves greater precision without sacrificing sensitivity. We show that context-aware annotations improve quantification for both novel and known transcripts. We apply Bambu to quantify isoforms from repetitive HERVH-LTR7 retrotransposons in human embryonic stem cells, demonstrating the ability for context-specific transcript expression analysis.
Affiliation(s)
- Ying Chen
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore
| | - Andre Sim
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore
| | - Yuk Kei Wan
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Republic of Singapore
| | - Keith Yeo
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore
| | - Joseph Jing Xian Lee
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore
| | - Min Hao Ling
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore
| | - Michael I Love
- Department of Biostatistics, University of North Carolina-Chapel Hill, Chapel Hill, NC, USA
- Department of Genetics, University of North Carolina-Chapel Hill, Chapel Hill, NC, USA
| | - Jonathan Göke
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore
- Department of Statistics and Data Science, National University of Singapore, Singapore, Republic of Singapore
|
13
|
Glass MR, Waxman EA, Yamashita S, Lafferty M, Beltran A, Farah T, Patel NK, Matoba N, Ahmed S, Srivastava M, Drake E, Davis LT, Yeturi M, Sun K, Love MI, Hashimoto-Torii K, French DL, Stein JL. Cross-site reproducibility of human cortical organoids reveals consistent cell type composition and architecture. bioRxiv 2023:2023.07.28.550873. [PMID: 37546772 PMCID: PMC10402155 DOI: 10.1101/2023.07.28.550873] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/08/2023]
Abstract
BACKGROUND Reproducibility of human cortical organoid (hCO) phenotypes remains a concern for modeling neurodevelopmental disorders. While guided hCO protocols reproducibly generate cortical cell types in multiple cell lines at one site, variability across sites using a harmonized protocol has not yet been evaluated. We present an hCO cross-site reproducibility study examining multiple phenotypes. METHODS Three independent research groups generated hCOs from one induced pluripotent stem cell (iPSC) line using a harmonized miniaturized spinning bioreactor protocol. scRNA-seq, 3D fluorescent imaging, phase contrast imaging, qPCR, and flow cytometry were used to characterize the 3-month differentiations across sites. RESULTS In all sites, hCOs were mostly cortical progenitor and neuronal cell types in reproducible proportions with moderate to high fidelity to the in vivo brain that were consistently organized in cortical wall-like buds. Cross-site differences were detected in hCO size and morphology. Differential gene expression showed differences in metabolism and cellular stress across sites. Although iPSC culture conditions were consistent and iPSCs remained undifferentiated, primed stem cell marker expression prior to differentiation correlated with cell type proportions in hCOs. CONCLUSIONS We identified hCO phenotypes that are reproducible across sites using a harmonized differentiation protocol. Previously described limitations of hCO models were also reproduced, including off-target differentiations, necrotic cores, and cellular stress. Improving our understanding of how stem cell states influence early hCO cell types may increase reliability of hCO differentiations. Cross-site reproducibility of hCO cell type proportions and organization lays the foundation for future collaborative prospective meta-analytic studies modeling neurodevelopmental disorders in hCOs.
Affiliation(s)
- Madison R Glass
- UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC
| | - Elisa A Waxman
- Center for Cellular and Molecular Therapeutics, The Children's Hospital of Philadelphia, Philadelphia, PA
| | - Satoshi Yamashita
- Center for Neuroscience Research, Children's National Hospital, Washington, DC
| | - Michael Lafferty
- UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC
| | - Alvaro Beltran
- UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC
| | - Tala Farah
- UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC
| | - Niyanta K Patel
- UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC
| | - Nana Matoba
- UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC
| | - Sara Ahmed
- UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC
| | - Mary Srivastava
- UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC
| | - Emma Drake
- UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC
| | - Liam T Davis
- UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC
| | - Meghana Yeturi
- UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC
| | - Kexin Sun
- UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC
| | - Michael I Love
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC
| | - Kazue Hashimoto-Torii
- Departments of Pediatrics, and Pharmacology & Physiology, School of Medicine and Health Sciences, The George Washington University, Washington, DC
- Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA
| | - Deborah L French
- Center for Cellular and Molecular Therapeutics, The Children's Hospital of Philadelphia, Philadelphia, PA
- Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA
| | - Jason L Stein
- UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC
|
14
|
Wu EY, Singh NP, Choi K, Zakeri M, Vincent M, Churchill GA, Ackert-Bicknell CL, Patro R, Love MI. SEESAW: detecting isoform-level allelic imbalance accounting for inferential uncertainty. Genome Biol 2023; 24:165. [PMID: 37438847 PMCID: PMC10337143 DOI: 10.1186/s13059-023-03003-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/01/2022] [Accepted: 06/29/2023] [Indexed: 07/14/2023] Open
Abstract
Detecting allelic imbalance at the isoform level requires accounting for inferential uncertainty, caused by multi-mapping of RNA-seq reads. Our proposed method, SEESAW, uses Salmon and Swish to offer analysis at various levels of resolution, including gene, isoform, and aggregating isoforms to groups by transcription start site. The aggregation strategies strengthen the signal for transcripts with high uncertainty. The SEESAW suite of methods is shown to have higher power than other allelic imbalance methods when there is isoform-level allelic imbalance. We also introduce a new test for detecting imbalance that varies across a covariate, such as time.
Affiliation(s)
- Euphy Y Wu
- Department of Biostatistics, University of North Carolina-Chapel Hill, Chapel Hill, NC, USA
| | - Noor P Singh
- Department of Computer Science, University of Maryland, College Park, MD, USA
| | | | - Mohsen Zakeri
- Department of Computer Science, University of Maryland, College Park, MD, USA
| | | | | | - Cheryl L Ackert-Bicknell
- Department of Orthopedics, School of Medicine, University of Colorado, Anschutz Campus, Aurora, CO, USA
| | - Rob Patro
- Department of Computer Science, University of Maryland, College Park, MD, USA
| | - Michael I Love
- Department of Biostatistics, University of North Carolina-Chapel Hill, Chapel Hill, NC, USA
- Department of Genetics, University of North Carolina-Chapel Hill, Chapel Hill, NC, USA
|
15
|
Singh NP, Love MI, Patro R. TreeTerminus - creating transcript trees using inferential replicate counts. iScience 2023; 26:106961. [PMID: 37378336 PMCID: PMC10291472 DOI: 10.1016/j.isci.2023.106961] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/29/2023] [Revised: 04/18/2023] [Accepted: 05/22/2023] [Indexed: 06/29/2023] Open
Abstract
A certain degree of uncertainty is always associated with the transcript abundance estimates. The uncertainty may make many downstream analyses, such as differential testing, difficult for certain transcripts. Conversely, gene-level analysis, though less ambiguous, is often too coarse-grained. We introduce TreeTerminus, a data-driven approach for grouping transcripts into a tree structure where leaves represent individual transcripts and internal nodes represent an aggregation of a transcript set. TreeTerminus constructs trees such that, on average, the inferential uncertainty decreases as we ascend the tree topology. The tree provides the flexibility to analyze data at nodes that are at different levels of resolution in the tree and can be tuned depending on the analysis of interest. We evaluated TreeTerminus on two simulated and two experimental datasets and observed an improved performance compared to transcripts (leaves) and other methods under several different metrics.
Affiliation(s)
- Noor Pratap Singh
- Department of Computer Science, University of Maryland, College Park, MD, USA
| | - Michael I. Love
- Department of Biostatistics, University of North Carolina, Chapel Hill, NC, USA
- Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
| | - Rob Patro
- Department of Computer Science, University of Maryland, College Park, MD, USA
16
Aygün N, Liang D, Crouse WL, Keele GR, Love MI, Stein JL. Inferring cell-type-specific causal gene regulatory networks during human neurogenesis. Genome Biol 2023; 24:130. [PMID: 37254169 PMCID: PMC10230710 DOI: 10.1186/s13059-023-02959-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 04/26/2022] [Accepted: 05/05/2023] [Indexed: 06/01/2023] Open
Abstract
BACKGROUND Genetic variation influences both chromatin accessibility, assessed in chromatin accessibility quantitative trait loci (caQTL) studies, and gene expression, assessed in expression QTL (eQTL) studies. Genetic variants can impact either nearby genes (cis-eQTLs) or distal genes (trans-eQTLs). Colocalization between caQTL and eQTL, or cis- and trans-eQTLs suggests that they share causal variants. However, pairwise colocalization between these molecular QTLs does not guarantee a causal relationship. Mediation analysis can be applied to assess the evidence supporting causality versus independence between molecular QTLs. Given that the function of QTLs can be cell-type-specific, we performed mediation analyses to find epigenetic and distal regulatory causal pathways for genes within two major cell types of the developing human cortex, progenitors and neurons. RESULTS We find that the expression of 168 and 38 genes is mediated by chromatin accessibility in progenitors and neurons, respectively. We also find that the expression of 11 and 12 downstream genes is mediated by upstream genes in progenitors and neurons. Moreover, we discover that a genetic locus associated with inter-individual differences in brain structure shows evidence for mediation of SLC26A7 through chromatin accessibility, identifying molecular mechanisms of a common variant association to a brain trait. CONCLUSIONS In this study, we identify cell-type-specific causal gene regulatory networks whereby the impacts of variants on gene expression were mediated by chromatin accessibility or distal gene expression. Identification of these causal paths will enable identifying and prioritizing actionable regulatory targets perturbing these key processes during neurodevelopment.
Affiliation(s)
- Nil Aygün
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
| | - Dan Liang
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
| | - Wesley L Crouse
- Department of Human Genetics, University of Chicago, Chicago, IL, 60637, USA
| | - Gregory R Keele
- The Jackson Laboratory, 600 Main Street, Bar Harbor, ME, 04609, USA
| | - Michael I Love
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA.
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA.
| | - Jason L Stein
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA.
- UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA.
17
Mu W, Davis ES, Lee S, Dozmorov MG, Phanstiel DH, Love MI. bootRanges: flexible generation of null sets of genomic ranges for hypothesis testing. Bioinformatics 2023; 39:btad190. [PMID: 37042725 PMCID: PMC10159650 DOI: 10.1093/bioinformatics/btad190] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 09/20/2022] [Revised: 01/06/2023] [Accepted: 03/28/2023] [Indexed: 04/13/2023] Open
Abstract
MOTIVATION Enrichment analysis is a widely utilized technique in genomic analysis that aims to determine if there is a statistically significant association between two sets of genomic features. To conduct this type of hypothesis testing, an appropriate null model is typically required. However, the null distribution that is commonly used can be overly simplistic and may result in inaccurate conclusions. RESULTS bootRanges provides fast functions for generation of block bootstrapped genomic ranges representing the null hypothesis in enrichment analysis. As part of a modular workflow, bootRanges offers greater flexibility for computing various test statistics leveraging other Bioconductor packages. We show that shuffling or permutation schemes may result in overly narrow test statistic null distributions and over-estimation of statistical significance, while creating new range sets with a block bootstrap preserves local genomic correlation structure and generates more reliable null distributions. It can also be used in more complex analyses, such as assessing correlations between cis-regulatory elements (CREs) and genes across cell types or providing optimized thresholds, e.g. log fold change (logFC) from differential analysis. AVAILABILITY AND IMPLEMENTATION bootRanges is freely available in the R/Bioconductor package nullranges hosted at https://bioconductor.org/packages/nullranges.
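Illustrative sketch (not part of the abstract, and not the nullranges implementation): the block bootstrap idea can be shown on a toy one-dimensional feature track. Contiguous blocks are resampled with replacement, so correlation between neighboring positions is kept within each block, whereas shuffling single positions would destroy it. The function name `block_bootstrap`, the block length, and the toy data are assumptions chosen for illustration only.

```python
import random


def block_bootstrap(values, block_len, rng):
    """Resample a sequence by concatenating randomly chosen contiguous blocks."""
    n = len(values)
    if not 1 <= block_len <= n:
        raise ValueError("block_len must be between 1 and len(values)")
    out = []
    while len(out) < n:
        # Pick a block start so that the whole block lies inside the sequence.
        start = rng.randrange(0, n - block_len + 1)
        out.extend(values[start:start + block_len])
    return out[:n]  # trim in case the last block overshoots


rng = random.Random(0)  # fixed seed so the sketch is reproducible
track = [0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0]  # hypothetical binary feature track
null_track = block_bootstrap(track, block_len=4, rng=rng)
```

Repeating the call many times would give a set of null tracks from which a test statistic's null distribution could be built.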
Affiliation(s)
- Wancen Mu
- Department of Biostatistics, University of North Carolina, Chapel Hill 27514, United States
| | - Eric S Davis
- Curriculum in Bioinformatics and Computational Biology, University of North Carolina, Chapel Hill 27514, United States
| | - Stuart Lee
- Genentech, South San Francisco, CA 94080, United States
| | - Mikhail G Dozmorov
- Department of Biostatistics, Virginia Commonwealth University, Richmond, VA 23284, United States
- Department of Pathology, Virginia Commonwealth University, Richmond, VA 23284, United States
| | - Douglas H Phanstiel
- Curriculum in Bioinformatics and Computational Biology, University of North Carolina, Chapel Hill 27514, United States
- Thurston Arthritis Research Center, University of North Carolina, Chapel Hill 27514, United States
- Department of Cell Biology and Physiology, University of North Carolina, Chapel Hill 27514, United States
- Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill 27514, United States
- Curriculum in Genetics and Molecular Biology, University of North Carolina, Chapel Hill 27514, United States
| | - Michael I Love
- Department of Biostatistics, University of North Carolina, Chapel Hill 27514, United States
- Curriculum in Bioinformatics and Computational Biology, University of North Carolina, Chapel Hill 27514, United States
- Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill 27514, United States
- Department of Genetics, University of North Carolina, Chapel Hill 27514, United States
18
Davis ES, Mu W, Lee S, Dozmorov MG, Love MI, Phanstiel DH. matchRanges: generating null hypothesis genomic ranges via covariate-matched sampling. Bioinformatics 2023; 39:btad197. [PMID: 37084270 PMCID: PMC10168584 DOI: 10.1093/bioinformatics/btad197] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 08/04/2022] [Revised: 02/01/2023] [Accepted: 03/28/2023] [Indexed: 04/23/2023] Open
Abstract
MOTIVATION Deriving biological insights from genomic data commonly requires comparing attributes of selected genomic loci to a null set of loci. The selection of this null set is non-trivial, as it requires careful consideration of potential covariates, a problem that is exacerbated by the non-uniform distribution of genomic features including genes, enhancers, and transcription factor binding sites. Propensity score-based covariate matching methods allow the selection of null sets from a pool of possible items while controlling for multiple covariates; however, existing packages do not operate on genomic data classes and can be slow for large data sets making them difficult to integrate into genomic workflows. RESULTS To address this, we developed matchRanges, a propensity score-based covariate matching method for the efficient and convenient generation of matched null ranges from a set of background ranges within the Bioconductor framework. AVAILABILITY AND IMPLEMENTATION Package: https://bioconductor.org/packages/nullranges, Code: https://github.com/nullranges, Documentation: https://nullranges.github.io/nullranges.
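Illustrative sketch (not part of the abstract, and not the matchRanges implementation): covariate matching on a propensity score can be pictured as picking, for each focal item, the pool item with the closest score. The sketch below assumes the scores have already been estimated and uses greedy nearest-neighbor matching without replacement, which is only one of several possible matching schemes. The function name `match_nearest` and the example scores are hypothetical.

```python
def match_nearest(focal_scores, pool_scores):
    """Greedy nearest-neighbor matching without replacement on a scalar score.

    Returns, for each focal score, the index of the pool item matched to it.
    """
    available = dict(enumerate(pool_scores))  # pool index -> score, still unmatched
    matches = []
    for score in focal_scores:
        if not available:
            raise ValueError("pool is smaller than the focal set")
        best = min(available, key=lambda k: abs(available[k] - score))
        matches.append(best)
        del available[best]  # each pool item can be used at most once
    return matches


focal = [0.80, 0.30]                   # hypothetical propensity scores of focal ranges
pool = [0.10, 0.35, 0.75, 0.90, 0.28]  # hypothetical scores of background ranges
matched = match_nearest(focal, pool)   # indices into pool
```

Greedy matching depends on the order of the focal items; it is used here only because it is short enough to read at a glance.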
Affiliation(s)
- Eric S Davis
- Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
| | - Wancen Mu
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
| | - Stuart Lee
- Genentech, South San Francisco, CA, United States
| | - Mikhail G Dozmorov
- Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, United States
- Department of Pathology, Virginia Commonwealth University, Richmond, VA, United States
| | - Michael I Love
- Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
| | - Douglas H Phanstiel
- Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Thurston Arthritis Research Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Department of Cell Biology & Physiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Curriculum in Genetics & Molecular Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
19
Jiang MZ, Aguet F, Ardlie K, Chen J, Cornell E, Cruz D, Durda P, Gabriel SB, Gerszten RE, Guo X, Johnson CW, Kasela S, Lange LA, Lappalainen T, Liu Y, Reiner AP, Smith J, Sofer T, Taylor KD, Tracy RP, VanDenBerg DJ, Wilson JG, Rich SS, Rotter JI, Love MI, Raffield LM, Li Y. Canonical correlation analysis for multi-omics: Application to cross-cohort analysis. PLoS Genet 2023; 19:e1010517. [PMID: 37216410 PMCID: PMC10237647 DOI: 10.1371/journal.pgen.1010517] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2022] [Revised: 06/02/2023] [Accepted: 05/01/2023] [Indexed: 05/24/2023] Open
Abstract
Integrative approaches that simultaneously model multi-omics data have gained increasing popularity because they provide holistic systems biology views of multiple or all components in a biological system of interest. Canonical correlation analysis (CCA) is a correlation-based integrative method designed to extract latent features shared between multiple assays by finding the linear combinations of features, referred to as canonical variables (CVs), within each assay that achieve maximal across-assay correlation. Although widely acknowledged as a powerful approach for multi-omics data, CCA has not been systematically applied to multi-omics data in large cohort studies, which have only recently become available. Here, we adapted sparse multiple CCA (SMCCA), a widely-used derivative of CCA, to proteomics and methylomics data from the Multi-Ethnic Study of Atherosclerosis (MESA) and Jackson Heart Study (JHS). To tackle challenges encountered when applying SMCCA to MESA and JHS, our adaptations include the incorporation of the Gram-Schmidt (GS) algorithm with SMCCA to improve orthogonality among CVs, and the development of Sparse Supervised Multiple CCA (SSMCCA) to allow supervised integration analysis for more than two assays. Effective application of SMCCA to the two real datasets reveals important findings. Applying our SMCCA-GS to MESA and JHS, we identified strong associations between blood cell counts and protein abundance, suggesting that adjustment of blood cell composition should be considered in protein-based association studies. Importantly, CVs obtained from two independent cohorts also demonstrate transferability across the cohorts. For example, proteomic CVs learned from JHS, when transferred to MESA, explain similar amounts of blood cell count phenotypic variance in MESA, explaining 39.0% ~ 50.0% variation in JHS and 38.9% ~ 49.1% in MESA. Similar transferability was observed for other omics-CV-trait pairs. This suggests that biologically meaningful and cohort-agnostic variation is captured by CVs. We anticipate that applying our SMCCA-GS and SSMCCA on various cohorts would help identify cohort-agnostic biologically meaningful relationships between multi-omics data and phenotypic traits.
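Illustrative sketch (not part of the abstract): the abstract does not say how the Gram-Schmidt step is combined with SMCCA, so the code below shows only the generic Gram-Schmidt procedure, applied to a few toy vectors standing in for canonical variables. Each vector has its projections onto the previously accepted vectors subtracted, which leaves a mutually orthogonal set. The helper names `dot` and `gram_schmidt` and the toy vectors are assumptions for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(vectors, tol=1e-12):
    """Orthogonalize vectors in order; drop any that become numerically zero."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            # Remove the component of w that lies along the earlier vector b.
            coef = dot(w, b) / dot(b, b)
            w = [wi - coef * bi for wi, bi in zip(w, b)]
        if dot(w, w) > tol:
            basis.append(w)
    return basis


# Toy stand-ins for three correlated canonical variables (hypothetical values).
cvs = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
ortho = gram_schmidt(cvs)
```

The first vector is kept unchanged, so the ordering of the inputs determines which directions are preserved.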
Affiliation(s)
- Min-Zhi Jiang
- Department of Applied Physical Sciences, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
| | - François Aguet
- Illumina Artificial Intelligence Laboratory, Illumina, Inc., San Diego, California, United States of America
| | - Kristin Ardlie
- The Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
| | - Jiawen Chen
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
| | - Elaine Cornell
- Laboratory for Clinical Biochemistry Research, University of Vermont, Burlington, Vermont, United States of America
| | - Dan Cruz
- Department of Medicine, Cardiology, Beth Israel Deaconess Medical Center, Boston, Massachusetts, United States of America
| | - Peter Durda
- Department of Pathology & Laboratory Medicine, University of Vermont, Colchester, Vermont, United States of America
| | - Stacey B. Gabriel
- The Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
| | - Robert E. Gerszten
- Department of Medicine, Beth Israel Deaconess Medical Center, Boston, Massachusetts, United States of America
| | - Xiuqing Guo
- Department of Pediatrics, The Institute for Translational Genomics and Population Sciences, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, University of California at Los Angeles, Torrance, California, United States of America
| | - Craig W. Johnson
- Department of Biostatistics, University of Washington at Seattle, Seattle, Washington, United States of America
| | - Silva Kasela
- New York Genome Center, New York, New York, United States of America
| | - Leslie A. Lange
- Department of Epidemiology, Department of Medicine, Division of Biomedical Informatics and Personalized Medicine, Lifecourse Epidemiology of Adiposity & Diabetes Center, Aurora, Colorado, United States of America
| | - Tuuli Lappalainen
- New York Genome Center, New York, New York, United States of America
| | - Yongmei Liu
- Department of Medicine, Cardiology and Neurology, Duke University Medical Center, Durham, North Carolina, United States of America
| | - Alex P. Reiner
- Department of Epidemiology, University of Washington, Seattle, Washington, United States of America
| | - Josh Smith
- Northwest Genomic Center, University of Washington, Seattle, Washington, United States of America
| | - Tamar Sofer
- Department of Biostatistics, Harvard Medical School, Medicine-Brigham and Women’s Hospital, Boston, Massachusetts, United States of America
| | - Kent D. Taylor
- Department of Pediatrics, The Institute for Translational Genomics and Population Sciences, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, University of California at Los Angeles, Torrance, California, United States of America
| | - Russell P. Tracy
- Department of Pathology & Laboratory Medicine, University of Vermont, Colchester, Vermont, United States of America
| | - David J. VanDenBerg
- Department of Preventive Medicine, University of Southern California, Los Angeles, California, United States of America
| | - James G. Wilson
- Department of Medicine, Beth Israel Deaconess Medical Center, Boston, Massachusetts, United States of America
| | - Stephen S. Rich
- Center for Public Health Genomics, Department of Public Health Sciences, University of Virginia, Charlottesville, Virginia, United States of America
| | - Jerome I. Rotter
- Department of Pediatrics, Genomic Outcomes, The Institute for Translational Genomics and Population Sciences, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, University of California at Los Angeles, Torrance, California, United States of America
| | - Michael I. Love
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
| | - Laura M. Raffield
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
| | - Yun Li
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
20
Matoba N, Le BD, Valone JM, Wolter JM, Mory J, Liang D, Aygün N, Broadaway KA, Bond ML, Mohlke KL, Zylka MJ, Love MI, Stein JL. Wnt activity reveals context-specific genetic effects on gene regulation in neural progenitors. bioRxiv 2023:2023.02.07.527357. [PMID: 36798360 PMCID: PMC9934631 DOI: 10.1101/2023.02.07.527357] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/10/2023]
Abstract
Gene regulatory effects in bulk post-mortem brain tissues are undetected at many non-coding brain trait-associated loci. We hypothesized that context-specific genetic variant function during stimulation of a developmental signaling pathway would explain additional regulatory mechanisms. We measured chromatin accessibility and gene expression following activation of the canonical Wnt pathway in primary human neural progenitors from 82 donors. TCF/LEF motifs, brain structure-, and neuropsychiatric disorder-associated variants were enriched within Wnt-responsive regulatory elements (REs). Genetically influenced REs were enriched in genomic regions under positive selection along the human lineage. Stimulation of the Wnt pathway increased the detection of genetically influenced REs/genes by 66.2%/52.7%, and led to the identification of 397 REs primed for effects on gene expression. Context-specific molecular quantitative trait loci increased brain-trait colocalizations by up to 70%, suggesting that genetic variant effects during early neurodevelopmental patterning lead to differences in adult brain and behavioral traits.
Affiliation(s)
- Nana Matoba
- Department of Genetics, University of North Carolina at Chapel Hill; Chapel Hill, NC, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill; Chapel Hill, NC, USA
| | - Brandon D Le
- Department of Genetics, University of North Carolina at Chapel Hill; Chapel Hill, NC, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill; Chapel Hill, NC, USA
| | - Jordan M Valone
- Department of Genetics, University of North Carolina at Chapel Hill; Chapel Hill, NC, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill; Chapel Hill, NC, USA
| | - Justin M Wolter
- Department of Genetics, University of North Carolina at Chapel Hill; Chapel Hill, NC, USA
- Department of Cell Biology and Physiology, University of North Carolina at Chapel Hill; Chapel Hill, NC, USA
- Carolina Institute for Developmental Disabilities; Carrboro, NC, USA
| | - Jessica Mory
- Department of Genetics, University of North Carolina at Chapel Hill; Chapel Hill, NC, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill; Chapel Hill, NC, USA
| | - Dan Liang
- Department of Genetics, University of North Carolina at Chapel Hill; Chapel Hill, NC, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill; Chapel Hill, NC, USA
| | - Nil Aygün
- Department of Genetics, University of North Carolina at Chapel Hill; Chapel Hill, NC, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill; Chapel Hill, NC, USA
| | - K Alaine Broadaway
- Department of Genetics, University of North Carolina at Chapel Hill; Chapel Hill, NC, USA
| | - Marielle L Bond
- Department of Genetics, University of North Carolina at Chapel Hill; Chapel Hill, NC, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill; Chapel Hill, NC, USA
| | - Karen L Mohlke
- Department of Genetics, University of North Carolina at Chapel Hill; Chapel Hill, NC, USA
| | - Mark J Zylka
- UNC Neuroscience Center, University of North Carolina at Chapel Hill; Chapel Hill, NC, USA
- Department of Cell Biology and Physiology, University of North Carolina at Chapel Hill; Chapel Hill, NC, USA
- Carolina Institute for Developmental Disabilities; Carrboro, NC, USA
| | - Michael I Love
- Department of Genetics, University of North Carolina at Chapel Hill; Chapel Hill, NC, USA
- Department of Biostatistics, University of North Carolina at Chapel Hill; Chapel Hill, NC, USA
| | - Jason L Stein
- Department of Genetics, University of North Carolina at Chapel Hill; Chapel Hill, NC, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill; Chapel Hill, NC, USA
- Carolina Institute for Developmental Disabilities; Carrboro, NC, USA
21
Ogata JD, Mu W, Davis ES, Xue B, Harrell JC, Sheffield NC, Phanstiel DH, Love MI, Dozmorov MG. excluderanges: exclusion sets for T2T-CHM13, GRCm39, and other genome assemblies. Bioinformatics 2023; 39:7126418. [PMID: 37067481 PMCID: PMC10126321 DOI: 10.1093/bioinformatics/btad198] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 11/26/2022] [Revised: 02/16/2023] [Accepted: 04/12/2023] [Indexed: 04/18/2023]
Abstract
SUMMARY Exclusion regions are sections of reference genomes with abnormal pileups of short sequencing reads. Removing reads overlapping them improves biological signal, and these benefits are most pronounced in differential analysis settings. Several labs created exclusion region sets, available primarily through ENCODE and GitHub. However, the variety of exclusion sets creates uncertainty about which sets to use. Furthermore, gap regions (e.g., centromeres, telomeres, short arms) create additional considerations in generating exclusion sets. We generated exclusion sets for the latest human T2T-CHM13 and mouse GRCm39 genomes and systematically assembled and annotated these and other sets in the excluderanges R/Bioconductor data package, also accessible via the BEDbase.org API. The package provides unified access to 82 GenomicRanges objects covering six organisms, multiple genome assemblies and types of exclusion regions. For human hg38 genome assembly, we recommend hg38.Kundaje.GRCh38_unified_blacklist as the most well-curated and annotated, and sets generated by the Blacklist tool for other organisms. AVAILABILITY AND IMPLEMENTATION https://bioconductor.org/packages/excluderanges/. SUPPLEMENTARY INFORMATION Package website: https://dozmorovlab.github.io/excluderanges/.
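Illustrative sketch (not part of the abstract, and not the excluderanges API): removing reads that overlap exclusion regions comes down to an interval overlap test per chromosome. The code below assumes half-open coordinates `[start, end)` as in BED files and represents reads and regions as plain tuples. The function names and the example coordinates are hypothetical.

```python
def overlaps(start_a, end_a, start_b, end_b):
    """True if two half-open intervals [start, end) share at least one base."""
    return start_a < end_b and start_b < end_a


def drop_excluded(reads, exclusions):
    """Keep reads that overlap no exclusion region on the same chromosome."""
    kept = []
    for chrom, start, end in reads:
        hit = any(
            chrom == ex_chrom and overlaps(start, end, ex_start, ex_end)
            for ex_chrom, ex_start, ex_end in exclusions
        )
        if not hit:
            kept.append((chrom, start, end))
    return kept


exclusions = [("chr1", 100, 200)]  # hypothetical exclusion region
reads = [
    ("chr1", 50, 100),   # ends exactly where the region starts: no overlap
    ("chr1", 150, 180),  # fully inside the region
    ("chr1", 190, 260),  # partially overlaps the region
    ("chr2", 150, 180),  # same coordinates, different chromosome
]
kept = drop_excluded(reads, exclusions)
```

This linear scan is fine for a handful of regions; real workflows would typically use an indexed interval structure instead.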
Affiliation(s)
- Jonathan D Ogata
- Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, 23298, USA
| | - Wancen Mu
- Department of Biostatistics, University of North Carolina-Chapel Hill, Chapel Hill, NC 27514, USA
| | - Eric S Davis
- Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Bingjie Xue
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, 22908, USA
| | - J Chuck Harrell
- Department of Pathology, Virginia Commonwealth University, Richmond, VA, 23284, USA
- Massey Cancer Center, Virginia Commonwealth University, Richmond, VA 23220, USA
| | - Nathan C Sheffield
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, 22908, USA
| | - Douglas H Phanstiel
- Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Thurston Arthritis Research Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Department of Cell Biology and Physiology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Curriculum in Genetics and Molecular Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Michael I Love
- Department of Biostatistics, University of North Carolina-Chapel Hill, Chapel Hill, NC 27514, USA
- Department of Genetics, University of North Carolina-Chapel Hill, Chapel Hill, NC 27514, USA
| | - Mikhail G Dozmorov
- Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, 23298, USA
- Department of Pathology, Virginia Commonwealth University, Richmond, VA, 23284, USA
22
McCabe SD, Nobel AB, Love MI. ACTOR: a latent Dirichlet model to compare expressed isoform proportions to a reference panel. Biostatistics 2023; 24:388-405. [PMID: 33948626 PMCID: PMC10102900 DOI: 10.1093/biostatistics/kxab013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2020] [Revised: 03/19/2021] [Accepted: 03/23/2021] [Indexed: 11/13/2022] Open
Abstract
The relative proportion of RNA isoforms expressed for a given gene has been associated with disease states in cancer, retinal diseases, and neurological disorders. Examination of relative isoform proportions can help determine biological mechanisms, but such analyses often require a per-gene investigation of splicing patterns. Leveraging large public data sets produced by genomic consortia as a reference, one can compare splicing patterns in a data set of interest with those of a reference panel in which samples are divided into distinct groups, such as tissue of origin, or disease status. We propose A latent Dirichlet model to Compare expressed isoform proportions TO a Reference panel (ACTOR), a latent Dirichlet model with Dirichlet Multinomial observations to compare expressed isoform proportions in a data set to an independent reference panel. We use a variational Bayes procedure to estimate posterior distributions for the group membership of one or more samples. Using the Genotype-Tissue Expression project as a reference data set, we evaluate ACTOR on simulated and real RNA-seq data sets to determine tissue-type classifications of genes. ACTOR is publicly available as an R package at https://github.com/mccabes292/actor.
Affiliation(s)
- Sean D McCabe
- Department of Biostatistics, University of North Carolina at Chapel Hill, 135 Dauer Drive, Chapel Hill, NC 27599-7400, USA
| | - Andrew B Nobel
- Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, 318 Hanes Hall, Chapel Hill, NC 27599-3260, USA and Department of Biostatistics, University of North Carolina at Chapel Hill, 135 Dauer Drive, Chapel Hill, NC 27599-7400, USA
| | - Michael I Love
- Department of Biostatistics, University of North Carolina at Chapel Hill, 135 Dauer Drive, Chapel Hill, NC 27599-7400, USA and Department of Genetics, University of North Carolina at Chapel Hill, 120 Mason Farm Rd, Chapel Hill, NC 27514, USA
23
Wen C, Margolis M, Dai R, Zhang P, Przytycki PF, Vo DD, Bhattacharya A, Matoba N, Jiao C, Kim M, Tsai E, Hoh C, Aygün N, Walker RL, Chatzinakos C, Clarke D, Pratt H, Consortium P, Peters MA, Gerstein M, Daskalakis NP, Weng Z, Jaffe AE, Kleinman JE, Hyde TM, Weinberger DR, Bray NJ, Sestan N, Geschwind DH, Roeder K, Gusev A, Pasaniuc B, Stein JL, Love MI, Pollard KS, Liu C, Gandal MJ. Cross-ancestry, cell-type-informed atlas of gene, isoform, and splicing regulation in the developing human brain. medRxiv 2023:2023.03.03.23286706. [PMID: 36945630 PMCID: PMC10029021 DOI: 10.1101/2023.03.03.23286706] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 03/23/2023]
Abstract
Genomic regulatory elements active in the developing human brain are notably enriched in genetic risk for neuropsychiatric disorders, including autism spectrum disorder (ASD), schizophrenia, and bipolar disorder. However, prioritizing the specific risk genes and candidate molecular mechanisms underlying these genetic enrichments has been hindered by the lack of a single unified large-scale gene regulatory atlas of human brain development. Here, we uniformly process and systematically characterize gene, isoform, and splicing quantitative trait loci (xQTLs) in 672 fetal brain samples from unique subjects across multiple ancestral populations. We identify 15,752 genes harboring a significant xQTL and map 3,739 eQTLs to a specific cellular context. We observe a striking drop in gene expression and splicing heritability as the human brain develops. Isoform-level regulation, particularly in the second trimester, mediates the greatest proportion of heritability across multiple psychiatric GWAS, compared with eQTLs. Via colocalization and TWAS, we prioritize biological mechanisms for ~60% of GWAS loci across five neuropsychiatric disorders, nearly two-fold that observed in the adult brain. Finally, we build a comprehensive set of developmentally regulated gene and isoform co-expression networks capturing unique genetic enrichments across disorders. Together, this work provides a comprehensive view of genetic regulation across human brain development as well as the stage- and cell-type-informed mechanistic underpinnings of neuropsychiatric disorders.
Affiliation(s)
- Cindy Wen
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Michael Margolis
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Rujia Dai
- Department of Psychiatry, SUNY Upstate Medical University; Syracuse, NY, 13210, USA
- Pan Zhang
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Pawel F Przytycki
- Gladstone Institute of Data Science and Biotechnology; San Francisco, CA, 94158, USA
- Daniel D Vo
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania; Philadelphia, PA, 19104, USA
- Lifespan Brain Institute, The Children's Hospital of Philadelphia; Philadelphia, PA, 19104, USA
- Arjun Bhattacharya
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Institute for Quantitative and Computational Biosciences, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Nana Matoba
- Department of Genetics, University of North Carolina at Chapel Hill; Chapel Hill, NC, 27599, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill; Chapel Hill, NC, 27599, USA
- Chuan Jiao
- Department of Psychiatry, SUNY Upstate Medical University; Syracuse, NY, 13210, USA
- Minsoo Kim
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Ellen Tsai
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Celine Hoh
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Nil Aygün
- Department of Genetics, University of North Carolina at Chapel Hill; Chapel Hill, NC, 27599, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill; Chapel Hill, NC, 27599, USA
- Rebecca L Walker
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Christos Chatzinakos
- Department of Psychiatry, Harvard Medical School; Boston, MA, 02215, USA
- McLean Hospital; Belmont, MA, 02478, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard; Cambridge, MA, 02142, USA
- Declan Clarke
- Department of Molecular Biophysics and Biochemistry, Yale University; New Haven, CT, 06520, USA
- Henry Pratt
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School; Worcester, MA, 01605, USA
- PsychENCODE Consortium
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Psychiatry, SUNY Upstate Medical University; Syracuse, NY, 13210, USA
- Gladstone Institute of Data Science and Biotechnology; San Francisco, CA, 94158, USA
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania; Philadelphia, PA, 19104, USA
- Lifespan Brain Institute, The Children's Hospital of Philadelphia; Philadelphia, PA, 19104, USA
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Institute for Quantitative and Computational Biosciences, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Genetics, University of North Carolina at Chapel Hill; Chapel Hill, NC, 27599, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill; Chapel Hill, NC, 27599, USA
- Department of Psychiatry, Harvard Medical School; Boston, MA, 02215, USA
- McLean Hospital; Belmont, MA, 02478, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard; Cambridge, MA, 02142, USA
- Department of Molecular Biophysics and Biochemistry, Yale University; New Haven, CT, 06520, USA
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School; Worcester, MA, 01605, USA
- CNS Data Coordination Group, Sage Bionetworks; Seattle, WA, 98109, USA
- Program in Computational Biology and Bioinformatics, Yale University; New Haven, CT, 06520, USA
- Department of Computer Science, Yale University; New Haven, CT, 06520, USA
- Department of Statistics and Data Science, Yale University; New Haven, CT, 06520, USA
- Lieber Institute for Brain Development; Baltimore, MD, 21205, USA
- Department of Psychiatry & Behavioral Sciences, Johns Hopkins University School of Medicine; Baltimore, MD, 21205, USA
- Department of Neuroscience, Johns Hopkins University School of Medicine; Baltimore, MD, 21205, USA
- Department of Genetic Medicine, Johns Hopkins University School of Medicine; Baltimore, MD, 21205, USA
- Department of Mental Health, Johns Hopkins Bloomberg School of Public Health; Baltimore, MD, 21205, USA
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health; Baltimore, MD, 21205, USA
- Neumora Therapeutics; Watertown, MA, 02472, USA
- Department of Neurology, Johns Hopkins University School of Medicine; Baltimore, MD, 21205, USA
- MRC Centre for Neuropsychiatric Genetics & Genomics, Division of Psychological Medicine & Clinical Neurosciences, Cardiff University School of Medicine; Cardiff, CF24 4HQ, UK
- Department of Comparative Medicine, Yale University School of Medicine; New Haven, CT, 06520, USA
- Department of Neuroscience, Yale University School of Medicine; New Haven, CT, 06520, USA
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Institute for Precision Health, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Statistics & Data Science, Carnegie Mellon University; Pittsburgh, PA, 15213, USA
- Computational Biology Department, Carnegie Mellon University; Pittsburgh, PA, 15213, USA
- Department of Medical Oncology, Division of Population Sciences, Dana-Farber Cancer Institute; Boston, MA, 02215, USA
- Broad Institute of MIT and Harvard; Cambridge, MA, 02142, USA
- Harvard Medical School; Boston, MA, 02215, USA
- Division of Genetics, Brigham and Women's Hospital; Boston, MA, 02215, USA
- Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Biostatistics, University of North Carolina at Chapel Hill; Chapel Hill, NC, 27599, USA
- Department of Epidemiology & Biostatistics, University of California, San Francisco; San Francisco, CA, 94158, USA
- Chan Zuckerberg Biohub; San Francisco, CA, 94158, USA
- Center for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University; Changsha, Hunan, 410008, China
- Mette A Peters
- CNS Data Coordination Group, Sage Bionetworks; Seattle, WA, 98109, USA
- Mark Gerstein
- Department of Molecular Biophysics and Biochemistry, Yale University; New Haven, CT, 06520, USA
- Program in Computational Biology and Bioinformatics, Yale University; New Haven, CT, 06520, USA
- Department of Computer Science, Yale University; New Haven, CT, 06520, USA
- Department of Statistics and Data Science, Yale University; New Haven, CT, 06520, USA
- Nikolaos P Daskalakis
- Department of Psychiatry, Harvard Medical School; Boston, MA, 02215, USA
- McLean Hospital; Belmont, MA, 02478, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard; Cambridge, MA, 02142, USA
- Zhiping Weng
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School; Worcester, MA, 01605, USA
- Andrew E Jaffe
- Lieber Institute for Brain Development; Baltimore, MD, 21205, USA
- Department of Psychiatry & Behavioral Sciences, Johns Hopkins University School of Medicine; Baltimore, MD, 21205, USA
- Department of Neuroscience, Johns Hopkins University School of Medicine; Baltimore, MD, 21205, USA
- Department of Genetic Medicine, Johns Hopkins University School of Medicine; Baltimore, MD, 21205, USA
- Department of Mental Health, Johns Hopkins Bloomberg School of Public Health; Baltimore, MD, 21205, USA
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health; Baltimore, MD, 21205, USA
- Neumora Therapeutics; Watertown, MA, 02472, USA
- Joel E Kleinman
- Lieber Institute for Brain Development; Baltimore, MD, 21205, USA
- Department of Psychiatry & Behavioral Sciences, Johns Hopkins University School of Medicine; Baltimore, MD, 21205, USA
- Thomas M Hyde
- Lieber Institute for Brain Development; Baltimore, MD, 21205, USA
- Department of Psychiatry & Behavioral Sciences, Johns Hopkins University School of Medicine; Baltimore, MD, 21205, USA
- Department of Neurology, Johns Hopkins University School of Medicine; Baltimore, MD, 21205, USA
- Daniel R Weinberger
- Lieber Institute for Brain Development; Baltimore, MD, 21205, USA
- Department of Psychiatry & Behavioral Sciences, Johns Hopkins University School of Medicine; Baltimore, MD, 21205, USA
- Department of Neuroscience, Johns Hopkins University School of Medicine; Baltimore, MD, 21205, USA
- Department of Genetic Medicine, Johns Hopkins University School of Medicine; Baltimore, MD, 21205, USA
- Department of Neurology, Johns Hopkins University School of Medicine; Baltimore, MD, 21205, USA
- Nicholas J Bray
- MRC Centre for Neuropsychiatric Genetics & Genomics, Division of Psychological Medicine & Clinical Neurosciences, Cardiff University School of Medicine; Cardiff, CF24 4HQ, UK
- Nenad Sestan
- Department of Comparative Medicine, Yale University School of Medicine; New Haven, CT, 06520, USA
- Department of Neuroscience, Yale University School of Medicine; New Haven, CT, 06520, USA
- Daniel H Geschwind
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Institute for Precision Health, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Kathryn Roeder
- Department of Statistics & Data Science, Carnegie Mellon University; Pittsburgh, PA, 15213, USA
- Computational Biology Department, Carnegie Mellon University; Pittsburgh, PA, 15213, USA
- Alexander Gusev
- Department of Medical Oncology, Division of Population Sciences, Dana-Farber Cancer Institute; Boston, MA, 02215, USA
- Broad Institute of MIT and Harvard; Cambridge, MA, 02142, USA
- Harvard Medical School; Boston, MA, 02215, USA
- Division of Genetics, Brigham and Women's Hospital; Boston, MA, 02215, USA
- Bogdan Pasaniuc
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Institute for Precision Health, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Jason L Stein
- Department of Genetics, University of North Carolina at Chapel Hill; Chapel Hill, NC, 27599, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill; Chapel Hill, NC, 27599, USA
- Michael I Love
- Department of Genetics, University of North Carolina at Chapel Hill; Chapel Hill, NC, 27599, USA
- Department of Biostatistics, University of North Carolina at Chapel Hill; Chapel Hill, NC, 27599, USA
- Katherine S Pollard
- Gladstone Institute of Data Science and Biotechnology; San Francisco, CA, 94158, USA
- Department of Epidemiology & Biostatistics, University of California, San Francisco; San Francisco, CA, 94158, USA
- Chan Zuckerberg Biohub; San Francisco, CA, 94158, USA
- Chunyu Liu
- Department of Psychiatry, SUNY Upstate Medical University; Syracuse, NY, 13210, USA
- Center for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University; Changsha, Hunan, 410008, China
- Michael J Gandal
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania; Philadelphia, PA, 19104, USA
- Lifespan Brain Institute, The Children's Hospital of Philadelphia; Philadelphia, PA, 19104, USA
24
Park JE, Smith MA, Van Alsten SC, Walens A, Wu D, Hoadley KA, Troester MA, Love MI. Diffsig: Associating Risk Factors With Mutational Signatures. bioRxiv 2023:2023.02.09.527740. [PMID: 36798154 PMCID: PMC9934616 DOI: 10.1101/2023.02.09.527740] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/12/2023]
Abstract
Somatic mutational signatures elucidate molecular vulnerabilities to therapy and therefore detecting signatures and classifying tumors with respect to signatures has clinical value. However, identifying the etiology of the mutational signatures remains a statistical challenge, with both small sample sizes and high variability in classification algorithms posing barriers. As a result, few signatures have been strongly linked to particular risk factors. Here we present Diffsig, a model and R package for estimating the association of risk factors with mutational signatures, suggesting etiologies for the pre-defined mutational signatures. Diffsig is a Bayesian Dirichlet-multinomial hierarchical model that allows testing of any type of risk factor while taking into account the uncertainty associated with samples with a low number of observations. In simulation, we found that our method can accurately estimate risk factor-mutational signal associations. We applied Diffsig to breast cancer data to assess relationships between five established breast-relevant mutational signatures and etiologic variables, confirming known mechanisms of cancer development. Diffsig is implemented as an R package available at: https://github.com/jennprk/diffsig.
25
Kravitz SN, Ferris E, Love MI, Thomas A, Quinlan AR, Gregg C. Random allelic expression in the adult human body. Cell Rep 2023; 42:111945. [PMID: 36640362 PMCID: PMC10484211 DOI: 10.1016/j.celrep.2022.111945] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 04/14/2022] [Revised: 10/17/2022] [Accepted: 12/15/2022] [Indexed: 01/07/2023]
Abstract
Genes are typically assumed to express both parental alleles similarly, yet cell lines show random allelic expression (RAE) for many autosomal genes that could shape genetic effects. Thus, understanding RAE in human tissues could improve our understanding of phenotypic variation. Here, we develop a methodology to perform genome-wide profiling of RAE and biallelic expression in GTEx datasets for 832 people and 54 tissues. We report 2,762 autosomal genes with some RAE properties similar to randomly inactivated X-linked genes. We found that RAE is associated with rapidly evolving regions in the human genome, adaptive signaling processes, and genes linked to age-related diseases such as neurodegeneration and cancer. We define putative mechanistic subtypes of RAE distinguished by gene overlaps on sense and antisense DNA strands, aggregation in clusters near telomeres, and increased regulatory complexity and inputs compared with biallelic genes. We provide foundations to study RAE in human phenotypes, evolution, and disease.
Affiliation(s)
- Stephanie N Kravitz
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA; Neurobiology, University of Utah, Salt Lake City, UT, USA
- Elliott Ferris
- Neurobiology, University of Utah, Salt Lake City, UT, USA
- Michael I Love
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA; Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Alun Thomas
- Department of Internal Medicine, Epidemiology, University of Utah School of Medicine, Salt Lake City, UT, USA
- Aaron R Quinlan
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Christopher Gregg
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA; Neurobiology, University of Utah, Salt Lake City, UT, USA
26
Liang D, Aygün N, Matoba N, Ideraabdullah FY, Love MI, Stein JL. Inference of putative cell-type-specific imprinted regulatory elements and genes during human neuronal differentiation. Hum Mol Genet 2023; 32:402-416. [PMID: 35994039 PMCID: PMC9851749 DOI: 10.1093/hmg/ddac207] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2022] [Revised: 08/02/2022] [Accepted: 08/17/2022] [Indexed: 01/24/2023]
Abstract
Genomic imprinting results in gene expression bias caused by parental chromosome of origin and occurs in genes with important roles during human brain development. However, the cell-type and temporal specificity of imprinting during human neurogenesis is generally unknown. By detecting within-donor allelic biases in chromatin accessibility and gene expression that are unrelated to cross-donor genotype, we inferred imprinting in both primary human neural progenitor cells and their differentiated neuronal progeny from up to 85 donors. We identified 43/20 putatively imprinted regulatory elements (IREs) in neurons/progenitors, and 133/79 putatively imprinted genes in neurons/progenitors. Although 10 IREs and 42 genes were shared between neurons and progenitors, most putative imprinting was only detected within specific cell types. In addition to well-known imprinted genes and their promoters, we inferred novel putative IREs and imprinted genes. Consistent with both DNA methylation-based and H3K27me3-based regulation of imprinted expression, some putative IREs also overlapped with differentially methylated or histone-marked regions. Finally, we identified a progenitor-specific putatively imprinted gene overlapping with copy number variation that is associated with uniparental disomy-like phenotypes. Our results can therefore be useful in interpreting the function of variants identified in future parent-of-origin association studies.
Affiliation(s)
- Dan Liang
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Nil Aygün
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Nana Matoba
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Folami Y Ideraabdullah
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Michael I Love
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Jason L Stein
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
27
Hamilton AM, Van Alsten SC, Gao X, Nsonwu-Farley J, Calhoun BC, Love MI, Troester MA, Hoadley KA. Incorporating RNA-based Risk Scores for Genomic Instability to Predict Breast Cancer Recurrence and Immunogenicity in a Diverse Population. Cancer Res Commun 2023; 3:12-20. [PMID: 36968228 PMCID: PMC10035450 DOI: 10.1158/2767-9764.crc-22-0267] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2022] [Revised: 10/26/2022] [Accepted: 12/19/2022] [Indexed: 04/12/2023]
Abstract
Markers of genomic instability, including TP53 status and homologous recombination deficiency (HRD), are candidate biomarkers of immunogenicity and immune-mediated survival, but little is known about the distribution of these markers in large, population-based cohorts of racially diverse patients with breast cancer. In prior clinical trials, DNA-based approaches have been emphasized, but recent data suggest that RNA-based assessment can capture pathway differences conveniently and may be streamlined with other RNA-based genomic risk scores. Thus, we used RNA expression to study genomic instability (HRD and TP53 pathways) in context of the breast cancer immune microenvironment in three datasets (total n = 4,892), including 1,942 samples from the Carolina Breast Cancer Study, a population-based study that oversampled Black (n = 1,026) and younger women (n = 1,032). Across all studies, 36.9% of estrogen receptor (ER)-positive and 92.6% of ER-negative breast cancer had presence of at least one genomic instability signature. TP53 and HRD status were significantly associated with immune expression in both ER-positive and ER-negative breast cancer. RNA-based genomic instability signatures were associated with higher PD-L1, CD8 T-cell marker, and global and multimarker immune cell expression. Among tumors with genomic instability signatures, adaptive immune response was associated with improved recurrence-free survival regardless of ER status, highlighting genomic instability as a candidate marker for predicting immunotherapy response. Leveraging a convenient, integrated RNA-based approach, this analysis shows that genomic instability interacts with immune response, an important target in breast cancer overall and in Black women who experience higher frequency of TP53 and HR deficiency.
Significance Despite promising advances in breast cancer immunotherapy, predictive biomarkers that are valid across diverse populations and breast cancer subtypes are needed. Genomic instability signatures can be coordinated with other RNA-based scores to define immunogenic breast cancers and may have value in stratifying immunotherapy trial participants.
Affiliation(s)
- Alina M. Hamilton
- Department of Pathology and Laboratory Medicine, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Sarah C. Van Alsten
- Department of Epidemiology, Gillings School of Public Health, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Xiaohua Gao
- Department of Epidemiology, Gillings School of Public Health, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Joseph Nsonwu-Farley
- Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, North Carolina
- Benjamin C. Calhoun
- Department of Pathology and Laboratory Medicine, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Michael I. Love
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Melissa A. Troester
- Department of Pathology and Laboratory Medicine, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Department of Epidemiology, Gillings School of Public Health, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Katherine A. Hoadley
- Department of Genetics, Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
28
Dozmorov MG, Mu W, Davis ES, Lee S, Triche TJ, Phanstiel DH, Love MI. CTCF: an R/bioconductor data package of human and mouse CTCF binding sites. Bioinform Adv 2022; 2:vbac097. [PMID: 36699364 PMCID: PMC9793704 DOI: 10.1093/bioadv/vbac097] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2022] [Revised: 11/14/2022] [Accepted: 12/15/2022] [Indexed: 12/23/2022]
Abstract
Summary CTCF (CCCTC-binding factor) is an 11-zinc-finger DNA binding protein which regulates much of the eukaryotic genome's 3D structure and function. The diversity of CTCF binding motifs has led to a fragmented landscape of CTCF binding data. We collected position weight matrices of CTCF binding motifs and defined strand-oriented CTCF binding sites in the human and mouse genomes, including the recent Telomere to Telomere and mm39 assemblies. We included selected experimentally determined and predicted CTCF binding sites, such as CTCF-bound cis-regulatory elements from SCREEN ENCODE. We recommend filtering strategies for CTCF binding motifs and demonstrate that liftOver is a viable alternative to convert CTCF coordinates between assemblies. Our comprehensive data resource and usage recommendations can serve to harmonize and strengthen the reproducibility of genomic studies utilizing CTCF binding data. Availability and implementation https://bioconductor.org/packages/CTCF. Companion website: https://dozmorovlab.github.io/CTCF/; Code to reproduce the analyses: https://github.com/dozmorovlab/CTCF.dev. Supplementary information Supplementary data are available at Bioinformatics Advances online.
Affiliation(s)
- Wancen Mu
- Department of Biostatistics, University of North Carolina-Chapel Hill, Chapel Hill, NC 27514, USA
- Eric S Davis
- Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Stuart Lee
- Department of Econometrics and Business Statistics, Monash University, Clayton, VIC 3168, Australia; Molecular Medicine Division, Walter and Eliza Hall Institute, Parkville, VIC 3052, Australia
- Timothy J Triche
- Center for Epigenetics, Van Andel Research Institute, Grand Rapids, MI 49503, USA; Department of Pediatrics, College of Human Medicine, Michigan State University, East Lansing, MI 48824, USA; Department of Translational Genomics, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA
- Douglas H Phanstiel
- Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA; Thurston Arthritis Research Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA; Department of Cell Biology and Physiology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA; Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA; Curriculum in Genetics and Molecular Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Michael I Love
- Department of Biostatistics, University of North Carolina-Chapel Hill, Chapel Hill, NC 27514, USA; Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27514, USA
29
Walens A, Van Alsten SC, Olsson LT, Smith MA, Lockhart A, Gao X, Hamilton AM, Kirk EL, Love MI, Gupta GP, Perou CM, Vaziri C, Hoadley KA, Troester MA. RNA-Based Classification of Homologous Recombination Deficiency in Racially Diverse Patients with Breast Cancer. Cancer Epidemiol Biomarkers Prev 2022; 31:2136-2147. [PMID: 36129803 PMCID: PMC9720427 DOI: 10.1158/1055-9965.epi-22-0590] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 05/19/2022] [Revised: 08/03/2022] [Accepted: 09/14/2022] [Indexed: 01/07/2023]
Abstract
BACKGROUND Aberrant expression of DNA repair pathways such as homologous recombination (HR) can lead to DNA repair imbalance, genomic instability, and altered chemotherapy response. DNA repair imbalance may predict prognosis, but variation in DNA repair in diverse cohorts of breast cancer patients is understudied. METHODS To identify RNA-based patterns of DNA repair expression, we performed unsupervised clustering on 51 DNA repair-related genes in the Cancer Genome Atlas Breast Cancer [TCGA BRCA (n = 1,094)] and Carolina Breast Cancer Study [CBCS (n = 1,461)]. Using published DNA-based HR deficiency (HRD) scores (high-HRD ≥ 42) from TCGA, we trained an RNA-based supervised classifier. Unsupervised and supervised HRD classifiers were evaluated in association with demographics, tumor characteristics, and clinical outcomes. RESULTS Unsupervised clustering on DNA repair genes identified four clusters of breast tumors, with one group having high expression of HR genes. Approximately 39.7% of CBCS and 29.3% of TCGA breast tumors had this unsupervised high-HRD (U-HRD) profile. A supervised HRD classifier (S-HRD) trained on TCGA had 84% sensitivity and 73% specificity to detect HRD-high samples. Both U-HRD and S-HRD tumors in CBCS had higher frequency of TP53 mutant-like status (45% and 41% enrichment) and basal-like subtype (63% and 58% enrichment). S-HRD high was more common among black patients. Among chemotherapy-treated participants, recurrence was associated with S-HRD high (HR: 2.38, 95% confidence interval = 1.50-3.78). CONCLUSIONS HRD is associated with poor prognosis and enriched in the tumors of black women. IMPACT RNA-level indicators of HRD are predictive of breast cancer outcomes in diverse populations.
Affiliation(s)
- Andrea Walens
  - Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, North Carolina
  - Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, North Carolina
- Sarah C. Van Alsten
  - Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, North Carolina
- Linnea T. Olsson
  - Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, North Carolina
- Markia A. Smith
  - Department of Pathology and Laboratory Medicine, School of Medicine, University of North Carolina, Chapel Hill, North Carolina
- Alex Lockhart
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Xiaohua Gao
  - Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, North Carolina
  - Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, North Carolina
- Alina M. Hamilton
  - Department of Pathology and Laboratory Medicine, School of Medicine, University of North Carolina, Chapel Hill, North Carolina
- Erin L. Kirk
  - Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, North Carolina
- Michael I. Love
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Gaorav P. Gupta
  - Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, North Carolina
- Charles M. Perou
  - Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, North Carolina
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Cyrus Vaziri
  - Department of Pathology and Laboratory Medicine, School of Medicine, University of North Carolina, Chapel Hill, North Carolina
- Katherine A. Hoadley
  - Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, North Carolina
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Melissa A. Troester
  - Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, North Carolina
  - Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, North Carolina
  - Department of Pathology and Laboratory Medicine, School of Medicine, University of North Carolina, Chapel Hill, North Carolina
|
30
|
Smith MA, Van Alsten SC, Walens A, Damrauer JS, Maduekwe UN, Broaddus RR, Love MI, Troester MA, Hoadley KA. DNA Damage Repair Classifier Defines Distinct Groups in Hepatocellular Carcinoma. Cancers (Basel) 2022; 14:4282. [PMID: 36077818 PMCID: PMC9454479 DOI: 10.3390/cancers14174282] [Received: 08/05/2022] [Revised: 08/27/2022] [Accepted: 08/29/2022] [Indexed: 12/02/2022] Open
Abstract
SIMPLE SUMMARY DNA repair pathways have been implicated in hepatocellular carcinoma outcomes. We found that hepatocellular carcinomas (HCC) could be separated into two groups (high and low) based on the overall expression of genes involved in DNA repair. Among the low repair group, there were three subgroups, one of which shared features of the high repair group. Given the important role of the liver in metabolism and detoxification and its regenerative capacity, proliferation and DNA damage responses are critical in subdividing major biological categories of liver tumors. High repair samples showed more proliferative and regenerative signatures and had poorer outcomes than the low repair samples, which were more associated with the genes involved in normal liver biology. These biological groups suggest that dysregulation in endogenous liver processes promotes a pro-tumorigenic microenvironment that may facilitate tumor progression or identify tumors that require more substantial clinical intervention.
ABSTRACT DNA repair pathways have been associated with variability in hepatocellular carcinoma (HCC) clinical outcomes, but the mechanism through which DNA repair varies as a function of liver regeneration and other HCC characteristics is poorly understood. We curated a panel of 199 genes representing 15 DNA repair pathways to identify DNA repair expression classes and evaluate their associations with liver features and clinicopathologic variables in The Cancer Genome Atlas (TCGA) HCC study. We identified two groups in HCC, defined by low or high expression across all DNA repair pathways. The low-repair group had lower grade and retained the expression of classical liver markers, whereas the high-repair group had more clinically aggressive features, increased p53 mutant-like gene expression, and high liver regenerative gene expression. These pronounced features overshadowed the variation in the low-repair subset, but when considered separately, the low-repair samples included three subgroups: L1, L2, and L3. L3 had high DNA repair expression with worse progression-free (HR 1.24, 95% CI 0.81–1.91) and overall (HR 1.63, 95% CI 0.98–2.71) survival. High-repair outcomes were also significantly worse compared with the L1 and L2 groups. HCCs vary in DNA repair expression, and a subset of tumors with high regeneration shows profoundly disrupted liver biology and poor prognosis.
Affiliation(s)
- Markia A. Smith
  - Department of Pathology and Laboratory Medicine, School of Medicine, University of North Carolina, Chapel Hill, NC 27599, USA
- Sarah C. Van Alsten
  - Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC 27599, USA
- Andrea Walens
  - Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, NC 27599, USA
- Jeffrey S. Damrauer
  - Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, NC 27599, USA
- Ugwuji N. Maduekwe
  - Department of Surgery, Medical College of Wisconsin, Milwaukee, WI 53226, USA
- Russell R. Broaddus
  - Department of Pathology and Laboratory Medicine, School of Medicine, University of North Carolina, Chapel Hill, NC 27599, USA
  - Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, NC 27599, USA
- Michael I. Love
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Melissa A. Troester
  - Department of Pathology and Laboratory Medicine, School of Medicine, University of North Carolina, Chapel Hill, NC 27599, USA
  - Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC 27599, USA
  - Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, NC 27599, USA
- Katherine A. Hoadley
  - Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, NC 27599, USA
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
|
31
|
Hurson AN, Abubakar M, Hamilton AM, Conway K, Hoadley KA, Love MI, Olshan AF, Perou CM, Garcia-Closas M, Troester MA. Prognostic significance of RNA-based TP53 pathway function among estrogen receptor positive and negative breast cancer cases. NPJ Breast Cancer 2022; 8:74. [PMID: 35701440 PMCID: PMC9198049 DOI: 10.1038/s41523-022-00437-7] [Received: 11/09/2021] [Accepted: 05/04/2022] [Indexed: 11/20/2022] Open
Abstract
TP53 and estrogen receptor (ER) are essential in breast cancer development and progression, but TP53 status (by DNA sequencing or protein expression) has been inconsistently associated with survival. We evaluated whether RNA-based TP53 classifiers are related to survival. Participants included 3213 women in the Carolina Breast Cancer Study (CBCS) with invasive breast cancer (stages I-III). Tumors were classified for TP53 status (mutant-like/wildtype-like) using an RNA signature. We used Cox proportional hazards models to estimate covariate-adjusted hazard ratios (HRs) and 95% confidence intervals (CIs) for breast cancer-specific survival (BCSS) among ER- and TP53-defined subtypes. RNA-based results were compared to DNA- and IHC-based TP53 classification, as well as Basal-like versus non-Basal-like subtype. Findings from the diverse (50% Black), population-based CBCS were compared to those from the largely white METABRIC study. RNA-based TP53 mutant-like was associated with BCSS among both ER-negatives and ER-positives (HR (95% CI) = 5.38 (1.84-15.78) and 4.66 (1.79-12.15), respectively). Associations were attenuated when using DNA- or IHC-based TP53 classification. In METABRIC, few ER-negative tumors were TP53-wildtype-like, but TP53 status was a strong predictor of BCSS among ER-positives. In both populations, the effect of TP53 mutant-like status was similar to that for Basal-like subtype. RNA-based measures of TP53 status are strongly associated with BCSS and may have value among ER-negative cancers where few prognostic markers have been robustly validated. Given the role of TP53 in chemotherapeutic response, RNA-based TP53 as a prognostic biomarker could address an unmet need in breast cancer.
Affiliation(s)
- Amber N Hurson
  - Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD, USA
- Mustapha Abubakar
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD, USA
- Alina M Hamilton
  - Department of Pathology and Laboratory Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Kathleen Conway
  - Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Katherine A Hoadley
  - Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Michael I Love
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Andrew F Olshan
  - Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Charles M Perou
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Melissa A Troester
  - Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
|
32
|
Mu W, Sarkar H, Srivastava A, Choi K, Patro R, Love MI. Airpart: interpretable statistical models for analyzing allelic imbalance in single-cell datasets. Bioinformatics 2022; 38:2773-2780. [PMID: 35561168 PMCID: PMC9113279 DOI: 10.1093/bioinformatics/btac212] [Received: 10/26/2021] [Revised: 03/05/2022] [Accepted: 04/05/2022] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION Allelic expression analysis aids in detection of cis-regulatory mechanisms of genetic variation, which produce allelic imbalance (AI) in heterozygotes. Measuring AI in bulk data lacking time or spatial resolution has the limitation that cell-type-specific (CTS), spatial- or time-dependent AI signals may be dampened or not detected. RESULTS We introduce a statistical method airpart for identifying differential CTS AI from single-cell RNA-sequencing data, or dynamics AI from other spatially or time-resolved datasets. airpart outputs discrete partitions of data, pointing to groups of genes and cells under common mechanisms of cis-genetic regulation. In order to account for low counts in single-cell data, our method uses a Generalized Fused Lasso with Binomial likelihood for partitioning groups of cells by AI signal, and a hierarchical Bayesian model for AI statistical inference. In simulation, airpart accurately detected partitions of cell types by their AI and had lower Root Mean Square Error (RMSE) of allelic ratio estimates than existing methods. In real data, airpart identified differential allelic imbalance patterns across cell states and could be used to define trends of AI signal over spatial or time axes. AVAILABILITY AND IMPLEMENTATION The airpart package is available as an R/Bioconductor package at https://bioconductor.org/packages/airpart. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Wancen Mu
- Hirak Sarkar
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
- Rob Patro
  - Department of Computer Science, University of Maryland, College Park, MD 20742, USA
|
33
|
Vohra SN, Reeder-Hayes KE, Nichols HB, Emerson MA, Love MI, Olshan AF, Troester MA. Breast cancer treatment patterns by age and time since last pregnancy in the Carolina Breast Cancer Study Phase III. Breast Cancer Res Treat 2022; 192:435-445. [PMID: 35006482 PMCID: PMC8930462 DOI: 10.1007/s10549-022-06511-9] [Received: 10/01/2021] [Accepted: 12/31/2021] [Indexed: 11/30/2022]
Abstract
PURPOSE To describe breast cancer treatment patterns among premenopausal women by age and time since last pregnancy. METHODS Data were analyzed from 1179 women diagnosed with premenopausal breast cancer in the Carolina Breast Cancer Study. Of these, 160 had a recent pregnancy (within 5 years of cancer diagnosis). Relative frequency differences (RFDs) and 95% confidence intervals (CIs) were used to compare cancer stage, treatment modality received, treatment initiation delay (>30 days), and prolonged treatment duration (>2 to >8 months depending on the treatment received) by age and recency of pregnancy. RESULTS Recently postpartum women were significantly more likely to have stage III disease [RFD (95% CI) 12.2% (3.6%, 20.8%)] and to receive more aggressive treatment compared to nulliparous women. After adjustment for age, race and standard clinical tumor characteristics, recently postpartum women were significantly less likely to have delayed treatment initiation [RFD (95% CI) -11.2% (-21.4%, -1.0%)] and prolonged treatment duration [RFD (95% CI) -17.5% (-28.0%, -7.1%)] and were more likely to have mastectomy [RFD (95% CI) 14.9% (4.8%, 25.0%)] compared to nulliparous women. Similarly, younger women (<40 years of age) were significantly less likely to experience prolonged treatment duration [RFD (95% CI) -5.6% (-11.1%, -0.0%)] and more likely to undergo mastectomy [RFD (95% CI) 10.6% (5.2%, 16.0%)] compared to the study population as a whole. CONCLUSION These results suggest that recently postpartum and younger women often received prompt and aggressive breast cancer treatment. Higher mortality and recurrence among recently pregnant women are unlikely to be related to undertreatment.
Affiliation(s)
- Sanah N Vohra
  - Department of Epidemiology, Gillings School of Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
  - Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Katherine E Reeder-Hayes
  - Division of Hematology/Oncology, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Hazel B Nichols
  - Department of Epidemiology, Gillings School of Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
  - Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Marc A Emerson
  - Department of Epidemiology, Gillings School of Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
  - Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Michael I Love
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Andrew F Olshan
  - Department of Epidemiology, Gillings School of Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
  - Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Melissa A Troester
  - Department of Epidemiology, Gillings School of Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
  - Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
  - Department of Pathology and Laboratory Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
|
34
|
Wang X, Chen H, Kapoor PM, Su YR, Bolla MK, Dennis J, Dunning AM, Lush M, Wang Q, Michailidou K, Pharoah PD, Hopper JL, Southey MC, Koutros S, Freeman LEB, Stone J, Rennert G, Shibli R, Murphy RA, Aronson K, Guénel P, Truong T, Teras LR, Hodge JM, Canzian F, Kaaks R, Brenner H, Arndt V, Hoppe R, Lo WY, Behrens S, Mannermaa A, Kosma VM, Jung A, Becher H, Giles GG, Haiman CA, Maskarinec G, Scott C, Winham S, Simard J, Goldberg MS, Zheng W, Long J, Troester MA, Love MI, Peng C, Tamimi R, Eliassen H, García-Closas M, Figueroa J, Ahearn T, Yang R, Evans DG, Howell A, Hall P, Czene K, Wolk A, Sandler DP, Taylor JA, Swerdlow AJ, Orr N, Lacey JV, Wang S, Olsson H, Easton DF, Milne RL, Hsu L, Kraft P, Chang-Claude J, Lindström S. A genome-wide gene-based gene-environment interaction study of breast cancer in more than 90,000 women. Cancer Res Commun 2022; 2:211-219. [PMID: 36303815 PMCID: PMC9604427 DOI: 10.1158/2767-9764.crc-21-0119] [Received: 11/05/2021] [Revised: 03/21/2022] [Accepted: 03/24/2022] [Indexed: 06/14/2023]
Abstract
Background Genome-wide association studies (GWAS) have identified more than 200 susceptibility loci for breast cancer, but these variants explain less than a fifth of the disease risk. Although gene-environment interactions have been proposed to account for some of the remaining heritability, few studies have empirically assessed this. Methods We obtained genotype and risk factor data from 46,060 cases and 47,929 controls of European ancestry from population-based studies within the Breast Cancer Association Consortium (BCAC). We built gene expression prediction models for 4,864 genes with a significant (P < 0.01) heritable component using the transcriptome and genotype data from the Genotype-Tissue Expression (GTEx) project. We leveraged predicted gene expression information to investigate the interactions between gene-centric genetic variation and 14 established risk factors in association with breast cancer risk, using a mixed-effects score test. Results After adjusting for number of tests using Bonferroni correction, no interaction remained statistically significant. The strongest interaction observed was between the predicted expression of the C13orf45 gene and age at first full-term pregnancy (P for G×E interaction = 4.44 × 10⁻⁶). Conclusion In this transcriptome-informed genome-wide gene-environment interaction study of breast cancer, we found no strong support for the role of gene expression in modifying the associations between established risk factors and breast cancer risk. Impact Our study suggests a limited role of gene-environment interactions in breast cancer risk.
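As a rough illustration of the Bonferroni step mentioned in the Results, the sketch below compares the strongest reported interaction P value with a Bonferroni threshold. The abstract does not state the number of tests; the count used here (4,864 genes × 14 risk factors) and the 0.05 family-wise error rate are assumptions made only for this example, and the function name is hypothetical.

```python
# Illustrative sketch only. The abstract does not report the number of tests;
# genes x risk factors and alpha = 0.05 are assumptions for this example.
N_GENES = 4864          # genes with prediction models (from the abstract)
N_RISK_FACTORS = 14     # established risk factors (from the abstract)
ALPHA = 0.05            # assumed family-wise error rate


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Return the per-test significance threshold under Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


n_tests = N_GENES * N_RISK_FACTORS            # assumed total number of tests
threshold = bonferroni_threshold(ALPHA, n_tests)
strongest_p = 4.44e-6                         # strongest interaction reported in the abstract
is_significant = strongest_p < threshold

print(f"{n_tests} tests, threshold = {threshold:.2e}, significant: {is_significant}")
```

Under these assumed inputs the reported P value is larger than the per-test threshold, which is consistent with the abstract's statement that no interaction remained statistically significant after correction.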
Affiliation(s)
- Xiaoliang Wang
- Department of Epidemiology, School of Public Health, University of Washington, Seattle, Washington
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington
| | - Hongjie Chen
- Department of Epidemiology, School of Public Health, University of Washington, Seattle, Washington
| | - Pooja Middha Kapoor
- Division of Cancer Epidemiology, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Faculty of Medicine, University of Heidelberg, Heidelberg, Germany
| | - Yu-Ru Su
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington
| | - Manjeet K. Bolla
- Department of Public Health and Primary Care, Centre for Cancer Genetic Epidemiology, University of Cambridge, Cambridge, United Kingdom
| | - Joe Dennis
- Department of Public Health and Primary Care, Centre for Cancer Genetic Epidemiology, University of Cambridge, Cambridge, United Kingdom
| | - Alison M. Dunning
- Department of Oncology, Centre for Cancer Genetic Epidemiology, University of Cambridge, Cambridge, United Kingdom
| | - Michael Lush
- Department of Public Health and Primary Care, Centre for Cancer Genetic Epidemiology, University of Cambridge, Cambridge, United Kingdom
| | - Qin Wang
- Department of Public Health and Primary Care, Centre for Cancer Genetic Epidemiology, University of Cambridge, Cambridge, United Kingdom
| | - Kyriaki Michailidou
- Biostatistics Unit, The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
- Cyprus School of Molecular Medicine, Nicosia, Cyprus
| | - Paul D.P. Pharoah
- Department of Public Health and Primary Care, Centre for Cancer Genetic Epidemiology, University of Cambridge, Cambridge, United Kingdom
- Department of Oncology, Centre for Cancer Genetic Epidemiology, University of Cambridge, Cambridge, United Kingdom
| | - John L. Hopper
- Centre for Epidemiology and Biostatistics, School of Population and Global Health, The University of Melbourne, Melbourne, Victoria, Australia
| | - Melissa C. Southey
- Precision Medicine, School of Clinical Sciences at Monash Health, Monash University, Clayton, Victoria, Australia
- Cancer Epidemiology Division, Cancer Council Victoria, Melbourne, Victoria, Australia
- Department of Clinical Pathology, The University of Melbourne, Victoria, Australia
| | - Stella Koutros
- Division of Cancer Epidemiology and Genetic, NCI, NIH, Bethesda, Maryland
| | | | - Jennifer Stone
- Genetic Epidemiology Group, School of Population and Global Health, University of Western Australia, Crawley, Australia
| | - Gad Rennert
- Department of Community Medicine and Epidemiology, Carmel Medical Center, Haifa, Israel
| | - Rana Shibli
- Department of Community Medicine and Epidemiology, Carmel Medical Center, Haifa, Israel
| | - Rachel A. Murphy
- Cancer Control Research, BC Cancer and School of Population and Public Health, University of British Columbia, Vancouver, Canada
| | - Kristan Aronson
- Public Health Sciences, Queen's University, Kingston, Canada
| | - Pascal Guénel
- Université Paris-Saclay, Inserm, CESP, Team Exposome and Heredity, Villejuif, France
| | - Thérèse Truong
- Université Paris-Saclay, Inserm, CESP, Team Exposome and Heredity, Villejuif, France
| | - Lauren R. Teras
- Department of Population Science, American Cancer Society, Atlanta, Georgia
| | - James M. Hodge
- Department of Population Science, American Cancer Society, Atlanta, Georgia
| | - Federico Canzian
- Genomic Epidemiology Group, German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Rudolf Kaaks
- Genomic Epidemiology Group, German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Hermann Brenner
- Division of Clinical Epidemiology and Aging Research, German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Volker Arndt
- Division of Clinical Epidemiology and Aging Research, German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Reiner Hoppe
- Dr. Margarete Fischer-Bosch-Institute of Clinical Pharmacology, Stuttgart, Germany
- University of Tübingen, Tübingen, German
| | - Wing-Yee Lo
- Dr. Margarete Fischer-Bosch-Institute of Clinical Pharmacology, Stuttgart, Germany
- University of Tübingen, Tübingen, German
| | - Sabine Behrens
- Division of Cancer Epidemiology, German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Arto Mannermaa
- Translational Cancer Research Area, University of Eastern Finland, Kuopio, Finland
- Institute of Clinical Medicine, Pathology and Forensic Medicine, University of Eastern Finland, Kuopio, Finland
- Biobank of Eastern Finland, Kuopio University Hospital, Kuopio, Finland
| | - Veli-Matti Kosma
- Translational Cancer Research Area, University of Eastern Finland, Kuopio, Finland
- Institute of Clinical Medicine, Pathology and Forensic Medicine, University of Eastern Finland, Kuopio, Finland
- Biobank of Eastern Finland, Kuopio University Hospital, Kuopio, Finland
| | - Audrey Jung
- Division of Cancer Epidemiology, German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Heiko Becher
- Institute for Medical Biometry and Epidemiology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
| | - Graham G. Giles
- Centre for Epidemiology and Biostatistics, School of Population and Global Health, The University of Melbourne, Melbourne, Victoria, Australia
- Precision Medicine, School of Clinical Sciences at Monash Health, Monash University, Clayton, Victoria, Australia
- Cancer Epidemiology Division, Cancer Council Victoria, Melbourne, Victoria, Australia
| | - Christopher A. Haiman
- Center for Genetic Epidemiology, Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, California
| | | | - Christopher Scott
- Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota
| | - Stacey Winham
- Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota
| | - Jacques Simard
- Genomics Center, Centre Hospitalier Universitaire de Québec-Université Laval Research Center, Québec City, Quebec, Canada
| | - Mark S. Goldberg
- Department of Medicine, McGill University, Montréal, Quebec, Canada; Division of Clinical Epidemiology, Royal Victoria Hospital, McGill University, Montréal, Quebec, Canada
| | - Wei Zheng
- Division of Epidemiology, Vanderbilt University Medical Center, Nashville, Tennessee
| | - Jirong Long
- Division of Epidemiology, Vanderbilt University Medical Center, Nashville, Tennessee
| | - Melissa A. Troester
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
| | - Michael I. Love
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
| | - Cheng Peng
- Channing Division of Network Medicine, Department of Medicine, Brigham & Women's Hospital, Boston, Massachusetts
| | - Rulla Tamimi
- Department of Population Health Sciences, Weill Cornell Medicine, New York, New York
| | - Heather Eliassen
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts
| | | | - Jonine Figueroa
- Usher Institute of Population Health Sciences and Informatics, University of Edinburgh Medical School, Edinburgh, United Kingdom
| | - Thomas Ahearn
- Division of Cancer Epidemiology and Genetic, NCI, NIH, Bethesda, Maryland
| | - Rose Yang
- Division of Cancer Epidemiology and Genetic, NCI, NIH, Bethesda, Maryland
| | - D. Gareth Evans
- Division of Evolution and Genomic Medicine, School of Biological Sciences, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, University of Manchester, Manchester, United Kingdom
- Genomic Medicine, St Mary's Hospital, Manchester Centre for Genomic Medicine, Manchester University Hospitals NHS Foundation Trust, Manchester Academic Health Science Centre, Manchester, United Kingdom
- NIHR Manchester Biomedical Research Centre, Manchester Academic Health Science Centre, Manchester University NHS Foundation Trust, Manchester, United Kingdom
| | - Anthony Howell
- Division of Cancer Sciences, University of Manchester, Manchester, United Kingdom
| | - Per Hall
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
| | - Kamila Czene
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
| | - Alicja Wolk
- Institute of Environmental Medicine, Karolinska Institutet, Stockholm, Sweden
| | - Dale P. Sandler
- Epidemiology Branch, Division of Intramural Research, National Institute of Environmental Health Sciences, National Institute of Health, Research Triangle Park, North Carolina
| | - Jack A. Taylor
- Epidemiology Branch, Division of Intramural Research, National Institute of Environmental Health Sciences, National Institute of Health, Research Triangle Park, North Carolina
| | - Anthony J. Swerdlow
- Division of Genetics and Epidemiology, The Institute of Cancer Research, London, United Kingdom
- Division of Breast Cancer Research, The Institute of Cancer Research, London, United K.ingdom
| | - Nick Orr
- Centre for Cancer Research and Cell Biology, Queen's University Belfast, Belfast, United Kingdom
| | - James V. Lacey
- Department of Population Sciences, Beckman Research Institute of City of Hope, Duarte, California
| | - Sophia Wang
- Department of Population Sciences, Beckman Research Institute of City of Hope, Duarte, California
| | - Håkan Olsson
- Departments of Oncology and Cancer Epidemiology, Clinical Sciences, Lund University, Lund, Sweden
- Deceased
| | - Douglas F. Easton
- Department of Public Health and Primary Care, Centre for Cancer Genetic Epidemiology, University of Cambridge, Cambridge, United Kingdom
- Department of Oncology, Centre for Cancer Genetic Epidemiology, University of Cambridge, Cambridge, United Kingdom
| | - Roger L. Milne
- Centre for Epidemiology and Biostatistics, School of Population and Global Health, The University of Melbourne, Melbourne, Victoria, Australia
- Precision Medicine, School of Clinical Sciences at Monash Health, Monash University, Clayton, Victoria, Australia
- Cancer Epidemiology Division, Cancer Council Victoria, Melbourne, Victoria, Australia
| | - Li Hsu
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington
- Department of Biostatistics, School of Public Health, University of Washington, Seattle, Washington
| | - Peter Kraft
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts
- Program in Genetic Epidemiology and Statistical Genetics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts
| | - Jenny Chang-Claude
- Division of Cancer Epidemiology, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Cancer Epidemiology Group, University Medical Centre Hamburg-Eppendorf, University Cancer Centre Hamburg (UCCH), Hamburg, Germany
| | - Sara Lindström
- Department of Epidemiology, School of Public Health, University of Washington, Seattle, Washington
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington
| |
|
35
|
Jones GS, Hoadley KA, Benefield H, Olsson LT, Hamilton AM, Bhattacharya A, Kirk EL, Tipaldos HJ, Fleming JM, Williams KP, Love MI, Nichols HB, Olshan AF, Troester MA. Racial differences in breast cancer outcomes by hepatocyte growth factor pathway expression. Breast Cancer Res Treat 2022; 192:447-455. [PMID: 35034243 PMCID: PMC9380654 DOI: 10.1007/s10549-021-06497-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2021] [Accepted: 12/16/2021] [Indexed: 11/02/2022]
Abstract
PURPOSE Black women have a 40% increased risk of breast cancer-related mortality. These outcome disparities may reflect differences in tumor pathways and a lack of targetable therapies for specific subtypes that are more common in Black women. Hepatocyte growth factor (HGF) is a targetable pathway that promotes breast cancer tumorigenesis, is associated with basal-like breast cancer, and is differentially expressed by race. This study assessed whether a 38-gene HGF expression signature is associated with recurrence and survival in Black and non-Black women. METHODS Study participants included 1957 invasive breast cancer cases from the Carolina Breast Cancer Study. The HGF signature was evaluated in association with recurrence (n = 1251, 171 recurrences), overall, and breast cancer-specific mortality (n = 706, 190/328 breast cancer/overall deaths) using Cox proportional hazard models. RESULTS Women with HGF-positive tumors had higher recurrence rates [HR 1.88, 95% CI (1.19, 2.98)], breast cancer-specific mortality [HR 1.90, 95% CI (1.26, 2.85)], and overall mortality [HR 1.69; 95% CI (1.17, 2.43)]. Among Black women, HGF positivity was significantly associated with higher 5-year rate of recurrence [HR 1.73; 95% CI (1.01, 2.99)], but this association was not significant in non-Black women [HR 1.68; 95% CI (0.72, 3.90)]. Among Black women, HGF-positive tumors had elevated breast cancer-specific mortality [HR 1.80, 95% CI (1.05, 3.09)], which was not significant in non-Black women [HR 1.52; 95% CI (0.78, 2.99)]. CONCLUSION This multi-gene HGF signature is a poor-prognosis feature for breast cancer and may identify patients who could benefit from HGF-targeted treatments, an unmet need for Black and triple-negative patients.
Affiliation(s)
- Gieira S Jones
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina-Chapel Hill, 253 Rosenau Hall, CB #7435, 135 Dauer Drive, Chapel Hill, NC, 27599-7400, USA
| | - Katherine A Hoadley
- Department of Genetics, University of North Carolina-Chapel Hill, Chapel Hill, USA
- Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, USA
| | - Halei Benefield
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina-Chapel Hill, 253 Rosenau Hall, CB #7435, 135 Dauer Drive, Chapel Hill, NC, 27599-7400, USA
| | - Linnea T Olsson
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina-Chapel Hill, 253 Rosenau Hall, CB #7435, 135 Dauer Drive, Chapel Hill, NC, 27599-7400, USA
| | - Alina M Hamilton
- Department of Pathology and Laboratory Medicine, University of North Carolina-Chapel Hill, Chapel Hill, USA
| | - Arjun Bhattacharya
- Department of Biostatistics, University of North Carolina-Chapel Hill, Chapel Hill, USA
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, USA
- Institute for Quantitative and Computational Biosciences, David Geffen School of Medicine, University of California, Los Angeles, USA
| | - Erin L Kirk
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina-Chapel Hill, 253 Rosenau Hall, CB #7435, 135 Dauer Drive, Chapel Hill, NC, 27599-7400, USA
| | - Heather J Tipaldos
- Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, USA
| | - Jodie M Fleming
- Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, USA
- Department of Biological and Biomedical Sciences, North Carolina Central University, Durham, USA
| | - Kevin P Williams
- Biomanufacturing Research Institute and Technology Enterprise, North Carolina Central University, Durham, USA
- Department of Pharmaceutical Sciences, North Carolina Central University, Durham, USA
| | - Michael I Love
- Department of Genetics, University of North Carolina-Chapel Hill, Chapel Hill, USA
- Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, USA
- Department of Biostatistics, University of North Carolina-Chapel Hill, Chapel Hill, USA
| | - Hazel B Nichols
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina-Chapel Hill, 253 Rosenau Hall, CB #7435, 135 Dauer Drive, Chapel Hill, NC, 27599-7400, USA
- Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, USA
| | - Andrew F Olshan
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina-Chapel Hill, 253 Rosenau Hall, CB #7435, 135 Dauer Drive, Chapel Hill, NC, 27599-7400, USA
- Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, USA
| | - Melissa A Troester
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina-Chapel Hill, 253 Rosenau Hall, CB #7435, 135 Dauer Drive, Chapel Hill, NC, 27599-7400, USA
- Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, USA
| |
|
36
|
Kramer NE, Davis ES, Wenger CD, Deoudes EM, Parker SM, Love MI, Phanstiel DH. Plotgardener: cultivating precise multi-panel figures in R. Bioinformatics 2022; 38:2042-2045. [PMID: 35134826 PMCID: PMC8963281 DOI: 10.1093/bioinformatics/btac057] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Received: 12/03/2021] [Accepted: 01/28/2022] [Indexed: 02/06/2023] Open
Abstract
MOTIVATION The R programming language is one of the most widely used programming languages for transforming raw genomic datasets into meaningful biological conclusions through analysis and visualization, which has been largely facilitated by infrastructure and tools developed by the Bioconductor project. However, existing plotting packages rely on relative positioning and sizing of plots, which is often sufficient for exploratory analysis but is poorly suited for the creation of publication-quality multi-panel images inherent to scientific manuscript preparation. RESULTS We present plotgardener, a coordinate-based genomic data visualization package that offers a new paradigm for multi-plot figure generation in R. Plotgardener allows precise, programmatic control over the placement, esthetics and arrangements of plots while maximizing user experience through fast and memory-efficient data access, support for a wide variety of data and file types, and tight integration with the Bioconductor environment. Plotgardener also allows precise placement and sizing of ggplot2 plots, making it an invaluable tool for R users and data scientists from virtually any discipline. AVAILABILITY AND IMPLEMENTATION Package: https://bioconductor.org/packages/plotgardener, Code: https://github.com/PhanstielLab/plotgardener, Documentation: https://phanstiellab.github.io/plotgardener/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Nicole E Kramer
- Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Eric S Davis
- Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | | | - Erika M Deoudes
- Thurston Arthritis Research Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Sarah M Parker
- Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Michael I Love
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | | |
|
37
|
Patel A, García-Closas M, Olshan AF, Perou CM, Troester MA, Love MI, Bhattacharya A. Gene-Level Germline Contributions to Clinical Risk of Recurrence Scores in Black and White Patients with Breast Cancer. Cancer Res 2022; 82:25-35. [PMID: 34711612 PMCID: PMC8732329 DOI: 10.1158/0008-5472.can-21-1207] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/13/2021] [Revised: 09/30/2021] [Accepted: 10/25/2021] [Indexed: 01/09/2023]
Abstract
Continuous risk of recurrence scores (CRS) based on tumor gene expression are vital prognostic tools for breast cancer. Studies have shown that Black women (BW) have higher CRS than White women (WW). Although systemic injustices contribute substantially to breast cancer disparities, evidence of biological and germline contributions is emerging. In this study, we investigated germline genetic associations with CRS and CRS disparity using approaches modeled after transcriptome-wide association studies (TWAS). In the Carolina Breast Cancer Study, using race-specific predictive models of tumor expression from germline genetics, we performed race-stratified (N = 1,043 WW, 1,083 BW) linear regressions of three CRS (ROR-S: PAM50 subtype score; proliferation score; ROR-P: ROR-S plus proliferation score) on imputed tumor genetically regulated tumor expression (GReX). Bayesian multivariate regression and adaptive shrinkage tested GReX-prioritized genes for associations with tumor PAM50 expression and subtype to elucidate patterns of germline regulation underlying GReX-CRS associations. At FDR-adjusted P < 0.10, 7 and 1 GReX prioritized genes among WW and BW, respectively. Among WW, CRS were positively associated with MCM10, FAM64A, CCNB2, and MMP1 GReX and negatively associated with VAV3, PCSK6, and GNG11 GReX. Among BW, higher MMP1 GReX predicted lower proliferation score and ROR-P. GReX-prioritized gene and PAM50 tumor expression associations highlighted potential mechanisms for GReX-prioritized gene to CRS associations. Among patients with breast cancer, differential germline associations with CRS were found by race, underscoring the need for larger, diverse datasets in molecular studies of breast cancer. These findings also suggest possible germline trans-regulation of PAM50 tumor expression, with potential implications for CRS interpretation in clinical settings. 
SIGNIFICANCE: This study identifies race-specific genetic associations with breast cancer risk of recurrence scores and suggests mediation of these associations by PAM50 subtype and expression, with implications for clinical interpretation of these scores.
Affiliation(s)
- Achal Patel
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina-Chapel Hill, Chapel Hill, North Carolina
| | - Montserrat García-Closas
- Division of Cancer Epidemiology and Genetics, NCI, Bethesda, Maryland
- Division of Genetics and Epidemiology, Institute of Cancer Research, London, United Kingdom
| | - Andrew F Olshan
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina-Chapel Hill, Chapel Hill, North Carolina
- Lineberger Comprehensive Cancer Center, University of North Carolina-Chapel Hill, Chapel Hill, North Carolina
| | - Charles M Perou
- Lineberger Comprehensive Cancer Center, University of North Carolina-Chapel Hill, Chapel Hill, North Carolina
- Department of Genetics, University of North Carolina-Chapel Hill, Chapel Hill, North Carolina
- Department of Pathology and Laboratory Medicine, University of North Carolina-Chapel Hill, Chapel Hill, North Carolina
| | - Melissa A Troester
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina-Chapel Hill, Chapel Hill, North Carolina
- Department of Pathology and Laboratory Medicine, University of North Carolina-Chapel Hill, Chapel Hill, North Carolina
| | - Michael I Love
- Department of Genetics, University of North Carolina-Chapel Hill, Chapel Hill, North Carolina
- Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina-Chapel Hill, Chapel Hill, North Carolina
| | - Arjun Bhattacharya
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California-Los Angeles, Los Angeles, California
- Institute for Quantitative and Computational Biosciences, David Geffen School of Medicine, University of California-Los Angeles, Los Angeles, California
| |
|
38
|
Matoba N, Love MI, Stein JL. Evaluating brain structure traits as endophenotypes using polygenicity and discoverability. Hum Brain Mapp 2022; 43:329-340. [PMID: 33098356 PMCID: PMC8675430 DOI: 10.1002/hbm.25257] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 07/16/2020] [Revised: 09/28/2020] [Accepted: 10/11/2020] [Indexed: 12/24/2022] Open
Abstract
Human brain structure traits have been hypothesized to be broad endophenotypes for neuropsychiatric disorders, implying that brain structure traits are comparatively "closer to the underlying biology." Genome-wide association studies from large sample sizes allow for the comparison of common variant genetic architectures between traits to test the evidence supporting this claim. Endophenotypes, compared to neuropsychiatric disorders, are hypothesized to have less polygenicity, with greater effect size of each susceptible SNP, requiring smaller sample sizes to discover them. Here, we compare polygenicity and discoverability of brain structure traits, neuropsychiatric disorders, and other traits (91 in total) to directly test this hypothesis. We found reduced polygenicity (FDR = 0.01) and increased discoverability (FDR = 3.68 × 10⁻⁹) of cortical brain structure traits, as compared to aggregated estimates of multiple neuropsychiatric disorders. We predict that ~8 M individuals will be required to explain the full heritability of cortical surface area by genome-wide significant SNPs, whereas sample sizes over 20 M will be required to explain the full heritability of depression. In conclusion, our findings are consistent with brain structure satisfying the higher power criterion of endophenotypes.
Affiliation(s)
- Nana Matoba
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Michael I. Love
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Jason L. Stein
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| |
|
39
|
Hurson AN, Abubakar M, Hamilton AM, Conway K, Hoadley KA, Love MI, Olshan AF, Perou CM, Garcia-Closas M, Troester MA. TP53 Pathway Function, Estrogen Receptor Status, and Breast Cancer Risk Factors in the Carolina Breast Cancer Study. Cancer Epidemiol Biomarkers Prev 2022; 31:124-131. [PMID: 34737209 PMCID: PMC8755611 DOI: 10.1158/1055-9965.epi-21-0661] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/08/2021] [Revised: 08/25/2021] [Accepted: 10/26/2021] [Indexed: 01/09/2023] Open
Abstract
BACKGROUND TP53 and estrogen receptor (ER) both play essential roles in breast cancer development and progression, with recent research revealing cross-talk between TP53 and ER signaling pathways. Although many studies have demonstrated heterogeneity of risk factor associations across ER subtypes, associations by TP53 status have been inconsistent. METHODS This case-case analysis included incident breast cancer cases (47% Black) from the Carolina Breast Cancer Study (1993-2013). Formalin-fixed paraffin-embedded tumor samples were classified for TP53 functional status (mutant-like/wild-type-like) using a validated RNA signature. For IHC-based TP53 status, mutant-like was classified as at least 10% positivity. We used two-stage polytomous logistic regression to evaluate risk factor heterogeneity due to RNA-based TP53 and/or ER, adjusting for each other and for PR, HER2, and grade. We then compared this with the results when using IHC-based TP53 classification. RESULTS The RNA-based classifier identified 55% of tumors as TP53 wild-type-like and 45% as mutant-like. Several hormone-related factors (oral contraceptive use, menopausal status, age at menopause, and pre- and postmenopausal body mass index) were associated with TP53 mutant-like status, whereas reproductive factors (age at first birth and parity) and smoking were associated with ER status. Multiparity was associated with both TP53 and ER. When classifying TP53 status using IHC methods, no associations were observed with TP53. Associations observed with RNA-based TP53 remained after accounting for basal-like subtype. CONCLUSIONS This case-case study found breast cancer risk factors associated with RNA-based TP53 and ER. IMPACT RNA-based TP53 and ER represent an emerging etiologic schema of interest in breast cancer prevention research.
Affiliation(s)
- Amber N Hurson
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Division of Cancer Epidemiology and Genetics, NCI, Rockville, Maryland
| | - Mustapha Abubakar
- Division of Cancer Epidemiology and Genetics, NCI, Rockville, Maryland
| | - Alina M Hamilton
- Department of Pathology and Laboratory Medicine, The University of North Carolina, Chapel Hill, North Carolina
| | - Kathleen Conway
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
| | - Katherine A Hoadley
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
| | - Michael I Love
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
| | - Andrew F Olshan
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
| | - Charles M Perou
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
| | | | - Melissa A Troester
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
| |
|
40
|
Vohra SN, Walens A, Hamilton AM, Sherman ME, Schedin P, Nichols HB, Reeder-Hayes KE, Olshan AF, Love MI, Troester MA. Molecular and clinical characterization of postpartum-associated breast cancer in the Carolina Breast Cancer Study Phase I-III, 1993-2013. Cancer Epidemiol Biomarkers Prev 2021; 31:561-568. [PMID: 34810211 PMCID: PMC8901538 DOI: 10.1158/1055-9965.epi-21-0940] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/04/2021] [Revised: 10/20/2021] [Accepted: 11/10/2021] [Indexed: 11/16/2022] Open
Abstract
BACKGROUND Breast cancers in recently postpartum women may have worse outcomes, but studies examining tumor molecular features by pregnancy recency have shown conflicting results. METHODS This analysis used Carolina Breast Cancer Study data to examine clinical and molecular tumor features among women <50 years of age who were recently (≤10 years prior) or remotely (>10 years prior) postpartum, or nulliparous. Prevalence odds ratios (PORs) and 95% confidence intervals (CIs) were estimated using multivariable models. RESULTS Recently postpartum women (N=618) were more frequently lymph node positive [POR (95% CI): 1.66 (1.26, 2.19)], ER negative [1.37 (1.02, 1.83)], and IHC-based triple negative [1.57 (1.00, 2.47)] compared to nulliparous (N=360) women. Some differences were identified between recent vs. remotely postpartum; smaller tumor size [0.67 (0.52, 0.86)], p53 wildtype [0.53 (0.36, 0.77)], and non-basal-like phenotype [0.53 (0.33, 0.84)] were more common among recently postpartum. Recently postpartum (vs. nulliparous) had significant enrichment for adaptive immunity, T cells, B cells, CD8 T cells, activated CD8 T cells/NK cells, Tfh cells and higher overall immune cell scores. These differences were attenuated in remotely (compared to recently) postpartum women. CONCLUSIONS These results suggest a dominant effect of parity (vs. nulliparity) and a lesser effect of pregnancy recency on tumor molecular features, although tumor immune microenvironments were altered in association with pregnancy recency. IMPACT Our study is unique in examining tumor immune microenvironment and RNA-based markers according to time since last childbirth. Future studies should examine the interplay between tumor features, post-diagnostic treatment and outcomes among recently postpartum women.
Affiliation(s)
- Sanah N Vohra
- Epidemiology, University of North Carolina at Chapel Hill
| | - Andrea Walens
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill
| | - Alina M Hamilton
- Pathology and Laboratory Medicine, University of North Carolina at Chapel Hill
| | | | - Pepper Schedin
- Cell, Developmental and Cancer Biology, Oregon Health & Science University
| | - Hazel B Nichols
- Department of Epidemiology, University of North Carolina at Chapel Hill
| | | | - Andrew F Olshan
- Department of Epidemiology, University of North Carolina at Chapel Hill
| | - Michael I Love
- Department of Biostatistics, University of North Carolina at Chapel Hill
| | - Melissa A Troester
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill
| |
|
41
|
Perrin HJ, Currin KW, Vadlamudi S, Pandey GK, Ng KK, Wabitsch M, Laakso M, Love MI, Mohlke KL. Chromatin accessibility and gene expression during adipocyte differentiation identify context-dependent effects at cardiometabolic GWAS loci. PLoS Genet 2021; 17:e1009865. [PMID: 34699533 PMCID: PMC8570510 DOI: 10.1371/journal.pgen.1009865] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 06/25/2021] [Revised: 11/05/2021] [Accepted: 10/07/2021] [Indexed: 12/15/2022] Open
Abstract
Chromatin accessibility and gene expression in relevant cell contexts can guide identification of regulatory elements and mechanisms at genome-wide association study (GWAS) loci. To identify regulatory elements that display differential activity across adipocyte differentiation, we performed ATAC-seq and RNA-seq in a human cell model of preadipocytes and adipocytes at days 4 and 14 of differentiation. For comparison, we created a consensus map of ATAC-seq peaks in 11 human subcutaneous adipose tissue samples. We identified 58,387 context-dependent chromatin accessibility peaks and 3,090 context-dependent genes between all timepoint comparisons (log2 fold change>1, FDR<5%) with 15,919 adipocyte- and 18,244 preadipocyte-dependent peaks. Adipocyte-dependent peaks showed increased overlap (60.1%) with Roadmap Epigenomics adipocyte nuclei enhancers compared to preadipocyte-dependent peaks (11.5%). We linked context-dependent peaks to genes based on adipocyte promoter capture Hi-C data, overlap with adipose eQTL variants, and context-dependent gene expression. Of 16,167 context-dependent peaks linked to a gene, 5,145 were linked by two or more strategies to 1,670 genes. Among GWAS loci for cardiometabolic traits, adipocyte-dependent peaks, but not preadipocyte-dependent peaks, showed significant enrichment (LD score regression P<0.005) for waist-to-hip ratio and modest enrichment (P < 0.05) for HDL-cholesterol. We identified 659 peaks linked to 503 genes by two or more approaches and overlapping a GWAS signal, suggesting a regulatory mechanism at these loci. To identify variants that may alter chromatin accessibility between timepoints, we identified 582 variants in 454 context-dependent peaks that demonstrated allelic imbalance in accessibility (FDR<5%), of which 55 peaks also overlapped GWAS variants. 
At one GWAS locus for palmitoleic acid, rs603424 was located in an adipocyte-dependent peak linked to SCD and exhibited allelic differences in transcriptional activity in adipocytes (P = 0.003) but not preadipocytes (P = 0.09). These results demonstrate that context-dependent peaks and genes can guide discovery of regulatory variants at GWAS loci and aid identification of regulatory mechanisms.
Cardiovascular and metabolic diseases are widespread, and an increased understanding of genetic mechanisms behind these diseases could improve treatment. Chromatin accessibility and gene expression in relevant cell contexts can guide identification of regulatory elements and genetic mechanisms for disease traits. A relevant context for cardiovascular and metabolic disease traits is adipocyte differentiation. To identify regulatory elements and genes that display differences in activity during adipocyte differentiation, we profiled chromatin accessibility and gene expression in a human cell model of preadipocytes and adipocytes. We identified chromatin regions that change accessibility during differentiation and predicted genes they may affect. We also linked these chromatin regions to genetic variants associated with risk of disease. At one genomic region linked to fatty acids, a chromatin region more accessible in adipocytes linked to a fatty acid synthesis gene and exhibited allelic differences in transcriptional activity in adipocytes but not preadipocytes. These results demonstrate that chromatin regions and genes that change during cell context can guide discovery of regulatory variants and aid identification of disease mechanisms.
Affiliation(s)
- Hannah J. Perrin
- Department of Genetics, University of North Carolina, Chapel Hill, North Carolina, United States of America
| | - Kevin W. Currin
- Department of Genetics, University of North Carolina, Chapel Hill, North Carolina, United States of America
| | - Swarooparani Vadlamudi
- Department of Genetics, University of North Carolina, Chapel Hill, North Carolina, United States of America
| | - Gautam K. Pandey
- Department of Genetics, University of North Carolina, Chapel Hill, North Carolina, United States of America
| | - Kenneth K. Ng
- Department of Genetics, University of North Carolina, Chapel Hill, North Carolina, United States of America
| | - Martin Wabitsch
- Department of Pediatrics and Adolescent Medicine, Ulm University Hospital, Ulm, Germany
| | - Markku Laakso
- Department of Medicine, University of Eastern Finland and Kuopio University Hospital, Kuopio, Finland
| | - Michael I. Love
- Department of Genetics, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina, United States of America
| | - Karen L. Mohlke
- Department of Genetics, University of North Carolina, Chapel Hill, North Carolina, United States of America
| |
|
42
|
Aygün N, Elwell AL, Liang D, Lafferty MJ, Cheek KE, Courtney KP, Mory J, Hadden-Ford E, Krupa O, de la Torre-Ubieta L, Geschwind DH, Love MI, Stein JL. Brain-trait-associated variants impact cell-type-specific gene regulation during neurogenesis. Am J Hum Genet 2021; 108:1647-1668. [PMID: 34416157 PMCID: PMC8456186 DOI: 10.1016/j.ajhg.2021.07.011] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 12/28/2020] [Accepted: 07/23/2021] [Indexed: 12/21/2022] Open
Abstract
Interpretation of the function of non-coding risk loci for neuropsychiatric disorders and brain-relevant traits via gene expression and alternative splicing quantitative trait locus (e/sQTL) analyses is generally performed in bulk post-mortem adult tissue. However, genetic risk loci are enriched in regulatory elements active during neocortical differentiation, and regulatory effects of risk variants may be masked by heterogeneity in bulk tissue. Here, we map e/sQTLs, and allele-specific expression in cultured cells representing two major developmental stages, primary human neural progenitors (n = 85) and their sorted neuronal progeny (n = 74), identifying numerous loci not detected in either bulk developing cortical wall or adult cortex. Using colocalization and genetic imputation via transcriptome-wide association, we uncover cell-type-specific regulatory mechanisms underlying risk for brain-relevant traits that are active during neocortical differentiation. Specifically, we identified a progenitor-specific eQTL for CENPW co-localized with common variant associations for cortical surface area and educational attainment.
Affiliation(s)
- Nil Aygün
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA; UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Angela L Elwell
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA; UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Dan Liang
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA; UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Michael J Lafferty
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA; UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Kerry E Cheek
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA; UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Kenan P Courtney
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA; UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Jessica Mory
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA; UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Ellie Hadden-Ford
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA; UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Oleh Krupa
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA; UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Luis de la Torre-Ubieta
- Neurogenetics Program, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA; Center for Autism Research and Treatment, Semel Institute, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Psychiatry and Biobehavioral Sciences, Semel Institute, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Daniel H Geschwind
- Neurogenetics Program, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA; Center for Autism Research and Treatment, Semel Institute, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Psychiatry and Biobehavioral Sciences, Semel Institute, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Michael I Love
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA; Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Jason L Stein
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA; UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA.
|
43
|
Cao KAL, Abadi AJ, Davis-Marcisak EF, Hsu L, Arora A, Coullomb A, Deshpande A, Feng Y, Jeganathan P, Loth M, Meng C, Mu W, Pancaldi V, Sankaran K, Righelli D, Singh A, Sodicoff JS, Stein-O'Brien GL, Subramanian A, Welch JD, You Y, Argelaguet R, Carey VJ, Dries R, Greene CS, Holmes S, Love MI, Ritchie ME, Yuan GC, Culhane AC, Fertig E. Author Correction: Community-wide hackathons to identify central themes in single-cell multi-omics. Genome Biol 2021; 22:246. [PMID: 34433496 PMCID: PMC8385897 DOI: 10.1186/s13059-021-02468-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022] Open
Affiliation(s)
- Kim-Anh Lê Cao
- Melbourne Integrative Genomics, School of Mathematics and Statistics, University of Melbourne, Melbourne, Australia.
| | - Al J Abadi
- Melbourne Integrative Genomics, School of Mathematics and Statistics, University of Melbourne, Melbourne, Australia
| | - Emily F Davis-Marcisak
- McKusick-Nathans Institute of the Department of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA
| | - Lauren Hsu
- Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
- Biostatistics, Harvard TH Chan School of Public Health, Boston, MA, USA
| | - Arshi Arora
- Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Alexis Coullomb
- Centre de Recherches en Cancérologie de Toulouse (INSERM), Université Paul Sabatier III, Toulouse, France
| | - Atul Deshpande
- Cancer Convergence Institute and Division of Quantitative Sciences, Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | - Yuzhou Feng
- Melbourne Integrative Genomics, School of Mathematics and Statistics, University of Melbourne, Melbourne, Australia
| | | | - Melanie Loth
- Cancer Convergence Institute and Division of Quantitative Sciences, Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | - Chen Meng
- Bavarian Center for Biomolecular Mass Spectrometry (BayBioMS), School of Life Sciences, Technical University of Munich, Munich, Germany
| | - Wancen Mu
- Department of Biostatistics, UNC, Chapel Hill, NC, USA
| | - Vera Pancaldi
- Centre de Recherches en Cancérologie de Toulouse (INSERM), Université Paul Sabatier III, Toulouse, France
- Barcelona Supercomputing Center, Barcelona, Spain
| | - Kris Sankaran
- Department of Statistics, University of Wisconsin, Madison, WI, USA
| | - Dario Righelli
- Department of Statistical Sciences, University of Padova, Padova, PD, Italy
| | - Amrit Singh
- Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, BC, Canada
- PROOF Centre of Excellence, Vancouver, BC, Canada
| | - Joshua S Sodicoff
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Department of Biomedical Engineering, University of Michigan, Ann Arbor, MI, USA
| | - Genevieve L Stein-O'Brien
- McKusick-Nathans Institute of the Department of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Cancer Convergence Institute and Division of Quantitative Sciences, Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Department of Neuroscience, Johns Hopkins University, Baltimore, MD, USA
- Kavli Neuroscience Discovery Institute, Johns Hopkins University, Baltimore, MD, USA
| | | | - Joshua D Welch
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Department of Computer Science and Engineering, University of Michigan, Ann Arbor, MI, USA
| | - Yue You
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, University of Melbourne, Melbourne, Australia
- Department of Medical Biology, University of Melbourne, Melbourne, Australia
| | | | - Vincent J Carey
- Channing Division of Network Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
| | - Ruben Dries
- Department of Hematology and Oncology, Boston Medical Center, Boston, MA, USA
- Department of Computational Biomedicine, Boston University School of Medicine, Boston, MA, USA
- Center for Regenerative Medicine (CReM), Boston University, Boston, MA, USA
| | - Casey S Greene
- Center for Health AI and Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, CO, USA
| | - Susan Holmes
- Department of Statistics, Stanford University, Stanford, CA, USA
| | - Michael I Love
- Department of Biostatistics, UNC, Chapel Hill, NC, USA
- Department of Genetics, UNC, Chapel Hill, NC, USA
| | - Matthew E Ritchie
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, University of Melbourne, Melbourne, Australia
- Department of Medical Biology, University of Melbourne, Melbourne, Australia
- School of Mathematics and Statistics, University of Melbourne, Melbourne, Australia
| | - Guo-Cheng Yuan
- Department of Genetics and Genomic Sciences, Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Aedin C Culhane
- Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
- Biostatistics, Harvard TH Chan School of Public Health, Boston, MA, USA
| | - Elana Fertig
- Cancer Convergence Institute and Division of Quantitative Sciences, Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Department of Applied Mathematics and Statistics, Johns Hopkins University Whiting School of Engineering, Baltimore, MD, USA
|
44
|
Lê Cao KA, Abadi AJ, Davis-Marcisak EF, Hsu L, Arora A, Coullomb A, Deshpande A, Feng Y, Jeganathan P, Loth M, Meng C, Mu W, Pancaldi V, Sankaran K, Righelli D, Singh A, Sodicoff JS, Stein-O’Brien GL, Subramanian A, Welch JD, You Y, Argelaguet R, Carey VJ, Dries R, Greene CS, Holmes S, Love MI, Ritchie ME, Yuan GC, Culhane AC, Fertig E. Community-wide hackathons to identify central themes in single-cell multi-omics. Genome Biol 2021; 22:220. [PMID: 34353350 PMCID: PMC8340473 DOI: 10.1186/s13059-021-02433-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/16/2022] Open
Affiliation(s)
- Kim-Anh Lê Cao
- Melbourne Integrative Genomics, School of Mathematics and Statistics, University of Melbourne, Melbourne, Australia
| | - Al J. Abadi
- Melbourne Integrative Genomics, School of Mathematics and Statistics, University of Melbourne, Melbourne, Australia
| | - Emily F. Davis-Marcisak
- McKusick-Nathans Institute of the Department of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD USA
| | - Lauren Hsu
- Data Science, Dana-Farber Cancer Institute, Boston, MA USA
- Department of Genetics, UNC, Chapel Hill, NC USA
| | - Arshi Arora
- Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY USA
| | - Alexis Coullomb
- Centre de Recherches en Cancérologie de Toulouse (INSERM), Université Paul Sabatier III, Toulouse, France
| | - Atul Deshpande
- Cancer Convergence Institute and Division of Quantitative Sciences, Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD USA
| | - Yuzhou Feng
- Melbourne Integrative Genomics, School of Mathematics and Statistics, University of Melbourne, Melbourne, Australia
| | | | - Melanie Loth
- Cancer Convergence Institute and Division of Quantitative Sciences, Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD USA
| | - Chen Meng
- Bavarian Center for Biomolecular Mass Spectrometry (BayBioMS), School of Life Sciences, Technical University of Munich, Munich, Germany
| | - Wancen Mu
- Department of Biostatistics, UNC, Chapel Hill, NC USA
| | - Vera Pancaldi
- Centre de Recherches en Cancérologie de Toulouse (INSERM), Université Paul Sabatier III, Toulouse, France
- Barcelona Supercomputing Center, Barcelona, Spain
| | - Kris Sankaran
- Department of Statistics, University of Wisconsin, Madison, WI USA
| | - Dario Righelli
- Department of Statistical Sciences, University of Padova, Padova, PD Italy
| | - Amrit Singh
- Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, BC Canada
- PROOF Centre of Excellence, Vancouver, BC Canada
| | - Joshua S. Sodicoff
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI USA
- Department of Biomedical Engineering, University of Michigan, Ann Arbor, MI USA
| | - Genevieve L. Stein-O’Brien
- McKusick-Nathans Institute of the Department of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD USA
- Cancer Convergence Institute and Division of Quantitative Sciences, Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD USA
- Department of Neuroscience, Johns Hopkins University, Baltimore, MD USA
- Kavli Neuroscience Discovery Institute, Johns Hopkins University, Baltimore, MD USA
| | | | - Joshua D. Welch
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI USA
- Department of Computer Science and Engineering, University of Michigan, Ann Arbor, MI USA
| | - Yue You
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, University of Melbourne, Melbourne, Australia
- Department of Medical Biology, University of Melbourne, Melbourne, Australia
| | | | - Vincent J. Carey
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA USA
| | - Ruben Dries
- Department of Hematology and Oncology, Boston Medical Center, Boston, MA USA
- Department of Computational Biomedicine, Boston University School of Medicine, Boston, MA USA
- Center for Regenerative Medicine (CReM), Boston University, Boston, MA USA
| | - Casey S. Greene
- Center for Health AI and Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, CO USA
| | - Susan Holmes
- Department of Statistics, Stanford University, Stanford, CA USA
| | - Michael I. Love
- Department of Biostatistics, UNC, Chapel Hill, NC USA
- Department of Genetics, UNC, Chapel Hill, NC USA
| | - Matthew E. Ritchie
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, University of Melbourne, Melbourne, Australia
- School of Mathematics and Statistics, University of Melbourne, Melbourne, Australia
- Department of Medical Biology, University of Melbourne, Melbourne, Australia
| | - Guo-Cheng Yuan
- Department of Genetics and Genomic Sciences, Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY USA
| | - Aedin C. Culhane
- Data Science, Dana-Farber Cancer Institute, Boston, MA USA
- Biostatistics, Harvard TH Chan School of Public Health, Boston, MA USA
| | - Elana Fertig
- Cancer Convergence Institute and Division of Quantitative Sciences, Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD USA
- Department of Biomedical Engineering, Johns Hopkins University School of Medicine, Baltimore, MD USA
- Department of Applied Mathematics and Statistics, Johns Hopkins University Whiting School of Engineering, Baltimore, MD USA
|
45
|
Jones GS, Hoadley KA, Olsson LT, Hamilton AM, Bhattacharya A, Kirk EL, Tipaldos HJ, Fleming JM, Love MI, Nichols HB, Olshan AF, Troester MA. Hepatocyte growth factor pathway expression in breast cancer by race and subtype. Breast Cancer Res 2021; 23:80. [PMID: 34344422 PMCID: PMC8336233 DOI: 10.1186/s13058-021-01460-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/06/2021] [Accepted: 07/20/2021] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND African American women have the highest risk of breast cancer mortality compared to other racial groups. Differences in tumor characteristics have been implicated as a possible cause; however, the tumor microenvironment may also contribute to this disparity in mortality. Hepatocyte growth factor (HGF) is a stroma-derived marker of the tumor microenvironment that may affect tumor progression differentially by race. OBJECTIVE To examine whether an HGF gene expression signature is differentially expressed by race and tumor characteristics. METHODS Invasive breast tumors from 1957 patients were assessed for a 38-gene RNA-based HGF gene expression signature. Participants were black (n = 1033) and non-black (n = 924) women from the population-based Carolina Breast Cancer Study (1993-2013). Generalized linear models were used to estimate the relative frequency differences (RFD) in HGF status by race, clinical, and demographic factors. RESULTS Thirty-two percent of tumors were positive for the HGF signature. Black women were more likely [42% vs. 21%; RFD = +19.93% (95% CI 16.00, 23.87)] to have HGF-positive tumors compared to non-black women. Triple-negative patients had a higher frequency of HGF positivity [82% vs. 13% in non-triple-negative; RFD = +65.85% (95% CI 61.71, 69.98)], and HGF positivity was a defining feature of basal-like subtype [92% vs. 8% in non-basal; RFD = +81.84% (95% CI 78.84, 84.83)]. HGF positivity was associated with younger age, stage, higher grade, and high genomic risk of recurrence (ROR-PT) score. CONCLUSION HGF expression is a defining feature of basal-like tumors, and its association with black race and young women suggests it may be a candidate pathway for understanding breast cancer disparities.
Affiliation(s)
- Gieira S Jones
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina-Chapel Hill, 253 Rosenau Hall, CB #7435, 135 Dauer Drive, Chapel Hill, NC, USA
| | - Katherine A Hoadley
- Department of Genetics, University of North Carolina-Chapel Hill, Chapel Hill, NC, USA
- Lineberger Comprehensive Cancer Center, University of North Carolina-Chapel Hill, Chapel Hill, NC, USA
| | - Linnea T Olsson
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina-Chapel Hill, 253 Rosenau Hall, CB #7435, 135 Dauer Drive, Chapel Hill, NC, USA
| | - Alina M Hamilton
- Department of Pathology and Laboratory Medicine, University of North Carolina-Chapel Hill, Chapel Hill, NC, USA
| | - Arjun Bhattacharya
- Department of Biostatistics, University of North Carolina-Chapel Hill, Chapel Hill, NC, USA
| | - Erin L Kirk
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina-Chapel Hill, 253 Rosenau Hall, CB #7435, 135 Dauer Drive, Chapel Hill, NC, USA
| | - Heather J Tipaldos
- Lineberger Comprehensive Cancer Center, University of North Carolina-Chapel Hill, Chapel Hill, NC, USA
| | - Jodie M Fleming
- Lineberger Comprehensive Cancer Center, University of North Carolina-Chapel Hill, Chapel Hill, NC, USA
- Department of Biological and Biomedical Sciences, North Carolina Central University, Durham, NC, USA
| | - Michael I Love
- Department of Genetics, University of North Carolina-Chapel Hill, Chapel Hill, NC, USA
- Lineberger Comprehensive Cancer Center, University of North Carolina-Chapel Hill, Chapel Hill, NC, USA
- Department of Biostatistics, University of North Carolina-Chapel Hill, Chapel Hill, NC, USA
| | - Hazel B Nichols
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina-Chapel Hill, 253 Rosenau Hall, CB #7435, 135 Dauer Drive, Chapel Hill, NC, USA
- Lineberger Comprehensive Cancer Center, University of North Carolina-Chapel Hill, Chapel Hill, NC, USA
| | - Andrew F Olshan
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina-Chapel Hill, 253 Rosenau Hall, CB #7435, 135 Dauer Drive, Chapel Hill, NC, USA
- Lineberger Comprehensive Cancer Center, University of North Carolina-Chapel Hill, Chapel Hill, NC, USA
| | - Melissa A Troester
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina-Chapel Hill, 253 Rosenau Hall, CB #7435, 135 Dauer Drive, Chapel Hill, NC, USA.
- Lineberger Comprehensive Cancer Center, University of North Carolina-Chapel Hill, Chapel Hill, NC, USA.
|
46
|
Liang D, Elwell AL, Aygün N, Krupa O, Wolter JM, Kyere FA, Lafferty MJ, Cheek KE, Courtney KP, Yusupova M, Garrett ME, Ashley-Koch A, Crawford GE, Love MI, de la Torre-Ubieta L, Geschwind DH, Stein JL. Cell-type-specific effects of genetic variation on chromatin accessibility during human neuronal differentiation. Nat Neurosci 2021; 24:941-953. [PMID: 34017130 PMCID: PMC8254789 DOI: 10.1038/s41593-021-00858-w] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 02/04/2020] [Accepted: 04/15/2021] [Indexed: 02/03/2023]
Abstract
Common genetic risk for neuropsychiatric disorders is enriched in regulatory elements active during cortical neurogenesis. However, it remains poorly understood how these variants influence gene regulation. To model the functional impact of common genetic variation on the noncoding genome during human cortical development, we performed the assay for transposase accessible chromatin using sequencing (ATAC-seq) and analyzed chromatin accessibility quantitative trait loci (QTL) in cultured human neural progenitor cells and their differentiated neuronal progeny from 87 donors. We identified significant genetic effects on 988/1,839 neuron/progenitor regulatory elements, with highly cell-type and temporally specific effects. A subset (roughly 30%) of chromatin accessibility-QTL were also associated with changes in gene expression. Motif-disrupting alleles of transcriptional activators generally led to decreases in chromatin accessibility, whereas motif-disrupting alleles of repressors led to increases in chromatin accessibility. By integrating cell-type-specific chromatin accessibility-QTL and brain-relevant genome-wide association data, we were able to fine-map and identify regulatory mechanisms underlying noncoding neuropsychiatric disorder risk loci.
Affiliation(s)
- Dan Liang
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Angela L Elwell
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Nil Aygün
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Oleh Krupa
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Justin M Wolter
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Felix A Kyere
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Michael J Lafferty
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Kerry E Cheek
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Kenan P Courtney
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Marianna Yusupova
- Neurogenetics Program, Department of Neurology, David Geffen School of Medicine University of California, Los Angeles, Los Angeles, CA, USA
- Center for Autism Research and Treatment, Semel Institute, David Geffen School of Medicine University of California, Los Angeles, Los Angeles, CA, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
| | - Melanie E Garrett
- Duke Molecular Physiology Institute, Duke University, Durham, NC, USA
| | - Allison Ashley-Koch
- Duke Molecular Physiology Institute, Duke University, Durham, NC, USA
- Department of Medicine, Duke University, Durham, NC, USA
| | - Gregory E Crawford
- Center for Genomic and Computational Biology, Duke University, Durham, NC, USA
- Department of Pediatrics, Division of Medical Genetics, Duke University, Durham, NC, USA
| | - Michael I Love
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Luis de la Torre-Ubieta
- Neurogenetics Program, Department of Neurology, David Geffen School of Medicine University of California, Los Angeles, Los Angeles, CA, USA
- Center for Autism Research and Treatment, Semel Institute, David Geffen School of Medicine University of California, Los Angeles, Los Angeles, CA, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Psychiatry and Biobehavioral Sciences, Semel Institute, David Geffen School of Medicine University of California, Los Angeles, Los Angeles, CA, USA
| | - Daniel H Geschwind
- Neurogenetics Program, Department of Neurology, David Geffen School of Medicine University of California, Los Angeles, Los Angeles, CA, USA
- Center for Autism Research and Treatment, Semel Institute, David Geffen School of Medicine University of California, Los Angeles, Los Angeles, CA, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Psychiatry and Biobehavioral Sciences, Semel Institute, David Geffen School of Medicine University of California, Los Angeles, Los Angeles, CA, USA
| | - Jason L Stein
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
- UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
|
47
|
Hurson AN, Abubakar M, Conway-Dorsey K, Hoadley K, Love MI, Olshan AF, Garcia-Closas M, Troester MA. Abstract 876: TP53 pathway function, estrogen receptor status, and breast cancer risk factors in the Carolina Breast Cancer Study. Cancer Res 2021. [DOI: 10.1158/1538-7445.am2021-876] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Purpose: TP53 and estrogen receptor (ER) both play essential roles in breast cancer development and progression, with recent research revealing crosstalk between TP53 and ER signaling pathways. While many studies have demonstrated heterogeneity of risk factor associations across ER subtypes, TP53 status has been inconsistently linked to breast cancer risk factors. This may be due to few studies evaluating TP53 effects in the context of ER. Additionally, studies have generally classified TP53 status using immunohistochemistry (IHC) staining or DNA sequencing, which are prone to misclassification of TP53 pathway function. RNA-based methods of measuring TP53 pathway activity may reduce misclassification and clarify etiologic associations.
Methods: This case-only analysis included 4,466 incident breast cancer cases from the Carolina Breast Cancer Study (1993-2013). Using RNA expression previously quantified using NanoString assays on FFPE tumor samples, tumors were classified for TP53 functional status (mutant-like or wildtype-like) using a validated 52-gene RNA signature. We used a two-stage polytomous logistic regression model to evaluate sources of risk factor heterogeneity due to RNA-based TP53 or ER, adjusting for each other and for PR, HER2, and grade, while accounting for missing tumor marker data. Risk factor heterogeneity was also evaluated for IHC-based TP53 and ER. For each risk factor, joint effects of TP53 and ER were evaluated by allowing for an interaction between the two markers.
Results: When classifying TP53 status using the RNA signature, the effects of several hormone-related factors (oral contraceptive use, menopausal status, age at menopause, and pre- and post-menopausal BMI) were heterogeneous across TP53 subtypes, while heterogeneity of reproductive factors (age at first birth and parity) and smoking status was observed across ER subtypes. The effect of number of births was heterogeneous across both TP53 and ER-based subtype definitions. Additionally, we observed an interaction between RNA-based TP53 and ER status with family history of breast cancer in a first-degree relative (p=0.05). When classifying TP53 status using IHC, the TP53 effects were not recapitulated but there was minimal change in the ER effects. We observed an interaction between IHC-based TP53 and ER status with age at menarche (p=0.02), smoking status (p=0.05), and alcohol use (p=0.06).
Conclusions: This study demonstrates that TP53 and ER are valuable for defining etiologic subtypes of breast cancer, although IHC measures may not have as much value as RNA. Analyses of the joint effects of TP53 and ER with breast cancer risk factors revealed intriguing findings that could lead to new hypotheses. However, larger studies incorporating multiple correlated tumor characteristics will be required to confirm our findings to conclusively define etiologically relevant subtypes of breast cancer.
Citation Format: Amber N. Hurson, Mustapha Abubakar, Kathleen Conway-Dorsey, Katherine Hoadley, Michael I. Love, Andrew F. Olshan, Montserrat Garcia-Closas, Melissa A. Troester. TP53 pathway function, estrogen receptor status, and breast cancer risk factors in the Carolina Breast Cancer Study [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2021; 2021 Apr 10-15 and May 17-21. Philadelphia (PA): AACR; Cancer Res 2021;81(13_Suppl):Abstract nr 876.
|
48
|
Bhattacharya A, Hamilton AM, Furberg H, Pietzak E, Purdue MP, Troester MA, Hoadley KA, Love MI. An approach for normalization and quality control for NanoString RNA expression data. Brief Bioinform 2021; 22:bbaa163. [PMID: 32789507 PMCID: PMC8138885 DOI: 10.1093/bib/bbaa163] [Citation(s) in RCA: 52] [Impact Index Per Article: 17.3] [Received: 06/05/2020] [Revised: 06/29/2020] [Accepted: 06/30/2020] [Indexed: 01/10/2023] Open
Abstract
The NanoString RNA counting assay for formalin-fixed paraffin embedded samples is unique in its sensitivity, technical reproducibility and robustness for analysis of clinical and archival samples. While commercial normalization methods are provided by NanoString, they are not optimal for all settings, particularly when samples exhibit strong technical or biological variation or where housekeeping genes have variable performance across the cohort. Here, we develop and evaluate a more comprehensive normalization procedure for NanoString data with steps for quality control, selection of housekeeping targets, normalization and iterative data visualization and biological validation. The approach was evaluated using a large cohort ($N = 1649$) from the Carolina Breast Cancer Study, two cohorts of moderate sample size ($N = 359$ and $130$) and a small published dataset ($N = 12$). The iterative process developed here eliminates technical variation (e.g. from different study phases or sites) more reliably than the three other methods, including NanoString's commercial package, without diminishing biological variation, especially in long-term longitudinal multiphase or multisite cohorts. We also find that probe sets validated for nCounter, such as the PAM50 gene signature, are impervious to batch issues. This work emphasizes that systematic quality control, normalization and visualization of NanoString nCounter data are an imperative component of study design that influences results in downstream analyses.
Affiliation(s)
- Mark P Purdue
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute
|
49
|
Bhattacharya A, Hamilton AM, Troester MA, Love MI. DeCompress: tissue compartment deconvolution of targeted mRNA expression panels using compressed sensing. Nucleic Acids Res 2021; 49:e48. [PMID: 33524140 PMCID: PMC8096278 DOI: 10.1093/nar/gkab031] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/14/2020] [Revised: 12/21/2020] [Accepted: 01/12/2021] [Indexed: 12/13/2022]
Abstract
Targeted mRNA expression panels, measuring up to 800 genes, are used in academic and clinical settings due to low cost and high sensitivity for archived samples. Most samples assayed on targeted panels originate from bulk tissue comprised of many cell types, and cell-type heterogeneity confounds biological signals. Reference-free methods are used when cell-type-specific expression references are unavailable, but limited feature spaces render implementation challenging in targeted panels. Here, we present DeCompress, a semi-reference-free deconvolution method for targeted panels. DeCompress leverages a reference RNA-seq or microarray dataset from similar tissue to expand the feature space of targeted panels using compressed sensing. Ensemble reference-free deconvolution is performed on this artificially expanded dataset to estimate cell-type proportions and gene signatures. In simulated mixtures, four public cell line mixtures, and a targeted panel (1199 samples; 406 genes) from the Carolina Breast Cancer Study, DeCompress recapitulates cell-type proportions with less error than reference-free methods and finds biologically relevant compartments. We integrate compartment estimates into cis-eQTL mapping in breast cancer, identifying a tumor-specific cis-eQTL for CCR3 (C-C Motif Chemokine Receptor 3) at a risk locus. DeCompress improves upon reference-free methods without requiring expression profiles from pure cell populations, with applications in genomic analyses and clinical settings.
Affiliation(s)
- Arjun Bhattacharya
  - Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California-Los Angeles, Los Angeles, CA 90095, USA
- Alina M Hamilton
  - Department of Pathology and Laboratory Medicine, University of North Carolina-Chapel Hill, Chapel Hill, NC 27516, USA
- Melissa A Troester
  - Department of Pathology and Laboratory Medicine, University of North Carolina-Chapel Hill, Chapel Hill, NC 27516, USA
  - Department of Epidemiology, University of North Carolina-Chapel Hill, Chapel Hill, NC 27516, USA
- Michael I Love
  - Department of Biostatistics, University of North Carolina-Chapel Hill, Chapel Hill, NC 27516, USA
  - Department of Genetics, University of North Carolina-Chapel Hill, Chapel Hill, NC 27516, USA
|
50
|
Rajesh A, Chang Y, Abedalthagafi MS, Wong-Beringer A, Love MI, Mangul S. Improving the completeness of public metadata accompanying omics studies. Genome Biol 2021; 22:106. [PMID: 33858487 PMCID: PMC8048353 DOI: 10.1186/s13059-021-02332-z] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 02/22/2021] [Accepted: 03/29/2021] [Indexed: 12/17/2022]
Affiliation(s)
- Anushka Rajesh
  - Department of Pharmacology and Pharmaceutical Sciences, University of Southern California, Los Angeles, CA 90089 USA
- Yutong Chang
  - Department of Pharmacology and Pharmaceutical Sciences, University of Southern California, Los Angeles, CA 90089 USA
- Malak S. Abedalthagafi
  - Genomics Research Department, King Fahad Medical City and King Abdulaziz City for Science and Technology, Riyadh, Saudi Arabia
- Annie Wong-Beringer
  - Department of Clinical Pharmacy, University of Southern California, Los Angeles, CA 90089 USA
- Michael I. Love
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27516 USA
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27514 USA
- Serghei Mangul
  - Department of Clinical Pharmacy, University of Southern California, Los Angeles, CA 90089 USA
|