1
|
Jajodia A, Mishra A, Doni Jayavelu N, Lambert K, Moss N, Yang Z, Cerosaletti K, Buckner JH, Hawkins RD. Functional dissection of noncoding variants associated with rheumatoid arthritis. Ann Rheum Dis 2025:S0003-4967(25)00890-8. [PMID: 40318978 DOI: 10.1016/j.ard.2025.04.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2024] [Revised: 03/28/2025] [Accepted: 04/01/2025] [Indexed: 05/07/2025]
Abstract
OBJECTIVES Noncoding variants are critical to our understanding of the genetic basis of diseases and disorders such as rheumatoid arthritis (RA). While genome-wide association studies have identified regions of the genome associated with disease, functional studies that can identify potentially causative variants are still lagging. METHODS To functionally fine-map RA-associated variants, we identified variants at enhancers marked in primary activated T helper cells and conducted a massively parallel reporter assay in these cells. RESULTS We found that combinations of functional variant genotypes are often exclusive to patients with RA. We leveraged 3-dimensional genome architecture and expression quantitative trait loci data to identify target genes of enhancers exhibiting allelic differences in activity. We confirmed enhancer activity and target gene interactions by Clustered Regularly Interspaced Short Palindromic Repeats Cas9 (CRISPR-Cas9) deletion in primary T cells. CONCLUSIONS The identification of functional enhancer variants suggests possible causal variants, and their target genes reveal known and novel genes as likely drivers of RA, as well as a means for therapeutic intervention.
Affiliation(s)
- Ajay Jajodia
- Division of Medical Genetics, Department of Medicine, Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Arpit Mishra
- Division of Medical Genetics, Department of Medicine, Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Naresh Doni Jayavelu
- Division of Medical Genetics, Department of Medicine, Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | | | - Nicholas Moss
- Division of Medical Genetics, Department of Medicine, Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Zongchen Yang
- Division of Medical Genetics, Department of Medicine, Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | | | - Jane H Buckner
- Benaroya Research Institute at Virginia Mason, Seattle, WA, USA
| | - R David Hawkins
- Division of Medical Genetics, Department of Medicine, Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA; Benaroya Research Institute at Virginia Mason, Seattle, WA, USA; Institute for Stem Cell and Regenerative Medicine, University of Washington School of Medicine, Seattle, WA, USA.
|
2
|
Keukeleire P, Rosen JD, Göbel-Knapp A, Salomon K, Schubach M, Kircher M. Using individual barcodes to increase quantification power of massively parallel reporter assays. BMC Bioinformatics 2025; 26:52. [PMID: 39948460 PMCID: PMC11827149 DOI: 10.1186/s12859-025-06065-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2024] [Accepted: 01/28/2025] [Indexed: 02/16/2025] [Open access]
Abstract
BACKGROUND Massively parallel reporter assays (MPRAs) are an experimental technology for measuring the activity of thousands of candidate regulatory sequences or their variants in parallel, where the activity of individual sequences is measured from pools of sequence-tagged reporter genes. Activity is derived from the ratio of transcribed RNA to input DNA counts of associated tag sequences in each reporter construct, so-called barcodes. Recently, tools specifically designed to analyze MPRA data were developed that attempt to model the count data, accounting for its inherent variation. Of these tools, MPRAnalyze and mpralm are most widely used. MPRAnalyze models barcode counts to estimate the transcription rate of each sequence. While it has increased statistical power and robustness against outliers compared to mpralm, it is slow and has a high false discovery rate. Mpralm, a tool built on the R package Limma, estimates log fold-changes between different sequences. As opposed to MPRAnalyze, it is fast and has a low false discovery rate but is susceptible to outliers and has less statistical power. RESULTS We propose BCalm, an MPRA analysis framework aimed at addressing the limitations of the existing tools. BCalm is an adaptation of mpralm, but models individual barcode counts instead of aggregating counts per sequence. Leaving out the aggregation step increases statistical power and improves robustness to outliers, while being fast and precise. We show the improved performance over existing methods on both simulated MPRA data and a lentiviral MPRA library of 166,508 target sequences, including 82,258 allelic variants. Further, BCalm adds functionality beyond the existing mpralm package, such as preparing count input files from MPRAsnakeflow, as well as an option to test for sequences with enhancing or repressing activity. Its built-in plotting functionalities allow for easy interpretation of the results. 
CONCLUSIONS With BCalm, we provide a new tool for analyzing MPRA data which is robust and accurate on real MPRA datasets. The package is available at https://github.com/kircherlab/BCalm .
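The activity measure described in the Background (the ratio of transcribed RNA to input DNA barcode counts) and the difference between aggregating counts per sequence and keeping individual barcodes can be illustrated with a minimal Python sketch. This is not the BCalm or mpralm implementation: the function names, the toy counts, and the pseudocount are assumptions made for illustration, and library-size normalization and the statistical modelling are omitted.

```python
import math
from collections import defaultdict


def barcode_log_ratios(counts, pseudocount=1.0):
    """Return {sequence_id: [log2(RNA/DNA) for each barcode]}.

    counts: iterable of (sequence_id, barcode, dna_count, rna_count).
    The pseudocount (an assumption of this sketch) avoids division by zero.
    """
    per_barcode = defaultdict(list)
    for seq_id, _barcode, dna, rna in counts:
        ratio = (rna + pseudocount) / (dna + pseudocount)
        per_barcode[seq_id].append(math.log2(ratio))
    return dict(per_barcode)


def aggregated_log_ratios(counts, pseudocount=1.0):
    """Return {sequence_id: log2(sum RNA / sum DNA)} over all barcodes."""
    dna_sum = defaultdict(float)
    rna_sum = defaultdict(float)
    for seq_id, _barcode, dna, rna in counts:
        dna_sum[seq_id] += dna
        rna_sum[seq_id] += rna
    return {
        seq_id: math.log2((rna_sum[seq_id] + pseudocount) / (dna_sum[seq_id] + pseudocount))
        for seq_id in dna_sum
    }


# Hypothetical toy data: one candidate sequence tagged by three barcodes,
# one of which behaves like an outlier.
toy_counts = [
    ("enhancer_A", "bc1", 100, 210),
    ("enhancer_A", "bc2", 120, 230),
    ("enhancer_A", "bc3", 5, 400),
]

print(barcode_log_ratios(toy_counts))     # one value per barcode
print(aggregated_log_ratios(toy_counts))  # one value per sequence
```

Keeping one value per barcode preserves the spread across barcodes, so a downstream model can see that a single barcode is an outlier; summing the counts first collapses that information into one number per sequence.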
Affiliation(s)
- Pia Keukeleire
- Institute of Human Genetics, University Hospital Schleswig-Holstein, University of Lübeck, Lübeck, Germany
| | - Jonathan D Rosen
- Department of Genetics & Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Angelina Göbel-Knapp
- Institute of Human Genetics, University Hospital Schleswig-Holstein, University of Lübeck, Lübeck, Germany
| | - Kilian Salomon
- Exploratory Diagnostic Sciences, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
| | - Max Schubach
- Exploratory Diagnostic Sciences, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
| | - Martin Kircher
- Institute of Human Genetics, University Hospital Schleswig-Holstein, University of Lübeck, Lübeck, Germany.
- Exploratory Diagnostic Sciences, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany.
|
3
|
Petersen RM, Vockley CM, Lea AJ. Uncovering methylation-dependent genetic effects on regulatory element function in diverse genomes. bioRxiv 2024:2024.08.23.609412. [PMID: 39229133 PMCID: PMC11370585 DOI: 10.1101/2024.08.23.609412] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/05/2024]
Abstract
A major goal in evolutionary biology and biomedicine is to understand the complex interactions between genetic variants, the epigenome, and gene expression. However, the causal relationships between these factors remain poorly understood. mSTARR-seq, a methylation-sensitive massively parallel reporter assay, is capable of identifying methylation-dependent regulatory activity at many thousands of genomic regions simultaneously, and allows for the testing of causal relationships between DNA methylation and gene expression on a region-by-region basis. Here, we developed a multiplexed mSTARR-seq protocol to assay naturally occurring human genetic variation from 25 individuals sampled from 10 localities in Europe and Africa. We identified 6,957 regulatory elements in either the unmethylated or methylated state, and this set was enriched for enhancer and promoter annotations, as expected. The expression of 58% of these regulatory elements was modulated by methylation, which was generally associated with decreased RNA expression. Within our set of regulatory elements, we used allele-specific expression analyses to identify 8,020 sites with genetic effects on gene regulation; further, we found that 42.3% of these genetic effects varied between methylated and unmethylated states. Sites exhibiting methylation-dependent genetic effects were enriched for GWAS and EWAS annotations, implicating them in human disease. Compared to datasets that assay DNA from a single European individual, our multiplexed assay uncovers dramatically more genetic effects and methylation-dependent genetic effects, highlighting the importance of including diverse individuals in assays which aim to understand gene regulatory processes.
|
4
|
Mishra A, Jajodia A, Weston E, Jayavelu ND, Garcia M, Hossack D, Hawkins RD. Identification of functional enhancer variants associated with type I diabetes in CD4+ T cells. Front Immunol 2024; 15:1387253. [PMID: 38947339 PMCID: PMC11211866 DOI: 10.3389/fimmu.2024.1387253] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2024] [Accepted: 04/09/2024] [Indexed: 07/02/2024] [Open access]
Abstract
Type I diabetes is an autoimmune disease mediated by T-cell destruction of β cells in pancreatic islets. Currently, there is no known cure, and treatment consists of daily insulin injections. Genome-wide association studies and twin studies have indicated a strong genetic heritability for type I diabetes and implicated several genes. As most strongly associated variants are noncoding, there is still a lack of identification of functional and, therefore, likely causal variants. Given that many of these genetic variants reside in enhancer elements, we have tested 121 CD4+ T-cell enhancer variants associated with T1D. We found four to be functional through massively parallel reporter assays. Three of the enhancer variants weaken activity, while the fourth strengthens activity. We link these to their cognate genes using 3D genome architecture or eQTL data and validate them using CRISPR editing. Validated target genes include CLEC16A and SOCS1. While these genes have been previously implicated in type 1 diabetes and other autoimmune diseases, we show that enhancers controlling their expression harbor functional variants. These variants, therefore, may act as causal type 1 diabetic variants.
Affiliation(s)
- Arpit Mishra
- Division of Medical Genetics, Department of Medicine, University of Washington School of Medicine, Seattle, WA, United States
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, United States
| | - Ajay Jajodia
- Division of Medical Genetics, Department of Medicine, University of Washington School of Medicine, Seattle, WA, United States
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, United States
| | - Eryn Weston
- Division of Medical Genetics, Department of Medicine, University of Washington School of Medicine, Seattle, WA, United States
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, United States
| | - Naresh Doni Jayavelu
- Division of Medical Genetics, Department of Medicine, University of Washington School of Medicine, Seattle, WA, United States
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, United States
| | - Mariana Garcia
- Division of Medical Genetics, Department of Medicine, University of Washington School of Medicine, Seattle, WA, United States
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, United States
| | - Daniel Hossack
- Division of Medical Genetics, Department of Medicine, University of Washington School of Medicine, Seattle, WA, United States
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, United States
| | - R. David Hawkins
- Division of Medical Genetics, Department of Medicine, University of Washington School of Medicine, Seattle, WA, United States
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, United States
- Institute for Stem Cell and Regenerative Medicine, University of Washington School of Medicine, Seattle, WA, United States
- Benaroya Research Institute at Virginia Mason, Seattle, WA, United States
|
5
|
Stankey CT, Bourges C, Haag LM, Turner-Stokes T, Piedade AP, Palmer-Jones C, Papa I, Silva Dos Santos M, Zhang Q, Cameron AJ, Legrini A, Zhang T, Wood CS, New FN, Randzavola LO, Speidel L, Brown AC, Hall A, Saffioti F, Parkes EC, Edwards W, Direskeneli H, Grayson PC, Jiang L, Merkel PA, Saruhan-Direskeneli G, Sawalha AH, Tombetti E, Quaglia A, Thorburn D, Knight JC, Rochford AP, Murray CD, Divakar P, Green M, Nye E, MacRae JI, Jamieson NB, Skoglund P, Cader MZ, Wallace C, Thomas DC, Lee JC. A disease-associated gene desert directs macrophage inflammation through ETS2. Nature 2024; 630:447-456. [PMID: 38839969 PMCID: PMC11168933 DOI: 10.1038/s41586-024-07501-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Accepted: 05/01/2024] [Indexed: 06/07/2024]
Abstract
Increasing rates of autoimmune and inflammatory disease present a burgeoning threat to human health. This is compounded by the limited efficacy of available treatments and high failure rates during drug development, highlighting an urgent need to better understand disease mechanisms. Here we show how functional genomics could address this challenge. By investigating an intergenic haplotype on chr21q22 (which has been independently linked to inflammatory bowel disease, ankylosing spondylitis, primary sclerosing cholangitis and Takayasu's arteritis), we identify that the causal gene, ETS2, is a central regulator of human inflammatory macrophages and delineate the shared disease mechanism that amplifies ETS2 expression. Genes regulated by ETS2 were prominently expressed in diseased tissues and more enriched for inflammatory bowel disease GWAS hits than most previously described pathways. Overexpressing ETS2 in resting macrophages reproduced the inflammatory state observed in chr21q22-associated diseases, with upregulation of multiple drug targets, including TNF and IL-23. Using a database of cellular signatures, we identified drugs that might modulate this pathway and validated the potent anti-inflammatory activity of one class of small molecules in vitro and ex vivo. Together, this illustrates the power of functional genomics, applied directly in primary human cells, to identify immune-mediated disease mechanisms and potential therapeutic opportunities.
Affiliation(s)
- C T Stankey
- Genetic Mechanisms of Disease Laboratory, The Francis Crick Institute, London, UK
- Department of Immunology and Inflammation, Imperial College London, London, UK
- Washington University School of Medicine, St Louis, MO, USA
| | - C Bourges
- Genetic Mechanisms of Disease Laboratory, The Francis Crick Institute, London, UK
| | - L M Haag
- Division of Gastroenterology, Infectious Diseases and Rheumatology, Charité-Universitätsmedizin Berlin, Berlin, Germany
| | - T Turner-Stokes
- Genetic Mechanisms of Disease Laboratory, The Francis Crick Institute, London, UK
- Department of Immunology and Inflammation, Imperial College London, London, UK
| | - A P Piedade
- Genetic Mechanisms of Disease Laboratory, The Francis Crick Institute, London, UK
| | - C Palmer-Jones
- Department of Gastroenterology, Royal Free Hospital, London, UK
- Institute for Liver and Digestive Health, Division of Medicine, University College London, London, UK
| | - I Papa
- Genetic Mechanisms of Disease Laboratory, The Francis Crick Institute, London, UK
| | | | - Q Zhang
- Genomics of Inflammation and Immunity Group, Human Genetics Programme, Wellcome Sanger Institute, Hinxton, UK
| | - A J Cameron
- Wolfson Wohl Cancer Centre, School of Cancer Sciences, University of Glasgow, Glasgow, UK
| | - A Legrini
- Wolfson Wohl Cancer Centre, School of Cancer Sciences, University of Glasgow, Glasgow, UK
| | - T Zhang
- Wolfson Wohl Cancer Centre, School of Cancer Sciences, University of Glasgow, Glasgow, UK
| | - C S Wood
- Wolfson Wohl Cancer Centre, School of Cancer Sciences, University of Glasgow, Glasgow, UK
| | - F N New
- NanoString Technologies, Seattle, WA, USA
| | - L O Randzavola
- Department of Immunology and Inflammation, Imperial College London, London, UK
| | - L Speidel
- Ancient Genomics Laboratory, The Francis Crick Institute, London, UK
- Genetics Institute, University College London, London, UK
| | - A C Brown
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
| | - A Hall
- The Sheila Sherlock Liver Centre, Royal Free Hospital, London, UK
- Department of Cellular Pathology, Royal Free Hospital, London, UK
| | - F Saffioti
- Institute for Liver and Digestive Health, Division of Medicine, University College London, London, UK
- The Sheila Sherlock Liver Centre, Royal Free Hospital, London, UK
| | - E C Parkes
- Genetic Mechanisms of Disease Laboratory, The Francis Crick Institute, London, UK
| | - W Edwards
- Cambridge Institute of Therapeutic Immunology and Infectious Disease, University of Cambridge, Cambridge, UK
| | - H Direskeneli
- Department of Internal Medicine, Division of Rheumatology, Marmara University, Istanbul, Turkey
| | - P C Grayson
- Systemic Autoimmunity Branch, NIAMS, National Institutes of Health, Bethesda, MD, USA
| | - L Jiang
- Department of Rheumatology, Zhongshan Hospital, Fudan University, Shanghai, China
| | - P A Merkel
- Division of Rheumatology, Department of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Division of Epidemiology, Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, USA
| | - G Saruhan-Direskeneli
- Department of Physiology, Istanbul University, Istanbul Faculty of Medicine, Istanbul, Turkey
| | - A H Sawalha
- Division of Rheumatology, Department of Pediatrics, University of Pittsburgh, Pittsburgh, PA, USA
- Division of Rheumatology and Clinical Immunology, Department of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- Lupus Center of Excellence, University of Pittsburgh, Pittsburgh, PA, USA
- Department of Immunology, University of Pittsburgh, Pittsburgh, PA, USA
| | - E Tombetti
- Department of Biomedical and Clinical Sciences, Milan University, Milan, Italy
- Internal Medicine and Rheumatology, ASST FBF-Sacco, Milan, Italy
| | - A Quaglia
- Department of Cellular Pathology, Royal Free Hospital, London, UK
- UCL Cancer Institute, London, UK
| | - D Thorburn
- Institute for Liver and Digestive Health, Division of Medicine, University College London, London, UK
- The Sheila Sherlock Liver Centre, Royal Free Hospital, London, UK
| | - J C Knight
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
- Chinese Academy of Medical Sciences Institute, Nuffield Department of Medicine, University of Oxford, Oxford, UK
- NIHR Comprehensive Biomedical Research Centre, Oxford, UK
| | - A P Rochford
- Department of Gastroenterology, Royal Free Hospital, London, UK
- Institute for Liver and Digestive Health, Division of Medicine, University College London, London, UK
| | - C D Murray
- Department of Gastroenterology, Royal Free Hospital, London, UK
- Institute for Liver and Digestive Health, Division of Medicine, University College London, London, UK
| | - P Divakar
- NanoString Technologies, Seattle, WA, USA
| | - M Green
- Experimental Histopathology STP, The Francis Crick Institute, London, UK
| | - E Nye
- Experimental Histopathology STP, The Francis Crick Institute, London, UK
| | - J I MacRae
- Metabolomics STP, The Francis Crick Institute, London, UK
| | - N B Jamieson
- Wolfson Wohl Cancer Centre, School of Cancer Sciences, University of Glasgow, Glasgow, UK
| | - P Skoglund
- Ancient Genomics Laboratory, The Francis Crick Institute, London, UK
| | - M Z Cader
- Cambridge Institute of Therapeutic Immunology and Infectious Disease, University of Cambridge, Cambridge, UK
- Department of Medicine, University of Cambridge, Cambridge, UK
| | - C Wallace
- Cambridge Institute of Therapeutic Immunology and Infectious Disease, University of Cambridge, Cambridge, UK
- MRC Biostatistics Unit, Cambridge Institute of Public Health, Cambridge, UK
| | - D C Thomas
- Cambridge Institute of Therapeutic Immunology and Infectious Disease, University of Cambridge, Cambridge, UK
- Department of Medicine, University of Cambridge, Cambridge, UK
| | - J C Lee
- Genetic Mechanisms of Disease Laboratory, The Francis Crick Institute, London, UK.
- Department of Gastroenterology, Royal Free Hospital, London, UK.
- Institute for Liver and Digestive Health, Division of Medicine, University College London, London, UK.
|
6
|
Aracena KA, Lin YL, Luo K, Pacis A, Gona S, Mu Z, Yotova V, Sindeaux R, Pramatarova A, Simon MM, Chen X, Groza C, Lougheed D, Gregoire R, Brownlee D, Boye C, Pique-Regi R, Li Y, He X, Bujold D, Pastinen T, Bourque G, Barreiro LB. Epigenetic variation impacts individual differences in the transcriptional response to influenza infection. Nat Genet 2024; 56:408-419. [PMID: 38424460 DOI: 10.1038/s41588-024-01668-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 05/27/2023] [Accepted: 01/16/2024] [Indexed: 03/02/2024]
Abstract
Humans display remarkable interindividual variation in their immune response to identical challenges. Yet, our understanding of the genetic and epigenetic factors contributing to such variation remains limited. Here we performed in-depth genetic, epigenetic and transcriptional profiling on primary macrophages derived from individuals of European and African ancestry before and after infection with influenza A virus. We show that baseline epigenetic profiles are strongly predictive of the transcriptional response to influenza A virus across individuals. Quantitative trait locus (QTL) mapping revealed highly coordinated genetic effects on gene regulation, with many cis-acting genetic variants concomitantly impacting gene expression and multiple epigenetic marks. These data reveal that ancestry-associated differences in the epigenetic landscape can be genetically controlled, even more than gene expression. Lastly, among QTL variants that colocalized with immune-disease loci, only 7% were gene expression QTL, while the remaining genetic variants impact epigenetic marks, stressing the importance of considering molecular phenotypes beyond gene expression in disease-focused studies.
Affiliation(s)
| | - Yen-Lung Lin
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL, USA
| | - Kaixuan Luo
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
| | - Alain Pacis
- Canadian Centre for Computational Genomics, McGill University, Montreal, Quebec, Canada
| | - Saideep Gona
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL, USA
| | - Zepeng Mu
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
| | - Vania Yotova
- Department of Genetics, CHU Sainte-Justine Research Center, Montreal, Quebec, Canada
| | - Renata Sindeaux
- Department of Genetics, CHU Sainte-Justine Research Center, Montreal, Quebec, Canada
| | | | | | - Xun Chen
- Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto University, Kyoto, Japan
| | - Cristian Groza
- Quantitative Life Sciences, McGill University, Montreal, Quebec, Canada
| | - David Lougheed
- Canadian Centre for Computational Genomics, McGill University, Montreal, Quebec, Canada
- Department of Human Genetics, McGill University, Montreal, Quebec, Canada
| | - Romain Gregoire
- Canadian Centre for Computational Genomics, McGill University, Montreal, Quebec, Canada
| | - David Brownlee
- Canadian Centre for Computational Genomics, McGill University, Montreal, Quebec, Canada
| | - Carly Boye
- Center for Molecular Medicine and Genetics, Wayne State University, Detroit, MI, USA
| | - Roger Pique-Regi
- Center for Molecular Medicine and Genetics, Wayne State University, Detroit, MI, USA
- Department of Obstetrics and Gynecology, Wayne State University, Detroit, MI, USA
| | - Yang Li
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
| | - Xin He
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
| | - David Bujold
- Canadian Centre for Computational Genomics, McGill University, Montreal, Quebec, Canada
- McGill Genome Centre, Montreal, Quebec, Canada
| | - Tomi Pastinen
- Department of Human Genetics, McGill University, Montreal, Quebec, Canada
- Genomic Medicine Center, Children's Mercy, Kansas City, MO, USA
| | - Guillaume Bourque
- Canadian Centre for Computational Genomics, McGill University, Montreal, Quebec, Canada.
- McGill Genome Centre, Montreal, Quebec, Canada.
- Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto University, Kyoto, Japan.
- Department of Human Genetics, McGill University, Montreal, Quebec, Canada.
| | - Luis B Barreiro
- Department of Human Genetics, University of Chicago, Chicago, IL, USA.
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL, USA.
- Committee on Immunology, University of Chicago, Chicago, IL, USA.
|
7
|
Gallego Romero I, Lea AJ. Leveraging massively parallel reporter assays for evolutionary questions. Genome Biol 2023; 24:26. [PMID: 36788564 PMCID: PMC9926830 DOI: 10.1186/s13059-023-02856-6] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Received: 05/08/2022] [Accepted: 01/17/2023] [Indexed: 02/16/2023] [Open access]
Abstract
A long-standing goal of evolutionary biology is to decode how gene regulation contributes to organismal diversity. Doing so is challenging because it is hard to predict function from non-coding sequence and to perform molecular research with non-model taxa. Massively parallel reporter assays (MPRAs) enable the testing of thousands to millions of sequences for regulatory activity simultaneously. Here, we discuss the execution, advantages, and limitations of MPRAs, with a focus on evolutionary questions. We propose solutions for extending MPRAs to rare taxa and those with limited genomic resources, and we underscore MPRA's broad potential for driving genome-scale, functional studies across organisms.
Affiliation(s)
- Irene Gallego Romero
- Melbourne Integrative Genomics, University of Melbourne, Royal Parade, Parkville, Victoria, 3010, Australia
- School of BioSciences, The University of Melbourne, Royal Parade, Parkville, 3010, Australia
- The Centre for Stem Cell Systems, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, 30 Royal Parade, Parkville, Victoria, 3010, Australia
- Center for Genomics, Evolution and Medicine, Institute of Genomics, University of Tartu, Riia 23b, 51010, Tartu, Estonia.
| | - Amanda J. Lea
- Department of Biological Sciences, Vanderbilt University, Nashville, TN 37240, USA
- Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN 37240, USA
- Evolutionary Studies Initiative, Vanderbilt University, Nashville, TN 37240, USA
- Child and Brain Development Program, Canadian Institute for Advanced Study, Toronto, Canada
|
8
|
Bray D, Hook H, Zhao R, Keenan JL, Penvose A, Osayame Y, Mohaghegh N, Chen X, Parameswaran S, Kottyan LC, Weirauch MT, Siggers T. CASCADE: high-throughput characterization of regulatory complex binding altered by non-coding variants. Cell Genomics 2022; 2. [PMID: 35252945 PMCID: PMC8896503 DOI: 10.1016/j.xgen.2022.100098] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 01/23/2023]
Abstract
Non-coding DNA variants (NCVs) impact gene expression by altering binding sites for regulatory complexes. New high-throughput methods are needed to characterize the impact of NCVs on regulatory complexes. We developed CASCADE (Customizable Approach to Survey Complex Assembly at DNA Elements), an array-based high-throughput method to profile cofactor (COF) recruitment. CASCADE identifies DNA-bound transcription factor-cofactor (TF-COF) complexes in nuclear extracts and quantifies the impact of NCVs on their binding. We demonstrate CASCADE sensitivity in characterizing condition-specific recruitment of COFs p300 and RBBP5 (MLL subunit) to the CXCL10 promoter in lipopolysaccharide (LPS)-stimulated human macrophages and quantify the impact of all possible NCVs. To demonstrate applicability to NCV screens, we profile TF-COF binding to ~1,700 single-nucleotide polymorphism quantitative trait loci (SNP-QTLs) in human macrophages and identify perturbed ETS domain-containing complexes. CASCADE will facilitate high-throughput testing of molecular mechanisms of NCVs for diverse biological applications. Bray et al. develop CASCADE, a method to profile transcription factor (TF)-cofactor (COF) complexes binding to DNA. They demonstrate the approach by profiling complex binding across the CXCL10 cytokine promoter and to ~1,700 single-nucleotide polymorphisms (SNPs). They anticipate that CASCADE can be applied to diverse biological systems to examine regulatory complex binding to DNA.
Affiliation(s)
- David Bray
- Department of Biology, Boston University, Boston, MA, USA
- Biological Design Center, Boston University, Boston, MA, USA
- Bioinformatics Program, Boston University, Boston, MA, USA
| | - Heather Hook
- Department of Biology, Boston University, Boston, MA, USA
- Biological Design Center, Boston University, Boston, MA, USA
| | - Rose Zhao
- Department of Biology, Boston University, Boston, MA, USA
- Biological Design Center, Boston University, Boston, MA, USA
| | - Jessica L. Keenan
- Department of Biology, Boston University, Boston, MA, USA
- Biological Design Center, Boston University, Boston, MA, USA
- Bioinformatics Program, Boston University, Boston, MA, USA
| | - Ashley Penvose
- Department of Biology, Boston University, Boston, MA, USA
- Biological Design Center, Boston University, Boston, MA, USA
| | - Yemi Osayame
- Department of Biology, Boston University, Boston, MA, USA
- Biological Design Center, Boston University, Boston, MA, USA
| | - Nima Mohaghegh
- Department of Biology, Boston University, Boston, MA, USA
- Biological Design Center, Boston University, Boston, MA, USA
| | - Xiaoting Chen
- Center for Autoimmune Genomics and Etiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA
| | - Sreeja Parameswaran
- Center for Autoimmune Genomics and Etiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA
| | - Leah C. Kottyan
- Center for Autoimmune Genomics and Etiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, USA
- Division of Allergy and Immunology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio 45229, USA
| | - Matthew T. Weirauch
- Center for Autoimmune Genomics and Etiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA
- Division of Developmental Biology, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA
- Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, USA
| | - Trevor Siggers
- Department of Biology, Boston University, Boston, MA, USA
- Biological Design Center, Boston University, Boston, MA, USA
- Corresponding author
|
9
|
Hass MR, Brissette D, Parameswaran S, Pujato M, Donmez O, Kottyan LC, Weirauch MT, Kopan R. Runx1 shapes the chromatin landscape via a cascade of direct and indirect targets. PLoS Genet 2021; 17:e1009574. [PMID: 34111109 PMCID: PMC8219162 DOI: 10.1371/journal.pgen.1009574] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 10/12/2020] [Revised: 06/22/2021] [Accepted: 05/03/2021] [Indexed: 11/18/2022] Open
Abstract
Runt-related transcription factor 1 (Runx1) can act as both an activator and a repressor. Here we show that CRISPR-mediated deletion of Runx1 in mouse metanephric mesenchyme-derived mK4 cells results in large-scale genome-wide changes to chromatin accessibility and gene expression. Open chromatin regions near down-regulated loci enriched for Runx sites in mK4 cells lose chromatin accessibility in Runx1 knockout cells, despite remaining Runx2-bound. Unexpectedly, regions near upregulated genes are depleted of Runx sites and are instead enriched for Zeb transcription factor binding sites. Re-expressing Zeb2 in Runx1 knockout cells restores suppression, and CRISPR mediated deletion of Zeb1 and Zeb2 phenocopies the gained expression and chromatin accessibility changes seen in Runx1KO due in part to subsequent activation of factors like Grhl2. These data confirm that Runx1 activity is uniquely needed to maintain open chromatin at many loci, and demonstrate that Zeb proteins are required and sufficient to maintain Runx1-dependent genome-scale repression. Runt-related transcription factor (Runx) 1 & 2 impact development and disease by activating or repressing transcription. In this manuscript we used genome editing tools to remove Runx1, and as expected, observed widespread changes in chromatin accessibility. Newly closed areas contained Runx1 binding sites and were enriched near genes whose expression depended on Runx1. Interestingly, this occurred despite continued binding of Runx2 to the same regions of DNA, which suggests that Runx2 is insufficient to maintain open chromatin and expression of Runx1 target genes in this cellular context. By contrast, newly opened chromatin regions, many near genes that were upregulated in Runx1 knockout cells, did not enrich for Runx1 binding sites. Instead, these regions were enriched for sites for the repressor Zeb proteins. 
We found that the loss of Zeb 1 & 2 expression, direct transcriptional targets of Runx1, resulted in the opening of chromatin and upregulation of genes residing near the newly open sites in Runx1 knockout cells. The same sites were also open and nearby genes expressed in edited Zeb1 and Zeb2 knockout cells. Among them were transcription factors, such as the Grhl2 gene, which in turn bind to and upregulate their target genes. Thus, the loss of a single transcription factor initiates a cascade of direct and indirect ramifications with likely negative effects on development and health.
Affiliation(s)
- Matthew R. Hass
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, United States of America
- Division of Developmental Biology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
| | - Daniel Brissette
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, United States of America
- Division of Developmental Biology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
| | - Sreeja Parameswaran
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, United States of America
- Center for Autoimmune Genomics and Etiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
| | - Mario Pujato
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, United States of America
- Center for Autoimmune Genomics and Etiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
| | - Omer Donmez
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, United States of America
- Center for Autoimmune Genomics and Etiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
| | - Leah C. Kottyan
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, United States of America
- Division of Developmental Biology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
- Center for Autoimmune Genomics and Etiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
| | - Matthew T. Weirauch
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, United States of America
- Division of Developmental Biology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
- Center for Autoimmune Genomics and Etiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
- Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
- * E-mail: (MTW); (RK)
| | - Raphael Kopan
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, United States of America
- Division of Developmental Biology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
- * E-mail: (MTW); (RK)
|
10
|
Letiagina AE, Omelina ES, Ivankin AV, Pindyurin AV. MPRAdecoder: Processing of the Raw MPRA Data With a priori Unknown Sequences of the Region of Interest and Associated Barcodes. Front Genet 2021; 12:618189. [PMID: 34046055 PMCID: PMC8148044 DOI: 10.3389/fgene.2021.618189] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2020] [Accepted: 03/25/2021] [Indexed: 11/13/2022] Open
Abstract
Massively parallel reporter assays (MPRAs) enable high-throughput functional evaluation of numerous DNA regulatory elements and/or their mutant variants. The assays are based on the construction of reporter plasmid libraries containing two variable parts, a region of interest (ROI) and a barcode (BC), located outside and within the transcription unit, respectively. Importantly, each plasmid molecule in such a highly diverse library is characterized by a unique BC-ROI association. The reporter constructs are delivered to target cells and expression of BCs at the transcript level is assayed by RT-PCR followed by next-generation sequencing (NGS). The obtained values are normalized to the abundance of BCs in the plasmid DNA sample. Altogether, this allows evaluating the regulatory potential of the associated ROI sequences. However, depending on the MPRA library construction design, the BC and ROI sequences as well as their associations can be a priori unknown. In such a case, the BC and ROI sequences, their possible mutant variants, and unambiguous BC-ROI associations have to be identified, whereas all uncertain cases have to be excluded from the analysis. Besides the preparation of additional "mapping" samples for NGS, this also requires specific bioinformatics tools. Here, we present a pipeline for processing raw MPRA data obtained by NGS for reporter construct libraries with a priori unknown sequences of BCs and ROIs. The pipeline robustly identifies unambiguous (so-called genuine) BCs and ROIs associated with them, calculates the normalized expression level for each BC and the averaged values for each ROI, and provides a graphical visualization of the processed data.
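The normalization described in this abstract (barcode abundance in RNA divided by barcode abundance in plasmid DNA, then averaged per ROI) can be illustrated with a minimal Python sketch. This is not the MPRAdecoder implementation: the function name `normalized_expression`, the dictionary inputs, and the choice to skip barcodes with zero DNA counts are assumptions made only for illustration.

```python
from collections import defaultdict
from statistics import mean


def normalized_expression(rna_counts, dna_counts, bc_to_roi):
    """Return (per-barcode normalized expression, per-ROI mean).

    rna_counts / dna_counts: dict mapping barcode -> read count
    bc_to_roi: dict mapping barcode -> ROI identifier (hypothetical input)
    """
    rna_total = sum(rna_counts.values())
    dna_total = sum(dna_counts.values())

    per_bc = {}
    for bc in bc_to_roi:
        dna = dna_counts.get(bc, 0)
        if dna == 0:
            # Assumption: a barcode absent from the plasmid sample
            # cannot be normalized, so it is excluded.
            continue
        rna_fraction = rna_counts.get(bc, 0) / rna_total
        dna_fraction = dna / dna_total
        per_bc[bc] = rna_fraction / dna_fraction

    # Average the barcode-level values within each ROI.
    by_roi = defaultdict(list)
    for bc, value in per_bc.items():
        by_roi[bc_to_roi[bc]].append(value)
    per_roi = {roi: mean(values) for roi, values in by_roi.items()}
    return per_bc, per_roi
```

Dividing each count by its library total before taking the ratio is one common way to make RNA and DNA samples of different sequencing depth comparable; a real pipeline would likely also handle replicates and low-count filtering.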
Affiliation(s)
- Anna E Letiagina
- Institute of Molecular and Cellular Biology of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia; Faculty of Natural Sciences, Novosibirsk State University, Novosibirsk, Russia
| | - Evgeniya S Omelina
- Institute of Molecular and Cellular Biology of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
| | - Anton V Ivankin
- Institute of Molecular and Cellular Biology of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
| | - Alexey V Pindyurin
- Institute of Molecular and Cellular Biology of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
|
11
|
Integrative analysis of liver-specific non-coding regulatory SNPs associated with the risk of coronary artery disease. Am J Hum Genet 2021; 108:411-430. [PMID: 33626337 DOI: 10.1016/j.ajhg.2021.02.006] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 09/02/2020] [Accepted: 02/04/2021] [Indexed: 02/08/2023] Open
Abstract
Genetic factors underlying coronary artery disease (CAD) have been widely studied using genome-wide association studies (GWASs). However, the functional understanding of the CAD loci has been limited by the fact that a majority of GWAS variants are located within non-coding regions with no functional role. High cholesterol and dysregulation of the liver metabolism such as non-alcoholic fatty liver disease confer an increased risk of CAD. Here, we studied the function of non-coding single-nucleotide polymorphisms in CAD GWAS loci located within liver-specific enhancer elements by identifying their potential target genes using liver cis-eQTL analysis and promoter Capture Hi-C in HepG2 cells. Altogether, 734 target genes were identified of which 121 exhibited correlations to liver-related traits. To identify potentially causal regulatory SNPs, the allele-specific enhancer activity was analyzed by (1) sequence-based computational predictions, (2) quantification of allele-specific transcription factor binding, and (3) STARR-seq massively parallel reporter assay. Altogether, our analysis identified 1,277 unique SNPs that display allele-specific regulatory activity. Among these, susceptibility enhancers near important cholesterol homeostasis genes (APOB, APOC1, APOE, and LIPA) were identified, suggesting that altered gene regulatory activity could represent another way by which genetic variation regulates serum lipoprotein levels. Using CRISPR-based perturbation, we demonstrate how the deletion/activation of a single enhancer leads to changes in the expression of many target genes located in a shared chromatin interaction domain. Our integrative genomics approach represents a comprehensive effort in identifying putative causal regulatory regions and target genes that could predispose to clinical manifestation of CAD by affecting liver function.
|
12
|
Kreimer A, Yosef N. Evaluation of Davis et al.: Exploring Sequence of Determinants of Transcriptional Regulation-The Case of c-AMP Response Element. Cell Syst 2020; 11:2-4. [PMID: 32702318 DOI: 10.1016/j.cels.2020.07.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
One snapshot of the peer review process for "Dissection of c-AMP Response Element Architecture by Using Genomic and Episomal Massively Parallel Reporter Assays" (Davis et al., 2020).
Affiliation(s)
- Anat Kreimer
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA 94158, USA; Institute for Human Genetics, University of California, San Francisco, San Francisco, CA 94158, USA; Department of Electrical Engineering and Computer Sciences and Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA
| | - Nir Yosef
- Department of Electrical Engineering and Computer Sciences and Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA; Chan Zuckerberg Biohub, San Francisco, CA, USA; Ragon Institute of Massachusetts General Hospital, Massachusetts Institute of Technology, and Harvard University, Boston, MA, USA.
|
13
|
Qiao D, Zigler CM, Cho MH, Silverman EK, Zhou X, Castaldi PJ, Laird NH. Statistical considerations for the analysis of massively parallel reporter assays data. Genet Epidemiol 2020; 44:785-794. [PMID: 32681690 DOI: 10.1002/gepi.22337] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 02/06/2020] [Revised: 06/12/2020] [Accepted: 07/03/2020] [Indexed: 01/23/2023]
Abstract
Noncoding DNA contains gene regulatory elements that alter gene expression, and the function of these elements can be modified by genetic variation. Massively parallel reporter assays (MPRA) enable high-throughput identification and characterization of functional genetic variants, but the statistical methods to identify allelic effects in MPRA data have not been fully developed. In this study, we demonstrate how the baseline allelic imbalance in MPRA libraries can produce biased results, and we propose a novel, nonparametric, adaptive testing method that is robust to this bias. We compare the performance of this method with other commonly used methods, and we demonstrate that our novel adaptive method controls Type I error in a wide range of scenarios while maintaining excellent power. We have implemented these tests along with routines for simulating MPRA data in the Analysis Toolset for MPRA (@MPRA), an R package for the design and analyses of MPRA experiments. It is publicly available at http://github.com/redaq/atMPRA.
Affiliation(s)
- Dandi Qiao
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts
| | - Corwin M Zigler
- Department of Statistics and Data Sciences, Department of Women's Health, University of Texas at Austin, Austin, Texas
| | - Michael H Cho
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts; Division of Pulmonary and Critical Care Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts
| | - Edwin K Silverman
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts; Division of Pulmonary and Critical Care Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts
| | - Xiaobo Zhou
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts
| | - Peter J Castaldi
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts; Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts
| | - Nan H Laird
- Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts
|
14
|
Niroula A, Ajore R, Nilsson B. MPRAscore: robust and non-parametric analysis of massively parallel reporter assays. Bioinformatics 2020; 35:5351-5353. [PMID: 31359027 DOI: 10.1093/bioinformatics/btz591] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 03/05/2019] [Revised: 07/17/2019] [Accepted: 07/24/2019] [Indexed: 12/28/2022] Open
Abstract
MOTIVATION Massively parallel reporter assays (MPRA) enable systematic screening of DNA sequence variants for effects on transcriptional activity. However, convenient analysis tools are still needed. RESULTS We introduce MPRAscore, a novel tool to infer allele-specific effects on transcription from MPRA data. MPRAscore uses a weighted, variance-regularized method to calculate variant effect sizes robustly, and a permutation approach to test for significance without assuming normality or independence. AVAILABILITY AND IMPLEMENTATION Source code (C++), precompiled binaries and data used in the paper at https://github.com/abhisheknrl/MPRAscore and https://www.ncbi.nlm.nih.gov/bioproject/PRJNA554195. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
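The idea of testing an allelic effect by permutation, as mentioned in this abstract, can be sketched in Python as follows. This is a generic label-shuffling test on a plain difference of means, not MPRAscore's weighted, variance-regularized statistic; the function name `permutation_pvalue`, its parameters, and the per-barcode activity inputs are hypothetical.

```python
import random
from statistics import mean


def permutation_pvalue(ref_values, alt_values, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for a difference in mean activity.

    ref_values / alt_values: per-barcode activity measurements (for example
    log RNA/DNA ratios) for the reference and alternative allele.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(mean(alt_values) - mean(ref_values))

    pooled = list(ref_values) + list(alt_values)
    n_ref = len(ref_values)

    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign allele labels at random
        stat = abs(mean(pooled[n_ref:]) - mean(pooled[:n_ref]))
        if stat >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1

    # The +1 terms count the observed labelling itself and keep p above 0.
    return (hits + 1) / (n_permutations + 1)
```

Because the null distribution is built from the data, no normal approximation is needed; the simple shuffle used here does assume that barcodes are exchangeable between alleles under the null, which is a stronger assumption than the abstract describes for the published tool.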
Affiliation(s)
- Abhishek Niroula
- Department of Laboratory Medicine, Lund University, 221 84 Lund, Sweden; Broad Institute, Cambridge, MA, USA
| | - Ram Ajore
- Department of Laboratory Medicine, Lund University, 221 84 Lund, Sweden
| | - Björn Nilsson
- Department of Laboratory Medicine, Lund University, 221 84 Lund, Sweden; Broad Institute, Cambridge, MA, USA
|
15
|
Ghazi AR, Kong X, Chen ES, Edelstein LC, Shaw CA. Bayesian modelling of high-throughput sequencing assays with malacoda. PLoS Comput Biol 2020; 16:e1007504. [PMID: 32692749 PMCID: PMC7394446 DOI: 10.1371/journal.pcbi.1007504] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 10/21/2019] [Revised: 07/31/2020] [Accepted: 06/09/2020] [Indexed: 12/13/2022] Open
Abstract
NGS studies have uncovered an ever-growing catalog of human variation while leaving an enormous gap between observed variation and experimental characterization of variant function. High-throughput screens powered by NGS have greatly increased the rate of variant functionalization, but the development of comprehensive statistical methods to analyze screen data has lagged. In the massively parallel reporter assay (MPRA), short barcodes are counted by sequencing DNA libraries transfected into cells and the cell's output RNA in order to simultaneously measure the shifts in transcription induced by thousands of genetic variants. These counts present many statistical challenges, including overdispersion, depth dependence, and uncertain DNA concentrations. So far, the statistical methods used have been rudimentary, employing transformations on count level data and disregarding experimental and technical structure while failing to quantify uncertainty in the statistical model. We have developed an extensive framework for the analysis of NGS functionalization screens available as an R package called malacoda (available from github.com/andrewGhazi/malacoda). Our software implements a probabilistic, fully Bayesian model of screen data. The model uses the negative binomial distribution with gamma priors to model sequencing counts while accounting for effects from input library preparation and sequencing depth. The method leverages the high-throughput nature of the assay to estimate the priors empirically. External annotations such as ENCODE data or DeepSea predictions can also be incorporated to obtain more informative priors-a transformative capability for data integration. The package also includes quality control and utility functions, including automated barcode counting and visualization methods. To validate our method, we analyzed several datasets using malacoda and alternative MPRA analysis methods. 
These data include experiments from the literature, simulated assays, and primary MPRA data. We also used luciferase assays to experimentally validate several hits from our primary data, as well as variants for which the various methods disagree and variants detectable only with the aid of external annotations.
Affiliation(s)
- Andrew R. Ghazi
- Quantitative and Computational Biosciences, Baylor College of Medicine, Houston, Texas, United States of America
| | - Xianguo Kong
- Cardeza Foundation for Hematologic Research, Thomas Jefferson University, Philadelphia, Pennsylvania, United States of America
| | - Ed S. Chen
- Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
| | - Leonard C. Edelstein
- Cardeza Foundation for Hematologic Research, Thomas Jefferson University, Philadelphia, Pennsylvania, United States of America
| | - Chad A. Shaw
- Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
|
16
|
Bourges C, Groff AF, Burren OS, Gerhardinger C, Mattioli K, Hutchinson A, Hu T, Anand T, Epping MW, Wallace C, Smith KG, Rinn JL, Lee JC. Resolving mechanisms of immune-mediated disease in primary CD4 T cells. EMBO Mol Med 2020; 12:e12112. [PMID: 32239644 PMCID: PMC7207160 DOI: 10.15252/emmm.202012112] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 02/01/2020] [Revised: 03/04/2020] [Accepted: 03/09/2020] [Indexed: 12/11/2022] Open
Abstract
Deriving mechanisms of immune-mediated disease from GWAS data remains a formidable challenge, with attempts to identify causal variants being frequently hampered by strong linkage disequilibrium. To determine whether causal variants could be identified from their functional effects, we adapted a massively parallel reporter assay for use in primary CD4 T cells, the cell type whose regulatory DNA is most enriched for immune-mediated disease SNPs. This enabled the effects of candidate SNPs to be examined in a relevant cellular context and generated testable hypotheses into disease mechanisms. To illustrate the power of this approach, we investigated a locus that has been linked to six immune-mediated diseases but cannot be fine-mapped. By studying the lead expression-modulating SNP, we uncovered an NF-κB-driven regulatory circuit which constrains T-cell activation through the dynamic formation of a super-enhancer that upregulates TNFAIP3 (A20), a key NF-κB inhibitor. In activated T cells, this feedback circuit is disrupted (and super-enhancer formation prevented) by the risk variant at the lead SNP, leading to unrestrained T-cell activation via a molecular mechanism that appears to broadly predispose to human autoimmunity.
Affiliation(s)
- Christophe Bourges
- Cambridge Institute of Therapeutic Immunology and Infectious Disease, Jeffrey Cheah Biomedical Centre, Cambridge Biomedical Campus, University of Cambridge, Cambridge, UK
- Department of Medicine, University of Cambridge School of Clinical Medicine, Addenbrooke's Hospital, Cambridge, UK
| | - Abigail F Groff
- Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA
| | - Oliver S Burren
- Cambridge Institute of Therapeutic Immunology and Infectious Disease, Jeffrey Cheah Biomedical Centre, Cambridge Biomedical Campus, University of Cambridge, Cambridge, UK
- Department of Medicine, University of Cambridge School of Clinical Medicine, Addenbrooke's Hospital, Cambridge, UK
| | - Chiara Gerhardinger
- Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA
| | - Kaia Mattioli
- Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA
| | - Anna Hutchinson
- MRC Biostatistics Unit, Cambridge Institute of Public Health, Cambridge, UK
| | - Theodore Hu
- Cambridge Institute of Therapeutic Immunology and Infectious Disease, Jeffrey Cheah Biomedical Centre, Cambridge Biomedical Campus, University of Cambridge, Cambridge, UK
- Department of Medicine, University of Cambridge School of Clinical Medicine, Addenbrooke's Hospital, Cambridge, UK
| | - Tanmay Anand
- Cambridge Institute of Therapeutic Immunology and Infectious Disease, Jeffrey Cheah Biomedical Centre, Cambridge Biomedical Campus, University of Cambridge, Cambridge, UK
- Department of Medicine, University of Cambridge School of Clinical Medicine, Addenbrooke's Hospital, Cambridge, UK
| | - Madeline W Epping
- Cambridge Institute of Therapeutic Immunology and Infectious Disease, Jeffrey Cheah Biomedical Centre, Cambridge Biomedical Campus, University of Cambridge, Cambridge, UK
- Department of Medicine, University of Cambridge School of Clinical Medicine, Addenbrooke's Hospital, Cambridge, UK
| | - Chris Wallace
- Cambridge Institute of Therapeutic Immunology and Infectious Disease, Jeffrey Cheah Biomedical Centre, Cambridge Biomedical Campus, University of Cambridge, Cambridge, UK
- Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA
| | - Kenneth Gc Smith
- Cambridge Institute of Therapeutic Immunology and Infectious Disease, Jeffrey Cheah Biomedical Centre, Cambridge Biomedical Campus, University of Cambridge, Cambridge, UK
- Department of Medicine, University of Cambridge School of Clinical Medicine, Addenbrooke's Hospital, Cambridge, UK
| | - John L Rinn
- Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA
- Department of Biochemistry, BioFrontiers Institute, University of Colorado, Boulder, CO, USA
| | - James C Lee
- Cambridge Institute of Therapeutic Immunology and Infectious Disease, Jeffrey Cheah Biomedical Centre, Cambridge Biomedical Campus, University of Cambridge, Cambridge, UK
- Department of Medicine, University of Cambridge School of Clinical Medicine, Addenbrooke's Hospital, Cambridge, UK
- Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA
|
17
|
Ashuach T, Fischer DS, Kreimer A, Ahituv N, Theis FJ, Yosef N. MPRAnalyze: statistical framework for massively parallel reporter assays. Genome Biol 2019; 20:183. [PMID: 31477158 PMCID: PMC6717970 DOI: 10.1186/s13059-019-1787-z] [Citation(s) in RCA: 52] [Impact Index Per Article: 8.7] [Received: 01/27/2019] [Accepted: 08/09/2019] [Indexed: 11/10/2022] Open
Abstract
Massively parallel reporter assays (MPRAs) can measure the regulatory function of thousands of DNA sequences in a single experiment. Despite growing popularity, MPRA studies are limited by a lack of a unified framework for analyzing the resulting data. Here we present MPRAnalyze: a statistical framework for analyzing MPRA count data. Our model leverages the unique structure of MPRA data to quantify the function of regulatory sequences, compare sequences' activity across different conditions, and provide necessary flexibility in an evolving field. We demonstrate the accuracy and applicability of MPRAnalyze on simulated and published data and compare it with existing methods.
Affiliation(s)
- Tal Ashuach
- Department of Electrical Engineering and Computer Sciences, University of California Berkeley, Berkeley, California USA
- Center for Computational Biology, University of California Berkeley, Berkeley, California USA
| | - David S. Fischer
- Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany
- TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
| | - Anat Kreimer
- Department of Electrical Engineering and Computer Sciences, University of California Berkeley, Berkeley, California USA
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California USA
- Institute for Human Genetics, University of California San Francisco, San Francisco, California USA
| | - Nadav Ahituv
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California USA
- Institute for Human Genetics, University of California San Francisco, San Francisco, California USA
| | - Fabian J. Theis
- Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany
| | - Nir Yosef
- Department of Electrical Engineering and Computer Sciences, University of California Berkeley, Berkeley, California USA
- Center for Computational Biology, University of California Berkeley, Berkeley, California USA
- Ragon Institute of MGH, MIT, and Harvard, Cambridge, MA USA
- Chan Zuckerberg BioHub, San Francisco, California USA
|
18
|
Majoros WH, Kim YS, Barrera A, Li F, Wang X, Cunningham SJ, Johnson GD, Guo C, Lowe WL, Scholtens DM, Hayes MG, Reddy TE, Allen AS. Bayesian estimation of genetic regulatory effects in high-throughput reporter assays. Bioinformatics 2019; 36:331-338. [PMID: 31368479 PMCID: PMC7999138 DOI: 10.1093/bioinformatics/btz545] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2018] [Revised: 06/12/2019] [Accepted: 07/24/2019] [Indexed: 01/31/2023] Open
Abstract
MOTIVATION High-throughput reporter assays dramatically improve our ability to assign function to noncoding genetic variants, by measuring allelic effects on gene expression in the controlled setting of a reporter gene. Unlike genetic association tests, such assays are not confounded by linkage disequilibrium when loci are independently assayed. These methods can thus improve the identification of causal disease mutations. While work continues on improving experimental aspects of these assays, less effort has gone into developing methods for assessing the statistical significance of assay results, particularly in the case of rare variants captured from patient DNA. RESULTS We describe a Bayesian hierarchical model, called Bayesian Inference of Regulatory Differences, which integrates prior information and explicitly accounts for variability between experimental replicates. The model produces substantially more accurate predictions than existing methods when allele frequencies are low, which is of clear advantage in the search for disease-causing variants in DNA captured from patient cohorts. Using the model, we demonstrate a clear tradeoff between variant sequencing coverage and numbers of biological replicates, and we show that the use of additional biological replicates decreases variance in estimates of effect size, due to the properties of the Poisson-binomial distribution. We also provide a power and sample size calculator, which facilitates decision making in experimental design parameters. AVAILABILITY AND IMPLEMENTATION The software is freely available from www.geneprediction.org/bird. The experimental design web tool can be accessed at http://67.159.92.22:8080. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- William H Majoros
- Duke Center for Statistical Genetics and Genomics, Duke University; Division of Integrative Genomics, Department of Biostatistics and Bioinformatics, Duke University Medical School; Center for Genomic and Computational Biology, Duke University Medical School
| | - Young-Sook Kim
- Center for Genomic and Computational Biology, Duke University Medical School; Program in Computational Biology & Bioinformatics, Duke University, Durham, NC 27710
| | - Alejandro Barrera
- Center for Genomic and Computational Biology, Duke University Medical School
| | - Fan Li
- Department of Biostatistics, Yale University, New Haven, CT 06520
| | - Xingyan Wang
- Present address: PhD Program in Biostatistics, Department of Public Health Sciences, Penn State College of Medicine, Hershey, PA 17033, USA
| | | | - Graham D Johnson
- Center for Genomic and Computational Biology, Duke University Medical School
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC 27710
| | - Cong Guo
- Present address: Human Genetics, GlaxoSmithKline, Collegeville, PA 19426, USA
| | - William L Lowe
- Division of Endocrinology Metabolism and Molecular Medicine, Northwestern University Feinberg School of Medicine, Chicago
| | - Denise M Scholtens
- Division of Biostatistics, Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
| | - M Geoffrey Hayes
- Division of Endocrinology Metabolism and Molecular Medicine, Northwestern University Feinberg School of Medicine, Chicago
| | | | | |
|
19
|
Movva R, Greenside P, Marinov GK, Nair S, Shrikumar A, Kundaje A. Deciphering regulatory DNA sequences and noncoding genetic variants using neural network models of massively parallel reporter assays. PLoS One 2019; 14:e0218073. [PMID: 31206543 PMCID: PMC6576758 DOI: 10.1371/journal.pone.0218073] [Citation(s) in RCA: 44] [Impact Index Per Article: 7.3] [Received: 08/21/2018] [Accepted: 05/24/2019] [Indexed: 11/19/2022] Open
Abstract
The relationship between noncoding DNA sequence and gene expression is not well-understood. Massively parallel reporter assays (MPRAs), which quantify the regulatory activity of large libraries of DNA sequences in parallel, are a powerful approach to characterize this relationship. We present MPRA-DragoNN, a convolutional neural network (CNN)-based framework to predict and interpret the regulatory activity of DNA sequences as measured by MPRAs. While our method is generally applicable to a variety of MPRA designs, here we trained our model on the Sharpr-MPRA dataset that measures the activity of ∼500,000 constructs tiling 15,720 regulatory regions in human K562 and HepG2 cell lines. MPRA-DragoNN predictions were moderately correlated (Spearman ρ = 0.28) with measured activity and were within range of replicate concordance of the assay. State-of-the-art model interpretation methods revealed high-resolution predictive regulatory sequence features that overlapped transcription factor (TF) binding motifs. We used the model to investigate the cell type and chromatin state preferences of predictive TF motifs. We explored the ability of our model to predict the allelic effects of regulatory variants in an independent MPRA experiment and fine map putative functional SNPs in loci associated with lipid traits. Our results suggest that interpretable deep learning models trained on MPRA data have the potential to reveal meaningful patterns in regulatory DNA sequences and prioritize regulatory genetic variants, especially as larger, higher-quality datasets are produced.
Affiliation(s)
- Rajiv Movva
- The Harker School, San Jose, CA, United States of America
- Department of Genetics, Stanford University, Stanford, CA, United States of America
| | - Peyton Greenside
- Biomedical Informatics Training Program, Stanford University, Stanford, CA, United States of America
| | - Georgi K. Marinov
- Department of Genetics, Stanford University, Stanford, CA, United States of America
| | - Surag Nair
- Department of Computer Science, Stanford University, Stanford, CA, United States of America
| | - Avanti Shrikumar
- Department of Computer Science, Stanford University, Stanford, CA, United States of America
| | - Anshul Kundaje
- Department of Genetics, Stanford University, Stanford, CA, United States of America
- Department of Computer Science, Stanford University, Stanford, CA, United States of America
| |
|
20
|
Kyono Y, Kitzman JO, Parker SCJ. Genomic annotation of disease-associated variants reveals shared functional contexts. Diabetologia 2019; 62:735-743. [PMID: 30756131 PMCID: PMC6451673 DOI: 10.1007/s00125-019-4823-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 08/07/2018] [Accepted: 11/27/2018] [Indexed: 01/22/2023]
Abstract
Variation in non-coding DNA, encompassing gene regulatory regions such as enhancers and promoters, contributes to risk for complex disorders, including type 2 diabetes. While genome-wide association studies have successfully identified hundreds of type 2 diabetes loci throughout the genome, the vast majority of these reside in non-coding DNA, which complicates the process of determining their functional significance and level of priority for further study. Here we review the methods used to experimentally annotate these non-coding variants, to nominate causal variants and to link them to diabetes pathophysiology. In recent years, chromatin profiling, massively parallel sequencing, high-throughput reporter assays and CRISPR gene editing technologies have rapidly become indispensable tools. Rather than treating individual variants in isolation, we discuss the importance of accounting for context, both genetic (such as flanking DNA sequence) and environmental (such as cellular state or environmental exposure). Incorporating these features shows promise in terms of revealing biologically convergent molecular signatures across distant and seemingly unrelated loci. Studying regulatory elements in the proper context will be crucial for interpreting the functional significance of disease-associated variants and applying the resulting knowledge to improve patient care.
Affiliation(s)
- Yasuhiro Kyono
- Department of Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw Avenue, 2049 Palmer Commons Building, Ann Arbor, MI, 48109, USA
- Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
- Institute for Genomics and Systems Biology, University of Chicago, Chicago, IL, USA
| | - Jacob O Kitzman
- Department of Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw Avenue, 2049 Palmer Commons Building, Ann Arbor, MI, 48109, USA
- Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
| | - Stephen C J Parker
- Department of Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw Avenue, 2049 Palmer Commons Building, Ann Arbor, MI, 48109, USA.
- Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA.
| |
|
21
|
Myint L, Avramopoulos DG, Goff LA, Hansen KD. Linear models enable powerful differential activity analysis in massively parallel reporter assays. BMC Genomics 2019; 20:209. [PMID: 30866806 PMCID: PMC6417258 DOI: 10.1186/s12864-019-5556-x] [Citation(s) in RCA: 46] [Impact Index Per Article: 7.7] [Received: 11/30/2018] [Accepted: 02/22/2019] [Indexed: 12/15/2022] Open
Abstract
BACKGROUND Massively parallel reporter assays (MPRAs) have emerged as a popular means for understanding noncoding variation in a variety of conditions. While a large number of experiments have been described in the literature, analysis typically uses ad-hoc methods. There has been little attention to comparing performance of methods across datasets. RESULTS We present the mpralm method which we show is calibrated and powerful, by analyzing its performance on multiple MPRA datasets. We show that it outperforms existing statistical methods for analysis of this data type, in the first comprehensive evaluation of statistical methods on several datasets. We investigate theoretical and real-data properties of barcode summarization methods and show an unappreciated impact of summarization method for some datasets. Finally, we use our model to conduct a power analysis for this assay and show substantial improvements in power by performing up to 6 replicates per condition, whereas sequencing depth has smaller impact; we recommend always using at least 4 replicates. An R package is available from the Bioconductor project. CONCLUSIONS Together, these results inform recommendations for differential analysis, general group comparisons, and power analysis and will help improve design and analysis of MPRA experiments.
Affiliation(s)
- Leslie Myint
- Department of Mathematics, Statistics, and Computer Science, Macalester College, 1600 Grand Ave, Saint Paul, MN 55105 USA
| | | | - Loyal A. Goff
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, USA
- Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, USA
| | - Kasper D. Hansen
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, 615 N. Wolfe St, E3527, Baltimore, MD 21212 USA
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, USA
| |
|
22
|
Wang X, He L, Goggin SM, Saadat A, Wang L, Sinnott-Armstrong N, Claussnitzer M, Kellis M. High-resolution genome-wide functional dissection of transcriptional regulatory regions and nucleotides in human. Nat Commun 2018; 9:5380. [PMID: 30568279 PMCID: PMC6300699 DOI: 10.1038/s41467-018-07746-1] [Citation(s) in RCA: 85] [Impact Index Per Article: 12.1] [Received: 01/12/2018] [Accepted: 11/09/2018] [Indexed: 12/19/2022] Open
Abstract
Genome-wide epigenomic maps have revealed millions of putative enhancers and promoters, but experimental validation of their function and high-resolution dissection of their driver nucleotides remain limited. Here, we present HiDRA (High-resolution Dissection of Regulatory Activity), a combined experimental and computational method for high-resolution genome-wide testing and dissection of putative regulatory regions. We test ~7 million accessible DNA fragments in a single experiment, by coupling accessible chromatin extraction with self-transcribing episomal reporters (ATAC-STARR-seq). By design, fragments are highly overlapping in densely-sampled accessible regions, enabling us to pinpoint driver regulatory nucleotides by exploiting differences in activity between partially-overlapping fragments using a machine learning model (SHARPR-RE). In GM12878 lymphoblastoid cells, we find ~65,000 regions showing enhancer function, and pinpoint ~13,000 high-resolution driver elements. These are enriched for regulatory motifs, evolutionarily-conserved nucleotides, and disease-associated genetic variants from genome-wide association studies. Overall, HiDRA provides a high-throughput, high-resolution approach for dissecting regulatory regions and driver nucleotides.
Affiliation(s)
- Xinchen Wang
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Institute for Genomic Medicine, Columbia University, New York, NY, 10024, USA
| | - Liang He
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
| | - Sarah M Goggin
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
| | - Alham Saadat
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
| | - Li Wang
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
| | | | - Melina Claussnitzer
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA.
- Division of Gerontology, Department of Medicine, Beth Israel Deaconess Medical Center, Boston, MA, 02215, USA.
- Institute of Nutritional Science, University of Hohenheim, Garbenstrasse 30, 70599, Stuttgart, Germany.
- Harvard Medical School, Harvard University, Boston, MA, 02215, USA.
| | - Manolis Kellis
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA.
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA.
| |
|
23
|
Kalita CA, Brown CD, Freiman A, Isherwood J, Wen X, Pique-Regi R, Luca F. High-throughput characterization of genetic effects on DNA-protein binding and gene transcription. Genome Res 2018; 28:1701-1708. [PMID: 30254052 PMCID: PMC6211638 DOI: 10.1101/gr.237354.118] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 03/26/2018] [Accepted: 09/20/2018] [Indexed: 12/29/2022]
Abstract
Many variants associated with complex traits are in noncoding regions and contribute to phenotypes by disrupting regulatory sequences. To characterize these variants, we developed a streamlined protocol for a high-throughput reporter assay, Biallelic Targeted STARR-seq (BiT-STARR-seq), that identifies allele-specific expression (ASE) while accounting for PCR duplicates through unique molecular identifiers. We tested 75,501 oligos (43,500 SNPs) and identified 2720 SNPs with significant ASE (FDR < 10%). To validate disruption of binding as one of the mechanisms underlying ASE, we developed a new high-throughput allele-specific binding assay for NFKB1. We identified 2684 SNPs with allele-specific binding (ASB) (FDR < 10%); 256 of these SNPs also had ASE (OR = 1.97, P-value = 0.0006). Of variants associated with complex traits, 1531 resulted in ASE, and 1662 showed ASB. For example, we characterized that the Crohn's disease risk variant for rs3810936 increases NFKB1 binding and results in altered gene expression.
Affiliation(s)
- Cynthia A Kalita
- Center for Molecular Medicine and Genetics, Wayne State University, Detroit, Michigan 48202, USA
| | - Christopher D Brown
- Department of Genetics, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
| | - Andrew Freiman
- Center for Molecular Medicine and Genetics, Wayne State University, Detroit, Michigan 48202, USA
| | - Jenna Isherwood
- Center for Molecular Medicine and Genetics, Wayne State University, Detroit, Michigan 48202, USA
| | - Xiaoquan Wen
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan 48109, USA
| | - Roger Pique-Regi
- Center for Molecular Medicine and Genetics, Wayne State University, Detroit, Michigan 48202, USA
- Department of Obstetrics and Gynecology, Wayne State University, Detroit, Michigan 48202, USA
| | - Francesca Luca
- Center for Molecular Medicine and Genetics, Wayne State University, Detroit, Michigan 48202, USA
- Department of Obstetrics and Gynecology, Wayne State University, Detroit, Michigan 48202, USA
| |
|