1. Keukeleire P, Rosen JD, Göbel-Knapp A, Salomon K, Schubach M, Kircher M. Using individual barcodes to increase quantification power of massively parallel reporter assays. BMC Bioinformatics 2025; 26:52. [PMID: 39948460 PMCID: PMC11827149 DOI: 10.1186/s12859-025-06065-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2024] [Accepted: 01/28/2025] [Indexed: 02/16/2025]
Abstract
BACKGROUND: Massively parallel reporter assays (MPRAs) are an experimental technology for measuring the activity of thousands of candidate regulatory sequences or their variants in parallel, where the activity of individual sequences is measured from pools of sequence-tagged reporter genes. Activity is derived from the ratio of transcribed RNA to input DNA counts of associated tag sequences in each reporter construct, so-called barcodes. Recently, tools specifically designed to analyze MPRA data were developed that attempt to model the count data, accounting for its inherent variation. Of these tools, MPRAnalyze and mpralm are most widely used. MPRAnalyze models barcode counts to estimate the transcription rate of each sequence. While it has increased statistical power and robustness against outliers compared to mpralm, it is slow and has a high false discovery rate. Mpralm, a tool built on the R package Limma, estimates log fold-changes between different sequences. As opposed to MPRAnalyze, it is fast and has a low false discovery rate but is susceptible to outliers and has less statistical power.
RESULTS: We propose BCalm, an MPRA analysis framework aimed at addressing the limitations of the existing tools. BCalm is an adaptation of mpralm, but models individual barcode counts instead of aggregating counts per sequence. Leaving out the aggregation step increases statistical power and improves robustness to outliers, while being fast and precise. We show the improved performance over existing methods on both simulated MPRA data and a lentiviral MPRA library of 166,508 target sequences, including 82,258 allelic variants. Further, BCalm adds functionality beyond the existing mpralm package, such as preparing count input files from MPRAsnakeflow, as well as an option to test for sequences with enhancing or repressing activity. Its built-in plotting functionalities allow for easy interpretation of the results.
CONCLUSIONS: With BCalm, we provide a new tool for analyzing MPRA data which is robust and accurate on real MPRA datasets. The package is available at https://github.com/kircherlab/BCalm.
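The contrast the abstract draws between aggregating counts per sequence and keeping individual barcode counts can be illustrated with a minimal Python sketch. This is not the BCalm or mpralm implementation (both are R packages that fit linear models); the function names, the pseudocount, and the toy counts are assumptions made purely for illustration.

```python
import math
from statistics import median

PSEUDOCOUNT = 1.0  # assumption: avoids log(0) for barcodes with zero counts


def aggregated_activity(dna, rna):
    """Sum counts over all barcodes of one sequence, then take a single log2 ratio."""
    return math.log2((sum(rna) + PSEUDOCOUNT) / (sum(dna) + PSEUDOCOUNT))


def per_barcode_activities(dna, rna):
    """Keep one log2(RNA/DNA) value per barcode instead of aggregating."""
    return [
        math.log2((r + PSEUDOCOUNT) / (d + PSEUDOCOUNT))
        for d, r in zip(dna, rna)
    ]


# Hypothetical counts for one sequence tagged by five barcodes.
# The last barcode is an outlier with an inflated RNA count.
dna_counts = [100, 120, 90, 110, 100]
rna_counts = [200, 240, 180, 220, 4000]

agg = aggregated_activity(dna_counts, rna_counts)
per_bc = per_barcode_activities(dna_counts, rna_counts)

print(f"aggregated log2 ratio: {agg:.2f}")
print(f"median per-barcode log2 ratio: {median(per_bc):.2f}")
```

In this toy example the single outlier barcode dominates the aggregated ratio, whereas in the per-barcode view it remains just one of five observations, so a robust summary such as the median stays close to the value of the other four barcodes.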
Affiliation(s)
- Pia Keukeleire
  - Institute of Human Genetics, University Hospital Schleswig-Holstein, University of Lübeck, Lübeck, Germany
- Jonathan D Rosen
  - Department of Genetics & Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Angelina Göbel-Knapp
  - Institute of Human Genetics, University Hospital Schleswig-Holstein, University of Lübeck, Lübeck, Germany
- Kilian Salomon
  - Exploratory Diagnostic Sciences, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Max Schubach
  - Exploratory Diagnostic Sciences, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Martin Kircher
  - Institute of Human Genetics, University Hospital Schleswig-Holstein, University of Lübeck, Lübeck, Germany
  - Exploratory Diagnostic Sciences, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
2. Went M, Duran-Lozano L, Halldorsson GH, Gunnell A, Ugidos-Damboriena N, Law P, Ekdahl L, Sud A, Thorleifsson G, Thodberg M, Olafsdottir T, Lamarca-Arrizabalaga A, Cafaro C, Niroula A, Ajore R, Lopez de Lapuente Portilla A, Ali Z, Pertesi M, Goldschmidt H, Stefansdottir L, Kristinsson SY, Stacey SN, Love TJ, Rognvaldsson S, Hajek R, Vodicka P, Pettersson-Kymmer U, Späth F, Schinke C, Van Rhee F, Sulem P, Ferkingstad E, Hjorleifsson Eldjarn G, Mellqvist UH, Jonsdottir I, Morgan G, Sonneveld P, Waage A, Weinhold N, Thomsen H, Försti A, Hansson M, Juul-Vangsted A, Thorsteinsdottir U, Hemminki K, Kaiser M, Rafnar T, Stefansson K, Houlston R, Nilsson B. Deciphering the genetics and mechanisms of predisposition to multiple myeloma. Nat Commun 2024; 15:6644. [PMID: 39103364 PMCID: PMC11300596 DOI: 10.1038/s41467-024-50932-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2024] [Accepted: 07/24/2024] [Indexed: 08/07/2024]
Abstract
Multiple myeloma (MM) is an incurable malignancy of plasma cells. Epidemiological studies indicate a substantial heritable component, but the underlying mechanisms remain unclear. Here, in a genome-wide association study totaling 10,906 cases and 366,221 controls, we identify 35 MM risk loci, 12 of which are novel. Through functional fine-mapping and Mendelian randomization, we uncover two causal mechanisms for inherited MM risk: longer telomeres; and elevated levels of B-cell maturation antigen (BCMA) and interleukin-5 receptor alpha (IL5RA) in plasma. The largest increase in BCMA and IL5RA levels is mediated by the risk variant rs34562254-A at TNFRSF13B. While individuals with loss-of-function variants in TNFRSF13B develop B-cell immunodeficiency, rs34562254-A exerts a gain-of-function effect, increasing MM risk through amplified B-cell responses. Our results represent an analysis of genetic MM predisposition, highlighting causal mechanisms contributing to MM development.
Affiliation(s)
- Molly Went
  - Division of Genetics and Epidemiology, The Institute of Cancer Research, London, SW7 3RP, UK
- Laura Duran-Lozano
  - Department of Laboratory Medicine, Lund University, SE-221 84, Lund, Sweden
  - Lund Stem Cell Center, Lund University, SE-221 84, Lund, Sweden
- Andrea Gunnell
  - Division of Genetics and Epidemiology, The Institute of Cancer Research, London, SW7 3RP, UK
- Nerea Ugidos-Damboriena
  - Department of Laboratory Medicine, Lund University, SE-221 84, Lund, Sweden
  - Lund Stem Cell Center, Lund University, SE-221 84, Lund, Sweden
- Philip Law
  - Division of Genetics and Epidemiology, The Institute of Cancer Research, London, SW7 3RP, UK
- Ludvig Ekdahl
  - Department of Laboratory Medicine, Lund University, SE-221 84, Lund, Sweden
  - Lund Stem Cell Center, Lund University, SE-221 84, Lund, Sweden
- Amit Sud
  - Division of Genetics and Epidemiology, The Institute of Cancer Research, London, SW7 3RP, UK
- Malte Thodberg
  - Department of Laboratory Medicine, Lund University, SE-221 84, Lund, Sweden
  - Lund Stem Cell Center, Lund University, SE-221 84, Lund, Sweden
- Antton Lamarca-Arrizabalaga
  - Department of Laboratory Medicine, Lund University, SE-221 84, Lund, Sweden
  - Lund Stem Cell Center, Lund University, SE-221 84, Lund, Sweden
- Caterina Cafaro
  - Department of Laboratory Medicine, Lund University, SE-221 84, Lund, Sweden
  - Lund Stem Cell Center, Lund University, SE-221 84, Lund, Sweden
- Abhishek Niroula
  - Department of Laboratory Medicine, Lund University, SE-221 84, Lund, Sweden
  - Lund Stem Cell Center, Lund University, SE-221 84, Lund, Sweden
- Ram Ajore
  - Department of Laboratory Medicine, Lund University, SE-221 84, Lund, Sweden
  - Lund Stem Cell Center, Lund University, SE-221 84, Lund, Sweden
- Aitzkoa Lopez de Lapuente Portilla
  - Department of Laboratory Medicine, Lund University, SE-221 84, Lund, Sweden
  - Lund Stem Cell Center, Lund University, SE-221 84, Lund, Sweden
- Zain Ali
  - Department of Laboratory Medicine, Lund University, SE-221 84, Lund, Sweden
  - Lund Stem Cell Center, Lund University, SE-221 84, Lund, Sweden
- Maroulio Pertesi
  - Department of Laboratory Medicine, Lund University, SE-221 84, Lund, Sweden
  - Lund Stem Cell Center, Lund University, SE-221 84, Lund, Sweden
- Hartmut Goldschmidt
  - Department of Internal Medicine V, University of Heidelberg, 69120, Heidelberg, Germany
- Sigurdur Y Kristinsson
  - Landspitali, National University Hospital of Iceland, IS-101, Reykjavik, Iceland
  - Faculty of Medicine, University of Iceland, IS-101, Reykjavik, Iceland
- Simon N Stacey
  - deCODE Genetics/Amgen, Sturlugata 8, IS-101, Reykjavik, Iceland
- Thorvardur J Love
  - Landspitali, National University Hospital of Iceland, IS-101, Reykjavik, Iceland
  - Faculty of Medicine, University of Iceland, IS-101, Reykjavik, Iceland
- Saemundur Rognvaldsson
  - Landspitali, National University Hospital of Iceland, IS-101, Reykjavik, Iceland
  - Faculty of Medicine, University of Iceland, IS-101, Reykjavik, Iceland
- Roman Hajek
  - University Hospital Ostrava and University of Ostrava, Ostrava, Czech Republic
- Pavel Vodicka
  - Institute of Experimental Medicine, Academy of Sciences of the Czech Republic, Prague, Czech Republic
- Florentin Späth
  - Department of Radiation Sciences, Umeå University, SE-901 87, Umeå, Sweden
- Carolina Schinke
  - Myeloma Center, University of Arkansas for Medical Sciences, Little Rock, AR, USA
- Frits Van Rhee
  - Myeloma Center, University of Arkansas for Medical Sciences, Little Rock, AR, USA
- Patrick Sulem
  - deCODE Genetics/Amgen, Sturlugata 8, IS-101, Reykjavik, Iceland
- Gareth Morgan
  - Perlmutter Cancer Center, Langone Health, New York University, New York, NY, USA
- Pieter Sonneveld
  - Department of Hematology, Erasmus MC Cancer Institute, 3075 EA, Rotterdam, The Netherlands
- Anders Waage
  - Department of Cancer Research and Molecular Medicine, Norwegian University of Science and Technology, Box 8905, N-7491, Trondheim, Norway
- Niels Weinhold
  - Department of Internal Medicine V, University of Heidelberg, 69120, Heidelberg, Germany
  - German Cancer Research Center (DKFZ), D-69120, Heidelberg, Germany
- Asta Försti
  - German Cancer Research Center (DKFZ), D-69120, Heidelberg, Germany
  - Hopp Children's Cancer Center, Heidelberg, Germany
- Markus Hansson
  - Department of Laboratory Medicine, Lund University, SE-221 84, Lund, Sweden
  - Section of Hematology, Sahlgrenska University Hospital, Gothenburg, SE-413 45, Sweden
  - Skåne University Hospital, SE-221 85, Lund, Sweden
- Annette Juul-Vangsted
  - Department of Haematology, University Hospital of Copenhagen at Rigshospitalet, Blegdamsvej 9, DK-2100, Copenhagen, Denmark
- Unnur Thorsteinsdottir
  - deCODE Genetics/Amgen, Sturlugata 8, IS-101, Reykjavik, Iceland
  - Faculty of Medicine, University of Iceland, IS-101, Reykjavik, Iceland
- Kari Hemminki
  - German Cancer Research Center (DKFZ), D-69120, Heidelberg, Germany
  - Faculty of Medicine in Pilsen, Charles University, 30605, Pilsen, Czech Republic
- Martin Kaiser
  - Division of Genetics and Epidemiology, The Institute of Cancer Research, London, SW7 3RP, UK
- Thorunn Rafnar
  - deCODE Genetics/Amgen, Sturlugata 8, IS-101, Reykjavik, Iceland
- Kari Stefansson
  - deCODE Genetics/Amgen, Sturlugata 8, IS-101, Reykjavik, Iceland
  - Faculty of Medicine, University of Iceland, IS-101, Reykjavik, Iceland
- Richard Houlston
  - Division of Genetics and Epidemiology, The Institute of Cancer Research, London, SW7 3RP, UK
- Björn Nilsson
  - Department of Laboratory Medicine, Lund University, SE-221 84, Lund, Sweden
  - Lund Stem Cell Center, Lund University, SE-221 84, Lund, Sweden
  - Broad Institute, 415 Main Street, Cambridge, MA, 02142, USA
3. Zheng Y, VanDusen NJ. Massively Parallel Reporter Assays for High-Throughput In Vivo Analysis of Cis-Regulatory Elements. J Cardiovasc Dev Dis 2023; 10:jcdd10040144. [PMID: 37103023 PMCID: PMC10146671 DOI: 10.3390/jcdd10040144] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 02/27/2023] [Revised: 03/24/2023] [Accepted: 03/27/2023] [Indexed: 03/31/2023]
Abstract
The rapid improvement of descriptive genomic technologies has fueled a dramatic increase in hypothesized connections between cardiovascular gene expression and phenotypes. However, in vivo testing of these hypotheses has predominantly been relegated to slow, expensive, and linear generation of genetically modified mice. In the study of genomic cis-regulatory elements, generation of mice featuring transgenic reporters or cis-regulatory element knockout remains the standard approach. While the data obtained is of high quality, the approach is insufficient to keep pace with candidate identification and therefore results in biases introduced during the selection of candidates for validation. However, recent advances across a range of disciplines are converging to enable functional genomic assays that can be conducted in a high-throughput manner. Here, we review one such method, massively parallel reporter assays (MPRAs), in which the activities of thousands of candidate genomic regulatory elements are simultaneously assessed via the next-generation sequencing of a barcoded reporter transcript. We discuss best practices for MPRA design and use, with a focus on practical considerations, and review how this emerging technology has been successfully deployed in vivo. Finally, we discuss how MPRAs are likely to evolve and be used in future cardiovascular research.
4. Tareen A, Kooshkbaghi M, Posfai A, Ireland WT, McCandlish DM, Kinney JB. MAVE-NN: learning genotype-phenotype maps from multiplex assays of variant effect. Genome Biol 2022; 23:98. [PMID: 35428271 PMCID: PMC9011994 DOI: 10.1186/s13059-022-02661-7] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 07/16/2021] [Revised: 03/21/2022] [Accepted: 03/24/2022] [Indexed: 12/17/2022]
Abstract
Multiplex assays of variant effect (MAVEs) are a family of methods that includes deep mutational scanning experiments on proteins and massively parallel reporter assays on gene regulatory sequences. Despite their increasing popularity, a general strategy for inferring quantitative models of genotype-phenotype maps from MAVE data is lacking. Here we introduce MAVE-NN, a neural-network-based Python package that implements a broadly applicable information-theoretic framework for learning genotype-phenotype maps (including biophysically interpretable models) from MAVE datasets. We demonstrate MAVE-NN in multiple biological contexts, and highlight the ability of our approach to deconvolve mutational effects from otherwise confounding experimental nonlinearities and noise.
Affiliation(s)
- Ammar Tareen
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, 11724, NY, USA
  - Present Address: Regeneron Pharmaceuticals, Inc., Tarrytown, 10591, NY, USA
- Mahdi Kooshkbaghi
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, 11724, NY, USA
- Anna Posfai
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, 11724, NY, USA
- William T Ireland
  - Department of Physics, California Institute of Technology, Pasadena, 91125, CA, USA
  - Present Address: Department of Applied Physics, Harvard University, Cambridge, 02134, MA, USA
- David M McCandlish
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, 11724, NY, USA
- Justin B Kinney
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, 11724, NY, USA
5. Ajore R, Niroula A, Pertesi M, Cafaro C, Thodberg M, Went M, Bao EL, Duran-Lozano L, Lopez de Lapuente Portilla A, Olafsdottir T, Ugidos-Damboriena N, Magnusson O, Samur M, Lareau CA, Halldorsson GH, Thorleifsson G, Norddahl GL, Gunnarsdottir K, Försti A, Goldschmidt H, Hemminki K, van Rhee F, Kimber S, Sperling AS, Kaiser M, Anderson K, Jonsdottir I, Munshi N, Rafnar T, Waage A, Weinhold N, Thorsteinsdottir U, Sankaran VG, Stefansson K, Houlston R, Nilsson B. Functional dissection of inherited non-coding variation influencing multiple myeloma risk. Nat Commun 2022; 13:151. [PMID: 35013207 PMCID: PMC8748989 DOI: 10.1038/s41467-021-27666-x] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 11/05/2020] [Accepted: 12/02/2021] [Indexed: 12/16/2022]
Abstract
Thousands of non-coding variants have been associated with increased risk of human diseases, yet the causal variants and their mechanisms-of-action remain obscure. In an integrative study combining massively parallel reporter assays (MPRA), expression analyses (eQTL, meQTL, PCHiC) and chromatin accessibility analyses in primary cells (caQTL), we investigate 1,039 variants associated with multiple myeloma (MM). We demonstrate that MM susceptibility is mediated by gene-regulatory changes in plasma cells and B-cells, and identify putative causal variants at six risk loci (SMARCD3, WAC, ELL2, CDCA7L, CEP120, and PREX1). Notably, three of these variants co-localize with significant plasma cell caQTLs, signaling the presence of causal activity at these precise genomic positions in an endogenous chromosomal context in vivo. Our results provide a systematic functional dissection of risk loci for a hematologic malignancy.
Affiliation(s)
- Ram Ajore
  - Hematology and Transfusion Medicine, Department of Laboratory Medicine, BMC B13, 221 84, Lund, Sweden
- Abhishek Niroula
  - Hematology and Transfusion Medicine, Department of Laboratory Medicine, BMC B13, 221 84, Lund, Sweden
  - Broad Institute of Massachusetts Institute of Technology and Harvard University, 415 Main Street, Boston, MA, 02142, USA
- Maroulio Pertesi
  - Hematology and Transfusion Medicine, Department of Laboratory Medicine, BMC B13, 221 84, Lund, Sweden
- Caterina Cafaro
  - Hematology and Transfusion Medicine, Department of Laboratory Medicine, BMC B13, 221 84, Lund, Sweden
- Malte Thodberg
  - Hematology and Transfusion Medicine, Department of Laboratory Medicine, BMC B13, 221 84, Lund, Sweden
- Molly Went
  - Division of Genetics and Epidemiology, The Institute of Cancer Research, 123 Old Brompton Road, London, SW7 3RP, United Kingdom
- Erik L Bao
  - Broad Institute of Massachusetts Institute of Technology and Harvard University, 415 Main Street, Boston, MA, 02142, USA
  - Division of Hematology/Oncology, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
  - Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
- Laura Duran-Lozano
  - Hematology and Transfusion Medicine, Department of Laboratory Medicine, BMC B13, 221 84, Lund, Sweden
- Nerea Ugidos-Damboriena
  - Hematology and Transfusion Medicine, Department of Laboratory Medicine, BMC B13, 221 84, Lund, Sweden
- Olafur Magnusson
  - deCODE Genetics/Amgen Inc., Sturlugata 8, 101, Reykjavik, Iceland
- Mehmet Samur
  - Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
- Caleb A Lareau
  - Broad Institute of Massachusetts Institute of Technology and Harvard University, 415 Main Street, Boston, MA, 02142, USA
  - Division of Hematology/Oncology, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
  - Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
- Asta Försti
  - German Cancer Research Center (DKFZ), Im Neuenheimer Feld 580, D-69120, Heidelberg, Germany
  - Hopp Children's Cancer Center, Heidelberg, Germany
- Hartmut Goldschmidt
  - Department of Internal Medicine V, University Hospital of Heidelberg, 69120, Heidelberg, Germany
- Kari Hemminki
  - German Cancer Research Center (DKFZ), Im Neuenheimer Feld 580, D-69120, Heidelberg, Germany
  - Faculty of Medicine and Biomedical Center in Pilsen, Charles University in Prague, Prague, 30605, Czech Republic
- Scott Kimber
  - Division of Genetics and Epidemiology, The Institute of Cancer Research, 123 Old Brompton Road, London, SW7 3RP, United Kingdom
- Adam S Sperling
  - Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
- Martin Kaiser
  - Division of Genetics and Epidemiology, The Institute of Cancer Research, 123 Old Brompton Road, London, SW7 3RP, United Kingdom
- Kenneth Anderson
  - Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
- Nikhil Munshi
  - Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
- Thorunn Rafnar
  - deCODE Genetics/Amgen Inc., Sturlugata 8, 101, Reykjavik, Iceland
- Anders Waage
  - Department of Cancer Research and Molecular Medicine, Norwegian University of Science and Technology, Box 8905, N-7491, Trondheim, Norway
- Niels Weinhold
  - German Cancer Research Center (DKFZ), Im Neuenheimer Feld 580, D-69120, Heidelberg, Germany
  - Department of Internal Medicine V, University Hospital of Heidelberg, 69120, Heidelberg, Germany
- Vijay G Sankaran
  - Broad Institute of Massachusetts Institute of Technology and Harvard University, 415 Main Street, Boston, MA, 02142, USA
  - Division of Hematology/Oncology, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
  - Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
  - Harvard Stem Cell Institute, Cambridge, MA, USA
- Kari Stefansson
  - deCODE Genetics/Amgen Inc., Sturlugata 8, 101, Reykjavik, Iceland
- Richard Houlston
  - Division of Genetics and Epidemiology, The Institute of Cancer Research, 123 Old Brompton Road, London, SW7 3RP, United Kingdom
- Björn Nilsson
  - Hematology and Transfusion Medicine, Department of Laboratory Medicine, BMC B13, 221 84, Lund, Sweden
  - Broad Institute of Massachusetts Institute of Technology and Harvard University, 415 Main Street, Boston, MA, 02142, USA
6. Letiagina AE, Omelina ES, Ivankin AV, Pindyurin AV. MPRAdecoder: Processing of the Raw MPRA Data With a priori Unknown Sequences of the Region of Interest and Associated Barcodes. Front Genet 2021; 12:618189. [PMID: 34046055 PMCID: PMC8148044 DOI: 10.3389/fgene.2021.618189] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2020] [Accepted: 03/25/2021] [Indexed: 11/13/2022]
Abstract
Massively parallel reporter assays (MPRAs) enable high-throughput functional evaluation of numerous DNA regulatory elements and/or their mutant variants. The assays are based on the construction of reporter plasmid libraries containing two variable parts, a region of interest (ROI) and a barcode (BC), located outside and within the transcription unit, respectively. Importantly, each plasmid molecule in such a highly diverse library is characterized by a unique BC-ROI association. The reporter constructs are delivered to target cells and expression of BCs at the transcript level is assayed by RT-PCR followed by next-generation sequencing (NGS). The obtained values are normalized to the abundance of BCs in the plasmid DNA sample. Altogether, this allows evaluating the regulatory potential of the associated ROI sequences. However, depending on the MPRA library construction design, the BC and ROI sequences as well as their associations can be a priori unknown. In such a case, the BC and ROI sequences, their possible mutant variants, and unambiguous BC-ROI associations have to be identified, whereas all uncertain cases have to be excluded from the analysis. Besides the preparation of additional "mapping" samples for NGS, this also requires specific bioinformatics tools. Here, we present a pipeline for processing raw MPRA data obtained by NGS for reporter construct libraries with a priori unknown sequences of BCs and ROIs. The pipeline robustly identifies unambiguous (so-called genuine) BCs and ROIs associated with them, calculates the normalized expression level for each BC and the averaged values for each ROI, and provides a graphical visualization of the processed data.
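The processing steps named in this abstract (keep only unambiguous BC-ROI associations, normalize each barcode's expression to its abundance in the plasmid sample, then average per ROI) can be sketched in Python as follows. This is an illustrative sketch and not MPRAdecoder's code; the data layout, the function names, and the simplified rule for discarding ambiguous barcodes are all assumptions.

```python
from collections import defaultdict


def genuine_associations(mapping_reads):
    """Return {barcode: roi} for barcodes observed with exactly one ROI.

    mapping_reads is an iterable of (barcode, roi) pairs from a "mapping" sample.
    Barcodes seen with more than one ROI are treated as ambiguous and dropped
    (an assumed, simplified rule).
    """
    seen = defaultdict(set)
    for barcode, roi in mapping_reads:
        seen[barcode].add(roi)
    return {bc: next(iter(rois)) for bc, rois in seen.items() if len(rois) == 1}


def roi_expression(associations, cdna_counts, plasmid_counts):
    """Normalize each barcode's cDNA count to its plasmid count, then average per ROI."""
    per_roi = defaultdict(list)
    for bc, roi in associations.items():
        plasmid = plasmid_counts.get(bc, 0)
        if plasmid == 0:
            continue  # a barcode absent from the plasmid sample cannot be normalized
        per_roi[roi].append(cdna_counts.get(bc, 0) / plasmid)
    return {roi: sum(vals) / len(vals) for roi, vals in per_roi.items()}


# Hypothetical toy data: barcode GGGA maps to two ROIs and is therefore excluded.
mapping = [("AAAC", "roi1"), ("AAAG", "roi1"), ("CCCT", "roi2"),
           ("GGGA", "roi1"), ("GGGA", "roi2")]
cdna = {"AAAC": 30, "AAAG": 10, "CCCT": 50, "GGGA": 99}
plasmid = {"AAAC": 10, "AAAG": 10, "CCCT": 100, "GGGA": 10}

assoc = genuine_associations(mapping)
expression = roi_expression(assoc, cdna, plasmid)
print(expression)  # {'roi1': 2.0, 'roi2': 0.5}
```

A real pipeline would additionally handle sequencing errors in barcodes, read-count thresholds, and replicate samples, none of which are modelled here.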
Affiliation(s)
- Anna E Letiagina
  - Institute of Molecular and Cellular Biology of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
  - Faculty of Natural Sciences, Novosibirsk State University, Novosibirsk, Russia
- Evgeniya S Omelina
  - Institute of Molecular and Cellular Biology of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- Anton V Ivankin
  - Institute of Molecular and Cellular Biology of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- Alexey V Pindyurin
  - Institute of Molecular and Cellular Biology of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
7. Ghazi AR, Kong X, Chen ES, Edelstein LC, Shaw CA. Bayesian modelling of high-throughput sequencing assays with malacoda. PLoS Comput Biol 2020; 16:e1007504. [PMID: 32692749 PMCID: PMC7394446 DOI: 10.1371/journal.pcbi.1007504] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 10/21/2019] [Revised: 07/31/2020] [Accepted: 06/09/2020] [Indexed: 12/13/2022]
Abstract
NGS studies have uncovered an ever-growing catalog of human variation while leaving an enormous gap between observed variation and experimental characterization of variant function. High-throughput screens powered by NGS have greatly increased the rate of variant functionalization, but the development of comprehensive statistical methods to analyze screen data has lagged. In the massively parallel reporter assay (MPRA), short barcodes are counted by sequencing DNA libraries transfected into cells and the cell's output RNA in order to simultaneously measure the shifts in transcription induced by thousands of genetic variants. These counts present many statistical challenges, including overdispersion, depth dependence, and uncertain DNA concentrations. So far, the statistical methods used have been rudimentary, employing transformations on count level data and disregarding experimental and technical structure while failing to quantify uncertainty in the statistical model. We have developed an extensive framework for the analysis of NGS functionalization screens available as an R package called malacoda (available from github.com/andrewGhazi/malacoda). Our software implements a probabilistic, fully Bayesian model of screen data. The model uses the negative binomial distribution with gamma priors to model sequencing counts while accounting for effects from input library preparation and sequencing depth. The method leverages the high-throughput nature of the assay to estimate the priors empirically. External annotations such as ENCODE data or DeepSea predictions can also be incorporated to obtain more informative priors, a transformative capability for data integration. The package also includes quality control and utility functions, including automated barcode counting and visualization methods. To validate our method, we analyzed several datasets using malacoda and alternative MPRA analysis methods. These data include experiments from the literature, simulated assays, and primary MPRA data. We also used luciferase assays to experimentally validate several hits from our primary data, as well as variants for which the various methods disagree and variants detectable only with the aid of external annotations.
Affiliation(s)
- Andrew R. Ghazi
  - Quantitative and Computational Biosciences, Baylor College of Medicine, Houston, Texas, United States of America
- Xianguo Kong
  - Cardeza Foundation for Hematologic Research, Thomas Jefferson University, Philadelphia, Pennsylvania, United States of America
- Ed S. Chen
  - Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Leonard C. Edelstein
  - Cardeza Foundation for Hematologic Research, Thomas Jefferson University, Philadelphia, Pennsylvania, United States of America
- Chad A. Shaw
  - Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America