1. Park D, Cenik C. Long-read RNA sequencing reveals allele-specific N6-methyladenosine modifications. Genome Res 2025; 35:999-1011. [PMID: 39472020 PMCID: PMC12047277 DOI: 10.1101/gr.279270.124] [Received: 03/14/2024] [Accepted: 10/23/2024] [Indexed: 11/06/2024]
Abstract
Long-read sequencing technology enables highly accurate detection of allele-specific RNA expression, providing insights into the effects of genetic variation on splicing and RNA abundance. Furthermore, the ability to directly sequence RNA enables the detection of RNA modifications in tandem with ascertaining the allelic origin of each molecule. Here, we leverage these advantages to determine allele-biased patterns of N6-methyladenosine (m6A) modifications in native mRNA. We used human and mouse cells with known genetic variants to assign the allelic origin of each mRNA molecule, and combined this with a supervised machine learning model to detect read-level m6A modification ratios. Our analyses reveal the importance of sequences adjacent to the DRACH motif in determining m6A deposition, in addition to allelic differences that directly alter the motif. Moreover, we discover allele-specific m6A modification events with no genetic variants in close proximity to the differentially modified nucleotide, demonstrating the unique advantage of long reads, which surpass the capabilities of antibody-based short-read approaches. This technological advance will further our understanding of the role of genetics in determining mRNA modifications.
Affiliation(s)
- Dayea Park
- Department of Molecular Biosciences, University of Texas at Austin, Austin, Texas 78712, USA
- Can Cenik
- Department of Molecular Biosciences, University of Texas at Austin, Austin, Texas 78712, USA
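Park and Cenik compare read-level m6A modification ratios between the two alleles of a transcript. As a generic illustration of that kind of comparison (not the authors' statistical procedure), the sketch below assumes each long read has already been assigned to an allele and given a binary modified/unmodified call at one DRACH site, and tests the resulting 2×2 table with a two-sided Fisher's exact test. The function names and read counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are alleles (ref, alt); columns are reads called modified / unmodified.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def pmf(x: int) -> float:
        # Hypergeometric probability of x modified reads on the ref allele,
        # given fixed row and column totals
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = pmf(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    # (small relative tolerance so exactly tied tables are included)
    total = sum(pmf(x) for x in range(lo, hi + 1) if pmf(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def modification_ratio(modified: int, unmodified: int) -> float:
    """Fraction of reads called modified at the site."""
    return modified / (modified + unmodified)


# Hypothetical counts at one site: ref allele 40/50 reads modified, alt allele 12/50
print(modification_ratio(40, 10), modification_ratio(12, 38))
print(fisher_exact_two_sided(40, 10, 12, 38))
```

With many candidate sites per transcriptome, a multiple-testing correction would also be needed before calling any site allele-specific.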
2. Seoighe C, Connaire S, Chopra M. Probing the limits of cis-acting gene regulation using a model of allelic imbalance quantitative trait loci. PLoS Genet 2025; 21:e1011446. [PMID: 40305626 PMCID: PMC12068699 DOI: 10.1371/journal.pgen.1011446] [Received: 09/27/2024] [Revised: 05/12/2025] [Accepted: 03/27/2025] [Indexed: 05/02/2025]
Abstract
Imbalance in gene expression between alleles is a hallmark of cis-acting expression quantitative trait loci (eQTLs), and several methods have been developed to exploit allelic imbalance to support the identification of eQTLs. Allelic imbalance is also of scientific and, potentially, clinical interest because it can erode the degree to which the effects of deleterious variants are buffered in a diploid organism, and it has been reported to be associated with the penetrance of pathological genomic variants. Here, we develop and apply a statistical model that is designed to evaluate whether the genotype of a locus is associated with the degree of allelic imbalance of a gene, and we refer to such loci as allelic imbalance quantitative trait loci (aiQTLs). An advantage of our approach is that it does not depend on linkage disequilibrium between the aiQTL and the associated gene and is, therefore, suited to the identification of eQTLs that act in cis over very large distances. We applied our model to data from the GTEx consortium and examined the relationship between the distance of an eQTL from the transcription start site (TSS) of the associated gene and the evidence that the eQTL acts in cis. Previous studies have used a distance of up to 1 Mb from the target gene as an indication that an eQTL acts in cis; however, our results suggest that the majority of eQTLs more than 500 kb from the TSS of the target gene are likely to act in trans (and thus to affect both gene copies). The model used here is also well suited to comparing the overall extent of allelic imbalance between samples. We show that in some tissues allelic imbalance is correlated with age; however, this correlation may be due to changes in the abundance of immune cell populations with age, as we found strong correlations between sample-level allelic imbalance and the inferred abundance of multiple immune cell types across whole blood samples.
Affiliation(s)
- Cathal Seoighe
- School of Mathematical and Statistical Sciences, University of Galway, Galway, Ireland
- Research Ireland Centre for Research Training in Genomics Data Science, University of Galway, Galway, Ireland
- Seán Connaire
- School of Mathematical and Statistical Sciences, University of Galway, Galway, Ireland
- Mehak Chopra
- School of Mathematical and Statistical Sciences, University of Galway, Galway, Ireland
- Research Ireland Centre for Research Training in Genomics Data Science, University of Galway, Galway, Ireland
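The reasoning that links allelic imbalance to cis action can be sketched as follows: if a variant acts in cis, samples heterozygous at that variant should show more allelic imbalance in the target gene than samples homozygous at it, whereas a trans-acting variant affects both gene copies equally. The sketch below is not the statistical model of Seoighe et al.; it measures imbalance as the absolute deviation of the alt-allele fraction from 0.5 (so no phasing is needed) and compares the two genotype groups with an exact permutation test. All names and counts are hypothetical.

```python
from itertools import combinations
from statistics import mean


def imbalance(ref: int, alt: int) -> float:
    """Absolute deviation of the alt-allele fraction from 0.5 (0 = balanced, 0.5 = monoallelic)."""
    return abs(alt / (ref + alt) - 0.5)


def permutation_pvalue(het: list[float], hom: list[float]) -> float:
    """Exact one-sided permutation p-value for mean(het) - mean(hom).

    Enumerates every relabelling of the pooled samples into groups of the
    original sizes and counts relabellings at least as extreme as observed.
    """
    pooled = het + hom
    k = len(het)
    observed = mean(het) - mean(hom)
    hits = total = 0
    for idx in combinations(range(len(pooled)), k):
        chosen = set(idx)
        g1 = [pooled[i] for i in idx]
        g2 = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if mean(g1) - mean(g2) >= observed - 1e-12:
            hits += 1
    return hits / total


# Hypothetical (ref, alt) read counts in the target gene, split by genotype at a candidate locus
het_samples = [(30, 70), (75, 25), (20, 80)]
hom_samples = [(48, 52), (55, 45), (51, 49)]
het_imb = [imbalance(r, a) for r, a in het_samples]
hom_imb = [imbalance(r, a) for r, a in hom_samples]
p = permutation_pvalue(het_imb, hom_imb)
print(p)
```

With three samples per group the smallest attainable p-value is 1/20, which this toy example reaches; a real analysis needs many more samples and a model of read-count noise.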
3. Neylan CJ, Levin MG, Hartmann K, Beigel K, Khodursky S, DePaolo JS, Abramowitz S, Furth EE, Heuckeroth RO, Damrauer SM, Maguire LH. Genome-wide association meta-analysis identifies 126 novel loci for diverticular disease and implicates connective tissue and colonic motility. medRxiv 2025:2025.03.27.25324777. [PMID: 40196262 PMCID: PMC11974943 DOI: 10.1101/2025.03.27.25324777] [Indexed: 04/09/2025]
Abstract
Diverticular disease is a common and morbid complex phenotype influenced by both innate and environmental risk factors. We performed the largest genome-wide association study meta-analysis for diverticular disease, identifying 126 novel loci. Employing multiple downstream analytic strategies, including tissue and pathway enrichment, statistical fine-mapping, allele-specific expression, protein quantitative trait loci and drug-target investigations, and linkage disequilibrium score regression, we prioritized causal genes and produced several lines of evidence linking diverticular disease to connective tissue biology and colonic motility. We substantiated these findings by integrating single-cell RNA sequencing data, showing that prioritized diverticular disease-associated genes are enriched for expression in colonic smooth muscle, fibroblasts, and interstitial cells of Cajal. In quantitative analysis of surgical specimens, we found a substantial reduction in the density of elastin present in the sigmoid colon in severe diverticulitis.
Affiliation(s)
- Christopher J. Neylan
- Department of Surgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104
- Michael G. Levin
- Cardiovascular Institute, Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104
- Corporal Michael Crescenz VA Medical Center, Philadelphia, PA 19104
- Katherine Hartmann
- Department of Radiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104
- Katherine Beigel
- Department of Biomedical and Health Informatics (DBHi), Children’s Hospital of Philadelphia, Philadelphia, PA 19104
- Sam Khodursky
- Department of Surgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104
- John S. DePaolo
- Department of Surgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104
- Sarah Abramowitz
- Department of Surgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104
- The Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Hempstead, NY 11549
- Emma E. Furth
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104
- Robert O. Heuckeroth
- Division of Gastroenterology, Hepatology and Nutrition, Children’s Hospital of Philadelphia, 3401 Civic Center Blvd, Philadelphia, PA 19104
- The Children’s Hospital of Philadelphia Research Institute and Abramson Research Center, 3615 Civic Center Blvd, Philadelphia, PA 19104, USA
- Perelman School of Medicine at the University of Pennsylvania, 3400 Civic Center Boulevard, Philadelphia, PA 19104
- Scott M. Damrauer
- Department of Surgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104
- Cardiovascular Institute, Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104
- Corporal Michael Crescenz VA Medical Center, Philadelphia, PA 19104
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104
- Lillias H. Maguire
- Corporal Michael Crescenz VA Medical Center, Philadelphia, PA 19104
- Division of Colon and Rectal Surgery, Department of Surgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104
4. Buyan A, Meshcheryakov G, Safronov V, Abramov S, Boytsov A, Nozdrin V, Baulin EF, Kolmykov S, Vierstra J, Kolpakov F, Makeev VJ, Kulakovskiy IV. Statistical framework for calling allelic imbalance in high-throughput sequencing data. Nat Commun 2025; 16:1739. [PMID: 39966391 PMCID: PMC11836314 DOI: 10.1038/s41467-024-55513-2] [Received: 11/14/2023] [Accepted: 12/16/2024] [Indexed: 02/20/2025]
Abstract
High-throughput sequencing facilitates large-scale studies of gene regulation and allows tracing the associations of individual genomic variants with changes in gene regulation and expression. Compared to classic association studies, assessing allelic imbalance at heterozygous variants captures functional variant effects with smaller sample sizes, higher sensitivity, and better resolution. Yet, identification of allele-specific variants from allelic read counts remains challenging due to data-dependent biases and overdispersion arising from technical and biological variability. We present MIXALIME, a novel computational framework for calling allele-specific variants in diverse omics data with a repertoire of statistical models accounting for read mapping bias and copy number variation. We benchmark MIXALIME with DNase-Seq, ATAC-Seq, and CAGE-Seq data, and we demonstrate that allelic imbalance highlights causal variants in GWAS results. Finally, as a showcase of the large-scale practical application of MIXALIME, we present an atlas of variants exhibiting allele-specific chromatin accessibility, built from thousands of available datasets obtained from diverse cell types.
Affiliation(s)
- Andrey Buyan
- Institute of Protein Research, Russian Academy of Sciences, Pushchino, Russia
- Life Improvement by Future Technologies (LIFT) Center, Moscow, Russia
- Viacheslav Safronov
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow, Russia
- Sergey Abramov
- Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
- Altius Institute for Biomedical Sciences, Seattle, WA, USA
- Moscow Center for Advanced Studies, Moscow, Russia
- Alexandr Boytsov
- Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
- Altius Institute for Biomedical Sciences, Seattle, WA, USA
- Moscow Center for Advanced Studies, Moscow, Russia
- Vladimir Nozdrin
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow, Russia
- Eugene F Baulin
- Moscow Center for Advanced Studies, Moscow, Russia
- International Institute of Molecular and Cell Biology in Warsaw, Warsaw, Poland
- Semyon Kolmykov
- Department of Computational Biology, Sirius University of Science and Technology, Sirius, Krasnodar region, Russia
- Jeff Vierstra
- Altius Institute for Biomedical Sciences, Seattle, WA, USA
- Fedor Kolpakov
- Department of Computational Biology, Sirius University of Science and Technology, Sirius, Krasnodar region, Russia
- Bioinformatics Laboratory, Federal Research Center for Information and Computational Technologies, Novosibirsk, Russia
- Vsevolod J Makeev
- Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
- Moscow Center for Advanced Studies, Moscow, Russia
- Institute of Biochemistry and Genetics, Ufa Federal Research Centre of the Russian Academy of Sciences, Ufa, Russia
- Cancer Research UK National Biomarker Centre, University of Manchester, Manchester, UK
- Ivan V Kulakovskiy
- Institute of Protein Research, Russian Academy of Sciences, Pushchino, Russia
- Life Improvement by Future Technologies (LIFT) Center, Moscow, Russia
- Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
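To make the roles of overdispersion and mapping bias described in the MIXALIME abstract concrete, here is a minimal sketch of a beta-binomial test at a single heterozygous site. It assumes that reference mapping bias is absorbed by shifting the expected reference fraction `p0` away from 0.5, and that a concentration parameter `kappa` controls overdispersion (smaller `kappa` means more dispersion). This is a generic formulation for illustration only: MIXALIME's own models, including how it estimates bias and dispersion and handles copy number, are not reproduced here, and all names are hypothetical.

```python
from math import exp, lgamma


def log_beta(a: float, b: float) -> float:
    """Natural log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def betabinom_pmf(k: int, n: int, alpha: float, beta: float) -> float:
    """P(K = k) for K ~ BetaBinomial(n, alpha, beta)."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta))


def allelic_imbalance_pvalue(ref: int, alt: int, p0: float = 0.5, kappa: float = 50.0) -> float:
    """Two-sided beta-binomial test of the ref-allele count against expected fraction p0."""
    n = ref + alt
    # Beta prior with mean p0 and concentration kappa (hypothetical parameterisation)
    alpha, beta = p0 * kappa, (1 - p0) * kappa
    probs = [betabinom_pmf(k, n, alpha, beta) for k in range(n + 1)]
    p_obs = probs[ref]
    # Sum probabilities of all counts no more likely than the observed one
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-9)))


# Hypothetical site with 70 reference and 30 alternative reads
print(allelic_imbalance_pvalue(70, 30, p0=0.5, kappa=1e6))   # close to a plain binomial test
print(allelic_imbalance_pvalue(70, 30, p0=0.5, kappa=50.0))  # overdispersed null
print(allelic_imbalance_pvalue(70, 30, p0=0.55, kappa=50.0))  # null shifted for reference bias
```

Both adjustments widen or shift the null distribution, so the same read counts yield a larger p-value than a naive binomial test, which is the intended protection against false allele-specific calls.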
5. Niharika, Asthana S, Narayan Yadav H, Sharma N, Kumar Singh V. A compendium of methods: Searching allele specific expression via RNA sequencing. Gene 2025; 936:149102. [PMID: 39561903 DOI: 10.1016/j.gene.2024.149102] [Received: 07/29/2024] [Revised: 11/04/2024] [Accepted: 11/14/2024] [Indexed: 11/21/2024]
Abstract
The diploid mammalian genome carries two alleles of each gene, which are typically expressed equally within a cell or tissue. However, genetic regulatory elements and epigenetic modifications can disrupt this balance, leading to preferential expression of one allele. Examining high-confidence allele-specific expression (ASE) is vital for understanding genetic variation and its impact on major diseases such as cancer and diabetes. ASE analysis not only aids in disease prognosis and diagnosis but also helps to identify regulatory mechanisms operating within the genome. While advances in sequencing technologies have greatly improved our understanding of ASE, challenges remain in estimating it accurately. In this article, we review methods for detecting ASE using both bulk RNA-seq and single-cell RNA-seq data to provide deeper insights beyond the mere prediction of ASE genes. Fundamentally, ASE detection methods are data-driven and can be classified according to the type of data used: some methods use both DNA genotyping information and RNA-seq data, while others rely solely on RNA-seq data. This article offers a comparative analysis of these methods and a compilation of repositories that provide valuable insights.
Affiliation(s)
- Niharika
- Department of Bioinformatics, Central University of South Bihar, Gaya, Bihar 824236, India
- Shailendra Asthana
- Computational and Mathematical Biology Centre, Translational Health Science and Technology Institute, NCR Biotech Science Cluster, 3rd Milestone, Faridabad-Gurugram Expressway, PO Box #4, Faridabad, Haryana 121001, India
- Harlokesh Narayan Yadav
- Department of Pharmacology, All India Institute of Medical Sciences, Ansari Nagar, New Delhi 110029, India
- Nanaocha Sharma
- Institute of Bioresources and Sustainable Development, Takyelpat, Imphal, Manipur 795001, India
- Vijay Kumar Singh
- Department of Bioinformatics, Central University of South Bihar, Gaya, Bihar 824236, India
6. Heath HD, Peng S, Szmatola T, Ryan S, Bellone RR, Kalbfleisch T, Petersen JL, Finno CJ. A comprehensive allele specific expression resource for the equine transcriptome. BMC Genomics 2025; 26:88. [PMID: 39885415 PMCID: PMC11780778 DOI: 10.1186/s12864-025-11240-6] [Received: 03/28/2024] [Accepted: 01/13/2025] [Indexed: 02/01/2025]
Abstract
BACKGROUND: Allele-specific expression (ASE) analysis provides a nuanced view of cis-regulatory mechanisms affecting gene expression.
RESULTS: An equine ASE analysis was performed using integrated Iso-seq and short-read RNA sequencing data from four healthy Thoroughbreds (2 mares and 2 stallions) across 9 tissues from the Functional Annotation of Animal Genomes (FAANG) project. Allele expression was quantified by haplotypes from long-read data, and 42,900 allele expression events were compared. Of these events, 635 (1.48%) demonstrated ASE, with liver tissue containing the highest proportion. Genetic variants within ASE events were located in histone-modified regions 64.2% of the time. Validation of allele-specific variants using a set of 66 equine liver samples from multiple breeds confirmed that 97% of variants demonstrated ASE.
CONCLUSIONS: This valuable, publicly accessible resource is poised to facilitate investigations into regulatory variation in equine tissues. Our results highlight the tissue-specific nature of allelic imbalance in the equine genome.
Affiliation(s)
- Harrison D Heath
- Department of Population Health and Reproduction, Davis School of Veterinary Medicine, University of California, Room 4206 Vet Med 3A, One Shields Ave, Davis, CA, 95616, USA
- Sichong Peng
- Department of Population Health and Reproduction, Davis School of Veterinary Medicine, University of California, Room 4206 Vet Med 3A, One Shields Ave, Davis, CA, 95616, USA
- Present address: Eclipsebio, San Diego, CA, 92121, USA
- Tomasz Szmatola
- Department of Population Health and Reproduction, Davis School of Veterinary Medicine, University of California, Room 4206 Vet Med 3A, One Shields Ave, Davis, CA, 95616, USA
- Centre of Experimental and Innovative Medicine, University of Agriculture in Kraków, Al. Mickiewicza 24/28, 30-059, Kraków, Poland
- Stephanie Ryan
- Department of Population Health and Reproduction, Davis School of Veterinary Medicine, University of California, Room 4206 Vet Med 3A, One Shields Ave, Davis, CA, 95616, USA
- Rebecca R Bellone
- Department of Population Health and Reproduction, Davis School of Veterinary Medicine, University of California, Room 4206 Vet Med 3A, One Shields Ave, Davis, CA, 95616, USA
- Veterinary Genetics Laboratory, University of California, Davis School of Veterinary Medicine, Davis, CA, 95616, USA
- Theodore Kalbfleisch
- Maxwell H. Gluck Equine Research Center, University of Kentucky, Lexington, KY, 40546, USA
- Jessica L Petersen
- Department of Animal Science, University of Nebraska-Lincoln, Lincoln, NE, 68583, USA
- Carrie J Finno
- Department of Population Health and Reproduction, Davis School of Veterinary Medicine, University of California, Room 4206 Vet Med 3A, One Shields Ave, Davis, CA, 95616, USA
7. Richard D, Muthuirulan P, Young M, Yengo L, Vedantam S, Marouli E, Bartell E, Hirschhorn J, Capellini TD. Functional genomics of human skeletal development and the patterning of height heritability. Cell 2025; 188:15-32.e24. [PMID: 39549696 PMCID: PMC11724752 DOI: 10.1016/j.cell.2024.10.040] [Received: 12/31/2023] [Revised: 08/01/2024] [Accepted: 10/21/2024] [Indexed: 11/18/2024]
Abstract
Underlying variation in height are regulatory changes to chondrocytes, the cartilage cells that make up long-bone growth plates. Currently, we lack knowledge of the epigenetic regulation and gene expression of chondrocytes sampled across the human skeleton, and therefore we cannot understand basic regulatory mechanisms controlling height biology. We first rectify this issue by generating extensive epigenetic and transcriptomic maps from chondrocytes sampled from different growth plates across developing human skeletons, discovering novel regulatory networks shaping human bone/joint development. Next, using these maps in tandem with height genome-wide association study (GWAS) signals, we disentangle the regulatory impacts that skeletal element-specific versus global-acting variants have on skeletal growth, revealing the prime importance of regulatory pleiotropy in controlling height variation. Finally, as height is highly heritable, and thus often the test case for complex-trait genetics, we leverage these datasets within a testable omnigenic model framework to discover novel chondrocyte developmental modules and peripheral-acting factors shaping height biology and skeletal growth.
Affiliation(s)
- Daniel Richard
- Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Mariel Young
- Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Loic Yengo
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia
- Sailaja Vedantam
- Division of Endocrinology, Boston Children's Hospital, Boston, MA, USA
- Eirini Marouli
- William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, UK
- Eric Bartell
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Joel Hirschhorn
- Division of Endocrinology, Boston Children's Hospital, Boston, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Terence D Capellini
- Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
8. Qi G, Battle A. Computational methods for allele-specific expression in single cells. Trends Genet 2024; 40:939-949. [PMID: 39127549 PMCID: PMC11537817 DOI: 10.1016/j.tig.2024.07.003] [Received: 03/31/2024] [Revised: 07/16/2024] [Accepted: 07/17/2024] [Indexed: 08/12/2024]
Abstract
Allele-specific expression (ASE) is a powerful signal that can be used to investigate multiple molecular mechanisms, such as cis-regulatory effects and imprinting. Single-cell RNA-sequencing (scRNA-seq) enables ASE characterization at the resolution of individual cells. In this review, we highlight the computational methods for processing and analyzing single-cell ASE data. We first describe a bioinformatics pipeline, synthesized from previous literature, for obtaining ASE counts from raw reads. We then discuss statistical methods for detecting allelic imbalance and its variability across conditions using scRNA-seq data. In addition, we describe other methods that use single-cell ASE to address specific biological questions. Finally, we discuss future directions and emphasize the need for an integrated, optimized bioinformatics pipeline and further development of statistical methods for different technologies.
Affiliation(s)
- Guanghao Qi
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
- Alexis Battle
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
- Department of Genetic Medicine, Johns Hopkins University, Baltimore, MD 21205, USA
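One step that a single-cell ASE pipeline of the kind Qi and Battle review must handle is collapsing reads to unique molecules before counting alleles per cell. The sketch below assumes that reads overlapping one heterozygous SNP have already been reduced to hypothetical `(cell_barcode, umi, allele)` records. It resolves a UMI observed with conflicting alleles by majority vote and drops exact ties; published pipelines differ in how they treat such conflicts, so this illustrates the idea rather than any specific tool.

```python
from collections import Counter, defaultdict


def per_cell_allele_counts(records):
    """records: iterable of (cell_barcode, umi, allele) with allele in {'ref', 'alt'}.

    Returns {cell_barcode: {'ref': n_ref_molecules, 'alt': n_alt_molecules}}.
    """
    # Collect allele observations for each molecule, identified by (cell, UMI)
    votes = defaultdict(Counter)
    for cell, umi, allele in records:
        votes[(cell, umi)][allele] += 1

    counts = defaultdict(lambda: {"ref": 0, "alt": 0})
    for (cell, _umi), tally in votes.items():
        (top, top_n), *rest = tally.most_common()
        if rest and rest[0][1] == top_n:
            continue  # tie between alleles: ambiguous molecule, skip it
        counts[cell][top] += 1  # each molecule counted once, however many reads it has
    return dict(counts)


# Hypothetical reads at one SNP from two cells
reads = [
    ("AAAC", "u1", "ref"), ("AAAC", "u1", "ref"),  # one molecule, two reads
    ("AAAC", "u2", "alt"),
    ("AAAC", "u3", "ref"), ("AAAC", "u3", "alt"),  # conflicting molecule, dropped
    ("TTTG", "u1", "alt"), ("TTTG", "u1", "alt"), ("TTTG", "u1", "ref"),
]
print(per_cell_allele_counts(reads))
```

Counting molecules rather than reads keeps PCR duplicates from inflating one allele, which matters because per-cell allele counts are usually very small.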
9. Andersson J, Aydın E, Gunnarsson R, Lilljebjörn H, Fioretos T, Johansson B, Paulsson K, Yang M. Characterizing the allele-specific gene expression landscape in high hyperdiploid acute lymphoblastic leukemia with BASE. Sci Rep 2024; 14:23181. [PMID: 39369032 PMCID: PMC11455916 DOI: 10.1038/s41598-024-73743-8] [Received: 03/21/2024] [Accepted: 09/20/2024] [Indexed: 10/07/2024]
Abstract
Somatic copy number variations (CNVs), including abnormal chromosome numbers and structural changes leading to gain or loss of genetic material, play a crucial role in the initiation and progression of cancer. CNVs are believed to cause gene dosage imbalances and modify cis-regulatory elements, leading to allelic expression imbalances in genes that influence cell division and thereby contribute to cancer development. However, the impact of CNVs on allelic gene expression in cancer remains unclear. Allele-specific expression (ASE) analysis, a potent method for investigating genome-wide allelic imbalance profiles in tumors, assesses the relative expression of the two alleles using high-throughput sequencing data. However, many existing methods for gene-level ASE detection rely only on RNA sequencing data, which makes it difficult to interpret the genetic mechanisms underlying ASE in cancer. To address this issue, we developed BASE, a robust framework that integrates allele-specific copy number calls into ASE calling algorithms by leveraging paired genome and transcriptome data from the same sample. This integration enhances the interpretability of the genetic mechanisms driving ASE, thereby facilitating the identification of driver events triggered by CNVs in cancer. In this study, we used BASE to conduct a comprehensive analysis of ASE in high hyperdiploid acute lymphoblastic leukemia (HeH ALL), a prevalent childhood malignancy characterized by gains of chromosomes X, 4, 6, 10, 14, 17, 18, and 21. Our analysis unveiled the comprehensive ASE landscape in HeH ALL. Through a multi-perspective examination of HeH ASEs, we offer a systematic understanding of how CNVs impact ASE in HeH, providing valuable insights to guide ASE studies in cancer.
Affiliation(s)
- Jonas Andersson
- Department of Laboratory Medicine, Division of Clinical Genetics, Lund University, Lund, Sweden
- Lund University Diabetes Centre, Department of Clinical Sciences Malmö, Lund University, Malmö, Sweden
- Efe Aydın
- Department of Laboratory Medicine, Division of Clinical Genetics, Lund University, Lund, Sweden
- Rebeqa Gunnarsson
- Department of Laboratory Medicine, Division of Clinical Genetics, Lund University, Lund, Sweden
- Henrik Lilljebjörn
- Department of Laboratory Medicine, Division of Clinical Genetics, Lund University, Lund, Sweden
- Thoas Fioretos
- Department of Laboratory Medicine, Division of Clinical Genetics, Lund University, Lund, Sweden
- Department of Clinical Genetics, Pathology, and Molecular Diagnostics, Office for Medical Services, Laboratory Medicine, Region Skåne, Lund, Sweden
- Bertil Johansson
- Department of Laboratory Medicine, Division of Clinical Genetics, Lund University, Lund, Sweden
- Department of Clinical Genetics, Pathology, and Molecular Diagnostics, Office for Medical Services, Laboratory Medicine, Region Skåne, Lund, Sweden
- Kajsa Paulsson
- Department of Laboratory Medicine, Division of Clinical Genetics, Lund University, Lund, Sweden
- Minjun Yang
- Department of Laboratory Medicine, Division of Clinical Genetics, Lund University, Lund, Sweden
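The core idea of integrating allele-specific copy number into ASE calling can be illustrated by adjusting the null hypothesis. If a gained chromosome carries two copies of one allele and one copy of the other, gene dosage alone predicts that the gained allele contributes about 2/3 of the reads rather than 1/2. The sketch below, which is not the BASE algorithm, derives that expected fraction from hypothetical allele-specific copy numbers and runs an exact two-sided binomial test against it, so that imbalance explained by dosage is not called as ASE.

```python
from math import comb


def expected_fraction(copies_a: int, copies_b: int) -> float:
    """Expected fraction of allele-A reads if expression simply tracks DNA dosage."""
    return copies_a / (copies_a + copies_b)


def binom_two_sided(k: int, n: int, p: float) -> float:
    """Exact two-sided binomial test: sum P(X = x) over all x no more likely than k."""
    probs = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]
    p_obs = probs[k]
    return min(1.0, sum(q for q in probs if q <= p_obs * (1 + 1e-9)))


# Hypothetical gene on a gained chromosome: allele A on 2 copies, allele B on 1 copy
p_expected = expected_fraction(2, 1)
reads_a, reads_b = 66, 34
naive = binom_two_sided(reads_a, reads_a + reads_b, 0.5)            # ignores copy number
adjusted = binom_two_sided(reads_a, reads_a + reads_b, p_expected)  # copy-number-aware null
print(naive, adjusted)
```

Against the naive 1/2 null this gene looks strongly imbalanced; against the dosage-adjusted null it does not, which is the distinction between CNV-driven and other allelic imbalance that the abstract describes.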
10. Park D, Cenik C. Long-read RNA sequencing reveals allele-specific N6-methyladenosine modifications. bioRxiv 2024:2024.07.08.602538. [PMID: 39026828 PMCID: PMC11257478 DOI: 10.1101/2024.07.08.602538] [Indexed: 07/20/2024]
Abstract
Long-read sequencing technology enables highly accurate detection of allele-specific RNA expression, providing insights into the effects of genetic variation on splicing and RNA abundance. Furthermore, the ability to directly sequence RNA promises the detection of RNA modifications in tandem with ascertaining the allelic origin of each molecule. Here, we leverage these advantages to determine allele-biased patterns of N6-methyladenosine (m6A) modifications in native mRNA. We used human and mouse cells with known genetic variants to assign the allelic origin of each mRNA molecule, and combined this with a supervised machine learning model to detect read-level m6A modification ratios. Our analyses revealed the importance of sequences adjacent to the DRACH motif in determining m6A deposition, in addition to allelic differences that directly alter the motif. Moreover, we discovered allele-specific m6A modification (ASM) events with no genetic variants in close proximity to the differentially modified nucleotide, demonstrating the unique advantage of long reads, which surpass the capabilities of antibody-based short-read approaches. This technological advance promises to deepen our understanding of the role of genetics in determining mRNA modifications.
Affiliation(s)
- Dayea Park
- Department of Molecular Biosciences, University of Texas at Austin, Austin, TX 78712, USA
- Can Cenik
- Department of Molecular Biosciences, University of Texas at Austin, Austin, TX 78712, USA
11. Zou X, Gomez ZW, Reddy TE, Allen AS, Majoros WH. Bayesian Estimation of Allele-Specific Expression in the Presence of Phasing Uncertainty. bioRxiv 2024:2024.08.09.607371. [PMID: 39211106 PMCID: PMC11361064 DOI: 10.1101/2024.08.09.607371] [Indexed: 09/04/2024]
Abstract
Motivation: Allele-specific expression (ASE) analyses aim to detect imbalanced expression of maternal versus paternal copies of an autosomal gene. Such allelic imbalance can result from a variety of cis-acting causes, including disruptive mutations within one copy of a gene that impact the stability of transcripts, as well as regulatory variants outside the gene that impact transcription initiation. Current methods for ASE estimation suffer from a number of shortcomings, such as relying on only one variant within a gene, assuming perfect phasing information across multiple variants within a gene, or failing to account for alignment biases and possible genotyping errors.
Results: We developed BEASTIE, a Bayesian hierarchical model designed for precise ASE quantification at the gene level, based on given genotypes and RNA-Seq data. BEASTIE addresses the complexities of allelic mapping bias, genotyping error, and phasing errors by incorporating empirical phasing error rates derived from Genome-in-a-Bottle individual NA12878. BEASTIE surpasses existing methods in accuracy, especially in scenarios with high phasing errors. This improvement is critical for identifying rare genetic variants often obscured by such errors. Through rigorous validation on simulated data and application to real data from the 1000 Genomes Project, we establish the robustness of BEASTIE. These findings underscore the value of BEASTIE in revealing patterns of ASE across gene sets and pathways.
Availability and Implementation: The software is freely available from https://github.com/x811zou/BEASTIE. BEASTIE is available as Python source code and as a Docker image.
Supplementary information: Additional information is available online.
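The effect of phasing uncertainty on gene-level ASE estimation can be illustrated with a simplified likelihood. Each heterozygous site either is phased correctly, so that its haplotype-1 read count follows Binomial(n, θ), or has its phase flipped with probability e, so that the count follows Binomial(n, 1 − θ). The sketch below marginalizes over the flip at every site, treats sites as independent given θ, and returns a grid-based posterior mean of θ under a uniform prior. It is far simpler than BEASTIE, which also models mapping bias and genotyping error within a hierarchical model; the function name, grid size, and per-site error rates are illustrative assumptions.

```python
from math import comb


def posterior_mean_theta(sites, phase_error, grid_size=999):
    """Posterior mean of theta (haplotype-1 expression fraction) under a uniform prior.

    sites: list of (hap1_count, total_count) per heterozygous site, as phased.
    phase_error: per-site probability that the site's phase is flipped.
    """
    grid = [(i + 1) / (grid_size + 1) for i in range(grid_size)]  # interior points of (0, 1)
    weights = []
    for theta in grid:
        like = 1.0
        for (k, n), e in zip(sites, phase_error):
            correct = comb(n, k) * theta**k * (1 - theta) ** (n - k)
            flipped = comb(n, k) * (1 - theta) ** k * theta ** (n - k)
            like *= (1 - e) * correct + e * flipped  # marginalize over the phase flip
        weights.append(like)
    total = sum(weights)
    return sum(t * w for t, w in zip(grid, weights)) / total


# Hypothetical gene with three heterozygous sites; the third looks discordant
sites = [(30, 40), (28, 40), (12, 40)]
certain = posterior_mean_theta(sites, [0.0, 0.0, 0.0])    # trust every phase call
uncertain = posterior_mean_theta(sites, [0.0, 0.0, 0.5])  # third site's phase unknown
print(certain, uncertain)
```

Trusting the phasing drags the estimate toward balance because the discordant site is counted against haplotype 1, while allowing for a possible phase error lets the model treat that site as supporting the same imbalance. For many or deep sites a log-space version would avoid floating-point underflow.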
12. Zou LS, Cable DM, Barrera-Lopez IA, Zhao T, Murray E, Aryee MJ, Chen F, Irizarry RA. Detection of allele-specific expression in spatial transcriptomics with spASE. Genome Biol 2024; 25:180. [PMID: 38978101 PMCID: PMC11229351 DOI: 10.1186/s13059-024-03317-4] [Received: 11/02/2023] [Accepted: 06/20/2024] [Indexed: 07/10/2024]
Abstract
Spatial transcriptomics technologies permit the study of the spatial distribution of RNA at near-single-cell resolution genome-wide. However, the feasibility of studying spatial allele-specific expression (ASE) from these data remains uncharacterized. Here, we introduce spASE, a computational framework for detecting and estimating spatial ASE. To tackle the challenges presented by cell type mixtures and a low signal-to-noise ratio, we implement a hierarchical model involving additive mixtures of spatial smoothing splines. We apply our method to allele-resolved Visium and Slide-seq data from the mouse cerebellum and hippocampus and report new insights into the landscape of spatial and cell type-specific ASE therein.
Affiliation(s)
- Luli S Zou
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Dylan M Cable
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Department of Electrical Engineering and Computer Science, MIT, Cambridge, MA, 02139, USA
- Tongtong Zhao
- Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Evan Murray
- Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Martin J Aryee
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Fei Chen
- Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Rafael A Irizarry
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
13
Heath H, Peng S, Szmatola T, Ryan S, Bellone R, Kalbfleisch T, Petersen J, Finno C. A Comprehensive Allele Specific Expression Resource for the Equine Transcriptome. Res Sq 2024:rs.3.rs-4182812. [PMID: 38645140 PMCID: PMC11030527 DOI: 10.21203/rs.3.rs-4182812/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/23/2024]
Abstract
Background Allele-specific expression (ASE) analysis provides a nuanced view of cis-regulatory mechanisms affecting gene expression. Results An equine ASE analysis was performed using integrated Iso-seq and short-read RNA sequencing data from four healthy Thoroughbreds (2 mares and 2 stallions) across 9 tissues from the Functional Annotation of Animal Genomes (FAANG) project. Allele expression was quantified by haplotypes from long-read data, and 42,900 allele expression events were compared. Of these events, 635 (1.48%) demonstrated ASE, with liver tissue containing the highest proportion. Of the genetic variants within ASE events, 64.2% were located in histone-modified regions. Validation of allele-specific variants using a set of 66 equine liver samples from multiple breeds confirmed ASE for 97% of variants. Conclusions This valuable publicly accessible resource is poised to facilitate investigations into regulatory variation in equine tissues. Our results highlight the tissue-specific nature of allelic imbalance in the equine genome.
14
O'Brien CL, Summers KM, Martin NM, Carter-Cusack D, Yang Y, Barua R, Dixit OVA, Hume DA, Pavli P. The relationship between extreme inter-individual variation in macrophage gene expression and genetic susceptibility to inflammatory bowel disease. Hum Genet 2024; 143:233-261. [PMID: 38421405 PMCID: PMC11043138 DOI: 10.1007/s00439-024-02642-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2023] [Accepted: 01/14/2024] [Indexed: 03/02/2024]
Abstract
The differentiation of resident intestinal macrophages from blood monocytes depends upon signals from the macrophage colony-stimulating factor receptor (CSF1R). Analysis of genome-wide association studies (GWAS) indicates that dysregulation of macrophage differentiation and response to microorganisms contributes to susceptibility to chronic inflammatory bowel disease (IBD). Here, we analyzed transcriptomic variation in monocyte-derived macrophages (MDM) from affected and unaffected sib pairs/trios from 22 IBD families and 6 healthy controls. Transcriptional network analysis of the data revealed no overall or inter-sib distinction between affected and unaffected individuals in basal gene expression or the temporal response to lipopolysaccharide (LPS). However, the basal or LPS-inducible expression of individual genes varied independently by as much as 100-fold between subjects. Extreme independent variation in the expression of pairs of HLA-associated transcripts (HLA-B/C, HLA-A/F and HLA-DRB1/DRB5) in macrophages was associated with HLA genotype. Correlation analysis indicated the downstream impacts of variation in the immediate early response to LPS. For example, variation in early expression of IL1B was significantly associated with local SNV genotype and with subsequent peak expression of target genes including IL23A, CXCL1, CXCL3, CXCL8 and NLRP3. Similarly, variation in early IFNB1 expression was correlated with subsequent expression of IFN target genes. Our results support the view that gene-specific dysregulation in macrophage adaptation to the intestinal milieu is associated with genetic susceptibility to IBD.
Affiliation(s)
- Claire L O'Brien
- Centre for Research in Therapeutics Solutions, Faculty of Science and Technology, University of Canberra, Canberra, ACT, Australia
- Inflammatory Bowel Disease Research Group, Canberra Hospital, Canberra, ACT, Australia
- Kim M Summers
- Mater Research Institute-University of Queensland, Translational Research Institute, Brisbane, QLD, Australia
- Natalia M Martin
- Inflammatory Bowel Disease Research Group, Canberra Hospital, Canberra, ACT, Australia
- Dylan Carter-Cusack
- Mater Research Institute-University of Queensland, Translational Research Institute, Brisbane, QLD, Australia
- Yuanhao Yang
- Mater Research Institute-University of Queensland, Translational Research Institute, Brisbane, QLD, Australia
- Rasel Barua
- Inflammatory Bowel Disease Research Group, Canberra Hospital, Canberra, ACT, Australia
- Ojas V A Dixit
- Centre for Research in Therapeutics Solutions, Faculty of Science and Technology, University of Canberra, Canberra, ACT, Australia
- David A Hume
- Mater Research Institute-University of Queensland, Translational Research Institute, Brisbane, QLD, Australia
- Paul Pavli
- Inflammatory Bowel Disease Research Group, Canberra Hospital, Canberra, ACT, Australia
- School of Medicine and Psychology, College of Health and Medicine, Australian National University, Canberra, ACT, Australia
15
Heath HD, Peng S, Szmatola T, Bellone RR, Kalbfleisch T, Petersen JL, Finno CJ. A Comprehensive Allele Specific Expression Resource for the Equine Transcriptome. bioRxiv 2024:2023.12.31.573798. [PMID: 38260378 PMCID: PMC10802363 DOI: 10.1101/2023.12.31.573798] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract
Background Allele-specific expression (ASE) analysis provides a nuanced view of cis-regulatory mechanisms affecting gene expression. Results In this work, we introduce and highlight the significance of an equine ASE analysis, containing integrated long- and short-read RNA sequencing data, along with insight from histone modification data, from four healthy Thoroughbreds (2 mares and 2 stallions) across 9 tissues. Conclusions This valuable publicly accessible resource is poised to facilitate investigations into regulatory variation in equine tissues and foster a deeper understanding of the impact of allelic imbalance in equine health and disease at the molecular level.
16
Qi G, Strober BJ, Popp JM, Keener R, Ji H, Battle A. Single-cell allele-specific expression analysis reveals dynamic and cell-type-specific regulatory effects. Nat Commun 2023; 14:6317. [PMID: 37813843 PMCID: PMC10562474 DOI: 10.1038/s41467-023-42016-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 01/06/2023] [Accepted: 09/27/2023] [Indexed: 10/11/2023]
Abstract
Differential allele-specific expression (ASE) is a powerful tool to study context-specific cis-regulation of gene expression. Such effects can reflect the interaction between genetic or epigenetic factors and a measured context or condition. Single-cell RNA sequencing (scRNA-seq) allows the measurement of ASE at individual-cell resolution, but there is a lack of statistical methods to analyze such data. We present Differential Allelic Expression using Single-Cell data (DAESC), a powerful method for differential ASE analysis using scRNA-seq from multiple individuals, with statistical behavior confirmed through simulation. DAESC accounts for non-independence between cells from the same individual and incorporates implicit haplotype phasing. Application to data from 105 induced pluripotent stem cell (iPSC) lines identifies 657 genes dynamically regulated during endoderm differentiation, with enrichment for changes in chromatin state. Application to a type-2 diabetes dataset identifies several differentially regulated genes between patients and controls in pancreatic endocrine cells. DAESC is a powerful method for single-cell ASE analysis and can uncover novel insights on gene regulation.
Affiliation(s)
- Guanghao Qi
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, 21218, USA
- Department of Biostatistics, University of Washington, Seattle, WA, 98195, USA
- Benjamin J Strober
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
- Joshua M Popp
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, 21218, USA
- Rebecca Keener
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, 21218, USA
- Hongkai Ji
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, 21205, USA
- Alexis Battle
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, 21218, USA
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, 21218, USA
- Department of Genetic Medicine, Johns Hopkins University, Baltimore, MD, 21205, USA
17
Antontseva EV, Degtyareva AO, Korbolina EE, Damarov IS, Merkulova TI. Human-genome single nucleotide polymorphisms affecting transcription factor binding and their role in pathogenesis. Vavilovskii Zhurnal Genet Selektsii 2023; 27:662-675. [PMID: 37965371 PMCID: PMC10641029 DOI: 10.18699/vjgb-23-77] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2023] [Revised: 03/24/2023] [Accepted: 03/30/2023] [Indexed: 11/16/2023]
Abstract
Single nucleotide polymorphisms (SNPs) are the most common type of variation in the human genome. The vast majority of SNPs identified in the human genome have no effect on the phenotype; however, some can alter the function of a gene or its level of expression. Most SNPs associated with particular traits or pathologies map to regulatory regions of the genome and affect gene expression by altering transcription factor binding sites. In recent decades, substantial effort has been invested in searching for such regulatory SNPs (rSNPs) and in understanding the mechanisms by which they lead to phenotypic differences, primarily individual differences in disease susceptibility and drug sensitivity. The development of next-generation sequencing (NGS) technology has contributed not only to the identification of a huge number of SNPs and to genome-wide association studies (GWASs) linking them to particular diseases or phenotypic manifestations, but also to the development of more productive approaches to their functional annotation. Note, however, that an association alone does not identify the functional, truly disease-associated variant among the multiple marker SNPs detected because of linkage disequilibrium. Moreover, an association between a genetic variant and a disease provides no information about the functionality of that variant, which is needed to elucidate the molecular mechanisms of the pathology and to design effective methods for its treatment and prevention. For this reason, the functional analysis of SNPs annotated in the GWAS catalog, both genome-wide and at the level of individual SNPs, has become especially relevant in recent years. A genome-wide search for potential rSNPs is possible without any prior knowledge of their association with a trait.
For example, mapping expression quantitative trait loci (eQTLs) makes it possible to identify SNPs whose genotype (homozygous for either allele, or heterozygous) is associated with differences in the expression level of particular genes, which can be located at various distances from the SNP. Approaches based on searches for allele-specific events in RNA-seq, ChIP-seq, DNase-seq, ATAC-seq, MPRA, and other data are also used to predict rSNPs. Nonetheless, a more complete functional annotation of such rSNPs requires establishing their association with a trait, in particular with a predisposition to a certain pathology or with drug sensitivity. Thus, approaches to finding SNPs important for the development of a trait fall into two groups: (1) those starting from data on an association of SNPs with a certain trait, and (2) those starting from the determination of allele-specific changes at the molecular level (in a transcriptome or regulome). Only the combined use of these strategically different approaches can considerably enrich our knowledge of the role of genetic determinants in the molecular mechanisms of trait formation, including predisposition to multifactorial diseases.
Affiliation(s)
- E V Antontseva
- Institute of Cytology and Genetics of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- A O Degtyareva
- Institute of Cytology and Genetics of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- E E Korbolina
- Institute of Cytology and Genetics of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- I S Damarov
- Institute of Cytology and Genetics of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- T I Merkulova
- Institute of Cytology and Genetics of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
18
Wu EY, Singh NP, Choi K, Zakeri M, Vincent M, Churchill GA, Ackert-Bicknell CL, Patro R, Love MI. SEESAW: detecting isoform-level allelic imbalance accounting for inferential uncertainty. Genome Biol 2023; 24:165. [PMID: 37438847 PMCID: PMC10337143 DOI: 10.1186/s13059-023-03003-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/01/2022] [Accepted: 06/29/2023] [Indexed: 07/14/2023]
Abstract
Detecting allelic imbalance at the isoform level requires accounting for inferential uncertainty, caused by multi-mapping of RNA-seq reads. Our proposed method, SEESAW, uses Salmon and Swish to offer analysis at various levels of resolution, including gene, isoform, and aggregating isoforms to groups by transcription start site. The aggregation strategies strengthen the signal for transcripts with high uncertainty. The SEESAW suite of methods is shown to have higher power than other allelic imbalance methods when there is isoform-level allelic imbalance. We also introduce a new test for detecting imbalance that varies across a covariate, such as time.
Affiliation(s)
- Euphy Y Wu
- Department of Biostatistics, University of North Carolina-Chapel Hill, Chapel Hill, NC, USA
- Noor P Singh
- Department of Computer Science, University of Maryland, College Park, MD, USA
- Mohsen Zakeri
- Department of Computer Science, University of Maryland, College Park, MD, USA
- Cheryl L Ackert-Bicknell
- Department of Orthopedics, School of Medicine, University of Colorado, Anschutz Campus, Aurora, CO, USA
- Rob Patro
- Department of Computer Science, University of Maryland, College Park, MD, USA
- Michael I Love
- Department of Biostatistics, University of North Carolina-Chapel Hill, Chapel Hill, NC, USA
- Department of Genetics, University of North Carolina-Chapel Hill, Chapel Hill, NC, USA
19
Iqbal MA, Hadlich F, Reyer H, Oster M, Trakooljul N, Murani E, Perdomo-Sabogal A, Wimmers K, Ponsuksili S. RNA-Seq-based discovery of genetic variants and allele-specific expression of two layer lines and broiler chicken. Evol Appl 2023; 16:1135-1153. [PMID: 37360029 PMCID: PMC10286233 DOI: 10.1111/eva.13557] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2022] [Revised: 04/21/2023] [Accepted: 04/22/2023] [Indexed: 06/28/2023]
Abstract
Recent advances in the selective breeding of broilers and layers have made poultry production one of the fastest-growing industries. In this study, a transcriptome variant calling approach from RNA-seq data was used to determine population diversity between broilers and layers. In total, 200 individuals were analyzed from three different chicken populations: Lohmann Brown (LB, n = 90), Lohmann Selected Leghorn (LSL, n = 89), and Broiler (BR, n = 21). The raw RNA-sequencing reads were pre-processed, quality-control checked, mapped to the reference genome, and made compatible with the Genome Analysis ToolKit (GATK) for variant detection. Subsequently, pairwise fixation index (FST) analysis was performed between broilers and layers. Numerous candidate genes were identified that were associated with growth, development, metabolism, immunity, and other economically significant traits. Finally, allele-specific expression (ASE) analysis was performed in the gut mucosa of LB and LSL strains at 10, 16, 24, 30, and 60 weeks of age. At different ages, the two layer strains showed significantly different allele-specific expression in the gut mucosa, and changes in allelic imbalance were observed across the entire lifespan. Most ASE genes are involved in energy metabolism, including sirtuin signaling pathways, oxidative phosphorylation, and mitochondrial dysfunction. A high number of ASE genes were found during the peak of laying, and these were particularly enriched in cholesterol biosynthesis. These findings indicate that genetic architecture, together with biological processes driven by the metabolic and nutritional demands of the laying period, shapes allelic heterogeneity. These processes are considerably affected by breeding and management, and elucidating allele-specific gene regulation is therefore an essential step towards deciphering the genotype-to-phenotype map or the functional diversity between chicken populations.
Additionally, we observed that several genes showing significant allelic imbalance also colocalized with the top 1% of genes identified by the FST approach, suggesting a fixation of genes in cis-regulatory elements.
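The abstract names pairwise FST analysis but not the estimator. The Python sketch below is for illustration only. It assumes Hudson's FST estimator, computed from per-site frequencies of the same allele in two populations and combined across sites as a ratio of averages. The study itself may have used a different estimator, such as Weir and Cockerham's. The function name and inputs are hypothetical.

```python
from typing import Sequence, Tuple


def hudson_fst(sites: Sequence[Tuple[float, float]], n1: int, n2: int) -> float:
    """Hudson's FST between two populations, combined over sites as a ratio of averages.

    sites: per-site (p1, p2) sample frequencies of the same allele in populations 1 and 2.
    n1, n2: number of sampled chromosomes per population (assumed equal at every site).
    """
    if n1 < 2 or n2 < 2:
        raise ValueError("need at least two sampled chromosomes per population")
    num = 0.0
    den = 0.0
    for p1, p2 in sites:
        # Squared frequency difference, corrected for within-population sampling noise.
        num += (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
        # Probability that two alleles, one drawn from each population, differ.
        den += p1 * (1 - p2) + p2 * (1 - p1)
    if den == 0:
        raise ValueError("no site is polymorphic across the two populations")
    # Ratio of sums equals the ratio of per-site averages.
    return num / den
```

For diploid samples, n1 and n2 count chromosomes, so 90 fully genotyped birds would give n = 180. Values near 1 indicate that the two groups are close to fixation for different alleles. Because of the sampling correction, the estimate can fall slightly below 0 when frequencies are nearly identical.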
Affiliation(s)
- Frieder Hadlich
- Research Institute for Farm Animal Biology, Institute of Genome Biology, Dummerstorf, Germany
- Henry Reyer
- Research Institute for Farm Animal Biology, Institute of Genome Biology, Dummerstorf, Germany
- Michael Oster
- Research Institute for Farm Animal Biology, Institute of Genome Biology, Dummerstorf, Germany
- Nares Trakooljul
- Research Institute for Farm Animal Biology, Institute of Genome Biology, Dummerstorf, Germany
- Eduard Murani
- Research Institute for Farm Animal Biology, Institute of Genome Biology, Dummerstorf, Germany
- Klaus Wimmers
- Research Institute for Farm Animal Biology, Institute of Genome Biology, Dummerstorf, Germany
- Faculty of Agricultural and Environmental Sciences, University Rostock, Rostock, Germany
- Siriluck Ponsuksili
- Research Institute for Farm Animal Biology, Institute of Genome Biology, Dummerstorf, Germany
20
Orantes-Bonilla M, Wang H, Lee HT, Golicz AA, Hu D, Li W, Zou J, Snowdon RJ. Transgressive and parental dominant gene expression and cytosine methylation during seed development in Brassica napus hybrids. Theor Appl Genet 2023; 136:113. [PMID: 37071201 PMCID: PMC10113308 DOI: 10.1007/s00122-023-04345-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/03/2022] [Accepted: 03/12/2023] [Indexed: 05/13/2023]
Abstract
KEY MESSAGE Transcriptomic and epigenomic profiling of gene expression and small RNAs during seed and seedling development reveals expression and methylation dominance levels with implications for early-stage heterosis in oilseed rape. The enhanced performance of hybrids through heterosis remains a key aspect of plant breeding; however, the underlying mechanisms are still not fully elucidated. To investigate the potential role of transcriptomic and epigenomic patterns in the early expression of hybrid vigor, we profiled gene expression, small RNA abundance and genome-wide methylation in hybrids from two distant Brassica napus ecotypes during seed and seedling developmental stages using next-generation sequencing. A total of 31,117 differentially expressed genes, 344 differentially expressed microRNAs, 36,229 differentially expressed small interfering RNAs and 7,399 differentially methylated regions were identified. Approximately 70% of the differentially expressed or methylated features displayed parental dominance levels, where the hybrid followed the same patterns as the parents. Via gene ontology enrichment and microRNA-target association analyses during seed development, we found copies of reproductive, developmental and meiotic genes with transgressive and paternal dominance patterns. Interestingly, maternal dominance was more prominent in hypermethylated and downregulated features during seed formation, in contrast to the general maternal gamete demethylation reported during gametogenesis in angiosperms. Associations between methylation and gene expression allowed the identification of putative epialleles with diverse pivotal biological functions during seed formation. Furthermore, most differentially methylated regions, differentially expressed siRNAs and transposable elements were in regions flanking genes without differential expression. This suggests that differential expression and methylation of epigenomic features may help maintain the expression of pivotal genes in a hybrid context.
Differential expression and methylation patterns during seed formation in an F1 hybrid provide novel insights into genes and mechanisms with potential roles in early heterosis.
Affiliation(s)
- Mauricio Orantes-Bonilla
- Department of Plant Breeding, Land Use and Nutrition, IFZ Research Centre for Biosystems, Justus Liebig University, Giessen, Germany
- Hao Wang
- National Key Laboratory of Crop Genetic Improvement, College of Plant Science & Technology, Huazhong Agricultural University, Wuhan, People's Republic of China
- Huey Tyng Lee
- Department of Plant Breeding, Land Use and Nutrition, IFZ Research Centre for Biosystems, Justus Liebig University, Giessen, Germany
- Agnieszka A Golicz
- Department of Plant Breeding, Land Use and Nutrition, IFZ Research Centre for Biosystems, Justus Liebig University, Giessen, Germany
- Dandan Hu
- National Key Laboratory of Crop Genetic Improvement, College of Plant Science & Technology, Huazhong Agricultural University, Wuhan, People's Republic of China
- Wenwen Li
- National Key Laboratory of Crop Genetic Improvement, College of Plant Science & Technology, Huazhong Agricultural University, Wuhan, People's Republic of China
- Jun Zou
- National Key Laboratory of Crop Genetic Improvement, College of Plant Science & Technology, Huazhong Agricultural University, Wuhan, People's Republic of China
- Rod J Snowdon
- Department of Plant Breeding, Land Use and Nutrition, IFZ Research Centre for Biosystems, Justus Liebig University, Giessen, Germany
21
Murani E, Hadlich F. Exploration of genotype-by-environment interactions affecting gene expression responses in porcine immune cells. Front Genet 2023; 14:1157267. [PMID: 37007953 PMCID: PMC10061014 DOI: 10.3389/fgene.2023.1157267] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2023] [Accepted: 03/06/2023] [Indexed: 03/18/2023]
Abstract
As one of the keys to healthy performance, robustness of farm animals is gaining importance, and with this comes increasing interest in the genetic dissection of genotype-by-environment interactions (G×E). Changes in gene expression are among the most sensitive responses conveying adaptation to environmental stimuli. Environmentally responsive regulatory variation thus likely plays a central role in G×E. In the present study, we set out to detect the action of environmentally responsive cis-regulatory variation by analyzing condition-dependent allele-specific expression (cd-ASE) in porcine immune cells. For this, we used mRNA-sequencing data of peripheral blood mononuclear cells (PBMCs) stimulated in vitro with lipopolysaccharide (LPS), dexamethasone, or their combination. These treatments mimic common challenges such as bacterial infection or stress and induce vast transcriptome changes. About two-thirds of the examined loci showed significant ASE in at least one treatment, and of those about ten percent exhibited cd-ASE. Most of the ASE variants had not previously been reported in the PigGTEx Atlas. Genes showing cd-ASE were enriched for cytokine signaling in the immune system and included several key candidates for animal health. In contrast, genes showing no ASE featured cell-cycle-related functions. We confirmed LPS-dependent ASE for one of the top candidates, SOD2, which ranks among the major response genes in LPS-stimulated monocytes. The results of the present study demonstrate the potential of in vitro cell models coupled with cd-ASE analysis for investigating G×E in farm animals. The identified loci may benefit efforts to unravel the genetic basis of robustness and to improve the health and welfare of pigs.
Affiliation(s)
- Eduard Murani
- Institute of Genome Biology, Research Institute for Farm Animal Biology (FBN), Dummerstorf, Germany
22
Allele-specific expression analysis for complex genetic phenotypes applied to a unique dilated cardiomyopathy cohort. Sci Rep 2023; 13:564. [PMID: 36631531 PMCID: PMC9834222 DOI: 10.1038/s41598-023-27591-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2022] [Accepted: 01/04/2023] [Indexed: 01/13/2023]
Abstract
Allele-specific expression (ASE) analysis measures the relative abundance of the two alleles at heterozygous loci as a proxy for cis-regulatory variation, which affects the personal transcriptome and proteome. This study describes the development and application of an ASE analysis pipeline to a unique cohort of 87 well-phenotyped, RNA-sequenced patients with dilated cardiomyopathy (DCM) from the Maastricht Cardiomyopathy Registry. DCM is a complex genetic disorder with a remaining gap in explained heritability, and the regulatory processes for which ASE is a proxy might explain this gap. We found an overrepresentation of known DCM-associated genes among the significant results across the cohort. In addition, we found genes of interest that have not been associated with DCM through conventional methods such as genome-wide association or differential gene expression studies. The pipeline offers RNA sequencing data processing, individual- and population-level ASE analyses, group comparisons, and several intuitive visualizations such as Manhattan plots and protein-protein interaction networks. With this pipeline, we found evidence that cis-regulatory variation contributes to the phenotypic heterogeneity of DCM. Additionally, our results highlight that ASE analysis adds a layer to conventional genomic and transcriptomic analyses for candidate gene identification and biological insight.
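The basic per-site quantity in such analyses is the reference-allele read fraction at a heterozygous locus. Under balanced expression, this fraction is expected to be 0.5. The Python sketch below tests that null with an exact two-sided binomial test, using only the standard library. It is an illustrative assumption, not the pipeline described in this study. Practical analyses usually also correct for reference-mapping bias and overdispersion. The function name and read counts are hypothetical.

```python
from math import comb


def binomial_ase_pvalue(ref_reads: int, alt_reads: int) -> float:
    """Exact two-sided binomial test of H0: reference-allele fraction = 0.5 at one site."""
    n = ref_reads + alt_reads
    if n == 0:
        raise ValueError("no reads cover this heterozygous site")
    # Under H0, Binomial(n, 0.5) is symmetric, so the two-sided p-value is
    # twice the probability of the smaller tail, capped at 1.
    k = min(ref_reads, alt_reads)
    smaller_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * smaller_tail)


# Hypothetical read counts at one heterozygous SNP.
ref, alt = 2, 18
print(f"ref fraction = {ref / (ref + alt):.2f}, p = {binomial_ase_pvalue(ref, alt):.2e}")
# ref fraction = 0.10, p = 4.02e-04
```

In a cohort setting, per-site p-values like this would still need multiple-testing correction before genes are called significant.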
23
Shi J, Wu X, Wang Z, Li F, Meng Y, Moore RM, Cui J, Xue C, Croce KR, Yurdagul A, Doench JG, Li W, Zarbalis KS, Tabas I, Yamamoto A, Zhang H. A genome-wide CRISPR screen identifies WDFY3 as a regulator of macrophage efferocytosis. Nat Commun 2022; 13:7929. [PMID: 36566259 PMCID: PMC9789999 DOI: 10.1038/s41467-022-35604-8] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 01/19/2022] [Accepted: 12/13/2022] [Indexed: 12/25/2022]
Abstract
Phagocytic clearance of dying cells, termed efferocytosis, is essential for maintaining tissue homeostasis, yet our understanding of efferocytosis regulation remains incomplete. Here we perform a FACS-based, genome-wide CRISPR knockout screen in primary mouse macrophages to search for novel regulators of efferocytosis. The results show that Wdfy3 knockout in macrophages specifically impairs uptake, but not binding, of apoptotic cells due to defective actin disassembly. Additionally, WDFY3 interacts with GABARAP, thus facilitating LC3 lipidation and subsequent lysosomal acidification to permit the degradation of apoptotic cell components. Mechanistically, while the C-terminus of WDFY3 is sufficient to rescue the impaired degradation induced by Wdfy3 knockout, full-length WDFY3 is required to reconstitute the uptake of apoptotic cells. Finally, WDFY3 is also required for efficient efferocytosis in vivo in mice and in vitro in primary human macrophages. This work thus expands our knowledge of the mechanisms of macrophage efferocytosis and supports genome-wide CRISPR screens as a platform for interrogating complex functional phenotypes in primary macrophages.
Affiliation(s)
- Jianting Shi
- Cardiometabolic Genomics Program, Division of Cardiology, Department of Medicine, Columbia University Irving Medical Center, New York, NY, USA
- Xun Wu
- Cardiometabolic Genomics Program, Division of Cardiology, Department of Medicine, Columbia University Irving Medical Center, New York, NY, USA
- Ziyi Wang
- Cardiometabolic Genomics Program, Division of Cardiology, Department of Medicine, Columbia University Irving Medical Center, New York, NY, USA
- Fang Li
- Cardiometabolic Genomics Program, Division of Cardiology, Department of Medicine, Columbia University Irving Medical Center, New York, NY, USA
- Yujiao Meng
- Cardiometabolic Genomics Program, Division of Cardiology, Department of Medicine, Columbia University Irving Medical Center, New York, NY, USA
- Beijing University of Chinese Medicine, Beijing, China
- Rebecca M Moore
- Cardiometabolic Genomics Program, Division of Cardiology, Department of Medicine, Columbia University Irving Medical Center, New York, NY, USA
- Jian Cui
- Cardiometabolic Genomics Program, Division of Cardiology, Department of Medicine, Columbia University Irving Medical Center, New York, NY, USA
- Chenyi Xue
- Cardiometabolic Genomics Program, Division of Cardiology, Department of Medicine, Columbia University Irving Medical Center, New York, NY, USA
- Katherine R Croce
- Department of Pathology and Cell Biology, Columbia University, New York, NY, USA
- Arif Yurdagul
- Department of Molecular & Cellular Physiology, Louisiana State University Health Sciences Center at Shreveport, Shreveport, LA, USA
- John G Doench
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Wei Li
- Center for Genetic Medicine Research, Children's National Hospital, Washington, DC, USA
- Department of Genomics and Precision Medicine, George Washington University, Washington, DC, USA
- Konstantinos S Zarbalis
- University of California at Davis, Department of Pathology and Laboratory Medicine, Sacramento, CA, 95817, USA
- Shriners Hospitals for Children Northern California, Sacramento, CA, 95817, USA
- UC Davis MIND Institute, Sacramento, CA, 95817, USA
- Ira Tabas
- Department of Pathology and Cell Biology, Columbia University, New York, NY, USA
- Department of Medicine, Columbia University, New York, NY, USA
- Department of Physiology and Cellular Biophysics, Columbia University, New York, NY, USA
- Ai Yamamoto
- Department of Pathology and Cell Biology, Columbia University, New York, NY, USA
- Department of Neurology, Columbia University, New York, NY, USA
| | - Hanrui Zhang
- Cardiometabolic Genomics Program, Division of Cardiology, Department of Medicine, Columbia University Irving Medical Center, New York, NY, USA.
| |
Collapse
|
24
|
Zhou T, Afzal R, Haroon M, Ma Y, Zhang H, Li L. Dominant complementation of biological pathways in maize hybrid lines is associated with heterosis. Planta 2022; 256:111. [PMID: 36352050 DOI: 10.1007/s00425-022-04028-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2022] [Accepted: 11/03/2022] [Indexed: 06/16/2023]
Abstract
Allele-specific expressed genes (ASEGs) are widespread in maize hybrid lines and play important roles in complementing biological pathways in heterosis. Heterosis (hybrid vigor) is an important phenomenon with both theoretical and practical value. However, our understanding of the genetic and molecular mechanisms behind heterosis is still limited. Here, we analyzed a comprehensive dataset of maize (Zea mays L.), including RNA-seq data from three hybrid-parent triplets (HPTs) and acetylated protein data from one HPT. The gene expression patterns exhibited extensive variation between the hybrids and their parents, and a substantial number of ASEGs were identified in the hybrids. Notably, ASEGs from different HPTs were significantly enriched in various conserved pathways. The parental alleles of ASEGs with fewer deleterious single-nucleotide polymorphisms were more likely to be expressed in hybrid lines than the other parental alleles. ASEGs were mainly enriched in the functional gene ontology terms protein biosynthesis, photosynthesis, and metabolism. In addition, the ASEGs across the three HPTs were involved in key photosynthetic pathways and might enhance the photosynthetic efficiency of the hybrids. These findings suggest that ASEGs involved in complementary biological pathways in maize hybrids contribute to heterosis, shedding new light on its molecular mechanism.
Affiliation(s)
- Tao Zhou
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Rabail Afzal
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Muhammad Haroon
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Yuting Ma
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
- Hongwei Zhang
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
- Lin Li
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
25
Wilson PC, Muto Y, Wu H, Karihaloo A, Waikar SS, Humphreys BD. Multimodal single cell sequencing implicates chromatin accessibility and genetic background in diabetic kidney disease progression. Nat Commun 2022; 13:5253. [PMID: 36068241 PMCID: PMC9448792 DOI: 10.1038/s41467-022-32972-z] [Citation(s) in RCA: 76] [Impact Index Per Article: 25.3] [Received: 03/02/2022] [Accepted: 08/25/2022] [Indexed: 11/09/2022]
Abstract
The proximal tubule is a key regulator of kidney function and glucose metabolism. Diabetic kidney disease leads to proximal tubule injury and changes in chromatin accessibility that modify the activity of transcription factors involved in glucose metabolism and inflammation. Here we use single nucleus RNA and ATAC sequencing to show that diabetic kidney disease leads to reduced accessibility of glucocorticoid receptor binding sites and an injury-associated expression signature in the proximal tubule. We hypothesize that chromatin accessibility is regulated by genetic background and closely intertwined with metabolic memory, which pre-programs the proximal tubule to respond differently to external stimuli. Glucocorticoid excess has long been known to increase risk for type 2 diabetes, which raises the possibility that glucocorticoid receptor inhibition may mitigate the adverse metabolic effects of diabetic kidney disease.
Affiliation(s)
- Parker C Wilson
- Division of Anatomic and Molecular Pathology, Department of Pathology and Immunology, Washington University in St. Louis, St. Louis, MO, USA
- Yoshiharu Muto
- Division of Nephrology, Department of Medicine, Washington University in St. Louis, St. Louis, MO, USA
- Haojia Wu
- Division of Nephrology, Department of Medicine, Washington University in St. Louis, St. Louis, MO, USA
- Anil Karihaloo
- Novo Nordisk Research Center Seattle Inc, Seattle, WA, USA
- Sushrut S Waikar
- Section of Nephrology, Department of Medicine, Boston University School of Medicine, Boston Medical Center, Boston, MA, USA
- Benjamin D Humphreys
- Division of Nephrology, Department of Medicine, Washington University in St. Louis, St. Louis, MO, USA
- Department of Developmental Biology, Washington University in St. Louis, St. Louis, MO, USA
26
Quinones-Valdez G, Fu T, Chan TW, Xiao X. scAllele: A versatile tool for the detection and analysis of variants in scRNA-seq. Sci Adv 2022; 8:eabn6398. [PMID: 36054357 PMCID: PMC11636672 DOI: 10.1126/sciadv.abn6398] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2021] [Accepted: 07/19/2022] [Indexed: 05/12/2023]
Abstract
Single-cell RNA sequencing (scRNA-seq) data contain rich information at the gene, transcript, and nucleotide levels. Most analyses of scRNA-seq have focused on gene expression profiles, and it remains challenging to extract nucleotide variants and isoform-specific information. Here, we present scAllele, an integrative approach that detects single-nucleotide variants, insertions, deletions, and their allelic linkage with splicing patterns in scRNA-seq. We demonstrate that scAllele achieves better performance in identifying nucleotide variants than other commonly used tools. In addition, the read-specific variant calls made by scAllele enable allele-specific splicing analysis, a unique feature not afforded by other methods. Applied to a lung cancer scRNA-seq dataset, scAllele identified variants with strong allelic linkage to alternative splicing, some of which are cancer specific and enriched in cancer-relevant pathways. scAllele represents a versatile tool to uncover multilayer information and previously unidentified biological insights from scRNA-seq data.
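As a minimal sketch of what testing allelic linkage to splicing can look like, assume each read that spans both a heterozygous site and an alternative exon has already been labelled by allele (reference or alternative) and by splice pattern (exon inclusion or skipping). The resulting 2×2 table of read counts can then be tested with a two-sided Fisher's exact test. The function name `allele_splicing_pvalue` and the table layout are hypothetical, and this is a generic test, not necessarily the statistic scAllele itself uses.

```python
from math import comb


def allele_splicing_pvalue(ref_incl: int, ref_skip: int, alt_incl: int, alt_skip: int) -> float:
    """Two-sided Fisher's exact test on a hypothetical allele x splice-pattern table.

    Rows are alleles (reference, alternative); columns are splice patterns
    (exon inclusion, exon skipping). Each argument is a read count.
    """
    row_ref = ref_incl + ref_skip
    row_alt = alt_incl + alt_skip
    col_incl = ref_incl + alt_incl
    total = row_ref + row_alt

    def tables(x: int) -> int:
        # Number of ways to get x reference-allele inclusion reads with all
        # margins fixed; proportional to the hypergeometric probability of x.
        return comb(row_ref, x) * comb(row_alt, col_incl - x)

    observed = tables(ref_incl)
    lo, hi = max(0, col_incl - row_alt), min(row_ref, col_incl)
    # Sum over all tables no more probable than the observed one; exact integer
    # weights avoid floating-point tie problems.
    extreme = sum(w for w in (tables(x) for x in range(lo, hi + 1)) if w <= observed)
    return extreme / comb(total, col_incl)


# Reference-allele reads all include the exon; alternative-allele reads all skip it.
p = allele_splicing_pvalue(ref_incl=10, ref_skip=0, alt_incl=0, alt_skip=10)
print(p)  # equals 2 / 184756, about 1.08e-05
```

If SciPy is available, `scipy.stats.fisher_exact` provides the same kind of test on a 2×2 table; the pure-Python version is shown only to make the counting explicit.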
Affiliation(s)
- Ting Fu
- Molecular, Cellular, and Integrative Physiology Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Tracey W. Chan
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Xinshu Xiao
- Department of Bioengineering, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Molecular, Cellular, and Integrative Physiology Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Integrative Biology and Physiology, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Molecular Biology Institute, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Institute for Quantitative and Computational Biosciences, University of California, Los Angeles, Los Angeles, CA 90095, USA
27
Deng W, Mou T, Pawitan Y, Vu TN. Quantification of mutant–allele expression at isoform level in cancer from RNA-seq data. NAR Genom Bioinform 2022; 4:lqac052. [PMID: 35855322 PMCID: PMC9278039 DOI: 10.1093/nargab/lqac052] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2021] [Revised: 06/26/2022] [Accepted: 07/04/2022] [Indexed: 11/13/2022]
Abstract
Even though the role of DNA mutations in cancer is well recognized, current quantification of the RNA expression, performed either at gene or isoform level, typically ignores the mutation status. Standard methods for estimating allele-specific expression (ASE) consider gene-level expression, but the functional impact of a mutation is best assessed at isoform level. Hence our goal is to quantify the mutant–allele expression at isoform level. We have developed and implemented a method, named MAX, for quantifying mutant–allele expression given a list of mutations. For a gene of interest, a mutant reference is constructed by incorporating all possible mutant versions of the wild-type isoforms in the transcriptome annotation. The mutant reference is then used for the RNA-seq reads mapping, which in principle works similarly for any quantification tool. We apply an alternating EM algorithm to the read-count data from the mapping step. In a simulation study, MAX performs well against standard isoform-quantification methods. Also, MAX achieves higher accuracy than conventional gene-based ASE methods such as ASEP. An analysis of a real dataset of acute myeloid leukemia reveals a subgroup of NPM1-mutated patients responding well to a kinase inhibitor. Our findings indicate that quantification of mutant–allele expression at isoform level is feasible and has potential added value for assessing the functional impact of DNA mutations in cancers.
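The abstract applies an EM algorithm to read counts mapped against a reference containing both wild-type and mutant isoforms. MAX's alternating scheme is not described here, so the sketch below substitutes the standard single-loop EM used in isoform quantification, under simplifying assumptions: reads are pre-grouped into classes by the set of transcripts they are compatible with, and effective-length and mapping-quality corrections are ignored. The function `em_abundance` and the class layout are hypothetical.

```python
from typing import List, Sequence, Tuple

# A read class: (indices of compatible transcripts, number of reads in the class).
ReadClass = Tuple[Sequence[int], int]


def em_abundance(classes: List[ReadClass], n_transcripts: int, n_iter: int = 500) -> List[float]:
    """Estimate relative transcript abundances by expectation-maximization.

    Simplified sketch: ignores effective transcript length and mapping quality.
    """
    if any(len(compatible) == 0 for compatible, _ in classes):
        raise ValueError("every read class needs at least one compatible transcript")
    total = sum(count for _, count in classes)
    if total == 0:
        raise ValueError("no reads to assign")
    theta = [1.0 / n_transcripts] * n_transcripts  # uniform starting point
    for _ in range(n_iter):
        expected = [0.0] * n_transcripts
        # E-step: split each class's reads among its compatible transcripts
        # in proportion to the current abundance estimates.
        for compatible, count in classes:
            if count == 0:
                continue
            norm = sum(theta[t] for t in compatible)
            for t in compatible:
                expected[t] += count * theta[t] / norm
        # M-step: new abundances are the expected read shares.
        theta = [e / total for e in expected]
    return theta


# Transcript 0: a wild-type isoform; transcript 1: the same isoform carrying the mutation.
classes = [
    ([0], 30),     # reads covering the site with the reference base
    ([1], 10),     # reads covering the site with the mutant base
    ([0, 1], 60),  # reads that do not overlap the mutation (ambiguous)
]
wt, mut = em_abundance(classes, n_transcripts=2)
print(f"mutant-allele fraction: {mut / (wt + mut):.3f}")  # 0.250
```

In this two-transcript toy case the estimate simply reproduces the ratio of allele-informative reads, 10 / (30 + 10); EM becomes useful when reads are shared among several isoforms that differ in mutation status.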
Affiliation(s)
- Wenjiang Deng
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
- Tian Mou
- School of Biomedical Engineering, Shenzhen University, Shenzhen, China
- Yudi Pawitan
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
- Trung Nghia Vu
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
28
Bothos E, Hatzis P, Moulos P. Interactive Analysis, Exploration, and Visualization of RNA-Seq Data with SeqCVIBE. Methods Protoc 2022; 5:mps5020027. [PMID: 35314664 PMCID: PMC8938808 DOI: 10.3390/mps5020027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2022] [Revised: 03/14/2022] [Accepted: 03/16/2022] [Indexed: 11/16/2022]
Abstract
The rise of modern gene expression profiling techniques, such as RNA-Seq, has generated a wealth of high-quality datasets spanning all fields of current biological research. The large datasets and the continually expanding applications for which they can be mined, such as the investigation of alternative splicing, have created novel challenges for data management, exploration, analysis, and visualization. Although a large variety of RNA-Seq data analysis software packages, both open-source and commercial, has emerged, most fail to address these challenges simultaneously and lack obvious functionalities, such as estimating RNA abundance over non-annotated genomic regions of interest in real time. We have developed SeqCVIBE, an R Shiny web application for the interactive exploration, analysis, visualization, and genome browsing of large RNA-Seq datasets. SeqCVIBE allows for multiple on-the-fly visualizations and calculations, such as differential expression analysis, averaging genomic signals over specific regions of the genome, and calculating RNA abundances over custom, potentially non-annotated regions, such as novel long non-coding RNAs. In addition, SeqCVIBE comprises a database for pre-analyzed data, where users can navigate and explore results, as well as perform a variety of basic on-the-fly analyses and export the outcomes. Finally, we demonstrate the value of SeqCVIBE in the elucidation of the interplay of a novel lincRNA, WiNTRLINC1, and Wnt signaling in colon cancer.
Affiliation(s)
- Efthimios Bothos
- Institute of Communications and Computer Systems, National Technical University of Athens, 15780 Athens, Greece
- HybridStat Predictive Analytics PC, Evrota 25, 14564 Kifisia, Greece
- Pantelis Hatzis
- Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center Alexander Fleming, Fleming 34, 16672 Vari, Greece
- Panagiotis Moulos
- HybridStat Predictive Analytics PC, Evrota 25, 14564 Kifisia, Greece
- Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center Alexander Fleming, Fleming 34, 16672 Vari, Greece
- Correspondence: ; Tel.: +30-210-9656310
29
RNA-seq for revealing the function of the transcriptome. Bioinformatics 2022. [DOI: 10.1016/b978-0-323-89775-4.00002-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
30
Berdan EL, Mérot C, Pavia H, Johannesson K, Wellenreuther M, Butlin RK. A large chromosomal inversion shapes gene expression in seaweed flies (Coelopa frigida). Evol Lett 2021; 5:607-624. [PMID: 34917400 PMCID: PMC8645196 DOI: 10.1002/evl3.260] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 06/23/2021] [Revised: 09/03/2021] [Accepted: 09/12/2021] [Indexed: 11/12/2022]
Abstract
Inversions often underlie complex adaptive traits, but the genic targets inside them are largely unknown. Gene expression profiling provides a powerful way to link inversions with their phenotypic consequences. We examined the effects of the Cf-Inv(1) inversion in the seaweed fly Coelopa frigida on gene expression variation across sexes and life stages. Our analyses revealed that Cf-Inv(1) shapes global expression patterns, most likely via linked variation, but the extent of this effect is variable, with much stronger effects in adults than larvae. Furthermore, within adults, both common and sex-specific patterns were found. The vast majority of these differentially expressed genes mapped to Cf-Inv(1). However, genes that were differentially expressed in a single context (i.e., in males, females, or larvae) were more likely to be located outside of Cf-Inv(1). By combining our findings with genomic scans for environmentally associated SNPs, we were able to pinpoint candidate variants in the inversion that may underlie mechanistic pathways that determine phenotypes. Together, the results of this study, combined with previous findings, support the notion that the polymorphic Cf-Inv(1) inversion in this species is a major factor shaping both coding and regulatory variation, resulting in highly complex adaptive effects.
Affiliation(s)
- Emma L. Berdan
- Department of Marine Sciences, University of Gothenburg, Gothenburg, SE-40530, Sweden
- Claire Mérot
- Département de Biologie, Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC G1V 0A6, Canada
- Henrik Pavia
- Department of Marine Sciences, University of Gothenburg, Gothenburg, SE-40530, Sweden
- Kerstin Johannesson
- Department of Marine Sciences, University of Gothenburg, Gothenburg, SE-40530, Sweden
- Maren Wellenreuther
- The New Zealand Institute for Plant and Food Research Ltd., Nelson 7010, New Zealand
- School of Biological Sciences, University of Auckland, Auckland 1010, New Zealand
- Roger K. Butlin
- Department of Marine Sciences, University of Gothenburg, Gothenburg, SE-40530, Sweden
- Ecology and Evolutionary Biology, School of Biosciences, University of Sheffield, Sheffield S10 2TN, United Kingdom
31
Alonso L, Piron A, Morán I, Guindo-Martínez M, Bonàs-Guarch S, Atla G, Miguel-Escalada I, Royo R, Puiggròs M, Garcia-Hurtado X, Suleiman M, Marselli L, Esguerra JLS, Turatsinze JV, Torres JM, Nylander V, Chen J, Eliasson L, Defrance M, Amela R, Mulder H, Gloyn AL, Groop L, Marchetti P, Eizirik DL, Ferrer J, Mercader JM, Cnop M, Torrents D. TIGER: The gene expression regulatory variation landscape of human pancreatic islets. Cell Rep 2021; 37:109807. [PMID: 34644572 PMCID: PMC8864863 DOI: 10.1016/j.celrep.2021.109807] [Citation(s) in RCA: 48] [Impact Index Per Article: 12.0] [Received: 05/28/2021] [Revised: 07/23/2021] [Accepted: 09/16/2021] [Indexed: 12/30/2022]
Abstract
Genome-wide association studies (GWASs) have identified hundreds of signals associated with type 2 diabetes (T2D). To gain insight into their underlying molecular mechanisms, we have created the translational human pancreatic islet genotype tissue-expression resource (TIGER), aggregating >500 human islet genomic datasets from five cohorts in the Horizon 2020 consortium T2DSystems. We impute genotypes using four reference panels and meta-analyze cohorts to improve the coverage of expression quantitative trait loci (eQTL) and develop a method to combine allele-specific expression across samples (cASE). We identify >1 million islet eQTLs, 53 of which colocalize with T2D signals. Among them, a low-frequency allele that reduces T2D risk by half increases CCND2 expression. We identify eight cASE colocalizations, among which we found a T2D-associated SLC30A8 variant. We make all data available through the TIGER portal (http://tiger.bsc.es), which represents a comprehensive human islet genomic data resource to elucidate how genetic variation affects islet function and translates into therapeutic insight and precision medicine for T2D.
Affiliation(s)
- Lorena Alonso
- Life Sciences Department, Barcelona Supercomputing Center (BSC), Barcelona 08034, Spain
- Anthony Piron
- ULB Center for Diabetes Research, Université Libre de Bruxelles, Brussels 1070, Belgium; Interuniversity Institute of Bioinformatics in Brussels (IB2), Brussels 1050, Belgium
- Ignasi Morán
- Life Sciences Department, Barcelona Supercomputing Center (BSC), Barcelona 08034, Spain
- Marta Guindo-Martínez
- Life Sciences Department, Barcelona Supercomputing Center (BSC), Barcelona 08034, Spain
- Sílvia Bonàs-Guarch
- Bioinformatics and Genomics Program, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology (BIST), Barcelona 08003, Spain; Centro de Investigación Biomédica en Red de Diabetes y Enfermedades Metabólicas Asociadas (CIBERDEM), Barcelona 08013, Spain
- Goutham Atla
- Bioinformatics and Genomics Program, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology (BIST), Barcelona 08003, Spain; Centro de Investigación Biomédica en Red de Diabetes y Enfermedades Metabólicas Asociadas (CIBERDEM), Barcelona 08013, Spain
- Irene Miguel-Escalada
- Bioinformatics and Genomics Program, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology (BIST), Barcelona 08003, Spain; Centro de Investigación Biomédica en Red de Diabetes y Enfermedades Metabólicas Asociadas (CIBERDEM), Barcelona 08013, Spain
- Romina Royo
- Life Sciences Department, Barcelona Supercomputing Center (BSC), Barcelona 08034, Spain
- Montserrat Puiggròs
- Life Sciences Department, Barcelona Supercomputing Center (BSC), Barcelona 08034, Spain
- Xavier Garcia-Hurtado
- Bioinformatics and Genomics Program, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology (BIST), Barcelona 08003, Spain; Centro de Investigación Biomédica en Red de Diabetes y Enfermedades Metabólicas Asociadas (CIBERDEM), Barcelona 08013, Spain
- Mara Suleiman
- Department of Clinical and Experimental Medicine and AOUP Cisanello University Hospital, University of Pisa, Pisa 56126, Italy
- Lorella Marselli
- Department of Clinical and Experimental Medicine and AOUP Cisanello University Hospital, University of Pisa, Pisa 56126, Italy
- Jonathan L S Esguerra
- Unit of Islet Cell Exocytosis, Lund University Diabetes Centre, Malmö 214 28, Sweden
- Jason M Torres
- Clinical Trial Service Unit and Epidemiological Studies Unit, Nuffield Department of Population Health, University of Oxford, Oxford OX3 7LF, UK; Wellcome Centre for Human Genetics, Nuffield Department of Medicine, University of Oxford, Oxford OX3 7LF, UK
- Vibe Nylander
- Oxford Centre for Diabetes, Endocrinology, and Metabolism, Radcliffe Department of Medicine, University of Oxford, Oxford OX3 7LE, UK
- Ji Chen
- Exeter Centre of Excellence for Diabetes Research (EXCEED), University of Exeter Medical School, Exeter EX4 4PY, UK
- Lena Eliasson
- Unit of Islet Cell Exocytosis, Lund University Diabetes Centre, Malmö 214 28, Sweden
- Matthieu Defrance
- ULB Center for Diabetes Research, Université Libre de Bruxelles, Brussels 1070, Belgium
- Ramon Amela
- Life Sciences Department, Barcelona Supercomputing Center (BSC), Barcelona 08034, Spain
- Hindrik Mulder
- Unit of Molecular Metabolism, Lund University Diabetes Centre, Malmö 214 28, Sweden
- Anna L Gloyn
- Wellcome Centre for Human Genetics, Nuffield Department of Medicine, University of Oxford, Oxford OX3 7LF, UK; Oxford Centre for Diabetes, Endocrinology, and Metabolism, Radcliffe Department of Medicine, University of Oxford, Oxford OX3 7LE, UK; Division of Endocrinology, Department of Pediatrics, Stanford University School of Medicine, Stanford, CA 94304, USA; NIHR Oxford Biomedical Research Centre, Churchill Hospital, Oxford OX3 7DQ, UK; Stanford Diabetes Research Centre, Stanford University, Stanford, CA 94305, USA
- Leif Groop
- Unit of Islet Cell Exocytosis, Lund University Diabetes Centre, Malmö 214 28, Sweden; Unit of Molecular Metabolism, Lund University Diabetes Centre, Malmö 214 28, Sweden; Finnish Institute of Molecular Medicine Finland (FIMM), Helsinki University, Helsinki 00014, Finland
- Piero Marchetti
- Department of Clinical and Experimental Medicine and AOUP Cisanello University Hospital, University of Pisa, Pisa 56126, Italy
- Decio L Eizirik
- ULB Center for Diabetes Research, Université Libre de Bruxelles, Brussels 1070, Belgium; WELBIO, Université Libre de Bruxelles, Brussels 1050, Belgium
- Jorge Ferrer
- Bioinformatics and Genomics Program, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology (BIST), Barcelona 08003, Spain; Centro de Investigación Biomédica en Red de Diabetes y Enfermedades Metabólicas Asociadas (CIBERDEM), Barcelona 08013, Spain; Section of Epigenomics and Disease, Department of Medicine, Imperial College London, London SW7 2AZ, UK
- Josep M Mercader
- Life Sciences Department, Barcelona Supercomputing Center (BSC), Barcelona 08034, Spain; Programs in Metabolism and Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Diabetes Unit and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Department of Medicine, Harvard Medical School, Boston, MA 02115, USA
- Miriam Cnop
- ULB Center for Diabetes Research, Université Libre de Bruxelles, Brussels 1070, Belgium; Division of Endocrinology, Erasmus Hospital, Université Libre de Bruxelles, Brussels 1070, Belgium
- David Torrents
- Life Sciences Department, Barcelona Supercomputing Center (BSC), Barcelona 08034, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona 08010, Spain
32
Combining Multiple RNA-Seq Data Analysis Algorithms Using Machine Learning Improves Differential Isoform Expression Analysis. Methods Protoc 2021; 4:mps4040068. [PMID: 34698224 PMCID: PMC8544431 DOI: 10.3390/mps4040068] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/23/2021] [Revised: 08/22/2021] [Accepted: 09/24/2021] [Indexed: 12/13/2022]
Abstract
RNA sequencing has become the standard technique for high resolution genome-wide monitoring of gene expression. As such, it often comprises the first step towards understanding complex molecular mechanisms driving various phenotypes, spanning organ development to disease genesis, monitoring and progression. An advantage of RNA sequencing is its ability to capture complex transcriptomic events such as alternative splicing which results in alternate isoform abundance. At the same time, this advantage remains algorithmically and computationally challenging, especially with the emergence of even higher resolution technologies such as single-cell RNA sequencing. Although several algorithms have been proposed for the effective detection of differential isoform expression from RNA-Seq data, no widely accepted gold standards have been established. This fact is further compounded by the significant differences in the output of different algorithms when applied to the same data. In addition, many of the proposed algorithms remain scarce and poorly maintained. Driven by these challenges, we developed a novel integrative approach that effectively combines the most widely used algorithms for differential transcript and isoform analysis using state-of-the-art machine learning techniques. We demonstrate its usability by applying it to simulated data based on several organisms, and using several performance metrics; we conclude that our strategy outperforms the application of the individual algorithms. Finally, our approach is implemented as an R Shiny application, with the underlying data analysis pipelines also available as docker containers.
33
Cosentino RO, Brink BG, Siegel TN. Allele-specific assembly of a eukaryotic genome corrects apparent frameshifts and reveals a lack of nonsense-mediated mRNA decay. NAR Genom Bioinform 2021; 3:lqab082. [PMID: 34541528 PMCID: PMC8445201 DOI: 10.1093/nargab/lqab082] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 06/18/2021] [Revised: 08/25/2021] [Accepted: 09/06/2021] [Indexed: 11/14/2022]
Abstract
To date, most reference genomes represent a mosaic consensus sequence in which the homologous chromosomes are collapsed into one sequence. This approach produces sequence artefacts and impedes analyses of allele-specific mechanisms. Here, we report an allele-specific genome assembly of the diploid parasite Trypanosoma brucei and reveal allelic variants affecting gene expression. Using long-read sequencing and chromosome conformation capture data, we could assign 99.5% of all heterozygous variants to a specific homologous chromosome and build a 66 Mb allele-specific genome assembly. The phasing of haplotypes allowed us to resolve hundreds of artefacts present in the previous mosaic consensus assembly. In addition, it revealed allelic recombination events, visible as regions of low allelic heterozygosity, enabling the lineage tracing of T. brucei isolates. Interestingly, analyses of transcriptome and translatome data of genes with allele-specific premature termination codons point to the absence of a nonsense-mediated decay mechanism in trypanosomes. Taken together, this study delivers a reference quality allele-specific genome assembly of T. brucei and demonstrates the importance of such assemblies for the study of gene expression control. We expect the new genome assembly will increase the awareness of allele-specific phenomena and provide a platform to investigate them.
Affiliation(s)
- Raúl O Cosentino
- Division of Experimental Parasitology, Faculty of Veterinary Medicine, Ludwig-Maximilians-Universität in Munich, Lena-Christ-Str. 48, Planegg-Martinsried 82152, Germany
- Benedikt G Brink
- Division of Experimental Parasitology, Faculty of Veterinary Medicine, Ludwig-Maximilians-Universität in Munich, Lena-Christ-Str. 48, Planegg-Martinsried 82152, Germany
- T Nicolai Siegel
- Division of Experimental Parasitology, Faculty of Veterinary Medicine, Ludwig-Maximilians-Universität in Munich, Lena-Christ-Str. 48, Planegg-Martinsried 82152, Germany
34
Thind AS, Monga I, Thakur PK, Kumari P, Dindhoria K, Krzak M, Ranson M, Ashford B. Demystifying emerging bulk RNA-Seq applications: the application and utility of bioinformatic methodology. Brief Bioinform 2021; 22:6330938. [PMID: 34329375 DOI: 10.1093/bib/bbab259] [Citation(s) in RCA: 47] [Impact Index Per Article: 11.8] [Received: 03/28/2021] [Revised: 06/14/2021] [Accepted: 06/18/2021] [Indexed: 12/13/2022]
Abstract
Significant innovations in next-generation sequencing techniques and bioinformatics tools have impacted our appreciation and understanding of RNA. Practical RNA sequencing (RNA-Seq) applications have evolved alongside advances in sequencing technology and bioinformatic tools. In most projects, bulk RNA-Seq data are used to measure gene expression patterns, isoform expression, alternative splicing and single-nucleotide polymorphisms. However, RNA-Seq holds far more hidden biological information, including copy number alterations, microbial contamination, transposable elements, cell type composition (deconvolution) and the presence of neoantigens. Recently developed bioinformatic algorithms can retrieve this information from bulk RNA-Seq data, thus broadening its scope. This review focuses on these emerging bulk RNA-Seq-based analyses, emphasizing less familiar and underused applications. In doing so, we highlight the power of bulk RNA-Seq in providing biological insights.
Affiliation(s)
- Amarinder Singh Thind
- University of Wollongong, Wollongong, Australia; Illawarra Health and Medical Research Institute, Wollongong, Australia
- Isha Monga
- Columbia University, New York City, NY, USA
- Pallawi Kumari
- Institute of Microbial Technology, Council of Scientific and Industrial Research, Chandigarh, India
- Kiran Dindhoria
- Institute of Microbial Technology, Council of Scientific and Industrial Research, Chandigarh, India
- Marie Ranson
- University of Wollongong, Wollongong, Australia; Illawarra Health and Medical Research Institute, Wollongong, Australia
- Bruce Ashford
- University of Wollongong, Wollongong, Australia; Illawarra Health and Medical Research Institute, Wollongong, Australia
35
Abstract
Diploidy has profound implications for population genetics and susceptibility to genetic diseases. Although two copies are present for most genes in the human genome, they are not necessarily both active or active at the same level in a given individual. Genomic imprinting, resulting in exclusive or biased expression in favor of the allele of paternal or maternal origin, is now believed to affect hundreds of human genes. A far greater number of genes display unequal expression of gene copies due to cis-acting genetic variants that perturb gene expression. The availability of data generated by RNA sequencing applied to large numbers of individuals and tissue types has generated unprecedented opportunities to assess the contribution of genetic variation to allelic imbalance in gene expression. Here we review the insights gained through the analysis of these data about the extent of the genetic contribution to allelic expression imbalance, the tools and statistical models for gene expression imbalance, and what the results obtained reveal about the contribution of genetic variants that alter gene expression to complex human diseases and phenotypes.
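The abstract mentions statistical models for allelic expression imbalance. A minimal sketch of the simplest such model is an exact two-sided binomial test on the reference and alternative read counts at one heterozygous site. It assumes unbiased read mapping and independent reads; the function names are hypothetical, and many published allele-specific expression methods instead use beta-binomial models to absorb overdispersion.

```python
from math import exp, lgamma, log


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p), computed in log space to avoid overflow."""
    log_pmf = (
        lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
        + k * log(p) + (n - k) * log(1 - p)
    )
    return exp(log_pmf)


def allelic_imbalance_test(ref_count: int, alt_count: int, p_null: float = 0.5):
    """Exact two-sided binomial test for allelic imbalance at one heterozygous site.

    Null hypothesis: each read comes from the alternative allele with
    probability p_null (0.5 = balanced expression, no mapping bias assumed).
    Returns (alternative-allele fraction, two-sided p-value).
    """
    if not 0 < p_null < 1:
        raise ValueError("p_null must lie strictly between 0 and 1")
    n = ref_count + alt_count
    if n == 0:
        raise ValueError("no reads overlap the heterozygous site")
    pmfs = [binom_pmf(k, n, p_null) for k in range(n + 1)]
    observed = pmfs[alt_count]
    # Two-sided p-value: total probability of every outcome that is no more
    # likely than the observed one; the small tolerance keeps exact ties.
    p_value = sum(prob for prob in pmfs if prob <= observed * (1 + 1e-7))
    return alt_count / n, min(1.0, p_value)


if __name__ == "__main__":
    # Hypothetical counts: 30 reference and 10 alternative reads.
    fraction, p = allelic_imbalance_test(ref_count=30, alt_count=10)
    print(f"alt fraction = {fraction:.2f}, p = {p:.4f}")
```

With these made-up counts the alternative-allele fraction is 0.25 and the p-value is about 0.002, whereas 20 reads per allele gives a p-value of 1. For production analyses a library routine such as scipy.stats.binomtest would normally replace the hand-written test.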
Affiliation(s)
- Siobhan Cleary
- School of Mathematics, Statistics and Applied Mathematics, National University of Ireland, Galway H91 H3CY, Ireland
- Cathal Seoighe
- School of Mathematics, Statistics and Applied Mathematics, National University of Ireland, Galway H91 H3CY, Ireland
36
Degtyareva AO, Antontseva EV, Merkulova TI. Regulatory SNPs: Altered Transcription Factor Binding Sites Implicated in Complex Traits and Diseases. Int J Mol Sci 2021; 22:6454. [PMID: 34208629 PMCID: PMC8235176 DOI: 10.3390/ijms22126454] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Received: 05/05/2021] [Revised: 06/15/2021] [Accepted: 06/15/2021] [Indexed: 12/19/2022]
Abstract
The vast majority of the genetic variants (mainly SNPs) associated with various human traits and diseases map to noncoding parts of the genome and are enriched in its regulatory compartment, suggesting that many causal variants may affect gene expression. The leading mechanism of action of these SNPs is altered transcription factor binding, through the creation or disruption of transcription factor binding sites (TFBSs) or a change in the affinity of these regulatory proteins for their cognate sites. In this review, we first focus on the history of the discovery of regulatory SNPs (rSNPs) and give a systematic description of the existing methodological approaches to their study. Then, we briefly review recent comprehensive examples of rSNPs that were studied all the way from the discovery of a change in the TFBS sequence caused by a nucleotide substitution, through the identification of its effect on target gene expression, to its effect on phenotype. We also describe state-of-the-art genome-wide approaches to the identification of regulatory variants, including both approaches that make molecular sense of genome-wide association studies (GWAS) and alternative approaches whose primary goal is to determine the functionality of genetic variants. Among these approaches, special attention is paid to expression quantitative trait loci (eQTL) analysis and the search for allele-specific events in RNA-seq (ASE events) as well as in ChIP-seq, DNase-seq, and ATAC-seq (ASB events) data.
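The TFBS-disruption mechanism described above is often quantified by scoring the reference and alternative alleles against a position weight matrix (PWM) for the transcription factor. The sketch below is a minimal illustration under stated assumptions: the 4-position count matrix is invented for the example (not taken from a motif database such as JASPAR), the background base composition is assumed uniform, only the forward strand is scanned, and all function names are hypothetical. Each flank should be at least the motif width minus one so that every window overlapping the SNP is scored.

```python
import math

# Hypothetical base counts for an illustrative 4-bp motif (one dict per position).
# Invented for this example; not taken from JASPAR or any other database.
COUNTS = [
    {"A": 8, "C": 0, "G": 1, "T": 1},
    {"A": 0, "C": 9, "G": 0, "T": 1},
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 1, "C": 1, "G": 0, "T": 8},
]
BACKGROUND = {base: 0.25 for base in "ACGT"}  # assumed uniform base composition


def log_odds_matrix(counts, pseudocount=0.5):
    """Turn per-position base counts into log2(P(base | motif) / P(base | background))."""
    matrix = []
    for row in counts:
        total = sum(row.values()) + 4 * pseudocount
        matrix.append(
            {b: math.log2((row[b] + pseudocount) / total / BACKGROUND[b]) for b in "ACGT"}
        )
    return matrix


def best_window_score(seq, matrix):
    """Highest PWM score over all motif-width windows of seq (forward strand only)."""
    width = len(matrix)
    return max(
        sum(matrix[i][seq[start + i]] for i in range(width))
        for start in range(len(seq) - width + 1)
    )


def allelic_score_change(flank5, ref, alt, flank3, matrix):
    """Alt-minus-ref change in the best motif score around a SNP."""
    ref_score = best_window_score(flank5 + ref + flank3, matrix)
    alt_score = best_window_score(flank5 + alt + flank3, matrix)
    return alt_score - ref_score


if __name__ == "__main__":
    pwm = log_odds_matrix(COUNTS)
    # Hypothetical C>A SNP inside an ACGT site: TTA[C/A]GTTT
    delta = allelic_score_change("TTA", "C", "A", "GTTT", pwm)
    print(f"score change (alt - ref) = {delta:.2f}")
```

Here the made-up C>A substitution destroys the illustrative ACGT site, so the score change is negative, which would be read as a predicted loss of binding; a positive change would suggest a created or strengthened site.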
Affiliation(s)
- Arina O. Degtyareva
- Department of Molecular Genetic, Institute of Cytology and Genetics, 630090 Novosibirsk, Russia
- Elena V. Antontseva
- Department of Molecular Genetic, Institute of Cytology and Genetics, 630090 Novosibirsk, Russia
- Tatiana I. Merkulova
- Department of Molecular Genetic, Institute of Cytology and Genetics, 630090 Novosibirsk, Russia
- Department of Natural Sciences, Novosibirsk State University, 630090 Novosibirsk, Russia
37
Muto Y, Wilson PC, Ledru N, Wu H, Dimke H, Waikar SS, Humphreys BD. Single cell transcriptional and chromatin accessibility profiling redefine cellular heterogeneity in the adult human kidney. Nat Commun 2021; 12:2190. [PMID: 33850129 PMCID: PMC8044133 DOI: 10.1038/s41467-021-22368-w] [Citation(s) in RCA: 279] [Impact Index Per Article: 69.8] [Received: 06/18/2020] [Accepted: 03/11/2021] [Indexed: 12/15/2022]
Abstract
The integration of single cell transcriptome and chromatin accessibility datasets enables a deeper understanding of cell heterogeneity. We performed single nucleus ATAC (snATAC-seq) and RNA (snRNA-seq) sequencing to generate paired, cell-type-specific chromatin accessibility and transcriptional profiles of the adult human kidney. We demonstrate that snATAC-seq is comparable to snRNA-seq in the assignment of cell identity and can further refine our understanding of functional heterogeneity in the nephron. The majority of differentially accessible chromatin regions are localized to promoters and a significant proportion are closely associated with differentially expressed genes. Cell-type-specific enrichment of transcription factor binding motifs implicates the activation of NF-κB that promotes VCAM1 expression and drives transition between a subpopulation of proximal tubule epithelial cells. Our multi-omics approach improves the ability to detect unique cell states within the kidney and redefines cellular heterogeneity in the proximal tubule and thick ascending limb.
Affiliation(s)
- Yoshiharu Muto
- Division of Nephrology, Department of Medicine, Washington University in St. Louis, St. Louis, MO, USA
- Parker C Wilson
- Department of Pathology and Immunology, Washington University in St. Louis, St. Louis, MO, USA
- Nicolas Ledru
- Division of Nephrology, Department of Medicine, Washington University in St. Louis, St. Louis, MO, USA
- Haojia Wu
- Division of Nephrology, Department of Medicine, Washington University in St. Louis, St. Louis, MO, USA
- Henrik Dimke
- Department of Cardiovascular and Renal Research, Institute of Molecular Medicine, University of Southern Denmark, Odense, Denmark
- Department of Nephrology, Odense University Hospital, Odense, Denmark
- Sushrut S Waikar
- Section of Nephrology, Department of Medicine, Boston University School of Medicine and Boston Medical Center, Boston, MA, USA
- Benjamin D Humphreys
- Division of Nephrology, Department of Medicine, Washington University in St. Louis, St. Louis, MO, USA
- Department of Developmental Biology, Washington University in St. Louis, St. Louis, MO, USA
38
Ura H, Togi S, Niida Y. Targeted Double-Stranded cDNA Sequencing-Based Phase Analysis to Identify Compound Heterozygous Mutations and Differential Allelic Expression. Biology (Basel) 2021; 10:biology10040256. [PMID: 33804940 PMCID: PMC8063809 DOI: 10.3390/biology10040256] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/05/2021] [Revised: 03/22/2021] [Accepted: 03/22/2021] [Indexed: 11/16/2022]
Abstract
Simple Summary
Phase analysis to distinguish between in cis and in trans heterozygous mutations is important for clinical diagnosis because in trans compound heterozygous mutations cause autosomal recessive diseases. However, conventional phase analysis is limited by the large size of the target region in genomic DNA. Here, we performed a targeted double-stranded cDNA sequencing-based phase analysis that overcomes this distance limitation by using direct adapter ligation library preparation and paired-end sequencing; we showed that two heterozygous mutations in a patient with Wilson disease are compound heterozygous in trans. Furthermore, we detected differential allelic expression. Our results indicate that targeted double-stranded cDNA sequencing-based phase analysis is useful for determining compound heterozygous mutations and provides information on allelic expression.
Abstract
There are two combinations of heterozygous mutations: in trans, which carries mutations on different alleles, and in cis, which carries mutations on the same allele. Because only in trans compound heterozygous mutations have been implicated in autosomal recessive diseases, it is important to distinguish them for clinical diagnosis. However, conventional phase analysis is limited by the large size of the target region in genomic DNA. Here, we performed a genetic analysis on a patient with Wilson disease and detected two heterozygous mutations, chr13:51958362;G>GG (NM_000053.4:c.2304dup r.2304dup p.Met769HisfsTer26) and chr13:51964900;C>T (NM_000053.4:c.1841G>A r.1841g>a p.Gly614Asp), in the causative gene ATP7B. The distance between the two mutations was 6.5 kb in genomic DNA but 464 bp in mRNA. Targeted double-stranded cDNA sequencing-based phase analysis was performed using direct adapter ligation library preparation and paired-end sequencing, and we showed that they are in trans compound heterozygous mutations. Trio analysis showed that the mutation chr13:51964900;C>T was derived from the father and the other mutation from the mother, validating that the mutations are in trans. Furthermore, targeted double-stranded cDNA sequencing-based phase analysis detected differential allelic expression, suggesting that the mutation chr13:51958362;G>GG downregulated expression through nonsense-mediated mRNA decay. Our results indicate that targeted double-stranded cDNA sequencing-based phase analysis is useful for determining compound heterozygous mutations and provides information on allelic expression.
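The phase analysis described above rests on a simple idea: because the two variants lie only 464 bp apart in mRNA, a single cDNA read pair can cover both, and the combination of alleles it carries shows whether the mutations sit on the same molecule. The sketch below assumes the alleles at both sites have already been extracted from each read pair; the tuple encoding and the function name are hypothetical, and the plain majority vote is an illustration rather than the authors' actual procedure.

```python
from collections import Counter


def phase_two_variants(fragments):
    """Call two heterozygous variants in cis or in trans from spanning read pairs.

    fragments: one (allele_at_site1, allele_at_site2) tuple per read pair that
    covers both positions, each allele given as "ref" or "alt" (hypothetical
    encoding). Returns (call, share of trans molecules carrying the site-1 mutation).
    """
    counts = Counter(fragments)
    # In cis: one haplotype carries both mutations and the other carries neither.
    cis = counts[("alt", "alt")] + counts[("ref", "ref")]
    # In trans: every molecule carries exactly one of the two mutations.
    trans = counts[("alt", "ref")] + counts[("ref", "alt")]
    if cis == trans:
        return "undetermined", None
    if cis > trans:
        return "in cis", None
    # For variants in trans, the balance between the two haplotype classes
    # estimates the relative expression of the two alleles.
    return "in trans", counts[("alt", "ref")] / trans


if __name__ == "__main__":
    # Made-up read-pair observations, not data from the study.
    reads = [("alt", "ref")] * 12 + [("ref", "alt")] * 36 + [("alt", "alt")]
    print(phase_two_variants(reads))
```

In this made-up example the variants are called in trans and the site-1 allele accounts for a quarter of the phased molecules; a skew of that kind is what would prompt a closer look at differential allelic expression, for instance from nonsense-mediated decay of one allele.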
Affiliation(s)
- Hiroki Ura
- Center for Clinical Genomics, Kanazawa Medical University Hospital, 1-1 Daigaku, Uchinada, Kahoku, Ishikawa 920-0923, Japan
- Division of Genomic Medicine, Department of Advanced Medicine, Medical Research Institute, Kanazawa Medical University, 1-1 Daigaku, Uchinada, Kahoku, Ishikawa 920-0923, Japan
- Correspondence: Tel.: +81-076-286-2211 (ext. 8353)
- Sumihito Togi
- Center for Clinical Genomics, Kanazawa Medical University Hospital, 1-1 Daigaku, Uchinada, Kahoku, Ishikawa 920-0923, Japan
- Division of Genomic Medicine, Department of Advanced Medicine, Medical Research Institute, Kanazawa Medical University, 1-1 Daigaku, Uchinada, Kahoku, Ishikawa 920-0923, Japan
- Yo Niida
- Center for Clinical Genomics, Kanazawa Medical University Hospital, 1-1 Daigaku, Uchinada, Kahoku, Ishikawa 920-0923, Japan
- Division of Genomic Medicine, Department of Advanced Medicine, Medical Research Institute, Kanazawa Medical University, 1-1 Daigaku, Uchinada, Kahoku, Ishikawa 920-0923, Japan
39
Fan J, Wang X, Xiao R, Li M. Detecting cell-type-specific allelic expression imbalance by integrative analysis of bulk and single-cell RNA sequencing data. PLoS Genet 2021; 17:e1009080. [PMID: 33661921 PMCID: PMC7963069 DOI: 10.1371/journal.pgen.1009080] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 08/22/2020] [Revised: 03/16/2021] [Accepted: 02/09/2021] [Indexed: 12/27/2022]
Abstract
Allelic expression imbalance (AEI), quantified by the relative expression of two alleles of a gene in a diploid organism, can help explain phenotypic variations among individuals. Traditional methods detect AEI using bulk RNA sequencing (RNA-seq) data, a data type that averages out cell-to-cell heterogeneity in gene expression across cell types. Since the patterns of AEI may vary across different cell types, it is desirable to study AEI in a cell-type-specific manner. Although this can be achieved by single-cell RNA sequencing (scRNA-seq), it requires full-length transcripts to be sequenced in single cells from a large number of individuals, data that are still cost-prohibitive to generate. To overcome this limitation and utilize the vast amount of existing disease-relevant bulk tissue RNA-seq data, we developed BSCET, which enables the characterization of cell-type-specific AEI in bulk RNA-seq data by integrating cell type composition information inferred from a small set of scRNA-seq samples, possibly obtained from an external dataset. By modeling covariate effects, BSCET can also detect genes whose cell-type-specific AEI is associated with clinical factors. Through extensive benchmark evaluations, we show that BSCET correctly detects genes with cell-type-specific AEI and differential AEI between healthy and diseased samples using bulk RNA-seq data. BSCET also uncovered cell-type-specific AEIs that were missed in bulk data analysis when the directions of AEI are opposite in different cell types. We further applied BSCET to two pancreatic islet bulk RNA-seq datasets and detected genes showing cell-type-specific AEI that are related to the progression of type 2 diabetes. Since bulk RNA-seq data are easily accessible, BSCET provides a convenient tool to integrate information from scRNA-seq data to gain insight into AEI with cell type resolution. Results from such analysis will advance our understanding of cell type contributions in human diseases.
Author summary
Detection of allelic expression imbalance (AEI), a phenomenon where the two alleles of a gene differ in their expression magnitude, is a key step towards the understanding of phenotypic variations among individuals. Existing methods detect AEI using bulk RNA sequencing (RNA-seq) data and ignore AEI variations among different cell types. Although single-cell RNA sequencing (scRNA-seq) has enabled the characterization of cell-to-cell heterogeneity in gene expression, the high costs have limited its application in AEI analysis. To overcome this limitation, we developed BSCET to characterize cell-type-specific AEI using the widely available bulk RNA-seq data by integrating cell-type composition information inferred from scRNA-seq samples. Since the degree of AEI may vary with disease phenotypes, we further extended BSCET to detect genes whose cell-type-specific AEIs are associated with clinical factors. Through extensive benchmark evaluations and analyses of two pancreatic islet bulk RNA-seq datasets, we demonstrated BSCET’s ability to refine bulk-level AEI to cell-type resolution, and to identify genes whose cell-type-specific AEIs are associated with the progression of type 2 diabetes. With the vast amount of easily accessible bulk RNA-seq data, we believe BSCET will be a valuable tool for elucidating cell type contributions in human diseases.
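To see why bulk data can miss cell-type-specific AEI when the directions are opposite, consider a simplified forward model (an assumption for illustration, not BSCET's actual statistical model): each cell type contributes reads in proportion to its abundance times its per-cell expression of the gene, and the bulk alternative-allele fraction is the weighted average of the cell-type-specific fractions. The function name and parameters are hypothetical.

```python
def bulk_alt_fraction(proportions, expression, alt_fractions):
    """Alternative-allele fraction expected in bulk RNA-seq for one gene.

    proportions[k]   : fraction of cells of type k in the tissue
    expression[k]    : relative per-cell expression of the gene in type k
    alt_fractions[k] : alternative-allele fraction of the gene's transcripts in type k
    Each cell type contributes reads in proportion to proportions[k] * expression[k].
    """
    weights = [p * e for p, e in zip(proportions, expression)]
    total = sum(weights)
    if total == 0:
        raise ValueError("gene is not expressed in any cell type")
    return sum(w * f for w, f in zip(weights, alt_fractions)) / total


if __name__ == "__main__":
    # Two hypothetical cell types with opposite imbalance, equal abundance and expression.
    print(bulk_alt_fraction([0.5, 0.5], [1.0, 1.0], [0.8, 0.2]))
    # Same cell types, but one now dominates the tissue.
    print(bulk_alt_fraction([0.9, 0.1], [1.0, 1.0], [0.8, 0.2]))
```

With equal proportions and expression, opposite imbalances of 0.8 and 0.2 average to a balanced 0.5 in bulk, while shifting the composition to 90:10 moves the bulk fraction to 0.74. Recovering the cell-type-specific fractions from bulk data therefore depends on knowing the cell type composition, which is the information BSCET draws from scRNA-seq samples.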
Affiliation(s)
- Jiaxin Fan
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, United States of America
- Xuran Wang
- Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Rui Xiao
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, United States of America
- Corresponding author
- Mingyao Li
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, United States of America
- Corresponding author
40
aScan: A Novel Method for the Study of Allele Specific Expression in Single Individuals. J Mol Biol 2021; 433:166829. [PMID: 33508309 DOI: 10.1016/j.jmb.2021.166829] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/11/2020] [Revised: 01/08/2021] [Accepted: 01/09/2021] [Indexed: 02/06/2023]
Abstract
In diploid organisms, two copies of each gene are normally inherited, one from each parent. Paternal and maternal alleles can be regulated and expressed unequally, which is referred to as allele-specific expression (ASE). In this work, we present aScan, a novel method for the identification of ASE from the analysis of matched individual genomic and RNA sequencing data. By performing extensive analyses of both real and simulated data, we demonstrate that aScan can correctly identify ASE with high accuracy and sensitivity in different experimental settings. Additionally, by applying our method to a small cohort of individuals that are not included in publicly available databases of human genetic variation, we outline the value of ASE analysis in single individuals for deriving a more accurate annotation of "private" low-frequency genetic variants associated with regulatory effects on transcription. All in all, we believe that aScan will represent a beneficial addition to the set of bioinformatics tools for the analysis of ASE. Finally, while our method was initially conceived for the analysis of RNA-seq data, it can in principle be applied to any quantitative NGS assay for which matched genotypic and expression data are available.
AVAILABILITY: aScan is currently available as an open source standalone software package at https://github.com/Federico77z/aScan/. aScan version 1.0.3, available at https://github.com/Federico77z/aScan/releases/tag/1.0.3, was used for all the analyses included in this manuscript. A Docker image of the tool is also available at https://github.com/pmandreoli/aScanDocker.