1
Wan J, van Ouwerkerk A, Mouren JC, Heredia C, Pradel L, Ballester B, Andrau JC, Spicuglia S. Comprehensive mapping of genetic variation at Epromoters reveals pleiotropic association with multiple disease traits. Nucleic Acids Res 2025; 53:gkae1270. [PMID: 39727170 PMCID: PMC11879118 DOI: 10.1093/nar/gkae1270] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2024] [Revised: 10/28/2024] [Accepted: 12/19/2024] [Indexed: 12/28/2024] Open
Abstract
There is growing evidence that a wide range of human diseases and physiological traits are influenced by genetic variation of cis-regulatory elements. We and others have shown that a subset of promoter elements, termed Epromoters, also function as enhancer regulators of distal genes. This opens a paradigm in the study of regulatory variants, as single nucleotide polymorphisms (SNPs) within Epromoters might influence the expression of several (distal) genes at the same time, which could disentangle the identification of disease-associated genes. Here, we built a comprehensive resource of human Epromoters using newly generated and publicly available high-throughput reporter assays. We showed that Epromoters display intrinsic and epigenetic features that distinguish them from typical promoters. By integrating Genome-Wide Association Studies (GWAS), expression Quantitative Trait Loci (eQTLs) and 3D chromatin interactions, we found that regulatory variants at Epromoters are concurrently associated with more disease and physiological traits, as compared with typical promoters. To dissect the regulatory impact of Epromoter variants, we evaluated their impact on regulatory activity by analyzing allelic-specific high-throughput reporter assays and provided reliable examples of pleiotropic Epromoters. In summary, our study represents a comprehensive resource of regulatory variants supporting the pleiotropic role of Epromoters.
Affiliation(s)
- Jing Wan
  - Aix-Marseille University, INSERM, TAGC, UMR 1090 Marseille, France
  - Equipe Labellisée LIGUE, 2023 Marseille, France
- Antoinette van Ouwerkerk
  - Aix-Marseille University, INSERM, TAGC, UMR 1090 Marseille, France
  - Equipe Labellisée LIGUE, 2023 Marseille, France
- Carla Heredia
  - Institut de Génétique Moléculaire de Montpellier, University of Montpellier, CNRS, UMR 5535, Montpellier, France
- Lydie Pradel
  - Aix-Marseille University, INSERM, TAGC, UMR 1090 Marseille, France
  - Equipe Labellisée LIGUE, 2023 Marseille, France
- Benoit Ballester
  - Aix-Marseille University, INSERM, TAGC, UMR 1090 Marseille, France
- Jean-Christophe Andrau
  - Institut de Génétique Moléculaire de Montpellier, University of Montpellier, CNRS, UMR 5535, Montpellier, France
- Salvatore Spicuglia
  - Aix-Marseille University, INSERM, TAGC, UMR 1090 Marseille, France
  - Equipe Labellisée LIGUE, 2023 Marseille, France
2
Xu L, Liu Y. Identification, Design, and Application of Noncoding Cis-Regulatory Elements. Biomolecules 2024; 14:945. [PMID: 39199333 PMCID: PMC11352686 DOI: 10.3390/biom14080945] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2024] [Revised: 07/25/2024] [Accepted: 07/30/2024] [Indexed: 09/01/2024] Open
Abstract
Cis-regulatory elements (CREs) play a pivotal role in orchestrating interactions with trans-regulatory factors such as transcription factors, RNA-binding proteins, and noncoding RNAs. These interactions are fundamental to the molecular architecture underpinning complex and diverse biological functions in living organisms, facilitating a myriad of sophisticated and dynamic processes. The rapid advancement in the identification and characterization of these regulatory elements has been marked by initiatives such as the Encyclopedia of DNA Elements (ENCODE) project, which represents a significant milestone in the field. Concurrently, the development of CRE detection technologies, exemplified by massively parallel reporter assays, has progressed at an impressive pace, providing powerful tools for CRE discovery. The exponential growth of multimodal functional genomic data has necessitated the application of advanced analytical methods. Deep learning algorithms, particularly large language models, have emerged as invaluable tools for deconstructing the intricate nucleotide sequences governing CRE function. These advancements facilitate precise predictions of CRE activity and enable the de novo design of CREs. A deeper understanding of CRE operational dynamics is crucial for harnessing their versatile regulatory properties. Such insights are instrumental in refining gene therapy techniques, enhancing the efficacy of selective breeding programs, pushing the boundaries of genetic innovation, and opening new possibilities in microbial synthetic biology.
Affiliation(s)
- Lingna Xu
  - Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
  - Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
- Yuwen Liu
  - Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
  - Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
  - Kunpeng Institute of Modern Agriculture at Foshan, Chinese Academy of Agricultural Sciences, Foshan 528226, China
3
O'Brien CL, Summers KM, Martin NM, Carter-Cusack D, Yang Y, Barua R, Dixit OVA, Hume DA, Pavli P. The relationship between extreme inter-individual variation in macrophage gene expression and genetic susceptibility to inflammatory bowel disease. Hum Genet 2024; 143:233-261. [PMID: 38421405 PMCID: PMC11043138 DOI: 10.1007/s00439-024-02642-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2023] [Accepted: 01/14/2024] [Indexed: 03/02/2024]
Abstract
The differentiation of resident intestinal macrophages from blood monocytes depends upon signals from the macrophage colony-stimulating factor receptor (CSF1R). Analysis of genome-wide association studies (GWAS) indicates that dysregulation of macrophage differentiation and response to microorganisms contributes to susceptibility to chronic inflammatory bowel disease (IBD). Here, we analyzed transcriptomic variation in monocyte-derived macrophages (MDM) from affected and unaffected sib pairs/trios from 22 IBD families and 6 healthy controls. Transcriptional network analysis of the data revealed no overall or inter-sib distinction between affected and unaffected individuals in basal gene expression or the temporal response to lipopolysaccharide (LPS). However, the basal or LPS-inducible expression of individual genes varied independently by as much as 100-fold between subjects. Extreme independent variation in the expression of pairs of HLA-associated transcripts (HLA-B/C, HLA-A/F and HLA-DRB1/DRB5) in macrophages was associated with HLA genotype. Correlation analysis indicated the downstream impacts of variation in the immediate early response to LPS. For example, variation in early expression of IL1B was significantly associated with local SNV genotype and with subsequent peak expression of target genes including IL23A, CXCL1, CXCL3, CXCL8 and NLRP3. Similarly, variation in early IFNB1 expression was correlated with subsequent expression of IFN target genes. Our results support the view that gene-specific dysregulation in macrophage adaptation to the intestinal milieu is associated with genetic susceptibility to IBD.
Affiliation(s)
- Claire L O'Brien
  - Centre for Research in Therapeutics Solutions, Faculty of Science and Technology, University of Canberra, Canberra, ACT, Australia
  - Inflammatory Bowel Disease Research Group, Canberra Hospital, Canberra, ACT, Australia
- Kim M Summers
  - Mater Research Institute-University of Queensland, Translational Research Institute, Brisbane, QLD, Australia
- Natalia M Martin
  - Inflammatory Bowel Disease Research Group, Canberra Hospital, Canberra, ACT, Australia
- Dylan Carter-Cusack
  - Mater Research Institute-University of Queensland, Translational Research Institute, Brisbane, QLD, Australia
- Yuanhao Yang
  - Mater Research Institute-University of Queensland, Translational Research Institute, Brisbane, QLD, Australia
- Rasel Barua
  - Inflammatory Bowel Disease Research Group, Canberra Hospital, Canberra, ACT, Australia
- Ojas V A Dixit
  - Centre for Research in Therapeutics Solutions, Faculty of Science and Technology, University of Canberra, Canberra, ACT, Australia
- David A Hume
  - Mater Research Institute-University of Queensland, Translational Research Institute, Brisbane, QLD, Australia.
- Paul Pavli
  - Inflammatory Bowel Disease Research Group, Canberra Hospital, Canberra, ACT, Australia.
  - School of Medicine and Psychology, College of Health and Medicine, Australian National University, Canberra, ACT, Australia.
4
Boye C, Kalita CA, Findley AS, Alazizi A, Wei J, Wen X, Pique-Regi R, Luca F. Characterization of caffeine response regulatory variants in vascular endothelial cells. eLife 2024; 13:e85235. [PMID: 38334359 PMCID: PMC10901511 DOI: 10.7554/elife.85235] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2022] [Accepted: 02/08/2024] [Indexed: 02/10/2024] Open
Abstract
Genetic variants in gene regulatory sequences can modify gene expression and mediate the molecular response to environmental stimuli. In addition, genotype-environment interactions (GxE) contribute to complex traits such as cardiovascular disease. Caffeine is the most widely consumed stimulant and is known to produce a vascular response. To investigate GxE for caffeine, we treated vascular endothelial cells with caffeine and used a massively parallel reporter assay to measure allelic effects on gene regulation for over 43,000 genetic variants. We identified 665 variants with allelic effects on gene regulation and 6 variants that regulate the gene expression response to caffeine (GxE, false discovery rate [FDR] < 5%). When overlapping our GxE results with expression quantitative trait loci colocalized with coronary artery disease and hypertension, we dissected their regulatory mechanisms and showed a modulatory role for caffeine. Our results demonstrate that massively parallel reporter assay is a powerful approach to identify and molecularly characterize GxE in the specific context of caffeine consumption.
Affiliation(s)
- Carly Boye
  - Center for Molecular Medicine and Genetics, Wayne State University, Detroit, United States
- Cynthia A Kalita
  - Center for Molecular Medicine and Genetics, Wayne State University, Detroit, United States
- Anthony S Findley
  - Center for Molecular Medicine and Genetics, Wayne State University, Detroit, United States
- Adnan Alazizi
  - Center for Molecular Medicine and Genetics, Wayne State University, Detroit, United States
- Julong Wei
  - Center for Molecular Medicine and Genetics, Wayne State University, Detroit, United States
- Xiaoquan Wen
  - Department of Biostatistics, University of Michigan, Ann Arbor, United States
- Roger Pique-Regi
  - Center for Molecular Medicine and Genetics, Wayne State University, Detroit, United States
  - Department of Obstetrics and Gynecology, Wayne State University, Detroit, United States
- Francesca Luca
  - Center for Molecular Medicine and Genetics, Wayne State University, Detroit, United States
  - Department of Obstetrics and Gynecology, Wayne State University, Detroit, United States
  - Department of Biology, University of Rome Tor Vergata, Rome, Italy
5
Antontseva EV, Degtyareva AO, Korbolina EE, Damarov IS, Merkulova TI. Human-genome single nucleotide polymorphisms affecting transcription factor binding and their role in pathogenesis. Vavilovskii Zhurnal Genet Selektsii 2023; 27:662-675. [PMID: 37965371 PMCID: PMC10641029 DOI: 10.18699/vjgb-23-77] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2023] [Revised: 03/24/2023] [Accepted: 03/30/2023] [Indexed: 11/16/2023] Open
Abstract
Single nucleotide polymorphisms (SNPs) are the most common type of variation in the human genome. The vast majority of SNPs identified in the human genome do not have any effect on the phenotype; however, some can lead to changes in the function of a gene or the level of its expression. Most SNPs associated with certain traits or pathologies are mapped to regulatory regions of the genome and affect gene expression by changing transcription factor binding sites. In recent decades, substantial effort has been invested in searching for such regulatory SNPs (rSNPs) and understanding the mechanisms by which they lead to phenotypic differences, primarily to individual differences in susceptibility to diseases and in sensitivity to drugs. The development of the NGS (next-generation sequencing) technology has contributed not only to the identification of a huge number of SNPs and to the search for their association (genome-wide association studies, GWASs) with certain diseases or phenotypic manifestations, but also to the development of more productive approaches to their functional annotation. It should be noted that the presence of an association does not allow one to identify a functional, truly disease-associated DNA sequence variant among multiple marker SNPs that are detected due to linkage disequilibrium. Moreover, determination of associations of genetic variants with a disease does not provide information about the functionality of these variants, which is necessary to elucidate the molecular mechanisms of the development of pathology and to design effective methods for its treatment and prevention. In this regard, the functional analysis of SNPs annotated in the GWAS catalog, both at the genome-wide level and at the level of individual SNPs, became especially relevant in recent years. A genome-wide search for potential rSNPs is possible without any prior knowledge of their association with a trait. 
Thus, mapping expression quantitative trait loci (eQTLs) makes it possible to identify an SNP for which - among transcriptomes of homozygotes and heterozygotes for its various alleles - there are differences in the expression level of certain genes, which can be located at various distances from the SNP. To predict rSNPs, approaches based on searches for allele-specific events in RNA-seq, ChIP-seq, DNase-seq, ATAC-seq, MPRA, and other data are also used. Nonetheless, for a more complete functional annotation of such rSNPs, it is necessary to establish their association with a trait, in particular, with a predisposition to a certain pathology or sensitivity to drugs. Thus, approaches to finding SNPs important for the development of a trait can be categorized into two groups: (1) starting from data on an association of SNPs with a certain trait, (2) starting from the determination of allele-specific changes at the molecular level (in a transcriptome or regulome). Only comprehensive use of strategically different approaches can considerably enrich our knowledge about the role of genetic determinants in the molecular mechanisms of trait formation, including predisposition to multifactorial diseases.
Affiliation(s)
- E V Antontseva
  - Institute of Cytology and Genetics of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- A O Degtyareva
  - Institute of Cytology and Genetics of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- E E Korbolina
  - Institute of Cytology and Genetics of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- I S Damarov
  - Institute of Cytology and Genetics of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- T I Merkulova
  - Institute of Cytology and Genetics of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
6
Ying P, Chen C, Lu Z, Chen S, Zhang M, Cai Y, Zhang F, Huang J, Fan L, Ning C, Li Y, Wang W, Geng H, Liu Y, Tian W, Yang Z, Liu J, Huang C, Yang X, Xu B, Li H, Zhu X, Li N, Li B, Wei Y, Zhu Y, Tian J, Miao X. Genome-wide enhancer-gene regulatory maps link causal variants to target genes underlying human cancer risk. Nat Commun 2023; 14:5958. [PMID: 37749132 PMCID: PMC10520073 DOI: 10.1038/s41467-023-41690-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2022] [Accepted: 09/14/2023] [Indexed: 09/27/2023] Open
Abstract
Genome-wide association studies have identified numerous variants associated with human complex traits, most of which reside in the non-coding regions, but biological mechanisms remain unclear. However, assigning function to the non-coding elements is still challenging. Here we apply Activity-by-Contact (ABC) model to evaluate enhancer-gene regulation effect by integrating multi-omics data and identified 544,849 connections across 20 cancer types. ABC model outperforms previous approaches in linking regulatory variants to target genes. Furthermore, we identify over 30,000 enhancer-gene connections in colorectal cancer (CRC) tissues. By integrating large-scale population cohorts (23,813 cases and 29,973 controls) and multipronged functional assays, we demonstrate an ABC regulatory variant rs4810856 associated with CRC risk (Odds Ratio = 1.11, 95% CI = 1.05-1.16, P = 4.02 × 10⁻⁵) by acting as an allele-specific enhancer to distally facilitate PREX1, CSE1L and STAU1 expression, which synergistically activate p-AKT signaling. Our study provides comprehensive regulation maps and illuminates a single variant regulating multiple genes, providing insights into cancer etiology.
Grants
- Distinguished Young Scholars of China (NSFC-81925032), Key Program of National Natural Science Foundation of China (NSFC-82130098), the Fundamental Research Funds for the Central Universities (2042022rc0026, 2042023kf1005), Knowledge Innovation Program of Wuhan (2023020201010060).
- Youth Program of National Natural Science Foundation of China (NSFC-82003547), Program of Health Commission of Hubei Province (WJ2023M045) and Fundamental Research Funds for the Central Universities (WHU: 2042022kf1031).
- The National Science Fund for Excellent Young Scholars (NSFC-82322058), Program of National Natural Science Foundation of China (NSFC-82103929, NSFC-82273713), Young Elite Scientists Sponsorship Program by cst(2022QNRC001), National Science Fund for Distinguished Young Scholars of Hubei Province of China (2023AFA046), Fundamental Research Funds for the Central Universities (WHU:2042022kf1205) and Knowledge Innovation Program of Wuhan (whkxjsj011, 2023020201010073).
Affiliation(s)
- Pingting Ying
  - Department of Epidemiology and Biostatistics, School of Public Health, Wuhan University, Wuhan, 430071, China
  - Department of Gastrointestinal Oncology, Zhongnan Hospital of Wuhan University, Wuhan, 430071, China
  - Department of Radiation Oncology, Renmin Hospital of Wuhan University, Wuhan, 430071, China
- Can Chen
  - Department of Epidemiology and Biostatistics, School of Public Health, Wuhan University, Wuhan, 430071, China
  - Department of Gastrointestinal Oncology, Zhongnan Hospital of Wuhan University, Wuhan, 430071, China
  - Department of Radiation Oncology, Renmin Hospital of Wuhan University, Wuhan, 430071, China
- Zequn Lu
  - Department of Epidemiology and Biostatistics, School of Public Health, Wuhan University, Wuhan, 430071, China
  - Department of Gastrointestinal Oncology, Zhongnan Hospital of Wuhan University, Wuhan, 430071, China
  - Department of Radiation Oncology, Renmin Hospital of Wuhan University, Wuhan, 430071, China
- Shuoni Chen
  - Department of Epidemiology and Biostatistics, School of Public Health, Wuhan University, Wuhan, 430071, China
- Ming Zhang
  - Department of Epidemiology and Biostatistics, School of Public Health, Wuhan University, Wuhan, 430071, China
- Yimin Cai
  - Department of Epidemiology and Biostatistics, School of Public Health, Wuhan University, Wuhan, 430071, China
- Fuwei Zhang
  - Department of Epidemiology and Biostatistics, School of Public Health, Wuhan University, Wuhan, 430071, China
- Jinyu Huang
  - Department of Epidemiology and Biostatistics, School of Public Health, Wuhan University, Wuhan, 430071, China
- Linyun Fan
  - Department of Epidemiology and Biostatistics, School of Public Health, Wuhan University, Wuhan, 430071, China
- Caibo Ning
  - Department of Epidemiology and Biostatistics, School of Public Health, Wuhan University, Wuhan, 430071, China
- Yanmin Li
  - Department of Epidemiology and Biostatistics, School of Public Health, Wuhan University, Wuhan, 430071, China
- Wenzhuo Wang
  - Department of Epidemiology and Biostatistics, School of Public Health, Wuhan University, Wuhan, 430071, China
- Hui Geng
  - Department of Epidemiology and Biostatistics, School of Public Health, Wuhan University, Wuhan, 430071, China
- Yizhuo Liu
  - Department of Epidemiology and Biostatistics, School of Public Health, Wuhan University, Wuhan, 430071, China
- Wen Tian
  - Department of Epidemiology and Biostatistics, School of Public Health, Wuhan University, Wuhan, 430071, China
- Zhiyong Yang
  - Department of Hepatobiliary and Pancreatic Surgery, Zhongnan Hospital of Wuhan University, Wuhan, 430071, China
- Jiuyang Liu
  - Department of Gastrointestinal Surgery, Zhongnan Hospital of Wuhan University, Wuhan University, Wuhan, 430071, China
- Chaoqun Huang
  - Department of Gastrointestinal Surgery, Zhongnan Hospital of Wuhan University, Wuhan University, Wuhan, 430071, China
- Xiaojun Yang
  - Department of Gastrointestinal Surgery, Zhongnan Hospital of Wuhan University, Wuhan University, Wuhan, 430071, China
- Bin Xu
  - Cancer Center, Renmin Hospital of Wuhan University, Wuhan University, Wuhan, 430060, China
- Heng Li
  - Department of Urology, Tongji Hospital of Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, China
- Xu Zhu
  - Department of Gastrointestinal Surgery, Renmin Hospital of Wuhan University, Wuhan, 430071, China
- Ni Li
  - Office of Cancer Screening, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China
- Bin Li
  - Department of Epidemiology and Biostatistics, School of Public Health, Wuhan University, Wuhan, 430071, China
- Yongchang Wei
  - Department of Gastrointestinal Oncology, Hubei Cancer Clinical Study Center, Zhongnan Hospital of Wuhan University, Wuhan, 430071, China
- Ying Zhu
  - Department of Epidemiology and Biostatistics, School of Public Health, Wuhan University, Wuhan, 430071, China
- Jianbo Tian
  - Department of Epidemiology and Biostatistics, School of Public Health, Wuhan University, Wuhan, 430071, China.
  - Department of Gastrointestinal Oncology, Zhongnan Hospital of Wuhan University, Wuhan, 430071, China.
  - Department of Radiation Oncology, Renmin Hospital of Wuhan University, Wuhan, 430071, China.
- Xiaoping Miao
  - Department of Epidemiology and Biostatistics, School of Public Health, Wuhan University, Wuhan, 430071, China.
  - Department of Gastrointestinal Oncology, Zhongnan Hospital of Wuhan University, Wuhan, 430071, China.
  - Department of Radiation Oncology, Renmin Hospital of Wuhan University, Wuhan, 430071, China.
  - Department of Epidemiology and Biostatistics, School of Public Health, Tongji Medical College, Huazhong University of Sciences and Technology, Wuhan, 430030, China.
7
Shi FY, Wang Y, Huang D, Liang Y, Liang N, Chen XW, Gao G. Computational Assessment of the Expression-modulating Potential for Non-coding Variants. GENOMICS, PROTEOMICS & BIOINFORMATICS 2023; 21:662-673. [PMID: 34890839 PMCID: PMC10787178 DOI: 10.1016/j.gpb.2021.10.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/27/2021] [Revised: 10/13/2021] [Accepted: 11/01/2021] [Indexed: 06/13/2023]
Abstract
Large-scale genome-wide association studies (GWAS) and expression quantitative trait locus (eQTL) studies have identified multiple non-coding variants associated with genetic diseases by affecting gene expression. However, pinpointing causal variants effectively and efficiently remains a serious challenge. Here, we developed CARMEN, a novel algorithm to identify functional non-coding expression-modulating variants. Multiple evaluations demonstrated CARMEN's superior performance over state-of-the-art tools. Applying CARMEN to GWAS and eQTL datasets further pinpointed several causal variants other than the reported lead single-nucleotide polymorphisms (SNPs). CARMEN scales well with the massive datasets, and is available online as a web server at http://carmen.gao-lab.org.
Affiliation(s)
- Fang-Yuan Shi
  - State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Biomedical Pioneering Innovative Center (BIOPIC) & Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), Peking University, Beijing 100871, China
- Yu Wang
  - State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Biomedical Pioneering Innovative Center (BIOPIC) & Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), Peking University, Beijing 100871, China
- Dong Huang
  - State Key Laboratory of Membrane Biology, Institute of Molecular Medicine, Peking University, Beijing 100871, China
- Yu Liang
  - Human Aging Research Institute, School of Life Science, Nanchang University, Nanchang 330031, China
- Nan Liang
  - State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Biomedical Pioneering Innovative Center (BIOPIC) & Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), Peking University, Beijing 100871, China
- Xiao-Wei Chen
  - State Key Laboratory of Membrane Biology, Institute of Molecular Medicine, Peking University, Beijing 100871, China
  - Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Ge Gao
  - State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Biomedical Pioneering Innovative Center (BIOPIC) & Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), Peking University, Beijing 100871, China.
8
Das M, Hossain A, Banerjee D, Praul CA, Girirajan S. Challenges and considerations for reproducibility of STARR-seq assays. Genome Res 2023; 33:479-495. [PMID: 37130797 PMCID: PMC10234304 DOI: 10.1101/gr.277204.122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2022] [Accepted: 03/15/2023] [Indexed: 05/04/2023]
Abstract
High-throughput methods such as RNA-seq, ChIP-seq, and ATAC-seq have well-established guidelines, commercial kits, and analysis pipelines that enable consistency and wider adoption for understanding genome function and regulation. STARR-seq, a popular assay for directly quantifying the activities of thousands of enhancer sequences simultaneously, has seen limited standardization across studies. The assay is long, with more than 250 steps, and frequent customization of the protocol and variations in bioinformatics methods raise concerns for reproducibility of STARR-seq studies. Here, we assess each step of the protocol and analysis pipelines from published sources and in-house assays, and identify critical steps and quality control (QC) checkpoints necessary for reproducibility of the assay. We also provide guidelines for experimental design, protocol scaling, customization, and analysis pipelines for better adoption of the assay. These resources will allow better optimization of STARR-seq for specific research needs, enable comparisons and integration across studies, and improve the reproducibility of results.
Affiliation(s)
- Maitreya Das
  - Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, Pennsylvania 16802, USA
  - Molecular and Cellular Integrative Biosciences Graduate Program, Pennsylvania State University, University Park, Pennsylvania 16802, USA
  - Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Ayaan Hossain
  - Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania 16802, USA
  - Bioinformatics and Genomics Graduate Program, Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Deepro Banerjee
  - Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania 16802, USA
  - Bioinformatics and Genomics Graduate Program, Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Craig Alan Praul
  - Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Santhosh Girirajan
  - Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, Pennsylvania 16802, USA
  - Molecular and Cellular Integrative Biosciences Graduate Program, Pennsylvania State University, University Park, Pennsylvania 16802, USA
  - Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania 16802, USA
  - Bioinformatics and Genomics Graduate Program, Pennsylvania State University, University Park, Pennsylvania 16802, USA
  - Department of Anthropology, Pennsylvania State University, University Park, Pennsylvania 16802, USA
9
Ren N, Dai S, Ma S, Yang F. Strategies for activity analysis of single nucleotide polymorphisms associated with human diseases. Clin Genet 2023; 103:392-400. [PMID: 36527336 DOI: 10.1111/cge.14282] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2022] [Revised: 12/10/2022] [Accepted: 12/13/2022] [Indexed: 12/23/2022]
Abstract
Genome-wide association studies (GWAS) have identified a large number of single nucleotide polymorphism (SNP) sites associated with human diseases. In the annotation of human diseases, especially cancers, SNPs, as an important component of genetic factors, have gained increasing attention. Given that most of the SNPs are located in non-coding regions, the functional verification of these SNPs is a great challenge. The key to functional annotation for risk SNPs is to screen SNPs with regulatory activity from thousands of disease associated-SNPs. In this review, we systematically recapitulate the characteristics and functional roles of SNP sites, discuss three parallel reporter screening strategies in detail based on barcode tag classification, and recommend the common in silico strategies to help supplement the annotation of SNP sites with epigenetic activity analysis, prediction of target genes and trans-acting factors. We hope that this review will contribute to this exuberant research field by providing robust activity analysis strategies that can facilitate the translation of GWAS results into personalized diagnosis and prevention measures for human diseases.
Affiliation(s)
- Naixia Ren
- School of Life Sciences and Medicine, Shandong University of Technology, Zibo, China
| | - Shangkun Dai
- School of Life Sciences and Medicine, Shandong University of Technology, Zibo, China
| | - Shumin Ma
- School of Medicine and Pharmacy, Ocean University of China, Qingdao, China
| | - Fengtang Yang
- School of Life Sciences and Medicine, Shandong University of Technology, Zibo, China
| |
|
10
|
Luo S, Xiong D, Zhao X, Duan L. An Attempt of Seeking Favorable Binding Free Energy Prediction Schemes Considering the Entropic Effect on Fis-DNA Binding. J Phys Chem B 2023; 127:1312-1324. [PMID: 36735878 DOI: 10.1021/acs.jpcb.2c07811] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2023]
Abstract
The complex mechanisms of protein-DNA binding are essential for understanding many biological processes. Over the past decades, numerous experiments and calculations have analyzed the specificity of protein-DNA binding. However, the accuracy of binding free energy prediction for multi-base DNA systems still needs to be improved. Fis is a DNA-binding protein that regulates various transcription and recombination reactions. In the present work, we tested several methods for predicting binding free energy on this system to find a favorable prediction scheme and to explore the binding mechanism of Fis protein and DNA. Two solvent models (explicit and implicit) were chosen for the dynamics process, and the predicted binding free energy was more accurate under the explicit solvent model. When different Poisson-Boltzmann/Generalized Born (PB/GB) models were tested for two DNA force fields (BSC1 and OL15), the binding free energy predicted with the OL15 force field performed better, and the correlation between predicted and experimental values improved with increasing interior dielectric constant (Dk). Finally, using Dk = 8, the GBOBC1 model combined with interaction entropy (IE), which was used to calculate the entropic contribution (GBOBC1_IE_8), was selected for the binding free energy prediction and analysis of the Fis-DNA system, and the validity of the method was further verified by testing the Cren7-DNA system. Conformational analysis of the minor groove showed that mutation of the DNA central sequence from A/T to C/G and deletion of the guanine 2-amino group change the minor groove width and thus affect the formation of the major groove, altering the interaction and atomic contact between the protein and the major groove and thereby changing the binding affinity of Fis and DNA. Hopefully, the series of tests in this work can shed some light on related studies of protein and DNA systems.
Affiliation(s)
- Song Luo
- School of Physics and Electronics, Shandong Normal University, Jinan, Shandong 250014, China
| | - Danyang Xiong
- School of Physics and Electronics, Shandong Normal University, Jinan, Shandong 250014, China
| | - Xiaoyu Zhao
- School of Physics and Electronics, Shandong Normal University, Jinan, Shandong 250014, China
| | - Lili Duan
- School of Physics and Electronics, Shandong Normal University, Jinan, Shandong 250014, China
| |
|
11
|
Gallego Romero I, Lea AJ. Leveraging massively parallel reporter assays for evolutionary questions. Genome Biol 2023; 24:26. [PMID: 36788564 PMCID: PMC9926830 DOI: 10.1186/s13059-023-02856-6] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Received: 05/08/2022] [Accepted: 01/17/2023] [Indexed: 02/16/2023] Open
Abstract
A long-standing goal of evolutionary biology is to decode how gene regulation contributes to organismal diversity. Doing so is challenging because it is hard to predict function from non-coding sequence and to perform molecular research with non-model taxa. Massively parallel reporter assays (MPRAs) enable the testing of thousands to millions of sequences for regulatory activity simultaneously. Here, we discuss the execution, advantages, and limitations of MPRAs, with a focus on evolutionary questions. We propose solutions for extending MPRAs to rare taxa and those with limited genomic resources, and we underscore MPRA's broad potential for driving genome-scale, functional studies across organisms.
Affiliation(s)
- Irene Gallego Romero
- Melbourne Integrative Genomics, University of Melbourne, Royal Parade, Parkville, Victoria, 3010, Australia; School of BioSciences, The University of Melbourne, Royal Parade, Parkville, 3010, Australia; The Centre for Stem Cell Systems, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, 30 Royal Parade, Parkville, Victoria, 3010, Australia; Center for Genomics, Evolution and Medicine, Institute of Genomics, University of Tartu, Riia 23b, 51010, Tartu, Estonia.
| | - Amanda J. Lea
- Department of Biological Sciences, Vanderbilt University, Nashville, TN 37240, USA; Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN 37240, USA; Evolutionary Studies Initiative, Vanderbilt University, Nashville, TN 37240, USA; Child and Brain Development Program, Canadian Institute for Advanced Study, Toronto, Canada
| |
|
12
|
Lu F, Sossin A, Abell N, Montgomery SB, He Z. Deep learning-assisted genome-wide characterization of massively parallel reporter assays. Nucleic Acids Res 2022; 50:11442-11454. [PMID: 36350674 PMCID: PMC9723615 DOI: 10.1093/nar/gkac990] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2021] [Revised: 10/04/2022] [Accepted: 10/19/2022] [Indexed: 11/10/2022] Open
Abstract
The massively parallel reporter assay (MPRA) is a high-throughput method that enables the study of the regulatory activities of tens of thousands of DNA oligonucleotides in a single experiment. While MPRA experiments have grown in popularity, their small sample sizes compared to the scale of the human genome limit our understanding of the regulatory effects they detect. To address this, we develop a deep learning model, MpraNet, to distinguish potential MPRA targets from the background genome. This model achieves high discriminative performance (AUROC = 0.85) at differentiating MPRA positives from a set of control variants that mimic the background genome when applied to the lymphoblastoid cell line. We observe that existing functional scores represent very distinct functional effects, and most of them fail to characterize the regulatory effect that MPRA detects. Using MpraNet, we predict potential MPRA functional variants across the genome and identify the distributions of MPRA effect relative to other characteristics of genetic variation, including allele frequency, alternative functional annotations specified by FAVOR, and phenome-wide associations. We also observed that the predicted MPRA positives are not uniformly distributed across the genome; instead, they are clumped together in active regions comprising 9.95% of the genome and inactive regions comprising 89.07% of the genome. Furthermore, we propose our model as a screen to filter MPRA experiment candidates at genome-wide scale, enabling future experiments to be more cost-efficient by increasing precision relative to that observed from previous MPRAs.
Affiliation(s)
| | | | - Nathan Abell
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
| | - Stephen B Montgomery
- Department of Genetics, Stanford University, Stanford, CA 94305, USA; Department of Pathology, Stanford University, Stanford, CA 94305, USA
| | - Zihuai He
- To whom correspondence should be addressed. Tel: +1 718 869 4929;
| |
|
13
|
Alsheikh AJ, Wollenhaupt S, King EA, Reeb J, Ghosh S, Stolzenburg LR, Tamim S, Lazar J, Davis JW, Jacob HJ. The landscape of GWAS validation; systematic review identifying 309 validated non-coding variants across 130 human diseases. BMC Med Genomics 2022; 15:74. [PMID: 35365203 PMCID: PMC8973751 DOI: 10.1186/s12920-022-01216-w] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 10/13/2021] [Accepted: 03/17/2022] [Indexed: 02/08/2023] Open
Abstract
Background: The remarkable growth of genome-wide association studies (GWAS) has created a critical need to experimentally validate the disease-associated variants, 90% of which involve non-coding variants. Methods: To determine how the field is addressing this urgent need, we performed a comprehensive literature review identifying 36,676 articles. These were reduced to 1454 articles through a set of filters using natural language processing and ontology-based text-mining. This was followed by manual curation and cross-referencing against the GWAS catalog, yielding a final set of 286 articles. Results: We identified 309 experimentally validated non-coding GWAS variants, regulating 252 genes across 130 human disease traits. These variants covered a variety of regulatory mechanisms. Interestingly, 70% (215/309) acted through cis-regulatory elements, with the remaining through promoters (22%, 70/309) or non-coding RNAs (8%, 24/309). Several validation approaches were utilized in these studies, including gene expression (n = 272), transcription factor binding (n = 175), reporter assays (n = 171), in vivo models (n = 104), genome editing (n = 96) and chromatin interaction (n = 33). Conclusions: This review of the literature is the first to systematically evaluate the status and the landscape of experimentation being used to validate non-coding GWAS-identified variants. Our results clearly underscore the multifaceted approach needed for experimental validation, and have practical implications for variant prioritization and considerations of target gene nomination. While the field has a long way to go to validate the thousands of GWAS associations, we show that progress is being made and provide exemplars of validation studies covering a wide variety of mechanisms, target genes, and disease areas. Supplementary Information: The online version contains supplementary material available at 10.1186/s12920-022-01216-w.
Affiliation(s)
- Ammar J Alsheikh
- Genomics Research Center, AbbVie Inc, North Chicago, Illinois, 60064, USA.
| | - Sabrina Wollenhaupt
- Information Research, AbbVie Deutschland GmbH & Co. KG, 67061, Knollstrasse, Ludwigshafen, Germany
| | - Emily A King
- Genomics Research Center, AbbVie Inc, North Chicago, Illinois, 60064, USA
| | - Jonas Reeb
- Information Research, AbbVie Deutschland GmbH & Co. KG, 67061, Knollstrasse, Ludwigshafen, Germany
| | - Sujana Ghosh
- Genomics Research Center, AbbVie Inc, North Chicago, Illinois, 60064, USA
| | | | - Saleh Tamim
- Genomics Research Center, AbbVie Inc, North Chicago, Illinois, 60064, USA
| | - Jozef Lazar
- Genomics Research Center, AbbVie Inc, North Chicago, Illinois, 60064, USA
| | - J Wade Davis
- Genomics Research Center, AbbVie Inc, North Chicago, Illinois, 60064, USA
| | - Howard J Jacob
- Genomics Research Center, AbbVie Inc, North Chicago, Illinois, 60064, USA
| |
|
14
|
Papoutsopoulou S, Morris L, Bayliff A, Mair T, England H, Stagi M, Bergey F, Alam MT, Sheibani-Tezerji R, Rosenstiel P, Müller W, Martins Dos Santos VAP, Campbell BJ. Effects of Human RelA Transgene on Murine Macrophage Inflammatory Responses. Biomedicines 2022; 10:biomedicines10040757. [PMID: 35453507 PMCID: PMC9027775 DOI: 10.3390/biomedicines10040757] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2022] [Revised: 03/14/2022] [Accepted: 03/18/2022] [Indexed: 02/04/2023] Open
Abstract
The NFκB transcription factors are major regulators of innate immune responses, and NFκB signal pathway dysregulation is linked to inflammatory disease. Here, we utilised bone marrow-derived macrophages from the p65-DsRedxp/IκBα-eGFP transgenic strain to study the functional implication of xenogeneic (human) RelA(p65) protein introduced into the mouse genome. Confocal imaging showed that human RelA is expressed in the cells and can translocate to the nucleus following activation of Toll-like receptor 4. RNA sequencing of lipid A-stimulated macrophages revealed that human RelA impacts on murine gene transcription, affecting both non-NFκB and NFκB target genes, including immediate-early and late response genes, e.g., Fos and Cxcl10. Validation experiments on NFκB targets revealed markedly reduced mRNA levels, but similar kinetic profiles, in transgenic cells compared to wild-type. Enrichment pathway analysis of differentially expressed genes revealed that interferon and cytokine signaling were affected. These immune response pathways were also affected in macrophages treated with tumor necrosis factor. The data suggest that the presence of xenogeneic RelA protein likely has inhibitory activity, altering specific transcriptional profiles of key molecules involved in immune responses. It is therefore essential that this information be taken into consideration when designing and interpreting future experiments using this transgenic strain.
Affiliation(s)
- Stamatia Papoutsopoulou
- Lydia Becker Institute of Immunology and Inflammation, Faculty of Biology, Medicine and Health, University of Manchester, Manchester M13 9PL, UK; (H.E.); (W.M.)
- Department of Biochemistry and Biotechnology, School of Health Sciences, University of Thessaly, 413 34 Larissa, Greece
- Correspondence: (S.P.); (B.J.C.)
| | - Lorna Morris
- LifeGlimmer GmbH, Markelstr. 39A, 12163 Berlin, Germany; (L.M.); (F.B.); (V.A.P.M.D.S.)
| | - Andrew Bayliff
- The Henry Wellcome Laboratories of Molecular & Cellular Gastroenterology, Department of Infection Biology & Microbiomes, Institute of Infection Veterinary and Ecological Sciences, University of Liverpool, Liverpool L69 3GE, UK; (A.B.); (T.M.)
| | - Thomas Mair
- The Henry Wellcome Laboratories of Molecular & Cellular Gastroenterology, Department of Infection Biology & Microbiomes, Institute of Infection Veterinary and Ecological Sciences, University of Liverpool, Liverpool L69 3GE, UK; (A.B.); (T.M.)
| | - Hazel England
- Lydia Becker Institute of Immunology and Inflammation, Faculty of Biology, Medicine and Health, University of Manchester, Manchester M13 9PL, UK; (H.E.); (W.M.)
| | - Massimiliano Stagi
- Department of Molecular Physiology and Cell Signalling, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7BE, UK;
| | - François Bergey
- LifeGlimmer GmbH, Markelstr. 39A, 12163 Berlin, Germany; (L.M.); (F.B.); (V.A.P.M.D.S.)
| | - Mohammad Tauqeer Alam
- Warwick Medical School, Bioinformatics RTP, University of Warwick, Coventry CV4 7AL, UK;
- Department of Biology, College of Science, United Arab Emirates University, Abu Dhabi P.O. Box 15551, United Arab Emirates
| | - Raheleh Sheibani-Tezerji
- Institute of Clinical Molecular Biology, Christian Albrechts University of Kiel, 6708 WE Kiel, Germany; (R.S.-T.); (P.R.)
| | - Philip Rosenstiel
- Institute of Clinical Molecular Biology, Christian Albrechts University of Kiel, 6708 WE Kiel, Germany; (R.S.-T.); (P.R.)
| | - Werner Müller
- Lydia Becker Institute of Immunology and Inflammation, Faculty of Biology, Medicine and Health, University of Manchester, Manchester M13 9PL, UK; (H.E.); (W.M.)
| | - Vitor A. P. Martins Dos Santos
- LifeGlimmer GmbH, Markelstr. 39A, 12163 Berlin, Germany; (L.M.); (F.B.); (V.A.P.M.D.S.)
- Laboratory of Systems & Synthetic Biology, Wageningen University & Research, P.O. Box 8033, 6700 EJ Wageningen, The Netherlands
| | - Barry J. Campbell
- The Henry Wellcome Laboratories of Molecular & Cellular Gastroenterology, Department of Infection Biology & Microbiomes, Institute of Infection Veterinary and Ecological Sciences, University of Liverpool, Liverpool L69 3GE, UK; (A.B.); (T.M.)
- Correspondence: (S.P.); (B.J.C.)
| |
|
15
|
Toropainen A, Stolze LK, Örd T, Whalen MB, Torrell PM, Link VM, Kaikkonen MU, Romanoski CE. Functional noncoding SNPs in human endothelial cells fine-map vascular trait associations. Genome Res 2022; 32:409-424. [PMID: 35193936 PMCID: PMC8896458 DOI: 10.1101/gr.276064.121] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 07/30/2021] [Accepted: 01/06/2022] [Indexed: 11/25/2022]
Abstract
Functional consequences of genetic variation in the noncoding human genome are difficult to ascertain despite demonstrated associations to common, complex disease traits. To elucidate properties of functional noncoding SNPs with effects in human endothelial cells (ECs), we utilized our previous molecular quantitative trait locus (molQTL) analysis for transcription factor binding, chromatin accessibility, and H3K27 acetylation to nominate a set of likely functional noncoding SNPs. Together with information from genome-wide association studies (GWASs) for vascular disease traits, we tested the ability of 34,344 variants to perturb enhancer function in ECs using the highly multiplexed STARR-seq assay. Of these, 5711 variants validated, whose enriched attributes included: (1) mutations to TF binding motifs for ETS or AP-1 that are regulators of the EC state; (2) location in accessible and H3K27ac-marked EC chromatin; and (3) molQTL associations whereby alleles associate with differences in chromatin accessibility and TF binding across genetically diverse ECs. Next, using pro-inflammatory IL1B as an activator of cell state, we observed robust evidence (>50%) of context-specific SNP effects, underscoring the prevalence of noncoding gene-by-environment (GxE) effects. Lastly, using these cumulative data, we fine-mapped vascular disease loci and highlighted evidence suggesting mechanisms by which noncoding SNPs at two loci affect risk for pulse pressure/large artery stroke and abdominal aortic aneurysm through respective effects on transcriptional regulation of POU4F1 and LDAH. Together, we highlight the attributes and context dependence of functional noncoding SNPs and provide new mechanisms underlying vascular disease risk.
Affiliation(s)
- Anu Toropainen
- A.I. Virtanen Institute for Molecular Sciences, University of Eastern Finland, Kuopio 70211, Finland
| | - Lindsey K Stolze
- The Department of Cellular and Molecular Medicine, The University of Arizona, Tucson, Arizona 85721, USA; The Genetics Interdisciplinary Graduate Program, The University of Arizona, Tucson, Arizona 85721, USA
| | - Tiit Örd
- A.I. Virtanen Institute for Molecular Sciences, University of Eastern Finland, Kuopio 70211, Finland
| | - Michael B Whalen
- The Department of Cellular and Molecular Medicine, The University of Arizona, Tucson, Arizona 85721, USA
| | - Paula Martí Torrell
- A.I. Virtanen Institute for Molecular Sciences, University of Eastern Finland, Kuopio 70211, Finland
| | - Verena M Link
- Metaorganism Immunity Section, Laboratory of Host Immunity and Microbiome, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland 20892, USA
| | - Minna U Kaikkonen
- A.I. Virtanen Institute for Molecular Sciences, University of Eastern Finland, Kuopio 70211, Finland
| | - Casey E Romanoski
- The Department of Cellular and Molecular Medicine, The University of Arizona, Tucson, Arizona 85721, USA; The Genetics Interdisciplinary Graduate Program, The University of Arizona, Tucson, Arizona 85721, USA
| |
|
16
|
Weighill D, Ben Guebila M, Glass K, Quackenbush J, Platig J. Predicting genotype-specific gene regulatory networks. Genome Res 2022; 32:524-533. [PMID: 35193937 PMCID: PMC8896459 DOI: 10.1101/gr.275107.120] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 04/23/2021] [Accepted: 01/11/2022] [Indexed: 11/25/2022]
Abstract
Understanding how each person's unique genotype influences their individual patterns of gene regulation has the potential to improve our understanding of human health and development, and to refine genotype-specific disease risk assessments and treatments. However, the effects of genetic variants are not typically considered when constructing gene regulatory networks, despite the fact that many disease-associated genetic variants are thought to have regulatory effects, including the disruption of transcription factor (TF) binding. We developed EGRET (Estimating the Genetic Regulatory Effect on TFs), which infers a genotype-specific gene regulatory network for each individual in a study population. EGRET begins by constructing a genotype-informed TF-gene prior network derived using TF motif predictions, expression quantitative trait locus (eQTL) data, individual genotypes, and the predicted effects of genetic variants on TF binding. It then uses a technique known as message passing to integrate this prior network with gene expression and TF protein–protein interaction data to produce a refined, genotype-specific regulatory network. We used EGRET to infer gene regulatory networks for two blood-derived cell lines and identified genotype-associated, cell line–specific regulatory differences that we subsequently validated using allele-specific expression, chromatin accessibility QTLs, and differential ChIP-seq TF binding. We also inferred EGRET networks for three cell types from each of 119 individuals and identified cell type–specific regulatory differences associated with diseases related to those cell types. EGRET is, to our knowledge, the first method that infers networks reflective of individual genetic variation in a way that provides insight into the genetic regulatory associations driving complex phenotypes.
Affiliation(s)
- Deborah Weighill
- Harvard T.H. Chan School of Public Health, Boston, Massachusetts 02115, USA
| | | | - Kimberly Glass
- Harvard T.H. Chan School of Public Health, Boston, Massachusetts 02115, USA; Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, Massachusetts 02115, USA; Harvard Medical School, Boston, Massachusetts 02115, USA
| | - John Quackenbush
- Harvard T.H. Chan School of Public Health, Boston, Massachusetts 02115, USA; Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, Massachusetts 02115, USA
| | - John Platig
- Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, Massachusetts 02115, USA; Harvard Medical School, Boston, Massachusetts 02115, USA
| |
|
17
|
Findley AS, Zhang X, Boye C, Lin YL, Kalita CA, Barreiro L, Lohmueller KE, Pique-Regi R, Luca F. A signature of Neanderthal introgression on molecular mechanisms of environmental responses. PLoS Genet 2021; 17:e1009493. [PMID: 34570765 PMCID: PMC8509894 DOI: 10.1371/journal.pgen.1009493] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 03/18/2021] [Revised: 10/12/2021] [Accepted: 08/18/2021] [Indexed: 12/17/2022] Open
Abstract
Ancient human migrations led to the settlement of population groups in varied environmental contexts worldwide. The extent to which adaptation to local environments has shaped human genetic diversity is a longstanding question in human evolution. Recent studies have suggested that introgression of archaic alleles in the genome of modern humans may have contributed to adaptation to environmental pressures such as pathogen exposure. Functional genomic studies have demonstrated that variation in gene expression across individuals and in response to environmental perturbations is a main mechanism underlying complex trait variation. We considered gene expression response to in vitro treatments as a molecular phenotype to identify genes and regulatory variants that may have played an important role in adaptations to local environments. We investigated if Neanderthal introgression in the human genome may contribute to the transcriptional response to environmental perturbations. To this end we used eQTLs for genes differentially expressed in a panel of 52 cellular environments, resulting from 5 cell types and 26 treatments, including hormones, vitamins, drugs, and environmental contaminants. We found that SNPs with introgressed Neanderthal alleles (N-SNPs) disrupt binding of transcription factors important for environmental responses, including ionizing radiation and hypoxia, and for glucose metabolism. We identified an enrichment for N-SNPs among eQTLs for genes differentially expressed in response to 8 treatments, including glucocorticoids, caffeine, and vitamin D. Using Massively Parallel Reporter Assays (MPRA) data, we validated the regulatory function of 21 introgressed Neanderthal variants in the human genome, corresponding to 8 eQTLs regulating 15 genes that respond to environmental perturbations. 
These findings expand the set of environments where archaic introgression may have contributed to adaptations to local environments in modern humans and provide experimental validation for the regulatory function of introgressed variants.
Affiliation(s)
- Anthony S. Findley
- Center for Molecular Medicine and Genetics, Wayne State University, Detroit, Michigan, United States of America
| | - Xinjun Zhang
- Department of Ecology and Evolutionary Biology, UCLA, Los Angeles, California, United States of America
| | - Carly Boye
- Center for Molecular Medicine and Genetics, Wayne State University, Detroit, Michigan, United States of America
| | - Yen Lung Lin
- Genetics Section, Department of Medicine, University of Chicago, Chicago, Illinois, United States of America
| | - Cynthia A. Kalita
- Center for Molecular Medicine and Genetics, Wayne State University, Detroit, Michigan, United States of America
| | - Luis Barreiro
- Genetics Section, Department of Medicine, University of Chicago, Chicago, Illinois, United States of America
| | - Kirk E. Lohmueller
- Department of Ecology and Evolutionary Biology, UCLA, Los Angeles, California, United States of America
- Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, California, United States of America
| | - Roger Pique-Regi
- Center for Molecular Medicine and Genetics, Wayne State University, Detroit, Michigan, United States of America
- Department of Obstetrics and Gynecology, Wayne State University, Detroit, Michigan, United States of America
| | - Francesca Luca
- Center for Molecular Medicine and Genetics, Wayne State University, Detroit, Michigan, United States of America
- Department of Obstetrics and Gynecology, Wayne State University, Detroit, Michigan, United States of America
| |
|
18
|
Wang Y, Shi FY, Liang Y, Gao G. REVA as A Well-curated Database for Human Expression-modulating Variants. Genomics Proteomics Bioinformatics 2021; 19:590-601. [PMID: 34224878 PMCID: PMC9040024 DOI: 10.1016/j.gpb.2021.06.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2020] [Revised: 06/22/2021] [Accepted: 06/25/2021] [Indexed: 10/25/2022]
Abstract
More than 90% of disease- and trait-associated human variants are noncoding. By systematically screening multiple large-scale studies, we compiled REVA, a manually curated database of over 11.8 million experimentally tested noncoding variants with expression-modulating potential. We provided 2424 functional annotations that could be used to pinpoint the plausible regulatory mechanism of these variants. We further benchmarked multiple state-of-the-art computational tools and found that their limited sensitivity remains a serious challenge for effective large-scale analysis. REVA provides high-quality experimentally tested expression-modulating variants with extensive functional annotations, which will be useful for users in the noncoding variants community. REVA is available at http://reva.gao-lab.org.
Affiliation(s)
- Yu Wang
- Biomedical Pioneering Innovation Center (BIOPIC) & Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI) and State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Peking University, Beijing 100871, China
| | - Fang-Yuan Shi
- Biomedical Pioneering Innovation Center (BIOPIC) & Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI) and State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Peking University, Beijing 100871, China
| | - Yu Liang
- Human Aging Research Institute, School of Life Sciences, Nanchang University, Nanchang 330031, China
| | - Ge Gao
- Biomedical Pioneering Innovation Center (BIOPIC) & Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI) and State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Peking University, Beijing 100871, China.
| |
|
19
|
Degtyareva AO, Antontseva EV, Merkulova TI. Regulatory SNPs: Altered Transcription Factor Binding Sites Implicated in Complex Traits and Diseases. Int J Mol Sci 2021; 22:6454. [PMID: 34208629 PMCID: PMC8235176 DOI: 10.3390/ijms22126454] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Received: 05/05/2021] [Revised: 06/15/2021] [Accepted: 06/15/2021] [Indexed: 12/19/2022] Open
Abstract
The vast majority of the genetic variants (mainly SNPs) associated with various human traits and diseases map to the noncoding part of the genome and are enriched in its regulatory compartment, suggesting that many causal variants may affect gene expression. The leading mechanism of action of these SNPs is altered transcription factor binding, through the creation or disruption of transcription factor binding sites (TFBSs) or a change in the affinity of these regulatory proteins for their cognate sites. In this review, we first focus on the history of the discovery of regulatory SNPs (rSNPs) and give a systematic description of the existing methodological approaches to their study. Then, we briefly review recent comprehensive examples of rSNPs that were studied from the discovery of changes in the TFBS sequence caused by a nucleotide substitution, to the identification of its effect on target gene expression and, eventually, on phenotype. We also describe state-of-the-art genome-wide approaches to the identification of regulatory variants, including both those that make molecular sense of genome-wide association studies (GWAS) and alternative approaches whose primary goal is to determine the functionality of genetic variants. Among these approaches, special attention is paid to expression quantitative trait loci (eQTL) analysis and the search for allele-specific events in RNA-seq (ASE events) as well as in ChIP-seq, DNase-seq, and ATAC-seq (ASB events) data.
Affiliation(s)
- Arina O. Degtyareva
- Department of Molecular Genetic, Institute of Cytology and Genetics, 630090 Novosibirsk, Russia; (A.O.D.); (E.V.A.)
| | - Elena V. Antontseva
- Department of Molecular Genetic, Institute of Cytology and Genetics, 630090 Novosibirsk, Russia; (A.O.D.); (E.V.A.)
| | - Tatiana I. Merkulova
- Department of Molecular Genetic, Institute of Cytology and Genetics, 630090 Novosibirsk, Russia; (A.O.D.); (E.V.A.)
- Department of Natural Sciences, Novosibirsk State University, 630090 Novosibirsk, Russia
| |
Collapse
|
20
|
Örd T, Õunap K, Stolze LK, Aherrahrou R, Nurminen V, Toropainen A, Selvarajan I, Lönnberg T, Aavik E, Ylä-Herttuala S, Civelek M, Romanoski CE, Kaikkonen MU. Single-Cell Epigenomics and Functional Fine-Mapping of Atherosclerosis GWAS Loci. Circ Res 2021; 129:240-258. [PMID: 34024118 PMCID: PMC8260472 DOI: 10.1161/circresaha.121.318971] [Citation(s) in RCA: 72] [Impact Index Per Article: 18.0] [Indexed: 12/15/2022]
Abstract
Genome-wide association studies have identified hundreds of loci associated with coronary artery disease (CAD). Many of these loci are enriched in cis-regulatory elements but are linked neither to cardiometabolic risk factors nor to candidate causal genes, complicating their functional interpretation.
Affiliation(s)
- Tiit Örd
  - A. I. Virtanen Institute for Molecular Sciences, University of Eastern Finland, Kuopio (T.Ö., K.Õ., V.N., A.T., I.S., E.A., S.Y.-H., M.U.K.)
- Kadri Õunap
  - A. I. Virtanen Institute for Molecular Sciences, University of Eastern Finland, Kuopio (T.Ö., K.Õ., V.N., A.T., I.S., E.A., S.Y.-H., M.U.K.)
- Lindsey K. Stolze
  - Department of Cellular and Molecular Medicine, The College of Medicine, The University of Arizona, Tucson, AZ (L.K.S., C.E.R.)
- Redouane Aherrahrou
  - Center for Public Health Genomics (R.A., M.C.), University of Virginia, Charlottesville
- Valtteri Nurminen
  - A. I. Virtanen Institute for Molecular Sciences, University of Eastern Finland, Kuopio (T.Ö., K.Õ., V.N., A.T., I.S., E.A., S.Y.-H., M.U.K.)
- Anu Toropainen
  - A. I. Virtanen Institute for Molecular Sciences, University of Eastern Finland, Kuopio (T.Ö., K.Õ., V.N., A.T., I.S., E.A., S.Y.-H., M.U.K.)
- Ilakya Selvarajan
  - A. I. Virtanen Institute for Molecular Sciences, University of Eastern Finland, Kuopio (T.Ö., K.Õ., V.N., A.T., I.S., E.A., S.Y.-H., M.U.K.)
- Tapio Lönnberg
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, Finland (T.L.)
- Einari Aavik
  - A. I. Virtanen Institute for Molecular Sciences, University of Eastern Finland, Kuopio (T.Ö., K.Õ., V.N., A.T., I.S., E.A., S.Y.-H., M.U.K.)
- Seppo Ylä-Herttuala
  - A. I. Virtanen Institute for Molecular Sciences, University of Eastern Finland, Kuopio (T.Ö., K.Õ., V.N., A.T., I.S., E.A., S.Y.-H., M.U.K.)
- Mete Civelek
  - Center for Public Health Genomics (R.A., M.C.), University of Virginia, Charlottesville
  - Department of Biomedical Engineering (M.C.), University of Virginia, Charlottesville
- Casey E. Romanoski
  - Department of Cellular and Molecular Medicine, The College of Medicine, The University of Arizona, Tucson, AZ (L.K.S., C.E.R.)
- Minna U. Kaikkonen
  - A. I. Virtanen Institute for Molecular Sciences, University of Eastern Finland, Kuopio (T.Ö., K.Õ., V.N., A.T., I.S., E.A., S.Y.-H., M.U.K.)
|
21
|
Integrative analysis of liver-specific non-coding regulatory SNPs associated with the risk of coronary artery disease. Am J Hum Genet 2021; 108:411-430. [PMID: 33626337 DOI: 10.1016/j.ajhg.2021.02.006] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 09/02/2020] [Accepted: 02/04/2021] [Indexed: 02/08/2023] Open
Abstract
Genetic factors underlying coronary artery disease (CAD) have been widely studied using genome-wide association studies (GWASs). However, the functional understanding of the CAD loci has been limited by the fact that a majority of GWAS variants are located within non-coding regions with no functional role. High cholesterol and dysregulation of the liver metabolism such as non-alcoholic fatty liver disease confer an increased risk of CAD. Here, we studied the function of non-coding single-nucleotide polymorphisms in CAD GWAS loci located within liver-specific enhancer elements by identifying their potential target genes using liver cis-eQTL analysis and promoter Capture Hi-C in HepG2 cells. Altogether, 734 target genes were identified of which 121 exhibited correlations to liver-related traits. To identify potentially causal regulatory SNPs, the allele-specific enhancer activity was analyzed by (1) sequence-based computational predictions, (2) quantification of allele-specific transcription factor binding, and (3) STARR-seq massively parallel reporter assay. Altogether, our analysis identified 1,277 unique SNPs that display allele-specific regulatory activity. Among these, susceptibility enhancers near important cholesterol homeostasis genes (APOB, APOC1, APOE, and LIPA) were identified, suggesting that altered gene regulatory activity could represent another way by which genetic variation regulates serum lipoprotein levels. Using CRISPR-based perturbation, we demonstrate how the deletion/activation of a single enhancer leads to changes in the expression of many target genes located in a shared chromatin interaction domain. Our integrative genomics approach represents a comprehensive effort in identifying putative causal regulatory regions and target genes that could predispose to clinical manifestation of CAD by affecting liver function.
|
22
|
Promoter-interacting expression quantitative trait loci are enriched for functional genetic variants. Nat Genet 2021; 53:110-119. [PMID: 33349701 PMCID: PMC8053422 DOI: 10.1038/s41588-020-00745-3] [Citation(s) in RCA: 67] [Impact Index Per Article: 16.8] [Received: 09/12/2019] [Accepted: 11/02/2020] [Indexed: 01/28/2023]
Abstract
Expression quantitative trait loci (eQTLs) studies provide associations of genetic variants with gene expression but fall short of pinpointing functionally important eQTLs. Here, using H3K27ac HiChIP assays, we mapped eQTLs overlapping active cis-regulatory elements that interact with their target gene promoters (promoter-interacting eQTLs, pieQTLs) in five common immune cell types (Database of Immune Cell Expression, Expression quantitative trait loci and Epigenomics (DICE) cis-interactome project). This approach allowed us to identify functionally important eQTLs and show mechanisms that explain their cell-type restriction. We also devised an approach to eQTL discovery that relies on HiChIP-based promoter interaction maps as a structural framework for deciding which SNPs to test for association with gene expression, and observe ultra-long-distance pieQTLs (>1 megabase away), including several disease-risk variants. We validated the functional role of pieQTLs using reporter assays, CRISPRi, dCas9-tiling guides and Cas9-mediated base-pair editing. In this article we present a method for functional eQTL discovery and provide insights into relevance of noncoding variants for cell-specific gene regulation and for disease association beyond conventional eQTL mapping.
|
23
|
Tian R, Pan Y, Etheridge THA, Deshmukh H, Gulick D, Gibson G, Bao G, Lee CM. Pitfalls in Single Clone CRISPR-Cas9 Mutagenesis to Fine-map Regulatory Intervals. Genes (Basel) 2020; 11:E504. [PMID: 32375333 PMCID: PMC7288657 DOI: 10.3390/genes11050504] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 03/19/2020] [Revised: 04/15/2020] [Accepted: 04/22/2020] [Indexed: 12/11/2022] Open
Abstract
The majority of genetic variants affecting complex traits map to regulatory regions of genes, and typically lie in credible intervals of 100 or more SNPs. Fine mapping of the causal variant(s) at a locus depends on assays that are able to discriminate the effects of polymorphisms or mutations on gene expression. Here, we evaluated a moderate-throughput CRISPR-Cas9 mutagenesis approach, based on replicated measurement of transcript abundance in single-cell clones, by deleting candidate regulatory SNPs, affecting four genes known to be affected by large-effect expression Quantitative Trait Loci (eQTL) in leukocytes, and using Fluidigm qRT-PCR to monitor gene expression in HL60 pro-myeloid human cells. We concluded that there were multiple constraints that rendered the approach generally infeasible for fine mapping. These included the non-targetability of many regulatory SNPs, clonal variability of single-cell derivatives, and expense. Power calculations based on the measured variance attributable to major sources of experimental error indicated that typical eQTL explaining 10% of the variation in expression of a gene would usually require at least eight biological replicates of each clone. Scanning across credible intervals with this approach is not recommended.
Affiliation(s)
- Ruoyu Tian
  - Center for Integrative Genomics, Georgia Institute of Technology, Atlanta, GA 30332, USA; (R.T.); (D.G.)
- Yidan Pan
  - Systems, Synthetic, and Physical Biology, Rice University, Houston, TX 77005, USA;
  - Department of Bioengineering, Rice University, Houston, TX 77005, USA; (H.D.); (T.H.A.E.)
- Thomas H. A. Etheridge
  - Department of Bioengineering, Rice University, Houston, TX 77005, USA; (H.D.); (T.H.A.E.)
- Harshavardhan Deshmukh
  - Department of Bioengineering, Rice University, Houston, TX 77005, USA; (H.D.); (T.H.A.E.)
- Dalia Gulick
  - Center for Integrative Genomics, Georgia Institute of Technology, Atlanta, GA 30332, USA; (R.T.); (D.G.)
- Greg Gibson
  - Center for Integrative Genomics, Georgia Institute of Technology, Atlanta, GA 30332, USA; (R.T.); (D.G.)
- Gang Bao
  - Systems, Synthetic, and Physical Biology, Rice University, Houston, TX 77005, USA;
  - Department of Bioengineering, Rice University, Houston, TX 77005, USA; (H.D.); (T.H.A.E.)
- Ciaran M Lee
  - APC Microbiome Ireland, University College Cork, Cork T12 YN60, Ireland
|
24
|
Mattis KK, Gloyn AL. From Genetic Association to Molecular Mechanisms for Islet-cell Dysfunction in Type 2 Diabetes. J Mol Biol 2020; 432:1551-1578. [PMID: 31945378 DOI: 10.1016/j.jmb.2019.12.045] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 11/12/2019] [Revised: 12/15/2019] [Accepted: 12/17/2019] [Indexed: 12/30/2022]
Abstract
Genome-wide association studies (GWAS) have identified over 400 signals robustly associated with risk for type 2 diabetes (T2D). At the vast majority of these loci, the lead single nucleotide polymorphisms (SNPs) reside in noncoding regions of the genome, which hampers biological inference and translation of genetic discoveries into disease mechanisms. The study of these T2D risk variants in normoglycemic individuals has revealed that a significant proportion are exerting their disease risk through islet-cell dysfunction. The central role of the islet is also demonstrated by numerous studies, which have shown an enrichment of these signals in islet-specific epigenomic annotations. In recent years the emergence of authentic human beta-cell lines, and advances in genome-editing technologies coupled with improved protocols differentiating human pluripotent stem cells into beta-like cells has opened up new opportunities for T2D disease modeling. Here we review the current understanding on the genetic basis of T2D focusing on approaches, which have facilitated the identification of causal variants and their effector transcripts in human islets. We will present examples of functional studies based on animal and conventional cellular systems and highlight the potential of novel stem cell-based T2D disease models.
Affiliation(s)
- Katia K Mattis
  - Oxford Centre for Diabetes Endocrinology & Metabolism, University of Oxford, UK
- Anna L Gloyn
  - Oxford Centre for Diabetes Endocrinology & Metabolism, University of Oxford, UK; Wellcome Trust Centre for Human Genetics, University of Oxford, UK; National Institute of Health Research, Biomedical Research Centre, Churchill Hospital, Headington, Oxford, UK.
|
25
|
Santana-Garcia W, Rocha-Acevedo M, Ramirez-Navarro L, Mbouamboua Y, Thieffry D, Thomas-Chollier M, Contreras-Moreira B, van Helden J, Medina-Rivera A. RSAT variation-tools: An accessible and flexible framework to predict the impact of regulatory variants on transcription factor binding. Comput Struct Biotechnol J 2019; 17:1415-1428. [PMID: 31871587 PMCID: PMC6906655 DOI: 10.1016/j.csbj.2019.09.009] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 04/27/2019] [Revised: 09/22/2019] [Accepted: 09/25/2019] [Indexed: 02/06/2023] Open
Abstract
Gene regulatory regions contain short and degenerate DNA binding sites recognized by transcription factors (TFBS). When TFBS harbor SNPs, the DNA binding site may be affected, thereby altering the transcriptional regulation of the target genes. Such regulatory SNPs have been implicated as causal variants in Genome-Wide Association Studies (GWAS). In this study, we describe improved versions of the Variation-tools programs, designed to predict regulatory variants, and present four case studies to illustrate their usage and applications. In brief, Variation-tools facilitate i) obtaining variation information, ii) interconversion of variation file formats, iii) retrieval of sequences surrounding variants, and iv) calculating the change in predicted transcription factor affinity scores between alleles, using motif scanning approaches. Notably, the tools support the analysis of haplotypes. The tools are included within the well-maintained suite Regulatory Sequence Analysis Tools (RSAT, http://rsat.eu), and are accessible through a web interface that currently enables analysis of five metazoan and ten plant genomes. Variation-tools can also be used from the command line with any locally installed Ensembl genome. Users can input personal collections of variants and motifs, providing flexibility in the analysis.
Key Words
- Binding motifs
- CEU, Northern Europeans from Utah
- CRM, Cis-Regulatory Module
- GWAS, Genome Wide Association Studies
- LD, Linkage Disequilibrium
- MPRA, Massively Parallel Reporter Assays
- PSSM, Position Specific Scoring Matrix
- Position specific scoring matrix
- ROC, Receiver Operating Characteristic
- RSAT, Regulatory Sequence Analysis Tools
- Regulatory variants
- SNP, Single Nucleotide Polymorphism
- SNPs
- SOIs, SNPs of Interest
- TF, Transcription Factor
- TFBS, Transcription Factor Binding Site
- Transcription factors
- eQTL, Expression Quantitative Trait Loci
- rsID, Reference SNP Identifier
Affiliation(s)
- Walter Santana-Garcia
  - Institut de Biologie de l’ENS (IBENS), Département de biologie, École normale supérieure, CNRS, INSERM, Université PSL, 75005 Paris, France
  - Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México, Campus Juriquilla, Blvd Juriquilla 3001, Santiago de Querétaro 76230, Mexico
- Maria Rocha-Acevedo
  - Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México, Campus Juriquilla, Blvd Juriquilla 3001, Santiago de Querétaro 76230, Mexico
- Lucia Ramirez-Navarro
  - Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México, Campus Juriquilla, Blvd Juriquilla 3001, Santiago de Querétaro 76230, Mexico
- Yvon Mbouamboua
  - Fondation Congolaise pour la Recherche Médicale, Brazzaville, People’s Republic of Congo
  - Aix-Marseille Univ, INSERM UMR S 1090, Theory and Approaches of Genome Complexity (TAGC), F-13288 Marseille, France
- Denis Thieffry
  - Institut de Biologie de l’ENS (IBENS), Département de biologie, École normale supérieure, CNRS, INSERM, Université PSL, 75005 Paris, France
- Morgane Thomas-Chollier
  - Institut de Biologie de l’ENS (IBENS), Département de biologie, École normale supérieure, CNRS, INSERM, Université PSL, 75005 Paris, France
- Jacques van Helden
  - Aix-Marseille Univ, INSERM UMR S 1090, Theory and Approaches of Genome Complexity (TAGC), F-13288 Marseille, France
  - CNRS, Institut Français de Bioinformatique, IFB-core, UMS 3601, Evry, France
  - Corresponding authors at: Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México, Campus Juriquilla, Blvd Juriquilla 3001, Santiago de Querétaro 76230, México (Medina-Rivera). Aix-Marseille Univ, INSERM UMR S 1090, Theory and Approaches of Genome Complexity (TAGC), F-13288 Marseille, France (J. van Helden).
- Alejandra Medina-Rivera
  - Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México, Campus Juriquilla, Blvd Juriquilla 3001, Santiago de Querétaro 76230, Mexico
  - Corresponding authors at: Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México, Campus Juriquilla, Blvd Juriquilla 3001, Santiago de Querétaro 76230, México (Medina-Rivera). Aix-Marseille Univ, INSERM UMR S 1090, Theory and Approaches of Genome Complexity (TAGC), F-13288 Marseille, France (J. van Helden).
|
26
|
Majoros WH, Kim YS, Barrera A, Li F, Wang X, Cunningham SJ, Johnson GD, Guo C, Lowe WL, Scholtens DM, Hayes MG, Reddy TE, Allen AS. Bayesian estimation of genetic regulatory effects in high-throughput reporter assays. Bioinformatics 2019; 36:331-338. [PMID: 31368479 PMCID: PMC7999138 DOI: 10.1093/bioinformatics/btz545] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2018] [Revised: 06/12/2019] [Accepted: 07/24/2019] [Indexed: 01/31/2023] Open
Abstract
MOTIVATION: High-throughput reporter assays dramatically improve our ability to assign function to noncoding genetic variants, by measuring allelic effects on gene expression in the controlled setting of a reporter gene. Unlike genetic association tests, such assays are not confounded by linkage disequilibrium when loci are independently assayed. These methods can thus improve the identification of causal disease mutations. While work continues on improving experimental aspects of these assays, less effort has gone into developing methods for assessing the statistical significance of assay results, particularly in the case of rare variants captured from patient DNA.
RESULTS: We describe a Bayesian hierarchical model, called Bayesian Inference of Regulatory Differences, which integrates prior information and explicitly accounts for variability between experimental replicates. The model produces substantially more accurate predictions than existing methods when allele frequencies are low, which is of clear advantage in the search for disease-causing variants in DNA captured from patient cohorts. Using the model, we demonstrate a clear tradeoff between variant sequencing coverage and numbers of biological replicates, and we show that the use of additional biological replicates decreases variance in estimates of effect size, due to the properties of the Poisson-binomial distribution. We also provide a power and sample size calculator, which facilitates decision making in experimental design parameters.
AVAILABILITY AND IMPLEMENTATION: The software is freely available from www.geneprediction.org/bird. The experimental design web tool can be accessed at http://67.159.92.22:8080.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- William H Majoros
  - Duke Center for Statistical Genetics and Genomics, Duke University
  - Division of Integrative Genomics, Department of Biostatistics and Bioinformatics, Duke University Medical School
  - Center for Genomic and Computational Biology, Duke University Medical School
- Young-Sook Kim
  - Center for Genomic and Computational Biology, Duke University Medical School
  - Program in Computational Biology & Bioinformatics, Duke University, Durham, NC 27710
- Alejandro Barrera
  - Center for Genomic and Computational Biology, Duke University Medical School
- Fan Li
  - Department of Biostatistics, Yale University, New Haven, CT 06520
- Xingyan Wang
  - Present address: PhD Program in Biostatistics, Department of Public Health Sciences, Penn State College of Medicine, Hershey, PA 17033, USA
- Graham D Johnson
  - Center for Genomic and Computational Biology, Duke University Medical School
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, NC 27710
- Cong Guo
  - Present address: Human Genetics, GlaxoSmithKline, Collegeville, PA 19426, USA
- William L Lowe
  - Division of Endocrinology Metabolism and Molecular Medicine, Northwestern University Feinberg School of Medicine, Chicago
- Denise M Scholtens
  - Division of Biostatistics, Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
- M Geoffrey Hayes
  - Division of Endocrinology Metabolism and Molecular Medicine, Northwestern University Feinberg School of Medicine, Chicago
|