1
Reyna J, Fetter K, Ignacio R, Marandi CCA, Rao N, Jiang Z, Figueroa DS, Bhattacharyya S, Ay F. Loop Catalog: a comprehensive HiChIP database of human and mouse samples. bioRxiv 2024:2024.04.26.591349. [PMID: 38746164 PMCID: PMC11092438 DOI: 10.1101/2024.04.26.591349] [Indexed: 05/16/2024]
Abstract
HiChIP enables cost-effective and high-resolution profiling of regulatory and structural loops. To leverage the increasing number of publicly available HiChIP datasets from diverse cell lines and primary cells, we developed the Loop Catalog (https://loopcatalog.lji.org), a web-based database featuring HiChIP loop calls for 1319 samples across 133 studies and 44 high-resolution Hi-C loop calls. We demonstrate its utility in interpreting fine-mapped GWAS variants (SNP-to-gene linking), in identifying enriched sequence motifs and motif pairs at loop anchors, and in network-level analysis of loops connecting regulatory elements (community detection). Our comprehensive catalog, spanning over 4M unique 5kb loops, along with the accompanying analysis modalities constitutes an important resource for studies in gene regulation and genome organization.
Affiliation(s)
- Joaquin Reyna
- Centers for Cancer Immunotherapy and Autoimmunity, La Jolla Institute for Immunology, La Jolla, CA 92037 USA
- Bioinformatics and Systems Biology Graduate Program University of California, San Diego, La Jolla, CA 92093 USA
- Kyra Fetter
- Centers for Cancer Immunotherapy and Autoimmunity, La Jolla Institute for Immunology, La Jolla, CA 92037 USA
- Department of Bioengineering, University of California San Diego, La Jolla, CA 92093 USA
- Romeo Ignacio
- Centers for Cancer Immunotherapy and Autoimmunity, La Jolla Institute for Immunology, La Jolla, CA 92037 USA
- Cemil Can Ali Marandi
- Centers for Cancer Immunotherapy and Autoimmunity, La Jolla Institute for Immunology, La Jolla, CA 92037 USA
- Bioinformatics and Systems Biology Graduate Program University of California, San Diego, La Jolla, CA 92093 USA
- Nikhil Rao
- Centers for Cancer Immunotherapy and Autoimmunity, La Jolla Institute for Immunology, La Jolla, CA 92037 USA
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA 92093 USA
- Zichen Jiang
- Centers for Cancer Immunotherapy and Autoimmunity, La Jolla Institute for Immunology, La Jolla, CA 92037 USA
- Department of Mathematics, University of California San Diego, La Jolla, CA 92093 USA
- Daniela Salgado Figueroa
- Centers for Cancer Immunotherapy and Autoimmunity, La Jolla Institute for Immunology, La Jolla, CA 92037 USA
- Bioinformatics and Systems Biology Graduate Program University of California, San Diego, La Jolla, CA 92093 USA
- Sourya Bhattacharyya
- Centers for Cancer Immunotherapy and Autoimmunity, La Jolla Institute for Immunology, La Jolla, CA 92037 USA
- Ferhat Ay
- Centers for Cancer Immunotherapy and Autoimmunity, La Jolla Institute for Immunology, La Jolla, CA 92037 USA
- Bioinformatics and Systems Biology Graduate Program University of California, San Diego, La Jolla, CA 92093 USA
- Department of Pediatrics, University of California San Diego, La Jolla, CA 92093 USA
2
Gao Z, Liu Q, Zeng W, Jiang R, Wong WH. EpiGePT: a Pretrained Transformer model for epigenomics. bioRxiv 2024:2023.07.15.549134. [PMID: 37502861 PMCID: PMC10370089 DOI: 10.1101/2023.07.15.549134] [Indexed: 07/29/2023]
Abstract
The inherent similarities between natural language and biological sequences have given rise to great interest in adapting the transformer-based large language models (LLMs) underlying recent breakthroughs in natural language processing (references), for applications in genomics. However, current LLMs for genomics suffer from several limitations, such as the inability to include chromatin interactions in the training data and the inability to make predictions in new cellular contexts not represented in the training data. To mitigate these problems, we propose EpiGePT, a transformer-based pretrained language model for predicting context-specific epigenomic signals and chromatin contacts. By taking the context-specific activities of transcription factors (TFs) and 3D genome interactions into consideration, EpiGePT offers wider applicability and deeper biological insights than models trained on DNA sequence only. In a series of experiments, EpiGePT demonstrates superior performance in a diverse set of epigenomic signal prediction tasks when compared to existing methods. In particular, our model enables cross-cell-type prediction of long-range interactions and offers insight into the functional impact of genetic variants under different cellular contexts. These new capabilities will enhance the usefulness of LLMs in the study of gene regulatory mechanisms. We provide a free online prediction service for EpiGePT through http://health.tsinghua.edu.cn/epigept/.
Affiliation(s)
- Zijing Gao
- Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
- Qiao Liu
- Department of Statistics, Stanford University, Stanford, CA 94305, USA
- Wanwen Zeng
- Department of Statistics, Stanford University, Stanford, CA 94305, USA
- Rui Jiang
- Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
- Wing Hung Wong
- Department of Statistics, Stanford University, Stanford, CA 94305, USA
- Department of Biomedical Data Science, Bio-X Program, Center for Personal Dynamic Regulomes, Stanford University, Stanford, CA 94305, USA
3
Shook MS, Lu X, Chen X, Parameswaran S, Edsall L, Trimarchi MP, Ernst K, Granitto M, Forney C, Donmez OA, Diouf AA, VonHandorf A, Rothenberg ME, Weirauch MT, Kottyan LC. Systematic identification of genotype-dependent enhancer variants in eosinophilic esophagitis. Am J Hum Genet 2024; 111:280-294. [PMID: 38183988 PMCID: PMC10870143 DOI: 10.1016/j.ajhg.2023.12.008] [Received: 05/24/2023] [Revised: 12/01/2023] [Accepted: 12/05/2023] [Indexed: 01/08/2024] Open
Abstract
Eosinophilic esophagitis (EoE) is a rare atopic disorder associated with esophageal dysfunction, including difficulty swallowing, food impaction, and inflammation, that develops in a small subset of people with food allergies. Genome-wide association studies (GWASs) have identified 9 independent EoE risk loci reaching genome-wide significance (p < 5 × 10⁻⁸) and 27 additional loci of suggestive significance (5 × 10⁻⁸ < p < 1 × 10⁻⁵). In the current study, we perform linkage disequilibrium (LD) expansion of these loci to nominate a set of 531 variants that are potentially causal. To systematically interrogate the gene regulatory activity of these variants, we designed a massively parallel reporter assay (MPRA) containing the alleles of each variant within their genomic sequence context cloned into a GFP reporter library. Analysis of reporter gene expression in TE-7, HaCaT, and Jurkat cells revealed cell-type-specific gene regulation. We identify 32 allelic enhancer variants, representing 6 genome-wide significant EoE loci and 7 suggestive EoE loci, that regulate reporter gene expression in a genotype-dependent manner in at least one cellular context. By annotating these variants with expression quantitative trait loci (eQTL) and chromatin looping data in related tissues and cell types, we identify putative target genes affected by genetic variation in individuals with EoE. Transcription factor enrichment analyses reveal possible roles for cell-type-specific regulators, including GATA3. Our approach reduces the large set of EoE-associated variants to a set of 32 with allelic regulatory activity, providing functional insights into the effects of genetic variation in this disease.
Affiliation(s)
- Molly S Shook
- Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA
- Xiaoming Lu
- Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA
- Xiaoting Chen
- Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA
- Sreeja Parameswaran
- Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA
- Lee Edsall
- Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA
- Michael P Trimarchi
- Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA
- Kevin Ernst
- Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA
- Marissa Granitto
- Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA
- Carmy Forney
- Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA
- Omer A Donmez
- Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA
- Arame A Diouf
- Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA
- Andrew VonHandorf
- Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA
- Marc E Rothenberg
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH 45229, USA; Division of Allergy and Immunology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA
- Matthew T Weirauch
- Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA; Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH 45229, USA; Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA; Division of Developmental Biology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA; Division of Human Genetics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA.
- Leah C Kottyan
- Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA; Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH 45229, USA; Division of Allergy and Immunology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA; Division of Human Genetics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA.
4
Agarwal A, Korsak S, Choudhury A, Plewczynski D. The dynamic role of cohesin in maintaining human genome architecture. Bioessays 2023; 45:e2200240. [PMID: 37603403 DOI: 10.1002/bies.202200240] [Received: 12/09/2022] [Revised: 08/03/2023] [Accepted: 08/07/2023] [Indexed: 08/22/2023]
Abstract
Recent advances in genomic and imaging techniques have revealed the complex manner of organizing billions of base pairs of DNA necessary for maintaining their functionality and ensuring the proper expression of genetic information. The SMC proteins and cohesin complex primarily contribute to forming higher-order chromatin structures, such as chromosomal territories, compartments, topologically associating domains (TADs) and chromatin loops anchored by CCCTC-binding factor (CTCF) protein or other genome organizers. Cohesin plays a fundamental role in chromatin organization, gene expression and regulation. This review aims to describe the current understanding of the dynamic nature of the cohesin-DNA complex and its dependence on cohesin for genome maintenance. We discuss the current 3C technique and numerous bioinformatics pipelines used to comprehend structural genomics and epigenetics focusing on the analysis of Cohesin-centred interactions. We also incorporate our present comprehension of Loop Extrusion (LE) and insights from stochastic modelling.
Affiliation(s)
- Abhishek Agarwal
- Centre of New Technologies, University of Warsaw, Warsaw, Poland
- Sevastianos Korsak
- Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
- Dariusz Plewczynski
- Centre of New Technologies, University of Warsaw, Warsaw, Poland
- Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
5
Feng Z, Duren Z, Xin J, Yuan Q, He Y, Su B, Wong WH, Wang Y. Heritability enrichment in context-specific regulatory networks improves phenotype-relevant tissue identification. eLife 2022; 11:82535. [PMID: 36525361 PMCID: PMC9810332 DOI: 10.7554/elife.82535] [Received: 08/08/2022] [Accepted: 12/13/2022] [Indexed: 12/23/2022] Open
Abstract
Systems genetics holds the promise to decipher complex traits by interpreting their associated SNPs through gene regulatory networks derived from comprehensive multi-omics data of cell types, tissues, and organs. Here, we propose SpecVar to integrate paired chromatin accessibility and gene expression data into context-specific regulatory network atlas and regulatory categories, conduct heritability enrichment analysis with genome-wide association studies (GWAS) summary statistics, identify relevant tissues, and estimate relevance correlation to depict common genetic factors acting in the shared regulatory networks between traits. Our method improves power upon existing approaches by associating SNPs with context-specific regulatory elements to assess heritability enrichments and by explicitly prioritizing gene regulations underlying relevant tissues. Ablation studies, independent data validation, and comparison experiments with existing methods on GWAS of six phenotypes show that SpecVar can improve heritability enrichment, accurately detect relevant tissues, and reveal causal regulations. Furthermore, SpecVar correlates the relevance patterns for pairs of phenotypes and better reveals shared SNP-associated regulations of phenotypes than existing methods. Studying GWAS of 206 phenotypes in UK Biobank demonstrates that SpecVar leverages the context-specific regulatory network atlas to prioritize phenotypes' relevant tissues and shared heritability for biological and therapeutic insights. SpecVar provides a powerful way to interpret SNPs via context-specific regulatory networks and is available at https://github.com/AMSSwanglab/SpecVar, copy archived at swh:1:rev:cf27438d3f8245c34c357ec5f077528e6befe829.
Affiliation(s)
- Zhanying Feng
- CEMS, NCMIS, HCMS, MDIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
- School of Mathematics, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Beijing, China
- Zhana Duren
- Center for Human Genetics and Department of Genetics and Biochemistry, Clemson University, Greenwood, United States
- Jingxue Xin
- Department of Statistics, Department of Biomedical Data Science, Bio-X Program, Stanford University, Stanford, United States
- Qiuyue Yuan
- Center for Human Genetics and Department of Genetics and Biochemistry, Clemson University, Greenwood, United States
- Yaoxi He
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- Bing Su
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China
- Wing Hung Wong
- Department of Statistics, Department of Biomedical Data Science, Bio-X Program, Stanford University, Stanford, United States
- Yong Wang
- CEMS, NCMIS, HCMS, MDIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
- School of Mathematics, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Beijing, China
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China
- Key Laboratory of Systems Biology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou, China