1
Liu W, Zhong W, Giusti-Rodríguez P, Jiang Z, Wang GW, Sun H, Hu M, Li Y. SnapHiC-G: identifying long-range enhancer-promoter interactions from single-cell Hi-C data via a global background model. Brief Bioinform 2024; 25:bbae426. [PMID: 39222061 PMCID: PMC11367764 DOI: 10.1093/bib/bbae426] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2024] [Revised: 07/05/2024] [Accepted: 08/13/2024] [Indexed: 09/04/2024] Open
Abstract
Harnessing the power of single-cell genomics technologies, single-cell Hi-C (scHi-C) and its derived technologies provide powerful tools to measure spatial proximity between regulatory elements and their target genes in individual cells. Using a global background model, we propose SnapHiC-G, a computational method, to identify long-range enhancer-promoter interactions from scHi-C data. We applied SnapHiC-G to scHi-C datasets generated from mouse embryonic stem cells and human brain cortical cells. SnapHiC-G achieved high sensitivity in identifying long-range enhancer-promoter interactions. Moreover, SnapHiC-G can identify putative target genes for noncoding genome-wide association study (GWAS) variants, and the genetic heritability of neuropsychiatric diseases is enriched for single-nucleotide polymorphisms (SNPs) within SnapHiC-G-identified interactions in a cell-type-specific manner. In sum, SnapHiC-G is a powerful tool for characterizing cell-type-specific enhancer-promoter interactions from complex tissues and can facilitate the discovery of chromatin interactions important for gene regulation in biologically relevant cell types.
Affiliation(s)
- Weifang Liu
- Department of Biostatistics, University of North Carolina at Chapel Hill, 135 Dauer Drive, Chapel Hill, NC 27599, United States
- Wujuan Zhong
- Biostatistics and Research Decision Sciences, Merck & Co., Inc., 126 East Lincoln Ave, Rahway, New Jersey 07065, United States
- Paola Giusti-Rodríguez
- Department of Psychiatry, University of Florida, 1149 Newel Dr., Gainesville, FL 32611, United States
- Zhiyun Jiang
- Department of Genetics, University of North Carolina at Chapel Hill, 120 Mason Farm Road, Chapel Hill, NC 27599, United States
- Geoffery W Wang
- Department of Biostatistics, University of North Carolina at Chapel Hill, 135 Dauer Drive, Chapel Hill, NC 27599, United States
- Huaigu Sun
- Department of Genetics, University of North Carolina at Chapel Hill, 120 Mason Farm Road, Chapel Hill, NC 27599, United States
- Ming Hu
- Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic Foundation, 9500 Euclid Avenue, Cleveland, OH 44196, United States
- Yun Li
- Department of Biostatistics, University of North Carolina at Chapel Hill, 135 Dauer Drive, Chapel Hill, NC 27599, United States
- Department of Genetics, University of North Carolina at Chapel Hill, 120 Mason Farm Road, Chapel Hill, NC 27599, United States
- Department of Computer Science, University of North Carolina at Chapel Hill, 201 S. Columbia St, Chapel Hill, NC 27599, United States
2
Zhang Y, Cameron CJF, Blanchette M. Posterior inference of Hi-C contact frequency through sampling. Frontiers in Bioinformatics 2024; 3:1285828. [PMID: 38455089 PMCID: PMC10919286 DOI: 10.3389/fbinf.2023.1285828] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2023] [Accepted: 12/20/2023] [Indexed: 03/09/2024] Open
Abstract
Hi-C is one of the most widely used approaches to study three-dimensional genome conformations. Contacts captured by a Hi-C experiment are represented in a contact frequency matrix. Due to the limited sequencing depth and other factors, Hi-C contact frequency matrices are only approximations of the true interaction frequencies and are further reported without any quantification of uncertainty. Hence, downstream analyses based on Hi-C contact maps (e.g., TAD and loop annotation) are themselves point estimations. Here, we present the Hi-C interaction frequency sampler (HiCSampler) that reliably infers the posterior distribution of the interaction frequency for a given Hi-C contact map by exploiting dependencies between neighboring loci. Posterior predictive checks demonstrate that HiCSampler can infer highly predictive chromosomal interaction frequency. Summary statistics calculated by HiCSampler provide a measurement of the uncertainty for Hi-C experiments, and samples inferred by HiCSampler are ready for use by most downstream analysis tools off the shelf and permit uncertainty measurements in these analyses without modifications.
Affiliation(s)
- Yanlin Zhang
- School of Computer Science, McGill University, Montréal, QC, Canada
- Christopher J. F. Cameron
- School of Computer Science, McGill University, Montréal, QC, Canada
- Department of Biochemistry and Goodman Cancer Research Center, McGill University, Montreal, QC, Canada
3
Raffo A, Paulsen J. The shape of chromatin: insights from computational recognition of geometric patterns in Hi-C data. Brief Bioinform 2023; 24:bbad302. [PMID: 37646128 PMCID: PMC10516369 DOI: 10.1093/bib/bbad302] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 05/18/2023] [Revised: 07/05/2023] [Accepted: 08/03/2023] [Indexed: 09/01/2023] Open
Abstract
The three-dimensional organization of chromatin plays a crucial role in gene regulation and cellular processes like deoxyribonucleic acid (DNA) transcription, replication and repair. Hi-C and related techniques provide detailed views of spatial proximities within the nucleus. However, data analysis is challenging partially due to a lack of well-defined, underpinning mathematical frameworks. Recently, recognizing and analyzing geometric patterns in Hi-C data has emerged as a powerful approach. This review provides a summary of algorithms for automatic recognition and analysis of geometric patterns in Hi-C data and their correspondence with chromatin structure. We classify existing algorithms on the basis of the data representation and pattern recognition paradigm they make use of. Finally, we outline some of the challenges ahead and promising future directions.
Affiliation(s)
- Andrea Raffo
- Department of Biosciences, University of Oslo, 0316 Oslo, Norway
- Jonas Paulsen
- Department of Biosciences, University of Oslo, 0316 Oslo, Norway
- Centre for Bioinformatics, Department of Informatics, University of Oslo, 0316 Oslo, Norway
4
Schuette G, Ding X, Zhang B. Efficient Hi-C inversion facilitates chromatin folding mechanism discovery and structure prediction. Biophys J 2023; 122:3425-3438. [PMID: 37496267 PMCID: PMC10502442 DOI: 10.1016/j.bpj.2023.07.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2023] [Revised: 07/10/2023] [Accepted: 07/24/2023] [Indexed: 07/28/2023] Open
Abstract
Genome-wide chromosome conformation capture (Hi-C) experiments have revealed many structural features of chromatin across multiple length scales. Further understanding genome organization requires relating these discoveries to the mechanisms that establish chromatin structures and reconstructing these structures in three dimensions, but both objectives are difficult to achieve with existing algorithms that are often computationally expensive. To alleviate this challenge, we present an algorithm that efficiently converts Hi-C data into contact energies, which measure the interaction strength between genomic loci brought into proximity. Contact energies are local quantities unaffected by the topological constraints that correlate Hi-C contact probabilities. Thus, extracting contact energies from Hi-C contact probabilities distills the biologically unique information contained in the data. We show that contact energies reveal the location of chromatin loop anchors, support a phase separation mechanism for genome compartmentalization, and parameterize polymer simulations that predict three-dimensional chromatin structures. Therefore, we anticipate that contact energy extraction will unleash the full potential of Hi-C data and that our inversion algorithm will facilitate the widespread adoption of contact energy analysis.
Affiliation(s)
- Greg Schuette
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts
- Xinqiang Ding
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts
- Bin Zhang
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts.
5
Schuette G, Ding X, Zhang B. Efficient Hi-C inversion facilitates chromatin folding mechanism discovery and structure prediction. bioRxiv 2023:2023.03.17.533194. [PMID: 36993500 PMCID: PMC10055272 DOI: 10.1101/2023.03.17.533194] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/31/2023]
Abstract
Genome-wide chromosome conformation capture (Hi-C) experiments have revealed many structural features of chromatin across multiple length scales. Further understanding genome organization requires relating these discoveries to the mechanisms that establish chromatin structures and reconstructing these structures in three dimensions, but both objectives are difficult to achieve with existing algorithms that are often computationally expensive. To alleviate this challenge, we present an algorithm that efficiently converts Hi-C data into contact energies, which measure the interaction strength between genomic loci brought into proximity. Contact energies are local quantities unaffected by the topological constraints that correlate Hi-C contact probabilities. Thus, extracting contact energies from Hi-C contact probabilities distills the biologically unique information contained in the data. We show that contact energies reveal the location of chromatin loop anchors, support a phase separation mechanism for genome compartmentalization, and parameterize polymer simulations that predict three-dimensional chromatin structures. Therefore, we anticipate that contact energy extraction will unleash the full potential of Hi-C data and that our inversion algorithm will facilitate the widespread adoption of contact energy analysis.
Affiliation(s)
- Greg Schuette
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Xinqiang Ding
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Bin Zhang
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
6
Christmas MJ, Kaplow IM, Genereux DP, Dong MX, Hughes GM, Li X, Sullivan PF, Hindle AG, Andrews G, Armstrong JC, Bianchi M, Breit AM, Diekhans M, Fanter C, Foley NM, Goodman DB, Goodman L, Keough KC, Kirilenko B, Kowalczyk A, Lawless C, Lind AL, Meadows JRS, Moreira LR, Redlich RW, Ryan L, Swofford R, Valenzuela A, Wagner F, Wallerman O, Brown AR, Damas J, Fan K, Gatesy J, Grimshaw J, Johnson J, Kozyrev SV, Lawler AJ, Marinescu VD, Morrill KM, Osmanski A, Paulat NS, Phan BN, Reilly SK, Schäffer DE, Steiner C, Supple MA, Wilder AP, Wirthlin ME, Xue JR, Zoonomia Consortium, Birren BW, Gazal S, Hubley RM, Koepfli KP, Marques-Bonet T, Meyer WK, Nweeia M, Sabeti PC, Shapiro B, Smit AFA, Springer MS, Teeling EC, Weng Z, Hiller M, Levesque DL, Lewin HA, Murphy WJ, Navarro A, Paten B, Pollard KS, Ray DA, Ruf I, Ryder OA, Pfenning AR, Lindblad-Toh K, Karlsson EK. Evolutionary constraint and innovation across hundreds of placental mammals. Science 2023; 380:eabn3943. [PMID: 37104599 PMCID: PMC10250106 DOI: 10.1126/science.abn3943] [Citation(s) in RCA: 106] [Impact Index Per Article: 53.0] [Received: 11/23/2021] [Accepted: 12/16/2022] [Indexed: 04/29/2023]
Abstract
Zoonomia is the largest comparative genomics resource for mammals produced to date. By aligning genomes for 240 species, we identify bases that, when mutated, are likely to affect fitness and alter disease risk. At least 332 million bases (~10.7%) in the human genome are unusually conserved across species (evolutionarily constrained) relative to neutrally evolving repeats, and 4552 ultraconserved elements are nearly perfectly conserved. Of 101 million significantly constrained single bases, 80% are outside protein-coding exons and half have no functional annotations in the Encyclopedia of DNA Elements (ENCODE) resource. Changes in genes and regulatory elements are associated with exceptional mammalian traits, such as hibernation, that could inform therapeutic development. Earth's vast and imperiled biodiversity offers distinctive power for identifying genetic variants that affect genome function and organismal phenotypes.
Affiliation(s)
- Matthew J. Christmas
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 751 32 Uppsala, Sweden
- Irene M. Kaplow
- Department of Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Michael X. Dong
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 751 32 Uppsala, Sweden
- Graham M. Hughes
- School of Biology and Environmental Science, University College Dublin, Belfield, Dublin 4, Ireland
- Xue Li
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Morningside Graduate School of Biomedical Sciences, UMass Chan Medical School, Worcester, MA 01605, USA
- Program in Bioinformatics and Integrative Biology, UMass Chan Medical School, Worcester, MA 01605, USA
- Patrick F. Sullivan
- Department of Genetics, University of North Carolina Medical School, Chapel Hill, NC 27599, USA
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
- Allyson G. Hindle
- School of Life Sciences, University of Nevada Las Vegas, Las Vegas, NV 89154, USA
- Gregory Andrews
- Program in Bioinformatics and Integrative Biology, UMass Chan Medical School, Worcester, MA 01605, USA
- Joel C. Armstrong
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Matteo Bianchi
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 751 32 Uppsala, Sweden
- Ana M. Breit
- School of Biology and Ecology, University of Maine, Orono, ME 04469, USA
- Mark Diekhans
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Cornelia Fanter
- School of Life Sciences, University of Nevada Las Vegas, Las Vegas, NV 89154, USA
- Nicole M. Foley
- Veterinary Integrative Biosciences, Texas A&M University, College Station, TX 77843, USA
- Daniel B. Goodman
- Department of Microbiology and Immunology, University of California San Francisco, San Francisco, CA 94143, USA
- Kathleen C. Keough
- Fauna Bio, Inc., Emeryville, CA 94608, USA
- Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, CA 94158, USA
- Gladstone Institutes, San Francisco, CA 94158, USA
- Bogdan Kirilenko
- Faculty of Biosciences, Goethe-University, 60438 Frankfurt, Germany
- LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany
- Senckenberg Research Institute, 60325 Frankfurt, Germany
- Amanda Kowalczyk
- Department of Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Colleen Lawless
- School of Biology and Environmental Science, University College Dublin, Belfield, Dublin 4, Ireland
- Abigail L. Lind
- Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, CA 94158, USA
- Gladstone Institutes, San Francisco, CA 94158, USA
- Jennifer R. S. Meadows
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 751 32 Uppsala, Sweden
- Lucas R. Moreira
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Program in Bioinformatics and Integrative Biology, UMass Chan Medical School, Worcester, MA 01605, USA
- Ruby W. Redlich
- Department of Biological Sciences, Mellon College of Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Louise Ryan
- School of Biology and Environmental Science, University College Dublin, Belfield, Dublin 4, Ireland
- Ross Swofford
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Alejandro Valenzuela
- Department of Experimental and Health Sciences, Institute of Evolutionary Biology (UPF-CSIC), Universitat Pompeu Fabra, 08003 Barcelona, Spain
- Franziska Wagner
- Museum of Zoology, Senckenberg Natural History Collections Dresden, 01109 Dresden, Germany
- Ola Wallerman
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 751 32 Uppsala, Sweden
- Ashley R. Brown
- Department of Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Joana Damas
- The Genome Center, University of California Davis, Davis, CA 95616, USA
- Kaili Fan
- Program in Bioinformatics and Integrative Biology, UMass Chan Medical School, Worcester, MA 01605, USA
- John Gatesy
- Division of Vertebrate Zoology, American Museum of Natural History, New York, NY 10024, USA
- Jenna Grimshaw
- Department of Biological Sciences, Texas Tech University, Lubbock, TX 79409, USA
- Jeremy Johnson
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Sergey V. Kozyrev
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 751 32 Uppsala, Sweden
- Alyssa J. Lawler
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Department of Biological Sciences, Mellon College of Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Voichita D. Marinescu
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 751 32 Uppsala, Sweden
- Kathleen M. Morrill
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Morningside Graduate School of Biomedical Sciences, UMass Chan Medical School, Worcester, MA 01605, USA
- Program in Bioinformatics and Integrative Biology, UMass Chan Medical School, Worcester, MA 01605, USA
- Austin Osmanski
- Medical Scientist Training Program, University of Pittsburgh School of Medicine, Pittsburgh, PA 15261, USA
- Nicole S. Paulat
- Department of Biological Sciences, Texas Tech University, Lubbock, TX 79409, USA
- BaDoi N. Phan
- Department of Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Medical Scientist Training Program, University of Pittsburgh School of Medicine, Pittsburgh, PA 15261, USA
- Steven K. Reilly
- Department of Genetics, Yale School of Medicine, New Haven, CT 06510, USA
- Daniel E. Schäffer
- Department of Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Cynthia Steiner
- Conservation Genetics, San Diego Zoo Wildlife Alliance, Escondido, CA 92027, USA
- Megan A. Supple
- Department of Ecology and Evolutionary Biology, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Aryn P. Wilder
- Conservation Genetics, San Diego Zoo Wildlife Alliance, Escondido, CA 92027, USA
- Morgan E. Wirthlin
- Department of Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Allen Institute for Brain Science, Seattle, WA 98109, USA
- James R. Xue
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
- Bruce W. Birren
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Steven Gazal
- Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA
- Klaus-Peter Koepfli
- Center for Species Survival, Smithsonian’s National Zoo and Conservation Biology Institute, Washington, DC 20008, USA
- Computer Technologies Laboratory, ITMO University, St. Petersburg 197101, Russia
- Smithsonian-Mason School of Conservation, George Mason University, Front Royal, VA 22630, USA
- Tomas Marques-Bonet
- Catalan Institution of Research and Advanced Studies (ICREA), 08010 Barcelona, Spain
- CNAG-CRG, Centre for Genomic Regulation, Barcelona Institute of Science and Technology (BIST), 08036 Barcelona, Spain
- Department of Medicine and Life Sciences, Institute of Evolutionary Biology (UPF-CSIC), Universitat Pompeu Fabra, 08003 Barcelona, Spain
- Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, 08193 Cerdanyola del Vallès, Barcelona, Spain
- Wynn K. Meyer
- Department of Biological Sciences, Lehigh University, Bethlehem, PA 18015, USA
- Martin Nweeia
- Department of Comprehensive Care, School of Dental Medicine, Case Western Reserve University, Cleveland, OH 44106, USA
- Department of Vertebrate Zoology, Canadian Museum of Nature, Ottawa, Ontario K2P 2R1, Canada
- Department of Vertebrate Zoology, Smithsonian Institution, Washington, DC 20002, USA
- Narwhal Genome Initiative, Department of Restorative Dentistry and Biomaterials Sciences, Harvard School of Dental Medicine, Boston, MA 02115, USA
- Pardis C. Sabeti
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
- Howard Hughes Medical Institute, Harvard University, Cambridge, MA 02138, USA
- Beth Shapiro
- Department of Ecology and Evolutionary Biology, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Howard Hughes Medical Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Mark S. Springer
- Department of Evolution, Ecology and Organismal Biology, University of California Riverside, Riverside, CA 92521, USA
- Emma C. Teeling
- School of Biology and Environmental Science, University College Dublin, Belfield, Dublin 4, Ireland
- Zhiping Weng
- Program in Bioinformatics and Integrative Biology, UMass Chan Medical School, Worcester, MA 01605, USA
- Michael Hiller
- Faculty of Biosciences, Goethe-University, 60438 Frankfurt, Germany
- LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany
- Senckenberg Research Institute, 60325 Frankfurt, Germany
- Harris A. Lewin
- The Genome Center, University of California Davis, Davis, CA 95616, USA
- Department of Evolution and Ecology, University of California Davis, Davis, CA 95616, USA
- John Muir Institute for the Environment, University of California Davis, Davis, CA 95616, USA
- William J. Murphy
- Veterinary Integrative Biosciences, Texas A&M University, College Station, TX 77843, USA
- Arcadi Navarro
- Catalan Institution of Research and Advanced Studies (ICREA), 08010 Barcelona, Spain
- Department of Medicine and Life Sciences, Institute of Evolutionary Biology (UPF-CSIC), Universitat Pompeu Fabra, 08003 Barcelona, Spain
- BarcelonaBeta Brain Research Center, Pasqual Maragall Foundation, 08005 Barcelona, Spain
- CRG, Centre for Genomic Regulation, Barcelona Institute of Science and Technology (BIST), 08003 Barcelona, Spain
- Benedict Paten
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Katherine S. Pollard
- Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, CA 94158, USA
- Gladstone Institutes, San Francisco, CA 94158, USA
- Chan Zuckerberg Biohub, San Francisco, CA 94158, USA
- David A. Ray
- Department of Biological Sciences, Texas Tech University, Lubbock, TX 79409, USA
- Irina Ruf
- Division of Messel Research and Mammalogy, Senckenberg Research Institute and Natural History Museum Frankfurt, 60325 Frankfurt am Main, Germany
- Oliver A. Ryder
- Conservation Genetics, San Diego Zoo Wildlife Alliance, Escondido, CA 92027, USA
- Department of Evolution, Behavior and Ecology, School of Biological Sciences, University of California San Diego, La Jolla, CA 92039, USA
- Andreas R. Pfenning
- Department of Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Kerstin Lindblad-Toh
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 751 32 Uppsala, Sweden
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Elinor K. Karlsson
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Program in Bioinformatics and Integrative Biology, UMass Chan Medical School, Worcester, MA 01605, USA
- Program in Molecular Medicine, UMass Chan Medical School, Worcester, MA 01605, USA
7
Kaplow IM, Lawler AJ, Schäffer DE, Srinivasan C, Sestili HH, Wirthlin ME, Phan BN, Prasad K, Brown AR, Zhang X, Foley K, Genereux DP, Zoonomia Consortium, Karlsson EK, Lindblad-Toh K, Meyer WK, Pfenning AR. Relating enhancer genetic variation across mammals to complex phenotypes using machine learning. Science 2023; 380:eabm7993. [PMID: 37104615 PMCID: PMC10322212 DOI: 10.1126/science.abm7993] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Received: 10/12/2021] [Accepted: 02/23/2023] [Indexed: 04/29/2023]
Abstract
Protein-coding differences between species often fail to explain phenotypic diversity, suggesting the involvement of genomic elements that regulate gene expression such as enhancers. Identifying associations between enhancers and phenotypes is challenging because enhancer activity can be tissue-dependent and functionally conserved despite low sequence conservation. We developed the Tissue-Aware Conservation Inference Toolkit (TACIT) to associate candidate enhancers with species' phenotypes using predictions from machine learning models trained on specific tissues. Applying TACIT to associate motor cortex and parvalbumin-positive interneuron enhancers with neurological phenotypes revealed dozens of enhancer-phenotype associations, including brain size-associated enhancers that interact with genes implicated in microcephaly or macrocephaly. TACIT provides a foundation for identifying enhancers associated with the evolution of any convergently evolved phenotype in any large group of species with aligned genomes.
Affiliation(s)
- Irene M. Kaplow
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA
- Alyssa J. Lawler
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA
- Department of Biology, Carnegie Mellon University, Pittsburgh, PA, USA
- Daniel E. Schäffer
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA
- Chaitanya Srinivasan
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA
- Heather H. Sestili
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA
- Morgan E. Wirthlin
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA
- BaDoi N. Phan
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA
- Medical Scientist Training Program, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
- Kavya Prasad
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA
- Ashley R. Brown
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA
- Xiaomeng Zhang
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA
- Kathleen Foley
- Department of Biological Sciences, Lehigh University, Bethlehem, PA, USA
- Diane P. Genereux
- Broad Institute, Cambridge, MA, USA
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, Worcester, MA, USA
- Elinor K. Karlsson
- Broad Institute, Cambridge, MA, USA
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, Worcester, MA, USA
- Kerstin Lindblad-Toh
- Broad Institute, Cambridge, MA, USA
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Wynn K. Meyer
- Department of Biological Sciences, Lehigh University, Bethlehem, PA, USA
- Andreas R. Pfenning
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA
- Department of Biology, Carnegie Mellon University, Pittsburgh, PA, USA
8
Zhang Y, Wang H, Liu J, Li J, Zhang Q, Tang B, Zhang Z. Delta.EPI: a probabilistic voting-based enhancer-promoter interaction prediction platform. J Genet Genomics 2023:S1673-8527(23)00045-0. [PMID: 36822264 DOI: 10.1016/j.jgg.2023.02.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2022] [Revised: 01/20/2023] [Accepted: 02/10/2023] [Indexed: 02/24/2023]
Abstract
Enhancer-promoter interactions (EPIs) are involved in most gene transcriptional regulation in higher eukaryotes. Predicting EPIs from given genomic loci or DNA sequences is not a trivial task. Benchmarking of EPI predictors has so far been largely empirical and lacks quantitative model-based comparisons, which makes it difficult for molecular biologists to obtain reliable EPI predictions. Here, we present an EPI prediction platform, Delta.EPI. Based on a statistical model of data integration, Delta.EPI comprehensively assesses the predictions from four state-of-the-art EPI predictors. Equipped with a user-friendly interface and visualization platform, Delta.EPI presents sorted results with a confidence of EPI relevance, which may guide molecular biologists who lack prior knowledge of EPI prediction algorithms. Last, we showcase the utility of Delta.EPI with a case study. Delta.EPI provides a powerful tool to fuel gene regulation and 3D genome studies through easy-to-access EPI predictions. Delta.EPI can be freely accessed at https://ngdc.cncb.ac.cn/deltaEPI/.
Affiliation(s)
- Yuyang Zhang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, and China National Center for Bioinformation, Beijing 100101, China; College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
| | - Haoyu Wang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, and China National Center for Bioinformation, Beijing 100101, China; College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
| | - Jing Liu
- School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China
| | - Junlin Li
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, and China National Center for Bioinformation, Beijing 100101, China; College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
| | - Qing Zhang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, and China National Center for Bioinformation, Beijing 100101, China.
| | - Bixia Tang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, and China National Center for Bioinformation, Beijing 100101, China.
| | - Zhihua Zhang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, and China National Center for Bioinformation, Beijing 100101, China; College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China.
|
9
|
Zhong W, Liu W, Chen J, Sun Q, Hu M, Li Y. Understanding the function of regulatory DNA interactions in the interpretation of non-coding GWAS variants. Front Cell Dev Biol 2022; 10:957292. [PMID: 36060805 PMCID: PMC9437546 DOI: 10.3389/fcell.2022.957292] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 05/30/2022] [Accepted: 07/21/2022] [Indexed: 01/11/2023] Open
Abstract
Genome-wide association studies (GWAS) have identified a vast number of variants associated with various complex human diseases and traits. However, most of these GWAS variants reside in non-coding regions producing no proteins, making the interpretation of these variants a daunting challenge. Prior evidence indicates that a subset of non-coding variants detected within or near cis-regulatory elements (e.g., promoters, enhancers, silencers, and insulators) might play a key role in disease etiology by regulating gene expression. Advanced sequencing- and imaging-based technologies, together with powerful computational methods, enabling comprehensive characterization of regulatory DNA interactions, have substantially improved our understanding of the three-dimensional (3D) genome architecture. Recent literature witnesses plenty of examples where using chromosome conformation capture (3C)-based technologies successfully links non-coding variants to their target genes and prioritizes relevant tissues or cell types. These examples illustrate the critical capability of 3D genome organization in annotating non-coding GWAS variants. This review discusses how 3D genome organization information contributes to elucidating the potential roles of non-coding GWAS variants in disease etiology.
Affiliation(s)
- Wujuan Zhong
- Biostatistics and Research Decision Sciences, Merck & Co, Inc, Rahway, NJ, United States
| | - Weifang Liu
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
| | - Jiawen Chen
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
| | - Quan Sun
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
| | - Ming Hu
- Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic Foundation, Cleveland, OH, United States
| | - Yun Li
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
|
10
|
Osuntoki IG, Harrison A, Dai H, Bao Y, Zabet NR. ZipHiC: a novel Bayesian framework to identify enriched interactions and experimental biases in Hi-C data. Bioinformatics 2022; 38:3523-3531. [PMID: 35678507 PMCID: PMC9272800 DOI: 10.1093/bioinformatics/btac387] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/08/2021] [Revised: 05/23/2022] [Accepted: 06/07/2022] [Indexed: 11/26/2022] Open
Abstract
Motivation: Several computational and statistical methods have been developed to analyze data generated through 3C-based methods, especially Hi-C. Most of the existing methods do not account for dependency in Hi-C data. Results: Here, we present ZipHiC, a novel statistical method to explore Hi-C data focusing on the detection of enriched contacts. ZipHiC implements a Bayesian method based on a hidden Markov random field (HMRF) model and Approximate Bayesian Computation (ABC) to detect interactions in two-dimensional space based on a Hi-C contact frequency matrix. ZipHiC uses data on the sources of biases related to the contact frequency matrix, allows borrowing information from neighbours using the Potts model and improves computation speed using the ABC model. In addition to outperforming existing tools on both simulated and real data, our model also provides insights into different sources of biases that affect Hi-C data. We show that some datasets display higher biases from DNA accessibility or Transposable Elements content. Furthermore, our analysis in Drosophila melanogaster showed that approximately half of the detected significant interactions connect promoters with other parts of the genome, indicating a functional biological role. Finally, we found that the micro-C datasets display higher biases from DNA accessibility compared to a similar Hi-C experiment, but this can be corrected by ZipHiC. Availability and implementation: The R scripts are available at https://github.com/igosungithub/HMRFHiC.git. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Itunu G Osuntoki
- Department of Mathematical Sciences, University of Essex, Colchester, CO4 3SQ, United Kingdom; Statistics, Modelling and Economics Department, UK Health Security Agency, London, NW9 5EQ, United Kingdom
| | - Andrew Harrison
- Department of Mathematical Sciences, University of Essex, Colchester, CO4 3SQ, United Kingdom
| | - Hongsheng Dai
- Department of Mathematical Sciences, University of Essex, Colchester, CO4 3SQ, United Kingdom
| | - Yanchun Bao
- Department of Mathematical Sciences, University of Essex, Colchester, CO4 3SQ, United Kingdom
| | - Nicolae Radu Zabet
- School of Life Sciences, University of Essex, Colchester, CO4 3SQ, United Kingdom; Blizard Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, E1 2AT, United Kingdom
|
11
|
Gong W, Wee J, Wu MC, Sun X, Li C, Xia K. Persistent spectral simplicial complex-based machine learning for chromosomal structural analysis in cellular differentiation. Brief Bioinform 2022; 23:6583209. [PMID: 35536545 DOI: 10.1093/bib/bbac168] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 02/21/2022] [Revised: 04/12/2022] [Accepted: 03/13/2022] [Indexed: 11/13/2022] Open
Abstract
The three-dimensional (3D) chromosomal structure plays an essential role in all DNA-templated processes, including gene transcription, DNA replication and other cellular processes. Although chromosome conformation capture (3C) methods such as Hi-C can generate chromosomal contact data that characterize genome-wide chromosomal structural properties, an understanding of the 3D genomic nature based on Hi-C data remains lacking. Here, we propose a persistent spectral simplicial complex (PerSpectSC) model to describe Hi-C data for the first time. Specifically, a filtration process is introduced to generate a series of nested simplicial complexes at different scales. For each of these simplicial complexes, its spectral information can be calculated from the corresponding Hodge Laplacian matrix. The PerSpectSC model describes the persistence and variation of the spectral information of the nested simplicial complexes during the filtration process. Different from all previous models, our PerSpectSC-based features provide a quantitative global-scale characterization of chromosome structures and topology. Our descriptors can successfully classify cell types and also cellular differentiation stages for all the 24 types of chromosomes simultaneously. In particular, persistent minimum best characterizes cell types and Dim (1) persistent multiplicity best characterizes cellular differentiation. These results demonstrate the great potential of our PerSpectSC-based models in polymeric data analysis.
Affiliation(s)
- Weikang Gong
- Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing, China 100124; Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371
| | - JunJie Wee
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371
| | - Min-Chun Wu
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371
| | - Xiaohan Sun
- Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing, China 100124
| | - Chunhua Li
- Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing, China 100124
| | - Kelin Xia
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371
|
12
|
Rowland B, Huh R, Hou Z, Crowley C, Wen J, Shen Y, Hu M, Giusti-Rodríguez P, Sullivan PF, Li Y. THUNDER: A reference-free deconvolution method to infer cell type proportions from bulk Hi-C data. PLoS Genet 2022; 18:e1010102. [PMID: 35259165 PMCID: PMC8932604 DOI: 10.1371/journal.pgen.1010102] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/01/2021] [Revised: 03/18/2022] [Accepted: 02/14/2022] [Indexed: 11/30/2022] Open
Abstract
Hi-C data provide population averaged estimates of three-dimensional chromatin contacts across cell types and states in bulk samples. Effective analysis of Hi-C data entails controlling for the potential confounding factor of differential cell type proportions across heterogeneous bulk samples. We propose a novel unsupervised deconvolution method for inferring cell type composition from bulk Hi-C data, the Two-step Hi-c UNsupervised DEconvolution appRoach (THUNDER). We conducted extensive simulations to test THUNDER based on combining two published single-cell Hi-C (scHi-C) datasets. THUNDER more accurately estimates the underlying cell type proportions compared to reference-free methods (e.g., TOAST, and NMF) and is more robust than reference-dependent methods (e.g. MuSiC). We further demonstrate the practical utility of THUNDER to estimate cell type proportions and identify cell-type-specific interactions in Hi-C data from adult human cortex tissue samples. THUNDER will be a useful tool in adjusting for varying cell type composition in population samples, facilitating valid and more powerful downstream analysis such as differential chromatin organization studies. Additionally, THUNDER estimated contact profiles provide a useful exploratory framework to investigate cell-type-specificity of the chromatin interactome while experimental data is still rare.
Affiliation(s)
- Bryce Rowland
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
| | - Ruth Huh
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
| | - Zoey Hou
- Department of Engineering Sciences and Applied Mathematics, Northwestern University, Evanston, Illinois, United States of America
| | - Cheynna Crowley
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
| | - Jia Wen
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
| | - Yin Shen
- Institute for Human Genetics, University of California San Francisco, San Francisco, California, United States of America
- Department of Neurology, University of California San Francisco, San Francisco, California, United States of America
| | - Ming Hu
- Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic Foundation, Cleveland, Ohio, United States of America
| | - Paola Giusti-Rodríguez
- Department of Psychiatry, University of Florida College of Medicine, Gainesville, Florida, United States of America
| | - Patrick F. Sullivan
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Department of Psychiatry, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
| | - Yun Li
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
|
13
|
Yu M, Abnousi A, Zhang Y, Li G, Lee L, Chen Z, Fang R, Lagler TM, Yang Y, Wen J, Sun Q, Li Y, Ren B, Hu M. SnapHiC: a computational pipeline to identify chromatin loops from single-cell Hi-C data. Nat Methods 2021; 18:1056-1059. [PMID: 34446921 PMCID: PMC8440170 DOI: 10.1038/s41592-021-01231-2] [Citation(s) in RCA: 55] [Impact Index Per Article: 13.8] [Received: 12/16/2020] [Accepted: 06/30/2021] [Indexed: 11/30/2022]
Abstract
Single-cell Hi-C (scHi-C) analysis has been increasingly used to map chromatin architecture in diverse tissue contexts, but computational tools to define chromatin loops at high resolution from scHi-C data are still lacking. Here, we describe Single-Nucleus Analysis Pipeline for Hi-C (SnapHiC), a method that can identify chromatin loops at high resolution and accuracy from scHi-C data. Using scHi-C data from 742 mouse embryonic stem cells, we benchmark SnapHiC against a number of computational tools developed for mapping chromatin loops and interactions from bulk Hi-C. We further demonstrate its use by analyzing single-nucleus methyl-3C-seq data from 2,869 human prefrontal cortical cells, which uncovers cell type-specific chromatin loops and predicts putative target genes for noncoding sequence variants associated with neuropsychiatric disorders. Our results indicate that SnapHiC could facilitate the analysis of cell type-specific chromatin architecture and gene regulatory programs in complex tissues. SnapHiC offers a computational tool for improving detection of chromatin loops from single-cell Hi-C data.
Affiliation(s)
- Miao Yu
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Fudan University, Shanghai, China; Ludwig Institute for Cancer Research, La Jolla, CA, USA
| | - Armen Abnousi
- Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic Foundation, Cleveland, OH, USA
| | - Yanxiao Zhang
- Ludwig Institute for Cancer Research, La Jolla, CA, USA
| | - Guoqiang Li
- Ludwig Institute for Cancer Research, La Jolla, CA, USA
| | - Lindsay Lee
- Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic Foundation, Cleveland, OH, USA
| | - Ziyin Chen
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Fudan University, Shanghai, China
| | - Rongxin Fang
- Ludwig Institute for Cancer Research, La Jolla, CA, USA; Howard Hughes Medical Institute, Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA, USA
| | - Taylor M Lagler
- Department of Biostatistics, University of North Carolina, Chapel Hill, NC, USA
| | - Yuchen Yang
- Department of Pathology and Laboratory Medicine, University of North Carolina, Chapel Hill, NC, USA; McAllister Heart Institute, University of North Carolina, Chapel Hill, NC, USA
| | - Jia Wen
- Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
| | - Quan Sun
- Department of Biostatistics, University of North Carolina, Chapel Hill, NC, USA
| | - Yun Li
- Department of Biostatistics, University of North Carolina, Chapel Hill, NC, USA; Department of Genetics, University of North Carolina, Chapel Hill, NC, USA; Department of Computer Science, University of North Carolina, Chapel Hill, NC, USA
| | - Bing Ren
- Ludwig Institute for Cancer Research, La Jolla, CA, USA; Center for Epigenomics, Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA, USA
| | - Ming Hu
- Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic Foundation, Cleveland, OH, USA.
|
14
|
Liu N, Low WY, Alinejad-Rokny H, Pederson S, Sadlon T, Barry S, Breen J. Seeing the forest through the trees: prioritising potentially functional interactions from Hi-C. Epigenetics Chromatin 2021; 14:41. [PMID: 34454581 PMCID: PMC8399707 DOI: 10.1186/s13072-021-00417-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 04/29/2021] [Accepted: 08/19/2021] [Indexed: 11/30/2022] Open
Abstract
Eukaryotic genomes are highly organised within the nucleus of a cell, allowing widely dispersed regulatory elements such as enhancers to interact with gene promoters through physical contacts in three-dimensional space. Recent chromosome conformation capture methodologies such as Hi-C have enabled the analysis of interacting regions of the genome providing a valuable insight into the three-dimensional organisation of the chromatin in the nucleus, including chromosome compartmentalisation and gene expression. Complicating the analysis of Hi-C data, however, is the massive amount of identified interactions, many of which do not directly drive gene function, thus hindering the identification of potentially biologically functional 3D interactions. In this review, we collate and examine the downstream analysis of Hi-C data with particular focus on methods that prioritise potentially functional interactions. We classify three groups of approaches: structural-based discovery methods, e.g. A/B compartments and topologically associated domains, detection of statistically significant chromatin interactions, and the use of epigenomic data integration to narrow down useful interaction information. Careful use of these three approaches is crucial to successfully identifying potentially functional interactions within the genome.
Affiliation(s)
- Ning Liu
- Computational & Systems Biology, Precision Medicine Theme, South Australian Health & Medical Research Institute, SA, 5000, Adelaide, Australia
- Robinson Research Institute, University of Adelaide, SA, 5005, Adelaide, Australia
- Adelaide Medical School, University of Adelaide, SA, 5005, Adelaide, Australia
| | - Wai Yee Low
- The Davies Research Centre, School of Animal and Veterinary Sciences, University of Adelaide, Roseworthy, SA, 5371, Australia
| | - Hamid Alinejad-Rokny
- BioMedical Machine Learning Lab, The Graduate School of Biomedical Engineering, The University of New South Wales, NSW, 2052, Sydney, Australia
- Core Member of UNSW Data Science Hub, The University of New South Wales, 2052, Sydney, Australia
| | - Stephen Pederson
- Adelaide Medical School, University of Adelaide, SA, 5005, Adelaide, Australia
- Dame Roma Mitchell Cancer Research Laboratories (DRMCRL), Adelaide Medical School, University of Adelaide, SA, 5005, Adelaide, Australia
| | - Timothy Sadlon
- Robinson Research Institute, University of Adelaide, SA, 5005, Adelaide, Australia
- Women's & Children's Health Network, SA, 5006, North Adelaide, Australia
| | - Simon Barry
- Robinson Research Institute, University of Adelaide, SA, 5005, Adelaide, Australia
- Core Member of UNSW Data Science Hub, The University of New South Wales, 2052, Sydney, Australia
- Women's & Children's Health Network, SA, 5006, North Adelaide, Australia
| | - James Breen
- Computational & Systems Biology, Precision Medicine Theme, South Australian Health & Medical Research Institute, SA, 5000, Adelaide, Australia.
- Robinson Research Institute, University of Adelaide, SA, 5005, Adelaide, Australia.
- Adelaide Medical School, University of Adelaide, SA, 5005, Adelaide, Australia.
- South Australian Genomics Centre (SAGC), South Australian Health & Medical Research Institute (SAHMRI), SA, 5000, Adelaide, Australia.
|
15
|
Liu W, Yang Y, Abnousi A, Zhang Q, Kubo N, Beem JSM, Li Y, Hu M. MUNIn: A statistical framework for identifying long-range chromatin interactions from multiple samples. HGG ADVANCES 2021; 2. [PMID: 34485947 PMCID: PMC8415461 DOI: 10.1016/j.xhgg.2021.100036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022] Open
Abstract
Chromatin spatial organization (interactome) plays a critical role in genome function. Deep understanding of chromatin interactome can shed insights into transcriptional regulation mechanisms and human disease pathology. One essential task in the analysis of chromatin interactomic data is to identify long-range chromatin interactions. Existing approaches, such as HiCCUPS, FitHiC/FitHiC2, and FastHiC, are all designed for analyzing individual cell types or samples. None of them accounts for unbalanced sequencing depths and heterogeneity among multiple cell types or samples in a unified statistical framework. To fill in the gap, we have developed a novel statistical framework MUNIn (multiple-sample unifying long-range chromatin-interaction detector) for identifying long-range chromatin interactions from multiple samples. MUNIn adopts a hierarchical hidden Markov random field (H-HMRF) model, in which the status (peak or background) of each interacting chromatin loci pair depends not only on the status of loci pairs in its neighborhood region but also on the status of the same loci pair in other samples. To benchmark the performance of MUNIn, we performed comprehensive simulation studies and real data analysis and showed that MUNIn can achieve much lower false-positive rates for detecting sample-specific interactions (33.1%–36.2%), and much enhanced statistical power for detecting shared peaks (up to 74.3%), compared to uni-sample analysis. Our data demonstrated that MUNIn is a useful tool for the integrative analysis of interactomic data from multiple samples.
Affiliation(s)
- Weifang Liu
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA (these authors contributed equally)
| | - Yuchen Yang
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA; Department of Pathology and Laboratory Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA; McAllister Heart Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA (these authors contributed equally)
| | - Armen Abnousi
- Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic Foundation, Cleveland, OH 44195, USA
| | - Qian Zhang
- Department of Statistics, Purdue University, West Lafayette, IN 47907, USA
| | - Naoki Kubo
- Department of Cellular and Case Molecular Medicine, University of California San Diego School of Medicine, La Jolla, CA, USA
| | - Joshua S Martin Beem
- Duke Human Vaccine Institute, Duke University School of Medicine, Durham, NC 27710, USA
| | - Yun Li
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA; Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA; Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Ming Hu
- Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic Foundation, Cleveland, OH 44195, USA
|
16
|
Lagler TM, Abnousi A, Hu M, Yang Y, Li Y. HiC-ACT: improved detection of chromatin interactions from Hi-C data via aggregated Cauchy test. Am J Hum Genet 2021; 108:257-268. [PMID: 33545029 DOI: 10.1016/j.ajhg.2021.01.009] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 10/14/2020] [Accepted: 01/11/2021] [Indexed: 12/19/2022] Open
Abstract
Genome-wide chromatin conformation capture technologies such as Hi-C are commonly employed to study chromatin spatial organization. In particular, to identify statistically significant long-range chromatin interactions from Hi-C data, most existing methods such as Fit-Hi-C/FitHiC2 and HiCCUPS assume that all chromatin interactions are statistically independent. Such an independence assumption is reasonable at low resolution (e.g., 40 kb bin) but is invalid at high resolution (e.g., 5 or 10 kb bins) because spatial dependency of neighboring chromatin interactions is non-negligible at high resolution. Our previous hidden Markov random field-based methods accommodate spatial dependency but are computationally intensive. It is urgent to develop approaches that can model spatial dependence in a computationally efficient and scalable manner. Here, we develop HiC-ACT, an aggregated Cauchy test (ACT)-based approach, to improve the detection of chromatin interactions by post-processing results from methods assuming independence. To benchmark the performance of HiC-ACT, we re-analyzed deeply sequenced Hi-C data from a human lymphoblastoid cell line, GM12878, and mouse embryonic stem cells (mESCs). Our results demonstrate advantages of HiC-ACT in improving sensitivity with controlled type I error. By leveraging information from neighboring chromatin interactions, HiC-ACT enhances the power to detect interactions with lower signal-to-noise ratio and similar (if not stronger) epigenetic signatures that suggest regulatory roles. We further demonstrate that HiC-ACT peaks show higher overlap with known enhancers than Fit-Hi-C/FitHiC2 peaks in both GM12878 and mESCs. HiC-ACT, effectively a summary statistics-based approach, is computationally efficient (∼6 min and ∼2 GB memory to process 25,000 pairwise interactions).
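The post-processing idea in this abstract, combining the p-values of neighboring bin pairs with a Cauchy-based test, can be sketched as follows. This is only an illustration of the general Cauchy combination statistic and not the HiC-ACT implementation: the function names `cauchy_combine` and `smooth_pvalues`, the square window, and the uniform weights are all assumptions made for the example.

```python
import math


def cauchy_combine(p_values, weights=None):
    """Combine p-values with the Cauchy combination statistic.

    Each p-value is mapped to a standard Cauchy variate via
    tan((0.5 - p) * pi); the weighted sum is mapped back to a p-value.
    """
    n = len(p_values)
    if n == 0:
        raise ValueError("need at least one p-value")
    if weights is None:
        weights = [1.0] * n  # uniform weights (an assumption for this sketch)
    total = float(sum(weights))
    eps = 1e-15  # keep p strictly inside (0, 1) so tan() stays finite
    stat = 0.0
    for p, w in zip(p_values, weights):
        p = min(max(p, eps), 1.0 - eps)
        stat += (w / total) * math.tan((0.5 - p) * math.pi)
    return 0.5 - math.atan(stat) / math.pi


def smooth_pvalues(pvals, half_width=1):
    """Hypothetical helper: for each bin pair (i, j), combine the p-values
    of all bin pairs inside a square window around it."""
    combined = {}
    for (i, j) in pvals:
        neighbors = [
            pvals[(a, b)]
            for a in range(i - half_width, i + half_width + 1)
            for b in range(j - half_width, j + half_width + 1)
            if (a, b) in pvals
        ]
        combined[(i, j)] = cauchy_combine(neighbors)
    return combined
```

Here `pvals` would be a dictionary mapping bin-pair indices to p-values from a method that assumes independence (for example, Fit-Hi-C/FitHiC2 output). Because the combination needs only these summary statistics, a bin pair with a moderate p-value can gain support from strong neighbors without refitting a spatial model.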
|
17
|
Crowley C, Yang Y, Qiu Y, Hu B, Abnousi A, Lipiński J, Plewczyński D, Wu D, Won H, Ren B, Hu M, Li Y. FIREcaller: Detecting frequently interacting regions from Hi-C data. Comput Struct Biotechnol J 2020; 19:355-362. [PMID: 33489005 PMCID: PMC7788093 DOI: 10.1016/j.csbj.2020.12.026] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 09/23/2020] [Revised: 12/16/2020] [Accepted: 12/20/2020] [Indexed: 01/02/2023] Open
Abstract
Hi-C experiments have been widely adopted to study chromatin spatial organization, which plays an essential role in genome function. We have recently identified frequently interacting regions (FIREs) and found that they are closely associated with cell-type-specific gene regulation. However, computational tools for detecting FIREs from Hi-C data are still lacking. In this work, we present FIREcaller, a stand-alone, user-friendly R package for detecting FIREs from Hi-C data. FIREcaller takes raw Hi-C contact matrices as input, performs within-sample and cross-sample normalization, and outputs continuous FIRE scores, dichotomous FIREs, and super-FIREs. Applying FIREcaller to Hi-C data from various human tissues, we demonstrate that FIREs and super-FIREs identified, in a tissue-specific manner, are closely related to gene regulation, are enriched for enhancer-promoter (E-P) interactions, tend to overlap with regions exhibiting epigenomic signatures of cis-regulatory roles, and aid the interpretation of GWAS variants. The FIREcaller package is implemented in R and freely available at https://yunliweb.its.unc.edu/FIREcaller.
Affiliation(s)
- Cheynna Crowley
- Department of Genetics, University of North Carolina Chapel Hill, Chapel Hill, NC, USA
- Department of Biostatistics, University of North Carolina Chapel Hill, Chapel Hill, NC, USA
| | - Yuchen Yang
- Department of Genetics, University of North Carolina Chapel Hill, Chapel Hill, NC, USA
| | - Yunjiang Qiu
- Ludwig Institute for Cancer Research, La Jolla, CA, USA
- Bioinformatics and Systems Biology Graduate Program, University of California San Diego, La Jolla, CA, USA
| | - Benxia Hu
- Department of Genetics, University of North Carolina Chapel Hill, Chapel Hill, NC, USA
- UNC Neuroscience Center, University of North Carolina Chapel Hill, Chapel Hill, NC, USA
| | - Armen Abnousi
- Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic Foundation, Cleveland, OH, USA
| | - Dariusz Plewczyński
- Cellular Genomics, Warsaw, Poland
- Department of Mathematics and Information Science, Warsaw University of Technology, Warszawa, Poland
| | - Di Wu
- Department of Biostatistics, University of North Carolina Chapel Hill, Chapel Hill, NC, USA
- Division of Oral and Craniofacial Health Sciences, Adams School of Dentistry, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Hyejung Won
- Department of Genetics, University of North Carolina Chapel Hill, Chapel Hill, NC, USA
- UNC Neuroscience Center, University of North Carolina Chapel Hill, Chapel Hill, NC, USA
| | - Bing Ren
- Ludwig Institute for Cancer Research, La Jolla, CA, USA
- Department of Cellular and Molecular Medicine, University of California San Diego, La Jolla, CA, USA
- Institute of Genomic Medicine and Moores Cancer Center, University of California San Diego, La Jolla, CA, USA
| | - Ming Hu
- Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic Foundation, Cleveland, OH, USA
| | - Yun Li
- Department of Genetics, University of North Carolina Chapel Hill, Chapel Hill, NC, USA
- Department of Biostatistics, University of North Carolina Chapel Hill, Chapel Hill, NC, USA
- Department of Computer Science, University of North Carolina Chapel Hill, Chapel Hill, NC, USA
| |
|
18
|
Li X, An Z, Zhang Z. Comparison of computational methods for 3D genome analysis at single-cell Hi-C level. Methods 2019; 181-182:52-61. [PMID: 31445093 DOI: 10.1016/j.ymeth.2019.08.005] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 03/13/2019] [Revised: 07/09/2019] [Accepted: 08/19/2019] [Indexed: 11/18/2022]
Abstract
Hi-C is a high-throughput chromosome conformation capture technology that is becoming routine in the literature. Although the price of sequencing has been dropping dramatically, high-resolution Hi-C data are not always an option for many studies, such as in single cells. However, the performance of current computational methods based on Hi-C at the ultra-sparse data condition has yet to be fully assessed. Therefore, in this paper, after briefly surveying the primary computational methods for Hi-C data analysis, we assess the performance of representative methods on data normalization, identification of compartments, Topologically Associating Domains (TADs) and chromatin loops under the condition of ultra-low resolution. We showed that most state-of-the-art methods do not work properly for that condition. Then, we applied the three best-performing methods on real single-cell Hi-C data, and their performance indicates that compartments may be a statistical feature emerging from the cell population, while TADs and chromatin loops may dynamically exist in single cells.
Affiliation(s)
- Xiao Li
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; School of Life Science, University of Chinese Academy of Sciences, Beijing, China
| | - Ziyang An
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; School of Life Science, University of Chinese Academy of Sciences, Beijing, China
| | - Zhihua Zhang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; School of Life Science, University of Chinese Academy of Sciences, Beijing, China.
| |
|
19
|
Non-coding variability at the APOE locus contributes to the Alzheimer's risk. Nat Commun 2019; 10:3310. [PMID: 31346172 PMCID: PMC6658518 DOI: 10.1038/s41467-019-10945-z] [Citation(s) in RCA: 86] [Impact Index Per Article: 14.3] [Received: 08/14/2018] [Accepted: 06/10/2019] [Indexed: 12/30/2022]
Abstract
Alzheimer’s disease (AD) is a leading cause of mortality in the elderly. While the coding change of APOE-ε4 is a key risk factor for late-onset AD and has been believed to be the only risk factor in the APOE locus, it does not fully explain the risk effect conferred by the locus. Here, we report the identification of AD causal variants in PVRL2 and APOC1 regions in proximity to APOE and define common risk haplotypes independent of APOE-ε4 coding change. These risk haplotypes are associated with changes of AD-related endophenotypes including cognitive performance, and altered expression of APOE and its nearby genes in the human brain and blood. High-throughput genome-wide chromosome conformation capture analysis further supports the roles of these risk haplotypes in modulating chromatin states and gene expression in the brain. Our findings provide compelling evidence for additional risk factors in the APOE locus that contribute to AD pathogenesis. Several studies show that APOE-ε4 coding variants are associated with Alzheimer’s disease (AD) risk. Here, Zhou et al. perform fine-mapping of the APOE region and find AD risk haplotypes with non-coding variants in the PVRL2 and APOC1 regions that are associated with relevant endophenotypes.
|
20
|
Spill YG, Castillo D, Vidal E, Marti-Renom MA. Binless normalization of Hi-C data provides significant interaction and difference detection independent of resolution. Nat Commun 2019; 10:1938. [PMID: 31028255 PMCID: PMC6486590 DOI: 10.1038/s41467-019-09907-2] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 12/24/2017] [Accepted: 03/21/2019] [Indexed: 02/05/2023]
Abstract
Chromosome conformation capture techniques, such as Hi-C, are fundamental in characterizing genome organization. These methods have revealed several genomic features, such as chromatin loops, whose disruption can have dramatic effects in gene regulation. Unfortunately, their detection is difficult; current methods require that the users choose the resolution of interaction maps based on dataset quality and sequencing depth. Here, we introduce Binless, a resolution-agnostic method that adapts to the quality and quantity of available data, to detect both interactions and differences. Binless relies on an alternate representation of Hi-C data, which leads to a more detailed classification of paired-end reads. Using a large-scale benchmark, we demonstrate that Binless is able to call interactions with higher reproducibility than other existing methods. Binless, which is freely available, can thus reliably be used to identify chromatin loops as well as for differential analysis of chromatin interaction maps.
Affiliation(s)
- Yannick G Spill
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028, Barcelona, Spain.
- Centre for Genomic Regulation (CRG), Dr. Aiguader 88, 08003, Barcelona, Spain.
| | - David Castillo
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028, Barcelona, Spain
- Centre for Genomic Regulation (CRG), Dr. Aiguader 88, 08003, Barcelona, Spain
| | - Enrique Vidal
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028, Barcelona, Spain
- Centre for Genomic Regulation (CRG), Dr. Aiguader 88, 08003, Barcelona, Spain
| | - Marc A Marti-Renom
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028, Barcelona, Spain.
- Centre for Genomic Regulation (CRG), Dr. Aiguader 88, 08003, Barcelona, Spain.
- Universitat Pompeu Fabra (UPF), 08002, Barcelona, Spain.
- ICREA, Pg. Lluís Companys 23, 08010, Barcelona, Spain.
| |
|
21
|
Tang B, Li F, Li J, Zhao W, Zhang Z. Delta: a new web-based 3D genome visualization and analysis platform. Bioinformatics 2018; 34:1409-1410. [PMID: 29253110 DOI: 10.1093/bioinformatics/btx805] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 08/02/2017] [Accepted: 12/14/2017] [Indexed: 11/13/2022]
Abstract
Summary: Delta is an integrative visualization and analysis platform to facilitate visually annotating and exploring the 3D physical architecture of genomes. Delta takes Hi-C or ChIA-PET contact matrix as input and predicts the topologically associating domains and chromatin loops in the genome. It then generates a physical 3D model which represents the plausible consensus 3D structure of the genome. Delta features a highly interactive visualization tool which enhances the integration of genome topology/physical structure with extensive genome annotation by juxtaposing the 3D model with diverse genomic assay outputs. Finally, by visually comparing the 3D model of the β-globin gene locus and its annotation, we speculated a plausible transitory interaction pattern in the locus. Experimental evidence was found to support this speculation by literature survey. This served as an example of intuitive hypothesis testing with the help of Delta. Availability and implementation: Delta is freely accessible from http://delta.big.ac.cn, and the source code is available at https://github.com/zhangzhwlab/delta. Contact: zhangzhihua@big.ac.cn. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Bixia Tang
- CAS Key Laboratory of Genome Sciences and Information, Chinese Academy of Sciences, Beijing 101300, China; BIG Data Center (BIGD), Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 101300, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Feifei Li
- CAS Key Laboratory of Genome Sciences and Information, Chinese Academy of Sciences, Beijing 101300, China
| | - Jing Li
- CAS Key Laboratory of Genome Sciences and Information, Chinese Academy of Sciences, Beijing 101300, China
| | - Wenming Zhao
- BIG Data Center (BIGD), Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 101300, China
| | - Zhihua Zhang
- CAS Key Laboratory of Genome Sciences and Information, Chinese Academy of Sciences, Beijing 101300, China; University of Chinese Academy of Sciences, Beijing 100049, China
| |
|
22
|
Abstract
In the epigenetics field, large-scale functional genomics datasets of ever-increasing size and complexity have been produced using experimental techniques based on high-throughput sequencing. In particular, the study of the 3D organization of chromatin has raised increasing interest, thanks to the development of advanced experimental techniques. In this context, Hi-C has been widely adopted as a high-throughput method to measure pairwise contacts between virtually any pair of genomic loci, thus yielding unprecedented challenges for analyzing and handling the resulting complex datasets. In this review, we focus on the increasing complexity of available Hi-C datasets, which parallels the adoption of novel protocol variants. We also review the complexity of the multiple data analysis steps required to preprocess Hi-C sequencing reads and extract biologically meaningful information. Finally, we discuss solutions for handling and visualizing such large genomics datasets.
|
23
|
Ron G, Globerson Y, Moran D, Kaplan T. Promoter-enhancer interactions identified from Hi-C data using probabilistic models and hierarchical topological domains. Nat Commun 2017; 8:2237. [PMID: 29269730 PMCID: PMC5740158 DOI: 10.1038/s41467-017-02386-3] [Citation(s) in RCA: 109] [Impact Index Per Article: 13.6] [Received: 05/08/2017] [Accepted: 11/24/2017] [Indexed: 01/06/2023]
Abstract
Proximity-ligation methods such as Hi-C allow us to map physical DNA–DNA interactions along the genome, and reveal its organization into topologically associating domains (TADs). As the Hi-C data accumulate, computational methods were developed for identifying domain borders in multiple cell types and organisms. Here, we present PSYCHIC, a computational approach for analyzing Hi-C data and identifying promoter–enhancer interactions. We use a unified probabilistic model to segment the genome into domains, which we then merge hierarchically and fit using a local background model, allowing us to identify over-represented DNA–DNA interactions across the genome. By analyzing the published Hi-C data sets in human and mouse, we identify hundreds of thousands of putative enhancers and their target genes, and compile an extensive genome-wide catalog of gene regulation in human and mouse. As we show, our predictions are highly enriched for ChIP-seq and DNA accessibility data, evolutionary conservation, eQTLs and other DNA–DNA interaction data. Proximity-ligation methods like Hi-C map DNA-DNA interactions and reveal its organization into topologically associating domains (TADs). Here the authors describe PSYCHIC, a computational approach for analysing Hi-C data that allows the identification of promoter-enhancer interactions.
Affiliation(s)
- Gil Ron
- School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, 9190401, Israel
| | - Yuval Globerson
- School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, 9190401, Israel
| | - Dror Moran
- School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, 9190401, Israel
| | - Tommy Kaplan
- School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, 9190401, Israel.
| |
|
24
|
Sauerwald N, Zhang S, Kingsford C, Bahar I. Chromosomal dynamics predicted by an elastic network model explains genome-wide accessibility and long-range couplings. Nucleic Acids Res 2017; 45:3663-3673. [PMID: 28334818 PMCID: PMC5397156 DOI: 10.1093/nar/gkx172] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 10/31/2016] [Revised: 01/16/2017] [Accepted: 03/06/2017] [Indexed: 12/11/2022]
Abstract
Understanding the three-dimensional (3D) architecture of chromatin and its relation to gene expression and regulation is fundamental to understanding how the genome functions. Advances in Hi-C technology now permit us to study 3D genome organization, but we still lack an understanding of the structural dynamics of chromosomes. The dynamic couplings between regions separated by large genomic distances (>50 Mb) have yet to be characterized. We adapted a well-established protein-modeling framework, the Gaussian Network Model (GNM), to model chromatin dynamics using Hi-C data. We show that the GNM can identify spatial couplings at multiple scales: it can quantify the correlated fluctuations in the positions of gene loci, find large genomic compartments and smaller topologically-associating domains (TADs) that undergo en bloc movements, and identify dynamically coupled distal regions along the chromosomes. We show that the predictions of the GNM correlate well with genome-wide experimental measurements. We use the GNM to identify novel cross-correlated distal domains (CCDDs) representing pairs of regions distinguished by their long-range dynamic coupling and show that CCDDs are associated with increased gene co-expression. Together, these results show that GNM provides a mathematically well-founded unified framework for modeling chromatin dynamics and assessing the structural basis of genome-wide observations.
Affiliation(s)
- Natalie Sauerwald
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - She Zhang
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
| | - Carl Kingsford
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Ivet Bahar
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
| |
|
25
|
Schmitt AD, Hu M, Jung I, Xu Z, Qiu Y, Tan CL, Li Y, Lin S, Lin Y, Barr CL, Ren B. A Compendium of Chromatin Contact Maps Reveals Spatially Active Regions in the Human Genome. Cell Rep 2016; 17:2042-2059. [PMID: 27851967 PMCID: PMC5478386 DOI: 10.1016/j.celrep.2016.10.061] [Citation(s) in RCA: 583] [Impact Index Per Article: 64.8] [Received: 07/14/2016] [Revised: 09/02/2016] [Accepted: 10/18/2016] [Indexed: 01/19/2023]
Abstract
The three-dimensional configuration of DNA is integral to all nuclear processes in eukaryotes, yet our knowledge of the chromosome architecture is still limited. Genome-wide chromosome conformation capture studies have uncovered features of chromatin organization in cultured cells, but genome architecture in human tissues has yet to be explored. Here, we report the most comprehensive survey to date of chromatin organization in human tissues. Through integrative analysis of chromatin contact maps in 21 primary human tissues and cell types, we find topologically associating domains highly conserved in different tissues. We also discover genomic regions that exhibit unusually high levels of local chromatin interactions. These frequently interacting regions (FIREs) are enriched for super-enhancers and are near tissue-specifically expressed genes. They display strong tissue-specificity in local chromatin interactions. Additionally, FIRE formation is partially dependent on CTCF and the Cohesin complex. We further show that FIREs can help annotate the function of non-coding sequence variants.
Affiliation(s)
- Anthony D Schmitt
- Ludwig Institute for Cancer Research, La Jolla, CA 92093, USA; UCSD Biomedical Sciences Graduate Program, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
| | - Ming Hu
- Division of Biostatistics, Department of Population Health, New York University School of Medicine, 650 First Avenue, New York, NY 10016, USA.
| | - Inkyung Jung
- Ludwig Institute for Cancer Research, La Jolla, CA 92093, USA
| | - Zheng Xu
- Departments of Genetics, Biostatistics, and Computer Science, University of North Carolina, Chapel Hill, NC 27599, USA; Quantitative Life Sciences Initiative, University of Nebraska, Lincoln, NE 68583, USA; Department of Statistics, University of Nebraska, Lincoln, NE 68583, USA
| | - Yunjiang Qiu
- Ludwig Institute for Cancer Research, La Jolla, CA 92093, USA; UCSD Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
| | - Catherine L Tan
- Ludwig Institute for Cancer Research, La Jolla, CA 92093, USA
| | - Yun Li
- Departments of Genetics, Biostatistics, and Computer Science, University of North Carolina, Chapel Hill, NC 27599, USA
| | - Shin Lin
- Division of Cardiology, Department of Medicine, University of Washington, 850 Republican Street, Seattle, WA 98108, USA
| | - Yiing Lin
- Department of Surgery, Washington University School of Medicine, 660 S Euclid Ave., Campus Box 8109, St. Louis, MO 63110, USA
| | - Cathy L Barr
- Krembil Research Institute University Health Network, The Hospital for Sick Children, The University of Toronto, Krembil Discovery Tower, 60 Leonard Ave. 8KD-412, Toronto, ON M5T 2S8, Canada
| | - Bing Ren
- Ludwig Institute for Cancer Research, La Jolla, CA 92093, USA; Department of Cellular and Molecular Medicine, Moores Cancer Center and Institute of Genome Medicine, UCSD School of Medicine, 9500 Gilman Drive, La Jolla, CA 92093, USA.
| |
|