1. Wall BPG, Nguyen M, Harrell JC, Dozmorov MG. Machine and deep learning methods for predicting 3D genome organization. ArXiv 2024:arXiv:2403.03231v1. [PMID: 38495565 PMCID: PMC10942493] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2024]
Abstract
Three-dimensional (3D) chromatin interactions, such as enhancer-promoter interactions (EPIs), loops, Topologically Associating Domains (TADs), and A/B compartments, play critical roles in a wide range of cellular processes by regulating gene expression. Recent development of chromatin conformation capture technologies has enabled genome-wide profiling of various 3D structures, even in single cells. However, current catalogs of 3D structures remain incomplete and unreliable due to differences in technology, tools, and low data resolution. Machine learning methods have emerged as an alternative to obtain missing 3D interactions and/or improve resolution. Such methods frequently use genome annotation data (ChIP-seq, DNase-seq, etc.), DNA sequencing information (k-mers, Transcription Factor Binding Site (TFBS) motifs), and other genomic properties to learn the associations between genomic features and chromatin interactions. In this review, we discuss computational tools for predicting three types of 3D interactions (EPIs, chromatin interactions, TAD boundaries) and analyze their pros and cons. We also point out obstacles to computational prediction of 3D interactions and suggest future research directions.
Affiliation(s)
- Brydon P. G. Wall
  - Center for Biological Data Science, Virginia Commonwealth University, Richmond, VA, 23284, USA
- My Nguyen
  - Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, 23298, USA
- J. Chuck Harrell
  - Department of Pathology, Virginia Commonwealth University, Richmond, VA, 23284, USA
  - Massey Comprehensive Cancer Center, Virginia Commonwealth University, Richmond, VA 23298, USA
  - Center for Pharmaceutical Engineering, Virginia Commonwealth University, Richmond, VA 23298, USA
- Mikhail G. Dozmorov
  - Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, 23298, USA
  - Department of Pathology, Virginia Commonwealth University, Richmond, VA, 23284, USA
2. Varshney A, Manickam N, Orchard P, Tovar A, Zhang Z, Feng F, Erdos MR, Narisu N, Ventresca C, Nishino K, Rai V, Stringham HM, Jackson AU, Tamsen T, Gao C, Yang M, Koues OI, Welch JD, Burant CF, Williams LK, Jenkinson C, DeFronzo RA, Norton L, Saramies J, Lakka TA, Laakso M, Tuomilehto J, Mohlke KL, Kitzman JO, Koistinen HA, Liu J, Boehnke M, Collins FS, Scott LJ, Parker SCJ. Population-scale skeletal muscle single-nucleus multi-omic profiling reveals extensive context specific genetic regulation. bioRxiv 2023:2023.12.15.571696. [PMID: 38168419 PMCID: PMC10760134 DOI: 10.1101/2023.12.15.571696] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2024]
Abstract
Skeletal muscle, the largest human organ by weight, is relevant to several polygenic metabolic traits and diseases including type 2 diabetes (T2D). Identifying genetic mechanisms underlying these traits requires pinpointing the relevant cell types, regulatory elements, target genes, and causal variants. Here, we used genetic multiplexing to generate population-scale single nucleus (sn) chromatin accessibility (snATAC-seq) and transcriptome (snRNA-seq) maps across 287 frozen human skeletal muscle biopsies representing 456,880 nuclei. We identified 13 cell types that collectively represented 983,155 ATAC summits. We integrated genetic variation to discover 6,866 expression quantitative trait loci (eQTL) and 100,928 chromatin accessibility QTL (caQTL) (5% FDR) across the five most abundant cell types, cataloging caQTL peaks that atlas-level snATAC maps often miss. We identified 1,973 eGenes colocalized with caQTL and used mediation analyses to construct causal directional maps for chromatin accessibility and gene expression. 3,378 genome-wide association study (GWAS) signals across 43 relevant traits colocalized with sn-e/caQTL, 52% in a cell-specific manner. 77% of GWAS signals colocalized with caQTL and not eQTL, highlighting the critical importance of population-scale chromatin profiling for GWAS functional studies. GWAS-caQTL colocalization showed distinct cell-specific regulatory paradigms. For example, a C2CD4A/B T2D GWAS signal colocalized with caQTL in muscle fibers and multiple chromatin loop models nominated VPS13C, a glucose uptake gene. Sequence of the caQTL peak overlapping caSNP rs7163757 showed allelic regulatory activity differences in a human myocyte cell line massively parallel reporter assay. These results illuminate the genetic regulatory architecture of human skeletal muscle at high-resolution epigenomic, transcriptomic, and cell state scales and serve as a template for population-scale multi-omic mapping in complex tissues and traits.
Affiliation(s)
- Arushi Varshney
  - Dept. of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Nandini Manickam
  - Dept. of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Peter Orchard
  - Dept. of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Adelaide Tovar
  - Dept. of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Zhenhao Zhang
  - Dept. of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Fan Feng
  - Dept. of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Michael R Erdos
  - Center for Precision Health Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Narisu Narisu
  - Center for Precision Health Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Christa Ventresca
  - Dept. of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
  - Dept. of Human Genetics, University of Michigan, Ann Arbor, MI, USA
- Kirsten Nishino
  - Dept. of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Vivek Rai
  - Dept. of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Heather M Stringham
  - Department of Biostatistics, Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA
- Anne U Jackson
  - Department of Biostatistics, Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA
- Tricia Tamsen
  - Biomedical Research Core Facilities Advanced Genomics Core, University of Michigan, Ann Arbor, MI, USA
- Chao Gao
  - Dept. of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Mao Yang
  - Department of Internal Medicine, Center for Individualized and Genomic Medicine Research, Henry Ford Hospital, Detroit, MI, USA
- Olivia I Koues
  - Biomedical Research Core Facilities Advanced Genomics Core, University of Michigan, Ann Arbor, MI, USA
- Joshua D Welch
  - Dept. of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Charles F Burant
  - Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan, USA
- L Keoki Williams
  - Department of Internal Medicine, Center for Individualized and Genomic Medicine Research, Henry Ford Hospital, Detroit, MI, USA
- Chris Jenkinson
  - South Texas Diabetes and Obesity Research Institute, School of Medicine, University of Texas, Rio Grande Valley, TX, USA
- Ralph A DeFronzo
  - Department of Medicine/Diabetes Division, University of Texas Health, San Antonio, TX, USA
- Luke Norton
  - Department of Medicine/Diabetes Division, University of Texas Health, San Antonio, TX, USA
- Jouko Saramies
  - Savitaipale Health Center, South Karelia Central Hospital, Lappeenranta, Finland
- Timo A Lakka
  - Institute of Biomedicine, University of Eastern Finland, Kuopio, Finland
- Markku Laakso
  - Institute of Clinical Medicine, University of Eastern Finland, Kuopio, Finland
- Jaakko Tuomilehto
  - Dept. of Public Health and Welfare, Finnish Institute for Health and Welfare, Helsinki, Finland
  - Dept. of Public Health, University of Helsinki, Helsinki, Finland
  - Diabetes Research Group, King Abdulaziz University, Jeddah, Saudi Arabia
- Karen L Mohlke
  - Dept. of Genetics, University of North Carolina, Chapel Hill, NC, USA
- Jacob O Kitzman
  - Dept. of Human Genetics, University of Michigan, Ann Arbor, MI, USA
  - Dept. of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Heikki A Koistinen
  - Dept. of Public Health and Welfare, Finnish Institute for Health and Welfare, Helsinki, Finland
  - Department of Medicine, University of Helsinki and Helsinki University Hospital, Helsinki, Finland
  - Minerva Foundation Institute for Medical Research, Helsinki, Finland
- Jie Liu
  - Dept. of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Michael Boehnke
  - Department of Biostatistics, Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA
- Francis S Collins
  - Center for Precision Health Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Laura J Scott
  - Department of Biostatistics, Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA
- Stephen C J Parker
  - Dept. of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
  - Dept. of Human Genetics, University of Michigan, Ann Arbor, MI, USA
  - Department of Biostatistics, Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA
3. Gao VR, Yang R, Das A, Luo R, Luo H, McNally DR, Karagiannidis I, Rivas MA, Wang ZM, Barisic D, Karbalayghareh A, Wong W, Zhan YA, Chin CR, Noble W, Bilmes JA, Apostolou E, Kharas MG, Béguelin W, Viny AD, Huangfu D, Rudensky AY, Melnick AM, Leslie CS. ChromaFold predicts the 3D contact map from single-cell chromatin accessibility. bioRxiv 2023:2023.07.27.550836. [PMID: 37546906 PMCID: PMC10402156 DOI: 10.1101/2023.07.27.550836] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/08/2023]
Abstract
The identification of cell-type-specific 3D chromatin interactions between regulatory elements can help to decipher gene regulation and to interpret the function of disease-associated non-coding variants. However, current chromosome conformation capture (3C) technologies are unable to resolve interactions at this resolution when only small numbers of cells are available as input. We therefore present ChromaFold, a deep learning model that predicts 3D contact maps and regulatory interactions from single-cell ATAC sequencing (scATAC-seq) data alone. ChromaFold uses pseudobulk chromatin accessibility, co-accessibility profiles across metacells, and predicted CTCF motif tracks as input features and employs a lightweight architecture to enable training on standard GPUs. Once trained on paired scATAC-seq and Hi-C data in human cell lines and tissues, ChromaFold can accurately predict both the 3D contact map and peak-level interactions across diverse human and mouse test cell types. In benchmarking against a recent deep learning method that uses bulk ATAC-seq, DNA sequence, and CTCF ChIP-seq to make cell-type-specific predictions, ChromaFold yields superior prediction performance when including CTCF ChIP-seq data as an input and comparable performance without. Finally, fine-tuning ChromaFold on paired scATAC-seq and Hi-C in a complex tissue enables deconvolution of chromatin interactions across cell subpopulations. ChromaFold thus achieves state-of-the-art prediction of 3D contact maps and regulatory interactions using scATAC-seq alone as input data, enabling accurate inference of cell-type-specific interactions in settings where 3C-based assays are infeasible.
Affiliation(s)
- Vianne R. Gao
  - Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
  - Tri-Institutional Program in Computational Biology and Medicine, New York, NY, USA
- Rui Yang
  - Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
  - Tri-Institutional Program in Computational Biology and Medicine, New York, NY, USA
- Arnav Das
  - University of Washington, Seattle, WA, USA
- Renhe Luo
  - Developmental Biology Program, Sloan Kettering Institute, New York, NY, USA
- Hanzhi Luo
  - Molecular Pharmacology Program, Experimental Therapeutics Center and Center for Stem Cell Biology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Dylan R. McNally
  - Caryl and Israel Englander Institute for Precision Medicine, Institute for Computational Biomedicine, Weill Cornell Medicine, Cornell University, New York, NY, USA
- Ioannis Karagiannidis
  - Division of Hematology and Medical Oncology, Department of Medicine, Weill Cornell Medical College, New York, NY, USA
- Martin A. Rivas
  - Division of Hematology and Medical Oncology, Department of Medicine, Weill Cornell Medical College, New York, NY, USA
- Zhong-Min Wang
  - Howard Hughes Medical Institute and Immunology Program, Sloan Kettering Institute and Ludwig Center at Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Darko Barisic
  - Division of Hematology and Medical Oncology, Department of Medicine, Weill Cornell Medical College, New York, NY, USA
- Alireza Karbalayghareh
  - Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Wilfred Wong
  - Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
  - Tri-Institutional Program in Computational Biology and Medicine, New York, NY, USA
- Yingqian A. Zhan
  - Center for Epigenetics Research, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Christopher R. Chin
  - Division of Hematology and Medical Oncology, Department of Medicine, Weill Cornell Medical College, New York, NY, USA
- Effie Apostolou
  - Sanford I Weill Department of Medicine, Sandra and Edward Meyer Cancer Center, Weill Cornell Medicine, New York, NY, USA
- Michael G. Kharas
  - Molecular Pharmacology Program, Experimental Therapeutics Center and Center for Stem Cell Biology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Wendy Béguelin
  - Division of Hematology and Medical Oncology, Department of Medicine, Weill Cornell Medical College, New York, NY, USA
- Aaron D. Viny
  - Departments of Medicine, Division of Hematology & Oncology, and of Genetics & Development, Columbia Stem Cell Initiative, Herbert Irving Comprehensive Cancer Center, Columbia University Irving Medical Center, New York, NY, USA
- Danwei Huangfu
  - Developmental Biology Program, Sloan Kettering Institute, New York, NY, USA
- Alexander Y. Rudensky
  - Howard Hughes Medical Institute and Immunology Program, Sloan Kettering Institute and Ludwig Center at Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Ari M. Melnick
  - Division of Hematology and Medical Oncology, Department of Medicine, Weill Cornell Medical College, New York, NY, USA
- Christina S. Leslie
  - Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA