1
Liu R, Zhang Z, Won H, Marron JS. Significance in scale space for Hi-C data. Bioinformatics 2025; 41:btaf026. [PMID: 40036585 PMCID: PMC11879645 DOI: 10.1093/bioinformatics/btaf026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2024] [Revised: 01/02/2025] [Accepted: 02/25/2025] [Indexed: 03/06/2025] Open
Abstract
MOTIVATION Hi-C technology has been developed to profile genome-wide chromosome conformation. So far Hi-C data have been generated from a large compendium of different cell types and different tissue types. Among different chromatin conformation units, chromatin loops were found to play a key role in gene regulation across different cell types. While many different loop calling algorithms have been developed, most loop callers identified shared loops as opposed to cell-type-specific loops. RESULTS We propose SSSHiC, a new loop calling algorithm based on significance in scale space, which can be used to understand data at different levels of resolution. By applying SSSHiC to neuronal and glial Hi-C data, we detected more loops that are potentially engaged in cell-type-specific gene regulation. Compared with other loop callers, such as Mustache, these loops were more frequently anchored to gene promoters of cellular marker genes and had better APA scores. Therefore, our results suggest that SSSHiC can effectively capture loops that contain more gene regulatory information. AVAILABILITY AND IMPLEMENTATION The Hi-C data used in this study can be accessed through the PsychENCODE Knowledge Portal at https://www.synapse.org/#!Synapse:syn21760712. The code utilized for Curvature SSS cited in this study is available at https://github.com/jsmarron/MarronMatlabSoftware/blob/master/Matlab9/Matlab9Combined.zip. All custom code used in this research can be found in the GitHub repository: https://github.com/jerryliu01998/HiC. The code has also been submitted to Code Ocean with the doi: 10.24433/CO.1912913.v1.
Affiliation(s)
- Rui Liu
  - Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States
- Zhengwu Zhang
  - Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States
- Hyejung Won
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States
- J S Marron
  - Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States
2
Li C, Bonder MJ, Syed S, Jensen M, Gerstein MB, Zody MC, Chaisson MJP, Talkowski ME, Marschall T, Korbel JO, Eichler EE, Lee C, Shi X. An integrative TAD catalog in lymphoblastoid cell lines discloses the functional impact of deletions and insertions in human genomes. Genome Res 2024; 34:2304-2318. [PMID: 39638559 PMCID: PMC11694747 DOI: 10.1101/gr.279419.124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2024] [Accepted: 10/04/2024] [Indexed: 12/07/2024]
Abstract
The human genome is packaged within a three-dimensional (3D) nucleus and organized into structural units known as compartments, topologically associating domains (TADs), and loops. TAD boundaries, separating adjacent TADs, have been found to be well conserved across mammalian species and more evolutionarily constrained than TADs themselves. Recent studies show that structural variants (SVs) can modify 3D genomes through the disruption of TADs, which play an essential role in insulating genes from outside regulatory elements' aberrant regulation. However, how SVs affect the 3D genome structure and their association with different aspects of gene regulation and candidate cis-regulatory elements (cCREs) have rarely been studied systematically. Here, we assess the impact of SVs intersecting with TAD boundaries by developing an integrative Hi-C analysis pipeline, which enables the generation of an in-depth catalog of TADs and TAD boundaries in human lymphoblastoid cell lines (LCLs) to fill the gap of limited resources. Our catalog contains 18,865 TADs, including 4596 sub-TADs, with 185 SVs (TAD-SVs) that alter chromatin architecture. By leveraging the ENCODE registry of cCREs in humans, we determine that 34 of 185 TAD-SVs intersect with cCREs and observe significant enrichment of TAD-SVs within cCREs. This study provides a database of TADs and TAD-SVs in the human genome that will facilitate future investigations of the impact of SVs on chromatin structure and gene regulation in health and disease.
Affiliation(s)
- Chong Li
  - Department of Computer and Information Sciences, College of Science and Technology, Temple University, Philadelphia, Pennsylvania 19122, USA
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, Pennsylvania 19122, USA
- Marc Jan Bonder
  - Department of Genetics, Groningen, University of Groningen, University Medical Center Groningen, Groningen 9713 AV, Netherlands
  - Division of Computational Genomics and Systems Genetics, German Cancer Research Center, 69120 Heidelberg, Germany
- Sabriya Syed
  - The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut 06032, USA
- Matthew Jensen
  - Department of Molecular Biochemistry and Biophysics, Yale University, New Haven, Connecticut 06510, USA
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520, USA
- Mark B Gerstein
  - Department of Molecular Biochemistry and Biophysics, Yale University, New Haven, Connecticut 06510, USA
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520, USA
- Mark J P Chaisson
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California 90089, USA
- Michael E Talkowski
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
  - Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts 02114, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Tobias Marschall
  - Institute for Medical Biometry and Bioinformatics, Medical Faculty and University Hospital, Heinrich Heine University, 40225 Düsseldorf, Germany
  - Center for Digital Medicine, Heinrich Heine University, 40225 Düsseldorf, Germany
- Jan O Korbel
  - European Molecular Biology Laboratory (EMBL), Genome Biology Unit, 69117 Heidelberg, Germany
- Evan E Eichler
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195-5065, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA
- Charles Lee
  - The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut 06032, USA
  - Department of Genetics and Genome Sciences, UConn Health, Farmington, Connecticut 06030-6403, USA
- Xinghua Shi
  - Department of Computer and Information Sciences, College of Science and Technology, Temple University, Philadelphia, Pennsylvania 19122, USA
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, Pennsylvania 19122, USA
3
Hansen P, Blau H, Hecht J, Karlebach G, Krannich A, Steinhaus R, Truss M, Robinson PN. Using paired-end read orientations to assess technical biases in capture Hi-C. NAR Genom Bioinform 2024; 6:lqae156. [PMID: 39660253 PMCID: PMC11630073 DOI: 10.1093/nargab/lqae156] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2024] [Revised: 10/17/2024] [Accepted: 10/30/2024] [Indexed: 12/12/2024] Open
Abstract
Hi-C and capture Hi-C (CHi-C) both leverage paired-end sequencing of chimeric fragments to gauge the strength of interactions based on the total number of paired-end reads mapped to a common pair of restriction fragments. Mapped paired-end reads can have four relative orientations, depending on the genomic positions and strands of the two reads. We assigned one paired-end read orientation to each of the four possible re-ligations that can occur between two given restriction fragments. In a large hematopoietic cell dataset, we determined the read pair counts of interactions separately for each orientation. Interactions with imbalances in the counts occur much more often than expected by chance for both Hi-C and CHi-C. Based on such imbalances, we identified target restriction fragments enriched at only one instead of both ends. By matching them to the baits used for the experiments, we confirmed our assignment of paired-end read orientations and gained insights that can inform bait design. An analysis of unbaited fragments shows that, beyond bait effects, other known types of technical biases are reflected in count imbalances. Taking advantage of distance-dependent contact frequencies, we assessed the impact of such biases. Our results have the potential to improve the design and interpretation of CHi-C experiments.
Affiliation(s)
- Peter Hansen
  - The Robinson Lab, The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, 06032, Connecticut, USA
  - The Robinson Lab, Berlin Institute of Health, Luisenstr. 65, 10117, Berlin, Germany
- Hannah Blau
  - The Robinson Lab, The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, 06032, Connecticut, USA
- Jochen Hecht
  - Genomics Unit, Centre for Genomic Regulation, Carrer del Dr. Aiguader 88, 08003, Barcelona, Spain
- Guy Karlebach
  - The Robinson Lab, The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, 06032, Connecticut, USA
- Alexander Krannich
  - Experimental and Clinical Research Center, Charité Universitätsmedizin Berlin, Lindenberger Weg 80, 13125, Berlin, Germany
- Robin Steinhaus
  - Exploratory Diagnostic Sciences, Berlin Institute of Health, Charitéplatz 1, 10117, Berlin, Germany
  - Institute of Medical Genetics and Human Genetics, Charité – Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt Universität zu Berlin, Augustenburger Platz 1, 13353, Berlin, Germany
- Matthias Truss
  - Labor für Pädiatrische Molekularbiologie, Charité Universitätsmedizin Berlin, Augustenburger Platz 1, 13353, Berlin, Germany
- Peter N Robinson
  - The Robinson Lab, The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, 06032, Connecticut, USA
  - The Robinson Lab, Berlin Institute of Health, Luisenstr. 65, 10117, Berlin, Germany
4
Kumar Halder A, Agarwal A, Jodkowska K, Plewczynski D. A systematic analyses of different bioinformatics pipelines for genomic data and its impact on deep learning models for chromatin loop prediction. Brief Funct Genomics 2024; 23:538-548. [PMID: 38555493 DOI: 10.1093/bfgp/elae009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Revised: 02/07/2024] [Accepted: 03/04/2024] [Indexed: 04/02/2024] Open
Abstract
Genomic data analysis has witnessed a surge in complexity and volume, primarily driven by the advent of high-throughput technologies. In particular, studying chromatin loops and structures has become pivotal in understanding gene regulation and genome organization. This systematic investigation explores the realm of specialized bioinformatics pipelines designed specifically for the analysis of chromatin loops and structures. Our investigation incorporates two protein (CTCF and Cohesin) factor-specific loop interaction datasets from six distinct pipelines, amassing a comprehensive collection of 36 diverse datasets. Through a meticulous review of existing literature, we offer a holistic perspective on the methodologies, tools and algorithms underpinning the analysis of this multifaceted genomic feature. We illuminate the vast array of approaches deployed, encompassing pivotal aspects such as data preparation pipeline, preprocessing, statistical features and modelling techniques. Beyond this, we rigorously assess the strengths and limitations inherent in these bioinformatics pipelines, shedding light on the interplay between data quality and the performance of deep learning models, ultimately advancing our comprehension of genomic intricacies.
Affiliation(s)
- Anup Kumar Halder
  - Laboratory of Bioinformatics and Computational Genomics, Faculty of Mathematics and Information Science, Warsaw University of Technology, Koszykowa 75, 00-662 Warsaw, Poland
  - Laboratory of Functional and Structural Genomics, Centre of New Technologies, University of Warsaw, Banacha 2c, 02-097 Warsaw, Poland
- Abhishek Agarwal
  - Laboratory of Functional and Structural Genomics, Centre of New Technologies, University of Warsaw, Banacha 2c, 02-097 Warsaw, Poland
- Karolina Jodkowska
  - Laboratory of Functional and Structural Genomics, Centre of New Technologies, University of Warsaw, Banacha 2c, 02-097 Warsaw, Poland
- Dariusz Plewczynski
  - Laboratory of Bioinformatics and Computational Genomics, Faculty of Mathematics and Information Science, Warsaw University of Technology, Koszykowa 75, 00-662 Warsaw, Poland
  - Laboratory of Functional and Structural Genomics, Centre of New Technologies, University of Warsaw, Banacha 2c, 02-097 Warsaw, Poland
5
Wang Y, Cheng J. HiCDiff: single-cell Hi-C data denoising with diffusion models. Brief Bioinform 2024; 25:bbae279. [PMID: 38856167 PMCID: PMC11163381 DOI: 10.1093/bib/bbae279] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2024] [Revised: 05/21/2024] [Accepted: 05/29/2024] [Indexed: 06/11/2024] Open
Abstract
The genome-wide single-cell chromosome conformation capture technique, i.e. single-cell Hi-C (ScHi-C), was recently developed to interrogate the conformation of the genome of individual cells. However, single-cell Hi-C data are much sparser than bulk Hi-C data of a population of cells, and noise in single-cell Hi-C makes it difficult to apply and analyze them in biological research. Here, we developed the first generative diffusion models (HiCDiff) to denoise single-cell Hi-C data in the form of chromosomal contact matrices. HiCDiff uses a deep residual network to remove the noise in the reverse process of diffusion and can be trained in both unsupervised and supervised learning modes. Benchmarked on several single-cell Hi-C test datasets, the diffusion models substantially remove the noise in single-cell Hi-C data. The unsupervised HiCDiff outperforms most supervised non-diffusion deep learning methods and achieves performance comparable to the state-of-the-art supervised deep learning method in terms of multiple metrics, demonstrating that diffusion models are a useful approach to denoising single-cell Hi-C data. Moreover, its good performance holds when denoising bulk Hi-C data.
Affiliation(s)
- Yanli Wang
  - Department of Electrical Engineering and Computer Science, NextGen Precision Health Institute, University of Missouri, Columbia, MO 65211, United States
- Jianlin Cheng
  - Department of Electrical Engineering and Computer Science, NextGen Precision Health Institute, University of Missouri, Columbia, MO 65211, United States
6
Camerino M, Chang W, Cvekl A. Analysis of long-range chromatin contacts, compartments and looping between mouse embryonic stem cells, lens epithelium and lens fibers. Epigenetics Chromatin 2024; 17:10. [PMID: 38643244 PMCID: PMC11031936 DOI: 10.1186/s13072-024-00533-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2023] [Accepted: 03/08/2024] [Indexed: 04/22/2024] Open
Abstract
BACKGROUND Nuclear organization of interphase chromosomes involves individual chromosome territories, "open" and "closed" chromatin compartments, topologically associated domains (TADs) and chromatin loops. The DNA- and RNA-binding transcription factor CTCF together with the cohesin complex serve as major organizers of chromatin architecture. Cellular differentiation is driven by temporally and spatially coordinated gene expression that requires chromatin changes of individual loci of various complexities. Lens differentiation represents an advantageous system to probe transcriptional mechanisms underlying tissue-specific gene expression including high transcriptional outputs of individual crystallin genes until the mature lens fiber cells degrade their nuclei. RESULTS Chromatin organization between mouse embryonic stem (ES) cells, newborn (P0.5) lens epithelium and fiber cells were analyzed using Hi-C. Localization of CTCF in both lens chromatins was determined by ChIP-seq and compared with ES cells. Quantitative analyses show major differences between number and size of TADs and chromatin loop size between these three cell types. In depth analyses show similarities between lens samples exemplified by overlaps between compartments A and B. Lens epithelium-specific CTCF peaks are found in mostly methylated genomic regions while lens fiber-specific and shared peaks occur mostly within unmethylated DNA regions. Major differences in TADs and loops are illustrated at the ~500 kb Pax6 locus, encoding the critical lens regulatory transcription factor and within a larger ~15 Mb WAGR locus, containing Pax6 and other loci linked to human congenital diseases. Lens and ES cell Hi-C data (TADs and loops) together with ATAC-seq, CTCF, H3K27ac, H3K27me3 and ENCODE cis-regulatory sites are shown in detail for the Pax6, Sox1 and Hif1a loci, multiple crystallin genes and other important loci required for lens morphogenesis. The majority of crystallin loci are marked by unexpectedly high CTCF-binding across their transcribed regions. CONCLUSIONS Our study has generated the first data on 3-dimensional (3D) nuclear organization in lens epithelium and lens fibers and directly compared these data with ES cells. These findings generate novel insights into lens-specific transcriptional gene control, open new research avenues to study transcriptional condensates in lens fiber cells, and enable studies of non-coding genetic variants linked to cataract and other lens and ocular abnormalities.
Affiliation(s)
- Michael Camerino
  - Department of Genetics, Albert Einstein College of Medicine, Bronx, NY 10461, USA
- William Chang
  - Ophthalmology and Visual Sciences, Albert Einstein College of Medicine, Bronx, NY 10461, USA
- Ales Cvekl
  - Department of Genetics, Albert Einstein College of Medicine, Bronx, NY 10461, USA
  - Ophthalmology and Visual Sciences, Albert Einstein College of Medicine, Bronx, NY 10461, USA
7
Yusuf N, Monahan K. Epigenetic programming of stochastic olfactory receptor choice. Genesis 2024; 62:e23593. [PMID: 38562011 PMCID: PMC11003729 DOI: 10.1002/dvg.23593] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2023] [Revised: 03/01/2024] [Accepted: 03/15/2024] [Indexed: 04/04/2024]
Abstract
The mammalian sense of smell relies upon a vast array of receptor proteins to detect odorant compounds present in the environment. The proper deployment of these receptor proteins in olfactory sensory neurons is orchestrated by a suite of epigenetic processes that remodel the olfactory genes in differentiating neuronal progenitors. The goal of this review is to elucidate the central role of gene regulatory processes acting in neuronal progenitors of olfactory sensory neurons that lead to a singular expression of an odorant receptor in mature olfactory sensory neurons. We begin by describing the principal features of odorant receptor gene expression in mature olfactory sensory neurons. Next, we delineate our current understanding of how these features emerge from multiple gene regulatory mechanisms acting in neuronal progenitors. Finally, we close by discussing the key gaps in our understanding of how these regulatory mechanisms work and how they interact with each other over the course of differentiation.
Affiliation(s)
- Nusrath Yusuf
  - Division of Life Sciences-Molecular Biology and Biochemistry Department, Rutgers University-New Brunswick, New Brunswick, New Jersey, USA
- Kevin Monahan
  - Division of Life Sciences-Molecular Biology and Biochemistry Department, Rutgers University-New Brunswick, New Brunswick, New Jersey, USA
8
Ing-Simmons E, Machnik N, Vaquerizas JM. Reply to: Revisiting the use of structural similarity index in Hi-C. Nat Genet 2023; 55:2053-2055. [PMID: 38052961 DOI: 10.1038/s41588-023-01595-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/15/2021] [Accepted: 10/25/2023] [Indexed: 12/07/2023]
Affiliation(s)
- Elizabeth Ing-Simmons
  - MRC London Institute of Medical Sciences, London, UK
  - Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, London, UK
- Nick Machnik
  - Institute of Science and Technology Austria, Klosterneuburg, Austria
- Juan M Vaquerizas
  - MRC London Institute of Medical Sciences, London, UK
  - Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, London, UK
9
Bringloe TT, Parent GJ. Contrasting new and available reference genomes to highlight uncertainties in assemblies and areas for future improvement: an example with monodontid species. BMC Genomics 2023; 24:693. [PMID: 37985969 PMCID: PMC10659057 DOI: 10.1186/s12864-023-09779-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2023] [Accepted: 10/31/2023] [Indexed: 11/22/2023] Open
Abstract
BACKGROUND Reference genomes provide a foundational framework for evolutionary investigations, ecological analysis, and conservation science, yet uncertainties in the assembly of reference genomes are difficult to assess, and by extension rarely quantified. Reference genomes for monodontid cetaceans span a wide spectrum of data types and analytical approaches, providing the context to derive broader insights related to discrepancies and regions of uncertainty in reference genome assembly. We generated three beluga (Delphinapterus leucas) and one narwhal (Monodon monoceros) reference genomes and contrasted these with published chromosomal scale assemblies for each species to quantify discrepancies associated with genome assemblies. RESULTS The new reference genomes achieved chromosomal scale assembly using a combination of PacBio long reads, Illumina short reads, and Hi-C scaffolding data. For beluga, we identified discrepancies in the order and orientation of contigs in 2.2-3.7% of the total genome depending on the pairwise comparison of references. In addition, unsupported higher order scaffolding was identified in published reference genomes. In contrast, we estimated 8.2% of the compared narwhal genomes featured discrepancies, with inversions being notably abundant (5.3%). Discrepancies were linked to repetitive elements in both species. CONCLUSIONS We provide several new reference genomes for beluga (Delphinapterus leucas), while highlighting potential avenues for improvements. In particular, additional layers of data providing information on ultra-long genomic distances are needed to resolve persistent errors in reference genome construction. The comparative analyses of monodontid reference genomes suggested that the three new reference genomes for beluga are more accurate compared to the currently published reference genome, but that the new narwhal genome is less accurate than one published. We also present a conceptual summary for improving the accuracy of reference genomes with relevance to end-user needs and how they relate to levels of assembly quality and uncertainty.
Affiliation(s)
- Trevor T Bringloe
  - Laboratory of Genomics, Maurice Lamontagne Institute, Fisheries and Oceans Canada, Mont-Joli, QC, Canada
- Geneviève J Parent
  - Laboratory of Genomics, Maurice Lamontagne Institute, Fisheries and Oceans Canada, Mont-Joli, QC, Canada
10
Wang L, Li LL, Chen L, Zhang RG, Zhao SW, Yan H, Gao J, Chen X, Si YJ, Chen Z, Liu H, Xie XM, Zhao W, Han B, Qin X, Jia KH. Telomere-to-telomere and haplotype-resolved genome assembly of the Chinese cork oak (Quercus variabilis). Front Plant Sci 2023; 14:1290913. [PMID: 38023918 PMCID: PMC10652414 DOI: 10.3389/fpls.2023.1290913] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2023] [Accepted: 10/17/2023] [Indexed: 12/01/2023]
Abstract
Quercus variabilis, a deciduous broadleaved tree species, holds significant ecological and economic value. While a chromosome-level genome for this species has been made available, it remains riddled with unanchored sequences and gaps. In this study, we present a nearly complete comprehensive telomere-to-telomere (T2T) and haplotype-resolved reference genome for Q. variabilis. This was achieved through the integration of ONT ultra-long reads, PacBio HiFi long reads, and Hi-C data. The resultant two haplotype genomes measure 789 Mb and 768 Mb in length, with a contig N50 of 65 Mb and 56 Mb, and were anchored to 12 allelic chromosomes. Within this T2T haplotype-resolved assembly, we predicted 36,830 and 36,370 protein-coding genes, with 95.9% and 96.0% functional annotation for each haplotype genome. The availability of the T2T and haplotype-resolved reference genome lays a solid foundation, not only for illustrating genome structure and functional genomics studies but also to inform and facilitate genetic breeding and improvement of cultivated Quercus species.
Affiliation(s)
- Longxin Wang
  - School of Biological Science and Technology, University of Jinan, Jinan, China
- Lei-Lei Li
  - Key Laboratory of Crop Genetic Improvement & Ecology and Physiology, Institute of Crop Germplasm Resources, Shandong Academy of Agricultural Sciences, Jinan, China
- Li Chen
  - Shandong Saienfu Stem Cell Engineering Group Co., Ltd, Jinan, China
- Ren-Gang Zhang
  - Yunnan Key Laboratory for Integrative Conservation of Plant Species with Extremely Small Populations/Key Laboratory for Plant Diversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan, China
- Shi-Wei Zhao
  - Department of Plant Physiology, Umeå Plant Science Centre, Umeå University, Umeå, Sweden
- Han Yan
  - The Second Affiliated Hospital of Shandong First Medical University, Taian, China
- Jie Gao
  - Chinese Academy of Sciences (CAS), Key Laboratory of Tropical Forest Ecology, Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences, Menglun, China
- Xue Chen
  - Weifang Academy of Agricultural Sciences, Weifang, China
- Yu-Jun Si
  - Weifang Academy of Agricultural Sciences, Weifang, China
- Zhe Chen
  - InvoGenomics Biotechnology Co., Ltd., Jinan, China
- Haibo Liu
  - Jinan Academy of Landscape and Forestry Science, Jinan, China
- Xiao-Man Xie
  - Key Laboratory of State Forestry and Grassland Administration Conservation and Utilization of Warm Temperate Zone Forest and Grass Germplasm Resources, Shandong Provincial Center of Forest and Grass Germplasm Resources, Jinan, China
- Wei Zhao
  - Department of Ecology and Environmental Science, Umeå Plant Science Centre, Umeå University, Umeå, Sweden
- Biao Han
  - Key Laboratory of State Forestry and Grassland Administration Conservation and Utilization of Warm Temperate Zone Forest and Grass Germplasm Resources, Shandong Provincial Center of Forest and Grass Germplasm Resources, Jinan, China
- Xiaochun Qin
  - School of Biological Science and Technology, University of Jinan, Jinan, China
- Kai-Hua Jia
  - Key Laboratory of Crop Genetic Improvement & Ecology and Physiology, Institute of Crop Germplasm Resources, Shandong Academy of Agricultural Sciences, Jinan, China
11
Raffo A, Paulsen J. The shape of chromatin: insights from computational recognition of geometric patterns in Hi-C data. Brief Bioinform 2023; 24:bbad302. [PMID: 37646128 PMCID: PMC10516369 DOI: 10.1093/bib/bbad302] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/18/2023] [Revised: 07/05/2023] [Accepted: 08/03/2023] [Indexed: 09/01/2023] Open
Abstract
The three-dimensional organization of chromatin plays a crucial role in gene regulation and cellular processes like deoxyribonucleic acid (DNA) transcription, replication and repair. Hi-C and related techniques provide detailed views of spatial proximities within the nucleus. However, data analysis is challenging partially due to a lack of well-defined, underpinning mathematical frameworks. Recently, recognizing and analyzing geometric patterns in Hi-C data has emerged as a powerful approach. This review provides a summary of algorithms for automatic recognition and analysis of geometric patterns in Hi-C data and their correspondence with chromatin structure. We classify existing algorithms on the basis of the data representation and pattern recognition paradigm they make use of. Finally, we outline some of the challenges ahead and promising future directions.
Affiliation(s)
- Andrea Raffo
  - Department of Biosciences, University of Oslo, 0316 Oslo, Norway
- Jonas Paulsen
  - Department of Biosciences, University of Oslo, 0316 Oslo, Norway
  - Centre for Bioinformatics, Department of Informatics, University of Oslo, 0316 Oslo, Norway
12
Krieger KL, Mann EK, Lee KJ, Bolterstein E, Jebakumar D, Ittmann MM, Dal Zotto VL, Shaban M, Sreekumar A, Gassman NR. Spatial mapping of the DNA adducts in cancer. DNA Repair (Amst) 2023; 128:103529. [PMID: 37390674 PMCID: PMC10330576 DOI: 10.1016/j.dnarep.2023.103529] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/30/2023] [Revised: 06/19/2023] [Accepted: 06/21/2023] [Indexed: 07/02/2023]
Abstract
DNA adducts and strand breaks are induced by various exogenous and endogenous agents. Accumulation of DNA damage is implicated in many disease processes, including cancer, aging, and neurodegeneration. The continuous acquisition of DNA damage from exogenous and endogenous stressors, coupled with defects in DNA repair pathways, contributes to the accumulation of DNA damage within the genome and to genomic instability. While mutational burden offers some insight into the level of DNA damage a cell may have experienced and subsequently repaired, it does not quantify DNA adducts and strand breaks. Mutational burden also infers the identity of the DNA damage. With advances in DNA adduct detection and quantification methods, there is an opportunity to identify DNA adducts driving mutagenesis and correlate them with a known exposome. However, most DNA adduct detection methods require isolation or separation of the DNA and its adducts from the context of the nuclei. Mass spectrometry, comet assays, and other techniques precisely quantify lesion types but lose the nuclear context and even tissue context of the DNA damage. The growth in spatial analysis technologies offers a novel opportunity to leverage DNA damage detection with nuclear and tissue context. However, we lack a wealth of techniques capable of detecting DNA damage in situ. Here, we review the limited existing in situ DNA damage detection methods and examine their potential to offer spatial analysis of DNA adducts in tumors or other tissues. We also offer a perspective on the need for spatial analysis of DNA damage in situ and highlight Repair Assisted Damage Detection (RADD) as an in situ DNA adduct technique with the potential to integrate with spatial analysis and the challenges to be addressed.
Affiliation(s)
- Kimiko L Krieger
- Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX 77030, USA; Center for Translational Metabolism and Health Disparities (C-TMH), Baylor College of Medicine, Houston, TX 77030, USA
- Elise K Mann
- Department of Physiology and Cell Biology, College of Medicine, University of South Alabama, Mobile, AL 36688, USA; Mitchell Cancer Institute, University of South Alabama, Mobile, AL 36604, USA
- Kevin J Lee
- Department of Physiology and Cell Biology, College of Medicine, University of South Alabama, Mobile, AL 36688, USA; Mitchell Cancer Institute, University of South Alabama, Mobile, AL 36604, USA
- Elyse Bolterstein
- Department of Biology, Northeastern Illinois University, Chicago, IL 60625, USA
- Deborah Jebakumar
- Department of Anatomic Pathology, Baylor Scott & White Medical Center, Temple, TX 76508, USA; Texas A&M College of Medicine, Temple, TX 76508, USA
- Michael M Ittmann
- Department of Pathology and Immunology, Baylor College of Medicine, Houston, TX 77030, USA; Human Tissue Acquisition & Pathology Shared Resource, Dan L. Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, TX 77030, USA
- Valeria L Dal Zotto
- Department of Pathology, College of Medicine, University of South Alabama, Mobile, AL 36688, USA
- Mohamed Shaban
- Department of Electrical and Computer Engineering, University of South Alabama, Mobile, AL 36688, USA
- Arun Sreekumar
- Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX 77030, USA; Center for Translational Metabolism and Health Disparities (C-TMH), Baylor College of Medicine, Houston, TX 77030, USA; Dan L. Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, TX 77030, USA; Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, TX 77030, USA
- Natalie R Gassman
- Department of Pharmacology and Toxicology, University of Alabama at Birmingham, Birmingham, AL 35294, USA.
|
13
|
Gallardo-Escárate C, Valenzuela-Muñoz V, Nuñez-Acuña G, Valenzuela-Miranda D, Tapia FJ, Yévenes M, Gajardo G, Toro JE, Oyarzún PA, Arriagada G, Novoa B, Figueras A, Roberts S, Gerdol M. Chromosome-Level Genome Assembly of the Blue Mussel Mytilus chilensis Reveals Molecular Signatures Facing the Marine Environment. Genes (Basel) 2023; 14:876. [PMID: 37107634 PMCID: PMC10137854 DOI: 10.3390/genes14040876] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 03/03/2023] [Revised: 03/28/2023] [Accepted: 03/30/2023] [Indexed: 04/29/2023] Open
Abstract
The blue mussel Mytilus chilensis is an endemic and key socioeconomic species inhabiting the southern coast of Chile. This bivalve species supports a booming aquaculture industry, which entirely relies on artificially collected seeds from natural beds that are translocated to diverse physical-chemical ocean farming conditions. Furthermore, mussel production is threatened by a broad range of microorganisms, pollution, and environmental stressors that eventually impact its survival and growth. Understanding the genomic basis of local adaptation is therefore pivotal to developing sustainable shellfish aquaculture. We present a high-quality reference genome of M. chilensis, which is the first chromosome-level genome for a Mytilidae member in South America. The assembled genome size was 1.93 Gb, with a contig N50 of 134 Mb. Through Hi-C proximity ligation, 11,868 contigs were clustered, ordered, and assembled into 14 chromosomes in congruence with the karyological evidence. The M. chilensis genome comprises 34,530 genes and 4795 non-coding RNAs. A total of 57% of the genome contains repetitive sequences, with a predominance of LTR-retrotransposons and unknown elements. Comparative genome analysis of M. chilensis and M. coruscus was conducted, revealing genic rearrangements distributed across the whole genome. Notably, transposable Steamer-like elements associated with horizontally transmissible cancer were explored in reference genomes, suggesting putative relationships at the chromosome level in Bivalvia. Genome expression analysis was also conducted, showing putative genomic differences between two ecologically different mussel populations. The evidence suggests that local genome adaptation and physiological plasticity can be analyzed to develop sustainable mussel production. The genome of M. chilensis provides pivotal molecular knowledge for the Mytilus complex.
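For readers unfamiliar with the N50 statistic quoted in this abstract, the following is a minimal sketch of how it is conventionally computed from a list of contig lengths. The function name `n50` and the toy lengths are illustrative assumptions; they are not taken from the paper or its data.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly size.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy assembly (hypothetical lengths in bp), not the M. chilensis data.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

Sorting from longest to shortest and accumulating until half of the total is reached is the usual definition; some tools differ in how they handle ties or scaffolds versus contigs.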
Affiliation(s)
- Gustavo Nuñez-Acuña
- Center for Aquaculture Research, University of Concepción, Concepción 4070386, Chile
- Fabian J. Tapia
- Center for Aquaculture Research, University of Concepción, Concepción 4070386, Chile
- Marco Yévenes
- Laboratorio de Genética, Acuicultura & Biodiversidad, Departamento de Ciencias Biológicas y Biodiversidad, Universidad de Los Lagos, Osorno 5310230, Chile
- Gonzalo Gajardo
- Laboratorio de Genética, Acuicultura & Biodiversidad, Departamento de Ciencias Biológicas y Biodiversidad, Universidad de Los Lagos, Osorno 5310230, Chile
- Jorge E. Toro
- Facultad de Ciencias, Instituto de Ciencias Marinas y Limnológicas (ICML), Universidad Austral de Chile, Valdivia 5110566, Chile
- Pablo A. Oyarzún
- Centro de Investigación Marina Quintay (CIMARQ), Universidad Andres Bello, Quintay 2340000, Chile
- Gloria Arriagada
- Instituto de Ciencias Biomédicas, Facultad de Medicina, Universidad Andrés Bello, Santiago 8370186, Chile
- FONDAP Center for Genome Regulation, Santiago 8370415, Chile
- Beatriz Novoa
- Instituto de Investigaciones Marinas (IIM), Consejo Superior de Investigaciones Científicas (CSIC), 36208 Vigo, Spain
- Antonio Figueras
- Instituto de Investigaciones Marinas (IIM), Consejo Superior de Investigaciones Científicas (CSIC), 36208 Vigo, Spain
- Steven Roberts
- School of Aquatic and Fishery Sciences (SAFS), University of Washington, Seattle, WA 98195, USA
- Marco Gerdol
- Department of Life Sciences, University of Trieste, 34127 Trieste, Italy
|
14
|
Tao X, Li S, Chen G, Wang J, Xu S. Approaches for Modes of Action Study of Long Non-Coding RNAs: From Single Verification to Genome-Wide Determination. Int J Mol Sci 2023; 24:5562. [PMID: 36982636 PMCID: PMC10054671 DOI: 10.3390/ijms24065562] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/14/2023] [Revised: 03/08/2023] [Accepted: 03/10/2023] [Indexed: 03/17/2023] Open
Abstract
Long non-coding RNAs (lncRNAs) are transcripts longer than 200 nucleotides (nt) that are not translated into known functional proteins. This broad definition covers a large collection of transcripts with diverse genomic origins, biogenesis, and modes of action. Thus, it is very important to choose appropriate research methodologies when investigating lncRNAs with biological significance. Multiple reviews to date have summarized the mechanisms of lncRNA biogenesis, their localization, their functions in gene regulation at multiple levels, and also their potential applications. However, little has been reviewed on the leading strategies for lncRNA research. Here, we generalize a basic and systemic mind map for lncRNA research and discuss the mechanisms and the application scenarios of ‘up-to-date’ techniques as applied to molecular function studies of lncRNAs. Taking advantage of documented lncRNA research paradigms as examples, we aim to provide an overview of the developing techniques for elucidating lncRNA interactions with genomic DNA, proteins, and other RNAs. In the end, we propose the future direction and potential technological challenges of lncRNA studies, focusing on techniques and applications.
Affiliation(s)
- Xiaoyuan Tao
- Xianghu Laboratory, Hangzhou 311231, China
- Central Laboratory, State Key Laboratory for Managing Biotic and Chemical Threats to the Quality and Safety of Agro-Products, Zhejiang Academy of Agricultural Sciences, Hangzhou 310021, China
- Sujuan Li
- Central Laboratory, State Key Laboratory for Managing Biotic and Chemical Threats to the Quality and Safety of Agro-Products, Zhejiang Academy of Agricultural Sciences, Hangzhou 310021, China
- Guang Chen
- Central Laboratory, State Key Laboratory for Managing Biotic and Chemical Threats to the Quality and Safety of Agro-Products, Zhejiang Academy of Agricultural Sciences, Hangzhou 310021, China
- Jian Wang
- Central Laboratory, State Key Laboratory for Managing Biotic and Chemical Threats to the Quality and Safety of Agro-Products, Zhejiang Academy of Agricultural Sciences, Hangzhou 310021, China
- Shengchun Xu
- Xianghu Laboratory, Hangzhou 311231, China
- Central Laboratory, State Key Laboratory for Managing Biotic and Chemical Threats to the Quality and Safety of Agro-Products, Zhejiang Academy of Agricultural Sciences, Hangzhou 310021, China
- Correspondence:
|
15
|
Li Q, Perera D, Cao C, He J, Bian J, Chen X, Azeem F, Howe A, Au B, Wu J, Yan J, Long Q. Interaction-integrated linear mixed model reveals 3D-genetic basis underlying Autism. Genomics 2023; 115:110575. [PMID: 36758877 DOI: 10.1016/j.ygeno.2023.110575] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/22/2022] [Revised: 01/16/2023] [Accepted: 02/03/2023] [Indexed: 02/10/2023]
Abstract
Genetic interactions play critical roles in genotype-phenotype associations. We developed a novel interaction-integrated linear mixed model (ILMM) that integrates a priori knowledge into linear mixed models. ILMM enables statistical integration of genetic interactions upfront and overcomes the problems of searching for combinations. To demonstrate its utility, with 3D genomic interactions (assessed by Hi-C experiments) as a priori knowledge, we applied ILMM to whole-genome sequencing data for Autism Spectrum Disorders (ASD) and brain transcriptome data, revealing the 3D-genetic basis of ASD and 3D-expression quantitative trait loci (3D-eQTLs) for brain tissues. Notably, we reported a potential mechanism involving distal regulation between FOXP2 and DNMT3A, conferring the risk of ASD.
Affiliation(s)
- Qing Li
- Department of Biochemistry and Molecular Biology, University of Calgary, Alberta T2N 1N4, Canada
- Deshan Perera
- Department of Biochemistry and Molecular Biology, University of Calgary, Alberta T2N 1N4, Canada
- Chen Cao
- Department of Biochemistry and Molecular Biology, University of Calgary, Alberta T2N 1N4, Canada
- Jingni He
- Department of Biochemistry and Molecular Biology, University of Calgary, Alberta T2N 1N4, Canada
- Jiayi Bian
- Department of Mathematics and Statistics, University of Calgary, Alberta T2N 1N4, Canada
- Xingyu Chen
- Department of Biochemistry and Molecular Biology, University of Calgary, Alberta T2N 1N4, Canada
- Feeha Azeem
- Department of Biochemistry and Molecular Biology, University of Calgary, Alberta T2N 1N4, Canada
- Aaron Howe
- Heritage Youth Researcher Summer Program, University of Calgary, Alberta T2N 1N4, Canada
- Billie Au
- Department of Medical Genetics, University of Calgary, Alberta T2N 1N4, Canada; Alberta Children's Hospital Research Institute, University of Calgary, Alberta T2N 1N4, Canada
- Jingjing Wu
- Department of Mathematics and Statistics, University of Calgary, Alberta T2N 1N4, Canada
- Jun Yan
- Department of Physiology and Pharmacology, University of Calgary, Alberta T2N 1N4, Canada; Hotchkiss Brain Institute, University of Calgary, Alberta T2N 1N4, Canada.
- Quan Long
- Department of Biochemistry and Molecular Biology, University of Calgary, Alberta T2N 1N4, Canada; Department of Medical Genetics, University of Calgary, Alberta T2N 1N4, Canada; Department of Mathematics and Statistics, University of Calgary, Alberta T2N 1N4, Canada; Alberta Children's Hospital Research Institute, University of Calgary, Alberta T2N 1N4, Canada; Hotchkiss Brain Institute, University of Calgary, Alberta T2N 1N4, Canada.
|
16
|
Silveira PP, Meaney MJ. Examining the biological mechanisms of human mental disorders resulting from gene-environment interdependence using novel functional genomic approaches. Neurobiol Dis 2023; 178:106008. [PMID: 36690304 DOI: 10.1016/j.nbd.2023.106008] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 08/03/2022] [Revised: 12/30/2022] [Accepted: 01/18/2023] [Indexed: 01/21/2023] Open
Abstract
We explore how functional genomics approaches that integrate datasets from human and non-human model systems can improve our understanding of the effect of gene-environment interplay on the risk for mental disorders. We start by briefly defining the G-E paradigm and its challenges and then discuss the different levels of regulation of gene expression and the corresponding data existing in humans (genome-wide genotyping, transcriptomics, DNA methylation, chromatin modifications, chromosome conformational changes, non-coding RNAs, proteomics and metabolomics), discussing novel approaches to the application of these data in the study of the origins of mental health. Finally, we discuss the multilevel integration of diverse types of data. Advances in the use of functional genomics within a G-E perspective improve the detection of vulnerabilities, informing the development of preventive and therapeutic interventions.
Affiliation(s)
- Patrícia Pelufo Silveira
- Department of Psychiatry, Faculty of Medicine and Health Sciences, McGill University, Montreal, QC, Canada; Ludmer Centre for Neuroinformatics and Mental Health, Douglas Mental Health University Institute, McGill University, Montreal, QC, Canada.
- Michael J Meaney
- Department of Psychiatry, Faculty of Medicine and Health Sciences, McGill University, Montreal, QC, Canada; Translational Neuroscience Program, Singapore Institute for Clinical Sciences, Agency for Science, Technology and Research (ASTAR), Singapore; Brain - Body Initiative, Agency for Science, Technology and Research (ASTAR), Singapore.
|
17
|
Anderson BD, Bisanz JE. Challenges and opportunities of strain diversity in gut microbiome research. Front Microbiol 2023; 14:1117122. [PMID: 36876113 PMCID: PMC9981649 DOI: 10.3389/fmicb.2023.1117122] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/06/2022] [Accepted: 01/24/2023] [Indexed: 02/19/2023] Open
Abstract
Just because two things are related does not mean they are the same. In analyzing microbiome data, we are often limited to species-level analyses, and even with the ability to resolve strains, we lack comprehensive databases and understanding of the importance of strain-level variation outside of a limited number of model organisms. The bacterial genome is highly plastic, with gene gain and loss occurring at rates comparable to or higher than de novo mutations. As such, the conserved portion of the genome is often a fraction of the pangenome, which gives rise to significant phenotypic variation, particularly in traits that are important in host-microbe interactions. In this review, we discuss the mechanisms that give rise to strain variation and methods that can be used to study it. We identify that while strain diversity can act as a major barrier in interpreting and generalizing microbiome data, it can also be a powerful tool for mechanistic research. We then highlight recent examples demonstrating the importance of strain variation in colonization, virulence, and xenobiotic metabolism. Moving past taxonomy and the species concept will be crucial for future mechanistic research to understand microbiome structure and function.
Affiliation(s)
- Benjamin D. Anderson
- Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA, United States
- Jordan E. Bisanz
- Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA, United States
- The Penn State Microbiome Center, Huck Institutes of the Life Sciences, University Park, PA, United States
|
18
|
Iqbal W, Zhou W. Computational Methods for Single-cell DNA Methylome Analysis. Genomics Proteomics Bioinformatics 2023; 21:48-66. [PMID: 35718270 PMCID: PMC10372927 DOI: 10.1016/j.gpb.2022.05.007] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 12/31/2021] [Revised: 04/28/2022] [Accepted: 05/10/2022] [Indexed: 11/19/2022]
Abstract
Dissecting intercellular epigenetic differences is key to understanding tissue heterogeneity. Recent advances in single-cell DNA methylome profiling have presented opportunities to resolve this heterogeneity at the maximum resolution. While these advances enable us to explore frontiers of chromatin biology and better understand cell lineage relationships, they pose new challenges in data processing and interpretation. This review surveys the current state of computational tools developed for single-cell DNA methylome data analysis. We discuss critical components of single-cell DNA methylome data analysis, including data preprocessing, quality control, imputation, dimensionality reduction, cell clustering, supervised cell annotation, cell lineage reconstruction, gene activity scoring, and integration with transcriptome data. We also highlight unique aspects of single-cell DNA methylome data analysis and discuss how techniques common to other single-cell omics data analyses can be adapted to analyze DNA methylomes. Finally, we discuss existing challenges and opportunities for future development.
Affiliation(s)
- Waleed Iqbal
- Center for Computational and Genomic Medicine, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Wanding Zhou
- Center for Computational and Genomic Medicine, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA.
|
19
|
Yang JY, Chang JM. Pattern recognition of topologically associating domains using deep learning. BMC Bioinformatics 2022; 22:634. [PMID: 36482308 PMCID: PMC9732975 DOI: 10.1186/s12859-022-05075-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2022] [Accepted: 11/22/2022] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND Increasing evidence indicates that three-dimensional chromosome structure plays an important role in genomic function. Topologically associating domains (TADs) are self-interacting regions that have been shown to be a chromosomal structural unit. These are conserved during evolution, as shown by comparing synteny blocks across species. Are there common TAD patterns across species or cell lines? RESULTS To address the above question, we propose a novel task, TAD recognition, as opposed to traditional TAD identification. Specifically, we treat Hi-C maps as images, thus re-casting TAD recognition as image pattern recognition, for which we use a convolutional neural network and a residual neural network. In addition, we propose an elegant way to generate non-TAD data for binary classification. Cross-species and cell-type validation shows promising deep learning performance (AUC > 0.80). CONCLUSIONS TADs have been shown to be conserved during evolution. Interestingly, our results confirm that the TAD recognition model is practical across species, which indicates that TADs between human and mouse show common patterns from an image classification point of view. Our approach could be a new way to identify TAD variations or patterns among Hi-C maps. For example, TADs of two Hi-C maps are conserved if the two classification models are exchangeable.
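The AUC figure reported in this abstract summarizes how well a binary classifier ranks TAD windows above non-TAD windows. As a minimal sketch, assuming hypothetical labels and scores (none of these values come from the paper, and `roc_auc` is an illustrative name), the area under the ROC curve can be computed by pairwise comparison, which is equivalent to the normalized Mann-Whitney U statistic:

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for a positive example (e.g. a TAD window),
            0 for a negative example (e.g. a non-TAD window).
    scores: classifier outputs; higher means "more TAD-like".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical classifier outputs for six windows.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
print(roc_auc(labels, scores))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so a value above 0.80 means a randomly chosen TAD window outscores a randomly chosen non-TAD window more than 80% of the time.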
Affiliation(s)
- Jhen Yuan Yang
- Department of Computer Science, National Chengchi University, 11605 Taipei City, Taiwan
- Jia-Ming Chang
- Department of Computer Science, National Chengchi University, 11605 Taipei City, Taiwan
|
20
|
Orozco G, Schoenfelder S, Walker N, Eyre S, Fraser P. 3D genome organization links non-coding disease-associated variants to genes. Front Cell Dev Biol 2022; 10:995388. [PMID: 36340032 PMCID: PMC9631826 DOI: 10.3389/fcell.2022.995388] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 07/15/2022] [Accepted: 09/27/2022] [Indexed: 11/13/2022] Open
Abstract
Genome sequencing has revealed over 300 million genetic variations in human populations. Over 90% of variants are single nucleotide polymorphisms (SNPs); the remainder include short deletions or insertions, and small numbers of structural variants. Hundreds of thousands of these variants have been associated with specific phenotypic traits and diseases through genome-wide association studies, which link significant differences in variant frequencies with specific phenotypes among large groups of individuals. Only 5% of disease-associated SNPs are located in gene coding sequences, with the potential to disrupt gene expression or alter the function of encoded proteins. The remaining 95% of disease-associated SNPs are located in non-coding DNA sequences, which make up 98% of the genome. The role of non-coding, disease-associated SNPs, many of which are located at considerable distances from any gene, was at first a mystery until the discovery that gene promoters regularly interact with distal regulatory elements to control gene expression. Disease-associated SNPs are enriched at the millions of gene regulatory elements that are dispersed throughout the non-coding sequences of the genome, suggesting they function as gene regulation variants. Assigning specific regulatory elements to the genes they control is not straightforward since they can be millions of base pairs apart. In this review we describe how understanding 3D genome organization can identify specific interactions between gene promoters and distal regulatory elements and how 3D genomics can link disease-associated SNPs to their target genes. Understanding which gene or genes contribute to a specific disease is the first step in designing rational therapeutic interventions.
Affiliation(s)
- Gisela Orozco
- Centre for Genetics and Genomics Versus Arthritis, Division of Musculoskeletal and Dermatological Sciences, School of Biological Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, Manchester Academic Health Science Centre, Manchester, United Kingdom
- NIHR Manchester Biomedical Research Centre, Manchester University Foundation Trust, Manchester, United Kingdom
- Stefan Schoenfelder
- Enhanc3D Genomics Ltd., Cambridge, United Kingdom
- Epigenetics Programme, The Babraham Institute, Babraham Research Campus, CB22 3AT Cambridge, Cambridge, United Kingdom
- Stephan Eyre
- Centre for Genetics and Genomics Versus Arthritis, Division of Musculoskeletal and Dermatological Sciences, School of Biological Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, Manchester Academic Health Science Centre, Manchester, United Kingdom
- NIHR Manchester Biomedical Research Centre, Manchester University Foundation Trust, Manchester, United Kingdom
- Peter Fraser
- Enhanc3D Genomics Ltd., Cambridge, United Kingdom
- Department of Biological Science, Florida State University, Tallahassee, FL, United States
|
21
|
Vadnais D, Middleton M, Oluwadare O. ParticleChromo3D: a Particle Swarm Optimization algorithm for chromosome 3D structure prediction from Hi-C data. BioData Min 2022; 15:19. [PMID: 36131326 PMCID: PMC9494900 DOI: 10.1186/s13040-022-00305-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/24/2022] [Accepted: 08/31/2022] [Indexed: 11/10/2022] Open
Abstract
Background
The three-dimensional (3D) structure of chromatin has a massive effect on its function. Because of this, it is desirable to have an understanding of the 3D structural organization of chromatin. To gain greater insight into the spatial organization of chromosomes and genomes and the functions they perform, chromosome conformation capture (3C) techniques, particularly Hi-C, have been developed. The Hi-C technology is widely used and well-known because of its ability to profile interactions for all read pairs in an entire genome. The advent of Hi-C has greatly expanded our understanding of the 3D genome, genome folding, gene regulation and has enabled the development of many 3D chromosome structure reconstruction methods.
Results
Here, we propose a novel approach for 3D chromosome and genome structure reconstruction from Hi-C data using a Particle Swarm Optimization (PSO) approach called ParticleChromo3D. This algorithm begins with a grouping of candidate solution locations for each chromosome bin, according to the particle swarm algorithm, and then iterates their positions towards a global best candidate solution. While moving towards the optimal global solution, each candidate solution or particle uses its own local best information and a randomizer to choose its path. Using several metrics to validate our results, we show that ParticleChromo3D produces a robust and rigorous representation of the 3D structure for input Hi-C data. We evaluated our algorithm on simulated and real Hi-C data in this work. Our results show that ParticleChromo3D is more accurate than most of the existing algorithms for 3D structure reconstruction.
Conclusions
Our results also show that constructed ParticleChromo3D structures are very consistent, hence indicating that it will always arrive at the global solution at every iteration. The source code for ParticleChromo3D, the simulated and real Hi-C datasets, and the models generated for these datasets are available here: https://github.com/OluwadareLab/ParticleChromo3D
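The particle swarm idea described in this abstract (candidate solutions that move using their own best position, the swarm's global best, and a random factor) can be sketched generically as follows. This is not the ParticleChromo3D implementation: the function names, the loss, the hyperparameters, and the toy target distances are all assumptions made for illustration, and converting Hi-C contact counts into target distances is left outside the sketch.

```python
import math
import random


def pairwise_loss(coords, target):
    """Sum of squared differences between model and target distances.

    coords is a flat list [x0, y0, z0, x1, y1, z1, ...];
    target[i][j] is the desired distance between bins i and j.
    """
    n = len(target)
    loss = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[3 * i:3 * i + 3], coords[3 * j:3 * j + 3])
            loss += (d - target[i][j]) ** 2
    return loss


def pso_structure(target, n_particles=20, n_iter=200, seed=0,
                  inertia=0.7, c_local=1.5, c_global=1.5):
    """Generic PSO over 3D coordinates; all hyperparameters are assumptions."""
    rng = random.Random(seed)
    dim = 3 * len(target)
    pos = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]                      # per-particle best
    best_loss = [pairwise_loss(p, target) for p in pos]
    g = min(range(n_particles), key=best_loss.__getitem__)
    g_pos, g_loss = best_pos[g][:], best_loss[g]        # swarm-wide best
    initial_loss = g_loss
    for _ in range(n_iter):
        for k in range(n_particles):
            for d in range(dim):
                # Velocity mixes inertia, pull to local best, pull to global best.
                vel[k][d] = (inertia * vel[k][d]
                             + c_local * rng.random() * (best_pos[k][d] - pos[k][d])
                             + c_global * rng.random() * (g_pos[d] - pos[k][d]))
                pos[k][d] += vel[k][d]
            loss = pairwise_loss(pos[k], target)
            if loss < best_loss[k]:
                best_pos[k], best_loss[k] = pos[k][:], loss
                if loss < g_loss:
                    g_pos, g_loss = pos[k][:], loss
    return g_pos, g_loss, initial_loss


# Hypothetical target: four loci spaced one unit apart along a line.
target = [[abs(i - j) for j in range(4)] for i in range(4)]
coords, final_loss, initial_loss = pso_structure(target)
print(f"loss: {initial_loss:.3f} -> {final_loss:.3f}")
```

Because the global best is only replaced by a strictly better candidate, the reported loss never increases from one iteration to the next; how close it gets to zero depends on the seed and the hyperparameters chosen here.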
|
22
|
Wang J, Chen X, Hou X, Wang J, Yue W, Huang S, Xu G, Yan J, Lu G, Hofreiter M, Li C, Wang C. "Omics" data unveil early molecular response underlying limb regeneration in the Chinese mitten crab, Eriocheir sinensis. Sci Adv 2022; 8:eabl4642. [PMID: 36112682 PMCID: PMC9481118 DOI: 10.1126/sciadv.abl4642] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 07/15/2021] [Accepted: 08/01/2022] [Indexed: 05/22/2023]
Abstract
Limb regeneration is a fascinating and medically interesting trait that has been well preserved in arthropod lineages, particularly in crustaceans. However, the molecular mechanisms underlying arthropod limb regeneration remain largely elusive. The Chinese mitten crab Eriocheir sinensis shows strong regenerative capacity, a trait that has likely allowed it to become a worldwide invasive species. Here, we report a chromosome-level genome of E. sinensis as well as large-scale transcriptome data during the limb regeneration process. Our results reveal that arthropod-specific genes involved in signal transduction, immune response, histone methylation, and cuticle development all play fundamental roles during the regeneration process. Particularly, Innexin2-mediated signal transduction likely facilitates the early stage of the regeneration process, while an effective crustacean-specific prophenoloxidase system (ProPo-AS) plays crucial roles in the initial immune response. Collectively, our findings uncover novel genetic pathways pertaining to arthropod limb regeneration and provide valuable resources for studies on regeneration from a comparative perspective.
Affiliation(s)
- Jun Wang
- Key Laboratory of Freshwater Aquatic Genetic Resources certified by the Ministry of Agriculture and Rural Affairs/National Demonstration Center for Experimental Fisheries Science Education/Shanghai Engineering Research Center of Aquaculture, Shanghai Ocean University, Shanghai 201306, China
| | - Xiaowen Chen
- Key Laboratory of Freshwater Aquatic Genetic Resources certified by the Ministry of Agriculture and Rural Affairs/National Demonstration Center for Experimental Fisheries Science Education/Shanghai Engineering Research Center of Aquaculture, Shanghai Ocean University, Shanghai 201306, China
| | - Xin Hou
- Key Laboratory of Freshwater Aquatic Genetic Resources certified by the Ministry of Agriculture and Rural Affairs/National Demonstration Center for Experimental Fisheries Science Education/Shanghai Engineering Research Center of Aquaculture, Shanghai Ocean University, Shanghai 201306, China
| | - Jingan Wang
- Key Laboratory of Freshwater Aquatic Genetic Resources certified by the Ministry of Agriculture and Rural Affairs/National Demonstration Center for Experimental Fisheries Science Education/Shanghai Engineering Research Center of Aquaculture, Shanghai Ocean University, Shanghai 201306, China
| | - Wucheng Yue
- Key Laboratory of Freshwater Aquatic Genetic Resources certified by the Ministry of Agriculture and Rural Affairs/National Demonstration Center for Experimental Fisheries Science Education/Shanghai Engineering Research Center of Aquaculture, Shanghai Ocean University, Shanghai 201306, China
| | - Shu Huang
- Key Laboratory of Freshwater Aquatic Genetic Resources certified by the Ministry of Agriculture and Rural Affairs/National Demonstration Center for Experimental Fisheries Science Education/Shanghai Engineering Research Center of Aquaculture, Shanghai Ocean University, Shanghai 201306, China
| | - Gangchun Xu
- Key Laboratory of Freshwater Fisheries and Germplasm Resources Utilization certified by the Ministry of Agriculture and Rural Affairs, Freshwater Fisheries Research Center, Chinese Academy of Fishery Sciences, Wuxi 214081, China
| | - Jizhou Yan
- Key Laboratory of Freshwater Aquatic Genetic Resources certified by the Ministry of Agriculture and Rural Affairs/National Demonstration Center for Experimental Fisheries Science Education/Shanghai Engineering Research Center of Aquaculture, Shanghai Ocean University, Shanghai 201306, China
| | - Guoqing Lu
- Department of Biology, University of Nebraska at Omaha, Omaha, NE 68182, USA
| | - Michael Hofreiter
- Evolutionary Adaptive Genomics, Institute of Biochemistry and Biology, Faculty of Science, University of Potsdam, Karl-Liebknecht-Str. 24-25, 14476 Potsdam, Germany
- Corresponding author. Email (M.H.); (C.L.); (C.W.)
| | - Chenhong Li
- Key Laboratory of Freshwater Aquatic Genetic Resources certified by the Ministry of Agriculture and Rural Affairs/National Demonstration Center for Experimental Fisheries Science Education/Shanghai Engineering Research Center of Aquaculture, Shanghai Ocean University, Shanghai 201306, China
- Corresponding author. Email (M.H.); (C.L.); (C.W.)
| | - Chenghui Wang
- Key Laboratory of Freshwater Aquatic Genetic Resources certified by the Ministry of Agriculture and Rural Affairs/National Demonstration Center for Experimental Fisheries Science Education/Shanghai Engineering Research Center of Aquaculture, Shanghai Ocean University, Shanghai 201306, China
- Corresponding author. Email (M.H.); (C.L.); (C.W.)
|
23
|
Yu W, Zhong Q, Wen Z, Zhang W, Huang Y. Genome architecture plasticity underlies DNA replication timing dynamics in cell differentiation. Front Genet 2022; 13:961612. [PMID: 36118849 PMCID: PMC9478753 DOI: 10.3389/fgene.2022.961612] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2022] [Accepted: 07/15/2022] [Indexed: 12/04/2022] Open
Abstract
During the S-phase of the eukaryotic cell cycle, DNA is replicated in a tightly regulated temporal order, with regions containing active and inactive genes replicated early and late, respectively. Recent advances in sequencing technology allow us to explore the connection between replication timing (RT), histone modifications, and three-dimensional (3D) chromatin structure in diverse cell types. To characterize the dynamics during cell differentiation, corresponding sequencing data for human embryonic stem cells and four differentiated cell types were collected. By comparing RT and its extent of conservation before and after germ layer specification, the human genome was partitioned into distinct categories. Each category was then compared on genomic, epigenetic, and chromatin 3D structural features. As expected, while constitutive early and late replication regions showed active and inactive features, respectively, dynamic regions with switched RT showed intermediate features. Surprisingly, although early-to-late replication and late-to-early replication regions showed similar histone modification patterns in hESCs, their structural preferences were opposite. Specifically, in hESCs, early-to-late replication regions tended to appear in the B compartment and large topologically associated domains, while late-to-early replication regions showed the opposite. Our results uncover the coordinated regulation of RT and 3D genome structure that underlies the loss of pluripotency and lineage commitment and indicate the importance and potential roles of genome architecture in biological processes.
Affiliation(s)
- Wenjun Yu
- Center for Genetics and Developmental Systems Biology, Department of Obstetrics and Gynecology, Nanfang Hospital, Southern Medical University, Guangzhou, China
- *Correspondence: Wenjun Yu,
| | - Quan Zhong
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, China
| | - Zi Wen
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, China
| | - Weihan Zhang
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, China
| | - Yanrong Huang
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, China
|
24
|
Zhong W, Liu W, Chen J, Sun Q, Hu M, Li Y. Understanding the function of regulatory DNA interactions in the interpretation of non-coding GWAS variants. Front Cell Dev Biol 2022; 10:957292. [PMID: 36060805 PMCID: PMC9437546 DOI: 10.3389/fcell.2022.957292] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 05/30/2022] [Accepted: 07/21/2022] [Indexed: 01/11/2023] Open
Abstract
Genome-wide association studies (GWAS) have identified a vast number of variants associated with various complex human diseases and traits. However, most of these GWAS variants reside in non-coding regions producing no proteins, making the interpretation of these variants a daunting challenge. Prior evidence indicates that a subset of non-coding variants detected within or near cis-regulatory elements (e.g., promoters, enhancers, silencers, and insulators) might play a key role in disease etiology by regulating gene expression. Advanced sequencing- and imaging-based technologies, together with powerful computational methods, enabling comprehensive characterization of regulatory DNA interactions, have substantially improved our understanding of the three-dimensional (3D) genome architecture. Recent literature witnesses plenty of examples where using chromosome conformation capture (3C)-based technologies successfully links non-coding variants to their target genes and prioritizes relevant tissues or cell types. These examples illustrate the critical capability of 3D genome organization in annotating non-coding GWAS variants. This review discusses how 3D genome organization information contributes to elucidating the potential roles of non-coding GWAS variants in disease etiology.
Affiliation(s)
- Wujuan Zhong
- Biostatistics and Research Decision Sciences, Merck & Co, Inc, Rahway, NJ, United States
| | - Weifang Liu
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
| | - Jiawen Chen
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
| | - Quan Sun
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
| | - Ming Hu
- Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic Foundation, Cleveland, OH, United States
| | - Yun Li
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
|
25
|
Piecyk RS, Schlegel L, Johannes F. Predicting 3D chromatin interactions from DNA sequence using Deep Learning. Comput Struct Biotechnol J 2022; 20:3439-3448. [PMID: 35832620 PMCID: PMC9271978 DOI: 10.1016/j.csbj.2022.06.047] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 05/13/2022] [Revised: 06/21/2022] [Accepted: 06/21/2022] [Indexed: 11/22/2022] Open
Abstract
Gene regulation in eukaryotes is profoundly shaped by the 3D organization of chromatin within the cell nucleus. Distal regulatory interactions between enhancers and their target genes are widespread and many causal loci underlying heritable agricultural or clinical traits have been mapped to distal cis-regulatory elements. Dissecting the sequence features that mediate such distal interactions is key to understanding their underlying biology. Deep Learning (DL) models coupled with genome-wide 3C-based sequencing data have emerged as powerful tools to infer the DNA sequence grammar underlying such distal interactions. In this review we show that most DL models have remarkably high prediction accuracy, which indicates that DNA sequence features are important determinants of chromatin looping. However, DL model training has so far been limited to a small set of human cell lines, raising questions about the generalization of these predictions to other tissue types and species. Furthermore, we find that the model architecture seems less relevant for model performance than the training strategy and the data preparation step. Transfer learning, coupled with functionally curated interactions, appears to be the most promising approach to learn cell-type-specific and possibly species-specific sequence features in future applications.
Affiliation(s)
- Robert S. Piecyk
- Department of Molecular Life Sciences, Technical University of Munich, Freising, Germany
| | - Luca Schlegel
- Department of Molecular Life Sciences, Technical University of Munich, Freising, Germany
| | - Frank Johannes
- Department of Molecular Life Sciences, Technical University of Munich, Freising, Germany
- TUM Institute for Advanced Study, Garching, Germany
|
26
|
Avdeyev P, Zhou J. Computational Approaches for Understanding Sequence Variation Effects on the 3D Genome Architecture. Annu Rev Biomed Data Sci 2022; 5:183-204. [PMID: 35537461 DOI: 10.1146/annurev-biodatasci-102521-012018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
Decoding how genomic sequence and its variations affect 3D genome architecture is indispensable for understanding the genetic architecture of various traits and diseases. The 3D genome organization can be significantly altered by genome variations and in turn impact the function of the genomic sequence. Techniques for measuring the 3D genome architecture across spatial scales have opened up new possibilities for understanding how the 3D genome depends upon the genomic sequence and how it can be altered by sequence variations. Computational methods have become instrumental in analyzing and modeling the sequence effects on 3D genome architecture, and recent developments in deep learning sequence models have opened up new opportunities for studying the interplay between sequence variations and the 3D genome. In this review, we focus on computational approaches for both the detection and modeling of sequence variation effects on the 3D genome, and we discuss the opportunities presented by these approaches. Expected final online publication date for the Annual Review of Biomedical Data Science, Volume 5 is August 2022. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Affiliation(s)
- Pavel Avdeyev
- Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, Texas, USA
| | - Jian Zhou
- Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, Texas, USA
|
27
|
Manichaikul A, Lin H, Kang C, Yang C, Rich SS, Taylor KD, Guo X, Rotter JI, Craig Johnson W, Cornell E, Tracy RP, Peter Durda J, Liu Y, Vasan RS, Adrienne Cupples L, Gerszten RE, Clish CB, Jain D, Conomos MP, Blackwell T, Papanicolaou GJ, Rodriguez A. Lymphocyte activation gene-3-associated protein networks are associated with HDL-cholesterol and mortality in the Trans-omics for Precision Medicine program. Commun Biol 2022; 5:362. [PMID: 35501457 PMCID: PMC9061762 DOI: 10.1038/s42003-022-03304-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/04/2021] [Accepted: 03/21/2022] [Indexed: 12/11/2022] Open
Abstract
Deficiency of the immune checkpoint lymphocyte activation gene-3 (LAG3) protein is significantly associated with both elevated HDL-cholesterol (HDL-C) and myocardial infarction risk. We determined the association of genetic variants within ±500 kb of LAG3 with plasma LAG3 and defined LAG3-associated plasma proteins with HDL-C and clinical outcomes. Whole genome sequencing and plasma proteomics were obtained from the Multi-Ethnic Study of Atherosclerosis (MESA) and the Framingham Heart Study (FHS) cohorts as part of the Trans-Omics for Precision Medicine program. In situ Hi-C chromatin capture was performed in EBV-transformed cell lines isolated from four MESA participants. Genetic association analyses were performed in MESA using multivariate regression models, with validation in FHS. A LAG3-associated protein network was tested for association with HDL-C, coronary heart disease, and all-cause mortality. We identify an association between the LAG3 rs3782735 variant and plasma LAG3 protein. Proteomics analysis reveals 183 proteins significantly associated with LAG3, with four proteins associated with HDL-C. Four proteins discovered for association with all-cause mortality in FHS show nominal associations in MESA. Chromatin capture analysis reveals significant cis interactions between LAG3 and C1S, LRIG3, TNFRSF1A, and trans interactions between LAG3 and B2M. A LAG3-associated protein network has significant associations with HDL-C and mortality.
Affiliation(s)
- Ani Manichaikul
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
| | - Honghuang Lin
- Department of Medicine, University of Massachusetts Medical School, Worcester, MA, USA
| | - Chansuk Kang
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
| | - Chaojie Yang
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
| | - Stephen S Rich
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
| | - Kent D Taylor
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
| | - Xiuqing Guo
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
| | - Jerome I Rotter
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
| | | | | | | | | | | | - Ramachandran S Vasan
- Department of Medicine, University of Massachusetts Medical School, Worcester, MA, USA
| | - L Adrienne Cupples
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
| | - Robert E Gerszten
- Department of Medicine, Beth Israel Deaconess Medical Center, Boston, MA, USA
| | - Clary B Clish
- Metabolite Profiling, Broad Institute, Cambridge, MA, USA
| | | | | | | | | | - Annabelle Rodriguez
- Center for Vascular Biology, University of Connecticut Health, Farmington, CT, USA.
|
28
|
Aljogol D, Thompson IR, Osborne CS, Mifsud B. Comparison of Capture Hi-C Analytical Pipelines. Front Genet 2022; 13:786501. [PMID: 35198004 PMCID: PMC8859814 DOI: 10.3389/fgene.2022.786501] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/30/2021] [Accepted: 01/03/2022] [Indexed: 11/13/2022] Open
Abstract
It is now evident that DNA forms an organized nuclear architecture, which is essential to maintain the structural and functional integrity of the genome. Chromatin organization can be systematically studied due to the recent boom in chromosome conformation capture technologies (e.g., 3C and its successors 4C, 5C and Hi-C), which is accompanied by the development of computational pipelines to identify biologically meaningful chromatin contacts in such data. However, not all tools are applicable to all experimental designs and all structural features. Capture Hi-C (CHi-C) is a method that uses an intermediate hybridization step to target and select predefined regions of interest in a Hi-C library, thereby increasing effective sequencing depth for those regions. It allows researchers to investigate fine chromatin structures at high resolution, for instance promoter-enhancer loops, but it introduces additional biases with the capture step, and therefore requires specialized pipelines. Here, we compare multiple analytical pipelines for CHi-C data analysis. We consider the effect of retaining multi-mapping reads and compare the efficiency of different statistical approaches in both identifying reproducible interactions and determining biologically significant interactions. At restriction fragment level resolution, the number of multi-mapping reads that could be rescued was negligible. The number of identified interactions varied widely, depending on the analytical method, indicating large differences in type I and type II error rates. The optimal pipeline depends on the project-specific tolerance level of false positive and false negative chromatin contacts.
Affiliation(s)
- Dina Aljogol
- College of Health and Life Sciences, Hamad Bin Khalifa University, Doha, Qatar
| | - I. Richard Thompson
- Qatar Biomedical Research Institute, Hamad Bin Khalifa University, Doha, Qatar
| | - Cameron S. Osborne
- Department of Medical and Molecular Genetics, King’s College London, London, United Kingdom
| | - Borbala Mifsud
- College of Health and Life Sciences, Hamad Bin Khalifa University, Doha, Qatar
- William Harvey Research Institute, Queen Mary University of London, London, United Kingdom
- *Correspondence: Borbala Mifsud,
|
29
|
Ho D, Schierding W, Farrow SL, Cooper AA, Kempa-Liehr AW, O’Sullivan JM. Machine Learning Identifies Six Genetic Variants and Alterations in the Heart Atrial Appendage as Key Contributors to PD Risk Predictivity. Front Genet 2022; 12:785436. [PMID: 35047012 PMCID: PMC8762216 DOI: 10.3389/fgene.2021.785436] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/29/2021] [Accepted: 11/09/2021] [Indexed: 12/14/2022] Open
Abstract
Parkinson's disease (PD) is a complex neurodegenerative disease with a range of causes and clinical presentations. Over 76 genetic loci (comprising 90 SNPs) have been associated with PD by the most recent GWAS meta-analysis. Most of these PD-associated variants are located in non-coding regions of the genome and it is difficult to understand what they are doing and how they contribute to the aetiology of PD. We hypothesised that PD-associated genetic variants modulate disease risk through tissue-specific expression quantitative trait loci (eQTL) effects. We developed and validated a machine learning approach that integrated tissue-specific eQTL data on known PD-associated genetic variants with PD case and control genotypes from the Wellcome Trust Case Control Consortium. In so doing, our analysis ranked the tissue-specific transcription effects for PD-associated genetic variants and estimated their relative contributions to PD risk. We identified roles for SNPs that are connected with INPP5P, CNTN1, GBA and SNCA in PD. Ranking the variants and tissue-specific eQTL effects contributing most to the machine learning model suggested a key role in the risk of developing PD for two variants (rs7617877 and rs6808178) and eQTL associated transcriptional changes of EAF1-AS1 within the heart atrial appendage. Similarly, effects associated with eQTLs located within the Brain Cerebellum were also recognized to confer major PD risk. These findings were replicated in two additional, independent cohorts (the UK Biobank, and NeuroX) and thus warrant further mechanistic investigations to determine if these transcriptional changes could act as early contributors to PD risk and disease development.
Affiliation(s)
- Daniel Ho
- Liggins Institute, The University of Auckland, Auckland, New Zealand
| | - William Schierding
- Liggins Institute, The University of Auckland, Auckland, New Zealand
- MRC Lifecourse Epidemiology Unit, University of Southampton, Southampton, United Kingdom
| | - Sophie L. Farrow
- Liggins Institute, The University of Auckland, Auckland, New Zealand
- MRC Lifecourse Epidemiology Unit, University of Southampton, Southampton, United Kingdom
| | - Antony A. Cooper
- Australian Parkinsons Mission, Garvan Institute of Medical Research, Sydney, NSW, Australia
- St Vincent’s Clinical School, UNSW Sydney, Sydney, NSW, Australia
| | | | - Justin M. O’Sullivan
- Liggins Institute, The University of Auckland, Auckland, New Zealand
- MRC Lifecourse Epidemiology Unit, University of Southampton, Southampton, United Kingdom
- Brain Research New Zealand, The University of Auckland, Auckland, New Zealand
- The Maurice Wilkins Centre, The University of Auckland, Auckland, New Zealand
|
30
|
Georgolopoulos G, Psatha N, Iwata M, Nishida A, Som T, Yiangou M, Stamatoyannopoulos JA, Vierstra J. Discrete regulatory modules instruct hematopoietic lineage commitment and differentiation. Nat Commun 2021; 12:6790. [PMID: 34815405 PMCID: PMC8611072 DOI: 10.1038/s41467-021-27159-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2020] [Accepted: 10/20/2021] [Indexed: 11/08/2022] Open
Abstract
Lineage commitment and differentiation are driven by the concerted action of master transcriptional regulators at their target chromatin sites. Multiple efforts have characterized the key transcription factors (TFs) that determine the various hematopoietic lineages. However, the temporal interactions between individual TFs and their chromatin targets during differentiation and how these interactions dictate lineage commitment remain poorly understood. Here we perform dense, daily, temporal profiling of chromatin accessibility (DNase I-seq) and gene expression changes (total RNA-seq) along ex vivo human erythropoiesis to comprehensively define developmentally regulated DNase I hypersensitive sites (DHSs) and transcripts. We link both distal DHSs to their target gene promoters and individual TFs to their target DHSs, revealing that the regulatory landscape is organized in distinct sequential regulatory modules that regulate lineage restriction and maturation. Finally, direct comparison of transcriptional dynamics (bulk and single-cell) and lineage potential between erythropoiesis and megakaryopoiesis uncovers differential fate commitment dynamics between the two lineages as they exit the stem and progenitor stage. Collectively, these data provide insights into the temporally regulated synergy of the cis- and the trans-regulatory components underlying hematopoietic lineage commitment and differentiation.
Affiliation(s)
- Grigorios Georgolopoulos
- Altius Institute for Biomedical Sciences, Seattle, WA, USA.
- Department of Genetics, Development & Molecular Biology, School of Biology, Aristotle University of Thessaloniki, Thessaloniki, Greece.
| | | | - Mineo Iwata
- Altius Institute for Biomedical Sciences, Seattle, WA, USA
| | - Andrew Nishida
- Altius Institute for Biomedical Sciences, Seattle, WA, USA
| | - Tannishtha Som
- Altius Institute for Biomedical Sciences, Seattle, WA, USA
| | - Minas Yiangou
- Department of Genetics, Development & Molecular Biology, School of Biology, Aristotle University of Thessaloniki, Thessaloniki, Greece
| | - John A Stamatoyannopoulos
- Altius Institute for Biomedical Sciences, Seattle, WA, USA
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Division of Oncology, Department of Medicine, University of Washington, Seattle, WA, USA
| | - Jeff Vierstra
- Altius Institute for Biomedical Sciences, Seattle, WA, USA.
|
31
|
Maslova A, Krasikova A. FISH Going Meso-Scale: A Microscopic Search for Chromatin Domains. Front Cell Dev Biol 2021; 9:753097. [PMID: 34805161 PMCID: PMC8597843 DOI: 10.3389/fcell.2021.753097] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 08/04/2021] [Accepted: 10/08/2021] [Indexed: 12/14/2022] Open
Abstract
The intimate relationships between genome structure and function direct efforts toward deciphering three-dimensional chromatin organization within the interphase nuclei at different genomic length scales. For decades, major insights into chromatin structure at the level of large-scale euchromatin and heterochromatin compartments, chromosome territories, and subchromosomal regions resulted from the evolution of light microscopy and fluorescence in situ hybridization. Studies of nanoscale nucleosomal chromatin organization benefited from a variety of electron microscopy techniques. Recent breakthroughs in the investigation of mesoscale chromatin structures have emerged from chromatin conformation capture methods (C-methods). Chromatin has been found to form hierarchical domains with a high frequency of local interactions, from loop domains to topologically associating domains and compartments. During the last decade, advances in super-resolution light microscopy made these levels of chromatin folding amenable to microscopic examination. Here we review recent developments in FISH-based approaches for detection, quantitative measurement, and validation of contact chromatin domains deduced from C-based data. We specifically focus on the design and application of Oligopaint probes, which marked the latest progress in the imaging of chromatin domains. Vivid examples of chromatin domain FISH-visualization by means of conventional, super-resolution light and electron microscopy in different model organisms are provided.
Affiliation(s)
| | - Alla Krasikova
- Laboratory of Nuclear Structure and Dynamics, Cytology and Histology Department, Saint Petersburg State University, Saint Petersburg, Russia
|
32
|
Salviato E, Djordjilović V, Hariprakash JM, Tagliaferri I, Pal K, Ferrari F. Leveraging three-dimensional chromatin architecture for effective reconstruction of enhancer-target gene regulatory interactions. Nucleic Acids Res 2021; 49:e97. [PMID: 34197622 PMCID: PMC8464068 DOI: 10.1093/nar/gkab547] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 02/16/2021] [Revised: 06/07/2021] [Accepted: 06/17/2021] [Indexed: 12/23/2022] Open
Abstract
A growing amount of evidence in literature suggests that germline sequence variants and somatic mutations in non-coding distal regulatory elements may be crucial for defining disease risk and prognostic stratification of patients, in genetic disorders as well as in cancer. Their functional interpretation is challenging because genome-wide enhancer-target gene (ETG) pairing is an open problem in genomics. The solutions proposed so far do not account for the hierarchy of structural domains which define chromatin three-dimensional (3D) architecture. Here we introduce a change of perspective based on the definition of multi-scale structural chromatin domains, integrated in a statistical framework to define ETG pairs. In this work (i) we develop a computational and statistical framework to reconstruct a comprehensive map of ETG pairs leveraging functional genomics data; (ii) we demonstrate that the incorporation of chromatin 3D architecture information improves ETG pairing accuracy and (iii) we use multiple experimental datasets to extensively benchmark our method against previous solutions for the genome-wide reconstruction of ETG pairs. This solution will facilitate the annotation and interpretation of sequence variants in distal non-coding regulatory elements. We expect this to be especially helpful in clinically oriented applications of whole genome sequencing in cancer and undiagnosed genetic diseases research.
Affiliation(s)
- Elisa Salviato
- IFOM, the FIRC Institute of Molecular Oncology, Milan 20139, Italy
| | - Vera Djordjilović
- Department of Economics, Ca’ Foscari University of Venice, Venice 30100, Italy
| | | | | | - Koustav Pal
- IFOM, the FIRC Institute of Molecular Oncology, Milan 20139, Italy
| | - Francesco Ferrari
- IFOM, the FIRC Institute of Molecular Oncology, Milan 20139, Italy
- Institute of Molecular Genetics “Luigi Luca Cavalli-Sforza”, National Research Council, Pavia 27100, Italy
|
33
|
Strunz T, Kellner M, Kiel C, Weber BHF. Assigning Co-Regulated Human Genes and Regulatory Gene Clusters. Cells 2021; 10:2395. [PMID: 34572044 PMCID: PMC8470523 DOI: 10.3390/cells10092395] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/05/2021] [Revised: 08/30/2021] [Accepted: 09/10/2021] [Indexed: 12/12/2022] Open
Abstract
Elucidating the role of genetic variation in the regulation of gene expression is key to understanding the pathobiology of complex diseases which, in consequence, is crucial in devising targeted treatment options. Expression quantitative trait locus (eQTL) analysis correlates a genetic variant with the strength of gene expression, thus defining thousands of regulated genes in a multitude of human cell types and tissues. Some eQTL may not act independently of each other but instead may be regulated in a coordinated fashion by seemingly independent genetic variants. To address this issue, we combined the approaches of eQTL analysis and colocalization studies. Gene expression was determined in datasets comprising 49 tissues from the Genotype-Tissue Expression (GTEx) project. From about 33,000 regulated genes, over 14,000 were found to be co-regulated in pairs and were assembled across all tissues to almost 15,000 unique clusters containing up to nine regulated genes affected by the same eQTL signal. The distance of co-regulated eGenes was, on average, 112 kilobase pairs. Of 713 genes known to express clinical symptoms upon haploinsufficiency, 231 (32.4%) are part of at least one of the identified clusters. This calls for caution should treatment approaches aim at an upregulation of a haploinsufficient gene. In conclusion, we present an unbiased approach to identifying co-regulated genes in and across multiple tissues. Knowledge of such common effects is crucial to appreciate implications on biological pathways involved, specifically when a treatment option targets a co-regulated disease gene.
Affiliation(s)
- Tobias Strunz
- Institute of Human Genetics, University of Regensburg, 93053 Regensburg, Germany; (T.S.); (M.K.); (C.K.)
| | - Martin Kellner
- Institute of Human Genetics, University of Regensburg, 93053 Regensburg, Germany; (T.S.); (M.K.); (C.K.)
| | - Christina Kiel
- Institute of Human Genetics, University of Regensburg, 93053 Regensburg, Germany; (T.S.); (M.K.); (C.K.)
| | - Bernhard H. F. Weber
- Institute of Human Genetics, University of Regensburg, 93053 Regensburg, Germany; (T.S.); (M.K.); (C.K.)
- Institute of Clinical Human Genetics, University Hospital Regensburg, 93053 Regensburg, Germany
|
34
|
MacPhillamy C, Pitchford WS, Alinejad-Rokny H, Low WY. Opportunity to improve livestock traits using 3D genomics. Anim Genet 2021; 52:785-798. [PMID: 34494283 DOI: 10.1111/age.13135] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Accepted: 08/24/2021] [Indexed: 11/30/2022]
Abstract
The advent of high-throughput chromosome conformation capture and sequencing (Hi-C) has enabled researchers to probe the 3D architecture of the mammalian genome in a genome-wide manner. Simultaneously, advances in epigenomic assays, such as chromatin immunoprecipitation and sequencing (ChIP-seq) and DNase-seq, have enabled researchers to study cis-regulatory interactions and chromatin accessibility across the same genome-wide scale. The use of these data has revealed many unique insights into gene regulation and disease pathomechanisms in several model organisms. With the advent of these high-throughput sequencing technologies, there has been an ever-increasing number of datasets available for study; however, this is often limited to model organisms. Livestock species play critical roles in the economies of developing and developed nations alike. Despite this, they are greatly underrepresented in the 3D genomics space; Hi-C and related technologies have the potential to revolutionise livestock breeding by enabling a more comprehensive understanding of how production traits are controlled. The growth in human and model organism Hi-C data has seen a surge in the availability of computational tools for use in 3D genomics, with some tools using machine learning techniques to predict features and improve dataset quality. In this review, we provide an overview of the 3D genome and discuss the status of 3D genomics in livestock before delving into advancing the field by drawing inspiration from research in human and mouse. We end by offering future directions for livestock research in the field of 3D genomics.
Affiliation(s)
- C MacPhillamy
  - Davies Livestock Research Centre, The University of Adelaide, Roseworthy Campus, Mudla Wirra Rd, Roseworthy, SA, 5371, Australia
- W S Pitchford
  - Davies Livestock Research Centre, The University of Adelaide, Roseworthy Campus, Mudla Wirra Rd, Roseworthy, SA, 5371, Australia
- H Alinejad-Rokny
  - Biological & Medical Machine Learning Lab, The Graduate School of Biomedical Engineering, UNSW Sydney, Sydney, NSW, 2052, Australia
  - School of Computer Science and Engineering, The University of New South Wales (UNSW Sydney), Sydney, NSW, 2052, Australia
- W Y Low
  - Davies Livestock Research Centre, The University of Adelaide, Roseworthy Campus, Mudla Wirra Rd, Roseworthy, SA, 5371, Australia
|
35
|
Cao F, Zhang Y, Cai Y, Animesh S, Zhang Y, Akincilar SC, Loh YP, Li X, Chng WJ, Tergaonkar V, Kwoh CK, Fullwood MJ. Chromatin interaction neural network (ChINN): a machine learning-based method for predicting chromatin interactions from DNA sequences. Genome Biol 2021; 22:226. [PMID: 34399797 PMCID: PMC8365954 DOI: 10.1186/s13059-021-02453-5] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 02/05/2021] [Accepted: 08/04/2021] [Indexed: 11/10/2022] Open
Abstract
Chromatin interactions play important roles in regulating gene expression. However, the availability of genome-wide chromatin interaction data is limited. We develop a computational method, chromatin interaction neural network (ChINN), to predict chromatin interactions between open chromatin regions using only DNA sequences. ChINN predicts CTCF- and RNA polymerase II-associated and Hi-C chromatin interactions. ChINN shows good across-sample performances and captures various sequence features for chromatin interaction prediction. We apply ChINN to 6 chronic lymphocytic leukemia (CLL) patient samples and a published cohort of 84 CLL open chromatin samples. Our results demonstrate extensive heterogeneity in chromatin interactions among CLL patient samples.
Affiliation(s)
- Fan Cao
  - Cancer Science Institute of Singapore, National University of Singapore, 14 Medical Dr, Singapore, 117599 Singapore
- Yu Zhang
  - School of Computer Science and Engineering, Nanyang Technological University, Block N4, 50 Nanyang Avenue, Singapore, 639798 Singapore
- Yichao Cai
  - Cancer Science Institute of Singapore, National University of Singapore, 14 Medical Dr, Singapore, 117599 Singapore
- Sambhavi Animesh
  - Cancer Science Institute of Singapore, National University of Singapore, 14 Medical Dr, Singapore, 117599 Singapore
- Ying Zhang
  - Cancer Science Institute of Singapore, National University of Singapore, 14 Medical Dr, Singapore, 117599 Singapore
- Semih Can Akincilar
  - Institute of Molecular and Cell Biology (IMCB), A*STAR (Agency for Science, Technology and Research), Singapore, 138673 Singapore
- Yan Ping Loh
  - Cancer Science Institute of Singapore, National University of Singapore, 14 Medical Dr, Singapore, 117599 Singapore
- Xinya Li
  - School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore, 637551 Singapore
- Wee Joo Chng
  - Cancer Science Institute of Singapore, National University of Singapore, 14 Medical Dr, Singapore, 117599 Singapore
  - Department of Medicine, Yong Loo Lin School of Medicine, National University of Singapore, 1E Kent Ridge Road, Singapore, 119228 Singapore
  - Department of Haematology-Oncology, National University Cancer Institute, National University Health System, NUH Zone B, Medical Centre, Singapore, 119074 Singapore
- Vinay Tergaonkar
  - Institute of Molecular and Cell Biology (IMCB), A*STAR (Agency for Science, Technology and Research), Singapore, 138673 Singapore
  - Department of Pathology, Yong Loo Lin School of Medicine, National University of Singapore (NUS), Singapore, 117597 Singapore
- Chee Keong Kwoh
  - School of Computer Science and Engineering, Nanyang Technological University, Block N4, 50 Nanyang Avenue, Singapore, 639798 Singapore
- Melissa J. Fullwood
  - Cancer Science Institute of Singapore, National University of Singapore, 14 Medical Dr, Singapore, 117599 Singapore
  - Institute of Molecular and Cell Biology (IMCB), A*STAR (Agency for Science, Technology and Research), Singapore, 138673 Singapore
  - School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore, 637551 Singapore
|
36
|
Jerkovic I, Cavalli G. Understanding 3D genome organization by multidisciplinary methods. Nat Rev Mol Cell Biol 2021; 22:511-528. [PMID: 33953379 DOI: 10.1038/s41580-021-00362-w] [Citation(s) in RCA: 197] [Impact Index Per Article: 49.3] [Accepted: 03/16/2021] [Indexed: 02/03/2023]
Abstract
Understanding how chromatin is folded in the nucleus is fundamental to understanding its function. Although 3D genome organization has been historically difficult to study owing to a lack of relevant methodologies, major technological breakthroughs in genome-wide mapping of chromatin contacts and advances in imaging technologies in the twenty-first century considerably improved our understanding of chromosome conformation and nuclear architecture. In this Review, we discuss methods of 3D genome organization analysis, including sequencing-based techniques, such as Hi-C and its derivatives, Micro-C, DamID and others; microscopy-based techniques, such as super-resolution imaging coupled with fluorescence in situ hybridization (FISH), multiplex FISH, in situ genome sequencing and live microscopy methods; and computational and modelling approaches. We describe the most commonly used techniques and their contribution to our current knowledge of nuclear architecture and, finally, we provide a perspective on up-and-coming methods that open possibilities for future major discoveries.
Affiliation(s)
- Ivana Jerkovic
  - Institute of Human Genetics, CNRS, University of Montpellier, Montpellier, France
- Giacomo Cavalli
  - Institute of Human Genetics, CNRS, University of Montpellier, Montpellier, France
|
37
|
Abstract
Alignments of discrete objects can be constructed in a very general setting as super-objects from which the constituent objects are recovered by means of projections. Here, we focus on contact maps, i.e. undirected graphs with an ordered set of vertices. These serve as natural discretizations of RNA and protein structures. In the general case, the alignment problem for vertex-ordered graphs is NP-complete. In the special case of RNA secondary structures, i.e. crossing-free matchings, however, the alignments have a recursive structure. The alignment problem then can be solved by a variant of the Sankoff algorithm in polynomial time. Moreover, the tree or forest alignments of RNA secondary structure can be understood as the alignments of ordered edge sets.
Affiliation(s)
- Peter F Stadler
  - Bioinformatics Group, Department of Computer Science and Interdisciplinary Centre for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, 04107 Leipzig, Germany
  - German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Competence Centre for Scalable Data Services and Solutions Dresden-Leipzig, Leipzig Research Centre for Civilization Diseases, and Centre for Biotechnology and Biomedicine at Leipzig University, Universität Leipzig, Leipzig, Germany
  - Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, 04103 Leipzig, Germany
  - Institute for Theoretical Chemistry, University of Vienna, Währingerstrasse 17, 1090 Wien, Austria
  - Facultad de Ciencias, Universidad Nacional de Colombia, Bogotá, Colombia
  - Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, USA
|
38
|
Wu H, Wang X, Chu M, Li D, Cheng L, Zhou K. HCMB: A stable and efficient algorithm for processing the normalization of highly sparse Hi-C contact data. Comput Struct Biotechnol J 2021; 19:2637-2645. [PMID: 34025950 PMCID: PMC8120939 DOI: 10.1016/j.csbj.2021.04.064] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/15/2021] [Revised: 04/11/2021] [Accepted: 04/24/2021] [Indexed: 11/17/2022] Open
Abstract
The high-throughput genome-wide chromosome conformation capture (Hi-C) method has recently become an important tool for studying chromosomal interactions, from which one can extract meaningful biological information including the P(s) curve, topologically associated domains, A/B compartments, and other biologically relevant signals. Normalization is a critical pre-processing step for downstream analyses: it eliminates systematic and technical biases in chromatin contact matrices that arise from differences in mappability, GC content, and restriction fragment lengths. High sparsity in particular poses a major challenge for this correction, indicating the urgent need for a stable and efficient method for Hi-C data normalization. Matrix balancing methods such as the Knight-Ruiz (KR) algorithm have recently been developed to normalize Hi-C data, but KR fails to normalize contact matrices with high sparsity. Here, we present an algorithm, Hi-C Matrix Balancing (HCMB), based on an iterative solution of equations combined with a linear search and projection strategy, to normalize the original Hi-C interaction data. Both simulated and experimental data demonstrate that HCMB is robust and efficient in normalizing Hi-C data and preserves the biologically relevant Hi-C features even at very high sparsity. HCMB is implemented in Python and is freely accessible to non-commercial users at GitHub: https://github.com/HUST-DataMan/HCMB.
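As a rough illustration of what "matrix balancing" means in the abstract above, the following is a minimal sketch of a generic symmetric, Sinkhorn-style rescaling. It is not the HCMB algorithm and not the Knight-Ruiz algorithm; the function name `balance`, its parameters, and the toy matrix are assumptions made only for this example.

```python
import numpy as np

def balance(matrix, n_iter=200, tol=1e-8):
    """Rescale a symmetric non-negative matrix so every row sum is close to 1.

    Generic illustration of matrix balancing (hypothetical helper),
    not the HCMB or KR implementation.
    """
    m = np.asarray(matrix, dtype=float)
    bias = np.ones(m.shape[0])  # one multiplicative bias factor per bin
    for _ in range(n_iter):
        balanced = m / np.outer(bias, bias)
        row_sums = balanced.sum(axis=1)
        if np.any(row_sums == 0):
            # An all-zero row cannot be rescaled to sum to 1; very sparse
            # contact maps run into this kind of problem.
            raise ValueError("matrix contains an empty row")
        if np.max(np.abs(row_sums - 1.0)) < tol:
            break
        # Damped update: move each bias by the square root of its row sum.
        bias *= np.sqrt(row_sums)
    return m / np.outer(bias, bias), bias

# Toy symmetric "contact matrix" (made-up numbers).
toy = np.array([[2.0, 1.0], [1.0, 3.0]])
balanced, bias = balance(toy)
print(balanced.sum(axis=1))
```

The empty-row check hints at why highly sparse matrices are hard to balance: a bin with no contacts has no scaling factor that brings its row sum to 1.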
Affiliation(s)
- Honglong Wu
  - Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan, Hubei 430000, China
  - BGI PathoGenesis Pharmaceutical Technology, BGI-Shenzhen, Shenzhen 518083, China
- Xuebin Wang
  - BGI PathoGenesis Pharmaceutical Technology, BGI-Shenzhen, Shenzhen 518083, China
- Mengtian Chu
  - BGI PathoGenesis Pharmaceutical Technology, BGI-Shenzhen, Shenzhen 518083, China
- Dongfang Li
  - Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan, Hubei 430000, China
  - BGI PathoGenesis Pharmaceutical Technology, BGI-Shenzhen, Shenzhen 518083, China
- Lixin Cheng
  - Shenzhen People's Hospital, First Affiliated Hospital of Southern University of Science and Technology, Second Clinical Medicine College of Jinan University, Shenzhen 518020, China
- Ke Zhou
  - Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan, Hubei 430000, China
|
39
|
Dozmorov MG, Tyc KM, Sheffield NC, Boyd DC, Olex AL, Reed J, Harrell JC. Chromatin conformation capture (Hi-C) sequencing of patient-derived xenografts: analysis guidelines. Gigascience 2021; 10:giab022. [PMID: 33880552 PMCID: PMC8058593 DOI: 10.1093/gigascience/giab022] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 10/18/2020] [Revised: 01/14/2021] [Accepted: 03/09/2021] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND Sequencing of patient-derived xenograft (PDX) mouse models allows investigation of the molecular mechanisms of human tumor samples engrafted in a mouse host. Thus, both human and mouse genetic material is sequenced. Several methods have been developed to remove mouse sequencing reads from RNA-seq or exome sequencing PDX data and improve the downstream signal. However, for more recent chromatin conformation capture technologies (Hi-C), the effect of mouse reads remains undefined. RESULTS We evaluated the effect of mouse read removal on the quality of Hi-C data using in silico created PDX Hi-C data with 10% and 30% mouse reads. Additionally, we generated 2 experimental PDX Hi-C datasets using different library preparation strategies. We evaluated 3 alignment strategies (Direct, Xenome, Combined) and 3 pipelines (Juicer, HiC-Pro, HiCExplorer) on Hi-C data quality. CONCLUSIONS Removal of mouse reads had little-to-no effect on data quality as compared with the results obtained with the Direct alignment strategy. Juicer extracted more valid chromatin interactions for Hi-C matrices, regardless of the mouse read removal strategy. However, the pipeline effect was minimal, while the library preparation strategy had the largest effect on all quality metrics. Together, our study presents comprehensive guidelines on PDX Hi-C data processing.
Affiliation(s)
- Mikhail G Dozmorov
  - Department of Biostatistics, Virginia Commonwealth University, Richmond, VA 23298, USA
  - Department of Pathology, Virginia Commonwealth University, Richmond, VA 23284, USA
- Katarzyna M Tyc
  - Department of Biostatistics, Virginia Commonwealth University, Richmond, VA 23298, USA
  - Department of Pharmacology and Toxicology, Virginia Commonwealth University, Richmond, VA, 23298, USA
- Nathan C Sheffield
  - Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22908, USA
- David C Boyd
  - Department of Pathology, Virginia Commonwealth University, Richmond, VA 23284, USA
  - Integrative Life Sciences Doctoral Program, Virginia Commonwealth University, Richmond, VA 23298, USA
- Amy L Olex
  - C. Kenneth and Dianne Wright Center for Clinical and Translational Research, Virginia Commonwealth University, Richmond, VA 23298, USA
- Jason Reed
  - Virginia Commonwealth University, Massey Cancer Center, Richmond, VA, 23298, USA
  - Department of Physics, Virginia Commonwealth University, Richmond, VA 23220, USA
- J Chuck Harrell
  - Department of Pathology, Virginia Commonwealth University, Richmond, VA 23284, USA
|
40
|
Messelink JJB, van Teeseling MCF, Janssen J, Thanbichler M, Broedersz CP. Learning the distribution of single-cell chromosome conformations in bacteria reveals emergent order across genomic scales. Nat Commun 2021; 12:1963. [PMID: 33785756 PMCID: PMC8010069 DOI: 10.1038/s41467-021-22189-x] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 03/31/2020] [Accepted: 02/15/2021] [Indexed: 02/01/2023] Open
Abstract
The order and variability of bacterial chromosome organization, contained within the distribution of chromosome conformations, are unclear. Here, we develop a fully data-driven maximum entropy approach to extract single-cell 3D chromosome conformations from Hi-C experiments on the model organism Caulobacter crescentus. The predictive power of our model is validated by independent experiments. We find that on large genomic scales, organizational features are predominantly present along the long cell axis: chromosomal loci exhibit striking long-ranged two-point axial correlations, indicating emergent order. This organization is associated with large genomic clusters we term Super Domains (SuDs), whose existence we support with super-resolution microscopy. On smaller genomic scales, our model reveals chromosome extensions that correlate with transcriptional and loop extrusion activity. Finally, we quantify the information contained in chromosome organization that may guide cellular processes. Our approach can be extended to other species, providing a general strategy to resolve variability in single-cell chromosomal organization.
Affiliation(s)
- Joris J B Messelink
  - Arnold Sommerfeld Center for Theoretical Physics and Center for NanoScience, Department of Physics, Ludwig Maximilian University Munich, Munich, Germany
- Muriel C F van Teeseling
  - Department of Biology, University of Marburg, Marburg, Germany
  - Prokaryotic Cell Biology Group, Department of Microbial Interactions, Institute for Microbiology, Friedrich Schiller University Jena, Jena, Germany
- Jacqueline Janssen
  - Arnold Sommerfeld Center for Theoretical Physics and Center for NanoScience, Department of Physics, Ludwig Maximilian University Munich, Munich, Germany
- Martin Thanbichler
  - Department of Biology, University of Marburg, Marburg, Germany
  - Max Planck Institute for Terrestrial Microbiology, Marburg, Germany
  - Center for Synthetic Microbiology (SYNMIKRO), Marburg, Germany
- Chase P Broedersz
  - Arnold Sommerfeld Center for Theoretical Physics and Center for NanoScience, Department of Physics, Ludwig Maximilian University Munich, Munich, Germany
  - Department of Physics and Astronomy, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
|
41
|
Xia WX, Li HR, Ge JH, Liu YW, Li HH, Su YH, Wang HZ, Guo HF, Dai YX, Liu YW, Gou XC. High-continuity genome assembly of the jellyfish Chrysaora quinquecirrha. Zool Res 2021; 42:130-134. [PMID: 33377334 PMCID: PMC7840447 DOI: 10.24272/j.issn.2095-8137.2020.258] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/13/2022] Open
Abstract
The Atlantic sea nettle (Chrysaora quinquecirrha) has an important evolutionary position due to its high ecological value. However, due to limited sequencing technologies and complex jellyfish genomic sequences, the current C. quinquecirrha genome assembly is highly fragmented. Here, we used the most advanced high-throughput chromosome conformation capture (Hi-C) technology to obtain high-coverage sequencing data of the C. quinquecirrha genome. We then anchored these data to the previously published contig-level assembly to improve the genome. Finally, a high-continuity genome sequence of C. quinquecirrha was successfully assembled, which contained 1,882 scaffolds with an N50 length of 3.83 Mb. The N50 length of the genome assembly was 5.23 times longer than the previously released one, and additional analysis revealed that it had a high degree of genomic continuity and accuracy. Acquisition of the high-continuity genome sequence of C. quinquecirrha not only provides a basis for the study of jellyfish evolution through comparative genomics but also provides an important resource for studies on jellyfish growth and development.
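For readers unfamiliar with the N50 statistic quoted in this abstract, the sketch below shows the usual definition: the length of the shortest sequence among the longest sequences that together cover at least half of the total assembly length. The function name `n50` and the toy lengths are assumptions for illustration, not data from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half of the
        # total assembly size is the N50.
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")

# Hypothetical scaffold lengths (arbitrary units); total is 300, half is 150.
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))  # prints 70, because 80 + 70 reaches 150
```

A larger N50 for the same total length means the assembly is made of fewer, longer pieces, which is why the abstract reports it as a measure of continuity.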
Affiliation(s)
- Wang-Xiao Xia
  - Shaanxi Key Laboratory of Brain Disorders, Institute of Basic Translational Medicine, Xi'an Medical University, Xi'an, Shaanxi 710021, China
- Hao-Rong Li
  - Center for Ecological and Environmental Sciences, Northwestern Polytechnical University, Xi'an, Shaanxi 710072, China
- Jing-Hao Ge
  - Shaanxi Key Laboratory of Brain Disorders, Institute of Basic Translational Medicine, Xi'an Medical University, Xi'an, Shaanxi 710021, China
- Yao-Wu Liu
  - ZhiQiao Research Institute, Changsha, Hunan 410000, China
- Hong-Hui Li
  - Key Laboratory of Animal Gene Editing and Animal Cloning in Yunnan Province, Yunnan Agricultural University, Kunming, Yunnan 650201, China
- Yan-Hua Su
  - Key Laboratory of Animal Gene Editing and Animal Cloning in Yunnan Province, Yunnan Agricultural University, Kunming, Yunnan 650201, China
- Hai-Zhen Wang
  - Key Laboratory of Animal Gene Editing and Animal Cloning in Yunnan Province, Yunnan Agricultural University, Kunming, Yunnan 650201, China
- Hui-Fang Guo
  - Shaanxi Key Laboratory of Infection and Immune Disorders, School of Basic Medical Science, Xi'an Medical University, Xi'an, Shaanxi 710021, China
- Yu-Xuan Dai
  - Shaanxi Key Laboratory of Brain Disorders, Institute of Basic Translational Medicine, Xi'an Medical University, Xi'an, Shaanxi 710021, China
- Yao-Wen Liu
  - Key Laboratory of Animal Gene Editing and Animal Cloning in Yunnan Province, Yunnan Agricultural University, Kunming, Yunnan 650201, China
- Xing-Chun Gou
  - Shaanxi Key Laboratory of Brain Disorders, Institute of Basic Translational Medicine, Xi'an Medical University, Xi'an, Shaanxi 710021, China
|
42
|
Gallardo-Escárate C, Valenzuela-Muñoz V, Nuñez-Acuña G, Valenzuela-Miranda D, Gonçalves AT, Escobar-Sepulveda H, Liachko I, Nelson B, Roberts S, Warren W. Chromosome-scale genome assembly of the sea louse Caligus rogercresseyi by SMRT sequencing and Hi-C analysis. Sci Data 2021; 8:60. [PMID: 33574331 PMCID: PMC7878743 DOI: 10.1038/s41597-021-00842-w] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 01/10/2020] [Accepted: 01/25/2021] [Indexed: 12/19/2022] Open
Abstract
Caligus rogercresseyi, commonly known as the sea louse, is an ectoparasitic copepod that impacts salmon aquaculture in Chile, causing losses of hundreds of millions of dollars per year. In this study, we report a chromosome-scale assembly of the sea louse (C. rogercresseyi) genome based on single-molecule real-time sequencing (SMRT) and proximity ligation (Hi-C) analysis. Coding RNAs and non-coding RNAs, and specifically long non-coding RNAs (lncRNAs) and microRNAs (miRNAs), were identified through whole transcriptome sequencing from different life stages. A total of 23,686 protein-coding genes and 12,558 non-coding RNAs were annotated. In addition, 6,308 lncRNAs and 5,774 miRNAs were found to be transcriptionally active from larvae to adult stages. Taken together, this genomic resource for C. rogercresseyi represents a valuable tool to develop sustainable control strategies in the salmon aquaculture industry.
Affiliation(s)
- Cristian Gallardo-Escárate
  - Interdisciplinary Center for Aquaculture Research, University of Concepción, Concepción, Chile
  - Laboratory of Biotechnology and Aquatic Genomics, Center of Biotechnology, University of Concepción, Concepción, Chile
- Valentina Valenzuela-Muñoz
  - Interdisciplinary Center for Aquaculture Research, University of Concepción, Concepción, Chile
  - Laboratory of Biotechnology and Aquatic Genomics, Center of Biotechnology, University of Concepción, Concepción, Chile
- Gustavo Nuñez-Acuña
  - Interdisciplinary Center for Aquaculture Research, University of Concepción, Concepción, Chile
  - Laboratory of Biotechnology and Aquatic Genomics, Center of Biotechnology, University of Concepción, Concepción, Chile
- Diego Valenzuela-Miranda
  - Interdisciplinary Center for Aquaculture Research, University of Concepción, Concepción, Chile
  - Laboratory of Biotechnology and Aquatic Genomics, Center of Biotechnology, University of Concepción, Concepción, Chile
- Ana Teresa Gonçalves
  - Interdisciplinary Center for Aquaculture Research, University of Concepción, Concepción, Chile
  - Laboratory of Biotechnology and Aquatic Genomics, Center of Biotechnology, University of Concepción, Concepción, Chile
- Hugo Escobar-Sepulveda
  - Interdisciplinary Center for Aquaculture Research, University of Concepción, Concepción, Chile
  - Laboratory of Biotechnology and Aquatic Genomics, Center of Biotechnology, University of Concepción, Concepción, Chile
- Steven Roberts
  - School of Aquatic and Fishery Sciences (SAFS), University of Washington, Seattle, USA
- Wesley Warren
  - Bond Life Sciences Center, University of Missouri, Columbia, USA
|
43
|
Kruse K, Hug CB, Vaquerizas JM. FAN-C: a feature-rich framework for the analysis and visualisation of chromosome conformation capture data. Genome Biol 2020; 21:303. [PMID: 33334380 PMCID: PMC7745377 DOI: 10.1186/s13059-020-02215-9] [Citation(s) in RCA: 101] [Impact Index Per Article: 20.2] [Received: 04/02/2020] [Accepted: 11/30/2020] [Indexed: 01/01/2023] Open
Abstract
Chromosome conformation capture data, particularly from high-throughput approaches such as Hi-C, are typically very complex to analyse. Existing analysis tools are often single-purpose, or limited in compatibility to a small number of data formats, frequently making Hi-C analyses tedious and time-consuming. Here, we present FAN-C, an easy-to-use command-line tool and powerful Python API with a broad feature set covering matrix generation, analysis, and visualisation for C-like data ( https://github.com/vaquerizaslab/fanc ). Due to its compatibility with the most prevalent Hi-C storage formats, FAN-C can be used in combination with a large number of existing analysis tools, thus greatly simplifying Hi-C matrix analysis.
Affiliation(s)
- Kai Kruse
  - Max Planck Institute for Molecular Biomedicine, Roentgenstrasse 20, 48149, Muenster, Germany
- Clemens B Hug
  - Max Planck Institute for Molecular Biomedicine, Roentgenstrasse 20, 48149, Muenster, Germany
- Juan M Vaquerizas
  - Max Planck Institute for Molecular Biomedicine, Roentgenstrasse 20, 48149, Muenster, Germany
  - MRC London Institute of Medical Sciences, Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, Du Cane Road, London, W12 0NN, UK
|
44
|
Magnitov MD, Kuznetsova VS, Ulianov SV, Razin SV, Tyakht AV. Benchmark of software tools for prokaryotic chromosomal interaction domain identification. Bioinformatics 2020; 36:4560-4567. [PMID: 32492116 PMCID: PMC7653553 DOI: 10.1093/bioinformatics/btaa555] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 03/21/2020] [Revised: 05/26/2020] [Accepted: 05/29/2020] [Indexed: 01/01/2023] Open
Abstract
MOTIVATION The application of genome-wide chromosome conformation capture (3C) methods to prokaryotes provided insights into the spatial organization of their genomes and identified patterns conserved across the tree of life, such as chromatin compartments and contact domains. Prokaryotic genomes vary in GC content and the density of restriction sites along the chromosome, suggesting that these properties should be considered when planning experiments and choosing appropriate software for data processing. Diverse algorithms are available for the analysis of eukaryotic chromatin contact maps, but their potential application to prokaryotic data has not yet been evaluated. RESULTS Here, we present a comparative analysis of domain calling algorithms using available single-microbe experimental data. We evaluated the algorithms’ intra-dataset reproducibility, concordance with other tools and sensitivity to coverage and resolution of contact maps. Using RNA-seq as an example, we showed how orthogonal biological data can be utilized to validate the reliability and significance of annotated domains. We also suggest that in silico simulations of contact maps can be used to choose optimal restriction enzymes and estimate theoretical map resolutions before the experiment. Our results provide guidelines for researchers investigating microbes and microbial communities using high-throughput 3C assays such as Hi-C and 3C-seq. AVAILABILITY AND IMPLEMENTATION The code of the analysis is available at https://github.com/magnitov/prokaryotic_cids. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Mikhail D Magnitov
  - Center for Precision Genome Editing and Genetic Technologies for Biomedicine
  - Group of Genome Spatial Organization, Institute of Gene Biology, Russian Academy of Sciences, Moscow 119334, Russia
  - Department of Biological and Medical Physics, Moscow Institute of Physics and Technology (National Research University), Dolgoprudny 141700, Russia
- Veronika S Kuznetsova
  - Department of Biological and Medical Physics, Moscow Institute of Physics and Technology (National Research University), Dolgoprudny 141700, Russia
  - Group of Bioinformatics
- Sergey V Ulianov
  - Laboratory of Structural and Functional Organization of Chromosomes, Institute of Gene Biology, Russian Academy of Sciences, Moscow 119334, Russia
  - Department of Biology, Moscow State University, Moscow 119234, Russia
- Sergey V Razin
  - Laboratory of Structural and Functional Organization of Chromosomes, Institute of Gene Biology, Russian Academy of Sciences, Moscow 119334, Russia
  - Department of Biology, Moscow State University, Moscow 119234, Russia
- Alexander V Tyakht
  - Center for Precision Genome Editing and Genetic Technologies for Biomedicine
  - Group of Bioinformatics
|
45
|
Dynamic Chromatin Structure and Epigenetics Control the Fate of Malaria Parasites. Trends Genet 2020; 37:73-85. [PMID: 32988634 DOI: 10.1016/j.tig.2020.09.003] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 07/06/2020] [Revised: 08/27/2020] [Accepted: 09/02/2020] [Indexed: 12/11/2022]
Abstract
Multiple hosts and various life cycle stages prompt the human malaria parasite, Plasmodium falciparum, to acquire sophisticated molecular mechanisms to ensure its survival, spread, and transmission to its next host. To face these environmental challenges, increasing evidence suggests that the parasite has developed complex and complementary layers of regulatory mechanisms controlling gene expression. Here, we discuss the recent developments in the discovery of molecular components that contribute to cell replication and differentiation and highlight the major contributions of epigenetics, transcription factors, and nuclear architecture in controlling gene regulation and life cycle progression in Plasmodium spp.
|
46
|
Jiang K, Kessler H, Park Y, Sudman M, Thompson SD, Jarvis JN. Broadening our understanding of the genetics of Juvenile Idiopathic Arthritis (JIA): Interrogation of three dimensional chromatin structures and genetic regulatory elements within JIA-associated risk loci. PLoS One 2020; 15:e0235857. [PMID: 32730263 PMCID: PMC7392255 DOI: 10.1371/journal.pone.0235857] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 03/17/2020] [Accepted: 06/23/2020] [Indexed: 11/25/2022] Open
Abstract
OBJECTIVE The risk loci for juvenile idiopathic arthritis (JIA) consist of extended haplotypes that include functional elements in addition to canonical coding genes. As with most autoimmune diseases, the risk haplotypes for JIA are highly enriched for H3K4me1/H3K27ac histone marks, epigenetic signatures that typically identify poised or active enhancers. In this study, we test the hypothesis that genetic risk for JIA is exerted through altered enhancer-mediated gene regulation. METHODS We mined publicly available HiC and other chromatin conformation data to determine whether H3K27ac-marked regions in 25 JIA risk loci showed physical evidence of contact with gene promoters. We also used in vitro reporter assays to establish, as proof of concept, that genetic variants in linkage disequilibrium with GWAS-identified tag SNPs alter enhancer function. RESULTS All 25 loci examined showed multiple contact sites in the 4 different cell lines that we queried. These regions were characterized by HiC-defined loop structures that included 237 immune-related genes. Using in vitro assays, we found that a 657 bp, H3K4me1/H3K27ac-marked region within the first intron of IL2RA shows enhancer activity in reporter assays, and this activity is attenuated by SNPs on the IL2RA haplotype that we identified using whole genome sequencing of children with JIA. Similarly, we identified a 1,669 bp sequence in an intergenic region of the IL6R locus where SNPs identified in children with JIA increase enhancer function in reporter assays. CONCLUSIONS These studies provide evidence that altered enhancer function contributes to genetic risk in JIA. Further studies to identify the specific target genes of genetically altered enhancers are warranted.
Affiliation(s)
- Kaiyu Jiang
  - Department of Pediatrics, Pediatric Rheumatology Research, University at Buffalo Jacobs School of Medicine and Biomedical Sciences, Buffalo, New York, United States of America
- Haeja Kessler
  - Department of Pediatrics, Pediatric Rheumatology Research, University at Buffalo Jacobs School of Medicine and Biomedical Sciences, Buffalo, New York, United States of America
- Yungki Park
  - Department of Biochemistry, University at Buffalo Jacobs School of Medicine and Biomedical Sciences, Buffalo, New York, United States of America
  - Genetics, Genomics, & Bioinformatics Program, University at Buffalo Jacobs School of Medicine and Biomedical Sciences, Buffalo, New York, United States of America
- Marc Sudman
  - Center for Autoimmune Genetics & Epigenetics, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
- Susan D. Thompson
  - Center for Autoimmune Genetics & Epigenetics, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
  - Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, United States of America
- James N. Jarvis
  - Department of Pediatrics, Pediatric Rheumatology Research, University at Buffalo Jacobs School of Medicine and Biomedical Sciences, Buffalo, New York, United States of America
  - Genetics, Genomics, & Bioinformatics Program, University at Buffalo Jacobs School of Medicine and Biomedical Sciences, Buffalo, New York, United States of America
|
47
|
Ma W, Gu C, Ma L, Fan C, Zhang C, Sun Y, Li C, Yang G. Mixed secondary chromatin structure revealed by modeling radiation-induced DNA fragment length distribution. Sci China Life Sci 2020; 63:825-834. [PMID: 32279284 DOI: 10.1007/s11427-019-1638-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 12/16/2019] [Accepted: 01/20/2020] [Indexed: 10/24/2022]
Abstract
Spatial chromatin structure plays fundamental roles in many vital biological processes including DNA replication, transcription, damage and repair. However, the current understanding of the secondary structure of chromatin formed by local nucleosome-nucleosome interactions remains controversial, especially for the existence and conformation of 30 nm structure. Since chromatin structure influences the fragment length distribution (FLD) of ionizing radiation-induced DNA strand breaks, a 3D chromatin model fitting FLD patterns can help to distinguish different models of chromatin structure. Here, we developed a novel "30-C" model combining 30 nm chromatin structure models with Hi-C data, which measured the spatial contact frequency between different loci in the genome. We first reconstructed the 3D coordinates of the 25 kb bins from Hi-C heatmaps. Within the 25 kb bins, lower level chromatin structures supported by recent studies were filled. Simulated FLD patterns based on the 30-C model were compared to published FLD patterns induced by heavy ion radiation to validate the models. Importantly, the 30-C model predicted that the most probable chromatin fiber structure for human interphase fibroblasts in vivo was 45% zig-zag 30 nm fibers and 55% 10 nm fibers.
Affiliation(s)
- Wenzong Ma
  - State Key Laboratory of Nuclear Physics and Technology, School of Physics, Peking University, Beijing, 100871, China
- Chenyang Gu
  - State Key Laboratory of Nuclear Physics and Technology, School of Physics, Peking University, Beijing, 100871, China
  - Radiation Biology Center, Graduate School of Biostudies, Kyoto University, Kyoto, 606-8501, Japan
- Lin Ma
  - State Key Laboratory of Nuclear Physics and Technology, School of Physics, Peking University, Beijing, 100871, China
  - Medical Artificial Intelligence and Automation, Department of Radiation Oncology, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- Caoqi Fan
  - Center for Bioinformatics, School of Life Sciences and Center for Statistical Science, Peking University, Beijing, 100871, China
- Chao Zhang
  - Center for Bioinformatics, School of Life Sciences and Center for Statistical Science, Peking University, Beijing, 100871, China
- Yujie Sun
  - State Key Laboratory of Membrane Biology, School of Life Sciences, and Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, 100871, China
- Cheng Li
  - Center for Bioinformatics, School of Life Sciences and Center for Statistical Science, Peking University, Beijing, 100871, China
- Gen Yang
  - State Key Laboratory of Nuclear Physics and Technology, School of Physics, Peking University, Beijing, 100871, China
|
48
|
Das P, Golloshi R, McCord RP, Shen T. Using contact statistics to characterize structure transformation of biopolymer ensembles. Phys Rev E 2020; 101:012419. [PMID: 32069653 PMCID: PMC7329163 DOI: 10.1103/physreve.101.012419] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 08/28/2019] [Indexed: 12/20/2022]
Abstract
As a unique subset of functional polymers, many biopolymers have a set of well-defined three-dimensional (3D) structural characteristics that can be described by spatial contacts between monomers. Statistical analysis of the contacts has been extremely productive in characterizing the biopolymer structural ensemble, such as for 3D chromosome structures. Often, native contacts and compartment structures are the focus of the studies, while the generic polymer aspect, such as the overall decaying of contacts with increasing sequence distance, is analyzed separately or preemptively removed. Here, we explore insights that can be gained by performing "compartment analysis" that keeps the distance decay, which we believe is particularly useful for characterizing the structure transformation of biopolymers. We tested contact analysis on several such transformations under physical perturbation or biological processes, including (1) unfolding of proteins induced by thermal denaturation, (2) chromosome conformation transition during the cell cycle, and (3) chromosome unpacking by physicochemical perturbations. Useful score functions were developed to further quantitatively characterize the transformation judging from the contact analysis. We also find that the sinusoidal undertone of eigenvector patterns (the "unwanted," low frequency signal, in contrast to the detailed A/B compartment) that had previously been attributed to biological effects of centromere proximal and distal interactions may in fact reflect a universal feature of polymers that have relatively weaker long-range contacts.
Affiliation(s)
- Priyojit Das
  - UT-ORNL Graduate School of Genome Science and Technology, Knoxville, Tennessee 37996, USA
- Rosela Golloshi
  - Department of Biochemistry & Cellular and Molecular Biology, University of Tennessee, Knoxville, Tennessee 37996, USA
- Rachel Patton McCord
  - Department of Biochemistry & Cellular and Molecular Biology, University of Tennessee, Knoxville, Tennessee 37996, USA
- Tongye Shen
  - Department of Biochemistry & Cellular and Molecular Biology, University of Tennessee, Knoxville, Tennessee 37996, USA
|
49
|
2019-A year in Biophysical Reviews. Biophys Rev 2019; 11:833-839. [PMID: 31741173 DOI: 10.1007/s12551-019-00607-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 11/07/2019] [Accepted: 11/07/2019] [Indexed: 02/07/2023] Open
|
50
|
Pal K, Tagliaferri I, Livi CM, Ferrari F. HiCBricks: building blocks for efficient handling of large Hi-C datasets. Bioinformatics 2019; 36:btz808. [PMID: 31697323 PMCID: PMC7703765 DOI: 10.1093/bioinformatics/btz808] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 03/26/2019] [Revised: 09/27/2019] [Accepted: 10/24/2019] [Indexed: 11/13/2022] Open
Abstract
SUMMARY Genome-wide chromosome conformation capture based on high-throughput sequencing (Hi-C) has been widely adopted to study chromatin architecture by generating datasets of ever-increasing complexity and size. HiCBricks offers user-friendly and efficient solutions for handling large high-resolution Hi-C datasets. The package provides an R/Bioconductor framework with the bricks to build more complex data analysis pipelines and algorithms. HiCBricks already incorporates functions for calling domain boundaries and functions for high quality data visualization. AVAILABILITY http://bioconductor.org/packages/devel/bioc/html/HiCBricks.html. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Koustav Pal
  - IFOM, The FIRC Institute of Molecular Oncology, Milan, Italy
- Francesco Ferrari
  - IFOM, The FIRC Institute of Molecular Oncology, Milan, Italy
  - Institute of Molecular Genetics, National Research Council, Pavia, Italy
|