1
Bergman S, Tuller T. Strong association between genomic 3D structure and CRISPR cleavage efficiency. PLoS Comput Biol 2024; 20:e1012214. [PMID: 38848440 PMCID: PMC11189236 DOI: 10.1371/journal.pcbi.1012214] [Received: 01/18/2024] [Revised: 06/20/2024] [Accepted: 05/30/2024] [Indexed: 06/09/2024]
Abstract
CRISPR is a gene-editing technology that enables precise in vivo genome editing, but its potential is hampered by its relatively low specificity and sensitivity. Improving CRISPR's on-target and off-target effects requires a better understanding of its mechanism and determinants. Here we demonstrate, for the first time, an association between the chromosomal 3D spatial structure and CRISPR's cleavage efficiency, as well as the predictive capability of this structure. We used high-resolution Hi-C data to estimate the 3D distance between different regions in the human genome and used these spatial properties to generate 3D-based features that characterize each region's density. We evaluated these features using empirical in vivo CRISPR efficiency data and compared them with 425 features used in state-of-the-art models. The 3D features ranked in the top 13% of all features and significantly improved the predictive power of LASSO and XGBoost models trained with them. The features indicated that sites with lower spatial density showed higher cleavage efficiency. Understanding how the 3D DNA structure affects CRISPR provides insight into CRISPR's mechanism in general and improves our ability to predict CRISPR cleavage correctly and to design sgRNAs for therapeutic and scientific use.
Affiliation(s)
- Shaked Bergman
- Department of Biomedical Engineering, Tel-Aviv University, Tel Aviv, Israel
- Tamir Tuller
- Department of Biomedical Engineering, Tel-Aviv University, Tel Aviv, Israel
- The Sagol School of Neuroscience, Tel-Aviv University, Tel Aviv, Israel
2
Tolokh IS, Kinney NA, Sharakhov IV, Onufriev AV. Strong interactions between highly dynamic lamina-associated domains and the nuclear envelope stabilize the 3D architecture of Drosophila interphase chromatin. Epigenetics Chromatin 2023; 16:21. [PMID: 37254161 PMCID: PMC10228000 DOI: 10.1186/s13072-023-00492-9] [Received: 01/19/2023] [Accepted: 05/04/2023] [Indexed: 06/01/2023]
Abstract
BACKGROUND Interactions among topologically associating domains (TADs), and between the nuclear envelope (NE) and lamina-associated domains (LADs), are expected to shape various aspects of three-dimensional (3D) chromatin structure and dynamics; however, genome-wide experiments that could provide statistically significant conclusions remain difficult.

RESULTS We have developed a coarse-grained dynamical model of D. melanogaster nuclei at TAD resolution that explicitly accounts for four distinct epigenetic classes of TADs and for LAD-NE interactions. The model is parameterized to reproduce the experimental Hi-C map of wild-type (WT) nuclei, and it describes the time evolution of chromatin over the G1 phase of interphase. The simulations include an ensemble of nuclei corresponding to the experimentally observed set of several possible mutual arrangements of chromosomal arms. The model is validated against multiple structural features of chromatin from several different experiments not used in model development. The predicted positioning of all LADs at the NE is highly dynamic: the same LAD can attach, detach, and move far away from the NE multiple times during interphase. The probabilities of LADs being in contact with the NE vary by an order of magnitude, even though all LADs have the same affinity to the NE in the model. These probabilities are mostly determined by the highly variable local linear density of LADs along the genome, which has the same strong effect on the predicted positioning of individual TADs: a TAD's higher probability of being near the NE is largely determined by a higher linear density of LADs surrounding it. The distribution of LADs along the chromosome chains plays a notable role in maintaining a non-random average global structure of chromatin. The relatively high affinity of LADs to the NE in WT nuclei substantially reduces the sensitivity of the global radial chromatin distribution to variations in the strength of TAD-TAD interactions, compared with lamin-depleted nuclei, where a small (0.5 kT) increase in cross-type TAD-TAD interactions doubles the chromatin density in the central region of the nucleus.

CONCLUSIONS A dynamical model of the entire fruit fly genome makes multiple genome-wide predictions of biological interest. The distribution of LADs along the chromatin chains affects their probability of contacting the NE and the radial positioning of highly mobile TADs, playing a notable role in creating a non-random average global structure of chromatin. We conjecture that an important role of attractive LAD-NE interactions is to stabilize the global chromatin structure against inevitable cell-to-cell variations in TAD-TAD interactions.
Affiliation(s)
- Igor S. Tolokh
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061 USA
- Nicholas Allen Kinney
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061 USA
- Department of Entomology, Virginia Tech, Blacksburg, VA 24061 USA
- Edward Via College of Osteopathic Medicine, 2265 Kraft Drive, Blacksburg, VA 24060 USA
- Alexey V. Onufriev
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061 USA
- Department of Physics, Virginia Tech, Blacksburg, VA 24061 USA
- Center for Soft Matter and Biological Physics, Virginia Tech, Blacksburg, VA 24061 USA
3
Lohia R, Fox N, Gillis J. A global high-density chromatin interaction network reveals functional long-range and trans-chromosomal relationships. Genome Biol 2022; 23:238. [PMID: 36352464 PMCID: PMC9647974 DOI: 10.1186/s13059-022-02790-z] [Received: 05/21/2022] [Accepted: 10/10/2022] [Indexed: 11/10/2022]
Abstract
BACKGROUND Chromatin contacts are essential for gene-expression regulation; however, obtaining a high-resolution genome-wide chromatin contact map is still prohibitively expensive owing to large genome sizes and the quadratic scale of pairwise data. Chromosome conformation capture (3C)-based methods such as Hi-C have been used extensively to obtain chromatin contacts. However, because these maps become sparser as the genomic distance between contacts increases, long-range or trans-chromosomal contacts are especially challenging to sample.

RESULTS Here, we create a high-density reference genome-wide chromatin contact map using a meta-analytic approach. We integrate 3600 human, 6700 mouse, and 500 fly Hi-C experiments to create species-specific meta-Hi-C chromatin contact maps with 304 billion, 193 billion, and 19 billion contacts, respectively. We confirm that meta-Hi-C contact maps are uniquely powered to capture functional chromatin contacts in both cis and trans. We find that while individual Hi-C dataset networks are largely unable to predict long-range coexpression (median AUC 0.54), meta-Hi-C networks perform comparably in cis and trans (AUC 0.65 vs 0.64). Similarly, for long-range expression quantitative trait loci (eQTL), meta-Hi-C contacts outperform all individual Hi-C experiments, improving on the conventionally used association based on linear genomic distance. Assessing across species, we find patterns of chromatin contact conservation in both cis and trans, and strong associations with coexpression even in species for which Hi-C data are lacking.

CONCLUSIONS We have generated an integrated chromatin interaction network that complements the many methodological and analytic approaches focused on improved specificity or interpretation. This high-depth "super-experiment" is surprisingly powerful in capturing long-range functional relationships of chromatin interactions, which can now predict coexpression, eQTLs, and cross-species relationships. The meta-Hi-C networks are available at https://labshare.cshl.edu/shares/gillislab/resource/HiC/ .
Affiliation(s)
- Ruchi Lohia
- Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, USA
- Nathan Fox
- Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, USA
- Jesse Gillis
- Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, USA
- Department of Physiology and Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Canada
4
Gutman T, Goren G, Efroni O, Tuller T. Estimating the predictive power of silent mutations on cancer classification and prognosis. NPJ Genom Med 2021; 6:67. [PMID: 34385450 PMCID: PMC8361094 DOI: 10.1038/s41525-021-00229-1] [Received: 01/13/2021] [Accepted: 06/24/2021] [Indexed: 02/07/2023]
Abstract
In recent years it has been shown that silent mutations, both in and outside the coding region, can affect gene expression and may be related to tumorigenesis and cancer cell fitness. However, the predictive ability of these mutations for cancer type diagnosis and prognosis has not yet been evaluated. In the current study, based on the analysis of 9,915 cancer genomes and approximately three million mutations, we provide a comprehensive quantitative evaluation of the predictive power of various types of silent and non-silent mutations for cancer classification and prognosis. The results indicate that silent-mutation models outperform the equivalent null models in classifying all examined cancer types and in estimating the probability of survival 10 years after the initial diagnosis. Additionally, combining non-silent and silent mutations achieved the best classification results for 68% of the cancer types and the best survival estimation results for up to nine years after diagnosis. Thus, silent mutations hold considerable predictive power for both cancer classification and prognosis, most likely due to their effect on gene expression. Silent mutations should therefore be integrated into cancer research in order to unravel the full genomic landscape of cancer and its ramifications for cancer fitness.
Affiliation(s)
- Tal Gutman
- Department of Biomedical Engineering, the Engineering Faculty, Tel Aviv University, Tel-Aviv, Israel
- Guy Goren
- Department of Electrical Engineering, the Engineering Faculty, Tel Aviv University, Tel-Aviv, Israel
- Omri Efroni
- Department of Electrical Engineering, the Engineering Faculty, Tel Aviv University, Tel-Aviv, Israel
- Tamir Tuller
- Department of Biomedical Engineering, the Engineering Faculty, Tel Aviv University, Tel-Aviv, Israel
5
Gong H, Yang Y, Zhang S, Li M, Zhang X. Application of Hi-C and other omics data analysis in human cancer and cell differentiation research. Comput Struct Biotechnol J 2021; 19:2070-2083. [PMID: 33995903 PMCID: PMC8086027 DOI: 10.1016/j.csbj.2021.04.016] [Received: 12/10/2020] [Revised: 04/04/2021] [Accepted: 04/04/2021] [Indexed: 02/07/2023]
Abstract
With the development of 3C (chromosome conformation capture) and its derivative technology Hi-C (high-throughput chromosome conformation capture), studying the spatial structure of the genome in the nucleus has helped researchers understand biological processes such as gene transcription, replication, repair, and regulation. In this paper, we first introduce the research background and purpose of Hi-C data visualization analysis. We then discuss Hi-C data analysis methods for genome 3D structure, A/B compartments, TADs (topologically associated domains), and loop detection. We also discuss how to apply genome visualization technologies to identify chromosome feature structures. Next, we review differences in correlation analysis among multi-omics data and how Hi-C and other omics data analyses can be applied to cancer and cell differentiation research. Finally, we summarize the various problems in joint analyses based on Hi-C and other multi-omics data. We believe this review can help researchers better understand the progress and applications of 3D genome technology.
Affiliation(s)
- Haiyan Gong
- Department of Computer Science and Technology, University of Science and Technology Beijing, Beijing 100083, China
- Beijing Advanced Innovation Center for Materials Genome Engineering, University of Science and Technology Beijing, Beijing 100083, China
- Beijing Key Laboratory of Knowledge Engineering for Materials Science, Beijing 100083, China
- Shunde Graduate School of University of Science and Technology Beijing, Foshan 528000, China
- Yi Yang
- Department of Computer Science and Technology, University of Science and Technology Beijing, Beijing 100083, China
- Sichen Zhang
- Department of Computer Science and Technology, University of Science and Technology Beijing, Beijing 100083, China
- Minghong Li
- Department of Computer Science and Technology, University of Science and Technology Beijing, Beijing 100083, China
- Xiaotong Zhang
- Department of Computer Science and Technology, University of Science and Technology Beijing, Beijing 100083, China
- Beijing Advanced Innovation Center for Materials Genome Engineering, University of Science and Technology Beijing, Beijing 100083, China
- Beijing Key Laboratory of Knowledge Engineering for Materials Science, Beijing 100083, China
- Shunde Graduate School of University of Science and Technology Beijing, Foshan 528000, China
6
Abstract
Messenger RNAs (mRNAs) consist of a coding region, the open reading frame (ORF), and two untranslated regions (UTRs), the 5'UTR and the 3'UTR. Ribosomes travel along the coding region, translating nucleotide triplets (called codons) into a chain of amino acids. The coding region was long believed to mainly encode the amino acid content of proteins, whereas regulatory signals were thought to reside in the UTRs and in other genomic regions. However, in recent years we have learned that the ORF is extensively populated with various regulatory signals, or codes, which are related to all gene expression steps and to additional intracellular aspects. In this paper, we review current knowledge of the overlapping codes inside coding regions, such as the influence of synonymous codon usage on translation speed (and, in turn, the effect of translation speed on protein folding), ribosomal frameshifting, mRNA stability, methylation, splicing, transcription, and more. All these codes come together and overlap in the ORF sequence, ensuring production of the right protein at the right time.
Affiliation(s)
- Shaked Bergman
- Department of Biomedical Engineering, Tel-Aviv University, Tel Aviv, Israel
7
Structural variations in a non-coding region at 1q32.1 are responsible for the NYS7 locus in two large families. Hum Genet 2020; 139:1057-1064. [PMID: 32248360 PMCID: PMC7406531 DOI: 10.1007/s00439-020-02156-0] [Received: 10/24/2019] [Accepted: 03/24/2020] [Indexed: 01/20/2023]
Abstract
Congenital motor nystagmus (CMN) is characterized by early-onset bilateral ocular oscillations without other ocular deficits. To date, mutations in only one gene, FRMD7 for X-linked CMN, have been identified as responsible for CMN. Four loci for autosomal dominant CMN, including NYS7 (OMIM 614826), have been mapped, but the causative genes have yet to be identified. NYS7 was mapped to 1q32.1 based on independent genome-wide linkage scans of two large families with CMN. In this study, whole-genome sequencing excluded mutations in all known protein-coding genes in the linkage interval, covering both coding sequences and intronic sequences with predicted effects. Long-read genome sequencing on the Nanopore platform was then performed on a sample from each of the two families. Two deletions with an overlapping region of 775,699 bp, located in a region without any known protein-coding genes, were identified within the linkage region in the two families. The two deletions and their breakpoints were confirmed by Sanger sequencing and co-segregated with CMN in both families. The 775,699 bp deleted region contains uncharacterized non-protein-coding expressed sequences and pseudogenes but no protein-coding genes. However, Hi-C data predicted that the deletions span two topologically associated domains and probably alter the 3D genomic architecture. These results provide novel evidence of a strong association between structural variations in non-coding genomic regions and human hereditary diseases such as CMN, with a potential mechanism involving changes in 3D genome architecture, and offer clues to the molecular pathogenicity of CMN.
8
9
Kong S, Zhang Y. Deciphering Hi-C: from 3D genome to function. Cell Biol Toxicol 2019; 35:15-32. [PMID: 30610495 DOI: 10.1007/s10565-018-09456-2] [Received: 07/27/2018] [Accepted: 12/02/2018] [Indexed: 12/11/2022]
Abstract
Hi-C is a commonly used technology in 3D genomics that can depict global chromatin interactions across eukaryotic genomes. Integrated with other datasets, it can also be applied to various biological questions, such as nuclear organization, gene transcription regulation, spatiotemporal development, genome assembly, and cancer genomics. During the last decade, the development and application of Hi-C have dramatically changed our view of genome architecture, chromatin conformation, and gene interaction. Hi-C-related studies remain active and controversial; thus, unified standards for library construction and bioinformatics analysis are urgently needed. In this review, we summarize its history, development, methodologies, advances, applications, shortcomings, and future perspectives. We discuss a few limitations of current Hi-C technologies and future directions for improvement, and we highlight how Hi-C can bridge 3D structure to gene function. This review will be helpful for scientists who want to engage in the 3D genomics field; it also indicates some avenues for future work.
Affiliation(s)
- Siyuan Kong
- Animal Functional Genomics Group, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, 7 Pengfei Road, Dapeng District, 518120, Shenzhen, People's Republic of China
- Yubo Zhang
- Animal Functional Genomics Group, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, 7 Pengfei Road, Dapeng District, 518120, Shenzhen, People's Republic of China