1
Jabbari K, Chakraborty M, Wiehe T. DNA sequence-dependent chromatin architecture and nuclear hubs formation. Sci Rep 2019; 9:14646. [PMID: 31601866 PMCID: PMC6787200 DOI: 10.1038/s41598-019-51036-9] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 06/20/2019] [Accepted: 09/18/2019] [Indexed: 02/08/2023]
Abstract
In this study, by exploring chromatin conformation capture data, we show that DNA sequence composition contributes to the nuclear segregation of Topologically Associated Domains (TADs). GC-peaks and valleys of TADs strongly influence interchromosomal interactions and chromatin 3D structure. To gain insight into the compositional and functional constraints associated with chromatin interactions and TAD formation, we analysed intra-TAD and intra-loop GC variations. This led to the identification of clear GC-gradients, along which the density of genes, super-enhancers, transcriptional activity, and CTCF binding site occupancy co-vary non-randomly. Further, the analysis of DNA base composition of nucleolar aggregates and nuclear speckles showed strong sequence-dependent effects. We conjecture that dynamic DNA binding affinity and flexibility underlie the emergence of chromatin condensates; their growth is likely promoted in mechanically soft regions (GC-rich) of the lowest chromatin and nucleosome densities. As a practical perspective, the strong linear association between sequence composition and interchromosomal contacts can help define consensus chromatin interactions, which in turn may be used to study alternative states of chromatin architecture.
Affiliation(s)
- Kamel Jabbari
- Institute for Genetics, Biocenter Cologne, University of Cologne, Zülpicher Straße 47a, 50674, Köln, Germany.
- Maharshi Chakraborty
- Institute for Genetics, Biocenter Cologne, University of Cologne, Zülpicher Straße 47a, 50674, Köln, Germany
- Thomas Wiehe
- Institute for Genetics, Biocenter Cologne, University of Cologne, Zülpicher Straße 47a, 50674, Köln, Germany
2
Bacterial genomes lacking long-range correlations may not be modeled by low-order Markov chains: The role of mixing statistics and frame shift of neighboring genes. Comput Biol Chem 2014; 53 Pt A:15-25. [DOI: 10.1016/j.compbiolchem.2014.08.005] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 11/19/2022]
3
Elhaik E, Graur D. A comparative study and a phylogenetic exploration of the compositional architectures of mammalian nuclear genomes. PLoS Comput Biol 2014; 10:e1003925. [PMID: 25375262 PMCID: PMC4222635 DOI: 10.1371/journal.pcbi.1003925] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 04/12/2014] [Accepted: 09/18/2014] [Indexed: 11/18/2022]
Abstract
For the past four decades the compositional organization of the mammalian genome has posed a formidable challenge to molecular evolutionists attempting to explain it from an evolutionary perspective. Unfortunately, most of the explanations adhered to the "isochore theory," which has long been rebutted. Recently, an alternative compositional domain model was proposed depicting the human and cow genomes as composed mostly of short compositionally homogeneous and nonhomogeneous domains and a few long ones. We test the validity of this model through a rigorous sequence-based analysis of eleven completely sequenced mammalian and avian genomes. Seven attributes of compositional domains are used in the analyses: (1) the number of compositional domains, (2) compositional domain-length distribution, (3) density of compositional domains, (4) genome coverage by the different domain types, (5) degree of fit to a power-law distribution, (6) compositional domain GC content, and (7) the joint distribution of GC content and length of the different domain types. We discuss the evolution of these attributes in light of two competing phylogenetic hypotheses that differ from each other in the validity of clade Euarchontoglires. If valid, the murid genome compositional organization would be a derived state and exhibit a high similarity to that of other mammals. If invalid, the murid genome compositional organization would be closer to an ancestral state. We demonstrate that the compositional organization of the murid genome differs from those of primates and laurasiatherians, a phenomenon previously termed the "murid shift," and in many ways resembles the genome of opossum. We find no support for the "isochore theory." Instead, our findings depict the mammalian genome as a tapestry of mostly short homogeneous and nonhomogeneous domains and a few long ones, thus providing strong evidence in favor of the compositional domain model, and seem to invalidate clade Euarchontoglires.
Affiliation(s)
- Eran Elhaik
- Department of Animal and Plant Sciences, University of Sheffield, Sheffield, United Kingdom
- Dan Graur
- Department of Biology & Biochemistry, University of Houston, Houston, Texas, United States of America
4
Algama M, Keith JM. Investigating genomic structure using changept: A Bayesian segmentation model. Comput Struct Biotechnol J 2014; 10:107-15. [PMID: 25349679 PMCID: PMC4204429 DOI: 10.1016/j.csbj.2014.08.003] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 01/06/2023]
Abstract
Genomes are composed of a wide variety of elements with distinct roles and characteristics. Some of these elements are well-characterised functional components such as protein-coding exons. Other elements play regulatory or structural roles, encode functional non-protein-coding RNAs, or perform some other function yet to be characterised. Still others may have no functional importance, though they may nevertheless be of interest to biologists. One technique for investigating the composition of genomes is to segment sequences into compositionally homogeneous blocks. This technique, known as 'sequence segmentation' or 'change-point analysis', is used to identify patterns of variation across genomes such as GC-rich and GC-poor regions, coding and non-coding regions, slowly evolving and rapidly evolving regions and many other types of variation. In this mini-review we outline many of the genome segmentation methods currently available and then focus on a Bayesian DNA segmentation algorithm, with examples of its various applications.
Affiliation(s)
- Manjula Algama
- School of Mathematical Sciences, Monash University, Clayton, VIC 3800, Australia
- Jonathan M Keith
- School of Mathematical Sciences, Monash University, Clayton, VIC 3800, Australia
5
Elhaik E, Graur D, Josic K. Comparative testing of DNA segmentation algorithms using benchmark simulations. Mol Biol Evol 2009; 27:1015-24. [PMID: 20018981 DOI: 10.1093/molbev/msp307] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Indexed: 11/12/2022]
Abstract
Numerous segmentation methods for the detection of compositionally homogeneous domains within genomic sequences have been proposed. Unfortunately, these methods yield inconsistent results. Here, we present a benchmark consisting of two sets of simulated genomic sequences for testing the performances of segmentation algorithms. Sequences in the first set are composed of fixed-sized homogeneous domains, distinct in their between-domain guanine and cytosine (GC) content variability. The sequences in the second set are composed of a mosaic of many short domains and a few long ones, distinguished by sharp GC content boundaries between neighboring domains. We use these sets to test the performance of seven segmentation algorithms in the literature. Our results show that recursive segmentation algorithms based on the Jensen-Shannon divergence outperform all other algorithms. However, even these algorithms perform poorly in certain instances because of the arbitrary choice of a segmentation-stopping criterion.
Affiliation(s)
- Eran Elhaik
- Department of Biology & Biochemistry, University of Houston, TX, USA.
6
Li W, Holste D. Universal 1/f noise, crossovers of scaling exponents, and chromosome-specific patterns of guanine-cytosine content in DNA sequences of the human genome. Phys Rev E Stat Nonlin Soft Matter Phys 2005; 71:041910. [PMID: 15903704 DOI: 10.1103/physreve.71.041910] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.1] [Received: 01/28/2004] [Revised: 10/28/2004] [Indexed: 05/02/2023]
Abstract
Spatial fluctuations of guanine and cytosine base content (GC%) are studied by spectral analysis for the complete set of human genomic DNA sequences. We find that (i) 1/f^α decay is universally observed in the power spectra of all 24 chromosomes, and (ii) the exponent α ≈ 1 extends to about 10^7 bases, one order of magnitude longer than has previously been observed. We further find that (iii) almost all human chromosomes exhibit a crossover from α1 ≈ 1 (1/f^α1) at lower frequency to α2 < 1 (1/f^α2) at higher frequency, typically occurring at around 30,000-100,000 bases, while (iv) the crossover in this frequency range is virtually absent in human chromosome 22. In addition to the universal 1/f^α noise in power spectra, we find (v) several lines of evidence for chromosome-specific correlation structures, including a 500,000 base long oscillation in human chromosome 21. The universal 1/f^α spectrum in the human genome is further substantiated by a resistance to reduction in variance of guanine and cytosine content when the window size is increased.
Affiliation(s)
- Wentian Li
- The Robert S. Boas Center for Genomics and Human Genetics, North Shore LIJ Institute for Medical Research, 350 Community Drive, Manhasset, New York 11030, USA.
7
Li W, Holste D. An unusual 500,000 bases long oscillation of guanine and cytosine content in human chromosome 21. Comput Biol Chem 2004; 28:393-9. [PMID: 15556480 DOI: 10.1016/j.compbiolchem.2004.09.011] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Received: 09/17/2004] [Revised: 09/30/2004] [Accepted: 09/30/2004] [Indexed: 01/09/2023]
Abstract
An oscillation with a period of around 500 kb in guanine and cytosine content (GC%) is observed in the DNA sequence of human chromosome 21. This oscillation is localized in the rightmost one-eighth region of the chromosome, from 43.5 Mb to 46.5 Mb. Five cycles of oscillation are observed in this region with six GC-rich peaks and five GC-poor valleys. The GC-poor valleys comprise regions with low density of CpG islands and, alternating between the two DNA strands, low gene density regions. Consequently, the long-range oscillation of GC% results in spacing patterns of CpG island density and, to a lesser extent, of gene density.
Affiliation(s)
- Wentian Li
- The Robert S. Boas Center for Genomics and Human Genetics, North Shore LIJ Institute for Medical Research, 350 Community Drive, Manhasset, NY 11030, USA.
8
Paces J, Zíka R, Paces V, Pavlícek A, Clay O, Bernardi G. Representing GC variation along eukaryotic chromosomes. Gene 2004; 333:135-41. [PMID: 15177688 DOI: 10.1016/j.gene.2004.02.041] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.7] [Received: 08/28/2003] [Accepted: 02/10/2004] [Indexed: 02/03/2023]
Abstract
Genome sequencing now permits direct visual representation, at any scale, of GC heterogeneity along the chromosomes of several higher eukaryotes. Plots can be easily obtained from the chromosomal sequences, yet sequence releases of mammalian or plant chromosomes still tend to use small scales or window sizes that obscure important large-scale compositional features. To faithfully reveal, at one glance, the compositional variation at a given scale, we have devised a simple scheme that combines line plots with color-coded shading of the regions underneath the plots. The scheme can be applied to different eukaryotic genomes to facilitate their comparison, as illustrated here for a sample of chromosomes chosen from seven selected species. As a complement to a previously published compact view of isochores in the human genome sequence, we include here an analogous map for the recently sequenced mouse genome, and discuss the contribution of repetitive DNA to the GC variation along the plots. Supplementary information, including a database of color-coded GC profiles for all recently sequenced eukaryotes and the program draw_chromosomes_gc.pl used to obtain them, are available at.
Affiliation(s)
- Jan Paces
- Institute of Molecular Genetics, Academy of Sciences of the Czech Republic, Flemingovo 2, Prague CZ-16637, Czech Republic
9
Schuck P. A model for sedimentation in inhomogeneous media. I. Dynamic density gradients from sedimenting co-solutes. Biophys Chem 2004; 108:187-200. [PMID: 15043929 DOI: 10.1016/j.bpc.2003.10.016] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.4] [Indexed: 10/26/2022]
Abstract
Macromolecular sedimentation in inhomogeneous media is of great practical importance. Dynamic density gradients have a long tradition in analytical ultracentrifugation, and are frequently used in preparative ultracentrifugation. In this paper, a new theoretical model for sedimentation in inhomogeneous media is presented, based on finite element solutions of the Lamm equation with spatial and temporal variation of the local solvent density and viscosity. It is applied to macromolecular sedimentation in the presence of a dynamic density gradient formed by the sedimentation of a co-solute at high concentration. It is implemented in the software SEDFIT for the analysis of experimental macromolecular concentration distributions. The model agrees well with the measured sedimentation profiles of a protein in a dynamic cesium chloride gradient, and may provide a measure for the effects of hydration or preferential solvation parameters. General features of protein sedimentation in dynamic density gradients are described.
Affiliation(s)
- Peter Schuck
- Division of Bioengineering and Physical Science, ORS, OD, National Institutes of Health, Building 13, Room 3N17, 13 South Drive, Bethesda, MD 20892-5766, USA.
10
Holste D, Grosse I, Beirer S, Schieg P, Herzel H. Repeats and correlations in human DNA sequences. Phys Rev E Stat Nonlin Soft Matter Phys 2003; 67:061913. [PMID: 16241267 DOI: 10.1103/physreve.67.061913] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.2] [Received: 01/22/2003] [Indexed: 05/04/2023]
Abstract
We study the nucleotide-nucleotide mutual information function I(k) of the DNA sequences of the three completely sequenced human chromosomes 20, 21, and 22. We find in each human chromosome (i) the absence of the k=3 base pair (bp) sequence periodicity characteristic for protein coding regions, (ii) the absence of the k=10-11 bp sequence periodicity characteristic for both protein secondary structure and DNA bendability, and (iii) the presence of significant statistical dependencies at about k=135 bp and at about k=165 bp. We investigate to which degree the density and composition of interspersed repeats might explain these observed statistical patterns in all three human chromosomes. We use simple stochastic models to substitute known interspersed repeats and find by numerical studies that (iv) the presence of interspersed repeats dominates short-range correlations as measured by I(k) on the scale of several hundred base pairs in human chromosomes 20, 21, and 22. On the other hand, we find that (v) interspersed repeats contribute only weakly to long-range correlations due to the clustering of highly abundant Alu repeats.
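The mutual information function I(k) mentioned in this abstract can be estimated from pair counts. The following is a minimal sketch, assuming the standard plug-in estimator with no finite-sample bias correction; the function name `mutual_information` is illustrative and is not taken from the cited paper.

```python
import math
from collections import Counter


def mutual_information(seq: str, k: int) -> float:
    """Plug-in estimate of I(k), in bits, between symbols k positions apart."""
    if k < 1 or k >= len(seq):
        raise ValueError("k must satisfy 1 <= k < len(seq)")
    # Joint counts of (symbol at i, symbol at i + k).
    pairs = Counter(zip(seq, seq[k:]))
    n = sum(pairs.values())
    # Marginal counts of the first and second member of each pair.
    left, right = Counter(), Counter()
    for (a, b), count in pairs.items():
        left[a] += count
        right[b] += count
    # I(k) = sum over pairs of p_ab * log2(p_ab / (p_a * p_b)).
    info = 0.0
    for (a, b), count in pairs.items():
        info += (count / n) * math.log2(count * n / (left[a] * right[b]))
    return info
```

Plotting `mutual_information(seq, k)` against k is one way to look for periodicities or dependencies at particular distances; on short sequences the plug-in estimate is biased upward, so any peak should be compared against shuffled controls.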
Affiliation(s)
- Dirk Holste
- Department of Biology, Massachusetts Institute of Technology, Cambridge 02139, USA.
11
Abstract
The isochore concept in the human genome sequence was challenged in an analysis by the International Human Genome Sequencing Consortium (IHGSC). We argue here that a statement in the IHGSC's analysis concerning the existence of isochores is misleading, because the homogeneity was not examined at a large enough length scale and consequently an inappropriate statistical test was applied. A test of the existence of isochores should be equivalent to a test of homogeneity or equality of windowed GC%. The statistical test applied in the IHGSC's analysis, the binomial test, is a test of whether individual bases are independent and identically-distributed (iid). For testing the existence of isochores, or homogeneity in windowed GC%, we propose to use another statistical test: the analysis of variance (ANOVA). It can be shown that DNA sequences that are rejected by the binomial test may not be rejected by the ANOVA test.
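The ANOVA-based homogeneity test proposed in this abstract can be illustrated with a short sketch: GC% is sampled in non-overlapping subwindows, the subwindows are grouped into larger windows, and a one-way F statistic compares between-window and within-window variance. The helper names (`windowed_gc`, `chunk`, `anova_f`) and the grouping scheme are assumptions made for illustration, and the sketch stops at the F statistic rather than computing a p-value.

```python
import math


def gc_fraction(seq: str) -> float:
    """Fraction of G or C bases in a sequence."""
    return sum(base in "GCgc" for base in seq) / len(seq)


def windowed_gc(seq: str, size: int) -> list:
    """GC fraction in consecutive non-overlapping windows of a given size."""
    return [gc_fraction(seq[i:i + size])
            for i in range(0, len(seq) - size + 1, size)]


def chunk(values: list, m: int) -> list:
    """Group consecutive values into groups of exactly m items."""
    return [values[i:i + m] for i in range(0, len(values) - m + 1, m)]


def anova_f(groups: list) -> float:
    """One-way ANOVA F statistic for equality of group means."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least 2 groups and more values than groups")
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0.0:
        # Degenerate input: no variation inside any group.
        return math.inf if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

A typical call would be `anova_f(chunk(windowed_gc(seq, 1000), 20))`, where the subwindow size of 1000 and group size of 20 are arbitrary choices; the resulting F value would then be compared with an F distribution on (k - 1, n - k) degrees of freedom.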
Affiliation(s)
- Wentian Li
- Center for Genomics and Human Genetics, North Shore LIJ Research Institute, 350 Community Drive, Manhasset, NY 11030, USA.
12
Abstract
We present a coding measure which is based on the statistical properties of the stop codons, and that is able to estimate accurately the variation of coding content along an anonymous sequence. As the stop codons play the same role in all the genomes (with very few exceptions) the measure turns out to be species-independent. We show results both for prokaryotic and for eukaryotic genomes, indicating, first, the accuracy of the measure, and, second, that better prediction is achieved if the measure is applied on homogeneous, isochore-like sequences than if it is applied following the standard moving window approach. Finally, we discuss some of the possible applications of the measure.
Affiliation(s)
- P Carpena
- Departamento de Física Aplicada II, E.T.S.I. de Telecomunicación, Universidad de Málaga, Malaga, Spain.
13
Abstract
Three statistical/mathematical analyses are carried out on isochore sequences: spectral analysis, analysis of variance, and segmentation analysis. Spectral analysis shows that there are GC content fluctuations at different length scales in isochore sequences. The analysis of variance shows that the null hypothesis (the mean value of a group of GC contents remains the same along the sequence) may or may not be rejected for an isochore sequence, depending on the subwindow sizes at which GC contents are sampled, and the window size within which group members are defined. The segmentation analysis shows that there are stronger indications of GC content changes at isochore borders than within an isochore. These analyses support the notion of isochore sequences, but reject the assumption that isochore sequences are homogeneous at the base level. An isochore sequence may pass a homogeneity test when GC content fluctuations at smaller length scales are ignored or averaged out.
Affiliation(s)
- Wentian Li
- Center for Genomics and Human Genetics, North Shore-LIJ Research Institute, 350 Community Drive, Manhasset, NY 11030, USA.
14
Li W, Bernaola-Galván P, Haghighi F, Grosse I. Applications of recursive segmentation to the analysis of DNA sequences. Comput Chem 2002; 26:491-510. [PMID: 12144178 DOI: 10.1016/s0097-8485(02)00010-4] [Citation(s) in RCA: 64] [Impact Index Per Article: 2.9] [Indexed: 11/23/2022]
Abstract
Recursive segmentation is a procedure that partitions a DNA sequence into domains with a homogeneous composition of the four nucleotides A, C, G and T. This procedure can also be applied to any sequence converted from a DNA sequence, such as to a binary strong(G + C)/weak(A + T) sequence, to a binary sequence indicating the presence or absence of the dinucleotide CpG, or to a sequence indicating both the base and the codon position information. We apply various conversion schemes in order to address the following five DNA sequence analysis problems: isochore mapping, CpG island detection, locating the origin and terminus of replication in bacterial genomes, finding complex repeats in telomere sequences, and delineating coding and noncoding regions. We find that the recursive segmentation procedure can successfully detect isochore borders, CpG islands, and the origin and terminus of replication, but it needs improvement for detecting complex repeats as well as borders between coding and noncoding regions.
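Recursive segmentation as described in this abstract can be sketched as follows, assuming the split criterion is the Jensen-Shannon divergence between the symbol compositions on either side of a candidate cut. The abstract does not state a stopping rule, so the fixed `threshold` and `min_len` parameters here are placeholders chosen for illustration, not the criterion used in the cited work.

```python
import math
from collections import Counter


def entropy(counts: Counter, total: int) -> float:
    """Shannon entropy, in bits, of a symbol count table."""
    return -sum(c / total * math.log2(c / total)
                for c in counts.values() if c)


def best_split(seq: str):
    """Return (position, divergence) of the cut maximizing the JS divergence."""
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must have at least 2 symbols")
    total = Counter(seq)
    h_all = entropy(total, n)
    left = Counter()
    best_i, best_d = 1, -1.0
    for i in range(1, n):
        left[seq[i - 1]] += 1
        right = total - left  # counts for seq[i:]
        d = (h_all
             - (i / n) * entropy(left, i)
             - ((n - i) / n) * entropy(right, n - i))
        if d > best_d:
            best_i, best_d = i, d
    return best_i, best_d


def segment(seq: str, threshold: float = 0.1, min_len: int = 2,
            offset: int = 0) -> list:
    """Recursively cut seq; return sorted cut positions in original coordinates."""
    if len(seq) < 2 * min_len:
        return []
    i, d = best_split(seq)
    # Hypothetical stopping rule: stop when the best cut is too weak or too short.
    if d < threshold or i < min_len or len(seq) - i < min_len:
        return []
    return (segment(seq[:i], threshold, min_len, offset)
            + [offset + i]
            + segment(seq[i:], threshold, min_len, offset + i))
```

The same functions work on converted sequences, for example a binary strong/weak string produced with `seq.translate(str.maketrans("ACGT", "WSSW"))`. This version rescans each segment in full and is quadratic in the worst case, so it suits short illustrative inputs rather than whole chromosomes.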
Affiliation(s)
- Wentian Li
- Center for Genomics and Human Genetics, North Shore-LIJ Research Institute, Manhasset, NY 11030, USA.
15
Abstract
In a DNA sequence that exhibits long-range correlations, standard deviations among the GC levels of its segments can be up to an order of magnitude higher than in a sequence consisting of independent, identically distributed nucleotides. Conversely, plots of inter-segment standard deviations vs. segment length reveal quantitative information about the correlations present in a sequence. We present and discuss formulae that relate long-range (power-law) correlations between the nucleotides of a sequence to the expected standard deviations of the GC levels of its segments, and to the correlations between them.
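The comparison in this abstract, observed standard deviation of segment GC levels against the value expected for independent, identically distributed nucleotides, can be sketched as below. Only the i.i.d. baseline sqrt(p(1 - p)/n) is used; the paper's formulae for power-law correlations are not reproduced here, and the function names are illustrative assumptions.

```python
import math
import statistics


def gc_levels(seq: str, length: int) -> list:
    """GC fraction of consecutive non-overlapping segments of a given length."""
    return [sum(b in "GCgc" for b in seq[i:i + length]) / length
            for i in range(0, len(seq) - length + 1, length)]


def gc_std_profile(seq: str, lengths: list) -> list:
    """For each segment length n, return (n, observed std, i.i.d. expected std)."""
    p = sum(b in "GCgc" for b in seq) / len(seq)
    rows = []
    for n in lengths:
        levels = gc_levels(seq, n)
        if len(levels) < 2:
            continue  # not enough segments to estimate a spread
        observed = statistics.pstdev(levels)
        # Binomial expectation for n independent bases with GC probability p.
        expected = math.sqrt(p * (1 - p) / n)
        rows.append((n, observed, expected))
    return rows
```

Plotting the observed column against segment length on log-log axes, next to the expected column, gives the kind of standard-deviation-versus-length plot the abstract refers to; a slower decay than n^(-1/2) would point to correlations or mosaicism.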
Affiliation(s)
- O Clay
- Laboratory of Molecular Evolution, Stazione Zoologica Anton Dohrn, Villa Comunale, 80121, Naples, Italy.
16
Clay O, Bernardi G. Compositional heterogeneity within and among isochores in mammalian genomes. II. Some general comments. Gene 2001; 276:25-31. [PMID: 11591468 DOI: 10.1016/s0378-1119(01)00668-0] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.7] [Indexed: 11/30/2022]
Abstract
The presence of long-range correlations and/or mosaicism in DNA sequences results in GC fluctuations, even within individual isochores, that are much larger than expected for correlation-free 'random' sequences. Neglecting the presence of such fluctuations can lead to incorrect conclusions regarding relative homogeneity or isochore borders. In this commentary, we address these and other methodological issues raised by the variations in GC level within human isochores. We also discuss some recent misconceptions.
Affiliation(s)
- O Clay
- Laboratory of Molecular Evolution, Stazione Zoologica Anton Dohrn, Villa Comunale, 80121, Naples, Italy
17
Abstract
A few months ago the International Human Genome Sequencing Consortium (IHGSC) published a 61-page paper on the human genome (IHGSC, Nature 409 (2001) 860). Here comments will be presented on some points of the paper that were previously investigated in our laboratory, and some misunderstandings and misconceptions about the organization and the evolutionary history of the human genome will be discussed. A very recent article on the same subject (Eyre-Walker and Hurst, Nat. Rev. Genet. 2 (2001) 549) will also be addressed. The present paper is a complement to two review articles which were published last year (Bernardi, Gene 241 (2000) 3; Gene 259(1) (2000) 31).
Affiliation(s)
- G Bernardi
- Laboratory of Molecular Evolution, Stazione Zoologica Anton Dohrn, Villa Comunale, 80121, Naples, Italy.