1
dos Santos A, Campagnari F, Krepischi ACV, Ribeiro Câmara MDL, de Arruda Brasil RDCE, Vieira L, Vianna-Morgante AM, Otto PA, Pearson PL, Rosenberg C. Insight into the mechanisms and consequences of recurrent telomere capture associated with a sub-telomeric deletion. Chromosome Res 2018; 26:191-198. [DOI: 10.1007/s10577-018-9578-z] [Received: 02/19/2018] [Revised: 04/18/2018] [Accepted: 04/19/2018]
2
Bernardi G. Genome Organization and Chromosome Architecture. Cold Spring Harbor Symposia on Quantitative Biology 2016; 80:83-91. [PMID: 26801160 DOI: 10.1101/sqb.2015.80.027318]
Abstract
How the same DNA sequences can function in the three-dimensional architecture of the interphase nucleus, fold into the very compact structure of metaphase chromosomes, and return precisely to the original interphase architecture in the following cell cycle remains an unresolved question. The solution presented here rests on the correlations found between the isochore organization of the genome and the architecture of chromosomes from interphase to metaphase. The key points are the following: (1) the transition from the looped domains and subdomains of interphase chromatin to the 30-nm fiber loops of early prophase chromosomes goes through their unfolding into an extended chromatin structure (probably a 10-nm "beads-on-a-string" structure); (2) the architectural proteins of interphase chromatin, such as CTCF and cohesin subunits, are retained in mitosis and are part of the discontinuous protein scaffold of mitotic chromosomes; and (3) the conservation of the link between architectural proteins and their binding sites on DNA through the cell cycle explains the reversibility of the interphase-to-mitosis process and the "mitotic memory" of interphase architecture.
Affiliation(s)
- Giorgio Bernardi
- Science Department, Roma Tre University, 00146 Rome, Italy
- Stazione Zoologica Anton Dohrn, 80121 Naples, Italy
3
Abstract
How the same DNA sequences can function in the three-dimensional architecture of the interphase nucleus, fold into the very compact structure of metaphase chromosomes, and return precisely to the original interphase architecture in the following cell cycle remains an unresolved question. The strategy used to address this issue was to analyze the correlations between chromosome architecture and the compositional patterns of DNA sequences spanning a size range from a few hundred to a few thousand kilobases. This critical range encompasses isochores, interphase chromatin domains and boundaries, and chromosomal bands. The solution rests on the following key points: 1) the transition from the looped domains and sub-domains of interphase chromatin to the 30-nm fiber loops of early prophase chromosomes goes through unfolding into an extended chromatin structure (probably a 10-nm "beads-on-a-string" structure); 2) the architectural proteins of interphase chromatin, such as CTCF and cohesin sub-units, are retained in mitosis and are part of the discontinuous protein scaffold of mitotic chromosomes; 3) the conservation of the link between architectural proteins and their binding sites on DNA through the cell cycle explains the "mitotic memory" of interphase architecture and the reversibility of the interphase-to-mitosis process. The results also lead to a general conclusion: correlations exist between the isochore organization of the genome and the architecture of chromosomes from interphase to metaphase.
Affiliation(s)
- Giorgio Bernardi
- Science Department, Roma Tre University, Marconi, Rome, Italy
- Stazione Zoologica Anton Dohrn, Villa Comunale, Naples, Italy
4
Cozzi P, Milanesi L, Bernardi G. Segmenting the Human Genome into Isochores. Evol Bioinform Online 2015; 11:253-61. [PMID: 26640363 PMCID: PMC4662427 DOI: 10.4137/ebo.s27693] [Received: 04/15/2015] [Revised: 08/25/2015] [Accepted: 08/31/2015]
Abstract
The human genome is a mosaic of isochores, which are long (>200 kb) DNA sequences that are fairly homogeneous in base composition and can be assigned to five families that together span a GC range of 33%–59%. Although the compartmentalized organization of the mammalian genome has been investigated for more than 40 years, no satisfactory automatic procedure for segmenting the genome into isochores has been available so far. We present a critical discussion of the currently available methods and a new approach, called isoSegmenter, which segments the genome into isochores in a fast and completely automatic manner. This approach relies on two types of experimentally defined parameters: the compositional boundaries of isochore families and an optimal window size of 100 kb. The approach represents an improvement over existing methods, is ideally suited for investigating long-range features of sequenced and assembled genomes, and is publicly available at https://github.com/bunop/isoSegmenter.
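The window-based idea can be illustrated with a minimal Python sketch. It is not isoSegmenter's actual algorithm: it simply cuts a sequence into non-overlapping windows (100 kb by default), assigns each window to a family using assumed GC boundaries (L1 < 37%, L2 37–41%, H1 41–46%, H2 46–53%, H3 ≥ 53%, values commonly cited for human isochore families), and merges runs of same-family windows. All function names are hypothetical.

```python
from itertools import groupby

# Assumed upper GC bounds (%) for L1, L2, H1, H2; anything higher is H3.
FAMILY_BOUNDS = [("L1", 37.0), ("L2", 41.0), ("H1", 46.0), ("H2", 53.0)]


def gc_percent(seq):
    """GC % over unambiguous bases (A, C, G, T); None if the window has none."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return None
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def classify(gc):
    """Map a GC percentage to an isochore family label."""
    for family, upper in FAMILY_BOUNDS:
        if gc < upper:
            return family
    return "H3"


def segment(seq, window=100_000):
    """Label non-overlapping windows, then merge runs with the same label."""
    windows = []
    for start in range(0, len(seq), window):
        end = min(start + window, len(seq))
        gc = gc_percent(seq[start:end])
        windows.append((start, end, "gap" if gc is None else classify(gc)))
    segments = []
    for family, run in groupby(windows, key=lambda w: w[2]):
        run = list(run)
        segments.append((run[0][0], run[-1][1], family))
    return segments


# Toy usage with 100-bp windows instead of 100 kb.
print(segment("A" * 200 + "G" * 100, window=100))  # [(0, 200, 'L1'), (200, 300, 'H3')]
```

A real segmenter also has to handle assembly gaps, short fragments and boundary refinement; this sketch only labels windows without any A/C/G/T bases as "gap".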
Affiliation(s)
- Paolo Cozzi
- National Research Council, Institute for Biomedical Technologies, Segrate, Milan, Italy; Parco Tecnologico Padano, Lodi, Italy
- Luciano Milanesi
- National Research Council, Institute for Biomedical Technologies, Segrate, Milan, Italy
- Giorgio Bernardi
- National Research Council, Institute for Biomedical Technologies, Segrate, Milan, Italy; Science Department, Rome 3 University, Rome, Italy
5
Lee WP, Stromberg MP, Ward A, Stewart C, Garrison EP, Marth GT. MOSAIK: a hash-based algorithm for accurate next-generation sequencing short-read mapping. PLoS One 2014; 9:e90581. [PMID: 24599324 PMCID: PMC3944147 DOI: 10.1371/journal.pone.0090581] [Received: 11/12/2013] [Accepted: 01/31/2014]
Abstract
MOSAIK is a stable, sensitive, open-source program for mapping second- and third-generation sequencing reads to a reference genome. Uniquely among current mapping tools, MOSAIK can align reads generated by all the major sequencing technologies, including Illumina, Applied Biosystems SOLiD, Roche 454, Ion Torrent and Pacific Biosciences SMRT. Indeed, MOSAIK was the only aligner to provide consistent mappings for all the generated data (sequencing technologies, low-coverage and exome) in the 1000 Genomes Project. To provide highly accurate alignments, MOSAIK employs a hash clustering strategy coupled with the Smith-Waterman algorithm, which is well suited to capturing mismatches as well as short insertions and deletions. To support the growing interest in discovering larger structural variants (SVs), MOSAIK explicitly handles known-sequence SVs, e.g., mobile element insertions (MEIs), and generates outputs tailored to aid SV discovery. All variant discovery benefits from an accurate description of read placement confidence. To this end, MOSAIK uses a neural-network-based training scheme to provide well-calibrated mapping quality scores, demonstrated by a correlation coefficient greater than 0.98 between MOSAIK-assigned and actual mapping qualities. To support studies of any genome, a training pipeline is provided to optimize mapping quality scores for the genome under investigation. MOSAIK is multi-threaded, open source, and incorporated into our command and pipeline launcher system GKNO (http://gkno.me).
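The seed-then-extend pattern described in this abstract (hash lookup of short k-mers, clustering of hits, then Smith-Waterman alignment) can be sketched in a few lines of Python. This is an illustrative toy, not MOSAIK's implementation: the k-mer size, the scoring values and the diagonal-voting step used as a stand-in for hash clustering are all assumptions, and mapping-quality calibration is not modelled.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Hash table from each k-mer to the reference positions where it occurs."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def best_diagonal(read, index, k):
    """Vote for diagonals (ref_pos - read_pos); the most-voted one is the candidate locus."""
    votes = defaultdict(int)
    for j in range(len(read) - k + 1):
        for i in index.get(read[j:j + k], ()):
            votes[i - j] += 1
    return max(votes, key=votes.get) if votes else None


def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def map_read(read, reference, k=4, pad=5):
    """Seed with the hash index, then align the read to a padded reference window."""
    index = build_kmer_index(reference, k)
    diag = best_diagonal(read, index, k)
    if diag is None:
        return None
    start = max(0, diag - pad)
    end = min(len(reference), diag + len(read) + pad)
    return start, smith_waterman(read, reference[start:end])


reference = "G" * 10 + "ACGTTGCAAGCTT" + "G" * 10
print(map_read("ACGTTGCAAGCTT", reference))  # (5, 26)
```

The point of the hash step is that only a small padded window is passed to the quadratic Smith-Waterman routine, rather than the whole reference.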
Affiliation(s)
- Wan-Ping Lee
- Department of Biology, Boston College, Chestnut Hill, Massachusetts, United States of America
- Michael P. Stromberg
- Department of Biology, Boston College, Chestnut Hill, Massachusetts, United States of America
- Alistair Ward
- Department of Biology, Boston College, Chestnut Hill, Massachusetts, United States of America
- Chip Stewart
- Department of Biology, Boston College, Chestnut Hill, Massachusetts, United States of America
- Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Erik P. Garrison
- Department of Biology, Boston College, Chestnut Hill, Massachusetts, United States of America
- Gabor T. Marth
- Department of Biology, Boston College, Chestnut Hill, Massachusetts, United States of America
6
Lee WP, Stromberg MP, Ward A, Stewart C, Garrison EP, Marth GT. MOSAIK: a hash-based algorithm for accurate next-generation sequencing short-read mapping. PLoS One 2014; 9:e90581. [PMID: 24599324 DOI: 10.1371/journal.pone.009058] [Received: 11/12/2013] [Accepted: 01/31/2014]
Affiliation(s)
- Wan-Ping Lee
- Department of Biology, Boston College, Chestnut Hill, Massachusetts, United States of America
- Michael P Stromberg
- Department of Biology, Boston College, Chestnut Hill, Massachusetts, United States of America
- Alistair Ward
- Department of Biology, Boston College, Chestnut Hill, Massachusetts, United States of America
- Chip Stewart
- Department of Biology, Boston College, Chestnut Hill, Massachusetts, United States of America; Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Erik P Garrison
- Department of Biology, Boston College, Chestnut Hill, Massachusetts, United States of America
- Gabor T Marth
- Department of Biology, Boston College, Chestnut Hill, Massachusetts, United States of America
7
Abstract
The genomes of eukaryotes are mosaics of isochores. These are long DNA stretches that are fairly homogeneous in base composition and that belong to a small number of families characterized by different GC/AT ratios and different short-sequence patterns (i.e., different DNA structures that interact with different proteins). This genome organization led to two discoveries: (1) the genomic code, which refers to two correlations, one between the composition of coding and contiguous noncoding sequences and one between coding sequences and the structural properties of the encoded proteins; and (2) the genome phenotypes, which correspond to the patterns of isochore families in the genomes. These patterns indicate that genome evolution may proceed either according to a conservative mode or according to a transitional (isochore-shifting) mode, apparently depending on whether the environment is constant or shifting. According to the neoselectionist theory, natural selection is responsible for both modes.
8
Arhondakis S, Auletta F, Bernardi G. Isochores and the regulation of gene expression in the human genome. Genome Biol Evol 2012; 3:1080-9. [PMID: 21979159 PMCID: PMC3227402 DOI: 10.1093/gbe/evr017]
Abstract
It is well established that changes in phenotype depend much more on changes in gene expression than on changes in protein-coding genes, and that cis-regulatory sequences and chromatin structure are two major factors influencing gene expression. Here, we investigated these factors at the genome-wide level by focusing on the trinucleotide patterns in the 0.1- to 25-kb regions flanking the human genes located in the GC-poorest (L1) and GC-richest (H3) isochore families, the other families exhibiting intermediate patterns. We show 1) that the trinucleotide patterns of the 25-kb gene-flanking regions are representative of the very different patterns already reported for whole isochores of the L1 and H3 families and, as expected, identical in upstream and downstream locations; 2) that the patterns of the 0.1- to 0.5-kb regions in the L1 and H3 isochores are markedly more divergent and more specific than those of the 25-kb regions, as well as different in upstream and downstream locations; and 3) that these patterns fade into the 25-kb patterns at around 5 kb in both upstream and downstream locations. The 25-kb findings indicate differences in nucleosome positioning and density among isochore families; the 0.1- to 0.5-kb findings indicate differences in the transcription factors that bind upstream and downstream of genes. These results point to differences in the regulation of genes located in different isochore families, a point of functional and evolutionary relevance.
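Comparing trinucleotide patterns between flanking regions comes down to counting overlapping 3-mers and comparing the resulting frequency vectors. The Python sketch below assumes Euclidean distance as the comparison metric and a plus-strand gene for the flank coordinates; the statistics used in the study may differ, and the helper names are hypothetical.

```python
import math
from itertools import product

# All 64 trinucleotides in a fixed order.
TRINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=3)]


def trinucleotide_profile(seq):
    """Relative frequencies of overlapping trinucleotides (ones containing N etc. are skipped)."""
    seq = seq.upper()
    counts = dict.fromkeys(TRINUCLEOTIDES, 0)
    for i in range(len(seq) - 2):
        tri = seq[i:i + 3]
        if tri in counts:
            counts[tri] += 1
    total = sum(counts.values())
    return {t: (c / total if total else 0.0) for t, c in counts.items()}


def profile_distance(p, q):
    """Euclidean distance between two profiles (an assumed choice of metric)."""
    return math.sqrt(sum((p[t] - q[t]) ** 2 for t in TRINUCLEOTIDES))


def flank(seq, gene_start, gene_end, size, upstream=True):
    """0-based flank of `size` bp next to a gene; assumes the gene is on the plus strand."""
    if upstream:
        return seq[max(0, gene_start - size):gene_start]
    return seq[gene_end:gene_end + size]
```

With real data, one would compare, for example, `trinucleotide_profile(flank(chrom, s, e, 500))` against the 25-kb flank profile for the same gene, and average such distances over the genes of each isochore family.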
Affiliation(s)
- Stilianos Arhondakis
- Bioinformatics and Medical Informatics Team, Biomedical Research Foundation of the Academy of Athens, Athens, Greece
9
Abstract
Alu elements are primate-specific repeats and comprise 11% of the human genome. They have wide-ranging influences on gene expression. Their contribution to genome evolution, gene regulation and disease is reviewed.
10
Costantini M, Auletta F, Bernardi G. The distributions of "new" and "old" Alu sequences in the human genome: the solution of a "mystery". Mol Biol Evol 2011; 29:421-7. [PMID: 22057813 DOI: 10.1093/molbev/msr242]
Abstract
The distribution in the human genome of the largest family of mobile elements, the Alu sequences, has been investigated for the past 30 years, and the vast majority of Alu sequences were shown to have the highest density in GC-rich isochores. Ten years ago, however, it was discovered that the small "youngest" (most recently transposed) Alu families had a strikingly different distribution from the "old" families. This raised the question of how this change took place in evolution. We solved what was considered to be a "mystery" by 1) revisiting our previous results on the integration and stability of retroviral sequences, and 2) assessing the densities of TTTT/AA acceptor sites in isochore families. We conclude 1) that the open state of chromatin plays a crucial role in allowing the initial integration not only of retroviral sequences but also of the youngest Alu sequences, and 2) that the distribution of old Alus can be explained by Alu sequences being unstable in the GC-poor isochores but stable in the compositionally matching GC-rich isochores, again in line with what happens with retroviral sequences.
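As a rough illustration of measuring acceptor-site density, the sketch below counts overlapping occurrences of the hexamer TTTTAA on both strands and reports hits per kb. Reading "TTTT/AA" as TTTT followed by AA, with "/" marking the cleavage point, is an assumption. Computing this per window and grouping windows by GC level would give densities by isochore family.

```python
def count_overlapping(seq, motif):
    """Count occurrences of motif in seq, allowing overlaps."""
    count, start = 0, seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count


def reverse_complement(seq):
    """Reverse complement of an A/C/G/T string."""
    return seq[::-1].translate(str.maketrans("ACGT", "TGCA"))


def acceptor_density(seq, motif="TTTTAA"):
    """Motif hits per kb on both strands (a palindromic motif is counted once)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    hits = count_overlapping(seq, motif)
    rc = reverse_complement(motif)  # TTAAAA for the default motif
    if rc != motif:
        hits += count_overlapping(seq, rc)
    return 1000.0 * hits / len(seq)
```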
Affiliation(s)
- Maria Costantini
- Laboratory of Cellular and Developmental Biology, Stazione Zoologica Anton Dohrn, Naples, Italy
11
Ananda G, Chiaromonte F, Makova KD. A genome-wide view of mutation rate co-variation using multivariate analyses. Genome Biol 2011; 12:R27. [PMID: 21426544 PMCID: PMC3129677 DOI: 10.1186/gb-2011-12-3-r27] [Received: 12/07/2010] [Revised: 02/21/2011] [Accepted: 03/22/2011]
Abstract
Background: While the abundance of available sequenced genomes has led to many studies of regional heterogeneity in mutation rates, the co-variation among rates of different mutation types remains largely unexplored, hindering a deeper understanding of mutagenesis and genome dynamics. Here, utilizing primate and rodent genomic alignments, we apply two multivariate analysis techniques (principal components and canonical correlations) to investigate the structure of rate co-variation for four mutation types and simultaneously explore the associations with multiple genomic features at different genomic scales and phylogenetic distances.
Results: We observe a consistent, largely linear co-variation among rates of nucleotide substitutions, small insertions and small deletions, with some non-linear associations detected among these rates on chromosome X and near autosomal telomeres. This co-variation appears to be shaped by a common set of genomic features, some previously investigated and some novel to this study (nuclear lamina binding sites, methylated non-CpG sites and nucleosome-free regions). Strong non-linear relationships are also detected among genomic features near the centromeres of large chromosomes. Microsatellite mutability co-varies with other mutation rates at finer scales, but not at 1 Mb, and shows varying degrees of association with genomic features at different scales.
Conclusions: Our results allow us to speculate about the role of different molecular mechanisms, such as replication, recombination, repair and local chromatin environment, in mutagenesis. The software tools developed for our analyses are available through Galaxy, an open-source genomics portal, to facilitate the use of multivariate techniques in future large-scale genomics studies.
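Principal components analysis of rate co-variation can be sketched without external libraries: standardize a windows-by-mutation-types matrix, form its correlation matrix, and extract the leading eigenvector by power iteration. This is a generic PCA sketch, not the authors' Galaxy tools; canonical correlation analysis is not shown, and the input layout (rows = genomic windows, columns = rates) and the example values are assumptions.

```python
import math


def standardize(rows):
    """Z-score each column (assumes every column varies)."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((x - mu) ** 2 for x in c) / (n - 1)) for c, mu in zip(cols, means)]
    return [[(x - mu) / sd for x, mu, sd in zip(row, means, sds)] for row in rows]


def correlation_matrix(z):
    """Correlation matrix of already-standardized data."""
    n, m = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(m)] for i in range(m)]


def first_component(corr, iters=500):
    """Leading eigenvector by power iteration; returns (variance fraction, loadings)."""
    m = len(corr)
    v = [1.0] * m
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(corr[i][j] * v[j] for j in range(m)) for i in range(m))
    # The trace of a correlation matrix equals m, the total standardized variance.
    return eigenvalue / m, v


# Hypothetical per-window rates: substitutions, small insertions, small deletions.
rates = [[1.0, 0.10, 0.12], [2.0, 0.21, 0.22], [3.0, 0.29, 0.33], [4.0, 0.41, 0.40]]
fraction, loadings = first_component(correlation_matrix(standardize(rates)))
```

A first component that explains most of the variance, with loadings of the same sign on every rate, is what "largely linear co-variation" looks like in this representation.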
Affiliation(s)
- Guruprasad Ananda
- Center for Medical Genomics, Penn State University, University Park, PA 16802, USA
12
Abstract
Gene duplications represent an important class of evolutionary events that is likely to have contributed to the unique human phenotype in the short evolutionary time since the human-chimpanzee divergence. With both human and chimpanzee genome drafts available in high-coverage re-sequencing assemblies, and with most human genes annotated to high quality, it should now be possible to identify all human lineage-specific gene duplication events (human inparalogues), and a few pioneering studies have attempted to do so. However, the different levels of coverage of the human and chimpanzee genome assemblies, and the differing levels of gene annotation, have led to problematic assumptions and oversimplifications in the algorithms and datasets used to detect human lineage-specific gene duplications. In this study, we developed a set of bioinformatic tools to overcome a number of conceptual problems prevalent in previous studies and collected a reliable and representative set of human inparalogues.
Affiliation(s)
- Yuval Itan
- Research Department of Genetics, Evolution and Environment, University College London, UK.
13
Goldmann R, Tichý L, Freiberger T, Zapletalová P, Letocha O, Soska V, Fajkus J, Fajkusová L. Genomic characterization of large rearrangements of the LDLR gene in Czech patients with familial hypercholesterolemia. BMC Medical Genetics 2010; 11:115. [PMID: 20663204 PMCID: PMC2923121 DOI: 10.1186/1471-2350-11-115] [Received: 01/12/2010] [Accepted: 07/27/2010]
Abstract
Background: Mutations in the LDLR gene are the most frequent cause of familial hypercholesterolemia (FH), an autosomal dominant disease characterised by elevated concentrations of LDL in blood plasma. In many populations, large genomic rearrangements account for approximately 10% of mutations in the LDLR gene.
Methods: DNA diagnostics of large genomic rearrangements was based on Multiplex Ligation-dependent Probe Amplification (MLPA). Deletion and duplication breakpoints were subsequently analysed using long-range PCR, PCR, and DNA sequencing.
Results: In a set of 1441 unrelated FH patients, large genomic rearrangements were found in 37 probands. Eight different types of rearrangements were detected, six of which had not been described before. For all rearrangements, we characterized the exact extent and the breakpoint sequences.
Conclusions: Sequence analysis of deletion and duplication breakpoints indicates that intrachromatid non-allelic homologous recombination (NAHR) between Alu elements is involved in 6 events, while non-homologous end joining (NHEJ) is implicated in 2 rearrangements. Our study thus describes, for the first time, NHEJ as a mechanism involved in genomic rearrangements in the LDLR gene.
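MLPA copy-number calls are commonly made from dosage quotients: the probe signal normalized to reference probes in the patient, divided by the same ratio in a control. The sketch below assumes that normalization and typical cutoffs (below 0.7 for a deletion, above 1.3 for a duplication). Probe names, peak heights and thresholds are hypothetical (in practice they are kit- and laboratory-specific), and this is not the protocol used in the study.

```python
from statistics import mean


def dosage_quotients(patient, control, reference_probes):
    """Per-probe dosage quotient: (patient probe / patient reference mean) divided by
    (control probe / control reference mean). Inputs map probe name -> peak height."""
    patient_ref = mean(patient[p] for p in reference_probes)
    control_ref = mean(control[p] for p in reference_probes)
    return {
        probe: (patient[probe] / patient_ref) / (control[probe] / control_ref)
        for probe in patient
        if probe not in reference_probes
    }


def call_copy_number(dq, del_cutoff=0.7, dup_cutoff=1.3):
    """Classify a dosage quotient; ~0.5 suggests a heterozygous deletion, ~1.5 a duplication."""
    if dq < del_cutoff:
        return "deletion"
    if dq > dup_cutoff:
        return "duplication"
    return "normal"


# Hypothetical peak heights for two reference probes and three LDLR exon probes.
patient = {"ref1": 100, "ref2": 100, "LDLR_ex4": 50, "LDLR_ex5": 150, "LDLR_ex6": 100}
control = {"ref1": 100, "ref2": 100, "LDLR_ex4": 100, "LDLR_ex5": 100, "LDLR_ex6": 100}
dq = dosage_quotients(patient, control, {"ref1", "ref2"})
print({probe: call_copy_number(q) for probe, q in dq.items()})
# {'LDLR_ex4': 'deletion', 'LDLR_ex5': 'duplication', 'LDLR_ex6': 'normal'}
```

MLPA only flags which exons change in dosage; as the abstract notes, the exact breakpoints still have to be resolved by long-range PCR and sequencing.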
Affiliation(s)
- Radan Goldmann
- University Hospital Brno, Centre of Molecular Biology and Gene Therapy, Cernopolní 9, CZ-62500 Brno, Czech Republic
14
Abstract
Uncovering general principles of genome evolution that are time-invariant and that operate in germ and somatic cells has implications for genome-wide association studies (GWAS), gene therapy, and disease genomics. Here we investigate the relationship between structural alterations (e.g., insertions and deletions) and single-nucleotide substitutions by comparing the following genomes, which diverged over different time scales across germ- and somatic-cell lineages: (i) the reference human and chimpanzee genomes (millions of years), (ii) the reference human and personal genomes (tens of thousands of years), and (iii) structurally altered regions in cancer and genetically engineered cells (days). At the species level, genes with structural alterations in nearby regions show more single-nucleotide changes and tend to evolve faster. In personal genomes, the single-nucleotide substitution rate is higher near sites of structural alteration and decreases with increasing distance. In human cancer cell populations and in cells genetically engineered using zinc-finger nucleases, single-nucleotide changes occur frequently near sites of structural alteration. We present evidence that structural alteration induces single-nucleotide changes in nearby regions and discuss possible molecular mechanisms that contribute to this phenomenon. We propose that the low fidelity of nonreplicative, error-prone repair polymerases, which are used during insertion or deletion, results in break-repair-induced single-nucleotide mutations in the vicinity of structural alterations. Thus, in the mutational landscape, structural alterations are linked to single-nucleotide changes across different time scales in both somatic- and germ-cell lineages. We discuss implications for genome evolution, GWAS, disease genomics, and gene therapy and emphasize the need to investigate both types of mutations within a single framework.
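The observation that substitution rates fall off with distance from a structural alteration can be quantified by binning positions by their distance to the nearest breakpoint and dividing SNV counts by the number of bases in each bin. The Python sketch below is a toy version that assumes every base is equally callable and at least one breakpoint exists; it is not the authors' pipeline, and the function names are hypothetical.

```python
from bisect import bisect_left


def nearest_distance(pos, breakpoints):
    """Distance from pos to the closest breakpoint (breakpoints sorted, non-empty)."""
    i = bisect_left(breakpoints, pos)
    candidates = []
    if i < len(breakpoints):
        candidates.append(breakpoints[i] - pos)
    if i > 0:
        candidates.append(pos - breakpoints[i - 1])
    return min(candidates)


def snv_rate_by_distance(snv_positions, breakpoints, seq_len, bin_size, n_bins):
    """SNVs per base in distance bins [0, bin_size), [bin_size, 2 * bin_size), ..."""
    bps = sorted(breakpoints)
    snvs = set(snv_positions)
    bases = [0] * n_bins
    hits = [0] * n_bins
    for pos in range(seq_len):
        b = nearest_distance(pos, bps) // bin_size
        if b < n_bins:
            bases[b] += 1
            if pos in snvs:
                hits[b] += 1
    return [h / n if n else 0.0 for h, n in zip(hits, bases)]


print(snv_rate_by_distance([1, 2, 15], [0], seq_len=20, bin_size=10, n_bins=2))  # [0.2, 0.1]
```

Normalizing by bases per bin matters: without it, bins far from breakpoints would look depleted simply because fewer positions fall into them.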
15
Li Y, Pohl E, Boulouiz R, Schraders M, Nürnberg G, Charif M, Admiraal RJ, von Ameln S, Baessmann I, Kandil M, Veltman JA, Nürnberg P, Kubisch C, Barakat A, Kremer H, Wollnik B. Mutations in TPRN cause a progressive form of autosomal-recessive nonsyndromic hearing loss. Am J Hum Genet 2010; 86:479-84. [PMID: 20170898 DOI: 10.1016/j.ajhg.2010.02.003] [Received: 01/14/2010] [Revised: 02/02/2010] [Accepted: 02/04/2010]
Abstract
We performed genome-wide homozygosity mapping in a large consanguineous family from Morocco and mapped the autosomal-recessive nonsyndromic hearing loss (ARNSHL) in this family to the DFNB79 locus on chromosome 9q34. By sequencing 62 positional candidate genes in the critical region, we identified a causative homozygous 11 bp deletion, c.42_52del, in the TPRN gene in all seven affected individuals. The deletion is located in exon 1 and results in a frameshift and premature protein truncation (p.Gly15AlafsX150). Interestingly, the deleted sequence is part of a repetitive, CG-rich motif predicted to be prone to structural aberrations during crossover formation. We identified another family with progressive ARNSHL linked to this locus, whose affected members were shown to carry a causative 1 bp deletion (c.1347delG) in exon 1 of TPRN. The function of the encoded protein, taperin, is unknown, but partial homology to the actin-capping protein phostensin suggests a role in actin dynamics.
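Whether a coding deletion causes a frameshift follows from simple arithmetic: a deletion whose length is not a multiple of 3 shifts the reading frame. The hypothetical helper below parses only simple c. deletion notation of the kind quoted in this abstract (not the full HGVS grammar). It confirms that c.42_52del spans 11 bp and that both reported deletions are frameshifting.

```python
import re

# Simple c. deletions only: a single position or a start_end range, optional deleted bases.
DELETION = re.compile(r"c\.(\d+)(?:_(\d+))?del[ACGT]*")


def deletion_length(hgvs_c):
    """Number of deleted nucleotides, e.g. 11 for 'c.42_52del'."""
    match = DELETION.fullmatch(hgvs_c)
    if match is None:
        raise ValueError(f"unsupported notation: {hgvs_c}")
    start = int(match.group(1))
    end = int(match.group(2) or start)
    return end - start + 1


def is_frameshift(hgvs_c):
    """A coding deletion shifts the reading frame unless its length is a multiple of 3."""
    return deletion_length(hgvs_c) % 3 != 0


for variant in ("c.42_52del", "c.1347delG"):
    print(variant, deletion_length(variant), is_frameshift(variant))
```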