1
Smeds L, Kamali K, Kejnovská I, Kejnovský E, Chiaromonte F, Makova KD. Non-canonical DNA in human and other ape telomere-to-telomere genomes. Nucleic Acids Res 2025; 53:gkaf298. [PMID: 40226919 PMCID: PMC11995269 DOI: 10.1093/nar/gkaf298] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2024] [Revised: 02/28/2025] [Accepted: 04/07/2025] [Indexed: 04/15/2025]
Abstract
Non-canonical (non-B) DNA structures (e.g. bent DNA, hairpins, G-quadruplexes [G4s], and Z-DNA), which form at certain sequence motifs (e.g. A-phased repeats and inverted repeats), have emerged as important regulators of cellular processes and drivers of genome evolution. Yet, they have been understudied due to their repetitive nature and potentially inaccurate sequences generated with short-read technologies. Here we comprehensively characterize such motifs in the long-read telomere-to-telomere (T2T) genomes of human, bonobo, chimpanzee, gorilla, Bornean orangutan, Sumatran orangutan, and siamang. Non-B DNA motifs are enriched at the genomic regions added to T2T assemblies and occupy 9%-15%, 9%-11%, and 12%-38% of autosomes and chromosomes X and Y, respectively. G4s and Z-DNA are enriched at promoters and enhancers, as well as at origins of replication. Repetitive sequences harbor more non-B DNA motifs than non-repetitive sequences, especially in the short arms of acrocentric chromosomes. Most centromeres and/or their flanking regions are enriched in at least one non-B DNA motif type, consistent with a potential role of non-B structures in determining centromeres. Our results highlight the uneven distribution of predicted non-B DNA structures across ape genomes and suggest their novel functions in previously inaccessible genomic regions.
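The occupancy percentages in this abstract can be read as the fraction of chromosome bases covered by annotated motifs. A minimal Python sketch of that calculation for a single motif type is shown below. It assumes a simple regular-expression definition of a G4 motif (four runs of at least three G separated by loops of 1-7 nt, with the C-rich equivalent standing in for motifs on the opposite strand). The patterns and the helper names `find_g4_motifs` and `motif_coverage` are hypothetical illustrations and are not presented as the annotation method of the cited study.

```python
import re

# Illustrative G4 motif definition (assumption): four G-runs of length >= 3
# separated by loops of 1-7 nt. The C-rich pattern is the same motif on the
# reverse strand.
G4_FORWARD = re.compile(r"(?:G{3,}[ACGTN]{1,7}){3}G{3,}")
G4_REVERSE = re.compile(r"(?:C{3,}[ACGTN]{1,7}){3}C{3,}")


def find_g4_motifs(seq: str) -> list[tuple[int, int, str]]:
    """Return (start, end, strand) for G4 motif matches (0-based, half-open).

    re.finditer reports non-overlapping matches only, so nested or
    overlapping motifs on the same strand are collapsed into one hit.
    """
    seq = seq.upper()
    hits = [(m.start(), m.end(), "+") for m in G4_FORWARD.finditer(seq)]
    hits += [(m.start(), m.end(), "-") for m in G4_REVERSE.finditer(seq)]
    return sorted(hits)


def motif_coverage(seq_len: int, intervals: list[tuple[int, int]]) -> float:
    """Fraction of bases covered by the union of half-open intervals."""
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the current merged block: close it, start a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current merged block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / seq_len if seq_len else 0.0
```

For example, `find_g4_motifs("TTGGGAGGGAGGGAGGGTT")` returns `[(2, 17, "+")]`. Merging intervals before summing keeps a base that is covered by motifs on both strands from being counted twice; dedicated motif-finding tools treat overlapping and imperfect motifs more carefully than this sketch.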
Affiliation(s)
- Linnéa Smeds
- Department of Biology, Penn State University, University Park, PA 16802, United States
- Kaivan Kamali
- Department of Biology, Penn State University, University Park, PA 16802, United States
- Iva Kejnovská
- Department of Biophysics of Nucleic Acids, Institute of Biophysics of the Czech Academy of Sciences, Královopolská 135, 612 65 Brno, Czech Republic
- Eduard Kejnovský
- Department of Plant Developmental Genetics, Institute of Biophysics of the Czech Academy of Sciences, Královopolská 135, 612 65 Brno, Czech Republic
- Francesca Chiaromonte
- Department of Statistics, Penn State University, University Park, PA 16802, United States
- Center for Medical Genomics, Penn State University, University Park, PA 16802, United States
- L’EMbeDS, Sant’Anna School of Advanced Studies, 56127 Pisa, Italy
- Kateryna D Makova
- Department of Biology, Penn State University, University Park, PA 16802, United States
- Center for Medical Genomics, Penn State University, University Park, PA 16802, United States
2
Smeds L, Kamali K, Kejnovská I, Kejnovský E, Chiaromonte F, Makova KD. Non-canonical DNA in human and other ape telomere-to-telomere genomes. bioRxiv: The Preprint Server for Biology 2025:2024.09.02.610891. [PMID: 39713403 PMCID: PMC11661062 DOI: 10.1101/2024.09.02.610891] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 12/24/2024]
Abstract
Non-canonical (non-B) DNA structures (e.g., bent DNA, hairpins, G-quadruplexes [G4s], and Z-DNA), which form at certain sequence motifs (e.g., A-phased repeats and inverted repeats), have emerged as important regulators of cellular processes and drivers of genome evolution. Yet, they have been understudied due to their repetitive nature and potentially inaccurate sequences generated with short-read technologies. Here we comprehensively characterize such motifs in the long-read telomere-to-telomere (T2T) genomes of human, bonobo, chimpanzee, gorilla, Bornean orangutan, Sumatran orangutan, and siamang. Non-B DNA motifs are enriched at the genomic regions added to T2T assemblies, and occupy 9-15%, 9-11%, and 12-38% of autosomes, and chromosomes X and Y, respectively. G4s and Z-DNA are enriched at promoters and enhancers, as well as at origins of replication. Repetitive sequences harbor more non-B DNA motifs than non-repetitive sequences, especially in the short arms of acrocentric chromosomes. Most centromeres and/or their flanking regions are enriched in at least one non-B DNA motif type, consistent with a potential role of non-B structures in determining centromeres. Our results highlight the uneven distribution of predicted non-B DNA structures across ape genomes and suggest their novel functions in previously inaccessible genomic regions.
Affiliation(s)
- Linnéa Smeds
- Department of Biology, Penn State University, University Park, PA 16802, USA
- Kaivan Kamali
- Department of Biology, Penn State University, University Park, PA 16802, USA
- Iva Kejnovská
- Department of Biophysics of Nucleic Acids, Institute of Biophysics of the Czech Academy of Sciences, Královopolská 135, 612 65 Brno, Czech Republic
- Eduard Kejnovský
- Department of Plant Developmental Genetics, Institute of Biophysics of the Czech Academy of Sciences, Královopolská 135, 612 65 Brno, Czech Republic
- Francesca Chiaromonte
- Department of Statistics, Penn State University, University Park, PA 16802, USA
- Center for Medical Genomics, Penn State University, University Park, PA 16802, USA
- L'EMbeDS, Sant'Anna School of Advanced Studies, 56127 Pisa, Italy
- Kateryna D Makova
- Department of Biology, Penn State University, University Park, PA 16802, USA
- Center for Medical Genomics, Penn State University, University Park, PA 16802, USA
3
Abstract
Repetitive elements in the human genome, once considered 'junk DNA', are now known to adopt more than a dozen alternative (that is, non-B) DNA structures, such as self-annealed hairpins, left-handed Z-DNA, three-stranded triplexes (H-DNA) or four-stranded guanine quadruplex structures (G4 DNA). These dynamic conformations can act as functional genomic elements involved in DNA replication and transcription, chromatin organization and genome stability. In addition, recent studies have revealed a role for these alternative structures in triggering error-generating DNA repair processes, thereby actively enabling genome plasticity. As a driving force for genetic variation, non-B DNA structures thus contribute to both disease aetiology and evolution.
Affiliation(s)
- Guliang Wang
- Division of Pharmacology and Toxicology, College of Pharmacy, The University of Texas at Austin, Dell Paediatric Research Institute, Austin, TX, USA
- Karen M Vasquez
- Division of Pharmacology and Toxicology, College of Pharmacy, The University of Texas at Austin, Dell Paediatric Research Institute, Austin, TX, USA
4
Makova KD, Weissensteiner MH. Noncanonical DNA structures are drivers of genome evolution. Trends Genet 2023; 39:109-124. [PMID: 36604282 PMCID: PMC9877202 DOI: 10.1016/j.tig.2022.11.005] [Citation(s) in RCA: 50] [Impact Index Per Article: 25.0] [Received: 06/27/2022] [Revised: 11/04/2022] [Accepted: 11/28/2022] [Indexed: 01/05/2023]
Abstract
In addition to the canonical right-handed double helix, other DNA structures, termed 'non-B DNA', can form in the genomes across the tree of life. Non-B DNA regulates multiple cellular processes, including replication and transcription, yet its presence is associated with elevated mutagenicity and genome instability. These discordant cellular roles fuel the enormous potential of non-B DNA to drive genomic and phenotypic evolution. Here we discuss recent studies establishing non-B DNA structures as novel functional elements subject to natural selection, affecting evolution of transposable elements (TEs), and specifying centromeres. By highlighting the contributions of non-B DNA to repeated evolution and adaptation to changing environments, we conclude that evolutionary analyses should include a perspective of not only DNA sequence, but also its structure.
Affiliation(s)
- Kateryna D Makova
- Department of Biology, Penn State University, 310 Wartik Laboratory, University Park, PA 16802, USA
5
Bowater RP, Bohálová N, Brázda V. Interaction of Proteins with Inverted Repeats and Cruciform Structures in Nucleic Acids. Int J Mol Sci 2022; 23:ijms23116171. [PMID: 35682854 PMCID: PMC9180970 DOI: 10.3390/ijms23116171] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 04/29/2022] [Revised: 05/26/2022] [Accepted: 05/30/2022] [Indexed: 01/27/2023]
Abstract
Cruciforms occur when inverted repeat sequences in double-stranded DNA adopt intra-strand hairpins on opposing strands. Biophysical and molecular studies of these structures confirm their characterization as four-way junctions and have demonstrated that several factors influence their stability, including overall chromatin structure and DNA supercoiling. Here, we review our understanding of processes that influence the formation and stability of cruciforms in genomes, covering the range of sequences shown to have biological significance. It is challenging to accurately sequence repetitive DNA sequences, but recent advances in sequencing methods have deepened understanding about the amounts of inverted repeats in genomes from all forms of life. We highlight that, in the majority of genomes, inverted repeats are present in higher numbers than is expected from a random occurrence. It is, therefore, becoming clear that inverted repeats play important roles in regulating many aspects of DNA metabolism, including replication, gene expression, and recombination. Cruciforms are targets for many architectural and regulatory proteins, including topoisomerases, p53, Rif1, and others. Notably, some of these proteins can induce the formation of cruciform structures when they bind to DNA. Inverted repeat sequences also influence the evolution of genomes, and growing evidence highlights their significance in several human diseases, suggesting that the inverted repeat sequences and/or DNA cruciforms could be useful therapeutic targets in some cases.
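To illustrate what counting inverted repeats involves, here is a minimal Python sketch that finds perfect inverted repeats (an arm, a short spacer, and a reverse-complementary arm) by extending outwards from every possible spacer position. The function name `find_inverted_repeats` and its defaults (`min_arm=6`, `max_spacer=3`) are illustrative assumptions. The sketch ignores mismatches and does not test enrichment over random expectation, which the comparison described in the abstract would require.

```python
# Watson-Crick partner of each base; N (or any other symbol) has no partner here.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_inverted_repeats(
    seq: str, min_arm: int = 6, max_spacer: int = 3
) -> list[tuple[int, int, int]]:
    """Return (arm1_start, arm_len, spacer_len) for perfect inverted repeats.

    arm1 = seq[arm1_start:arm1_start + arm_len]; arm2 starts spacer_len bases
    after arm1 ends and equals reverse_complement(arm1). Arms are extended
    outwards to their maximal length.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for left_end in range(1, n):  # arm1 ends just before left_end
        for spacer in range(max_spacer + 1):
            right_start = left_end + spacer  # arm2 starts here
            if right_start >= n:
                break  # no room left for arm2
            # If the outermost spacer bases can pair, the same stem is
            # already reported with a shorter spacer, so skip it here.
            if spacer >= 2 and PAIR.get(seq[left_end]) == seq[right_start - 1]:
                continue
            arm = 0
            while (
                left_end - 1 - arm >= 0
                and right_start + arm < n
                and PAIR.get(seq[left_end - 1 - arm]) == seq[right_start + arm]
            ):
                arm += 1
            if arm >= min_arm:
                hits.append((left_end - arm, arm, spacer))
    return hits
```

For example, `find_inverted_repeats("CCACGGTATTTTACCGTCC")` reports a single hit, `(2, 6, 3)`: the arm `ACGGTA`, a `TTT` spacer, and the arm `TACCGT`, which is `reverse_complement("ACGGTA")`.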
Affiliation(s)
- Richard P. Bowater
- School of Biological Sciences, University of East Anglia, Norwich Research Park, Norwich NR4 7TJ, UK
- Natália Bohálová
- Department of Biophysical Chemistry and Molecular Oncology, Institute of Biophysics of the Czech Academy of Sciences, 61265 Brno, Czech Republic
- Department of Experimental Biology, Faculty of Science, Masaryk University, Kamenice 5, 62500 Brno, Czech Republic
- Václav Brázda
- Department of Biophysical Chemistry and Molecular Oncology, Institute of Biophysics of the Czech Academy of Sciences, 61265 Brno, Czech Republic
6
Cruciform Formable Sequences within Pou5f1 Enhancer Are Indispensable for Mouse ES Cell Integrity. Int J Mol Sci 2021; 22:ijms22073399. [PMID: 33810223 PMCID: PMC8036336 DOI: 10.3390/ijms22073399] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 02/17/2021] [Revised: 03/22/2021] [Accepted: 03/22/2021] [Indexed: 01/04/2023]
Abstract
DNA can adopt various structures besides the B-form. Among them, cruciform structures are formed on inverted repeat (IR) sequences. While cruciform formable IRs (CFIRs) are sometimes found in regulatory regions of transcription, their function in transcription remains elusive, especially in eukaryotes. We found a cluster of CFIRs within the mouse Pou5f1 enhancer. Here, we demonstrate that this cluster or some member(s) plays an active role in the transcriptional regulation of not only Pou5f1, but also Sox2, Nanog, Klf4 and Esrrb. To clarify in vivo function of the cluster, we performed genome editing using mouse ES cells, in which each of the CFIRs was altered to the corresponding mirror repeat sequence. The alterations reduced the level of the Pou5f1 transcript in the genome-edited cell lines, and elevated those of Sox2, Nanog, Klf4 and Esrrb. Furthermore, transcription of non-coding RNAs (ncRNAs) within the enhancer was also upregulated in the genome-edited cell lines, in a similar manner to Sox2, Nanog, Klf4 and Esrrb. These ncRNAs are hypothesized to control the expression of these four pluripotency genes. The CFIRs present in the Pou5f1 enhancer seem to be important to maintain the integrity of ES cells.
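The abstract does not spell out how each CFIR was rewritten as its "corresponding mirror repeat". One plausible reading, assumed here, is that the second arm is replaced by its complement: in an inverted repeat the second arm is the reverse complement of the first, whereas in a mirror repeat it is the first arm read backwards on the same strand. The Python sketch below applies that hypothetical edit (`to_mirror_repeat` and the other helpers are illustrative names, not the study's design) and shows that the result keeps the same length and GC content while losing the reverse-complementary arms a hairpin stem needs.

```python
# Complement table for uppercase DNA.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def is_inverted_repeat(arm1: str, arm2: str) -> bool:
    """arm2 is the reverse complement of arm1, so the arms can pair into a stem."""
    return arm2 == arm1.translate(COMPLEMENT)[::-1]


def is_mirror_repeat(arm1: str, arm2: str) -> bool:
    """arm2 is arm1 read backwards on the same strand (no complementation)."""
    return arm2 == arm1[::-1]


def to_mirror_repeat(seq: str, arm1_start: int, arm_len: int, spacer_len: int) -> str:
    """Hypothetical edit: complement arm2 so an inverted repeat becomes a mirror repeat."""
    arm2_start = arm1_start + arm_len + spacer_len
    arm1 = seq[arm1_start:arm1_start + arm_len]
    arm2 = seq[arm2_start:arm2_start + arm_len]
    if not is_inverted_repeat(arm1, arm2):
        raise ValueError("coordinates do not describe a perfect inverted repeat")
    # The complement of the reverse complement of arm1 is arm1 reversed.
    new_arm2 = arm2.translate(COMPLEMENT)
    return seq[:arm2_start] + new_arm2 + seq[arm2_start + arm_len:]


def gc_content(seq: str) -> float:
    """Fraction of G and C bases; complementing an arm leaves it unchanged."""
    return (seq.count("G") + seq.count("C")) / len(seq)
```

For instance, `to_mirror_repeat("CCACGGTATTTTACCGTCC", 2, 6, 3)` returns `"CCACGGTATTTATGGCACC"`: the second arm `TACCGT` becomes `ATGGCA`, the reverse of `ACGGTA`, and the GC content is unchanged.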
7
Čutová M, Manta J, Porubiaková O, Kaura P, Šťastný J, Jagelská EB, Goswami P, Bartas M, Brázda V. Divergent distributions of inverted repeats and G-quadruplex forming sequences in Saccharomyces cerevisiae. Genomics 2019; 112:1897-1901. [PMID: 31706022 DOI: 10.1016/j.ygeno.2019.11.002] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 08/06/2019] [Revised: 09/13/2019] [Accepted: 11/01/2019] [Indexed: 12/17/2022]
Abstract
The importance of DNA structure in the regulation of basic cellular processes is an emerging field of research. Among local non-B DNA structures, inverted repeat (IR) sequences that form cruciforms and G-rich sequences that form G-quadruplexes (G4) are found in all prokaryotic and eukaryotic organisms and are targets for regulatory proteins. We analyzed IRs and G4 sequences in the genome of the most important biotechnology microorganism, S. cerevisiae. IR and G4-prone sequences are enriched in specific genomic locations and differ markedly between mitochondrial and nuclear DNA. While G4s are overrepresented in telomeres and regions surrounding tRNAs, IRs are most enriched in centromeres, rDNA, replication origins and surrounding tRNAs. Mitochondrial DNA is enriched in both IR and G4-prone sequences relative to the nuclear genome. This extensive analysis of local DNA structures adds to the emerging picture of their importance in genome maintenance, DNA replication and transcription of subsets of genes.
Affiliation(s)
- Michaela Čutová
- Brno University of Technology, Faculty of Chemistry, Purkyňova 118, 612 00 Brno, Czech Republic
- Jacinta Manta
- Brno University of Technology, Faculty of Chemistry, Purkyňova 118, 612 00 Brno, Czech Republic
- Otília Porubiaková
- Brno University of Technology, Faculty of Chemistry, Purkyňova 118, 612 00 Brno, Czech Republic
- Patrik Kaura
- Brno University of Technology, Faculty of Mechanical Engineering, Technická 2896/2, 616 69 Brno, Czech Republic
- Jiří Šťastný
- Brno University of Technology, Faculty of Mechanical Engineering, Technická 2896/2, 616 69 Brno, Czech Republic; Mendel University in Brno, Zemědělská 1665/1, 61300 Brno, Czech Republic
- Eva B Jagelská
- Institute of Biophysics of the Czech Academy of Sciences, Královopolská 135, 612 65 Brno, Czech Republic
- Pratik Goswami
- Institute of Biophysics of the Czech Academy of Sciences, Královopolská 135, 612 65 Brno, Czech Republic
- Martin Bartas
- Department of Biology and Ecology/Institute of Environmental Technologies, Faculty of Science, University of Ostrava, Ostrava 710 00, Czech Republic
- Václav Brázda
- Brno University of Technology, Faculty of Chemistry, Purkyňova 118, 612 00 Brno, Czech Republic; Institute of Biophysics of the Czech Academy of Sciences, Královopolská 135, 612 65 Brno, Czech Republic