1. Smeds L, Kamali K, Kejnovská I, Kejnovský E, Chiaromonte F, Makova KD. Non-canonical DNA in human and other ape telomere-to-telomere genomes. Nucleic Acids Res 2025; 53:gkaf298. [PMID: 40226919 PMCID: PMC11995269 DOI: 10.1093/nar/gkaf298] [Received: 12/13/2024] [Revised: 02/28/2025] [Accepted: 04/07/2025] [Indexed: 04/15/2025]
Abstract
Non-canonical (non-B) DNA structures (e.g. bent DNA, hairpins, G-quadruplexes [G4s], and Z-DNA), which form at certain sequence motifs (e.g. A-phased repeats and inverted repeats), have emerged as important regulators of cellular processes and drivers of genome evolution. Yet, they have been understudied due to their repetitive nature and potentially inaccurate sequences generated with short-read technologies. Here we comprehensively characterize such motifs in the long-read telomere-to-telomere (T2T) genomes of human, bonobo, chimpanzee, gorilla, Bornean orangutan, Sumatran orangutan, and siamang. Non-B DNA motifs are enriched at the genomic regions added to T2T assemblies and occupy 9%-15%, 9%-11%, and 12%-38% of autosomes and chromosomes X and Y, respectively. G4s and Z-DNA are enriched at promoters and enhancers, as well as at origins of replication. Repetitive sequences harbor more non-B DNA motifs than non-repetitive sequences, especially in the short arms of acrocentric chromosomes. Most centromeres and/or their flanking regions are enriched in at least one non-B DNA motif type, consistent with a potential role of non-B structures in determining centromeres. Our results highlight the uneven distribution of predicted non-B DNA structures across ape genomes and suggest their novel functions in previously inaccessible genomic regions.
Affiliation(s)
- Linnéa Smeds
  - Department of Biology, Penn State University, University Park, PA 16802, United States
- Kaivan Kamali
  - Department of Biology, Penn State University, University Park, PA 16802, United States
- Iva Kejnovská
  - Department of Biophysics of Nucleic Acids, Institute of Biophysics of the Czech Academy of Sciences, Královopolská 135, 612 65 Brno, Czech Republic
- Eduard Kejnovský
  - Department of Plant Developmental Genetics, Institute of Biophysics of the Czech Academy of Sciences, Královopolská 135, 612 65 Brno, Czech Republic
- Francesca Chiaromonte
  - Department of Statistics, Penn State University, University Park, PA 16802, United States
  - Center for Medical Genomics, Penn State University, University Park, PA 16802, United States
  - L’EMbeDS, Sant’Anna School of Advanced Studies, 56127 Pisa, Italy
- Kateryna D Makova
  - Department of Biology, Penn State University, University Park, PA 16802, United States
  - Center for Medical Genomics, Penn State University, University Park, PA 16802, United States
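The abstract of entry 1 reports that non-B DNA motifs occupy 9%-15% of autosomes. Because different motif types can overlap (for example, a G4 lying inside an inverted repeat), one plausible way to compute such a percentage is on the union of motif intervals rather than on their summed lengths. The sketch below illustrates that calculation only; it is not taken from the paper, and the interval convention (0-based, half-open, BED-style) and the function names `merge_intervals` and `covered_fraction` are assumptions.

```python
# Sketch: fraction of a chromosome covered by predicted non-B DNA motifs.
# Assumption: motifs are 0-based, half-open (start, end) intervals, as in BED
# files. Different motif types may overlap, so the union is measured rather
# than the sum of motif lengths. Function names are hypothetical.


def merge_intervals(intervals):
    """Merge overlapping or abutting half-open intervals into their union."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(start, end) for start, end in merged]


def covered_fraction(intervals, chrom_length):
    """Fraction of a sequence of length chrom_length covered by the intervals."""
    covered = sum(end - start for start, end in merge_intervals(intervals))
    return covered / chrom_length


if __name__ == "__main__":
    # Toy data: two motifs overlapping by 5 bp plus one separate motif.
    motifs = [(10, 30), (25, 40), (100, 120)]
    print(covered_fraction(motifs, 1000))  # 0.05
```

Summing the raw lengths in this toy example would give 55 bp (5.5%) instead of 50 bp (5%), which is why overlaps are merged before measuring coverage.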
2. Smeds L, Kamali K, Kejnovská I, Kejnovský E, Chiaromonte F, Makova KD. Non-canonical DNA in human and other ape telomere-to-telomere genomes. bioRxiv 2025:2024.09.02.610891. [PMID: 39713403 PMCID: PMC11661062 DOI: 10.1101/2024.09.02.610891] [Indexed: 12/24/2024]
Abstract
Non-canonical (non-B) DNA structures (e.g., bent DNA, hairpins, G-quadruplexes [G4s], and Z-DNA), which form at certain sequence motifs (e.g., A-phased repeats and inverted repeats), have emerged as important regulators of cellular processes and drivers of genome evolution. Yet, they have been understudied due to their repetitive nature and potentially inaccurate sequences generated with short-read technologies. Here we comprehensively characterize such motifs in the long-read telomere-to-telomere (T2T) genomes of human, bonobo, chimpanzee, gorilla, Bornean orangutan, Sumatran orangutan, and siamang. Non-B DNA motifs are enriched at the genomic regions added to T2T assemblies, and occupy 9-15%, 9-11%, and 12-38% of autosomes, and chromosomes X and Y, respectively. G4s and Z-DNA are enriched at promoters and enhancers, as well as at origins of replication. Repetitive sequences harbor more non-B DNA motifs than non-repetitive sequences, especially in the short arms of acrocentric chromosomes. Most centromeres and/or their flanking regions are enriched in at least one non-B DNA motif type, consistent with a potential role of non-B structures in determining centromeres. Our results highlight the uneven distribution of predicted non-B DNA structures across ape genomes and suggest their novel functions in previously inaccessible genomic regions.
3. Gummadi ASC, Muppa DK, Yella VR. Dissecting non-B DNA structural motifs in untranslated regions of eukaryotic genomes. Genomics Inform 2024; 22:25. [PMID: 39605082 PMCID: PMC11603647 DOI: 10.1186/s44342-024-00028-x] [Received: 07/16/2024] [Accepted: 11/01/2024] [Indexed: 11/29/2024]
Abstract
The untranslated regions (UTRs) of genes significantly impact various biological processes, including transcription, posttranscriptional control, mRNA stability, localization, and translation efficiency. In functional areas of genomes, non-B DNA structures such as cruciform, curved, triplex, G-quadruplex, and Z-DNA structures are common and have an impact on cellular physiology. Although the role of these structures in cis-regulatory regions such as promoters is well established in eukaryotic genomes, their prevalence within UTRs across different eukaryotic classes has not been extensively documented. Our study investigated the prevalence of various non-B DNA motifs within the 5' and 3' UTRs across diverse eukaryotic species. Our comparative analysis encompassed the 5' and 3' UTRs of 360 species representing diverse eukaryotic domains of life, including Arthropoda (Diptera, Hemiptera, and Hymenoptera), Chordata (Artiodactyla, Carnivora, Galliformes, Passeriformes, Primates, Rodentia, Squamata, Testudines), Magnoliophyta (Brassicales), Fabales (Poales), and Nematoda (Rhabditida), on the basis of datasets derived from the UTRdb. We observed that species belonging to taxonomic orders such as Rhabditida, Diptera, Brassicales, and Hemiptera present a prevalence of curved DNA motifs in their UTRs, whereas orders such as Testudines, Galliformes, and Rodentia present a preponderance of G-quadruplexes in both UTRs. The distribution of motifs is conserved across different taxonomic classes, although species-specific variations in motif preferences were also observed. Our research unequivocally illuminates the prevalence and potential functional implications of non-B DNA motifs, offering invaluable insights into the evolutionary and biological significance of these structures.
Affiliation(s)
- Aruna Sesha Chandrika Gummadi
  - Department of Biotechnology, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Guntur, Andhra Pradesh, 522302, India
- Divya Kumari Muppa
  - Department of Biotechnology, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Guntur, Andhra Pradesh, 522302, India
- Venakata Rajesh Yella
  - Department of Biotechnology, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Guntur, Andhra Pradesh, 522302, India
4. Mohanty SK, Chiaromonte F, Makova KD. Evolutionary Dynamics of G-Quadruplexes in Human and Other Great Ape Telomere-to-Telomere Genomes. bioRxiv 2024:2024.11.05.621973. [PMID: 39574740 PMCID: PMC11580976 DOI: 10.1101/2024.11.05.621973] [Indexed: 12/02/2024]
Abstract
G-quadruplexes (G4s) are non-canonical DNA structures that can form at approximately 1% of the human genome. G4s contribute to point mutations and structural variation and thus facilitate genomic instability. They play important roles in regulating replication, transcription, and telomere maintenance, and some of them evolve under purifying selection. Nevertheless, the evolutionary dynamics of G4s have remained underexplored. Here we conducted a comprehensive analysis of predicted G4s (pG4s) in the recently released, telomere-to-telomere (T2T) genomes of human and other great apes: bonobo, chimpanzee, gorilla, Bornean orangutan, and Sumatran orangutan. We annotated tens of thousands of new pG4s in T2T compared to previous ape genome assemblies, including 41,236 in the human genome. Analyzing species alignments, we found approximately one-third of pG4s shared by all apes studied and identified thousands of species- and genus-specific pG4s. pG4s accumulated and diverged at rates consistent with divergence times between the studied species. We observed a significant enrichment and hypomethylation of pG4s shared across species at regulatory regions, including promoters, 5' and 3' UTRs, and origins of replication, strongly suggesting their formation and functional role in these regions. pG4s shared among great apes displayed lower methylation levels compared to species-specific pG4s, suggesting evolutionary conservation of functional roles of the former. Many species-specific pG4s were located in the repetitive and satellite regions deciphered in the T2T genomes. Our findings illuminate the evolutionary dynamics of G4s, their role in gene regulation, and their potential contribution to species-specific adaptations in great apes, emphasizing the utility of high-resolution T2T genomes in uncovering previously elusive genomic features.
Affiliation(s)
- Saswat K. Mohanty
  - Molecular, Cellular, and Integrative Biosciences, Huck Institutes of the Life Sciences, Penn State University, University Park, PA 16802, USA
  - Department of Biology, Penn State University, University Park, PA 16802, USA
- Francesca Chiaromonte
  - Department of Statistics, Penn State University, University Park, PA 16802, USA
  - Center for Medical Genomics, Penn State University, University Park and Hershey, PA, USA
  - EMbeDS, Sant’Anna School of Advanced Studies, 56127 Pisa, Italy
- Kateryna D. Makova
  - Department of Biology, Penn State University, University Park, PA 16802, USA
  - Center for Medical Genomics, Penn State University, University Park and Hershey, PA, USA
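Predicted G4s (pG4s) such as those annotated in entry 4 are typically found by scanning sequences for G-rich patterns. As an illustration only, the sketch below uses the simplest regular-expression form of the classic folding rule (four runs of at least three G separated by loops of 1-7 nt) on both strands. The abstract does not state which predictor or parameters the study used, so this pattern and the function name `find_g4_motifs` are assumptions; dedicated tools also handle bulges, longer loops, and overlapping motifs, which this sketch ignores.

```python
import re

# Assumed folding rule: >=4 runs of >=3 G separated by loops of 1-7 nt.
# This is the simplest regex version of the idea, not a specific tool.
G4_PLUS = re.compile(r"(?:G{3,}[ACGTN]{1,7}){3,}G{3,}")
# A C-rich match on the given strand is a G4 motif on the opposite strand.
G4_MINUS = re.compile(r"(?:C{3,}[ACGTN]{1,7}){3,}C{3,}")


def find_g4_motifs(seq):
    """Return sorted (start, end, strand) tuples of non-overlapping matches."""
    seq = seq.upper()
    hits = [(m.start(), m.end(), "+") for m in G4_PLUS.finditer(seq)]
    hits += [(m.start(), m.end(), "-") for m in G4_MINUS.finditer(seq)]
    return sorted(hits)


if __name__ == "__main__":
    telomeric = "GGGTTAGGGTTAGGGTTAGGG"  # four copies of the human telomeric motif
    print(find_g4_motifs("AAAA" + telomeric + "AAAA"))  # [(4, 25, '+')]
```

Searching the C-rich pattern on the same strand avoids building the reverse complement, at the cost of reporting minus-strand coordinates on the forward-strand axis.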
5. Pipier A, Chetot T, Kalamatianou A, Martin N, Caroff M, Britton S, Chéron N, Trantírek L, Granzhan A, Monchaud D. Structural Optimization of Azacryptands for Targeting Three-Way DNA Junctions. Angew Chem Int Ed Engl 2024; 63:e202409780. [PMID: 38873877 DOI: 10.1002/anie.202409780] [Received: 05/23/2024] [Revised: 06/11/2024] [Accepted: 06/14/2024] [Indexed: 06/15/2024]
Abstract
Transient melting of the duplex-DNA (B-DNA) during DNA transactions allows repeated sequences to fold into non-B-DNA structures, including DNA junctions and G-quadruplexes. These noncanonical structures can act as impediments to DNA polymerase progression along the duplex, thereby triggering DNA damage and ultimately jeopardizing genomic stability. Their stabilization by ad hoc ligands is currently being explored as a putative anticancer strategy since it might represent an efficient way to inflict toxic DNA damage specifically to rapidly dividing cancer cells. The relevance of this strategy is only emerging for three-way DNA junctions (TWJs) and, to date, no molecule has been recognized as a reference TWJ ligand, featuring both high affinity and selectivity. Herein, we characterize such reference ligands through a combination of in vitro techniques comprising affinity and selectivity assays (competitive FRET-melting and TWJ Screen assays), functional tests (qPCR and Taq stop assays) and structural analyses (molecular dynamics and NMR investigations). We identify novel azacryptands TrisNP-amphi and TrisNP-ana as the most promising ligands, interacting with TWJs with high affinity and selectivity. These ligands represent new molecular tools to investigate the cellular roles of TWJs and explore how they can be exploited in innovative anticancer therapies.
Affiliation(s)
- Angélique Pipier
  - Institut de Chimie Moléculaire, ICMUB CNRS UMR6302, 9, Avenue Alain Savary, 21078, Dijon, France
- Titouan Chetot
  - Chemistry and Modelling for the Biology of Cancer (CMBC), CNRS UMR9187, INSERM U1196, Institut Curie, Université Paris Saclay, 91405, Orsay, France
- Apollonia Kalamatianou
  - Chemistry and Modelling for the Biology of Cancer (CMBC), CNRS UMR9187, INSERM U1196, Institut Curie, Université Paris Saclay, 91405, Orsay, France
- Nicolas Martin
  - Chemistry and Modelling for the Biology of Cancer (CMBC), CNRS UMR9187, INSERM U1196, Institut Curie, Université Paris Saclay, 91405, Orsay, France
- Maëlle Caroff
  - Institut de Pharmacologie et de Biologie Structurale (IPBS), Université de Toulouse, CNRS, Université Toulouse III - Paul Sabatier (UT3), Toulouse, France
- Sébastien Britton
  - Institut de Pharmacologie et de Biologie Structurale (IPBS), Université de Toulouse, CNRS, Université Toulouse III - Paul Sabatier (UT3), Toulouse, France
- Nicolas Chéron
  - PASTEUR, Département de chimie, École Normale Supérieure (ENS), PSL University, Sorbonne Université, CNRS UMR8640, 75005, Paris, France
- Lukáš Trantírek
  - Central European Institute of Technology, Masaryk University, Kamenice 753/5, 625 00, Brno, Czech Republic
- Anton Granzhan
  - Chemistry and Modelling for the Biology of Cancer (CMBC), CNRS UMR9187, INSERM U1196, Institut Curie, Université Paris Saclay, 91405, Orsay, France
- David Monchaud
  - Institut de Chimie Moléculaire, ICMUB CNRS UMR6302, 9, Avenue Alain Savary, 21078, Dijon, France
6. Yi C, Liu Q, Huang Y, Liu C, Guo X, Fan C, Zhang K, Liu Y, Han F. Non-B-form DNA is associated with centromere stability in newly-formed polyploid wheat. Sci China Life Sci 2024; 67:1479-1488. [PMID: 38639838 DOI: 10.1007/s11427-023-2513-9] [Received: 10/07/2023] [Accepted: 12/18/2023] [Indexed: 04/20/2024]
Abstract
Non-B-form DNA differs from the classic B-DNA double helix structure and plays a crucial regulatory role in replication and transcription. However, the role of non-B-form DNA in centromeres, especially in polyploid wheat, remains elusive. Here, we systematically analyzed seven non-B-form DNA motif profiles (A-phased DNA repeat, direct repeat, G-quadruplex, inverted repeat, mirror repeat, short tandem repeat, and Z-DNA) in hexaploid wheat. We found that three of these non-B-form DNA motifs were enriched at centromeric regions, especially at the CENH3-binding sites, suggesting that non-B-form DNA may create a favorable loading environment for the CENH3 nucleosome. To investigate the dynamics of centromeric non-B-form DNA during the alloploidization process, we analyzed DNA secondary structure using CENH3 ChIP-seq data from newly formed allotetraploid wheat and its two diploid ancestors. We found that newly formed allotetraploid wheat formed more non-B-form DNA in centromeric regions than its parents, suggesting that non-B-form DNA is related to the localization of the centromeric regions in newly formed wheat. Furthermore, non-B-form DNA enriched in the centromeric regions was found to preferentially form on young LTR retrotransposons, explaining CENH3's tendency to bind to younger LTRs. Collectively, our study describes the landscape of non-B-form DNA in the wheat genome and sheds light on its potential role in the evolution of polyploid centromeres.
Affiliation(s)
- Congyang Yi
  - Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Qian Liu
  - Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Yuhong Huang
  - Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Chang Liu
  - Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Xianrui Guo
  - Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Chaolan Fan
  - Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Kaibiao Zhang
  - Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Yang Liu
  - Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Fangpu Han
  - Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
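Entry 6 profiles, among other motifs, inverted repeats (an arm followed, after a spacer, by its reverse complement, which can fold into hairpins or cruciforms) and mirror repeats (an arm followed by its plain reverse, associated with triplex formation). The naive scan below only illustrates those two definitions. The fixed arm length, the spacer limit, exact matching, and the function names `revcomp` and `find_repeats` are all assumptions; dedicated motif finders allow variable arms and mismatches, which this sketch omits.

```python
# Sketch of the inverted-repeat and mirror-repeat definitions.
# Assumptions: fixed arm length, exact matches only, spacer of 0..max_spacer.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(s):
    """Reverse complement of an uppercase DNA string (other letters unchanged)."""
    return s.translate(COMPLEMENT)[::-1]


def find_repeats(seq, arm=6, max_spacer=10, kind="inverted"):
    """Return (start, end, spacer) for every arm pair found.

    kind="inverted": right arm == reverse complement of left arm.
    kind="mirror":   right arm == reverse (no complement) of left arm.
    Nested hits are reported separately; no merging is attempted.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * arm + 1):
        left = seq[i:i + arm]
        target = revcomp(left) if kind == "inverted" else left[::-1]
        for spacer in range(max_spacer + 1):
            j = i + arm + spacer
            if j + arm > len(seq):
                break  # right arm would run past the end of the sequence
            if seq[j:j + arm] == target:
                hits.append((i, j + arm, spacer))
    return hits


if __name__ == "__main__":
    # Toy inverted repeat: ACGGTA + CCCC spacer + TACCGT (its reverse complement).
    print(find_repeats("ACGGTACCCCTACCGT"))  # [(0, 16, 4)]
```

The scan is quadratic in the spacer limit per position, which is fine for a toy sequence but not for a whole genome.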
7. Gadgil RY, Rider SD, Shrestha R, Alhawach V, Hitch D, Leffak M. Microsatellite break-induced replication generates highly mutagenized extrachromosomal circular DNAs. NAR Cancer 2024; 6:zcae027. [PMID: 38854437 PMCID: PMC11161834 DOI: 10.1093/narcan/zcae027] [Received: 01/02/2024] [Revised: 05/17/2024] [Accepted: 05/24/2024] [Indexed: 06/11/2024]
Abstract
Extrachromosomal circular DNAs (eccDNAs) are produced from all regions of the eucaryotic genome. We used inverse PCR of non-B microsatellites capable of forming hairpin, triplex, quadruplex and AT-rich structures integrated at a common ectopic chromosomal site to show that these non-B DNAs generate highly mutagenized eccDNAs by replication-dependent mechanisms. Mutagenesis occurs within the non-B DNAs and extends several kilobases bidirectionally into flanking and nonallelic DNA. Each non-B DNA exhibits a different pattern of mutagenesis, while sister clones containing the same non-B DNA also display distinct patterns of recombination, microhomology-mediated template switching and base substitutions. Mutations include mismatches, short duplications, long nontemplated insertions, large deletions and template switches to sister chromatids and nonallelic chromosomes. Drug-induced replication stress or the depletion of DNA repair factors Rad51, the COPS2 signalosome subunit or POLη change the pattern of template switching and alter the eccDNA mutagenic profiles. We propose an asynchronous capture model based on break-induced replication from microsatellite-induced DNA double strand breaks to account for the generation and circularization of mutagenized eccDNAs and the appearance of genomic homologous recombination deficiency (HRD) scars. These results may help to explain the appearance of tumor eccDNAs and their roles in neoantigen production, oncogenesis and resistance to chemotherapy.
Affiliation(s)
- Rujuta Yashodhan Gadgil
  - Department of Biochemistry and Molecular Biology, Boonshoft School of Medicine, Wright State University, Dayton, OH 45435, USA
- S Dean Rider
  - Department of Biochemistry and Molecular Biology, Boonshoft School of Medicine, Wright State University, Dayton, OH 45435, USA
- Resha Shrestha
  - Department of Biochemistry and Molecular Biology, Boonshoft School of Medicine, Wright State University, Dayton, OH 45435, USA
- Venicia Alhawach
  - Department of Biochemistry and Molecular Biology, Boonshoft School of Medicine, Wright State University, Dayton, OH 45435, USA
- David C Hitch
  - Department of Biochemistry and Molecular Biology, Boonshoft School of Medicine, Wright State University, Dayton, OH 45435, USA
- Michael Leffak
  - Department of Biochemistry and Molecular Biology, Boonshoft School of Medicine, Wright State University, Dayton, OH 45435, USA
8. Goldberg ME, Noyes MD, Eichler EE, Quinlan AR, Harris K. Effects of parental age and polymer composition on short tandem repeat de novo mutation rates. Genetics 2024; 226:iyae013. [PMID: 38298127 PMCID: PMC10990422 DOI: 10.1093/genetics/iyae013] [Received: 08/11/2023] [Revised: 08/11/2023] [Accepted: 01/05/2024] [Indexed: 02/02/2024]
Abstract
Short tandem repeats (STRs) are hotspots of genomic variability in the human germline because of their high mutation rates, which have long been attributed largely to polymerase slippage during DNA replication. This model suggests that STR mutation rates should scale linearly with a father's age, as progenitor cells continually divide after puberty. In contrast, it suggests that STR mutation rates should not scale with a mother's age at her child's conception, since oocytes spend a mother's reproductive years arrested in meiosis II and undergo a fixed number of cell divisions that are independent of the age at ovulation. Yet, mirroring recent findings, we find that STR mutation rates covary with paternal and maternal age, implying that some STR mutations are caused by DNA damage in quiescent cells rather than polymerase slippage in replicating progenitor cells. These results echo the recent finding that DNA damage in oocytes is a significant source of de novo single nucleotide variants and corroborate evidence of STR expansion in postmitotic cells. However, we find that the maternal age effect is not confined to known hotspots of oocyte mutagenesis, nor are postzygotic mutations likely to contribute significantly. STR nucleotide composition demonstrates divergent effects on de novo mutation (DNM) rates between sexes. Unlike the paternal lineage, maternally derived DNMs at A/T STRs display a significantly greater association with maternal age than DNMs at G/C-containing STRs. These observations may suggest the mechanism and developmental timing of certain STR mutations and contradict prior attribution of replication slippage as the primary mechanism of STR mutagenesis.
Affiliation(s)
- Michael E Goldberg
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
  - Departments of Human Genetics and Biomedical Informatics, University of Utah, Salt Lake City, UT 84112, USA
- Michelle D Noyes
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Evan E Eichler
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
- Aaron R Quinlan
  - Departments of Human Genetics and Biomedical Informatics, University of Utah, Salt Lake City, UT 84112, USA
- Kelley Harris
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
  - Computational Biology Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
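Entry 8 contrasts de novo mutations at A/T STRs with those at G/C-containing STRs. The sketch below shows, under an assumed definition (a perfect 1-6 nt unit repeated at least three times), how STRs might be located and split into those two composition classes. It is illustrative only: it is not how the study called STRs or mutations, and `find_strs` and its defaults are hypothetical.

```python
import re


def find_strs(seq, max_unit=6, min_copies=3):
    """Return (start, end, unit, composition) for perfect tandem repeats.

    Assumed definition: a unit of 1..max_unit nt repeated >= min_copies times.
    The lazy quantifier tries the shortest unit first at each start position,
    so "ATATAT" is reported with unit "AT" rather than "ATAT".
    """
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        # Composition classes mirroring the abstract's A/T vs G/C-containing split.
        kind = "A/T" if set(unit) <= {"A", "T"} else "G/C-containing"
        hits.append((m.start(), m.end(), unit, kind))
    return hits


if __name__ == "__main__":
    for hit in find_strs("CAGCAGCAGCAGATATATATAT"):
        print(hit)
    # (0, 12, 'CAG', 'G/C-containing')
    # (12, 22, 'AT', 'A/T')
```

Matches are non-overlapping and scanned left to right, so a repeat is reported in the phase where it is first detected; a production caller would also tolerate impure copies.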
9. Trizna L, Olajoš J, Víglaský V. DNA minicircles capable of forming a variety of non-canonical structural motifs. Front Chem 2024; 12:1384201. [PMID: 38595699 PMCID: PMC11002140 DOI: 10.3389/fchem.2024.1384201] [Received: 02/08/2024] [Accepted: 03/12/2024] [Indexed: 04/11/2024]
Abstract
Although more than 10% of the human genome has the potential to fold into non-B DNA, the formation of non-canonical structural motifs as part of long dsDNA chains is usually considered unfavorable from a thermodynamic point of view. However, recent experiments have confirmed that non-canonical motifs do exist and are non-randomly distributed in genomic DNA. This distribution is highly dependent not only on the DNA sequence but also on various other factors such as environmental conditions, DNA topology and the expression of specific cellular factors in different cell types. In this study, we describe a new strategy for preparing DNA minicircles containing different non-canonical motifs, which arise as a result of imperfect base pairing between complementary strands. The approach exploits the fact that imperfections in the pairing of complementary strands thermodynamically weaken the dsDNA structure at the expense of enhancing the formation of non-canonical motifs. This represents a completely different concept of stable integration of a non-canonical motif into dsDNA. Our approach allows the integration of various types of non-canonical motifs into the dsDNA structure, such as hairpin, cruciform, G-quadruplex and i-motif forms, as well as combinations of these forms. Small DNA minicircles have recently become the subject of considerable interest in both fundamental research and in terms of their potential therapeutic applications.
Affiliation(s)
- Viktor Víglaský
  - Department of Biochemistry, Institute of Chemistry, Faculty of Sciences, P. J. Šafárik University, Košice, Slovakia
10. Gadgil RY, Rider SD, Shrestha R, Alhawach V, Hitch DC, Leffak M. Microsatellite break-induced replication generates highly mutagenized extrachromosomal circular DNAs. bioRxiv 2024:2024.01.12.575055. [PMID: 38260482 PMCID: PMC10802558 DOI: 10.1101/2024.01.12.575055] [Indexed: 01/24/2024]
Abstract
Extrachromosomal circular DNAs (eccDNAs) are produced from all regions of the eucaryotic genome. In tumors, highly transcribed eccDNAs have been implicated in oncogenesis, neoantigen production and resistance to chemotherapy. Here we show that unstable microsatellites capable of forming hairpin, triplex, quadruplex and AT-rich structures generate eccDNAs when integrated at a common ectopic site in human cells. These non-B DNA prone microsatellites form eccDNAs by replication-dependent mechanisms. The microsatellite-based eccDNAs are highly mutagenized and display template switches to sister chromatids and to nonallelic chromosomal sites. High frequency mutagenesis occurs within the eccDNA microsatellites and extends bidirectionally for several kilobases into flanking DNA and nonallelic DNA. Mutations include mismatches, short duplications, longer nontemplated insertions and large deletions. Template switching leads to recurrent deletions and recombination domains within the eccDNAs. Template switching events are microhomology-mediated, but do not occur at all potential sites of complementarity. Each microsatellite exhibits a distinct pattern of recombination, microhomology choice and base substitution signature. Depletion of Rad51, the COPS2 signalosome subunit or POLη alter the eccDNA mutagenic profiles. We propose an asynchronous capture model based on break-induced replication from microsatellite-induced DNA breaks for the generation and circularization of mutagenized eccDNAs and genomic homologous recombination deficiency (HRD) scars.
11. Laspata N, Muoio D, Fouquerel E. Multifaceted Role of PARP1 in Maintaining Genome Stability Through Its Binding to Alternative DNA Structures. J Mol Biol 2024; 436:168207. [PMID: 37481154 PMCID: PMC11552663 DOI: 10.1016/j.jmb.2023.168207] [Received: 05/01/2023] [Revised: 06/28/2023] [Accepted: 07/12/2023] [Indexed: 07/24/2023]
Abstract
Alternative DNA structures that differ from the canonical B-form of DNA can arise from repetitive sequences and play beneficial roles in many cellular processes such as gene regulation and chromatin organization. However, they also threaten genomic stability in several ways including mutagenesis and collisions with replication and/or transcription machinery, which lead to genomic instability that is associated with human disease. Thus, the careful regulation of non-B-DNA structure formation and resolution is crucial for the maintenance of genome integrity. Several protein factors have been demonstrated to associate with alternative DNA structures to facilitate their removal, one of which is the ADP-ribose transferase (ART) PARP1 (also called ADP-ribosyltransferase diphtheria toxin-like 1 or ARTD1), a multifaceted DNA repair enzyme that recognizes single- and double-stranded DNA breaks and synthesizes chains of poly(ADP-ribose) (PAR) to recruit DNA repair proteins. It is now well appreciated that PARP1 recognizes several nucleic acid structures beyond DNA lesions, including stalled replication forks, DNA hairpins and cruciforms, R-loops, and DNA G-quadruplexes (G4 DNA). In this review, we summarize the current evidence of a direct association of PARP1 with each of these aforementioned alternative DNA structures, as well as discuss the role of PARP1 in the prevention of non-B-DNA structure-induced genetic instability. We will focus on the mechanisms of the recognition and binding by PARP1 to each alternative structure and the structure-based stimulation of PARP1 catalytic activity upon binding. Finally, we will discuss some of the outstanding gaps in the literature and offer speculative insight for questions that remain to be experimentally addressed.
Affiliation(s)
- Natalie Laspata
  - UPMC Hillman Cancer Center, University of Pittsburgh Cancer Institute, Department of Pharmacology and Chemical Biology, Pittsburgh, PA 15232, USA
  - Department of Biochemistry and Molecular Biology, Thomas Jefferson University, Philadelphia, PA 19107, USA
- Daniela Muoio
  - UPMC Hillman Cancer Center, University of Pittsburgh Cancer Institute, Department of Pharmacology and Chemical Biology, Pittsburgh, PA 15232, USA
- Elise Fouquerel
  - UPMC Hillman Cancer Center, University of Pittsburgh Cancer Institute, Department of Pharmacology and Chemical Biology, Pittsburgh, PA 15232, USA
12. Goldberg ME, Noyes MD, Eichler EE, Quinlan AR, Harris K. Effects of parental age and polymer composition on short tandem repeat de novo mutation rates. bioRxiv 2023:2023.12.22.573131. [PMID: 38187618 PMCID: PMC10769404 DOI: 10.1101/2023.12.22.573131] [Indexed: 01/09/2024]
Abstract
Short tandem repeats (STRs) are hotspots of genomic variability in the human germline because of their high mutation rates, which have long been attributed largely to polymerase slippage during DNA replication. This model suggests that STR mutation rates should scale linearly with a father's age, as progenitor cells continually divide after puberty. In contrast, it suggests that STR mutation rates should not scale with a mother's age at her child's conception, since oocytes spend a mother's reproductive years arrested in meiosis II and undergo a fixed number of cell divisions that are independent of the age at ovulation. Yet, mirroring recent findings, we find that STR mutation rates covary with paternal and maternal age, implying that some STR mutations are caused by DNA damage in quiescent cells rather than the classical mechanism of polymerase slippage in replicating progenitor cells. These results also echo the recent finding that DNA damage in quiescent oocytes is a significant source of de novo SNVs and corroborate evidence of STR expansion in postmitotic cells. However, we find that the maternal age effect is not confined to previously discovered hotspots of oocyte mutagenesis, nor are post-zygotic mutations likely to contribute significantly. STR nucleotide composition demonstrates divergent effects on DNM rates between sexes. Unlike the paternal lineage, maternally derived DNMs at A/T STRs display a significantly greater association with maternal age than DNMs at GC-containing STRs. These observations may suggest the mechanism and developmental timing of certain STR mutations and are especially surprising considering the prior belief in replication slippage as the dominant mechanism of STR mutagenesis.
Affiliation(s)
- Michael E. Goldberg
- Department of Genome Sciences, University of Washington, 3720 15 Ave NE, Seattle, WA, 98195
- Departments of Human Genetics and Biomedical Informatics, University of Utah, 15 S 2030 E, Salt Lake City, UT, 84112
- Michelle D. Noyes
- Department of Genome Sciences, University of Washington, 3720 15 Ave NE, Seattle, WA, 98195
- Evan E. Eichler
- Department of Genome Sciences, University of Washington, 3720 15 Ave NE, Seattle, WA, 98195
- Howard Hughes Medical Institute, 3720 15 Ave NE, University of Washington, Seattle, WA, 98195
- Aaron R. Quinlan
- Departments of Human Genetics and Biomedical Informatics, University of Utah, 15 S 2030 E, Salt Lake City, UT, 84112
- These authors contributed equally to this work
- Kelley Harris
- Department of Genome Sciences, University of Washington, 3720 15 Ave NE, Seattle, WA, 98195
- Computational Biology Division, Fred Hutchinson Cancer Research Center, 1100 Fairview Ave N, Seattle, WA, 98109
- These authors contributed equally to this work

13
Yella VR, Vanaja A. Computational analysis on the dissemination of non-B DNA structural motifs in promoter regions of 1180 cellular genomes. Biochimie 2023; 214:101-111. [PMID: 37311475 DOI: 10.1016/j.biochi.2023.06.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2022] [Revised: 05/05/2023] [Accepted: 06/05/2023] [Indexed: 06/15/2023]
Abstract
Gene promoter regions are under evolutionary constraint, and earlier studies found them to be enriched in functional non-B DNA structural signatures such as curved DNA, cruciform DNA, G-quadruplex, triple-helical DNA, slipped DNA structures, and Z-DNA. However, these studies were restricted to a few model organisms, single non-B DNA motif types, or whole genomic sequences, and the comparative accumulation of these motifs in promoter regions across different domains of life has not been reported comprehensively. In this study, for the first time, we investigated the preponderance of non-B DNA-prone motifs in promoter regions in 1180 genomes belonging to 28 taxonomic groups using the non-B DNA Motif Search Tool (nBMST). The trends suggest that these motifs are predominant in promoters compared with upstream and downstream regions in all three domains of life, and that their prevalence varies with taxonomic group. The cruciform DNA motif is the most abundant form of non-B DNA, spanning archaea to lower eukaryotes. Curved DNA motifs are prominent in host-associated bacteria and suppressed in mammals. Triplex-DNA and slipped DNA structure repeats are discretely dispersed in all lineages. G-quadruplex motifs are significantly enriched in mammals. We also observed that the unique enrichment of non-B DNA in promoters is strongly linked to genome GC content, genome size, evolutionary time divergence, and ecological adaptations. Overall, our work systematically reports the unique non-B DNA structural landscape of cellular organisms from the perspective of the cis-regulatory code of genomes.
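For readers unfamiliar with motif-based prediction, the sketch below shows how a G-quadruplex motif scan and a per-region motif density could be computed. It assumes the widely used regular-expression heuristic of four runs of three or more guanines separated by 1-7 nt loops, with the C-rich pattern standing in for a motif on the opposite strand. This is not the nBMST implementation, and the function names are hypothetical.

```python
import re

# Assumed heuristic (not nBMST): four G-runs of length >= 3 separated by 1-7 nt loops.
G4_PLUS = re.compile(r"(?:G{3,}[ACGTN]{1,7}){3}G{3,}")
# The same motif on the reverse strand appears as C-runs on the given strand.
G4_MINUS = re.compile(r"(?:C{3,}[ACGTN]{1,7}){3}C{3,}")


def find_g4_motifs(seq: str) -> list[tuple[int, int, str]]:
    """Return (start, end, strand) for non-overlapping matches, 0-based half-open."""
    seq = seq.upper()
    hits = [(m.start(), m.end(), "+") for m in G4_PLUS.finditer(seq)]
    hits += [(m.start(), m.end(), "-") for m in G4_MINUS.finditer(seq)]
    return sorted(hits)


def motif_density(seq: str, start: int, end: int) -> float:
    """Fraction of bases in seq[start:end] covered by a motif on either strand."""
    covered = set()
    for s, e, _strand in find_g4_motifs(seq):
        covered.update(range(max(s, start), min(e, end)))
    return len(covered) / max(end - start, 1)
```

Comparing `motif_density` over a promoter window with the same statistic over equally sized upstream and downstream windows gives the kind of promoter-versus-flank contrast described above; a real analysis would also need to control for GC content and handle overlapping motifs, which `finditer` skips.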
Affiliation(s)
- Venkata Rajesh Yella
- Department of Biotechnology, Koneru Lakshmaiah Education Foundation, Guntur, 522302, Andhra Pradesh, India
- Akkinepally Vanaja
- Department of Biotechnology, Koneru Lakshmaiah Education Foundation, Guntur, 522302, Andhra Pradesh, India; KL College of Pharmacy, Koneru Lakshmaiah Education Foundation, Guntur, 522302, Andhra Pradesh, India

14
Glasscock CJ, Pecoraro R, McHugh R, Doyle LA, Chen W, Boivin O, Lonnquist B, Na E, Politanska Y, Haddox HK, Cox D, Norn C, Coventry B, Goreshnik I, Vafeados D, Lee GR, Gordan R, Stoddard BL, DiMaio F, Baker D. Computational design of sequence-specific DNA-binding proteins. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.09.20.558720. [PMID: 37790440 PMCID: PMC10542524 DOI: 10.1101/2023.09.20.558720] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 10/05/2023]
Abstract
Sequence-specific DNA-binding proteins (DBPs) play critical roles in biology and biotechnology, and there has been considerable interest in the engineering of DBPs with new or altered specificities for genome editing and other applications. While there has been some success in reprogramming naturally occurring DBPs using selection methods, the computational design of new DBPs that recognize arbitrary target sites remains an outstanding challenge. We describe a computational method for the design of small DBPs that recognize specific target sequences through interactions with bases in the major groove, and employ this method in conjunction with experimental screening to generate binders for 5 distinct DNA targets. These binders exhibit specificity closely matching the computational models for the target DNA sequences at as many as 6 base positions and affinities as low as 30-100 nM. The crystal structure of a designed DBP-target site complex is in close agreement with the design model, highlighting the accuracy of the design method. The designed DBPs function in both Escherichia coli and mammalian cells to repress and activate transcription of neighboring genes. Our method is a substantial step towards a general route to small and hence readily deliverable sequence-specific DBPs for gene regulation and editing.
Affiliation(s)
- Cameron J. Glasscock
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
- Robert Pecoraro
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
- Department of Physics, University of Washington, Seattle, WA, USA
- Ryan McHugh
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
- Lindsey A. Doyle
- Division of Basic Sciences, Fred Hutchinson Cancer Center, Seattle, Washington, USA
- Wei Chen
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
- Olivier Boivin
- Program in Genetics and Genomics, Duke University, Durham, NC, USA
- Center for Advanced Genomic Technologies, Duke University, Durham, NC, USA
- Beau Lonnquist
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
- Department of Bioengineering, University of Washington, Seattle, WA, USA
- Emily Na
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
- Yuliya Politanska
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
- Hugh K. Haddox
- Division of Basic Sciences, Fred Hutchinson Cancer Center, Seattle, Washington, USA
- David Cox
- Department of Biochemistry, Stanford University School of Medicine, Palo Alto, CA, USA
- Department of Medicine, Division of Hematology, Stanford University, Stanford, CA, USA
- Christoffer Norn
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
- BioInnovation Institute, DK2200 Copenhagen N, Denmark
- Brian Coventry
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
- Inna Goreshnik
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
- Dionne Vafeados
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
- Gyu Rie Lee
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- Raluca Gordan
- Center for Advanced Genomic Technologies, Duke University, Durham, NC, USA
- Department of Biostatistics and Bioinformatics, Department of Computer Science, Department of Molecular Genetics and Microbiology, Duke University, Durham, NC, USA
- Barry L. Stoddard
- Division of Basic Sciences, Fred Hutchinson Cancer Center, Seattle, Washington, USA
- Frank DiMaio
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
- David Baker
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
- BioInnovation Institute, DK2200 Copenhagen N, Denmark

15
Wang W, Zhang X, Garcia S, Leitch AR, Kovařík A. Intragenomic rDNA variation - the product of concerted evolution, mutation, or something in between? Heredity (Edinb) 2023; 131:179-188. [PMID: 37402824 PMCID: PMC10462631 DOI: 10.1038/s41437-023-00634-5] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 09/16/2022] [Revised: 06/12/2023] [Accepted: 06/12/2023] [Indexed: 07/06/2023]
Abstract
The classical model of concerted evolution states that hundreds to thousands of ribosomal DNA (rDNA) units undergo homogenization, making the multiple copies of the individual units more uniform across the genome than would be expected given mutation frequencies and gene redundancy. While the universality of this over 50-year-old model has been confirmed in a range of organisms, advanced high throughput sequencing techniques have also revealed that rDNA homogenization in many organisms is partial and, in rare cases, even apparently failing. The potential underpinning processes leading to unexpected intragenomic variation have been discussed in a number of studies, but a comprehensive understanding remains to be determined. In this work, we summarize information on variation or polymorphisms in rDNAs across a wide range of taxa amongst animals, fungi, plants, and protists. We discuss the definition and description of concerted evolution and describe whether incomplete concerted evolution of rDNAs predominantly affects coding or non-coding regions of rDNA units and if it leads to the formation of pseudogenes or not. We also discuss the factors contributing to rDNA variation, such as interspecific hybridization, meiotic cycles, rDNA expression status, genome size, and the activity of effector genes involved in genetic recombination, epigenetic modifications, and DNA editing. Finally, we argue that a combination of approaches is needed to target genetic and epigenetic phenomena influencing incomplete concerted evolution, to give a comprehensive understanding of the evolution and functional consequences of intragenomic variation in rDNA.
Affiliation(s)
- Wencai Wang
- Science and Technology Innovation Center, Guangzhou University of Chinese Medicine, Guangzhou, 510405, China
- Xianzhi Zhang
- Department of Horticulture, College of Horticulture and Landscape Architecture, Zhongkai University of Agriculture and Engineering, Guangzhou, 510225, China
- Sònia Garcia
- Institut Botànic de Barcelona, IBB (CSIC - Ajuntament de Barcelona), Barcelona, Spain
- Andrew R Leitch
- School of Biological and Behavioral Sciences, Queen Mary University of London, London, E1 4NS, UK
- Aleš Kovařík
- Institute of Biophysics, Academy of Sciences of the Czech Republic, Brno, CZ-61200, Czech Republic

16
Arkhipova IR, Yushenova IA, Rodriguez F. Shaping eukaryotic epigenetic systems by horizontal gene transfer. Bioessays 2023; 45:e2200232. [PMID: 37339822 PMCID: PMC10287040 DOI: 10.1002/bies.202200232] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2022] [Revised: 05/07/2023] [Accepted: 05/08/2023] [Indexed: 06/22/2023]
Abstract
DNA methylation constitutes one of the pillars of epigenetics, relying on covalent bonds for addition and/or removal of chemically distinct marks within the major groove of the double helix. DNA methyltransferases, enzymes which introduce methyl marks, initially evolved in prokaryotes as components of restriction-modification systems protecting host genomes from bacteriophages and other invading foreign DNA. In early eukaryotic evolution, DNA methyltransferases were horizontally transferred from bacteria into eukaryotes several times and independently co-opted into epigenetic regulatory systems, primarily via establishing connections with the chromatin environment. While C5-methylcytosine is the cornerstone of plant and animal epigenetics and has been investigated in much detail, the epigenetic role of other methylated bases is less clear. The recent addition of N4-methylcytosine of bacterial origin as a metazoan DNA modification highlights the prerequisites for foreign gene co-option into the host regulatory networks, and challenges the existing paradigms concerning the origin and evolution of eukaryotic regulatory systems.
Affiliation(s)
- Irina R Arkhipova
- Marine Biological Laboratory, Josephine Bay Paul Center for Comparative Molecular Biology and Evolution, Woods Hole, Massachusetts, USA
- Irina A Yushenova
- Marine Biological Laboratory, Josephine Bay Paul Center for Comparative Molecular Biology and Evolution, Woods Hole, Massachusetts, USA
- Fernando Rodriguez
- Marine Biological Laboratory, Josephine Bay Paul Center for Comparative Molecular Biology and Evolution, Woods Hole, Massachusetts, USA

17
Macken WL, Falabella M, Pizzamiglio C, Woodward CE, Scotchman E, Chitty LS, Polke JM, Bugiardini E, Hanna MG, Vandrovcova J, Chandler N, Labrum R, Pitceathly RDS. Enhanced mitochondrial genome analysis: bioinformatic and long-read sequencing advances and their diagnostic implications. Expert Rev Mol Diagn 2023; 23:797-814. [PMID: 37642407 DOI: 10.1080/14737159.2023.2241365] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2023] [Accepted: 07/24/2023] [Indexed: 08/31/2023]
Abstract
INTRODUCTION: Primary mitochondrial diseases (PMDs) comprise a large and heterogeneous group of genetic diseases that result from pathogenic variants in either nuclear DNA (nDNA) or mitochondrial DNA (mtDNA). Widespread adoption of next-generation sequencing (NGS) has improved the efficiency and accuracy of mtDNA diagnoses; however, several challenges remain.
AREAS COVERED: In this review, we briefly summarize the current state of the art in molecular diagnostics for mtDNA and consider the implications of improved whole genome sequencing (WGS), bioinformatic techniques, and the adoption of long-read sequencing for PMD diagnostics.
EXPERT OPINION: We anticipate that the application of PCR-free WGS from blood DNA will increase in diagnostic laboratories, while for adults with myopathic presentations, WGS from muscle DNA may become more widespread. Improved bioinformatic strategies will enhance WGS data interrogation, with more accurate delineation of mtDNA and NUMTs (nuclear mitochondrial DNA segments) in WGS data, superior coverage uniformity, indirect measurement of mtDNA copy number, and more accurate interpretation of heteroplasmic large-scale rearrangements (LSRs). Separately, the adoption of diagnostic long-read sequencing could offer greater resolution of complex LSRs and the opportunity to phase heteroplasmic variants.
Affiliation(s)
- William L Macken
- Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, London, UK
- NHS Highly Specialised Service for Rare Mitochondrial Disorders, Queen Square Centre for Neuromuscular Diseases, The National Hospital for Neurology and Neurosurgery, London, UK
- Micol Falabella
- Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, London, UK
- Chiara Pizzamiglio
- Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, London, UK
- NHS Highly Specialised Service for Rare Mitochondrial Disorders, Queen Square Centre for Neuromuscular Diseases, The National Hospital for Neurology and Neurosurgery, London, UK
- Cathy E Woodward
- NHS Highly Specialised Service for Rare Mitochondrial Disorders, Queen Square Centre for Neuromuscular Diseases, The National Hospital for Neurology and Neurosurgery, London, UK
- Rare and Inherited Disease Laboratory, North Thames Genomic Laboratory Hub, Great Ormond Street Hospital for Children NHS Foundation Trust, London, UK
- Elizabeth Scotchman
- Rare and Inherited Disease Laboratory, North Thames Genomic Laboratory Hub, Great Ormond Street Hospital for Children NHS Foundation Trust, London, UK
- Lyn S Chitty
- Rare and Inherited Disease Laboratory, North Thames Genomic Laboratory Hub, Great Ormond Street Hospital for Children NHS Foundation Trust, London, UK
- James M Polke
- NHS Highly Specialised Service for Rare Mitochondrial Disorders, Queen Square Centre for Neuromuscular Diseases, The National Hospital for Neurology and Neurosurgery, London, UK
- Rare and Inherited Disease Laboratory, North Thames Genomic Laboratory Hub, Great Ormond Street Hospital for Children NHS Foundation Trust, London, UK
- Enrico Bugiardini
- Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, London, UK
- NHS Highly Specialised Service for Rare Mitochondrial Disorders, Queen Square Centre for Neuromuscular Diseases, The National Hospital for Neurology and Neurosurgery, London, UK
- Michael G Hanna
- Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, London, UK
- NHS Highly Specialised Service for Rare Mitochondrial Disorders, Queen Square Centre for Neuromuscular Diseases, The National Hospital for Neurology and Neurosurgery, London, UK
- Jana Vandrovcova
- Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, London, UK
- Natalie Chandler
- Rare and Inherited Disease Laboratory, North Thames Genomic Laboratory Hub, Great Ormond Street Hospital for Children NHS Foundation Trust, London, UK
- Robyn Labrum
- NHS Highly Specialised Service for Rare Mitochondrial Disorders, Queen Square Centre for Neuromuscular Diseases, The National Hospital for Neurology and Neurosurgery, London, UK
- Rare and Inherited Disease Laboratory, North Thames Genomic Laboratory Hub, Great Ormond Street Hospital for Children NHS Foundation Trust, London, UK
- Robert D S Pitceathly
- Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, London, UK
- NHS Highly Specialised Service for Rare Mitochondrial Disorders, Queen Square Centre for Neuromuscular Diseases, The National Hospital for Neurology and Neurosurgery, London, UK

18
Hosseini M, Palmer A, Manka W, Grady PGS, Patchigolla V, Bi J, O'Neill RJ, Chi Z, Aguiar D. Deep statistical modelling of nanopore sequencing translocation times reveals latent non-B DNA structures. Bioinformatics 2023; 39:i242-i251. [PMID: 37387144 DOI: 10.1093/bioinformatics/btad220] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/01/2023]
Abstract
MOTIVATION: Non-canonical (or non-B) DNA are genomic regions whose three-dimensional conformation deviates from the canonical double helix. Non-B DNA play an important role in basic cellular processes and are associated with genomic instability, gene regulation, and oncogenesis. Experimental methods are low-throughput and can detect only a limited set of non-B DNA structures, while computational methods rely on non-B DNA base motifs, which are necessary but not sufficient indicators of non-B structures. Oxford Nanopore sequencing is an efficient and low-cost platform, but it is currently unknown whether nanopore reads can be used for identifying non-B structures.
RESULTS: We build the first computational pipeline to predict non-B DNA structures from nanopore sequencing. We formalize non-B detection as a novelty detection problem and develop the GoFAE-DND, an autoencoder that uses goodness-of-fit (GoF) tests as a regularizer. A discriminative loss encourages non-B DNA to be poorly reconstructed and optimizing Gaussian GoF tests allows for the computation of P-values that indicate non-B structures. Based on whole genome nanopore sequencing of NA12878, we show that there exist significant differences between the timing of DNA translocation for non-B DNA bases compared with B-DNA. We demonstrate the efficacy of our approach through comparisons with novelty detection methods using experimental data and data synthesized from a new translocation time simulator. Experimental validations suggest that reliable detection of non-B DNA from nanopore sequencing is achievable.
AVAILABILITY AND IMPLEMENTATION: Source code is available at https://github.com/bayesomicslab/ONT-nonb-GoFAE-DND.
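The general idea of turning a Gaussian model of translocation times into a P-value can be illustrated with something far simpler than GoFAE-DND. The sketch below assumes that translocation times at presumed B-DNA positions are approximately Gaussian and applies a two-sided z-test to the mean of a candidate window. It is a swapped-in technique for intuition only, not the authors' autoencoder or their goodness-of-fit regularizer, and the function names and threshold are hypothetical.

```python
import math
from statistics import NormalDist, fmean, stdev


def window_pvalue(window: list[float], reference: list[float]) -> float:
    """Two-sided z-test P-value for a window's mean translocation time.

    Assumption: reference times (from presumed B-DNA) are roughly Gaussian,
    so under the null the window mean has standard error sigma / sqrt(n).
    """
    mu, sigma = fmean(reference), stdev(reference)
    z = (fmean(window) - mu) / (sigma / math.sqrt(len(window)))
    # Small P-values flag windows whose timing departs from the B-DNA background.
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def flag_windows(windows: list[list[float]], reference: list[float], alpha: float = 0.01) -> list[int]:
    """Indices of windows with P-value below alpha (no multiple-testing correction)."""
    return [i for i, w in enumerate(windows) if window_pvalue(w, reference) < alpha]
```

Applied genome-wide, these P-values would need a multiple-testing correction (for example Benjamini-Hochberg) before any window is called a candidate non-B structure.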
Affiliation(s)
- Marjan Hosseini
- Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269-4155, United States
- Aaron Palmer
- Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269-4155, United States
- William Manka
- Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269-4155, United States
- Patrick G S Grady
- Institute for Systems Genomics and Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT 06269-3003, United States
- Venkata Patchigolla
- Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269-4155, United States
- Jinbo Bi
- Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269-4155, United States
- Rachel J O'Neill
- Institute for Systems Genomics and Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT 06269-3003, United States
- Zhiyi Chi
- Department of Statistics, University of Connecticut, Storrs, CT 06269-4120, United States
- Derek Aguiar
- Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269-4155, United States

19
Weissensteiner MH, Cremona MA, Guiblet WM, Stoler N, Harris RS, Cechova M, Eckert KA, Chiaromonte F, Huang YF, Makova KD. Accurate sequencing of DNA motifs able to form alternative (non-B) structures. Genome Res 2023; 33:907-922. [PMID: 37433640 PMCID: PMC10519405 DOI: 10.1101/gr.277490.122] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 11/09/2022] [Accepted: 05/04/2023] [Indexed: 07/13/2023]
Abstract
Approximately 13% of the human genome at certain motifs have the potential to form noncanonical (non-B) DNA structures (e.g., G-quadruplexes, cruciforms, and Z-DNA), which regulate many cellular processes but also affect the activity of polymerases and helicases. Because sequencing technologies use these enzymes, they might possess increased errors at non-B structures. To evaluate this, we analyzed error rates, read depth, and base quality of Illumina, Pacific Biosciences (PacBio) HiFi, and Oxford Nanopore Technologies (ONT) sequencing at non-B motifs. All technologies showed altered sequencing success for most non-B motif types, although this could be owing to several factors, including structure formation, biased GC content, and the presence of homopolymers. Single-nucleotide mismatch errors had low biases in HiFi and ONT for all non-B motif types but were increased for G-quadruplexes and Z-DNA in all three technologies. Deletion errors were increased for all non-B types but Z-DNA in Illumina and HiFi, as well as only for G-quadruplexes in ONT. Insertion errors for non-B motifs were highly, moderately, and slightly elevated in Illumina, HiFi, and ONT, respectively. Additionally, we developed a probabilistic approach to determine the number of false positives at non-B motifs depending on sample size and variant frequency, and applied it to publicly available data sets (1000 Genomes, Simons Genome Diversity Project, and gnomAD). We conclude that elevated sequencing errors at non-B DNA motifs should be considered in low-read-depth studies (single-cell, ancient DNA, and pooled-sample population sequencing) and in scoring rare variants. Combining technologies should maximize sequencing accuracy in future studies of non-B DNA.
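The dependence of false positives on sample size and variant frequency can be made concrete with a simple binomial model. The sketch below is an assumption-laden illustration, not the authors' probabilistic approach: it assumes independent per-read errors toward one specific alternative allele at a fixed `error_rate`, a false genotype call in a sample whenever at least `min_alt_reads` reads carry that allele, and a site reported as polymorphic once enough samples carry the call to reach the frequency threshold. All names and parameters are hypothetical.

```python
from math import ceil, comb


def binom_tail(n: int, p: float, k_min: int) -> float:
    """P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


def false_variant_probability(
    n_samples: int, freq: float, depth: int, error_rate: float, min_alt_reads: int
) -> float:
    """Probability that sequencing errors alone make a site look polymorphic at >= freq."""
    # Per-sample chance that errors toward one alternative allele reach the read threshold.
    p_sample = binom_tail(depth, error_rate, min_alt_reads)
    # Samples needed to reach the frequency threshold; the epsilon guards float round-up.
    k_samples = max(1, ceil(freq * n_samples - 1e-9))
    return binom_tail(n_samples, p_sample, k_samples)
```

Under these assumptions, raising the per-read error rate (as reported at some non-B motif types) or lowering the frequency threshold for rare variants both increase the chance of a spurious call, which is the qualitative pattern the abstract warns about.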
Affiliation(s)
- Marzia A Cremona
- Department of Operations and Decision Systems, Université Laval, Quebec, Quebec G1V0A6, Canada
- Population Health and Optimal Health Practices, CHU de Québec-Université Laval Research Center, Québec, Quebec G1V4G2, Canada
- Center for Medical Genomics, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Wilfried M Guiblet
- Department of Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Laboratory of Cell Biology, NCI-CCR, National Institutes of Health, Bethesda, Maryland 20892, USA
- Nicholas Stoler
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Robert S Harris
- Department of Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Monika Cechova
- Department of Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Faculty of Informatics, Masaryk University, 60200 Brno, Czech Republic
- Kristin A Eckert
- Center for Medical Genomics, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Department of Pathology, The Pennsylvania State University, College of Medicine, Hershey, Pennsylvania 17033, USA
- Francesca Chiaromonte
- Center for Medical Genomics, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Department of Statistics, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Institute of Economics and L'EMbeDS, Sant'Anna School of Advanced Studies, Pisa 56127, Italy
- Yi-Fei Huang
- Department of Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Center for Medical Genomics, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Kateryna D Makova
- Department of Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Center for Medical Genomics, The Pennsylvania State University, University Park, Pennsylvania 16802, USA

20
Kong Y, Mead EA, Fang G. Navigating the pitfalls of mapping DNA and RNA modifications. Nat Rev Genet 2023; 24:363-381. [PMID: 36653550 PMCID: PMC10722219 DOI: 10.1038/s41576-022-00559-5] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Accepted: 11/21/2022] [Indexed: 01/19/2023]
Abstract
Chemical modifications to nucleic acids occur across the kingdoms of life and carry important regulatory information. Reliable high-resolution mapping of these modifications is the foundation of functional and mechanistic studies, and recent methodological advances based on next-generation sequencing and long-read sequencing platforms are critical to achieving this aim. However, mapping technologies may have limitations that sometimes lead to inconsistent results. Some of these limitations are technical in nature and specific to certain types of technology. Here, however, we focus on common (yet not always widely recognized) pitfalls that are shared among frequently used mapping technologies and discuss strategies to help technology developers and users mitigate their effects. Although the emphasis is primarily on DNA modifications, RNA modifications are also discussed.
Affiliation(s)
- Yimeng Kong
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Edward A Mead
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Gang Fang
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA

21
Abstract
Repetitive elements in the human genome, once considered 'junk DNA', are now known to adopt more than a dozen alternative (that is, non-B) DNA structures, such as self-annealed hairpins, left-handed Z-DNA, three-stranded triplexes (H-DNA) or four-stranded guanine quadruplex structures (G4 DNA). These dynamic conformations can act as functional genomic elements involved in DNA replication and transcription, chromatin organization and genome stability. In addition, recent studies have revealed a role for these alternative structures in triggering error-generating DNA repair processes, thereby actively enabling genome plasticity. As a driving force for genetic variation, non-B DNA structures thus contribute to both disease aetiology and evolution.
Affiliation(s)
- Guliang Wang
- Division of Pharmacology and Toxicology, College of Pharmacy, The University of Texas at Austin, Dell Paediatric Research Institute, Austin, TX, USA
- Karen M Vasquez
- Division of Pharmacology and Toxicology, College of Pharmacy, The University of Texas at Austin, Dell Paediatric Research Institute, Austin, TX, USA

22
Revisiting mutagenesis at non-B DNA motifs in the human genome. Nat Struct Mol Biol 2023; 30:417-424. [PMID: 36914796 DOI: 10.1038/s41594-023-00936-6] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 08/23/2021] [Accepted: 02/03/2023] [Indexed: 03/16/2023]
Abstract
Non-B DNA structures formed by repetitive sequence motifs are known instigators of mutagenesis in experimental systems. Analyzing this phenomenon computationally in the human genome requires careful disentangling of intrinsic confounding factors, including overlapping and interrupted motifs and recurrent sequencing errors. Here, we show that accounting for these factors eliminates all signals of repeat-induced mutagenesis that extend beyond the motif boundary, and eliminates or dramatically shrinks the magnitude of mutagenesis within some motifs, contradicting previous reports. Mutagenesis not attributable to artifacts revealed several biological mechanisms. Polymerase slippage generates frequent indels within every variety of short tandem repeat motif, implicating slipped-strand structures. Interruption-correcting single nucleotide variants within short tandem repeats may originate from error-prone polymerases. Secondary-structure formation promotes single nucleotide variants within palindromic repeats and duplications within direct repeats. G-quadruplex motifs cause recurrent sequencing errors, whereas mutagenesis at Z-DNAs is conspicuously absent.
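Palindromic (inverted) repeats, one of the motif classes discussed above, consist of an arm followed, after an optional spacer, by its reverse complement. The brute-force sketch below assumes a fixed arm length and a maximum spacer length and allows no mismatches; these thresholds and function names are illustrative assumptions, not the motif definitions used in the study.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_inverted_repeats(
    seq: str, arm: int = 6, max_spacer: int = 5
) -> list[tuple[int, int, int]]:
    """Return (start, end, spacer) for perfect inverted repeats with the given arm length.

    Coordinates are 0-based and half-open; the right arm must equal the
    reverse complement of the left arm, with 0..max_spacer bases between them.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * arm + 1):
        target = revcomp(seq[i : i + arm])
        for spacer in range(max_spacer + 1):
            j = i + arm + spacer
            if j + arm > len(seq):
                break  # right arm would run past the end of the sequence
            if seq[j : j + arm] == target:
                hits.append((i, j + arm, spacer))
    return hits
```

Practical motif callers also tolerate mismatches and merge overlapping hits, and overlapping or interrupted motifs of this kind are exactly the confounders the abstract says must be disentangled before attributing mutations to structure formation.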

23
Makova KD, Weissensteiner MH. Noncanonical DNA structures are drivers of genome evolution. Trends Genet 2023; 39:109-124. [PMID: 36604282 PMCID: PMC9877202 DOI: 10.1016/j.tig.2022.11.005] [Citation(s) in RCA: 47] [Impact Index Per Article: 23.5] [Received: 06/27/2022] [Revised: 11/04/2022] [Accepted: 11/28/2022] [Indexed: 01/05/2023]
Abstract
In addition to the canonical right-handed double helix, other DNA structures, termed 'non-B DNA', can form in the genomes across the tree of life. Non-B DNA regulates multiple cellular processes, including replication and transcription, yet its presence is associated with elevated mutagenicity and genome instability. These discordant cellular roles fuel the enormous potential of non-B DNA to drive genomic and phenotypic evolution. Here we discuss recent studies establishing non-B DNA structures as novel functional elements subject to natural selection, affecting evolution of transposable elements (TEs), and specifying centromeres. By highlighting the contributions of non-B DNA to repeated evolution and adaptation to changing environments, we conclude that evolutionary analyses should include a perspective of not only DNA sequence, but also its structure.
Affiliation(s)
- Kateryna D Makova
- Department of Biology, Penn State University, 310 Wartik Laboratory, University Park, PA 16802, USA.

24
Wang F, Guo Y, Liu Z, Wang Q, Jiang Y, Zhao G. New insights into the novel sequences of the chicken pan-genome by liquid chip. J Anim Sci 2022; 100:6759641. [PMID: 36223424 PMCID: PMC9733507 DOI: 10.1093/jas/skac336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2022] [Accepted: 10/11/2022] [Indexed: 12/15/2022]
Abstract
Increasing evidence indicates that the missing sequences and genes in the chicken reference genome are involved in many crucial biological pathways, including metabolism and immunity. The low detection rate of novel sequences by resequencing has hindered the acquisition of these sequences and the exploration of the relationship between new genes and economic traits. To improve the capture ratio of novel sequences, a 48K liquid chip including 25K from the reference sequence and 23K from the novel sequence was designed. The assay was tested on a panel of 218 animals from 5 chicken breeds. The average capture ratio of the reference sequence was 99.55%, and the average sequencing depth of the target sites was approximately 187X, indicating a good performance and successful application of liquid chips in farm animals. For the target region in the novel sequence, the average capture ratio was 33.15% and the average sequencing depth of target sites was approximately 60X, both of which were higher than those of resequencing. However, the differences in capture ratio and captured regions among varieties and individuals demonstrate the difficulty of capturing these structurally complex regions. After genotyping, GWAS showed variations in novel sequences potentially relevant to immune-related traits. For example, a SNP close to IGHV3-23-like, a gene related to lymphocyte differentiation, was associated with the H/L ratio. These results suggest that targeted capture sequencing is a preferred method to capture these sequences with complex structures and genes potentially associated with immune-related traits.
Affiliation(s)
- Qiao Wang
- State Key Laboratory of Animal Nutrition, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Yu Jiang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi, China

25
Mc Cartney AM, Shafin K, Alonge M, Bzikadze AV, Formenti G, Fungtammasan A, Howe K, Jain C, Koren S, Logsdon GA, Miga KH, Mikheenko A, Paten B, Shumate A, Soto DC, Sović I, Wood JMD, Zook JM, Phillippy AM, Rhie A. Chasing perfection: validation and polishing strategies for telomere-to-telomere genome assemblies. Nat Methods 2022; 19:687-695. [PMID: 35361931 PMCID: PMC9812399 DOI: 10.1038/s41592-022-01440-3] [Citation(s) in RCA: 56] [Impact Index Per Article: 18.7] [Received: 07/13/2021] [Accepted: 03/04/2022] [Indexed: 01/07/2023]
Abstract
Advances in long-read sequencing technologies and genome assembly methods have enabled the recent completion of the first telomere-to-telomere human genome assembly, which resolves complex segmental duplications and large tandem repeats, including centromeric satellite arrays in a complete hydatidiform mole (CHM13). Although derived from highly accurate sequences, evaluation revealed evidence of small errors and structural misassemblies in the initial draft assembly. To correct these errors, we designed a new repeat-aware polishing strategy that made accurate assembly corrections in large repeats without overcorrection, ultimately fixing 51% of the existing errors and improving the assembly quality value from 70.2 to 73.9 measured from PacBio high-fidelity and Illumina k-mers. By comparing our results to standard automated polishing tools, we outline common polishing errors and offer practical suggestions for genome projects with limited resources. We also show how sequencing biases in both high-fidelity and Oxford Nanopore Technologies reads cause signature assembly errors that can be corrected with a diverse panel of sequencing technologies.
Affiliation(s)
- Ann M Mc Cartney
- Genome Informatics Section, Computational and Statistical Genomics Branch, NHGRI, NIH, Bethesda, MD, USA
- Kishwar Shafin
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Michael Alonge
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Andrey V Bzikadze
- Graduate Program in Bioinformatics and Systems Biology, University of California, San Diego, La Jolla, CA, USA
- Giulio Formenti
- Laboratory of Neurogenetics of Language and The Vertebrate Genome Lab, The Rockefeller University, New York, NY, USA
- Chirag Jain
- Genome Informatics Section, Computational and Statistical Genomics Branch, NHGRI, NIH, Bethesda, MD, USA
- Department of Computational and Data Sciences, Indian Institute of Science, Bangalore, India
- Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, NHGRI, NIH, Bethesda, MD, USA
- Glennis A Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Karen H Miga
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Department of Biomolecular Engineering, University of California, Santa Cruz, CA, USA
- Alla Mikheenko
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, Russia
- Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Alaina Shumate
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Daniela C Soto
- Genome Center, MIND Institute, Department of Biochemistry and Molecular Medicine, University of California, Davis, CA, USA
- Ivan Sović
- Pacific Biosciences, Menlo Park, CA, USA
- Digital BioLogic d.o.o., Ivanić-Grad, Croatia
- Justin M Zook
- Biosystems and Biomaterials Division, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Adam M Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, NHGRI, NIH, Bethesda, MD, USA.
- Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, NHGRI, NIH, Bethesda, MD, USA.

26
Chromosome organization affects genome evolution in Sulfolobus archaea. Nat Microbiol 2022; 7:820-830. [PMID: 35618771 DOI: 10.1038/s41564-022-01127-7] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 07/07/2021] [Accepted: 04/21/2022] [Indexed: 12/16/2022]
Abstract
In all organisms, the DNA sequence and the structural organization of chromosomes affect gene expression. The extremely thermophilic crenarchaeon Sulfolobus has one circular chromosome with three origins of replication. We previously revealed that this chromosome has defined A and B compartments that have high and low gene expression, respectively. As well as higher levels of gene expression, the A compartment contains the origins of replication. To evaluate the impact of three-dimensional organization on genome evolution, we characterized the effect of replication origins and compartmentalization on primary sequence evolution in eleven Sulfolobus species. Using single-nucleotide polymorphism analyses, we found that distance from an origin of replication was associated with increased mutation rates in the B but not in the A compartment. The enhanced polymorphisms distal to replication origins suggest that replication termination may have a causal role in their generation. Further mutational analyses revealed that the sequences in the A compartment are less likely to be mutated, and that there is stronger purifying selection than in the B compartment. Finally, we applied the Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) to show that the B compartment is less accessible than the A compartment. Taken together, our data suggest that compartmentalization of chromosomal DNA can influence chromosome evolution in Sulfolobus. We propose that the A compartment serves as a haven for stable maintenance of gene sequences, while sequences in the B compartment can be diversified.

27
Patchigolla VS, Mellone BG. Enrichment of Non-B-Form DNA at D. melanogaster Centromeres. Genome Biol Evol 2022; 14:evac054. [PMID: 35441684 PMCID: PMC9070824 DOI: 10.1093/gbe/evac054] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Accepted: 04/14/2022] [Indexed: 11/17/2022]
Abstract
Centromeres are essential chromosomal regions that mediate the accurate inheritance of genetic information during eukaryotic cell division. Despite their conserved function, centromeres do not contain conserved DNA sequences and are instead epigenetically marked by the presence of the centromere-specific histone H3 variant centromeric protein A. The functional contribution of centromeric DNA sequences to centromere identity remains elusive. Previous work found that dyad symmetries with a propensity to adopt noncanonical secondary DNA structures are enriched at the centromeres of several species. These findings led to the proposal that noncanonical DNA structures may contribute to centromere specification. Here, we analyze the predicted secondary structures of the recently identified centromere DNA sequences of Drosophila melanogaster. Although dyad symmetries are only enriched on the Y centromere, we find that other types of noncanonical DNA structures, including melted DNA and G-quadruplexes, are common features of all D. melanogaster centromeres. Our work is consistent with previous models suggesting that noncanonical DNA secondary structures may be conserved features of centromeres with possible implications for centromere specification.
Affiliation(s)
- Barbara G. Mellone
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT 06269, USA
- Institute for Systems Genomics, University of Connecticut, Storrs, CT 06269, USA

28
Li M, Sun C, Xu N, Bian P, Tian X, Wang X, Wang Y, Jia X, Heller R, Wang M, Wang F, Dai X, Luo R, Guo Y, Wang X, Yang P, Hu D, Liu Z, Fu W, Zhang S, Li X, Wen C, Lan F, Siddiki AZ, Suwannapoom C, Zhao X, Nie Q, Hu X, Jiang Y, Yang N. De Novo Assembly of 20 Chicken Genomes Reveals the Undetectable Phenomenon for Thousands of Core Genes on Microchromosomes and Subtelomeric Regions. Mol Biol Evol 2022; 39:msac066. [PMID: 35325213 PMCID: PMC9021737 DOI: 10.1093/molbev/msac066] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Indexed: 11/29/2022]
Abstract
The gene numbers and evolutionary rates of birds were assumed to be much lower than those of mammals, which is in sharp contrast to the huge species number and morphological diversity of birds. It is, therefore, necessary to construct a complete avian genome and analyze its evolution. We constructed a chicken pan-genome from 20 de novo assembled genomes with high sequencing depth, and identified 1,335 protein-coding genes and 3,011 long noncoding RNAs not found in GRCg6a. The majority of these novel genes were detected across most individuals of the examined transcriptomes but were seldom detected in the individual DNA sequencing datasets, whether generated with Illumina or PacBio technology. Furthermore, unlike in previous pan-genome models, most of these novel genes were overrepresented in chromosomal subtelomeric regions and on microchromosomes, surrounded by extremely high proportions of tandem repeats that strongly impede DNA sequencing. These hidden genes were shown to be shared by all chicken genomes, included many housekeeping genes, and were enriched in immune pathways. Comparative genomics revealed that the novel genes had 3-fold higher substitution rates than known ones, updating the knowledge about evolutionary rates in birds. Our study provides a framework for constructing a better chicken genome, which will contribute toward the understanding of avian evolution and the improvement of poultry breeding.
Affiliation(s)
- Ming Li
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China
- Congjiao Sun
- National Engineering Laboratory for Animal Breeding and Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, China Agricultural University, Beijing 100193, China
- Naiyi Xu
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China
- Peipei Bian
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China
- Xiaomeng Tian
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China
- Xihong Wang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China
- Yuzhe Wang
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- National Research Facility for Phenotypic and Genotypic Analysis of Model Animals (Beijing), China Agricultural University, Beijing 100193, China
- Xinzheng Jia
- Department of Animal Science, Iowa State University, Ames, IA 50011, USA
- School of Life Science and Engineering, Foshan University, Foshan 528225, China
- Rasmus Heller
- Section for Computational and RNA Biology, Department of Biology, University of Copenhagen, Copenhagen N 2200, Denmark
- Mingshan Wang
- Howard Hughes Medical Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Department of Ecology and Evolutionary Biology, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Fei Wang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China
- Xuelei Dai
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China
- Rongsong Luo
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China
- Yingwei Guo
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China
- Xiangnan Wang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China
- Peng Yang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China
- Dexiang Hu
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China
- Zhenyu Liu
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China
- Weiwei Fu
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China
- Shunjin Zhang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China
- Xiaochang Li
- National Engineering Laboratory for Animal Breeding and Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, China Agricultural University, Beijing 100193, China
- Chaoliang Wen
- National Engineering Laboratory for Animal Breeding and Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, China Agricultural University, Beijing 100193, China
- Fangren Lan
- National Engineering Laboratory for Animal Breeding and Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, China Agricultural University, Beijing 100193, China
- Amam Zonaed Siddiki
- Department of Pathology and Parasitology, Faculty of Veterinary Medicine, Chittagong Veterinary and Animal Sciences University, Chittagong 4202, Bangladesh
- Xin Zhao
- Department of Animal Science, McGill University, Montreal, QC, Canada
- Qinghua Nie
- Department of Animal Genetics, Breeding and Reproduction, College of Animal Science, South China Agricultural University, Guangzhou 510642, Guangdong, China
- Xiaoxiang Hu
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Yu Jiang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China
- Center for Functional Genomics, Institute of Future Agriculture, Northwest A&F University, China
- Ning Yang
- National Engineering Laboratory for Animal Breeding and Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, China Agricultural University, Beijing 100193, China

29
Nurk S, Koren S, Rhie A, Rautiainen M, Bzikadze AV, Mikheenko A, Vollger MR, Altemose N, Uralsky L, Gershman A, Aganezov S, Hoyt SJ, Diekhans M, Logsdon GA, Alonge M, Antonarakis SE, Borchers M, Bouffard GG, Brooks SY, Caldas GV, Chen NC, Cheng H, Chin CS, Chow W, de Lima LG, Dishuck PC, Durbin R, Dvorkina T, Fiddes IT, Formenti G, Fulton RS, Fungtammasan A, Garrison E, Grady PG, Graves-Lindsay TA, Hall IM, Hansen NF, Hartley GA, Haukness M, Howe K, Hunkapiller MW, Jain C, Jain M, Jarvis ED, Kerpedjiev P, Kirsche M, Kolmogorov M, Korlach J, Kremitzki M, Li H, Maduro VV, Marschall T, McCartney AM, McDaniel J, Miller DE, Mullikin JC, Myers EW, Olson ND, Paten B, Peluso P, Pevzner PA, Porubsky D, Potapova T, Rogaev EI, Rosenfeld JA, Salzberg SL, Schneider VA, Sedlazeck FJ, Shafin K, Shew CJ, Shumate A, Sims Y, Smit AFA, Soto DC, Sović I, Storer JM, Streets A, Sullivan BA, Thibaud-Nissen F, Torrance J, Wagner J, Walenz BP, Wenger A, Wood JMD, Xiao C, Yan SM, Young AC, Zarate S, Surti U, McCoy RC, Dennis MY, Alexandrov IA, Gerton JL, O’Neill RJ, Timp W, Zook JM, Schatz MC, Eichler EE, Miga KH, Phillippy AM. The complete sequence of a human genome. Science 2022; 376:44-53. [PMID: 35357919 PMCID: PMC9186530 DOI: 10.1126/science.abj6987] [Citation(s) in RCA: 1472] [Impact Index Per Article: 490.7] [Indexed: 12/11/2022]
Abstract
Since its initial release in 2000, the human reference genome has covered only the euchromatic fraction of the genome, leaving important heterochromatic regions unfinished. Addressing the remaining 8% of the genome, the Telomere-to-Telomere (T2T) Consortium presents a complete 3.055 billion-base pair sequence of a human genome, T2T-CHM13, that includes gapless assemblies for all chromosomes except Y, corrects errors in the prior references, and introduces nearly 200 million base pairs of sequence containing 1956 gene predictions, 99 of which are predicted to be protein coding. The completed regions include all centromeric satellite arrays, recent segmental duplications, and the short arms of all five acrocentric chromosomes, unlocking these complex regions of the genome to variational and functional studies.
Affiliation(s)
- Sergey Nurk
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD USA
- Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD USA
- Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD USA
- Mikko Rautiainen
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD USA
- Andrey V. Bzikadze
- Graduate Program in Bioinformatics and Systems Biology, University of California, San Diego; La Jolla, CA, USA
- Alla Mikheenko
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University; Saint Petersburg, Russia
- Mitchell R. Vollger
- Department of Genome Sciences, University of Washington School of Medicine; Seattle, WA, USA
- Nicolas Altemose
- Department of Bioengineering, University of California, Berkeley; Berkeley, CA, USA
- Lev Uralsky
- Sirius University of Science and Technology; Sochi, Russia
- Vavilov Institute of General Genetics; Moscow, Russia
- Ariel Gershman
- Department of Molecular Biology and Genetics, Johns Hopkins University; Baltimore, MD, USA
- Sergey Aganezov
- Department of Computer Science, Johns Hopkins University; Baltimore, MD, USA
- Savannah J. Hoyt
- Institute for Systems Genomics and Department of Molecular and Cell Biology, University of Connecticut; Storrs, CT, USA
- Mark Diekhans
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz; Santa Cruz, CA, USA
- Glennis A. Logsdon
- Department of Genome Sciences, University of Washington School of Medicine; Seattle, WA, USA
- Michael Alonge
- Department of Computer Science, Johns Hopkins University; Baltimore, MD, USA
- Gerard G. Bouffard
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD, USA
- Shelise Y. Brooks
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD, USA
- Gina V. Caldas
- Department of Molecular and Cell Biology, University of California, Berkeley; Berkeley, CA, USA
- Nae-Chyun Chen
- Department of Computer Science, Johns Hopkins University; Baltimore, MD, USA
- Haoyu Cheng
- Department of Data Sciences, Dana-Farber Cancer Institute; Boston, MA
- Department of Biomedical Informatics, Harvard Medical School; Boston, MA
- Philip C. Dishuck
- Department of Genome Sciences, University of Washington School of Medicine; Seattle, WA, USA
- Richard Durbin
- Wellcome Sanger Institute; Cambridge, UK
- Department of Genetics, University of Cambridge; Cambridge, UK
- Tatiana Dvorkina
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University; Saint Petersburg, Russia
- Giulio Formenti
- Laboratory of Neurogenetics of Language and The Vertebrate Genome Lab, The Rockefeller University; New York, NY, USA
- Howard Hughes Medical Institute; Chevy Chase, MD, USA
- Robert S. Fulton
- Department of Genetics, Washington University School of Medicine; St. Louis, MO, USA
- Erik Garrison
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz; Santa Cruz, CA, USA
- University of Tennessee Health Science Center; Memphis, TN, USA
- Patrick G.S. Grady
- Institute for Systems Genomics and Department of Molecular and Cell Biology, University of Connecticut; Storrs, CT, USA
- Ira M. Hall
- Department of Genetics, Yale University School of Medicine; New Haven, CT, USA
- Nancy F. Hansen
- Comparative Genomics Analysis Unit, Cancer Genetics and Comparative Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD, USA
- Gabrielle A. Hartley
- Institute for Systems Genomics and Department of Molecular and Cell Biology, University of Connecticut; Storrs, CT, USA
- Marina Haukness
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz; Santa Cruz, CA, USA
- Chirag Jain
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD USA
- Department of Computational and Data Sciences, Indian Institute of Science; Bangalore KA, India
- Miten Jain
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz; Santa Cruz, CA, USA
- Erich D. Jarvis
- Laboratory of Neurogenetics of Language and The Vertebrate Genome Lab, The Rockefeller University; New York, NY, USA
- Howard Hughes Medical Institute; Chevy Chase, MD, USA
- Melanie Kirsche
- Department of Computer Science, Johns Hopkins University; Baltimore, MD, USA
- Mikhail Kolmogorov
- Department of Computer Science and Engineering, University of California, San Diego; San Diego, CA, USA
- Milinn Kremitzki
- McDonnell Genome Institute, Washington University in St. Louis; St. Louis, MO, USA
- Heng Li
- Department of Data Sciences, Dana-Farber Cancer Institute; Boston, MA
- Department of Biomedical Informatics, Harvard Medical School; Boston, MA
- Valerie V. Maduro
- Undiagnosed Diseases Program, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD, USA
- Tobias Marschall
- Heinrich Heine University Düsseldorf, Medical Faculty, Institute for Medical Biometry and Bioinformatics; Düsseldorf, Germany
- Ann M. McCartney
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD USA
- Jennifer McDaniel
- Biosystems and Biomaterials Division, National Institute of Standards and Technology; Gaithersburg, MD, USA
- Danny E. Miller
- Department of Genome Sciences, University of Washington School of Medicine; Seattle, WA, USA
- Department of Pediatrics, Division of Genetic Medicine, University of Washington and Seattle Children’s Hospital; Seattle, WA, USA
- James C. Mullikin
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD, USA
- Comparative Genomics Analysis Unit, Cancer Genetics and Comparative Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD, USA
- Eugene W. Myers
- Max-Planck Institute of Molecular Cell Biology and Genetics; Dresden, Germany
- Nathan D. Olson
- Biosystems and Biomaterials Division, National Institute of Standards and Technology; Gaithersburg, MD, USA
- Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz; Santa Cruz, CA, USA
- Pavel A. Pevzner
- Department of Computer Science and Engineering, University of California, San Diego; San Diego, CA, USA
- David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine; Seattle, WA, USA
- Tamara Potapova
- Stowers Institute for Medical Research; Kansas City, MO, USA
- Evgeny I. Rogaev
- Sirius University of Science and Technology; Sochi, Russia
- Vavilov Institute of General Genetics; Moscow, Russia
- Department of Psychiatry, University of Massachusetts Medical School; Worcester, MA, USA
- Faculty of Biology, Lomonosov Moscow State University; Moscow, Russia
- Steven L. Salzberg
- Department of Computer Science, Johns Hopkins University; Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins University; Baltimore, MD, USA
- Valerie A. Schneider
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health; Bethesda, MD, USA
- Fritz J. Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine; Houston TX, USA
- Kishwar Shafin
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz; Santa Cruz, CA, USA
- Colin J. Shew
- Genome Center, MIND Institute, Department of Biochemistry and Molecular Medicine, University of California, Davis; CA, USA
- Alaina Shumate
- Department of Biomedical Engineering, Johns Hopkins University; Baltimore, MD, USA
- Ying Sims
- Wellcome Sanger Institute; Cambridge, UK
- Daniela C. Soto
- Genome Center, MIND Institute, Department of Biochemistry and Molecular Medicine, University of California, Davis; CA, USA
- Ivan Sović
- Pacific Biosciences; Menlo Park, CA, USA
- Digital BioLogic d.o.o.; Ivanić-Grad, Croatia
- Aaron Streets
- Department of Bioengineering, University of California, Berkeley; Berkeley, CA, USA
- Chan Zuckerberg Biohub; San Francisco, CA, USA
- Beth A. Sullivan
- Department of Molecular Genetics and Microbiology, Duke University School of Medicine; Durham, NC, USA
- Françoise Thibaud-Nissen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health; Bethesda, MD, USA
- Justin Wagner
- Biosystems and Biomaterials Division, National Institute of Standards and Technology; Gaithersburg, MD, USA
- Brian P. Walenz
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD USA
- Chunlin Xiao
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health; Bethesda, MD, USA
- Stephanie M. Yan
- Department of Biology, Johns Hopkins University; Baltimore, MD, USA
- Alice C. Young
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD, USA
- Samantha Zarate
- Department of Computer Science, Johns Hopkins University; Baltimore, MD, USA
- Urvashi Surti
- Department of Pathology, University of Pittsburgh; Pittsburgh, PA, USA
- Rajiv C. McCoy
- Department of Biology, Johns Hopkins University; Baltimore, MD, USA
- Megan Y. Dennis
- Genome Center, MIND Institute, Department of Biochemistry and Molecular Medicine, University of California, Davis; CA, USA
- Ivan A. Alexandrov
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University; Saint Petersburg, Russia
- Vavilov Institute of General Genetics; Moscow, Russia
- Research Center of Biotechnology of the Russian Academy of Sciences; Moscow, Russia
- Jennifer L. Gerton
- Stowers Institute for Medical Research; Kansas City, MO, USA
- Department of Biochemistry and Molecular Biology, University of Kansas Medical School; Kansas City, MO, USA
- Rachel J. O’Neill
- Institute for Systems Genomics and Department of Molecular and Cell Biology, University of Connecticut; Storrs, CT, USA
- Winston Timp
- Department of Molecular Biology and Genetics, Johns Hopkins University; Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins University; Baltimore, MD, USA
- Justin M. Zook
- Biosystems and Biomaterials Division, National Institute of Standards and Technology; Gaithersburg, MD, USA
- Michael C. Schatz
- Department of Computer Science, Johns Hopkins University; Baltimore, MD, USA
- Department of Biology, Johns Hopkins University; Baltimore, MD, USA
| | - Evan E. Eichler
- Department of Genome Sciences, University of Washington School of Medicine; Seattle, WA, USA
- Howard Hughes Medical Institute; Chevy Chase, MD, USA
| | - Karen H. Miga
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz; Santa Cruz, CA, USA
- Department of Biomolecular Engineering, University of California Santa Cruz, CA, USA
| | - Adam M. Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD USA
30
Takahashi Y, Shoura M, Fire A, Morishita S. Context-dependent DNA polymerization effects can masquerade as DNA modification signals. BMC Genomics 2022; 23:249. [PMID: 35361121 PMCID: PMC8973881 DOI: 10.1186/s12864-022-08471-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2021] [Accepted: 03/15/2022] [Indexed: 11/23/2022]
Abstract
Background: Single-molecule measurements of DNA polymerization kinetics provide a sensitive means to detect both secondary structures in DNA and deviations from primary chemical structure resulting from modified bases. In one approach to such analysis, deviations are inferred by monitoring the behavior of DNA polymerase using single-molecule, real-time sequencing with zero-mode waveguides. This approach uses Single Molecule Real Time (SMRT) sequencing to measure the time between fluorescence pulse signals from consecutive nucleotides incorporated during DNA replication, called the interpulse duration (IPD).
Results: In this paper we present an analysis of loci with high IPDs in two genomes, a bacterial genome (E. coli) and a eukaryotic genome (C. elegans). To distinguish the potential effects of DNA modification on DNA polymerization speed, we paired an analysis of native genomic DNA with whole-genome amplified (WGA) material in which DNA modifications were effectively removed. Adenine modification sites in E. coli are known, and we observed the expected IPD shifts at these sites in the native but not the WGA samples. For C. elegans, no such differences were observed. Instead, we found a number of novel sequence contexts where IPDs were raised relative to the average IPD for each of the four nucleotides, but for which the raised IPD was present in both native and WGA samples.
Conclusion: The latter results argue strongly against DNA modification as the underlying driver of high-IPD segments in C. elegans, and provide a framework for separating effects of DNA modification from context-dependent DNA polymerase kinetic patterns inherent in the underlying DNA sequence of a complex eukaryotic genome.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12864-022-08471-2.
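The native-versus-WGA reasoning above can be made concrete with a minimal Python sketch. It is not the authors' pipeline: the per-read IPD lists, the per-nucleotide `baseline`, the helper name `classify_site`, and the 2-fold `threshold` are hypothetical assumptions chosen only for illustration.

```python
from statistics import mean


def classify_site(native_ipds, wga_ipds, baseline, threshold=2.0):
    """Classify one genomic position from per-read IPDs (arbitrary units).

    Hypothetical inputs: `baseline` is the expected mean IPD for this
    nucleotide (e.g. a genome-wide average for that base) and `threshold`
    is an illustrative fold-elevation cutoff, not a value from the paper.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    native_high = mean(native_ipds) / baseline >= threshold
    wga_high = mean(wga_ipds) / baseline >= threshold
    if not native_high:
        return "not elevated"
    if wga_high:
        # Elevation persists after WGA removed modifications:
        # consistent with a sequence-context (polymerase kinetics) effect.
        return "sequence-context"
    # Elevation vanishes in WGA material: consistent with a modified base.
    return "modification-like"


print(classify_site([3.1, 2.8, 3.4], [1.0, 1.1, 0.9], baseline=1.0))
print(classify_site([2.5, 2.7], [2.4, 2.6], baseline=1.0))
```

Comparing both samples against the same per-nucleotide baseline keeps the test symmetric; a real analysis would likely also need read-depth filters and a statistical test rather than a fixed cutoff.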
Affiliation(s)
- Yusuke Takahashi
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan
- Massa Shoura
- Departments of Pathology and Genetics, School of Medicine, Stanford University, Stanford, CA, USA
- Andrew Fire
- Departments of Pathology and Genetics, School of Medicine, Stanford University, Stanford, CA, USA
- Shinichi Morishita
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan
31
Vanaja A, Yella VR. Delineation of the DNA Structural Features of Eukaryotic Core Promoter Classes. ACS Omega 2022; 7:5657-5669. [PMID: 35224327 PMCID: PMC8867553 DOI: 10.1021/acsomega.1c04603] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 08/23/2021] [Accepted: 01/27/2022] [Indexed: 05/02/2023]
Abstract
Eukaryotic transcription is orchestrated from a stretch of DNA known as the core promoter. Diverse core promoter signals, such as the TATA box, Inr, BREs, and Pause Button, are associated with subsets of genes and regulate their spatiotemporal expression. However, the core promoter architecture linked with these signals has not been investigated exhaustively across species. In this study, we attempted to map the adaptive binding landscape of the transcription initiation machinery as a function of DNA structure. To this end, we deployed a set of k-mer-based DNA structural estimates and regular expression models derived from experiments, molecular dynamics simulations, and theoretical frameworks, together with high-throughput promoter data sets retrieved from the Eukaryotic Promoter Database. We categorized protein-coding gene core promoters based on characteristic motifs at precise locations and analyzed B-DNA structural properties and non-B DNA structural motifs for 15 eukaryotic genomes. We observed that the Inr, BREd, and no-motif classes display common patterns of DNA sequence and structural environment. The TATA-containing, BREu, and Pause Button classes deviate from this pattern: the TATA class displays varied axial and twisting flexibility, while the BREu and Pause Button classes lean toward G-quadruplex motif enrichment. Intriguingly, DNA meltability and shape signals are conserved irrespective of the presence or absence of distinct core promoter motifs in the majority of species. Altogether, we delineated conserved DNA structural signals associated with several promoter classes that may contribute to chromatin configuration, orchestration of the transcription machinery, and DNA duplex melting during transcription.
Affiliation(s)
- Akkinepally Vanaja
- Department of Biotechnology, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Guntur 522502, Andhra Pradesh, India
- KL College of Pharmacy, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Guntur 522502, Andhra Pradesh, India
- Venkata Rajesh Yella
- Department of Biotechnology, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Guntur 522502, Andhra Pradesh, India
32
Kong Y, Cao L, Deikus G, Fan Y, Mead EA, Lai W, Zhang Y, Yong R, Sebra R, Wang H, Zhang XS, Fang G. Critical assessment of DNA adenine methylation in eukaryotes using quantitative deconvolution. Science 2022; 375:515-522. [PMID: 35113693 PMCID: PMC9382770 DOI: 10.1126/science.abe7489] [Citation(s) in RCA: 53] [Impact Index Per Article: 17.7] [Indexed: 12/15/2022]
Abstract
The discovery of N6-methyldeoxyadenine (6mA) across eukaryotes led to a search for additional epigenetic mechanisms. However, some studies have highlighted confounding factors that challenge the prevalence of 6mA in eukaryotes. We developed a metagenomic method to quantitatively deconvolve 6mA events from a genomic DNA sample into species of interest, genomic regions, and sources of contamination. Applying this method, we observed high-resolution 6mA deposition in two protozoa. We found that commensal or soil bacteria explained the vast majority of 6mA in insect and plant samples. We found no evidence of high abundance of 6mA in Drosophila, Arabidopsis, or humans. Plasmids used for genetic manipulation, even those from Dam methyltransferase mutant Escherichia coli, could carry abundant 6mA, confounding the evaluation of candidate 6mA methyltransferases and demethylases. On the basis of this work, we advocate for a reassessment of 6mA in eukaryotes.
Affiliation(s)
- Yimeng Kong
- Department of Genetics and Genomic Sciences and Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai; New York, NY 10029, USA
- Lei Cao
- Department of Genetics and Genomic Sciences and Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai; New York, NY 10029, USA
- Gintaras Deikus
- Department of Genetics and Genomic Sciences and Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai; New York, NY 10029, USA
- Yu Fan
- Department of Genetics and Genomic Sciences and Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai; New York, NY 10029, USA
- Edward A. Mead
- Department of Genetics and Genomic Sciences and Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai; New York, NY 10029, USA
- Weiyi Lai
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences; Beijing 100085, China
- Yizhou Zhang
- Department of Neurosurgery and Oncological Sciences, Icahn School of Medicine at Mount Sinai; New York, NY 10029, USA
- Raymund Yong
- Department of Neurosurgery and Oncological Sciences, Icahn School of Medicine at Mount Sinai; New York, NY 10029, USA
- Robert Sebra
- Department of Genetics and Genomic Sciences and Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai; New York, NY 10029, USA
- Black Family Stem Cell Institute, Icahn School of Medicine at Mount Sinai; New York, NY 10029, USA
- Sema4, a Mount Sinai venture; Stamford, CT 06902, USA
- Hailin Wang
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences; Beijing 100085, China
- Xue-Song Zhang
- Center for Advanced Biotechnology and Medicine, Rutgers University; New Brunswick, NJ 08854, USA
- Gang Fang
- Department of Genetics and Genomic Sciences and Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai; New York, NY 10029, USA
33
In-Depth Sequence Analysis of Bread Wheat VRN1 Genes. Int J Mol Sci 2021; 22:ijms222212284. [PMID: 34830166 PMCID: PMC8626038 DOI: 10.3390/ijms222212284] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 10/19/2021] [Revised: 11/02/2021] [Accepted: 11/11/2021] [Indexed: 12/31/2022]
Abstract
The VERNALIZATION1 (VRN1) gene encodes a MADS-box transcription factor and plays an important role in the cold-induced transition from the vegetative to the reproductive stage. Allelic variability of VRN1 homoeologs has been associated with large differences in flowering time. The aim of this study was to investigate the genetic variability of VRN1 homoeologs (VRN-A1, VRN-B1, and VRN-D1). We performed an in-depth sequence analysis of VRN1 homoeologs in a panel of 105 winter and spring varieties of hexaploid wheat. We describe the novel allele Vrn-B1f with an 836 bp insertion within intron 1 and show its specific expression pattern associated with reduced heading time. We further provide the complete sequence of the Vrn-A1b allele, revealing a 177 bp insertion in intron 1, which is transcribed into an alternative splice variant. Copy number variation (CNV) analysis of VRN1 homoeologs showed that VRN-B1 and VRN-D1 are present in only one copy. The copy number of recessive vrn-A1 ranged from one to four, while that of dominant Vrn-A1 was one or two. Different numbers of Vrn-A1a copies in the spring cultivars Branisovicka IX/49 and Bastion did not significantly affect heading time. We also report the deletion of G-quadruplex secondary structures in the promoter sequences of cultivars with more vrn-A1 copies.
34
Guiblet WM, DeGiorgio M, Cheng X, Chiaromonte F, Eckert KA, Huang YF, Makova KD. Selection and thermostability suggest G-quadruplexes are novel functional elements of the human genome. Genome Res 2021; 31:1136-1149. [PMID: 34187812 PMCID: PMC8256861 DOI: 10.1101/gr.269589.120] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 07/30/2020] [Accepted: 05/24/2021] [Indexed: 12/11/2022]
Abstract
Approximately 1% of the human genome has the ability to fold into G-quadruplexes (G4s), noncanonical strand-specific DNA structures forming at G-rich motifs. G4s regulate several key cellular processes (e.g., transcription) and have been hypothesized to participate in others (e.g., firing of replication origins). Moreover, G4s differ in their thermostability, and this may affect their function. Yet G4s may also hinder replication, transcription, and translation and may increase genome instability and mutation rates. Therefore, depending on their genomic location, thermostability, and functionality, G4 loci might evolve under different selective pressures, which has never been investigated. Here we conducted the first genome-wide analysis of G4 distribution, thermostability, and selection. We found an overrepresentation, high thermostability, and purifying selection for G4s within genic components in which they are expected to be functional: promoters, CpG islands, and 5' and 3' UTRs. A similar pattern was observed for G4s within replication origins, enhancers, eQTLs, and TAD boundary regions, strongly suggesting their functionality. In contrast, G4s on the nontranscribed strand of exons were underrepresented, were unstable, and evolved neutrally. In general, G4s on the nontranscribed strand of genic components had lower density and were less stable than those on the transcribed strand, suggesting that the former are avoided at the RNA level. Across the genome, purifying selection was stronger at stable G4s. Our results suggest that purifying selection preserves the sequences of functional G4s, whereas nonfunctional G4s are too costly to be tolerated in the genome. Thus, G4s are emerging as fundamental, functional genomic elements.
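G-rich motifs of the kind described above are often located with a simple pattern search. The Python sketch below uses a commonly cited Quadparser-style definition (four runs of at least three G separated by loops of 1 to 7 nt); that definition, the helper names `find_g4_motifs` and `g4_fraction`, and the strand handling are assumptions for illustration, not necessarily the motif criteria or thermostability model used by Guiblet et al.

```python
import re

# Assumed Quadparser-style motif: >=4 runs of >=3 G with 1-7 nt loops.
G4_PLUS = re.compile(r"(?:G{3,}[ACGTN]{1,7}){3,}G{3,}")
# A motif on the opposite strand appears as C-runs on this strand.
G4_MINUS = re.compile(r"(?:C{3,}[ACGTN]{1,7}){3,}C{3,}")


def find_g4_motifs(seq):
    """Return sorted (start, end, strand) tuples; end is exclusive."""
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", G4_PLUS), ("-", G4_MINUS)):
        # finditer reports non-overlapping matches only.
        for m in pattern.finditer(seq):
            hits.append((m.start(), m.end(), strand))
    return sorted(hits)


def g4_fraction(seq):
    """Fraction of positions covered by a motif on either strand."""
    covered = set()
    for start, end, _ in find_g4_motifs(seq):
        covered.update(range(start, end))
    return len(covered) / len(seq) if seq else 0.0


print(find_g4_motifs("AAAA" + "GGGTTAGGGTTAGGGTTAGGG" + "AAAA"))
```

Because `re.finditer` returns non-overlapping matches, adjacent or interleaved motifs may be merged or missed; dedicated motif-finding tools handle these cases more carefully.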
Affiliation(s)
- Wilfried M Guiblet
- Bioinformatics and Genomics Graduate Program, Penn State University, University Park, Pennsylvania 16802, USA
- Michael DeGiorgio
- Department of Computer and Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, Florida 33431, USA
- Xiaoheng Cheng
- Department of Biology, Penn State University, University Park, Pennsylvania 16802, USA
- Francesca Chiaromonte
- Department of Statistics, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Center for Medical Genomics, Penn State University, University Park and Hershey, Pennsylvania 16802, USA
- Sant'Anna School of Advanced Studies, 56127 Pisa, Italy
- Kristin A Eckert
- Center for Medical Genomics, Penn State University, University Park and Hershey, Pennsylvania 16802, USA
- Department of Pathology, Penn State University, College of Medicine, Hershey, Pennsylvania 17033, USA
- Yi-Fei Huang
- Department of Biology, Penn State University, University Park, Pennsylvania 16802, USA
- Center for Medical Genomics, Penn State University, University Park and Hershey, Pennsylvania 16802, USA
- Kateryna D Makova
- Department of Biology, Penn State University, University Park, Pennsylvania 16802, USA
- Center for Medical Genomics, Penn State University, University Park and Hershey, Pennsylvania 16802, USA
35
Teng YC, Sundaresan A, O'Hara R, Gant VU, Li M, Martire S, Warshaw JN, Basu A, Banaszynski LA. ATRX promotes heterochromatin formation to protect cells from G-quadruplex DNA-mediated stress. Nat Commun 2021; 12:3887. [PMID: 34162889 PMCID: PMC8222256 DOI: 10.1038/s41467-021-24206-5] [Citation(s) in RCA: 52] [Impact Index Per Article: 13.0] [Received: 06/16/2020] [Accepted: 06/07/2021] [Indexed: 12/15/2022]
Abstract
ATRX is a tumor suppressor that has been associated with protection from DNA replication stress, purportedly through resolution of difficult-to-replicate G-quadruplex (G4) DNA structures. While several studies demonstrate that loss of ATRX sensitizes cells to chemical stabilizers of G4 structures, the molecular function of ATRX at G4 regions during replication remains unknown. Here, we demonstrate that ATRX associates with a number of the MCM replication complex subunits and that loss of ATRX leads to G4 structure accumulation at newly synthesized DNA. We show that both the helicase domain of ATRX and its H3.3 chaperone function are required to protect cells from G4-induced replicative stress. Furthermore, these activities are upstream of heterochromatin formation mediated by the histone methyltransferase, ESET, which is the critical molecular event that protects cells from G4-mediated stress. In support, tumors carrying mutations in either ATRX or ESET show increased mutation burden at G4-enriched DNA sequences. Overall, our study provides new insights into mechanisms by which ATRX promotes genome stability with important implications for understanding impacts of its loss on human disease.
Affiliation(s)
- Yu-Ching Teng
- Cecil H. and Ida Green Center for Reproductive Biology Sciences, Department of Obstetrics and Gynecology, Children's Medical Center Research Institute, Harold C. Simmons Comprehensive Cancer Center, Hamon Center for Regenerative Science and Medicine, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Aishwarya Sundaresan
- Cecil H. and Ida Green Center for Reproductive Biology Sciences, Department of Obstetrics and Gynecology, Children's Medical Center Research Institute, Harold C. Simmons Comprehensive Cancer Center, Hamon Center for Regenerative Science and Medicine, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Ryan O'Hara
- Cecil H. and Ida Green Center for Reproductive Biology Sciences, Department of Obstetrics and Gynecology, Children's Medical Center Research Institute, Harold C. Simmons Comprehensive Cancer Center, Hamon Center for Regenerative Science and Medicine, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Vincent U Gant
- Cecil H. and Ida Green Center for Reproductive Biology Sciences, Department of Obstetrics and Gynecology, Children's Medical Center Research Institute, Harold C. Simmons Comprehensive Cancer Center, Hamon Center for Regenerative Science and Medicine, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Minhua Li
- Cecil H. and Ida Green Center for Reproductive Biology Sciences, Department of Obstetrics and Gynecology, Children's Medical Center Research Institute, Harold C. Simmons Comprehensive Cancer Center, Hamon Center for Regenerative Science and Medicine, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Sara Martire
- Cecil H. and Ida Green Center for Reproductive Biology Sciences, Department of Obstetrics and Gynecology, Children's Medical Center Research Institute, Harold C. Simmons Comprehensive Cancer Center, Hamon Center for Regenerative Science and Medicine, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Jane N Warshaw
- Cecil H. and Ida Green Center for Reproductive Biology Sciences, Department of Obstetrics and Gynecology, Children's Medical Center Research Institute, Harold C. Simmons Comprehensive Cancer Center, Hamon Center for Regenerative Science and Medicine, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Amrita Basu
- Department of Surgery, University of California San Francisco, San Francisco, CA, USA
- Laura A Banaszynski
- Cecil H. and Ida Green Center for Reproductive Biology Sciences, Department of Obstetrics and Gynecology, Children's Medical Center Research Institute, Harold C. Simmons Comprehensive Cancer Center, Hamon Center for Regenerative Science and Medicine, University of Texas Southwestern Medical Center, Dallas, TX, USA
36
Goldberg ME, Harris K. Mutational signatures of replication timing and epigenetic modification persist through the global divergence of mutation spectra across the great ape phylogeny. Genome Biol Evol 2021; 14:6275268. [PMID: 33983415 PMCID: PMC8743035 DOI: 10.1093/gbe/evab104] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Accepted: 05/07/2021] [Indexed: 11/17/2022]
Abstract
Great ape clades exhibit variation in the relative mutation rates of different three-base-pair genomic motifs, with closely related species having more similar mutation spectra than distantly related species. This pattern cannot be explained by classical demographic or selective forces, but instead implies that DNA replication fidelity has been perturbed in different ways on each branch of the great ape phylogeny. Here, we use whole-genome variation from 88 great apes to investigate whether these species’ mutation spectra are broadly differentiated across the entire genome, or whether mutation spectrum differences are driven by DNA compartments that have particular functional features or chromatin states. We perform principal component analysis (PCA) and mutational signature deconvolution on mutation spectra ascertained from compartments defined by features including replication timing and ancient repeat content, finding evidence for consistent species-specific mutational signatures that do not depend on which functional compartments the spectra are ascertained from. At the same time, we find that many compartments have their own characteristic mutational signatures that appear stable across the great ape phylogeny. For example, in a mutation spectrum PCA compartmentalized by replication timing, the second principal component, explaining 21.2% of variation, separates all species’ late-replicating regions from their early-replicating regions. Our results suggest that great ape mutation spectrum evolution is not driven by epigenetic changes that modify mutation rates in specific genomic regions, but instead by trans-acting mutational modifiers that affect mutagenesis across the whole genome fairly uniformly.
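A mutation spectrum over three-base-pair motifs is commonly computed as the normalized count of single-nucleotide substitutions in each trinucleotide context, collapsed so that the mutated base is a pyrimidine (96 classes). A minimal Python sketch under that assumption follows; the input format (reference context, derived base) and the function names are hypothetical, and the study's own ascertainment and normalization steps are not reproduced.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def mutation_type(context, alt):
    """Pyrimidine-collapsed label such as 'TCA>T' for one SNV.

    context: 3-nt reference context centered on the mutated base.
    alt: the derived base.
    """
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or context[1] == alt:
        raise ValueError("need a 3-nt context and a different alt base")
    if context[1] in "AG":
        # Report purine-reference SNVs from the opposite strand.
        context, alt = revcomp(context), alt.translate(COMPLEMENT)
    return f"{context}>{alt}"


def mutation_spectrum(snvs):
    """Map each mutation type to its fraction of all SNVs."""
    counts = Counter(mutation_type(ctx, alt) for ctx, alt in snvs)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


print(mutation_spectrum([("ACG", "T"), ("CGT", "A"), ("TCA", "G")]))
```

Collapsing complementary changes (for example, CGT>A on one strand and ACG>T on the other) into one class keeps spectra comparable regardless of which strand the reference happens to report.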
Affiliation(s)
- Michael E Goldberg
- University of Washington Department of Genome Sciences, 3720 15th Ave NE, Seattle, WA 98105, United States of America
- Kelley Harris
- University of Washington Department of Genome Sciences, 3720 15th Ave NE, Seattle, WA 98105, United States of America
- Fred Hutchinson Cancer Center Computational Biology Division, 1100 Fairview Ave N, Seattle, WA 98109, United States of America
37
Chen D, Cremona MA, Qi Z, Mitra RD, Chiaromonte F, Makova KD. Human L1 Transposition Dynamics Unraveled with Functional Data Analysis. Mol Biol Evol 2021; 37:3576-3600. [PMID: 32722770 DOI: 10.1093/molbev/msaa194] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/14/2022]
Abstract
Long INterspersed Elements-1 (L1s) constitute >17% of the human genome and still actively transpose in it. Characterizing L1 transposition across the genome is critical for understanding genome evolution and somatic mutations. However, to date, L1 insertion and fixation patterns have not been studied comprehensively. To fill this gap, we investigated three genome-wide data sets of L1s that integrated at different evolutionary times: 17,037 de novo L1s (from an L1 insertion cell-line experiment conducted in-house), and 1,212 polymorphic and 1,205 human-specific L1s (from public databases). We characterized 49 genomic features (proxying chromatin accessibility, transcriptional activity, replication, recombination, etc.) in the ±50 kb flanks of these elements. These features were contrasted between the three L1 data sets and L1-free regions using state-of-the-art Functional Data Analysis statistical methods, which treat high-resolution data as mathematical functions. Our results indicate that de novo, polymorphic, and human-specific L1s are surrounded by different genomic features acting at specific locations and scales. This led to an integrative model of L1 transposition, according to which L1s preferentially integrate into open-chromatin regions enriched in non-B DNA motifs, whereas they are fixed in regions largely free of purifying selection, i.e., depleted of genes and noncoding most conserved elements. Intriguingly, our results suggest that L1 insertions modify the local genomic landscape by extending CpG methylation and increasing mononucleotide microsatellite density. Altogether, our findings substantially facilitate understanding of L1 integration and fixation preferences, pave the way for uncovering their role in aging and cancer, and inform their use as mutagenesis tools in genetic studies.
Affiliation(s)
- Di Chen
- Intercollege Graduate Degree Program in Genetics, The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA
- Marzia A Cremona
- Department of Statistics, The Pennsylvania State University, University Park, PA
- Department of Operations and Decision Systems, Université Laval, Québec, Canada
- Zongtai Qi
- Department of Genetics and Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO
- Robi D Mitra
- Department of Genetics and Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO
- Francesca Chiaromonte
- Department of Statistics, The Pennsylvania State University, University Park, PA
- EMbeDS, Sant'Anna School of Advanced Studies, Pisa, Italy
- The Huck Institutes of the Life Sciences, Center for Medical Genomics, The Pennsylvania State University, University Park, PA
- Kateryna D Makova
- The Huck Institutes of the Life Sciences, Center for Medical Genomics, The Pennsylvania State University, University Park, PA
- Department of Biology, The Pennsylvania State University, University Park, PA
38
Tokan V, Lorenzo JLR, Jedlicka P, Kejnovska I, Hobza R, Kejnovsky E. Quadruplex-Forming Motif Inserted into 3'UTR of Ty1his3-AI Retrotransposon Inhibits Retrotransposition in Yeast. Biology 2021; 10:347. [PMID: 33924086 PMCID: PMC8074290 DOI: 10.3390/biology10040347] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2021] [Revised: 04/09/2021] [Accepted: 04/15/2021] [Indexed: 11/20/2022]
Abstract
Guanine quadruplexes (G4s) serve as regulators of replication, recombination, and gene expression. G4 motifs have recently been identified in LTR retrotransposons, but their role in the retrotransposon life cycle is yet to be understood. Therefore, we inserted G4s into the 3'UTR of the Ty1his3-AI retrotransposon and measured the frequency of retrotransposition in yeast strains BY4741 and Y00509 (lacking Pif1 helicase), and under G4 stabilization by N-methyl mesoporphyrin IX (NMM) treatment. We evaluated the impact of G4s on mRNA levels by RT-qPCR and on products of reverse transcription by Southern blot analysis. We found that the presence of G4s inhibited Ty1his3-AI retrotransposition. The effect was stronger when G4s were on the transcription template strand, which leads to interruption of reverse transcription. Both NMM treatment and Pif1p deficiency reduced retrotransposition irrespective of the presence of a G4 motif in the Ty1his3-AI element. The quantities of mRNA and reverse transcription products did not fully explain the impact of G4s on Ty1his3-AI retrotransposition, indicating that G4s probably affect other steps of the retrotransposon life cycle (e.g., translation, VLP formation, integration). Our results suggest that G4 DNA conformation can tune the activity of mobile genetic elements, which in turn contribute to shaping eukaryotic genomes.
Affiliation(s)
- Viktor Tokan
- Department of Plant Developmental Genetics, Institute of Biophysics of the Czech Academy of Sciences, Kralovopolska 135, 61200 Brno, Czech Republic
- Jose Luis Rodriguez Lorenzo
- Department of Plant Developmental Genetics, Institute of Biophysics of the Czech Academy of Sciences, Kralovopolska 135, 61200 Brno, Czech Republic
- Pavel Jedlicka
- Department of Plant Developmental Genetics, Institute of Biophysics of the Czech Academy of Sciences, Kralovopolska 135, 61200 Brno, Czech Republic
- Iva Kejnovska
- Department of Biophysics of Nucleic Acids, Institute of Biophysics of the Czech Academy of Sciences, Kralovopolska 135, 61200 Brno, Czech Republic
- Roman Hobza
- Department of Plant Developmental Genetics, Institute of Biophysics of the Czech Academy of Sciences, Kralovopolska 135, 61200 Brno, Czech Republic
- Eduard Kejnovsky
- Department of Plant Developmental Genetics, Institute of Biophysics of the Czech Academy of Sciences, Kralovopolska 135, 61200 Brno, Czech Republic
39
Helou L, Beauclair L, Dardente H, Piégu B, Tsakou-Ngouafo L, Lecomte T, Kentsis A, Pontarotti P, Bigot Y. The piggyBac-derived protein 5 (PGBD5) transposes both the closely and the distantly related piggyBac-like elements Tcr-pble and Ifp2. J Mol Biol 2021; 433:166839. [PMID: 33539889 PMCID: PMC8404143 DOI: 10.1016/j.jmb.2021.166839] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 04/14/2020] [Revised: 12/21/2020] [Accepted: 01/14/2021] [Indexed: 12/16/2022]
Abstract
The vertebrate piggyBac-derived transposase 5 (PGBD5) gene encodes a domesticated transposase that is active and able to transpose its distantly related piggyBac-like element (pble), Ifp2. This raised the question of whether PGBD5 would be more effective at mobilizing a phylogenetically closely related pble. We aimed to identify the pble most closely related to the pgbd5 gene. We updated the landscape of vertebrate pgbd genes to develop efficient filters and to identify the pble most closely related to each of these genes. We found that Tcr-pble is phylogenetically the closest pble to the pgbd5 gene. Furthermore, we evaluated the capacity of two PGBD5 isoforms, murine Mm523 and human Hs524, to transpose both Tcr-pble and Ifp2 elements. We found that both pbles could be transposed by Mm523 with similar efficiency. However, integrations of both pbles occurred through both proper transposition and improper PGBD5-dependent recombination. This suggested that the ability of PGBD5 to bind both pbles may not be based on the primary sequence of the element ends, but may involve recognition of inner DNA motifs, possibly related to palindromic repeats. In agreement with this hypothesis, we identified internal palindromic repeats near the ends of 24 pble sequences, which display distinct sequences.
Affiliation(s)
- Laura Helou
- UMR INRAE 0085, CNRS 7247, Physiologie de la Reproduction et des Comportements, Centre INRA Val de Loire, 37380 Nouzilly, France
- Linda Beauclair
- UMR INRAE 0085, CNRS 7247, Physiologie de la Reproduction et des Comportements, Centre INRA Val de Loire, 37380 Nouzilly, France
- Hugues Dardente
- UMR INRAE 0085, CNRS 7247, Physiologie de la Reproduction et des Comportements, Centre INRA Val de Loire, 37380 Nouzilly, France
- Benoît Piégu
- UMR INRAE 0085, CNRS 7247, Physiologie de la Reproduction et des Comportements, Centre INRA Val de Loire, 37380 Nouzilly, France
- Louis Tsakou-Ngouafo
- UMR MEPHI D-258, I, IRD, Aix Marseille Université, 19-21 Boulevard Jean Moulin, 13005 Marseille, France; CNRS SNC 5039, 13005 Marseille, France
- Alex Kentsis
- Molecular Pharmacology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA; Weill Cornell Medical College, Cornell University, New York, NY, USA; Department of Pediatrics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Pierre Pontarotti
- UMR INRAE 0085, CNRS 7247, Physiologie de la Reproduction et des Comportements, Centre INRA Val de Loire, 37380 Nouzilly, France; CNRS SNC 5039, 13005 Marseille, France
- Yves Bigot
- UMR INRAE 0085, CNRS 7247, Physiologie de la Reproduction et des Comportements, Centre INRA Val de Loire, 37380 Nouzilly, France
40
Guiblet WM, Cremona MA, Harris RS, Chen D, Eckert KA, Chiaromonte F, Huang YF, Makova KD. Non-B DNA: a major contributor to small- and large-scale variation in nucleotide substitution frequencies across the genome. Nucleic Acids Res 2021; 49:1497-1516. [PMID: 33450015 PMCID: PMC7897504 DOI: 10.1093/nar/gkaa1269] [Citation(s) in RCA: 65] [Impact Index Per Article: 16.3] [Received: 06/05/2020] [Revised: 12/14/2020] [Accepted: 01/11/2021] [Indexed: 12/12/2022]
Abstract
Approximately 13% of the human genome can fold into non-canonical (non-B) DNA structures (e.g. G-quadruplexes, Z-DNA, etc.), which have been implicated in vital cellular processes. Non-B DNA also hinders replication, increasing errors and facilitating mutagenesis, yet its contribution to genome-wide variation in mutation rates remains unexplored. Here, we conducted a comprehensive analysis of nucleotide substitution frequencies at non-B DNA loci within noncoding, non-repetitive genome regions, their ±2 kb flanking regions, and 1-Megabase windows, using human-orangutan divergence and human single-nucleotide polymorphisms. Functional data analysis at single-base resolution demonstrated that substitution frequencies are usually elevated at non-B DNA, with patterns specific to each non-B DNA type. Mirror, direct and inverted repeats have higher substitution frequencies in spacers than in repeat arms, whereas G-quadruplexes, particularly stable ones, have higher substitution frequencies in loops than in stems. Several non-B DNA types also affect substitution frequencies in their flanking regions. Finally, non-B DNA explains more variation than any other predictor in multiple regression models for diversity or divergence at the 1-Megabase scale. Thus, non-B DNA substantially contributes to variation in substitution frequencies at small and large scales. Our results highlight the role of non-B DNA in germline mutagenesis, with implications for evolution and genetic diseases.
Affiliation(s)
- Wilfried M Guiblet
- Bioinformatics and Genomics Graduate Program, Penn State University, University Park, PA 16802, USA
- Marzia A Cremona
- Department of Statistics, The Pennsylvania State University, University Park, PA 16802, USA
- Department of Operations and Decision Systems, Université Laval, Canada
- CHU de Québec – Université Laval Research Center, Canada
- Robert S Harris
- Department of Biology, Penn State University, University Park, PA 16802, USA
- Di Chen
- Intercollege Graduate Degree Program in Genetics, Huck Institutes of the Life Sciences, Penn State University, University Park, PA 16802, USA
- Kristin A Eckert
- Department of Pathology, Penn State University, College of Medicine, Hershey, PA 17033, USA
- Center for Medical Genomics, Penn State University, University Park and Hershey, PA, USA
- Francesca Chiaromonte
- Department of Statistics, The Pennsylvania State University, University Park, PA 16802, USA
- Center for Medical Genomics, Penn State University, University Park and Hershey, PA, USA
- EMbeDS, Sant’Anna School of Advanced Studies, 56127 Pisa, Italy
- Yi-Fei Huang
- Department of Biology, Penn State University, University Park, PA 16802, USA
- Center for Medical Genomics, Penn State University, University Park and Hershey, PA, USA
- Kateryna D Makova
- Department of Biology, Penn State University, University Park, PA 16802, USA
- Center for Medical Genomics, Penn State University, University Park and Hershey, PA, USA
41
Peona V, Blom MPK, Xu L, Burri R, Sullivan S, Bunikis I, Liachko I, Haryoko T, Jønsson KA, Zhou Q, Irestedt M, Suh A. Identifying the causes and consequences of assembly gaps using a multiplatform genome assembly of a bird-of-paradise. Mol Ecol Resour 2021; 21:263-286. [PMID: 32937018 PMCID: PMC7757076 DOI: 10.1111/1755-0998.13252] [Citation(s) in RCA: 79] [Impact Index Per Article: 19.8] [Received: 04/14/2020] [Revised: 08/21/2020] [Accepted: 08/26/2020] [Indexed: 01/09/2023]
Abstract
Genome assemblies are currently being produced at an impressive rate by consortia and individual laboratories. The low costs and increasing efficiency of sequencing technologies now enable assembling genomes at unprecedented quality and contiguity. However, the difficulty in assembling repeat-rich and GC-rich regions (genomic "dark matter") limits insights into the evolution of genome structure and regulatory networks. Here, we compare the efficiency of currently available sequencing technologies (short/linked/long reads and proximity ligation maps) and combinations thereof in assembling genomic dark matter. By adopting different de novo assembly strategies, we compare individual draft assemblies to a curated multiplatform reference assembly and identify the genomic features that cause gaps within each assembly. We show that a multiplatform assembly implementing long-read, linked-read and proximity sequencing technologies performs best at recovering transposable elements, multicopy MHC genes, GC-rich microchromosomes and the repeat-rich W chromosome. Telomere-to-telomere assemblies are not a reality yet for most organisms, but by leveraging technology choice it is now possible to minimize genome assembly gaps for downstream analysis. We provide a roadmap to tailor sequencing projects for optimized completeness of both the coding and noncoding parts of nonmodel genomes.
Affiliation(s)
- Valentina Peona
- Department of Ecology and Genetics—Evolutionary Biology, Science for Life Laboratories, Uppsala University, Uppsala, Sweden
- Department of Organismal Biology—Systematic Biology, Science for Life Laboratories, Uppsala University, Uppsala, Sweden
- Mozes P. K. Blom
- Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Stockholm, Sweden
- Museum für Naturkunde, Leibniz Institut für Evolutions‐ und Biodiversitätsforschung, Berlin, Germany
- Luohao Xu
- Department of Neurosciences and Developmental Biology, University of Vienna, Vienna, Austria
- Reto Burri
- Department of Population Ecology, Institute of Ecology and Evolution, Friedrich‐Schiller‐University Jena, Jena, Germany
- Ignas Bunikis
- Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala Genome Center, Uppsala University, Uppsala, Sweden
- Tri Haryoko
- Research Centre for Biology, Museum Zoologicum Bogoriense, Indonesian Institute of Sciences (LIPI), Cibinong, Indonesia
- Knud A. Jønsson
- Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark
- Qi Zhou
- Department of Neurosciences and Developmental Biology, University of Vienna, Vienna, Austria
- MOE Laboratory of Biosystems Homeostasis & Protection, Life Sciences Institute, Zhejiang University, Hangzhou, China
- Center for Reproductive Medicine, The 2nd Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- Martin Irestedt
- Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Stockholm, Sweden
- Alexander Suh
- Department of Ecology and Genetics—Evolutionary Biology, Science for Life Laboratories, Uppsala University, Uppsala, Sweden
- Department of Organismal Biology—Systematic Biology, Science for Life Laboratories, Uppsala University, Uppsala, Sweden
- School of Biological Sciences—Organisms and the Environment, University of East Anglia, Norwich, UK
42
Yella VR, Vanaja A, Kulandaivelu U, Kumar A. Delving into Eukaryotic Origins of Replication Using DNA Structural Features. ACS Omega 2020; 5:13601-13611. [PMID: 32566825 PMCID: PMC7301376 DOI: 10.1021/acsomega.0c00441] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 01/31/2020] [Accepted: 05/15/2020] [Indexed: 05/18/2023]
Abstract
DNA replication in eukaryotes is an intricate process, precisely synchronized by a set of regulatory proteins, and the replication fork emanates from discrete sites on chromatin called origins of replication (Oris). These sites are considered the gateway to chromosomal replication and are stereotyped by sequence motifs. However, the cognate sequences are detectable in only a small subset of origin regions, or are entirely absent, across different metazoans. Alternatively, DNA secondary structural features can provide information beyond that contained in the primary sequence. In this article, we report the trends in DNA sequence-based structural properties of origin sequences in nine eukaryotic systems representing different families of life. Biologically relevant DNA secondary structural properties, namely stability, propeller twist, flexibility, and minor groove shape, were studied in the sequences flanking replication start sites. Results indicate that Oris in yeasts show lower stability, greater rigidity, and a preference for a narrow minor groove compared to the genomic sequences surrounding them. Yeast Oris also show a preference for A-tracts and the TATA-box promoter element in the vicinity of replication start sites. In contrast, Drosophila melanogaster, humans, and Arabidopsis thaliana lack such features in their Oris and instead show a high preponderance of G-rich sequence motifs, such as putative G-quadruplexes or i-motifs, and of CpG islands. Our extensive study applies DNA structural feature computation to delve into origins of replication across organisms ranging from yeasts to mammals, including a plant. Insights from this study should help in understanding origin architecture and in designing new algorithms for predicting DNA trans-acting factor recognition events.
Affiliation(s)
- Venkata Rajesh Yella
- Department of Biotechnology, Koneru Lakshmaiah Education Foundation, Guntur 522502, Andhra Pradesh, India
- Akkinepally Vanaja
- Department of Biotechnology, Koneru Lakshmaiah Education Foundation, Guntur 522502, Andhra Pradesh, India
- KL College of Pharmacy, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Guntur 522502, Andhra Pradesh, India
- Umasankar Kulandaivelu
- KL College of Pharmacy, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Guntur 522502, Andhra Pradesh, India
- Aditya Kumar
- Department of Molecular Biology and Biotechnology, Tezpur University, Tezpur 784028, Assam, India
43
Ray S, Tillo D, Boer RE, Assad N, Barshai M, Wu G, Orenstein Y, Yang D, Schneekloth JS, Vinson C. Custom DNA Microarrays Reveal Diverse Binding Preferences of Proteins and Small Molecules to Thousands of G-Quadruplexes. ACS Chem Biol 2020; 15:925-935. [PMID: 32216326 PMCID: PMC7263473 DOI: 10.1021/acschembio.9b00934] [Citation(s) in RCA: 40] [Impact Index Per Article: 8.0] [Indexed: 02/07/2023]
Abstract
Single-stranded DNA (ssDNA) containing four guanine repeats can form G-quadruplex (G4) structures. While cellular proteins and small molecules can bind G4s, it has been difficult to broadly assess their DNA-binding specificity. Here, we use custom DNA microarrays to examine the binding specificities of proteins, small molecules, and antibodies across ∼15,000 potential G4 structures. Molecules used include fluorescently labeled pyridostatin (Cy5-PDS, a small molecule), BG4 (Cy5-BG4, a G4-specific antibody), and eight proteins (GST-tagged nucleolin, IGF2, CNBP, FANCJ, PIF1, BLM, DHX36, and WRN). Cy5-PDS and Cy5-BG4 selectively bind sequences known to form G4s, confirming G4 formation on the microarrays. Cy5-PDS binding decreased when G4 formation was inhibited using lithium or when ssDNA features on the microarray were made double-stranded. Similar conditions inhibited the binding of all other molecules except CNBP and PIF1. We report that proteins have different G4-binding preferences, suggesting unique cellular functions. Finally, competition experiments are used to assess the binding specificity of an unlabeled small molecule, revealing the structural features in the G4 required to achieve selectivity. These data demonstrate that the microarray platform can be used to assess the binding preferences of molecules to G4s on a broad scale, helping to understand the properties that govern molecular recognition.
Affiliation(s)
- Robert E. Boer
- Chemical Biology Laboratory, National Cancer Institute-Frederick, Frederick, Maryland 21702, United States
- Nima Assad
- Laboratory of Metabolism, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, United States
- Mira Barshai
- School of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Beer-Sheva 8410501, Israel
- Guanhui Wu
- Medicinal Chemistry and Molecular Pharmacology, College of Pharmacy, Purdue University, West Lafayette, Indiana 47907, United States
- Yaron Orenstein
- School of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Beer-Sheva 8410501, Israel
- Danzhou Yang
- Medicinal Chemistry and Molecular Pharmacology, College of Pharmacy, Purdue University, West Lafayette, Indiana 47907, United States
- John S. Schneekloth
- Chemical Biology Laboratory, National Cancer Institute-Frederick, Frederick, Maryland 21702, United States
- Charles Vinson
- Laboratory of Metabolism, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, United States
44
Chiara M, Zambelli F, Picardi E, Horner DS, Pesole G. Critical assessment of bioinformatics methods for the characterization of pathological repeat expansions with single-molecule sequencing data. Brief Bioinform 2019; 21:1971-1986. [DOI: 10.1093/bib/bbz099] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 05/09/2019] [Revised: 06/22/2019] [Accepted: 07/09/2019] [Indexed: 01/19/2023]
Abstract
Over the last 5 years, a number of studies have reported the successful application of single-molecule sequencing technologies to determining the size and sequence of pathologically expanded microsatellite repeats. However, each study employed its own custom bioinformatics pipeline, preventing meaningful comparisons and somewhat limiting the reproducibility of the results. In this review, we provide a brief summary of state-of-the-art methods for the characterization of expanded repeat alleles, along with a detailed comparison of bioinformatics tools for determining repeat length and sequence, using both real and simulated data. Our reanalysis of publicly available human genome sequencing data suggests a modest but statistically significant increase in the error rate of single-molecule sequencing technologies at genomic regions containing short tandem repeats. However, we observe that all the methods tested here, irrespective of the analysis strategy (alignment-based or assembly-based), show high sensitivity both in detecting expanded tandem repeats and in estimating the expansion size, suggesting that approaches based on single-molecule sequencing technologies are highly effective for detecting and quantifying tandem repeat expansions and contractions.
Affiliation(s)
- Matteo Chiara
- Department of Biosciences, University of Milan, via Celoria 26, 20133 Milan, Italy
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council, Via Amendola e, 70126 Bari, Italy
- Federico Zambelli
- Department of Biosciences, University of Milan, via Celoria 26, 20133 Milan, Italy
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council, Via Amendola e, 70126 Bari, Italy
- Ernesto Picardi
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council, Via Amendola e, 70126 Bari, Italy
- Department of Biosciences, Biotechnology and Biopharmaceutics, University of Bari “A. Moro”, Via Orabona 4, 70126 Bari, Italy
- David S Horner
- Department of Biosciences, University of Milan, via Celoria 26, 20133 Milan, Italy
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council, Via Amendola e, 70126 Bari, Italy
- Graziano Pesole
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council, Via Amendola e, 70126 Bari, Italy
- Department of Biosciences, Biotechnology and Biopharmaceutics, University of Bari “A. Moro”, Via Orabona 4, 70126 Bari, Italy
45
Cechova M, Harris RS, Tomaszkiewicz M, Arbeithuber B, Chiaromonte F, Makova KD. High Satellite Repeat Turnover in Great Apes Studied with Short- and Long-Read Technologies. Mol Biol Evol 2019; 36:2415-2431. [PMID: 31273383 PMCID: PMC6805231 DOI: 10.1093/molbev/msz156] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Received: 03/12/2019] [Revised: 06/12/2019] [Accepted: 06/13/2019] [Indexed: 12/23/2022]
Abstract
Satellite repeats are a structural component of centromeres and telomeres, and in some instances, their divergence is known to drive speciation. Due to their highly repetitive nature, satellite sequences have been understudied and underrepresented in genome assemblies. To investigate their turnover in great apes, we studied satellite repeats of unit sizes up to 50 bp in human, chimpanzee, bonobo, gorilla, and Sumatran and Bornean orangutans, using unassembled short and long sequencing reads. The density of satellite repeats, as identified from accurate short reads (Illumina), varied greatly among great ape genomes. These were dominated by a handful of abundant repeated motifs, frequently shared among species, which formed two groups: 1) the (AATGG)n repeat (critical for heat shock response) and its derivatives; and 2) subtelomeric 32-mers involved in telomeric metabolism. Using the densities of abundant repeats, individuals could be classified into species. However, clustering did not reproduce the accepted species phylogeny, suggesting rapid repeat evolution. Several abundant repeats were enriched in males versus females; using Y chromosome assemblies or Fluorescent In Situ Hybridization, we validated their location on the Y. Finally, applying a novel computational tool, we identified many satellite repeats completely embedded within long Oxford Nanopore and Pacific Biosciences reads. Such repeats were up to 59 kb in length and consisted of perfect repeats interspersed with other similar sequences. Our results based on sequencing reads generated with three different technologies provide the first detailed characterization of great ape satellite repeats, and open new avenues for exploring their functions.
Affiliation(s)
- Monika Cechova
- Department of Biology, Pennsylvania State University, University Park, PA
- Robert S Harris
- Department of Biology, Pennsylvania State University, University Park, PA
- Francesca Chiaromonte
- Department of Statistics, Pennsylvania State University, University Park, PA
- EMbeDS, Sant’Anna School of Advanced Studies, Pisa, Italy
- Center for Medical Genomics, Penn State, University Park, PA
- Kateryna D Makova
- Department of Biology, Pennsylvania State University, University Park, PA
- Center for Medical Genomics, Penn State, University Park, PA
46
Beauclair L, Ramé C, Arensburger P, Piégu B, Guillou F, Dupont J, Bigot Y. Sequence properties of certain GC rich avian genes, their origins and absence from genome assemblies: case studies. BMC Genomics 2019; 20:734. [PMID: 31610792 PMCID: PMC6792250 DOI: 10.1186/s12864-019-6131-1] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 06/11/2019] [Accepted: 09/23/2019] [Indexed: 12/14/2022]
Abstract
Background: More and more eukaryotic genomes are sequenced and assembled, most of them presented as a complete model in which missing chromosomal regions are filled by Ns and where a few chromosomes may be lacking. Avian genomes often contain sequences with high GC content, which has been hypothesized to be at the origin of many missing sequences in these genomes. We investigated features of these missing sequences to discover why some may not have been integrated into genomic libraries and/or sequenced.
Results: The sequences of five red jungle fowl cDNA models with high GC content were used as queries to search publicly available datasets of Illumina and PacBio sequencing reads. These were used to reconstruct the leptin, TNFα, MRPL52, PCP2 and PET100 genes, all of which are absent from the red jungle fowl genome model. These gene sequences displayed elevated GC contents, had intron sizes that were sometimes larger than those of non-avian orthologues, and had non-coding regions that contained numerous tandem and inverted repeat sequences with motifs able to assemble into stable G-quadruplexes and intrastrand dyadic structures. Our results suggest that Illumina technology was unable to sequence the non-coding regions of these genes. On the other hand, PacBio technology was able to sequence these regions, but with dramatically lower efficiency than would typically be expected.
Conclusions: High GC content was not the principal reason why numerous GC-rich regions of avian genomes are missing from genome assembly models. Instead, the presence of tandem repeats containing motifs capable of assembling into very stable secondary structures is likely responsible.
Affiliation(s)
- Linda Beauclair
- PRC, UMR INRA0085, CNRS 7247, Centre INRA Val de Loire, 37380, Nouzilly, France
- Christelle Ramé
- PRC, UMR INRA0085, CNRS 7247, Centre INRA Val de Loire, 37380, Nouzilly, France
- Peter Arensburger
- Biological Sciences Department, California State Polytechnic University, Pomona, CA, 91768, USA
- Benoît Piégu
- PRC, UMR INRA0085, CNRS 7247, Centre INRA Val de Loire, 37380, Nouzilly, France
- Florian Guillou
- PRC, UMR INRA0085, CNRS 7247, Centre INRA Val de Loire, 37380, Nouzilly, France
- Joëlle Dupont
- PRC, UMR INRA0085, CNRS 7247, Centre INRA Val de Loire, 37380, Nouzilly, France
- Yves Bigot
- PRC, UMR INRA0085, CNRS 7247, Centre INRA Val de Loire, 37380, Nouzilly, France
47
Cremona MA, Xu H, Makova KD, Reimherr M, Chiaromonte F, Madrigal P. Functional data analysis for computational biology. Bioinformatics 2019; 35:3211-3213. [PMID: 30668667 PMCID: PMC6736445 DOI: 10.1093/bioinformatics/btz045] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 04/07/2018] [Revised: 01/01/2019] [Accepted: 01/17/2019] [Indexed: 12/25/2022]
Abstract
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Marzia A Cremona
- Department of Statistics, The Pennsylvania State University, University Park, PA, USA
- Hongyan Xu
- Department of Population Health Sciences, Medical College of Georgia, Augusta University, Augusta, GA, USA
- Kateryna D Makova
- Department of Biology, The Pennsylvania State University, University Park, PA, USA
- Center for Medical Genomics, The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, USA
- Matthew Reimherr
- Department of Statistics, The Pennsylvania State University, University Park, PA, USA
- Francesca Chiaromonte
- Department of Statistics, The Pennsylvania State University, University Park, PA, USA
- Institute of Economics, Sant’Anna School of Advanced Studies, EMbeDS Economics and Management in the era of Data Science, Pisa, Italy
- Pedro Madrigal
- Wellcome Trust – MRC Cambridge Stem Cell Institute, University of Cambridge, Cambridge, UK
- Department of Haematology, University of Cambridge, Cambridge, UK
48
Comparison of mitochondrial DNA variants detection using short- and long-read sequencing. J Hum Genet 2019; 64:1107-1116. [DOI: 10.1038/s10038-019-0654-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 05/19/2019] [Revised: 07/29/2019] [Accepted: 08/04/2019] [Indexed: 12/22/2022]
49
Hestand MS, Ameur A. The Versatility of SMRT Sequencing. Genes (Basel) 2019; 10:genes10010024. [PMID: 30621217 PMCID: PMC6357146 DOI: 10.3390/genes10010024] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 12/20/2018] [Accepted: 01/03/2019] [Indexed: 12/19/2022]
Affiliation(s)
- Matthew S Hestand
- Division of Human Genetics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45202, USA
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH 45202, USA
- Adam Ameur
- Department of Immunology, Genetics and Pathology, Uppsala University, Science for Life Laboratory, 75025 Uppsala, Sweden
- Department of Epidemiology and Preventive Medicine, Monash University, Melbourne 32901, Australia