1
|
G-quadruplexes in the evolution of hepatitis B virus. Nucleic Acids Res 2023; 51:7198-7204. [PMID: 37395407 PMCID: PMC10415126 DOI: 10.1093/nar/gkad556] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/30/2023] [Revised: 05/23/2023] [Accepted: 06/19/2023] [Indexed: 07/04/2023] Open
Abstract
Hepatitis B virus (HBV) is one of the most dangerous human pathogenic viruses and is found in all corners of the world. Recent sequencing of ancient HBV genomes revealed that these viruses have accompanied humanity for several millennia. As G-quadruplexes are considered to be potential therapeutic targets in virology, we examined putative G-quadruplex-forming sequences (PQS) in modern and ancient HBV genomes. Our analyses showed the presence of PQS in all 232 tested HBV genomes, with a total number of 1258 motifs and an average frequency of 1.69 PQS per kbp. Notably, the PQS with the highest G4Hunter score in the reference genome is the most highly conserved. Interestingly, the density of PQS motifs is lower in ancient HBV genomes than in their modern counterparts (1.5 and 1.9 per kbp, respectively). This modern frequency of 1.90 is very close to the PQS frequency of the human genome (1.93) obtained using identical parameters. This indicates that the PQS content in HBV increased over time to become closer to the PQS frequency in the human genome. No statistically significant differences were found between PQS densities in HBV lineages found on different continents. These results, which constitute the first paleogenomics analysis of G4 propensity, are in agreement with our hypothesis that, for viruses causing chronic infections, PQS frequencies tend to converge evolutionarily with those of their hosts, as a kind of 'genetic camouflage' both to hijack host cell transcriptional regulatory systems and to avoid recognition as foreign material.
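The "PQS per kbp" figures quoted in this abstract are densities: a motif count divided by sequence length in thousands of base pairs. The following is a minimal sketch of that arithmetic only. The function names and the example values are hypothetical, and it assumes the reported average is a mean of per-genome densities, which the abstract does not state.

```python
def pqs_per_kbp(pqs_count: int, genome_length_bp: int) -> float:
    """Return the number of PQS per 1000 bp of sequence."""
    if genome_length_bp <= 0:
        raise ValueError("genome length must be positive")
    return pqs_count * 1000 / genome_length_bp


def mean_density(genomes: list[tuple[int, int]]) -> float:
    """Average the per-genome densities (assumption: not a pooled density)."""
    densities = [pqs_per_kbp(count, length) for count, length in genomes]
    return sum(densities) / len(densities)


# Hypothetical example values, NOT data from the study:
# (PQS count, genome length in bp) for three made-up genomes.
example = [(5, 3200), (6, 3200), (7, 3200)]
print(mean_density(example))
```

A pooled density (total motifs divided by total length) gives the same result only when all genomes have the same length, so the choice of averaging should be stated when comparing values between studies.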
|
2
|
Analysis of G-Quadruplex-Forming Sequences in Drought Stress-Responsive Genes, and Synthesis Genes of Phenolic Compounds in Arabidopsis thaliana. Life (Basel, Switzerland) 2023; 13:life13010199. [PMID: 36676148 PMCID: PMC9865073 DOI: 10.3390/life13010199] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 12/10/2022] [Revised: 12/30/2022] [Accepted: 01/08/2023] [Indexed: 01/11/2023]
Abstract
Sequences of nucleic acids with the potential to form four-stranded G-quadruplex structures are intensively studied, mainly in the context of human diseases, pathogens, or extremophile organisms; nonetheless, knowledge about their occurrence and putative role in plants is still limited. This work focuses on G-quadruplex-forming sites in two gene sets of interest: drought stress-responsive genes, and genes related to the biosynthesis of phenolic compounds in the model plant organism Arabidopsis thaliana. In addition, 20 housekeeping genes were analyzed, for which constitutive gene expression was expected (with no need for precise regulation depending on internal or external factors). The results showed that none of the tested gene sets differed significantly in the content of G-quadruplex-forming sites; however, the highest frequency of G-quadruplex-forming sites was found in the 5'-UTR regions of phenolic compound biosynthesis genes, which indicates the possibility of their regulation at the mRNA level. In addition, G-quadruplex-forming sites were highly underrepresented, mainly within the introns and the 1000 bp flanking regions downstream of genes. Finally, cluster analysis allowed us to observe similarities between particular genes in terms of their PQS characteristics. We believe that the original approach used in this study may become useful for further and more comprehensive bioinformatic studies in the field of G-quadruplex genomics.
|
3
|
The Newly Sequenced Genome of Pisum sativum Is Replete with Potential G-Quadruplex-Forming Sequences-Implications for Evolution and Biological Regulation. Int J Mol Sci 2022; 23:8482. [PMID: 35955617 PMCID: PMC9369095 DOI: 10.3390/ijms23158482] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/06/2022] [Revised: 07/25/2022] [Accepted: 07/28/2022] [Indexed: 11/20/2022] Open
Abstract
G-quadruplexes (G4s) have long been considered rare and physiologically unimportant in vitro curiosities, but recent methodological advances have proved their presence and functions in vivo. Moreover, in addition to their functional relevance in bacteria and animals, including humans, their importance has been recently demonstrated in evolutionarily distinct plant species. In this study, we analyzed the genome of Pisum sativum (garden pea, or the so-called green pea), a unique member of the Fabaceae family. Our results showed that this genome contained putative G4 sequences (PQSs). Interestingly, these PQSs were located nonrandomly in the nuclear genome. We also found PQSs in mitochondrial (mt) and chloroplast (cp) DNA, and we experimentally confirmed G4 formation for sequences found in these two organelles. The frequency of PQSs for nuclear DNA was 0.42 PQSs per thousand base pairs (kbp), in the same range as for cpDNA (0.53/kbp), but significantly lower than what was found for mitochondrial DNA (1.58/kbp). In the nuclear genome, PQSs were mainly associated with regulatory regions, including 5'UTRs, and upstream of the rRNA region. In contrast to genomic DNA, PQSs were located around RNA genes in cpDNA and mtDNA. Interestingly, PQSs were also found within and around specific transposable elements such as TIR and LTR, pointing to a role in the spread of these elements in nuclear DNA. The nonrandom localization of PQSs uncovered their evolutionary and functional significance in the Pisum sativum genome.
|
4
|
Interaction of Proteins with Inverted Repeats and Cruciform Structures in Nucleic Acids. Int J Mol Sci 2022; 23:ijms23116171. [PMID: 35682854 PMCID: PMC9180970 DOI: 10.3390/ijms23116171] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/29/2022] [Revised: 05/26/2022] [Accepted: 05/30/2022] [Indexed: 01/27/2023] Open
Abstract
Cruciforms occur when inverted repeat sequences in double-stranded DNA adopt intra-strand hairpins on opposing strands. Biophysical and molecular studies of these structures confirm their characterization as four-way junctions and have demonstrated that several factors influence their stability, including overall chromatin structure and DNA supercoiling. Here, we review our understanding of processes that influence the formation and stability of cruciforms in genomes, covering the range of sequences shown to have biological significance. It is challenging to accurately sequence repetitive DNA sequences, but recent advances in sequencing methods have deepened understanding about the amounts of inverted repeats in genomes from all forms of life. We highlight that, in the majority of genomes, inverted repeats are present in higher numbers than is expected from a random occurrence. It is, therefore, becoming clear that inverted repeats play important roles in regulating many aspects of DNA metabolism, including replication, gene expression, and recombination. Cruciforms are targets for many architectural and regulatory proteins, including topoisomerases, p53, Rif1, and others. Notably, some of these proteins can induce the formation of cruciform structures when they bind to DNA. Inverted repeat sequences also influence the evolution of genomes, and growing evidence highlights their significance in several human diseases, suggesting that the inverted repeat sequences and/or DNA cruciforms could be useful therapeutic targets in some cases.
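An inverted repeat, as described in this abstract, is a stretch of sequence followed (after an optional spacer) by its own reverse complement on the same strand. The sketch below illustrates that definition only. It reports perfect arms of one fixed length, and the function names and the `arm_len` and `max_spacer` defaults are hypothetical choices for illustration, not parameters taken from the reviewed studies or from any published tool.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_inverted_repeats(seq: str, arm_len: int = 6, max_spacer: int = 4):
    """Return (start, arm, spacer_len) for each perfect inverted repeat.

    Simplification: arms must match exactly; no mismatches are tolerated.
    """
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - 2 * arm_len + 1):
        arm = seq[start:start + arm_len]
        if set(arm) - set("ACGT"):
            continue  # skip arms containing ambiguous bases such as N
        target = reverse_complement(arm)
        for spacer in range(max_spacer + 1):
            right = start + arm_len + spacer
            if seq[right:right + arm_len] == target:
                hits.append((start, arm, spacer))
    return hits


# Made-up test sequence: GGATCA, a 3-nt spacer, then TGATCC.
print(find_inverted_repeats("TTGGATCAAAATGATCCTT"))
```

Whether such a site actually extrudes into a cruciform depends on factors the abstract lists, such as supercoiling and chromatin context, which a sequence scan cannot capture.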
|
5
|
Conservation and over-representation of G-quadruplex sequences in regulatory regions of mitochondrial DNA across distinct taxonomic sub-groups. Biochimie 2021; 194:28-34. [PMID: 34942301 DOI: 10.1016/j.biochi.2021.12.006] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/08/2021] [Revised: 11/22/2021] [Accepted: 12/14/2021] [Indexed: 11/02/2022]
Abstract
G-quadruplexes have important regulatory roles in the nuclear genome but their distribution and potential roles in mitochondrial DNA (mtDNA) are poorly understood. We analysed 11883 mtDNA sequences from 18 taxonomic sub-groups and identified their frequency and location within mtDNA. Large differences in both the frequency and number of putative quadruplex-forming sequences (PQS) were observed amongst all the organisms and PQS frequency was negatively correlated with an increase in evolutionary age. PQS were over-represented in the 3'UTRs, D-loops, replication origins, and stem loops, indicating regulatory roles for quadruplexes in mtDNA. Variations of the G-quadruplex-forming sequence in the conserved sequence block II (CSBII) region of the human D-loop were conserved amongst other mammals, amphibians, birds, reptiles, and fishes. This D-loop PQS was conserved in the duplicated control regions of some birds and reptiles, indicating its importance to mitochondrial function. The guanine tracts in these PQS also displayed significant length heterogeneity and the length of these guanine tracts was generally longest in bird mtDNA. This information provides further insights into how G4s may contribute to the regulation and function of mtDNA and acts as a database of information for future studies investigating mitochondrial G4s in organisms other than humans.
|
6
|
New telomere to telomere assembly of human chromosome 8 reveals a previous underestimation of G-quadruplex forming sequences and inverted repeats. Gene 2021; 810:146058. [PMID: 34737002 DOI: 10.1016/j.gene.2021.146058] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/31/2021] [Revised: 10/14/2021] [Accepted: 10/29/2021] [Indexed: 11/04/2022]
Abstract
Taking advantage of evolving and improving sequencing methods, human chromosome 8 is now available as a gapless, end-to-end assembly. Thanks to advances in long-read sequencing technologies, its centromere, telomeres, duplicated gene families and repeat-rich regions are now fully sequenced. We were interested to assess if the new assembly altered our understanding of the potential impact of non-B DNA structures within this completed chromosome sequence. It has been shown that non-B secondary structures, such as G-quadruplexes, hairpins and cruciforms, have important regulatory functions and potential as targeted therapeutics. Therefore, we analysed the presence of putative G-quadruplex forming sequences and inverted repeats in the current human reference genome (GRCh38) and in the new end-to-end assembly of chromosome 8. The comparison revealed that the new assembly contains significantly more inverted repeats and G-quadruplex forming sequences compared to the current reference sequence. This observation can be explained by improved accuracy of the new sequencing methods, particularly in regions that contain extensive repeats of bases, as is preferred by many non-B DNA structures. These results show a significant underestimation of the prevalence of non-B DNA secondary structure in previous assembly versions of the human genome and point to their importance being not fully appreciated. We anticipate that similar observations will occur as the improved sequencing technologies fill in gaps across the genomes of humans and other organisms.
|
7
|
Novel G-quadruplex prone sequences emerge in the complete assembly of the human X chromosome. Biochimie 2021; 191:87-90. [PMID: 34508825 DOI: 10.1016/j.biochi.2021.09.004] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 08/03/2021] [Revised: 09/01/2021] [Accepted: 09/05/2021] [Indexed: 12/13/2022]
Abstract
G-quadruplexes are non-B secondary structures with regulatory functions and therapeutic potential. Improvements in sequencing methods recently allowed the completion of the first human chromosome which is now available as a gapless, end-to-end assembly, with the previously remaining spaces filled and newly identified regions added. We compared the presence of G-quadruplex forming sequences in the current human reference genome (GRCh38) and in the new end-to-end assembly of the X chromosome constructed by high-coverage ultra-long-read nanopore sequencing. This comparison revealed that, even though the corrected length of the chromosome X assembly is surprisingly 1.14% shorter than expected, the number of G-quadruplex forming sequences found in this gapless chromosome is significantly higher, with 493 new motifs having G4Hunter scores above 1.4 and 23 new sequences with G4Hunter scores above 3.5. This observation reflects an improved precision of the new sequencing approaches and points to an underestimation of G-quadruplex propensity in the previous, widely used version of the human genome assembly, especially for motifs with a high G4Hunter score, expected to be very stable. These G-quadruplex forming sequences probably remained undiscovered in earlier genome datasets due to previously unsolved G-rich and repetitive genomic regions. These observations allow a precise targeting of these important regulatory regions.
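The G4Hunter scores cited in this abstract (thresholds of 1.4 and 3.5) come from a windowed scoring scheme. The sketch below follows the commonly described rule: each G in a run of n consecutive Gs scores n (capped at 4), each C scores the corresponding negative value, A and T score 0, and the score of a window is the mean over its bases. It is a simplified illustration, not the authors' pipeline. The function names, the 25-nt default window, the use of `>=` for the threshold, and the lack of merging of overlapping windows are assumptions made here.

```python
def base_scores(seq: str) -> list[int]:
    """Score each base: G runs positive, C runs negative, run length capped at 4."""
    seq = seq.upper()
    scores = [0] * len(seq)
    i = 0
    while i < len(seq):
        base = seq[i]
        j = i
        while j < len(seq) and seq[j] == base:
            j += 1  # extend to the end of the current run of identical bases
        run = min(j - i, 4)
        if base == "G":
            value = run
        elif base == "C":
            value = -run
        else:
            value = 0
        for k in range(i, j):
            scores[k] = value
        i = j
    return scores


def window_hits(seq: str, window: int = 25, threshold: float = 1.4):
    """Return (start, mean_score) for windows whose |mean score| >= threshold."""
    scores = base_scores(seq)
    hits = []
    for start in range(len(scores) - window + 1):
        mean = sum(scores[start:start + window]) / window
        if abs(mean) >= threshold:
            hits.append((start, round(mean, 2)))
    return hits


# Human telomeric repeat as a familiar test case (21 nt, so window=21).
print(window_hits("GGGTTAGGGTTAGGGTTAGGG", window=21))
```

Because the per-base score is capped at 4, a window mean above 3.5 requires a window made up almost entirely of long G runs, which is consistent with the abstract's remark that such motifs are expected to be very stable.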
|
8
|
Extraordinary diversity of telomeres, telomerase RNAs and their template regions in Saccharomycetaceae. Sci Rep 2021; 11:12784. [PMID: 34140564 PMCID: PMC8211666 DOI: 10.1038/s41598-021-92126-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/15/2021] [Accepted: 06/03/2021] [Indexed: 01/08/2023] Open
Abstract
Telomerase RNA (TR) carries the template for synthesis of telomere DNA and provides a scaffold for telomerase assembly. Fungal TRs are long and have been compared to higher eukaryotes, where they show considerable diversity within phylogenetically close groups. TRs of several Saccharomycetaceae were recently identified; however, many of these remained uncharacterised in the template region. Here we show that this is mainly due to high variability in telomere sequence. We predicted the telomere sequences using Tandem Repeats Finder and then identified corresponding putative template regions in TR candidates. Remarkably long telomere units and the corresponding putative TRs were found in Tetrapisispora species. Notably, variable lengths of the annealing sequence of the template region (1–10 nt) were found. Consequently, species with the same telomere sequence may not harbour identical TR templates. Thus, TR sequence alone can be used to predict a template region and telomere sequence, but not to determine these exactly. A conserved feature of telomere sequences, tracts of adjacent Gs, led us to test the propensity of individual telomere sequences to form G4. The results show highly diverse values of G4-propensity, indicating the lack of ubiquitous conservation of this feature across Saccharomycetaceae.
|
9
|
Analysis of putative quadruplex-forming sequences in fungal genomes: novel antifungal targets? Microb Genom 2021; 7:000570. [PMID: 33956596 PMCID: PMC8209732 DOI: 10.1099/mgen.0.000570] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/23/2020] [Accepted: 03/26/2021] [Indexed: 12/26/2022] Open
Abstract
Fungal infections cause >1 million deaths annually and the emergence of antifungal resistance has prompted the exploration for novel antifungal targets. Quadruplexes are four-stranded nucleic acid secondary structures, which can regulate processes such as transcription, translation, replication and recombination. They are also found in genes linked to virulence in microbes, and ligands that bind to quadruplexes can eliminate drug-resistant pathogens. Using a computational approach, we quantified putative quadruplex-forming sequences (PQS) in 1359 genomes across the fungal kingdom and explored their presence in genes related to virulence, drug resistance and biological processes associated with pathogenicity in Aspergillus fumigatus. Here we present the largest analysis of PQS in fungi and identify significant heterogeneity of these sequences throughout phyla, genera and species. PQS were genetically conserved in Aspergillus spp. and frequently pathogenic species appeared to contain fewer PQS than their lesser/non-pathogenic counterparts. GO-term analysis identified that PQS-containing genes were involved in processes linked with virulence such as zinc ion binding, the biosynthesis of secondary metabolites and regulation of transcription in A. fumigatus. Although the genome frequency of PQS was lower in A. fumigatus, PQS could be found enriched in genes involved in virulence, and genes upregulated during germination and hypoxia. Moreover, PQS were found in genes involved in drug resistance. Quadruplexes could have important roles within fungal biology and virulence, but their roles require further elucidation.
|
10
|
SARS-CoV-2 hot-spot mutations are significantly enriched within inverted repeats and CpG island loci. Brief Bioinform 2021; 22:1338-1345. [PMID: 33341900 PMCID: PMC7799342 DOI: 10.1093/bib/bbaa385] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 07/16/2020] [Revised: 11/16/2020] [Accepted: 11/27/2020] [Indexed: 12/18/2022] Open
Abstract
SARS-CoV-2 is an intensively investigated virus from the order Nidovirales (Coronaviridae family) that causes COVID-19 disease in humans. Through enormous scientific effort, thousands of viral strains have been sequenced to date, thereby creating a strong background for deep bioinformatics studies of the SARS-CoV-2 genome. In this study, we inspected high-frequency mutations of SARS-CoV-2 and carried out systematic analyses of their overlay with inverted repeat (IR) loci and CpG islands. The main conclusion of our study is that SARS-CoV-2 hot-spot mutations are significantly enriched within both IRs and CpG island loci. This points to their role in genomic instability and may predict further mutational drive of the SARS-CoV-2 genome. Moreover, CpG islands are strongly enriched upstream from viral ORFs and thus could play important roles in transcription and the viral life cycle. We hypothesize that hypermethylation of these loci will decrease the transcription of viral ORFs and could therefore limit the progression of the disease.
|
11
|
Analyses of viral genomes for G-quadruplex forming sequences reveal their correlation with the type of infection. Biochimie 2021; 186:13-27. [PMID: 33839192 DOI: 10.1016/j.biochi.2021.03.017] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 01/18/2021] [Revised: 03/30/2021] [Accepted: 03/31/2021] [Indexed: 12/12/2022]
Abstract
G-quadruplexes contribute to the regulation of key molecular processes. Their utilization for antiviral therapy is an emerging field of contemporary research. Here we present comprehensive analyses of the presence and localization of putative G-quadruplex forming sequences (PQS) in all viral genomes currently available in the NCBI database (including subviral agents). The G4Hunter algorithm was applied to a pool of 11,000 accessible viral genomes representing 350 Mbp in total. PQS frequencies differ across evolutionary groups of viruses, and PQS are enriched in repeats, replication origins, 5'UTRs and 3'UTRs. Importantly, PQS presence and localization are connected to viral lifecycles and correspond to the type of viral infection rather than to nucleic acid type: while viruses routinely causing persistent infections in Metazoa hosts are enriched for PQS, viruses causing acute infections are significantly depleted for PQS. The unique localization of PQS identifies the importance of G-quadruplex-based regulation of viral replication and life cycle, providing a tool for potential therapeutic targeting.
|
12
|
Abstract
BACKGROUND: Influenza viruses are dangerous pathogens. Seventy-seven genomes of recently emerged genotype 4 reassortant Eurasian avian-like H1N1 virus (G4-EA-H1N1) are currently available. We investigated the presence and variation of potential G-quadruplex forming sequences (PQS), which can serve as targets for antiviral treatment. RESULTS: PQS were identified in all 77 genomes. The total number of PQS in G4-EA-H1N1 genomes was 571. Interestingly, the number of PQS per genome in individual closely related viruses varied from 4 to 12. PQS were not randomly distributed in the 8 segments of the G4-EA-H1N1 genome, the highest frequency of PQS being found in the NP segment (1.39 per 1000 nt), which is considered a potential target for antiviral therapy. In contrast, no PQS was found in the NS segment. Analyses of variability pointed to the importance of some PQS; even though genome variation of influenza virus is extreme, the PQS with the highest G4Hunter score is the most conserved in all tested genomes. G-quadruplex formation in vitro was experimentally confirmed using spectroscopic methods. CONCLUSIONS: The results presented here hint at several G-quadruplex-forming sequences in G4-EA-H1N1 genomes that could provide good therapeutic targets.
|
13
|
In-Depth Bioinformatic Analyses of Nidovirales Including Human SARS-CoV-2, SARS-CoV, MERS-CoV Viruses Suggest Important Roles of Non-canonical Nucleic Acid Structures in Their Lifecycles. Front Microbiol 2020; 11:1583. [PMID: 32719673 PMCID: PMC7347907 DOI: 10.3389/fmicb.2020.01583] [Citation(s) in RCA: 39] [Impact Index Per Article: 9.8] [Received: 04/13/2020] [Accepted: 06/17/2020] [Indexed: 12/15/2022] Open
Abstract
Non-canonical nucleic acid structures play important roles in the regulation of molecular processes. Considering the importance of the ongoing coronavirus crisis, we decided to evaluate genomes of all coronaviruses sequenced to date (stated more broadly, the order Nidovirales) to determine if they contain non-canonical nucleic acid structures. We found abundant putative G-quadruplex sites and even more inverted repeat (IR) loci, which in fact are ubiquitous along the whole genomic sequence and indicate a possible mechanism for genomic RNA packaging. The most notable enrichment of IRs was found inside the 5'UTR for IRs of size 12+ nucleotides, and the most notable enrichment of putative quadruplex sites (PQSs) was located before the 3'UTR, inside the 5'UTR, and before mRNA. This indicates crucial regulatory roles for both IRs and PQSs. Moreover, we found multiple G-quadruplex binding motifs in human proteins having potential for binding of SARS-CoV-2 RNA. Non-canonical nucleic acid structures in Nidovirales and in novel SARS-CoV-2 are therefore promising druggable structures that can be targeted and utilized in the future.
|