1
Bartas M, Brázda V, Pečinka P. Special Issue "Bioinformatics of Unusual DNA and RNA Structures". Int J Mol Sci 2024; 25:5226. [PMID: 38791265 PMCID: PMC11121459 DOI: 10.3390/ijms25105226] [Received: 04/18/2024] [Revised: 04/29/2024] [Accepted: 05/06/2024] [Indexed: 05/26/2024] [Open access]
Abstract
Nucleic acids are not only static carriers of genetic information but also play vital roles in controlling cellular lifecycles through their fascinating structural diversity [...].
Affiliation(s)
- Martin Bartas
- Department of Biology and Ecology, Faculty of Science, University of Ostrava, 710 00 Ostrava, Czech Republic
- Václav Brázda
- Institute of Biophysics, Czech Academy of Sciences, Královopolská 135, 612 00 Brno, Czech Republic
- Petr Pečinka
- Department of Biology and Ecology, Faculty of Science, University of Ostrava, 710 00 Ostrava, Czech Republic
2
Bachurin SS, Yurushkin MV, Slynko IA, Kletskii ME, Burov ON, Berezovskiy DP. Structural peculiarities of tandem repeats and their clinical significance. Biochem Biophys Res Commun 2024; 692:149349. [PMID: 38056160 DOI: 10.1016/j.bbrc.2023.149349] [Received: 11/19/2023] [Accepted: 11/27/2023] [Indexed: 12/08/2023]
Abstract
Although it is well established that a mere 2% of human DNA nucleotides are involved in protein coding, the remainder of the DNA plays a vital role in preserving normal cellular genetic function. A significant proportion of tandem repeats (TRs) are present in non-coding DNA. TRs are specific nucleotide sequences consisting of numerous consecutive repetitions of a given fragment. In this study, we employed Dafna, our novel algorithm grounded in finite automata theory, to investigate for the first time the likelihood of these nucleotide sequences forming non-canonical DNA structures (NS). Such structures include G-quadruplexes, i-motifs, hairpins, and triplexes. The tandem repeats considered in our research had repeated fragments of 1 to 6 nucleotides. For comparison, we used a set of randomly generated sequences of the same length (60 nucleotides) as a benchmark. Our results revealed a disparity in the potential for NS formation between random sequences and tandem repeats. Our findings affirm that the propensity of DNA and RNA to form NS is closely tied to various genetic disorders, including Huntington's disease, Fragile X syndrome, and Friedreich's ataxia. In the concluding discussion, we propose a new therapeutic mechanism for these diseases, based on the ability of specific nucleic acid fragments to form multiple types of NS.
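The abstract does not describe how Dafna works internally. As a rough illustration of the comparison it outlines, the Python sketch below builds 60-nt tandem repeats from every repeat unit of 1 to 6 nucleotides, generates random 60-nt controls, and scans both for a putative G-quadruplex motif. The motif (four runs of three or more G separated by loops of 1 to 7 nt) is a widely used G4 consensus assumed here as a stand-in. It is a regular expression, so it describes a regular language that a finite automaton can recognize, but it is not Dafna and covers only one of the NS types mentioned (no i-motifs, hairpins or triplexes). All function names are hypothetical.

```python
import itertools
import random
import re
from typing import Iterable, Iterator, Optional

# Assumed stand-in motif (NOT Dafna): four G-runs of length >= 3 separated by
# loops of 1-7 nt, a widely used putative G-quadruplex consensus.
G4_MOTIF = re.compile(r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}")

SEQ_LEN = 60  # length used in the abstract for both repeats and random controls


def all_repeat_units(max_unit: int = 6) -> Iterator[str]:
    """Yield every ACGT word of length 1..max_unit (includes non-primitive units like 'AA')."""
    for k in range(1, max_unit + 1):
        for letters in itertools.product("ACGT", repeat=k):
            yield "".join(letters)


def tandem_repeat(unit: str, length: int = SEQ_LEN) -> str:
    """Concatenate copies of `unit` and truncate to exactly `length` nt."""
    copies = length // len(unit) + 1
    return (unit * copies)[:length]


def random_sequence(length: int = SEQ_LEN, rng: Optional[random.Random] = None) -> str:
    """Uniform random ACGT sequence of the given length."""
    if rng is None:
        rng = random.Random()
    return "".join(rng.choice("ACGT") for _ in range(length))


def has_g4_motif(seq: str) -> bool:
    """True if the sequence contains at least one match of the assumed G4 motif."""
    return G4_MOTIF.search(seq) is not None


def motif_fraction(seqs: Iterable[str]) -> float:
    """Fraction of sequences that contain the motif."""
    seqs = list(seqs)
    return sum(has_g4_motif(s) for s in seqs) / len(seqs)


if __name__ == "__main__":
    rng = random.Random(0)
    repeats = [tandem_repeat(u) for u in all_repeat_units()]
    controls = [random_sequence(rng=rng) for _ in range(len(repeats))]
    print(f"tandem repeats with G4 motif: {motif_fraction(repeats):.4f}")
    print(f"random controls with G4 motif: {motif_fraction(controls):.4f}")
```

Only the given strand is scanned; a C-rich repeat such as CCCT would also need its reverse complement checked to capture a G4 on the opposite strand.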
Affiliation(s)
- Stanislav S Bachurin
- Department of General and Clinical Biochemistry N2, Rostov State Medical University, 29 Nakhichevanskiy Lane, Rostov-on-Don, 344022, Russian Federation; LambasLab, Bar Rav Hai David 30, Haifa, 3559203, Israel
- Ilya A Slynko
- LambasLab, Bar Rav Hai David 30, Haifa, 3559203, Israel
- Mikhail E Kletskii
- Department of Chemistry, Southern Federal University, 7 Zorge Str., Rostov-on-Don, 344090, Russian Federation
- Oleg N Burov
- Department of Chemistry, Southern Federal University, 7 Zorge Str., Rostov-on-Don, 344090, Russian Federation
- Dmitriy P Berezovskiy
- Department of Forensic Medicine, I.M. Sechenov First Moscow State Medical University (Sechenov University), Build. 4, 2 Bolshaya Pirogovskaya Str., Moscow, 119435, Russian Federation
3
Qian SH, Shi MW, Xiong YL, Zhang Y, Zhang ZH, Song XM, Deng XY, Chen ZX. EndoQuad: a comprehensive genome-wide experimentally validated endogenous G-quadruplex database. Nucleic Acids Res 2024; 52:D72-D80. [PMID: 37904589 PMCID: PMC10767823 DOI: 10.1093/nar/gkad966] [Received: 08/14/2023] [Revised: 09/22/2023] [Accepted: 10/14/2023] [Indexed: 11/01/2023] [Open access]
Abstract
G-quadruplexes (G4s) are non-canonical four-stranded structures that are emerging as novel genetic regulatory elements. However, a comprehensive genomic annotation of endogenous G4s (eG4s) and a systematic characterization of their regulatory network are still lacking, posing major challenges for eG4 research. Here, we present EndoQuad (https://EndoQuad.chenzxlab.cn/) to address these pressing issues by integrating high-throughput experimental data. First, based on high-quality genome-wide eG4 mapping datasets (human: 1181; mouse: 24; chicken: 2) generated by G4 ChIP-seq/CUT&Tag, we generate a reference set of genome-wide eG4s. Our multi-omics analyses show that most eG4s are identified in only one or a few cell types. The eG4s that occur in more samples are more structurally stable and evolutionarily conserved, are enriched in promoter regions, mark highly expressed genes and are associated with complex regulatory programs, making them higher-confidence candidates for further experiments. Finally, we integrate millions of functional genomic variants and prioritize eG4s with regulatory functions in disease and cancer contexts. These efforts have culminated in a comprehensive, interactive database of experimentally validated DNA eG4s. EndoQuad enables users to easily access, download and repurpose these data for their own research. EndoQuad will become a one-stop resource for eG4 research and lay the foundation for future functional studies.
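The abstract does not spell out how peaks from the individual datasets are combined into the reference set or how "occurrence across samples" is counted. One simple approach, sketched here purely as an assumption and not as the EndoQuad pipeline, is to pool peak intervals from all samples, merge overlapping intervals, and record how many distinct samples contribute to each merged region; regions supported by more samples can then be kept as higher-confidence eG4s. The function names, the half-open `(chrom, start, end)` convention and the `min_samples` threshold are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Interval = Tuple[str, int, int]          # (chromosome, start, end), half-open
Region = Tuple[str, int, int, int]       # (chromosome, start, end, n_samples)


def merge_with_occurrence(samples: Dict[str, List[Interval]]) -> List[Region]:
    """Merge overlapping peaks pooled from all samples and count supporting samples."""
    # Pool every peak, tagged with the sample it came from, sorted by position.
    tagged = sorted(
        (chrom, start, end, sample)
        for sample, peaks in samples.items()
        for chrom, start, end in peaks
    )
    merged: List[Region] = []
    cur_chrom: Optional[str] = None
    cur_start = cur_end = 0
    cur_samples: Set[str] = set()
    for chrom, start, end, sample in tagged:
        if chrom == cur_chrom and start < cur_end:
            # Overlaps the current region: extend it and record the sample.
            cur_end = max(cur_end, end)
            cur_samples.add(sample)
        else:
            # Start a new region after flushing the previous one.
            if cur_chrom is not None:
                merged.append((cur_chrom, cur_start, cur_end, len(cur_samples)))
            cur_chrom, cur_start, cur_end, cur_samples = chrom, start, end, {sample}
    if cur_chrom is not None:
        merged.append((cur_chrom, cur_start, cur_end, len(cur_samples)))
    return merged


def high_confidence(regions: List[Region], min_samples: int = 2) -> List[Region]:
    """Keep regions observed in at least `min_samples` distinct samples."""
    return [r for r in regions if r[3] >= min_samples]
```

Because overlapping peaks are chained, a merged region can become wider than any single input peak; a more careful pipeline might instead require a minimum reciprocal overlap between peaks.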
Affiliation(s)
- Sheng Hu Qian
- Hubei Hongshan Laboratory, College of Life Science and Technology, College of Biomedicine and Health, Interdisciplinary Sciences Institute, Huazhong Agricultural University, Wuhan 430070, PR China
- Meng-Wei Shi
- Hubei Hongshan Laboratory, College of Life Science and Technology, College of Biomedicine and Health, Interdisciplinary Sciences Institute, Huazhong Agricultural University, Wuhan 430070, PR China
- Yu-Li Xiong
- Hubei Hongshan Laboratory, College of Life Science and Technology, College of Biomedicine and Health, Interdisciplinary Sciences Institute, Huazhong Agricultural University, Wuhan 430070, PR China
- Yuan Zhang
- Hubei Hongshan Laboratory, College of Life Science and Technology, College of Biomedicine and Health, Interdisciplinary Sciences Institute, Huazhong Agricultural University, Wuhan 430070, PR China
- Ze-Hao Zhang
- Hubei Hongshan Laboratory, College of Life Science and Technology, College of Biomedicine and Health, Interdisciplinary Sciences Institute, Huazhong Agricultural University, Wuhan 430070, PR China
- Xue-Mei Song
- Hubei Hongshan Laboratory, College of Life Science and Technology, College of Biomedicine and Health, Interdisciplinary Sciences Institute, Huazhong Agricultural University, Wuhan 430070, PR China
- Xin-Yin Deng
- Hubei Hongshan Laboratory, College of Life Science and Technology, College of Biomedicine and Health, Interdisciplinary Sciences Institute, Huazhong Agricultural University, Wuhan 430070, PR China
- Zhen-Xia Chen
- Hubei Hongshan Laboratory, College of Life Science and Technology, College of Biomedicine and Health, Interdisciplinary Sciences Institute, Huazhong Agricultural University, Wuhan 430070, PR China
- Shenzhen Institute of Nutrition and Health, Huazhong Agricultural University, Shenzhen 518000, China
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518000, China
4
Hamed BA, Ibrahim OAS, Abd El-Hafeez T. Optimizing classification efficiency with machine learning techniques for pattern matching. J Big Data 2023; 10:124. [DOI: 10.1186/s40537-023-00804-6] [Received: 03/30/2023] [Accepted: 07/16/2023] [Indexed: 09/02/2023]
Abstract
The study proposes a novel model for DNA sequence classification that combines machine learning methods with a pattern-matching algorithm. The model aims to categorize DNA sequences effectively based on their features and to improve the accuracy and efficiency of DNA sequence classification. Its performance is evaluated with various machine learning algorithms, and the results indicate that the SVM linear classifier achieves the highest accuracy and F1 score among the tested algorithms, suggesting that the proposed model can provide better overall performance than other algorithms in DNA sequence classification. In addition, the proposed model is compared with two suggested algorithms, FLPM and PAPM, and outperforms them in terms of accuracy and efficiency. The study further explores the impact of pattern length on the accuracy and time complexity of each algorithm; the execution time of each algorithm varies as the pattern length increases. For a pattern length of 5, SVM Linear and EFLPM have the lowest execution time (0.0035 s), whereas at a pattern length of 25, SVM Linear has the lowest execution time (0.0012 s). SVM Linear achieved an accuracy of 0.963 and an F1 score of 0.97, indicating that it can provide the best overall performance in DNA sequence classification. Naive Bayes also performs well, with an accuracy of 0.838 and an F1 score of 0.94. The proposed model contributes to DNA sequence analysis by providing a novel approach to pre-processing and feature extraction, with potential applications in drug discovery, personalized medicine, and disease diagnosis. The study’s findings highlight the importance of considering the impact of pattern length on the accuracy and time complexity of DNA sequence classification algorithms.
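The abstract does not define the pattern-matching or feature-extraction steps, nor the FLPM, PAPM and EFLPM algorithms. As a generic illustration only, the Python sketch below shows two common building blocks such a pipeline might use: naive exact pattern search, whose worst-case cost for a text of length n and a pattern of length m is O((n - m + 1) · m), so pattern length directly affects running time; and a fixed-order k-mer count vector that a classifier such as a linear SVM could take as input. The function names and the default k = 3 are assumptions.

```python
from collections import Counter
from itertools import product
from typing import List


def find_pattern(text: str, pattern: str) -> List[int]:
    """Naive exact search: every start index of `pattern` in `text`, overlaps included."""
    m = len(pattern)
    return [i for i in range(len(text) - m + 1) if text[i:i + m] == pattern]


def kmer_vocabulary(k: int) -> List[str]:
    """All 4**k ACGT k-mers in lexicographic order; this fixes the feature order."""
    return ["".join(p) for p in product("ACGT", repeat=k)]


def kmer_vector(seq: str, k: int = 3) -> List[int]:
    """k-mer count features; windows containing non-ACGT characters (e.g. N) are ignored."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts[kmer] for kmer in kmer_vocabulary(k)]


if __name__ == "__main__":
    dna = "ATGCGATACGCTTGACGATGC"
    print(find_pattern(dna, "GATA"))
    print(kmer_vector(dna, k=2))
```

The resulting vectors could be passed, for example, to scikit-learn's `LinearSVC`; that step is omitted here to keep the sketch dependency-free.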