1
Ghosh S, Pal J, Maji B, Cattani C, Bhattacharya DK. Choice of Metric Divergence in Genome Sequence Comparison. Protein J 2024; 43:259-273. [PMID: 38492188 DOI: 10.1007/s10930-024-10189-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 02/28/2024] [Indexed: 03/18/2024]
Abstract
The paper introduces a novel probability descriptor for genome sequence comparison, employing a generalized form of Jensen-Shannon divergence. This divergence metric stems from a one-parameter family, comprising fractions up to a maximum value of half. Utilizing this metric as a distance measure, a distance matrix is computed for the new probability descriptor, shaping phylogenetic trees via the neighbor-joining method. Initial exploration involves setting the parameter at half for various species. To assess the impact of parameter variation, trees are drawn at different parameter values (half, one-fourth, one-eighth). However, measurement scales decrease with parameter value increments, with higher similarity accuracy corresponding to lower scale values. Ultimately, the highest accuracy aligns with the maximum parameter value of half. Comparative analyses against previous methods, evaluated via Symmetric Distance (SD) values and rationalized perception, consistently favor the present approach's results. Notably, outcomes at the maximum parameter value exhibit the highest accuracy, validating the method's efficacy against earlier approaches.
Affiliation(s)
- Soumen Ghosh
- Information Technology, Narula Institute of Technology, Kolkata, West Bengal, India.
- Jayanta Pal
- Computer Science & Engineering, Narula Institute of Technology, Kolkata, West Bengal, India
- Bansibadan Maji
- Electronics & Communication Engineering, National Institute of Technology, Durgapur, West Bengal, India
- Carlo Cattani
- DEIM, University of Tuscia, Largo Dell'Universita, 01100, Viterbo, Italy
2
Bastos CAC, Afreixo V, Rodrigues JMOS, Pinho AJ. Concentration of inverted repeats along human DNA. J Integr Bioinform 2023; 20:jib-2022-0052. [PMID: 37486620 PMCID: PMC10561070 DOI: 10.1515/jib-2022-0052] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2022] [Accepted: 02/27/2023] [Indexed: 07/25/2023] Open
Abstract
This work aims to describe the observed enrichment of inverted repeats in the human genome and to identify and describe, with detailed length profiles, the regions with significant and relevant enriched occurrence of inverted repeats. The enrichment is assessed and tested with a recently proposed measure (a z-score-based measure). We simulate a genome using an order 7 Markov model trained with the data from the real genome. The simulated genome is used to establish the critical values which are used as decision thresholds to identify the regions with significantly enriched concentrations. Several human genome regions are highly enriched in the occurrence of inverted repeats. This is observed in all the human chromosomes. The distribution of inverted repeat lengths varies along the genome. The majority of the regions with severely exaggerated enrichment contain mainly short inverted repeats. There are also regions with regular peaks along the inverted repeat length distribution (periodic regularities) and other regions with exaggerated enrichment for long lengths (less frequent). However, adjacent regions tend to have similar distributions.
Affiliation(s)
- Carlos A. C. Bastos
- DETI – Department of Electronics, Telecommunications and Informatics, IEETA – Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro, 3810-193 Aveiro, Portugal
- LASI – Intelligent Systems Associate Laboratory, Aveiro, Portugal
- Vera Afreixo
- CIDMA – Center for Research and Development in Mathematics and Applications, DMAT – Department of Mathematics, University of Aveiro, 3810-193 Aveiro, Portugal
- João M. O. S. Rodrigues
- DETI – Department of Electronics, Telecommunications and Informatics, IEETA – Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro, 3810-193 Aveiro, Portugal
- LASI – Intelligent Systems Associate Laboratory, Aveiro, Portugal
- Armando J. Pinho
- DETI – Department of Electronics, Telecommunications and Informatics, IEETA – Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro, 3810-193 Aveiro, Portugal
- LASI – Intelligent Systems Associate Laboratory, Aveiro, Portugal
3
Silva JM, Pratas D, Caetano T, Matos S. Feature-Based Classification of Archaeal Sequences Using Compression-Based Methods. PATTERN RECOGNITION AND IMAGE ANALYSIS 2022. [DOI: 10.1007/978-3-031-04881-4_25] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
4
Tan M, Takahashi N, Fujii S, Sakurai K, Kusamori K, Takahashi Y, Takakura Y, Nishikawa M. Analysis of Tertiary Structural Features of Branched DNA Nanostructures with Partially Common Sequences Using Small-Angle X-ray Scattering. ACS APPLIED BIO MATERIALS 2019; 3:308-314. [DOI: 10.1021/acsabm.9b00829] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 01/01/2023]
Affiliation(s)
- Mengmeng Tan
- Department of Biopharmaceutics and Drug Metabolism, Graduate School of Pharmaceutical Sciences, Kyoto University, 46-29, Yoshidashimoadachi-cho, Sakyo-ku, Kyoto 606-8501, Japan
- Natsuki Takahashi
- Department of Biopharmaceutics and Drug Metabolism, Graduate School of Pharmaceutical Sciences, Kyoto University, 46-29, Yoshidashimoadachi-cho, Sakyo-ku, Kyoto 606-8501, Japan
- Shota Fujii
- Department of Chemistry and Biochemistry, University of Kitakyushu, 1-1 Hibikino, Wakamatsu-ku, Kitakyushu, Fukuoka 808-0135, Japan
- Kazuo Sakurai
- Department of Chemistry and Biochemistry, University of Kitakyushu, 1-1 Hibikino, Wakamatsu-ku, Kitakyushu, Fukuoka 808-0135, Japan
- Structural Materials Science Laboratory SPring-8 Center, RIKEN Harima Institute Research, 1-1-1 Kouto, Sayo-cho, Sayo, Hyogo 679-5148, Japan
- Kosuke Kusamori
- Laboratory of Biopharmaceutics, Faculty of Pharmaceutical Sciences, Tokyo University of Science, 2641 Yamazaki, Noda, Chiba 278-8510, Japan
- Yuki Takahashi
- Department of Biopharmaceutics and Drug Metabolism, Graduate School of Pharmaceutical Sciences, Kyoto University, 46-29, Yoshidashimoadachi-cho, Sakyo-ku, Kyoto 606-8501, Japan
- Yoshinobu Takakura
- Department of Biopharmaceutics and Drug Metabolism, Graduate School of Pharmaceutical Sciences, Kyoto University, 46-29, Yoshidashimoadachi-cho, Sakyo-ku, Kyoto 606-8501, Japan
- Makiya Nishikawa
- Department of Biopharmaceutics and Drug Metabolism, Graduate School of Pharmaceutical Sciences, Kyoto University, 46-29, Yoshidashimoadachi-cho, Sakyo-ku, Kyoto 606-8501, Japan
- Laboratory of Biopharmaceutics, Faculty of Pharmaceutical Sciences, Tokyo University of Science, 2641 Yamazaki, Noda, Chiba 278-8510, Japan
5
Clustering genomic words in human DNA using peaks and trends of distributions. ADV DATA ANAL CLASSI 2019. [DOI: 10.1007/s11634-019-00362-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
6
Bastos CAC, Afreixo V, Rodrigues JMOS, Pinho AJ, Silva RM. Distribution of Distances Between Symmetric Words in the Human Genome: Analysis of Regular Peaks. Interdiscip Sci 2019; 11:367-372. [PMID: 30911903 DOI: 10.1007/s12539-019-00326-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/24/2018] [Revised: 01/24/2019] [Accepted: 02/27/2019] [Indexed: 11/29/2022]
Abstract
Finding DNA sites with high potential for the formation of hairpin/cruciform structures is an important task. Previous works studied the distances between adjacent reversed complement words (symmetric word pairs) and also between non-adjacent words. It was observed that for some words a few distances were favoured (peaks) and that in some distributions there was strong peak regularity. The present work extends previous studies by improving the detection and characterization of peak regularities in the symmetric word pair distance distributions of the human genome. This work also analyzes the location of the sequences that originate the observed strong peak periodicity in the distance distribution. The results obtained in this work may indicate genomic sites with potential for the formation of hairpin/cruciform structures.
Affiliation(s)
- Carlos A C Bastos
- Department of Electronics, Telecommunications and Informatics, IEETA-Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro, Campus Universitário de Santiago, Aveiro, Portugal.
- Vera Afreixo
- Department of Mathematics, IEETA-Institute of Electronics and Informatics Engineering of Aveiro, CIDMA-Center for Research and Development in Mathematics and Applications, University of Aveiro, Campus Universitário de Santiago, Aveiro, Portugal
- João M O S Rodrigues
- Department of Electronics, Telecommunications and Informatics, IEETA-Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro, Campus Universitário de Santiago, Aveiro, Portugal
- Armando J Pinho
- Department of Electronics, Telecommunications and Informatics, IEETA-Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro, Campus Universitário de Santiago, Aveiro, Portugal
- Raquel M Silva
- Department of Medical Sciences, iBiMED, IEETA-Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro, Campus Universitário de Santiago, Aveiro, Portugal
7
Cristadoro G, Degli Esposti M, Altmann EG. The common origin of symmetry and structure in genetic sequences. Sci Rep 2018; 8:15817. [PMID: 30361485 PMCID: PMC6202410 DOI: 10.1038/s41598-018-34136-w] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 04/17/2018] [Accepted: 10/09/2018] [Indexed: 12/20/2022] Open
Abstract
Biologists have long sought a way to explain how statistical properties of genetic sequences emerged and are maintained through evolution. On the one hand, non-random structures at different scales indicate a complex genome organisation. On the other hand, single-strand symmetry has been scrutinised using neutral models in which correlations are not considered or irrelevant, contrary to empirical evidence. Different studies investigated these two statistical features separately, reaching minimal consensus despite sustained efforts. Here we unravel previously unknown symmetries in genetic sequences, which are organized hierarchically through scales in which non-random structures are known to be present. These observations are confirmed through the statistical analysis of the human genome and explained through a simple domain model. These results suggest that domain models which account for the cumulative action of mobile elements can explain simultaneously non-random structures and symmetries in genetic sequences.
Affiliation(s)
- Giampaolo Cristadoro
- Dipartimento di Matematica e Applicazioni, Università di Milano-Bicocca, 20125, Milano, Italy.
- Eduardo G Altmann
- School of Mathematics and Statistics, University of Sydney, Sydney, 2006, NSW, Australia