1. Chaley M, Kutyrkin V. Stochastic models for description of structural-statistical properties in DNA sequences. J Theor Biol 2019; 496:110126. [PMID: 31866393; DOI: 10.1016/j.jtbi.2019.110126] [Citation(s) in RCA: 0; Impact Index Per Article: 0] [Received: 02/12/2019; Revised: 12/02/2019; Accepted: 12/18/2019; Indexed: 10/25/2022]
Abstract
New stochastic models based on the notion of a stochastic codon are proposed. These models, represented by special random strings, describe practical structural-statistical properties characteristic of coding DNA from both prokaryotic and eukaryotic genomes. In this framework, coding regions are treated as realizations of random strings. The models explain the existence of latent profile periodicity in coding regions with a period equal to three or to a multiple of three. For sequences whose latent profile period is a multiple of three other than three itself, the proposed models guarantee a special property, 3-regularity, which is found in practically all coding sequences of the genomes analyzed. The feasibility of the proposed models was tested in numerical experiments on binary re-encoded paragraphs of literary texts (in English and Italian), used as an analog of DNA coding regions.
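To make the idea of coding regions as realizations of random strings concrete, here is a minimal Python sketch. It assumes, purely for illustration, that a stochastic codon can be modeled as three independent base distributions, one per codon position; the names `STOCHASTIC_CODON`, `sample_coding_string` and `phase_profile`, and all probabilities, are hypothetical and are not taken from the paper. Realizations of such a string show distinct base frequencies at each phase modulo 3, a simple form of period-3 profile structure, while splitting the same string by phase modulo 2 mixes the codon positions and flattens the profile.

```python
import random
from collections import Counter

# Hypothetical stochastic codon: one base distribution per codon position.
# These probabilities are illustrative assumptions, not the authors' parameters.
STOCHASTIC_CODON = [
    {"A": 0.10, "C": 0.20, "G": 0.60, "T": 0.10},  # codon position 1
    {"A": 0.40, "C": 0.20, "G": 0.10, "T": 0.30},  # codon position 2
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},  # codon position 3
]


def sample_coding_string(n_codons, codon=STOCHASTIC_CODON, seed=0):
    """Draw one realization of a random string made of independent stochastic codons."""
    rng = random.Random(seed)
    bases = []
    for _ in range(n_codons):
        for dist in codon:
            letters = list(dist)
            weights = [dist[b] for b in letters]
            bases.append(rng.choices(letters, weights=weights, k=1)[0])
    return "".join(bases)


def phase_profile(seq, period=3):
    """Base frequencies in each phase (position index modulo `period`)."""
    profile = []
    for phase in range(period):
        column = seq[phase::period]
        counts = Counter(column)
        profile.append({b: counts[b] / len(column) for b in "ACGT"})
    return profile


if __name__ == "__main__":
    seq = sample_coding_string(2000)
    for phase, freqs in enumerate(phase_profile(seq)):
        print(phase, {b: round(f, 3) for b, f in freqs.items()})
```

With 2000 sampled codons, the per-phase frequencies should sit close to the three assumed distributions, whereas a period-2 split averages them.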
Affiliation(s)
- Maria Chaley: Institute of Mathematical Problems of Biology RAS - Branch of Keldysh Institute of Applied Mathematics RAS, Professor Vitkevich St., 1, 142290 Pushchino, Russia.
- Vladimir Kutyrkin: Moscow State Technical University n.a. N.E. Bauman, the 2nd Baumanskaya st., 5, 105005 Moscow, Russia.
2. Farnoud F, Schwartz M, Bruck J. Estimation of duplication history under a stochastic model for tandem repeats. BMC Bioinformatics 2019; 20:64. [PMID: 30727948; PMCID: PMC6364452; DOI: 10.1186/s12859-019-2603-1] [Citation(s) in RCA: 9; Impact Index Per Article: 1.5] [Received: 07/27/2018; Accepted: 01/03/2019; Indexed: 11/10/2022] Open access.
Abstract
BACKGROUND: Tandem repeat sequences are common in the genomes of many organisms and are known to cause important phenomena such as gene silencing and rapid morphological changes. Due to the presence of multiple copies of the same pattern in tandem repeats and their high variability, they contain a wealth of information about the mutations that have led to their formation. The ability to extract this information can enhance our understanding of evolutionary mechanisms.

RESULTS: We present a stochastic model for the formation of tandem repeats via tandem duplication and substitution mutations. Based on the analysis of this model, we develop a method for estimating the relative mutation rates of duplications and substitutions, as well as the total number of mutations, in the history of a tandem repeat sequence. We validate our estimation method via Monte Carlo simulation and show that it outperforms the state-of-the-art algorithm for discovering the duplication history. We also apply our method to tandem repeat sequences in the human genome, where it demonstrates the different behaviors of micro- and mini-satellites and can be used to compare mutation rates across chromosomes. It is observed that chromosomes that exhibit the highest mutation activity in tandem repeat regions are the same as those thought to have the highest overall mutation rates. However, unlike previous works that rely on comparing human and chimpanzee genomes to measure mutation rates, the proposed method allows us to find chromosomes with the highest mutation activity based on a single genome, in essence by comparing (approximate) copies of the pattern in tandem repeats.

CONCLUSION: The prevalence of tandem repeats in most organisms and the efficiency of the proposed method enable studying various aspects of the formation of tandem repeats and the surrounding sequences in a wide range of settings.
AVAILABILITY: The implementation of the estimation method is available at http://ips.lab.virginia.edu/smtr.
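The forward process described above (a sequence shaped by tandem duplications and substitutions) can be simulated in a few lines, which is the kind of Monte Carlo data one would use to check an estimator. The Python sketch below is an illustrative assumption, not the authors' model or their estimation method: the uniform choice of duplication position and length, the `p_dup` mixing probability, and the names `tandem_duplicate`, `substitute` and `simulate_history` are all hypothetical.

```python
import random

BASES = "ACGT"


def tandem_duplicate(seq, start, length):
    """Insert a copy of seq[start:start+length] immediately after the original segment."""
    end = start + length
    return seq[:end] + seq[start:end] + seq[end:]


def substitute(seq, pos, rng):
    """Replace the base at `pos` with a different, uniformly chosen base."""
    new_base = rng.choice([b for b in BASES if b != seq[pos]])
    return seq[:pos] + new_base + seq[pos + 1:]


def simulate_history(seed_seq, n_mutations, p_dup, max_dup_len, seed=0):
    """Apply n_mutations random events: each is a tandem duplication with probability
    p_dup, otherwise a point substitution. Returns the final sequence and event counts."""
    rng = random.Random(seed)
    seq = seed_seq
    n_dup = n_sub = 0
    for _ in range(n_mutations):
        if rng.random() < p_dup:
            # Assumed: duplication length and start position drawn uniformly.
            length = rng.randint(1, min(max_dup_len, len(seq)))
            start = rng.randint(0, len(seq) - length)
            seq = tandem_duplicate(seq, start, length)
            n_dup += 1
        else:
            seq = substitute(seq, rng.randrange(len(seq)), rng)
            n_sub += 1
    return seq, n_dup, n_sub


if __name__ == "__main__":
    final, n_dup, n_sub = simulate_history("ACGT", 30, p_dup=0.6, max_dup_len=4)
    print(final, n_dup, n_sub)
```

Because the simulator records how many duplications and substitutions it applied, its output can serve as ground truth when judging any method that tries to recover those counts from the final sequence alone.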
Affiliation(s)
- Farzad Farnoud: Department of Electrical and Computer Engineering, Department of Computer Science, University of Virginia, Charlottesville, USA.
- Moshe Schwartz: Department of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Beer Sheva, Israel.
- Jehoshua Bruck: Department of Electrical Engineering, California Institute of Technology, Pasadena, USA.
3. Chaley M, Kutyrkin V. Stochastic model of homogeneous coding and latent periodicity in DNA sequences. J Theor Biol 2016; 390:106-16. [DOI: 10.1016/j.jtbi.2015.11.014] [Citation(s) in RCA: 5; Impact Index Per Article: 0.6] [Received: 05/19/2015; Revised: 09/18/2015; Accepted: 11/14/2015; Indexed: 11/24/2022]
4. Chaley M, Kutyrkin V. Spectral-Statistical Approach for Revealing Latent Regular Structures in DNA Sequence. Methods Mol Biol 2016; 1415:315-40. [PMID: 27115640; DOI: 10.1007/978-1-4939-3572-7_16] [Citation(s) in RCA: 0; Impact Index Per Article: 0] [Indexed: 06/05/2023]
Abstract
Methods of the spectral-statistical approach (2S-approach) for revealing latent periodicity in DNA sequences are described. Results of analyzing data in the HeteroGenome database, which collects sequences similar to approximate tandem repeats in the genomes of model organisms, are presented. As a further development of the spectral-statistical approach, techniques for recognizing latent profile periodicity are considered. These techniques are based on an extension of the notion of an approximate tandem repeat. Examples are given of correlations between the latent profile periodicity revealed in CDSs and the structural-functional properties of the proteins.
Affiliation(s)
- Maria Chaley: Institute of Mathematical Problems of Biology, Russian Academy of Sciences, Institutskaya st., 4, 142290, Pushchino, Russia.
- Vladimir Kutyrkin: Department of Computational Mathematics and Mathematical Physics, Moscow State Technical University n.a. N.E. Bauman, the 2nd Baumanskaya st., 5, 105005, Moscow, Russia.
5. Suvorova YM, Korotkova MA, Korotkov EV. Comparative analysis of periodicity search methods in DNA sequences. Comput Biol Chem 2014; 53 Pt A:43-8. [PMID: 25218218; DOI: 10.1016/j.compbiolchem.2014.08.008] [Citation(s) in RCA: 17; Impact Index Per Article: 1.5] [Accepted: 07/11/2014; Indexed: 11/30/2022]
Abstract
To determine the periodicity of a DNA sequence, various spectral approaches are applied: discrete Fourier transform (DFT), autocorrelation (CORR), information decomposition (ID), a hybrid method (HYB), the concept of the spectral envelope for spectral analysis (SE), normalized autocorrelation (CORR_N) and profile analysis (PA). In this work, we investigated how the ability of each of these methods to find the true period length depends on the average number of accumulated base changes (point mutations, PM) per nucleotide. The results show that for short periods (≤4 bp), the hybrid method (HYB), which combines properties of autocorrelation, Fourier transform and information decomposition (ID), can be used. For longer periods (>4 bp) with PM values of 1.0 or more per nucleotide, the information decomposition (ID) method is preferable, as the other spectral approaches fail to determine the period length present in the analyzed sequence correctly.
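Two of the simplest approaches named above, the DFT and autocorrelation, can be sketched as follows. This is a minimal Python illustration under stated assumptions: the DFT is applied to the four binary indicator sequences of the bases (a common mapping, assumed here) and evaluated only at the frequency 1/p for each candidate period p, and the autocorrelation is simply the fraction of matching bases at lag p. The names `dft_power`, `autocorr`, `best_period` and `mutate` are hypothetical; ID, HYB, SE, CORR_N and PA are not implemented.

```python
import cmath
import random


def dft_power(seq, period):
    """Spectral power at frequency 1/period, summed over the A, C, G, T indicator sequences."""
    total = 0.0
    for base in "ACGT":
        # Fourier coefficient of the 0/1 indicator sequence of `base`
        coeff = sum(
            cmath.exp(-2j * cmath.pi * i / period)
            for i, ch in enumerate(seq)
            if ch == base
        )
        total += abs(coeff) ** 2
    return total / len(seq)


def autocorr(seq, lag):
    """Fraction of positions i with seq[i] == seq[i + lag]."""
    n = len(seq) - lag
    return sum(seq[i] == seq[i + lag] for i in range(n)) / n


def best_period(seq, candidates, score):
    """Candidate period with the highest score (the first one wins on ties)."""
    return max(candidates, key=lambda p: score(seq, p))


def mutate(seq, rate, seed=0):
    """With probability `rate`, replace each position by a uniformly random base
    (which may coincide with the original one)."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") if rng.random() < rate else ch for ch in seq)


if __name__ == "__main__":
    clean = "ACGTT" * 200  # true period: 5
    noisy = mutate(clean, 0.1)
    print(best_period(noisy, range(2, 11), dft_power))
    print(best_period(noisy, range(2, 11), autocorr))
```

One design caveat worth noting: the autocorrelation of a perfectly periodic sequence is equally high at every multiple of the true period, so this naive score cannot by itself separate a period from its multiples, whereas the DFT power at 1/p is near zero for a multiple p of the true period when the sequence length is a multiple of p.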
Affiliation(s)
- Yulia M Suvorova: Centre of Bioengineering, Russian Academy of Sciences, Prospect 60-tya Oktyabrya 7/1, Moscow 117312, Russian Federation.
- Maria A Korotkova: National Research Nuclear University MEPhI (Moscow Engineering Physics Institute), Kashirskoe Shosse, 31, Moscow 115522, Russian Federation.
- Eugene V Korotkov: Centre of Bioengineering, Russian Academy of Sciences, Prospect 60-tya Oktyabrya 7/1, Moscow 117312, Russian Federation; National Research Nuclear University MEPhI (Moscow Engineering Physics Institute), Kashirskoe Shosse, 31, Moscow 115522, Russian Federation.
6. Chaley M, Kutyrkin V, Tulbasheva G, Teplukhina E, Nazipova N. HeteroGenome: database of genome periodicity. Database (Oxford) 2014; 2014:bau040. [PMID: 24857969; PMCID: PMC4038257; DOI: 10.1093/database/bau040] [Citation(s) in RCA: 6; Impact Index Per Article: 0.5] [Indexed: 11/16/2022]
Abstract
We present the first release of the HeteroGenome database, which collects latent periodicity regions in genomes. Tandem repeats and highly divergent tandem repeats, along with regions of a new type of periodicity known as profile periodicity, have been collected for the genomes of Saccharomyces cerevisiae, Arabidopsis thaliana, Caenorhabditis elegans and Drosophila melanogaster. We obtained the data with the aid of a spectral-statistical approach that searches for reliable latent periodicity regions (with periods up to 2000 bp) in DNA sequences. The original two-level mode of data presentation (a broad view of the latent periodicity region, and a second level indicating conservative fragments of its structure) was further developed, which enabled us to estimate, without redundancy, that latent periodicity regions make up ∼10% of the analyzed genomes. Analysis of the quantitative and qualitative content of the periodicity regions located on all chromosomes of the analyzed organisms revealed the dominant characteristic types of periodicity in the genomes. The density distribution pattern of latent periodicity regions along a chromosome unambiguously characterizes each chromosome in the genome. Database URL: http://www.jcbi.ru/lp_baze/
Affiliation(s)
- Maria Chaley, Vladimir Kutyrkin, Gayane Tulbasheva, Elena Teplukhina, Nafisa Nazipova (all authors): Laboratory of Bioinformatics, Institute of Mathematical Problems of Biology, Russian Academy of Sciences, Institutskaya st. 4, 142290 Pushchino, Russia; and Department of Computational Mathematics and Mathematical Physics, Moscow State Technical University n.a. N.E. Bauman, the 2nd Baumanskaya st., 5, 105005 Moscow, Russia.
7. Valenzuela CY. The structure of selective dinucleotide interactions and periodicities in D melanogaster mtDNA. Biol Res 2014; 47:18. [PMID: 25027717; PMCID: PMC4101722; DOI: 10.1186/0717-6287-47-18] [Citation(s) in RCA: 1; Impact Index Per Article: 0.1] [Received: 04/25/2014; Accepted: 04/26/2014; Indexed: 10/28/2022] Open access.
Abstract
BACKGROUND: We found a strong selective 3-site periodicity in the deviations from randomness of the dinucleotide (DN) distribution, where the two bases of a DN were separated by 1, 2, ..., K sites, in prokaryotes and mtDNA. Three main aspects are studied: (I) the specific 3 K-sites periodic structure of the 16 DNs; (II) ruling out the possibility that the periodicity was produced by the highly nonrandom interactive association of contiguous bases, by studying the interaction of non-contiguous bases, the first chosen every I sites and the second J sites downstream; (III) the difference between this selective periodicity of association (distance from randomness) of the four bases and the described fixed periodicities of base sequences.

RESULTS: (I) The 16 pairs presented a consistent periodicity in the strength of association of the two bases of each pair; the most deviated pairs are those involving G and C, and the least deviated are those involving A and T. (II) We found significant non-random interactions when the first nucleotide is chosen every I sites and the second J sites downstream, up to I = J = 76. (III) We showed conclusive differences between these internucleotide association periodicities and sequence periodicities.

CONCLUSIONS: This relational selective periodicity differs from sequence periodicities and indicates that any base interacts strongly with the bases of the rest of the genome; this interaction and periodicity is highly structured and systematic for every pair of bases. Such an interaction would be destroyed within a few generations by recurrent mutation; it is compatible only with the Synthetic Theory of Evolution and agrees with Wright's adaptive landscape conception and evolution by shifting balanced adaptive peaks.
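A deviation-from-randomness profile of the kind described above can be approximated, as an assumption, by a chi-square statistic of independence computed on the 4-by-4 table of base pairs (x_i, x_{i+k}). The Python sketch below is a simplified illustration, not the author's exact statistic: the `step` argument plays the role of choosing the first base every I sites, `k` is the downstream separation, and the names `pair_chi_square` and `association_profile` are hypothetical.

```python
import random
from collections import Counter

BASES = "ACGT"


def pair_chi_square(seq, k, step=1):
    """Chi-square statistic of independence for pairs (seq[i], seq[i + k]),
    with the first base taken every `step` sites."""
    pairs = [
        (seq[i], seq[i + k])
        for i in range(0, len(seq) - k, step)
        if seq[i] in BASES and seq[i + k] in BASES
    ]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    first = Counter(a for a, _ in pairs)
    second = Counter(b for _, b in pairs)
    chi2 = 0.0
    for a in BASES:
        for b in BASES:
            expected = first[a] * second[b] / n
            if expected > 0:  # skip bases absent from either margin
                chi2 += (joint[(a, b)] - expected) ** 2 / expected
    return chi2


def association_profile(seq, max_k, step=1):
    """Chi-square deviation from independence for separations k = 1..max_k."""
    return [pair_chi_square(seq, k, step) for k in range(1, max_k + 1)]


if __name__ == "__main__":
    rng = random.Random(0)
    random_seq = "".join(rng.choice(BASES) for _ in range(3000))
    print([round(x, 1) for x in association_profile(random_seq, 6)])
```

For a sequence of independent, uniformly drawn bases, each value should stay near the 9 degrees of freedom of a 4-by-4 table, so a periodic rise and fall of the profile across k would indicate the kind of structured association the abstract reports. Overlapping pairs are not independent samples, so the statistic is best read as a descriptive score rather than an exact test.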