Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Herzel H, Ebeling W, Schmitt AO. Entropies of biosequences: The role of repeats. Physical Review. E, Statistical Physics, Plasmas, Fluids, and Related Interdisciplinary Topics 1994;50:5061-5071. [PMID: 9962591 DOI: 10.1103/physreve.50.5061] [Citation(s) in RCA: 76] [Impact Index Per Article: 2.5] [Indexed: 11/07/2022]

Cited by Other Article(s)

Markić I, Štula M, Zorić M, Stipaničev D. Entropy-Based Approach in Selection Exact String-Matching Algorithms. Entropy (Basel, Switzerland) 2020;23:E31. [PMID: 33379282 PMCID: PMC7824336 DOI: 10.3390/e23010031] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 11/14/2020] [Revised: 12/19/2020] [Accepted: 12/22/2020] [Indexed: 11/16/2022]

Abstract

The string-matching paradigm is applied in every branch of computer science and in science in general. The existence of a plethora of string-matching algorithms makes it hard to choose the best one for any particular case. Expressing, measuring, and testing algorithm efficiency is a challenging task with many potential pitfalls. Algorithm efficiency can be measured based on the usage of different resources. In software engineering, algorithmic productivity is a property of an algorithm execution identified with the computational resources the algorithm consumes. Resource usage in algorithm execution can be determined, and for maximum efficiency, the goal is to minimize resource usage. Guided by the fact that standard measures of algorithm efficiency, such as execution time, directly depend on the number of executed actions, and without touching the problematics of computer power consumption or memory, which also depend on the algorithm type and the techniques used in algorithm development, we have developed a methodology which enables researchers to choose an efficient algorithm for a specific domain. The efficiency of string-searching algorithms is usually observed independently from the domain texts being searched. This research paper aims to present the idea that algorithm efficiency depends on the properties of the searched string and the properties of the texts being searched, accompanied by a theoretical analysis of the proposed approach. In the proposed methodology, algorithm efficiency is expressed through the character comparison count metric. The character comparison count metric is a formal quantitative measure independent of algorithm implementation subtleties and computer platform differences. The model is developed for a particular problem domain by using appropriate domain data (patterns and texts) and provides, for a specific domain, a ranking of algorithms according to the patterns' entropy. The proposed approach is limited to on-line exact string-matching problems based on information entropy for a search pattern. Meticulous empirical testing depicts the methodology implementation and supports the soundness of the methodology.
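The two quantities this abstract relies on, the entropy of a search pattern and the character comparison count, can be illustrated with a minimal sketch. It assumes that pattern entropy means the Shannon entropy of the pattern's character frequencies and uses a brute-force matcher as the algorithm being measured; the function names `pattern_entropy` and `naive_search_comparisons` are hypothetical, and the cited paper's exact definitions and algorithms may differ.

```python
import math
from collections import Counter


def pattern_entropy(pattern):
    """Shannon entropy (bits per character) of the character frequencies in pattern.

    Assumption: this is one common definition; the cited paper may use another.
    """
    if not pattern:
        return 0.0
    counts = Counter(pattern)
    n = len(pattern)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def naive_search_comparisons(text, pattern):
    """Brute-force exact matching.

    Returns (match_positions, comparisons), where comparisons is the number
    of character-to-character comparisons performed.
    """
    positions = []
    comparisons = 0
    m = len(pattern)
    for i in range(len(text) - m + 1):
        j = 0
        while j < m:
            comparisons += 1  # one text/pattern character comparison
            if text[i + j] != pattern[j]:
                break
            j += 1
        if j == m:
            positions.append(i)
    return positions, comparisons


if __name__ == "__main__":
    print(pattern_entropy("ACGT"))
    print(naive_search_comparisons("ABABAB", "AB"))
```

Counting comparisons rather than timing the search keeps the measure independent of hardware and implementation details, which is the property the abstract attributes to the metric.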

Humphrey S, Kerr A, Rattray M, Dive C, Miller CJ. A model of k-mer surprisal to quantify local sequence information content surrounding splice regions. PeerJ 2020;8:e10063. [PMID: 33194378 PMCID: PMC7648452 DOI: 10.7717/peerj.10063] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/27/2020] [Accepted: 09/08/2020] [Indexed: 12/22/2022]

Abstract

Molecular sequences carry information. Analysis of sequence conservation between homologous loci is a proven approach with which to explore the information content of molecular sequences. This is often done using multiple sequence alignments to support comparisons between homologous loci. These methods therefore rely on sufficient underlying sequence similarity with which to construct a representative alignment. Here we describe a method using a formal metric of information, surprisal, to analyse biological sub-sequences without alignment constraints. We applied our model to the genomes of five different species to reveal similar patterns across a panel of eukaryotes. As the surprisal of a sub-sequence is inversely proportional to its occurrence within the genome, the optimal size of the sub-sequences was selected for each species under consideration. With the model optimized, we found a strong correlation between surprisal and CG dinucleotide usage. The utility of our model was tested by examining the sequences of genes known to undergo splicing. We demonstrate that our model can identify biological features of interest such as known donor and acceptor sites. Analysis across all annotated coding exon junctions in Homo sapiens reveals the information content of coding exons to be greater than the surrounding intron regions, a consequence of increased suppression of the CG dinucleotide in intronic space. Sequences within coding regions proximal to exon junctions exhibited novel patterns within DNA and coding mRNA that are not a function of the encoded amino acid sequence. Our findings are consistent with the presence of secondary information encoding features such as DNA and RNA binding sites, multiplexed through the coding sequence and independent of the information required to define the corresponding amino-acid sequence. 
We conclude that surprisal provides a complementary methodology with which to locate regions of interest in the genome, particularly in situations that lack an appropriate multiple sequence alignment.
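As a rough illustration of the surprisal idea described in this abstract, the sketch below scores each k-mer by the negative base-2 logarithm of its relative frequency among all overlapping k-mers of a sequence, so rarer k-mers score higher. This is the standard information-theoretic definition, assumed here for illustration; the function names `kmer_surprisal` and `surprisal_profile` are hypothetical, and the paper's actual model and choice of k may differ.

```python
import math
from collections import Counter


def kmer_surprisal(sequence, k):
    """Map each k-mer to -log2 of its relative frequency in sequence.

    Frequencies are taken over all overlapping k-mers of the sequence itself
    (an assumption; a genome-wide background could be used instead).
    """
    if k <= 0 or k > len(sequence):
        raise ValueError("k must be between 1 and len(sequence)")
    kmers = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    counts = Counter(kmers)
    total = len(kmers)
    return {kmer: -math.log2(count / total) for kmer, count in counts.items()}


def surprisal_profile(sequence, k):
    """Surprisal of the k-mer starting at each position, in sequence order."""
    table = kmer_surprisal(sequence, k)
    return [table[sequence[i:i + k]] for i in range(len(sequence) - k + 1)]


if __name__ == "__main__":
    for kmer, bits in sorted(kmer_surprisal("AAAC", 1).items()):
        print(kmer, round(bits, 3))
```

A positional profile of this kind needs no alignment, which is the setting the abstract has in mind.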

Popa O, Oldenburg E, Ebenhöh O. From sequence to information. Philos Trans R Soc Lond B Biol Sci 2020;375:20190448. [PMID: 33131436 PMCID: PMC7662195 DOI: 10.1098/rstb.2019.0448] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 12/12/2022]

Náprstek J, Fischer C. Maximum Entropy Probability Density Principle in Probabilistic Investigations of Dynamic Systems. Entropy (Basel, Switzerland) 2018;20:e20100790. [PMID: 33265878 PMCID: PMC7512353 DOI: 10.3390/e20100790] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 09/21/2018] [Revised: 10/11/2018] [Accepted: 10/11/2018] [Indexed: 06/12/2023]

Corso G, Prado TDL, Lima GZDS, Kurths J, Lopes SR. Quantifying entropy using recurrence matrix microstates. Chaos (Woodbury, N.Y.) 2018;28:083108. [PMID: 30180629 DOI: 10.1063/1.5042026] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 05/29/2018] [Accepted: 07/16/2018] [Indexed: 05/28/2023]

Pizzi C, Ornamenti M, Spangaro S, Rombo SE, Parida L. Efficient Algorithms for Sequence Analysis with Entropic Profiles. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2018;15:117-128. [PMID: 28113780 DOI: 10.1109/tcbb.2016.2620143] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 06/06/2023]

Clustering of giant virus-DNA based on variations in local entropy. Viruses 2014;6:2259-67. [PMID: 24887142 PMCID: PMC4074927 DOI: 10.3390/v6062259] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/05/2014] [Revised: 05/19/2014] [Accepted: 05/21/2014] [Indexed: 11/17/2022]

Entropic Profiles, Maximal Motifs and the Discovery of Significant Repetitions in Genomic Sequences. 2014. [DOI: 10.1007/978-3-662-44753-6_12] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1]

On the fractal geometry of DNA by the binary image analysis. Bull Math Biol 2013;75:1544-70. [PMID: 23760660 DOI: 10.1007/s11538-013-9859-9] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.0] [Received: 10/09/2012] [Accepted: 05/21/2013] [Indexed: 12/15/2022]

Royer L, Reimann M, Stewart AF, Schroeder M. Network compression as a quality measure for protein interaction networks. PLoS One 2012;7:e35729. [PMID: 22719828 PMCID: PMC3377704 DOI: 10.1371/journal.pone.0035729] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Received: 09/10/2011] [Accepted: 03/24/2012] [Indexed: 11/18/2022]

Bielińska-Wąż D. Graphical and numerical representations of DNA sequences: statistical aspects of similarity. Journal of Mathematical Chemistry 2011;49:2345. [PMID: 32214591 PMCID: PMC7087963 DOI: 10.1007/s10910-011-9890-8] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Received: 02/18/2011] [Accepted: 07/22/2011] [Indexed: 05/10/2023]

Bose R, Chouhan S. Alternate measure of information useful for DNA sequences. Physical Review. E, Statistical, Nonlinear, and Soft Matter Physics 2011;83:051918. [PMID: 21728582 DOI: 10.1103/physreve.83.051918] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 08/27/2010] [Revised: 03/07/2011] [Indexed: 05/31/2023]

Lesne A, Blanc JL, Pezard L. Entropy estimation of very short symbolic sequences. Physical Review. E, Statistical, Nonlinear, and Soft Matter Physics 2009;79:046208. [PMID: 19518313 DOI: 10.1103/physreve.79.046208] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.7] [Received: 12/19/2008] [Indexed: 05/27/2023]

Vinga S, Almeida JS. Local Renyi entropic profiles of DNA sequences. BMC Bioinformatics 2007;8:393. [PMID: 17939871 PMCID: PMC2238722 DOI: 10.1186/1471-2105-8-393] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Received: 05/10/2007] [Accepted: 10/16/2007] [Indexed: 11/18/2022]

Vaillant C, Audit B, Thermes C, Arnéodo A. Formation and positioning of nucleosomes: effect of sequence-dependent long-range correlated structural disorder. The European Physical Journal. E, Soft Matter 2006;19:263-77. [PMID: 16477390 DOI: 10.1140/epje/i2005-10053-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Received: 08/11/2005] [Accepted: 01/20/2006] [Indexed: 05/06/2023]

Larsabal E, Danchin A. Genomes are covered with ubiquitous 11 bp periodic patterns, the "class A flexible patterns". BMC Bioinformatics 2005;6:206. [PMID: 16120222 PMCID: PMC1242344 DOI: 10.1186/1471-2105-6-206] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Received: 06/22/2005] [Accepted: 08/24/2005] [Indexed: 11/17/2022]

Dehnert M, Helm WE, Hütt MT. Information theory reveals large-scale synchronisation of statistical correlations in eukaryote genomes. Gene 2005;345:81-90. [PMID: 15716116 DOI: 10.1016/j.gene.2004.11.026] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.1] [Received: 09/15/2004] [Revised: 10/18/2004] [Accepted: 11/09/2004] [Indexed: 11/20/2022]

Vinga S, Almeida JS. Rényi continuous entropy of DNA sequences. J Theor Biol 2004;231:377-88. [PMID: 15501469 DOI: 10.1016/j.jtbi.2004.06.030] [Citation(s) in RCA: 53] [Impact Index Per Article: 2.7] [Received: 05/21/2004] [Accepted: 06/30/2004] [Indexed: 11/20/2022]

Audit B, Vaillant C, Arnéodo A, d'Aubenton-Carafa Y, Thermes C. Wavelet Analysis of DNA Bending Profiles reveals Structural Constraints on the Evolution of Genomic Sequences. J Biol Phys 2004;30:33-81. [PMID: 23345861 PMCID: PMC3456503 DOI: 10.1023/b:jobp.0000016438.86794.8e] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.3] [Indexed: 11/12/2022]

Abstract

Analyses of genomic DNA sequences have shown in previous works that base pairs are correlated at large distances with scale-invariant statistical properties. We show in the present study that these correlations between nucleotides (letters) result in fact from long-range correlations (LRC) between sequence-dependent DNA structural elements (words) involved in the packaging of DNA in chromatin. Using the wavelet transform technique, we perform a comparative analysis of the DNA text and of the corresponding bending profiles generated with curvature tables based on nucleosome positioning data. This exploration through the optics of the so-called 'wavelet transform microscope' reveals a characteristic scale of 100-200 bp that separates two regimes of different LRC. We focus here on the existence of LRC in the small-scale regime (≲ 200 bp). Analysis of genomes in the three kingdoms reveals that this regime is specifically associated with the presence of nucleosomes. Indeed, small-scale LRC are observed in eukaryotic genomes and to a lesser extent in archaeal genomes, in contrast with their absence in eubacterial genomes. Similarly, this regime is observed in eukaryotic but not in bacterial viral DNA genomes. There is one exception for genomes of Poxviruses, the only animal DNA viruses that do not replicate in the cell nucleus and do not present small-scale LRC. Furthermore, no small-scale LRC are detected in the genomes of all examined RNA viruses, with one exception in the case of retroviruses. Altogether, these results strongly suggest that small-scale LRC are a signature of the nucleosomal structure. Finally, we discuss possible interpretations of these small-scale LRC in terms of the mechanisms that govern the positioning, the stability and the dynamics of the nucleosomes along the DNA chain.
This paper is mainly devoted to a pedagogical presentation of the theoretical concepts and physical methods which are well suited to perform a statistical analysis of genomic sequences. We review the results obtained with the so-called wavelet-based multifractal analysis when investigating the DNA sequences of various organisms in the three kingdoms. Some of these results have been announced in B. Audit et al. [1, 2].

Nikolaou C, Almirantis Y. Mutually symmetric and complementary triplets: differences in their use distinguish systematically between coding and non-coding genomic sequences. J Theor Biol 2003;223:477-87. [PMID: 12875825 DOI: 10.1016/s0022-5193(03)00123-1] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.7] [Indexed: 11/21/2022]

Fukushima A, Ikemura T, Kinouchi M, Oshima T, Kudo Y, Mori H, Kanaya S. Periodicity in prokaryotic and eukaryotic genomes identified by power spectrum analysis. Gene 2002;300:203-11. [PMID: 12468102 DOI: 10.1016/s0378-1119(02)00850-8] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.2] [Indexed: 11/16/2022]

Anh VV, Lau KS, Yu ZG. Recognition of an organism from fragments of its complete genome. Physical Review. E, Statistical, Nonlinear, and Soft Matter Physics 2002;66:031910. [PMID: 12366155 DOI: 10.1103/physreve.66.031910] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.4] [Received: 05/13/2002] [Revised: 06/13/2002] [Indexed: 05/23/2023]

Nikolaou C, Almirantis Y. A study of the middle-scale nucleotide clustering in DNA sequences of various origin and functionality, by means of a method based on a modified standard deviation. J Theor Biol 2002;217:479-92. [PMID: 12234754 DOI: 10.1006/jtbi.2002.3045] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.4] [Indexed: 11/22/2022]

Holste D, Grosse I, Herzel H. Statistical analysis of the DNA sequence of human chromosome 22. Physical Review E 2001;64:041917. [PMID: 11690062 DOI: 10.1103/physreve.64.041917] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.1] [Received: 04/19/2001] [Indexed: 11/07/2022]

Yu ZG, Anh V, Lau KS. Measure representation and multifractal analysis of complete genomes. Physical Review E 2001;64:031903. [PMID: 11580363 DOI: 10.1103/physreve.64.031903] [Citation(s) in RCA: 85] [Impact Index Per Article: 3.7] [Received: 10/31/2000] [Revised: 05/01/2001] [Indexed: 11/07/2022]

Yu ZG, Anh VV, Wang B. Correlation property of length sequences based on global structure of the complete genome. Physical Review. E, Statistical, Nonlinear, and Soft Matter Physics 2001;63:011903. [PMID: 11304283 DOI: 10.1103/physreve.63.011903] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.7] [Received: 06/26/2000] [Revised: 08/28/2000] [Indexed: 05/23/2023]

Weiss O, Jiménez-Montaño MA, Herzel H. Information content of protein sequences. J Theor Biol 2000;206:379-86. [PMID: 10988023 DOI: 10.1006/jtbi.2000.2138] [Citation(s) in RCA: 69] [Impact Index Per Article: 2.9] [Indexed: 11/22/2022]

Lobzin VV, Chechetkin VR. Order and correlations in genomic DNA sequences. The spectral approach. Uspekhi Fizicheskikh Nauk 2000. [DOI: 10.3367/ufnr.0170.200001c.0057] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.0] [Indexed: 11/01/2022]

Wackerbauer R, Schmidt T. Symbolic dynamics of jejunal motility in the irritable bowel. Chaos (Woodbury, N.Y.) 1999;9:805-811. [PMID: 12779876 DOI: 10.1063/1.166454] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Indexed: 05/24/2023]

Freund J, Ebeling W, Rateitschak K. Self-similar sequences and universal scaling of dynamical entropies. Physical Review. E, Statistical Physics, Plasmas, Fluids, and Related Interdisciplinary Topics 1996;54:5561-5566. [PMID: 9965741 DOI: 10.1103/physreve.54.5561] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.1] [Indexed: 05/22/2023]

Allegrini P, Barbi M, Grigolini P, West BJ. Dynamical model for DNA sequences. Physical Review. E, Statistical Physics, Plasmas, Fluids, and Related Interdisciplinary Topics 1995;52:5281-5296. [PMID: 9964027 DOI: 10.1103/physreve.52.5281] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.4] [Indexed: 05/22/2023]

Mantegna RN, Buldyrev SV, Goldberger AL, Havlin S, Peng CK, Simons M, Stanley HE. Systematic analysis of coding and noncoding DNA sequences using methods of statistical linguistics. Physical Review. E, Statistical Physics, Plasmas, Fluids, and Related Interdisciplinary Topics 1995;52:2939-50. [PMID: 9963739 DOI: 10.1103/physreve.52.2939] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.0] [Indexed: 05/16/2023]