Reference Citation Analysis

For: Messer PW, Arndt PF. CorGen--measuring and generating long-range correlations for DNA sequence analysis. Nucleic Acids Res 2006;34:W692-5. [PMID: 16845099 PMCID: PMC1538783 DOI: 10.1093/nar/gkl234] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Indexed: 11/13/2022]

Cited by Other Article(s)

Orlov YL, Orlova NG. Bioinformatics tools for the sequence complexity estimates. Biophys Rev 2023;15:1367-1378. [PMID: 37974990 PMCID: PMC10643780 DOI: 10.1007/s12551-023-01140-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/15/2023] [Accepted: 09/01/2023] [Indexed: 11/19/2023]

Abstract

We review current methods and bioinformatics tools for estimating text complexity (information and entropy measures). The search for DNA regions with extreme statistical characteristics, such as low-complexity regions, is important for biophysical models of chromosome function and gene transcription regulation at genome scale. We discuss complexity profiling for segmentation and delineation of genome sequences, the search for genome repeats and transposable elements, and applications to next-generation sequencing reads. We review the complexity methods and new application fields: analysis of mutation hotspot loci, analysis of short sequencing reads with quality control, and alignment-free genome comparisons. Algorithms implementing various numerical measures of text complexity, including combinatorial and linguistic measures, were developed before the genome sequencing era. A series of tools for estimating sequence complexity use compression approaches, mainly modifications of Lempel-Ziv compression. Most of the tools are available online, providing large-scale service for whole-genome analysis. Novel machine learning applications for classification of complete genome sequences also include sequence compression and complexity algorithms. We present a comparison of the complexity methods on different sequence sets and their applications to the analysis of gene transcription regulatory regions. Furthermore, we discuss approaches to and applications of sequence complexity for proteins. The complexity measures for amino acid sequences can be calculated by the same entropy- and compression-based algorithms, but the functional and evolutionary roles of low-complexity regions in proteins have specific features that differ from those in DNA. The tools for protein sequence complexity are aimed at protein structural constraints. It has been shown that low-complexity regions in protein sequences are conserved in evolution and have important biological and structural functions. Finally, we summarize recent findings in large-scale genome complexity comparison and applications to coronavirus genome analysis.
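
As a rough illustration of the two families of measures the abstract names (entropy measures and compression-based measures), the following minimal Python sketch computes a Shannon entropy, a sliding-window entropy profile, and a compression ratio. It is not taken from the reviewed article or from any of the tools it surveys. The function names, the window and step sizes, and the use of zlib (DEFLATE, which combines LZ77 with Huffman coding) as a stand-in for the Lempel-Ziv modifications mentioned above are all assumptions made for illustration.

```python
import math
import zlib
from collections import Counter


def shannon_entropy(seq: str) -> float:
    """Shannon entropy, in bits per symbol, of the symbol frequencies in seq."""
    if not seq:
        return 0.0
    n = len(seq)
    counts = Counter(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def compression_ratio(seq: str) -> float:
    """Compressed size divided by raw size, using zlib at maximum level.

    zlib is only a convenient stand-in here (an assumption); lower values
    suggest a more repetitive, lower-complexity sequence.
    """
    raw = seq.encode("ascii")
    if not raw:
        return 0.0
    return len(zlib.compress(raw, 9)) / len(raw)


def entropy_profile(seq: str, window: int = 50, step: int = 10):
    """Entropy of each sliding window, returned as (start, entropy) pairs.

    If seq is shorter than the window, a single pair for the whole
    sequence is returned.
    """
    last_start = max(len(seq) - window + 1, 1)
    return [(i, shannon_entropy(seq[i:i + window]))
            for i in range(0, last_start, step)]
```

Under these assumptions, a homopolymer run such as `"A" * 1000` gives an entropy of 0 bits and a very small compression ratio, while a sequence with equal base frequencies approaches 2 bits per symbol. Windows with unusually low values in `entropy_profile` would be candidates for low-complexity regions.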

Saeed M. Fractal genomics of SOD1 evolution. Immunogenetics 2020;72:439-445. [PMID: 33237378 DOI: 10.1007/s00251-020-01184-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2020] [Accepted: 10/28/2020] [Indexed: 10/22/2022]

Werner M, Fieth P, Hartmann A. Large-Deviation Properties of Sequence Alignment of Correlated Sequences. J Comput Biol 2018;25:1339-1346. [PMID: 30204481 DOI: 10.1089/cmb.2017.0269] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022]

Colliva A, Pellegrini R, Testori A, Caselle M. Ising-model description of long-range correlations in DNA sequences. Phys Rev E Stat Nonlin Soft Matter Phys 2015;91:052703. [PMID: 26066195 DOI: 10.1103/physreve.91.052703] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 09/29/2014] [Indexed: 06/04/2023]

Paraskevopoulou MD, Vlachos IS, Athanasiadis E, Spyrou G. BiDaS: a web-based Monte Carlo BioData Simulator based on sequence/feature characteristics. Nucleic Acids Res 2013;41:W582-6. [PMID: 23716644 PMCID: PMC3692108 DOI: 10.1093/nar/gkt420] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 12/01/2022]

Massip F, Arndt PF. Neutral evolution of duplicated DNA: an evolutionary stick-breaking process causes scale-invariant behavior. Phys Rev Lett 2013;110:148101. [PMID: 25167038 DOI: 10.1103/physrevlett.110.148101] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 11/20/2012] [Indexed: 06/03/2023]

Pandit A, Dasanna AK, Sinha S. Multifractal analysis of HIV-1 genomes. Mol Phylogenet Evol 2011;62:756-63. [PMID: 22155711 DOI: 10.1016/j.ympev.2011.11.017] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 10/23/2010] [Revised: 10/29/2011] [Accepted: 11/18/2011] [Indexed: 10/14/2022]

Koroteev MV, Miller J. Scale-free duplication dynamics: a model for ultraduplication. Phys Rev E Stat Nonlin Soft Matter Phys 2011;84:061919. [PMID: 22304128 DOI: 10.1103/physreve.84.061919] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 08/17/2010] [Revised: 07/04/2011] [Indexed: 05/31/2023]

Provata A, Beck C. Multifractal analysis of nonhyperbolic coupled map lattices: application to genomic sequences. Phys Rev E Stat Nonlin Soft Matter Phys 2011;83:066210. [PMID: 21797464 DOI: 10.1103/physreve.83.066210] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 02/01/2011] [Indexed: 05/31/2023]

Hall P, Jin J. Innovated higher criticism for detecting sparse signals in correlated noise. Ann Stat 2010. [DOI: 10.1214/09-aos764] [Citation(s) in RCA: 107] [Impact Index Per Article: 7.6] [Indexed: 11/19/2022]

Hall P, Wang Q. Strong approximations of level exceedences related to multiple hypothesis testing. Bernoulli 2010. [DOI: 10.3150/09-bej220] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/19/2022]

Hall P, Pham T. Optimal properties of centroid-based classifiers for very high-dimensional data. Ann Stat 2010. [DOI: 10.1214/09-aos736] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/19/2022]

Provata A, Katsaloulis P. Hierarchical multifractal representation of symbolic sequences and application to human chromosomes. Phys Rev E Stat Nonlin Soft Matter Phys 2010;81:026102. [PMID: 20365626 DOI: 10.1103/physreve.81.026102] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/23/2009] [Indexed: 05/29/2023]

Evans KJ. Genomic DNA from animals shows contrasting strand bias in large and small subsequences. BMC Genomics 2008;9:43. [PMID: 18221531 PMCID: PMC2267173 DOI: 10.1186/1471-2164-9-43] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 03/07/2007] [Accepted: 01/25/2008] [Indexed: 01/09/2023]

Abstract

Background

For eukaryotes, there is almost no strand bias with regard to base composition, except at origins of replication, transcription start sites, and transcribed regions. This paper revisits the question for subsequences of DNA taken at random from the genome.

Results

For a typical mammal, for example mouse or human, there is a small strand bias throughout the genomic DNA: there is a correlation between (G - C) and (A - T) on the same strand (that is, between the difference in the number of guanine and cytosine bases and the difference in the number of adenine and thymine bases). For small subsequences – up to 1 kb – this correlation is weak but positive, but for large windows – around 50 kb to 2 Mb – the correlation is strong and negative. This effect is largely independent of GC%. Transcribed and untranscribed regions give similar correlations for both small and large subsequences, but the two kinds of region differ for intermediate-sized subsequences. An analysis of the human genome showed that position within the isochore structure did not affect these correlations. An analysis of available genomes of different species shows that this contrast between large and small windows is a general feature of mammals and birds. Further down the evolutionary tree, other organisms show a similar but smaller effect. Except for the nematode, all the animals analysed showed at least a small effect.

Conclusion

The correlations on the large scale may be explained by DNA replication. Transcription may be a modifier of these effects but is not the fundamental cause. These results cast light on how DNA mutations affect the genome over evolutionary time. At least for vertebrates, there is a broad relationship between body temperature and the size of the correlation. The genome of mammals and birds has a structure marked by strand bias segments.
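
The correlation between (G - C) and (A - T) described in the Results can be illustrated with a short Python sketch. This is not the paper's code. It assumes non-overlapping windows of a fixed size taken along one strand and a plain Pearson correlation across windows; the function names are hypothetical, and the paper's actual sampling of random subsequences may differ.

```python
import math


def skew_pairs(seq: str, window: int):
    """Return (G - C, A - T) base-count differences per non-overlapping window."""
    seq = seq.upper()
    pairs = []
    for i in range(0, len(seq) - window + 1, window):
        w = seq[i:i + window]
        pairs.append((w.count("G") - w.count("C"),
                      w.count("A") - w.count("T")))
    return pairs


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant series")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def strand_bias_correlation(seq: str, window: int) -> float:
    """Correlation between (G - C) and (A - T) across windows of one strand."""
    pairs = skew_pairs(seq, window)
    if len(pairs) < 2:
        raise ValueError("sequence too short for the chosen window size")
    gc_diff, at_diff = zip(*pairs)
    return pearson(gc_diff, at_diff)
```

Calling `strand_bias_correlation` on the same sequence with a small window (for example 1,000 bases) and a large one (for example 100,000 bases) would be one way to look for the sign change between scales that the abstract reports, subject to the windowing assumptions above.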

Evans KJ. Strand bias structure in mouse DNA gives a glimpse of how chromatin structure affects gene expression. BMC Genomics 2008;9:16. [PMID: 18194530 PMCID: PMC2266913 DOI: 10.1186/1471-2164-9-16] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 08/16/2007] [Accepted: 01/14/2008] [Indexed: 12/20/2022]