1
Liu X, Teng L, Luo Y, Xu Y. Prediction of prokaryotic and eukaryotic promoters based on information-theoretic features. Biosystems 2023; 231:104979. [PMID: 37423595 DOI: 10.1016/j.biosystems.2023.104979] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2022] [Revised: 07/06/2023] [Accepted: 07/07/2023] [Indexed: 07/11/2023]
Abstract
Promoters are DNA regulatory elements located near the transcription start site and are responsible for regulating the transcription of genes. DNA fragments arranged in a certain order form specific functional regions with different information contents. Information theory is the science that studies the extraction, measurement and transmission of information. The genetic information contained in DNA follows the general laws of information storage. Therefore, methods from information theory can be used to analyze promoters, which carry genetic information. In this study, we introduced the concepts of information theory to the study of promoter prediction. We used 107 features extracted with information-theoretic methods and a backpropagation neural network to build a classifier. Then, the trained classifier was applied to predict the promoters of 6 organisms. The average AUCs of the 6 organisms obtained by using hold-out validation and ten-fold cross-validation were 0.885 and 0.886, respectively. The results verified the effectiveness of information-theoretic features in promoter prediction. Considering the possible redundancy in the feature set, we performed feature selection and obtained key feature subsets related to promoter characteristics. The results indicate the potential utility of information-theoretic features in promoter prediction.
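As a rough illustration of what an information-theoretic sequence feature can look like, the sketch below computes the Shannon entropy of the overlapping k-mer distribution of a DNA string. This is an assumed example feature, not one of the 107 features used in the study, and the function name `kmer_entropy` is hypothetical.

```python
import math
from collections import Counter


def kmer_entropy(seq: str, k: int = 3) -> float:
    """Shannon entropy (in bits) of the overlapping k-mer distribution of seq.

    Illustrative feature only; the abstract does not list the actual features.
    """
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        raise ValueError("sequence is shorter than k")
    counts = Counter(kmers)
    total = len(kmers)
    # H = -sum(p * log2(p)) over the observed k-mers.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

A feature vector for a classifier could then be assembled from several such values (for example, entropies at different k), though which combination works well is an empirical question.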
Affiliation(s)
- Xiao Liu
- School of Microelectronics and Communication Engineering, Chongqing University, 174 ShaPingBa District, Chongqing, 400044, China.
- Li Teng
- School of Microelectronics and Communication Engineering, Chongqing University, 174 ShaPingBa District, Chongqing, 400044, China.
- Yachuan Luo
- School of Microelectronics and Communication Engineering, Chongqing University, 174 ShaPingBa District, Chongqing, 400044, China.
- Yuqiao Xu
- School of Microelectronics and Communication Engineering, Chongqing University, 174 ShaPingBa District, Chongqing, 400044, China.

2
Liu X, Luo Y, He T, Ren M, Xu Y. Predicting essential genes of 37 prokaryotes by combining information-theoretic features. J Microbiol Methods 2021; 188:106297. [PMID: 34343487 DOI: 10.1016/j.mimet.2021.106297] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2021] [Revised: 07/30/2021] [Accepted: 07/30/2021] [Indexed: 10/20/2022]
Abstract
Essential genes are required for the reproduction and survival of an organism. Rapid identification of essential genes has practical application value in biomedicine. Information theory is a discipline that studies information transmission. Based on the similarity between heredity and information transmission, measures derived from information theory can be applied to genetic sequence analysis on different scales. In this study, we employed 114 features extracted by information theory methods to construct an essential gene prediction model. We applied a backpropagation neural network to construct a classifier and employed it to predict essential genes of 37 prokaryotes. The performance of the classifier was evaluated by applying intra-organism prediction and leave-one-species-out prediction. Among 37 prokaryotes, intra-organism prediction and leave-one-species-out prediction yielded average AUC scores of 0.791 and 0.717, respectively. Considering the potential redundancy in the feature set, we performed feature selection and constructed a key feature subset. In the above two prediction methods, the average AUC scores of 37 organisms obtained by using key features were 0.786 and 0.714, respectively. The results show the potential and universality of information-theoretic features in the study of prokaryotic essential gene prediction.
Affiliation(s)
- Xiao Liu
- School of Microelectronics and Communication Engineering, Chongqing University, 174 ShaPingBa District, Chongqing 400044, China.
- Yachuan Luo
- School of Microelectronics and Communication Engineering, Chongqing University, 174 ShaPingBa District, Chongqing 400044, China.
- Ting He
- School of Microelectronics and Communication Engineering, Chongqing University, 174 ShaPingBa District, Chongqing 400044, China.
- Meixiang Ren
- School of Microelectronics and Communication Engineering, Chongqing University, 174 ShaPingBa District, Chongqing 400044, China.
- Yuqiao Xu
- School of Microelectronics and Communication Engineering, Chongqing University, 174 ShaPingBa District, Chongqing 400044, China.

3
Liu X, Wang B, Xu L. Statistical Analysis of Hurst Exponents of Essential/Nonessential Genes in 33 Bacterial Genomes. PLoS One 2015; 10:e0129716. [PMID: 26067107 PMCID: PMC4466317 DOI: 10.1371/journal.pone.0129716] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 01/29/2015] [Accepted: 05/12/2015] [Indexed: 12/04/2022] [Open Access]
Abstract
Methods for identifying essential genes currently depend predominantly on biochemical experiments. However, there is demand for improved computational methods for determining gene essentiality. In this study, we used the Hurst exponent, a characteristic parameter to describe long-range correlation in DNA, and analyzed its distribution in 33 bacterial genomes. In most genomes (31 out of 33) the significance levels of the Hurst exponents of the essential genes were significantly higher than for the corresponding full-gene-set, whereas the significance levels of the Hurst exponents of the nonessential genes remained unchanged or increased only slightly. All of the Hurst exponents of essential genes followed a normal distribution, with one exception. We therefore propose that the distribution feature of Hurst exponents of essential genes can be used as a classification index for essential gene prediction in bacteria. For computer-aided design in the field of synthetic biology, this feature can build a restraint for pre- or post-design checking of bacterial essential genes. Moreover, considering the relationship between gene essentiality and evolution, the Hurst exponents could be used as a descriptive parameter related to evolutionary level, or be added to the annotation of each gene.
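The Hurst exponent mentioned above can be estimated in several ways. The following is a minimal sketch using classical rescaled-range (R/S) analysis, under two assumptions that the abstract does not confirm: the DNA sequence is encoded as a purine/pyrimidine ±1 series, and the window sizes are chosen arbitrarily. The function names are hypothetical.

```python
import math


def dna_to_steps(seq):
    """Map purines (A, G) to +1 and pyrimidines (C, T) to -1 (assumed encoding)."""
    return [1.0 if base in "AG" else -1.0 for base in seq.upper() if base in "ACGT"]


def rescaled_range(chunk):
    """Return R/S for one window, or None when the window has zero variance."""
    n = len(chunk)
    mean = sum(chunk) / n
    cumulative = lowest = highest = 0.0
    for value in chunk:
        cumulative += value - mean
        lowest = min(lowest, cumulative)
        highest = max(highest, cumulative)
    std = math.sqrt(sum((v - mean) ** 2 for v in chunk) / n)
    return (highest - lowest) / std if std > 0 else None


def hurst_rs(series, window_sizes=(16, 32, 64, 128, 256)):
    """Estimate H as the slope of log(mean R/S) against log(window size)."""
    log_n, log_rs = [], []
    for n in window_sizes:
        ratios = [rescaled_range(series[s:s + n])
                  for s in range(0, len(series) - n + 1, n)]
        ratios = [r for r in ratios if r is not None]
        if ratios:
            log_n.append(math.log(n))
            log_rs.append(math.log(sum(ratios) / len(ratios)))
    if len(log_n) < 2:
        raise ValueError("need at least two usable window sizes")
    mean_x = sum(log_n) / len(log_n)
    mean_y = sum(log_rs) / len(log_rs)
    # Ordinary least-squares slope of the log-log relationship.
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_n, log_rs))
    den = sum((x - mean_x) ** 2 for x in log_n)
    return num / den
```

Values near 0.5 suggest no long-range correlation, while larger values suggest persistence. R/S estimates are known to be biased for short windows, so comparing gene sets computed the same way is more informative than reading a single value in isolation.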
Affiliation(s)
- Xiao Liu
- College of Communication Engineering, Chongqing University, 174 ShaPingBa District, Chongqing, 400044, China.
- Baojin Wang
- College of Communication Engineering, Chongqing University, 174 ShaPingBa District, Chongqing, 400044, China.
- Luo Xu
- College of Communication Engineering, Chongqing University, 174 ShaPingBa District, Chongqing, 400044, China.

4
Carbone A. Information measure for long-range correlated sequences: the case of the 24 human chromosomes. Sci Rep 2014; 3:2721. [PMID: 24056670 PMCID: PMC3779848 DOI: 10.1038/srep02721] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 03/25/2013] [Accepted: 09/04/2013] [Indexed: 01/14/2023] [Open Access]
Abstract
A new approach to estimating the Shannon entropy of a long-range correlated sequence is proposed. The entropy is written as the sum of two terms corresponding respectively to power-law (ordered) and exponentially (disordered) distributed blocks (clusters). The approach is illustrated on the 24 human chromosome sequences by taking the nucleotide composition as the relevant information to be encoded/decoded. Interestingly, the nucleotide composition of the ordered clusters is found, on average, to be comparable to that of the whole analyzed sequence, while that of the disordered clusters fluctuates. From the information theory standpoint, this means that the power-law correlated clusters carry the same information as the whole analyzed sequence. Furthermore, the fluctuations of the nucleotide composition of the disordered clusters are linked to relevant biological properties, such as segmental duplications and gene density.
Affiliation(s)
- A Carbone
- [1] Politecnico di Torino, Italy [2] ISC-CNR, Unità Università 'La Sapienza' di Roma, Italy [3] ETH Zurich, Switzerland

5
Liu X, Wang SY, Wang J. A statistical feature of Hurst exponents of essential genes in bacterial genomes. Integr Biol (Camb) 2011; 4:93-8. [PMID: 22108754 DOI: 10.1039/c1ib00030f] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 12/28/2022]
Abstract
At present, methods for determining essential genes depend on biochemical experiments. There is therefore a demand for the development of analysis methods and software for identifying essential genes, based on the common features of these genes. In this study, we employed the Hurst exponent as a characteristic parameter and analyzed its distribution among nine bacterial species. We found that most of the significance levels of the Hurst exponents of essential genes were higher than those of the corresponding full-gene-set. Conversely, most of the significance levels of the Hurst exponents of nonessential genes remained unchanged or only increased slightly. Therefore, we propose that this feature represents a restraint for pre- or post-design checking of bacterial essential genes in computer-aided design.
Affiliation(s)
- Xiao Liu
- College of Communication Engineering, Chongqing University, 174 ShaPingBa District, Chongqing, 400044, China.

6
Michieli I, Medved B, Ristov S. Data series embedding and scale invariant statistics. Hum Mov Sci 2010; 29:449-63. [PMID: 20435364 DOI: 10.1016/j.humov.2009.08.004] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 06/23/2008] [Revised: 06/12/2009] [Accepted: 08/26/2009] [Indexed: 11/18/2022]
Abstract
Data sequences acquired from bio-systems, such as human gait data, heart rate interbeat data, or DNA sequences, exhibit complex dynamics that are frequently described by a long-memory or power-law decay of the autocorrelation function. One way of characterizing these dynamics is through scale invariant statistics or "fractal-like" behavior. Several methods have been proposed for quantifying scale invariant parameters of physiological signals. The most common among them are detrended fluctuation analysis, sample mean variance analyses, power spectral density analysis, R/S analysis, and, more recently in the realm of the multifractal approach, wavelet analysis. In this paper it is demonstrated that embedding the time series data in a high-dimensional pseudo-phase space reveals scale invariant statistics in a simple fashion. The procedure is applied to different stride interval data sets from human gait measurement time series (Physio-Bank data library). Results show that the introduced mapping adequately separates long-memory from random behavior. Smaller gait data sets were analyzed and scale-free trends for limited scale intervals were successfully detected. The method was verified on artificially produced time series with known scaling behavior and with varying noise content. The possibility that the method falsely detects long-range dependence in artificially generated short-range dependence series was investigated.
Affiliation(s)
- I Michieli
- Electronic Department, Ruđer Bošković Institute, Zagreb 10000, Croatia.

7
Reconsidering the significance of genomic word frequencies. Trends Genet 2007; 23:543-6. [PMID: 17964682 DOI: 10.1016/j.tig.2007.07.008] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.5] [Received: 06/26/2007] [Revised: 06/26/2007] [Accepted: 07/09/2007] [Indexed: 11/22/2022]
Abstract
By conventional wisdom, a feature that occurs too often or too rarely in a genome can indicate a functional element. To infer functionality from frequency, it is crucial to precisely characterize occurrences in randomly evolving DNA. We find that the frequency of oligonucleotides in a genomic sequence follows primarily a Pareto-lognormal distribution, which encapsulates lognormal and power-law features found across all known genomes. Such a distribution could be the result of completely random evolution by a copying process. Our characterization of the entire frequency distribution of genomic words opens a way to a more accurate reasoning about their over- and underrepresentation in genomic sequences.
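To make the notion of a "frequency distribution of genomic words" concrete, here is a small sketch that counts overlapping oligonucleotides of length k and then tabulates how many distinct words occur a given number of times. It only produces the empirical distribution; fitting a Pareto-lognormal model to it is not attempted here. The function names are hypothetical.

```python
from collections import Counter


def word_counts(seq, k):
    """Count overlapping words (oligonucleotides) of length k in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def frequency_spectrum(counts):
    """Map an occurrence count to the number of distinct words having that count."""
    return dict(sorted(Counter(counts.values()).items()))
```

For example, `frequency_spectrum(word_counts(genome, 8))` would give the histogram whose shape the abstract characterizes, where `genome` is a sequence string supplied by the reader.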

8
Podtelezhnikov AA, Ghahramani Z, Wild DL. Learning about protein hydrogen bonding by minimizing contrastive divergence. Proteins 2007; 66:588-99. [PMID: 17109405 DOI: 10.1002/prot.21247] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 11/11/2022]
Abstract
Defining the strength and geometry of hydrogen bonds in protein structures has been a challenging task since the early days of structural biology. In this article, we apply a novel statistical machine learning technique, known as contrastive divergence, to efficiently estimate both the hydrogen bond strength and the geometric characteristics of strong interpeptide backbone hydrogen bonds, from a dataset of structures representing a variety of different protein folds. Despite the simplifying assumptions of the interatomic energy terms used, we determine the strength of these hydrogen bonds to be between 1.1 and 1.5 kcal/mol, in good agreement with earlier experimental estimates. The geometry of these strong backbone hydrogen bonds features an almost linear arrangement of all four atoms involved in hydrogen bond formation. We estimate that about a quarter of all hydrogen bond donors and acceptors participate in these strong interpeptide hydrogen bonds.

9
Li W, Holste D. Universal 1/f noise, crossovers of scaling exponents, and chromosome-specific patterns of guanine-cytosine content in DNA sequences of the human genome. Phys Rev E Stat Nonlin Soft Matter Phys 2005; 71:041910. [PMID: 15903704 DOI: 10.1103/physreve.71.041910] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.1] [Received: 01/28/2004] [Revised: 10/28/2004] [Indexed: 05/02/2023]
Abstract
Spatial fluctuations of guanine and cytosine base content (GC%) are studied by spectral analysis for the complete set of human genomic DNA sequences. We find that (i) $1/f^{\alpha}$ decay is universally observed in the power spectra of all 24 chromosomes, and (ii) the exponent $\alpha \approx 1$ extends to about $10^{7}$ bases, one order of magnitude longer than has previously been observed. We further find that (iii) almost all human chromosomes exhibit a crossover from $\alpha_1 \approx 1$ ($1/f^{\alpha_1}$) at lower frequency to $\alpha_2 < 1$ ($1/f^{\alpha_2}$) at higher frequency, typically occurring at around 30,000-100,000 bases, while (iv) the crossover in this frequency range is virtually absent in human chromosome 22. In addition to the universal $1/f^{\alpha}$ noise in power spectra, we find (v) several lines of evidence for chromosome-specific correlation structures, including a 500,000 base long oscillation in human chromosome 21. The universal $1/f^{\alpha}$ spectrum in the human genome is further substantiated by a resistance to reduction in variance of guanine and cytosine content when the window size is increased.
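The spectral analysis described above can be sketched roughly as follows: compute GC% in non-overlapping windows, take the power spectrum of that series, and fit the slope of log power against log frequency to obtain the scaling exponent. This is a minimal sketch using a naive discrete Fourier transform and an unweighted least-squares fit over all frequencies, which are assumptions for illustration and not the authors' procedure. The function names are hypothetical.

```python
import cmath
import math


def gc_windows(seq, window):
    """GC fraction in consecutive non-overlapping windows of the given size."""
    seq = seq.upper()
    return [
        sum(base in "GC" for base in seq[s:s + window]) / window
        for s in range(0, len(seq) - window + 1, window)
    ]


def power_spectrum(values):
    """Return (frequency, power) pairs for k = 1 .. n//2 using a naive DFT."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    spectrum = []
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        spectrum.append((k / n, abs(coeff) ** 2 / n))
    return spectrum


def spectral_exponent(spectrum):
    """Estimate alpha in power ~ 1/f**alpha by a log-log least-squares fit."""
    points = [(math.log(f), math.log(p)) for f, p in spectrum if p > 0]
    if len(points) < 2:
        raise ValueError("need at least two positive spectral values")
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    num = sum((x - mean_x) * (y - mean_y) for x, y in points)
    den = sum((x - mean_x) ** 2 for x, _ in points)
    return -num / den
```

The naive DFT costs O(n²) and is only practical for a few thousand windows; an FFT routine would be the usual choice for chromosome-scale series. Detecting a crossover between two exponents would require fitting separate frequency ranges, which this sketch does not do.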
Affiliation(s)
- Wentian Li
- The Robert S. Boas Center for Genomics and Human Genetics, North Shore LIJ Institute for Medical Research, 350 Community Drive, Manhasset, New York 10030, USA.

10
Li W, Holste D. An unusual 500,000 bases long oscillation of guanine and cytosine content in human chromosome 21. Comput Biol Chem 2004; 28:393-9. [PMID: 15556480 DOI: 10.1016/j.compbiolchem.2004.09.011] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Received: 09/17/2004] [Revised: 09/30/2004] [Accepted: 09/30/2004] [Indexed: 01/09/2023]
Abstract
An oscillation with a period of around 500 kb in guanine and cytosine content (GC%) is observed in the DNA sequence of human chromosome 21. This oscillation is localized in the rightmost one-eighth region of the chromosome, from 43.5 Mb to 46.5 Mb. Five cycles of oscillation are observed in this region with six GC-rich peaks and five GC-poor valleys. The GC-poor valleys comprise regions with low density of CpG islands and, alternating between the two DNA strands, low gene density regions. Consequently, the long-range oscillation of GC% results in spacing patterns of both CpG island density and, to a lesser extent, gene density.
Affiliation(s)
- Wentian Li
- The Robert S. Boas Center for Genomics and Human Genetics, North Shore LIJ Institute for Medical Research, 350 Community Drive, Manhasset, NY 11030, USA.

11
Jabbari K, Bernardi G. Comparative genomics of Anopheles gambiae and Drosophila melanogaster. Gene 2004; 333:183-6. [PMID: 15177694 DOI: 10.1016/j.gene.2004.02.038] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Received: 12/17/2003] [Accepted: 02/10/2004] [Indexed: 10/26/2022]
Abstract
A sequence analysis of the genomes of Anopheles gambiae and Drosophila melanogaster reveals that Anopheles DNA is more heterogeneous and GC-richer than Drosophila DNA. The gene concentration across the Anopheles genome is characterized by low levels in the GC-poor part of the genome and a 3-fold increase in the GC-richest part; this gene density gradient is approximately half that of Drosophila. GC levels of introns and flanking sequences are correlated with GC3 values (GC levels of third codon positions) of the corresponding genes with slopes much lower than unity; in other words, most introns and intergenic sequences are less GC-rich than the corresponding GC3 values. These findings, which describe a compositional shift within Diptera, are of interest because of their parallels with the well-studied major shift in vertebrates.
Affiliation(s)
- Kamel Jabbari
- Laboratoire de Génétique Moléculaire, Institut Jacques Monod, 2 Place Jussieu, F-75005 Paris, France.

12
Bernaola-Galván P, Oliver JL, Carpena P, Clay O, Bernardi G. Quantifying intrachromosomal GC heterogeneity in prokaryotic genomes. Gene 2004; 333:121-33. [PMID: 15177687 DOI: 10.1016/j.gene.2004.02.042] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.3] [Received: 08/29/2003] [Revised: 11/14/2003] [Accepted: 02/10/2004] [Indexed: 11/15/2022]
Abstract
The sequencing of prokaryotic genomes covering a wide taxonomic range has sparked renewed interest in intrachromosomal compositional (GC) heterogeneity, largely in view of lateral transfers. We present here a brief overview of some methods for visualizing and quantifying GC variation in prokaryotes. We used these methods to examine heterogeneity levels in sequenced prokaryotes, for a range of scales or stringencies. Some species are consistently homogeneous, whereas others are markedly heterogeneous in comparison, in particular Aeropyrum pernix, Xylella fastidiosa, Mycoplasma genitalium, Enterococcus faecalis, Bacillus subtilis, Pyrobaculum aerophilum, Vibrio vulnificus chromosome I, Deinococcus radiodurans chromosome II and Halobacterium. As we discuss here, the wide range of heterogeneities calls for reexamination of an accepted belief, namely that the endogenous DNA of bacteria and archaea should typically exhibit low intrachromosomal GC contrasts. Supplementary results for all species analyzed are available at our website: http://bioinfo2.ugr.es/prok.

13
Schuck P. A model for sedimentation in inhomogeneous media. I. Dynamic density gradients from sedimenting co-solutes. Biophys Chem 2004; 108:187-200. [PMID: 15043929 DOI: 10.1016/j.bpc.2003.10.016] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.4] [Indexed: 10/26/2022]
Abstract
Macromolecular sedimentation in inhomogeneous media is of great practical importance. Dynamic density gradients have a long tradition in analytical ultracentrifugation, and are frequently used in preparative ultracentrifugation. In this paper, a new theoretical model for sedimentation in inhomogeneous media is presented, based on finite element solutions of the Lamm equation with spatial and temporal variation of the local solvent density and viscosity. It is applied to macromolecular sedimentation in the presence of a dynamic density gradient formed by the sedimentation of a co-solute at high concentration. It is implemented in the software SEDFIT for the analysis of experimental macromolecular concentration distributions. The model agrees well with the measured sedimentation profiles of a protein in a dynamic cesium chloride gradient, and may provide a measure for the effects of hydration or preferential solvation parameters. General features of protein sedimentation in dynamic density gradients are described.
Affiliation(s)
- Peter Schuck
- Division of Bioengineering and Physical Science, ORS, OD, National Institutes of Health, Building 13, Room 3N17, 13 South Drive, Bethesda, MD 20892-5766, USA.

14
Abstract
Three statistical/mathematical analyses are carried out on isochore sequences: spectral analysis, analysis of variance, and segmentation analysis. Spectral analysis shows that there are GC content fluctuations at different length scales in isochore sequences. The analysis of variance shows that the null hypothesis (the mean value of a group of GC contents remains the same along the sequence) may or may not be rejected for an isochore sequence, depending on the subwindow sizes at which GC contents are sampled, and the window size within which group members are defined. The segmentation analysis shows that there are stronger indications of GC content changes at isochore borders than within an isochore. These analyses support the notion of isochore sequences, but reject the assumption that isochore sequences are homogeneous at the base level. An isochore sequence may pass a homogeneity test when GC content fluctuations at smaller length scales are ignored or averaged out.
Affiliation(s)
- Wentian Li
- Center for Genomics and Human Genetics, North Shore - LIJ Research Institute, 350 Community Drive, Manhasset, NY 10030, USA.

15
Abstract
Here we present a study of statistical correlations among different positions in DNA sequences and their implications by directly using the autocorrelation function. Such an analysis is possible now because of the availability of large sequences or even complete genomes of many organisms. After describing the way in which the autocorrelation function can be applied to DNA-sequence analysis, we show that long-range correlations, implying scale independence, appear in several bacterial genomes as well as in long human chromosome contigs. The source of such correlations in bacteria, which may extend up to 60 kb in Bacillus subtilis, may be related to massive lateral transfer of compositionally biased genes from other genomes. In the human genome, correlations extend for more than five decades and may be related to the evolution of the 'neogenome', a modern evolutionary acquisition composed of GC-rich isochores displaying long-range correlations and scale invariance.
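One simple way to apply an autocorrelation function to a DNA sequence is sketched below. It assumes a binary GC indicator encoding (1 for G or C, 0 otherwise) and a normalized autocorrelation at a single lag; the abstract does not specify the encoding or normalization actually used, and the function names are hypothetical.

```python
def gc_indicator(seq):
    """Encode a DNA string as 1.0 for G/C and 0.0 for anything else (assumed mapping)."""
    return [1.0 if base in "GC" else 0.0 for base in seq.upper()]


def autocorrelation(values, lag):
    """Normalized autocorrelation of values at the given lag (1.0 at lag 0)."""
    n = len(values)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(values)")
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    if variance == 0:
        raise ValueError("series has zero variance")
    # Average product of deviations over all pairs separated by `lag`.
    covariance = sum(
        (values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag)
    ) / (n - lag)
    return covariance / variance
```

Long-range correlation would show up as values that decay slowly (roughly as a power law) as the lag grows, in contrast to the rapid drop to zero expected for an uncorrelated sequence.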
Affiliation(s)
- P Bernaola-Galván
- Departamento de Física Aplicada II, E.T.S.I. de Telecomunicación, Universidad de Málaga, Málaga, Spain.

16

17
Clay O, Carels N, Douady C, Macaya G, Bernardi G. Compositional heterogeneity within and among isochores in mammalian genomes. I. CsCl and sequence analyses. Gene 2001; 276:15-24. [PMID: 11591467 DOI: 10.1016/s0378-1119(01)00667-9] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.0] [Indexed: 11/30/2022]
Abstract
GC level distributions of a species' nuclear genome, or of its compositional fractions, encode key information on structural and functional properties of the genome and on its evolution. They can be calculated either from absorbance profiles of the DNA in CsCl density gradients at sedimentation equilibrium, or by scanning long contigs of largely sequenced genomes. In the present study, we address the quantitative characterization of the compositional heterogeneity of genomes, as measured by the GC distributions of fixed-length fragments. Special attention is given to mammalian genomes, since their compartmentalization into isochores implies two levels of heterogeneity, intra-isochore (local) and inter-isochore (global). This partitioning is a natural one, since large-scale compositional properties vary much more among isochores than within them. Intra-isochore GC distributions become roughly Gaussian for long fragments, and their standard deviations decrease only slowly with increasing fragment length, unlike random sequences. This effect can be explained by 'long-range' correlations, often overlooked, that are present along isochores.
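The contrast with random sequences drawn above can be checked numerically. The sketch below cuts a sequence into fixed-length fragments, measures the standard deviation of their GC levels, and compares it with the binomial expectation sqrt(p(1-p)/L) that would hold for an uncorrelated sequence with overall GC fraction p. Non-overlapping fragments and the population standard deviation are assumptions made here for simplicity; the function names are hypothetical.

```python
import math
import statistics


def fragment_gc_levels(seq, length):
    """GC fraction of each consecutive non-overlapping fragment of the given length."""
    seq = seq.upper()
    return [
        sum(base in "GC" for base in seq[s:s + length]) / length
        for s in range(0, len(seq) - length + 1, length)
    ]


def gc_std_by_length(seq, lengths):
    """For each fragment length, return (observed std, binomial std for a random sequence)."""
    seq = seq.upper()
    p = sum(base in "GC" for base in seq) / len(seq)
    result = {}
    for length in lengths:
        levels = fragment_gc_levels(seq, length)
        observed = statistics.pstdev(levels)
        expected = math.sqrt(p * (1 - p) / length)
        result[length] = (observed, expected)
    return result
```

For a sequence with long-range correlations, the observed value would be expected to stay well above the binomial value as the fragment length grows, which is the slow decrease the abstract describes.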
Affiliation(s)
- O Clay
- Laboratory of Molecular Evolution, Stazione Zoologica Anton Dohrn, Villa Comunale, 80121, Naples, Italy.

18
Clay O, Bernardi G. Compositional heterogeneity within and among isochores in mammalian genomes. II. Some general comments. Gene 2001; 276:25-31. [PMID: 11591468 DOI: 10.1016/s0378-1119(01)00668-0] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.7] [Indexed: 11/30/2022]
Abstract
The presence of long-range correlations and/or mosaicism in DNA sequences results in GC fluctuations, even within individual isochores, that are much larger than expected for correlation-free 'random' sequences. Neglecting the presence of such fluctuations can lead to incorrect conclusions regarding relative homogeneity or isochore borders. In this commentary, we address these and other methodological issues raised by the variations in GC level within human isochores. We also discuss some recent misconceptions.
Affiliation(s)
- O Clay
- Laboratory of Molecular Evolution, Stazione Zoologica Anton Dohrn, Villa Comunale, 80121, Naples, Italy.

19
Abstract
A few months ago the International Human Genome Sequencing Consortium (IHGSC) published a 61-page paper on the human genome (IHGSC, Nature 409 (2001) 860). Here comments will be presented on some points of the paper that were previously investigated in our laboratory, and some misunderstandings and misconceptions about the organization and the evolutionary history of the human genome will be discussed. A very recent article on the same subject (Eyre-Walker and Hurst, Nat. Rev. Genet. 2 (2001) 549) will also be addressed. The present paper is a complement to two review articles which were published last year (Bernardi, Gene 241 (2000) 3; Gene 259(1) (2000) 31).
Affiliation(s)
- G Bernardi
- Laboratory of Molecular Evolution, Stazione Zoologica Anton Dohrn, Villa Comunale, 80121, Naples, Italy.