1
Oteo JA, Oteo-García G. Mutations along human chromosomes: How randomly scattered are they? Phys Rev E 2022; 106:064404. [PMID: 36671182 DOI: 10.1103/physreve.106.064404] [Received: 06/16/2022] [Accepted: 11/06/2022]
Abstract
The diversity of mutations in human chromosomes is nowadays very well documented. Mutations characterize populations around the world as well as the genetic causes of diseases. In our approach, we study the patterns of gaps between mutations by means of rescaled range analysis and fractal dimension estimates. The results for chromosomes 1 to 22 and X indicate the existence of the so-called Hurst phenomenon in all of them. This outcome entails the presence of long-range correlations, and we propose an explanation based on the genomic feature dubbed linkage disequilibrium, a nonrandom association of alleles at different loci. An unexpected outcome is the noteworthy, uniform reduction in the Hurst phenomenon when the centimorgan metric is used instead of base position units. By contrast, no such uniform reduction is observed in the fractal dimension values.
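Rescaled range (R/S) analysis tracks how R/S grows with the window length n, roughly as n^H. Here R is the range of the cumulative mean-adjusted sums of a series and S is its standard deviation. Memoryless data give H near 0.5, with a known upward bias for short windows. An H clearly above 0.5 is the Hurst phenomenon of persistent, long-range correlations. The following is a minimal textbook sketch in Python, not the authors' pipeline. The function names `rescaled_range` and `hurst_rs`, the window sizes and the synthetic exponential gap series are illustrative assumptions, and no real mutation data are used.

```python
import numpy as np


def rescaled_range(window):
    """R/S of one window: range of cumulative mean-adjusted sums over the standard deviation."""
    deviations = np.cumsum(window - window.mean())
    r = deviations.max() - deviations.min()
    s = window.std()
    return r / s if s > 0 else np.nan


def hurst_rs(series, window_sizes):
    """Classic R/S estimate of H: slope of log(mean R/S) against log(window size)."""
    series = np.asarray(series, dtype=float)
    log_n, log_rs = [], []
    for n in window_sizes:
        n_windows = len(series) // n
        if n_windows == 0:
            continue  # window longer than the series
        windows = series[: n_windows * n].reshape(n_windows, n)
        rs = np.array([rescaled_range(w) for w in windows])
        rs = rs[np.isfinite(rs)]  # drop constant windows (S = 0)
        if rs.size:
            log_n.append(np.log(n))
            log_rs.append(np.log(rs.mean()))
    slope, _ = np.polyfit(log_n, log_rs, 1)
    return slope


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical input: i.i.d. exponential gaps, i.e. mutations placed with no memory.
    gaps = rng.exponential(scale=1000.0, size=2**14)
    sizes = [2**k for k in range(4, 12)]
    print(f"R/S estimate of H for memoryless gaps: {hurst_rs(gaps, sizes):.2f}")
```

Comparing the estimate on a real gap series with the estimate on a shuffled copy of the same gaps is one simple way to separate genuine correlations from the small-window bias. Shuffling destroys the ordering while keeping the gap distribution.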
Affiliation(s)
- José-Angel Oteo
- Departament de Física Teòrica, Universitat de València, 46100 Burjassot, Valencia, Spain and Institute for Integrative Systems Biology, 46980 Paterna, Valencia, Spain
- Gonzalo Oteo-García
- Department of Chemistry, Life Sciences and Environmental Sustainability, Università di Parma, 43121 Parma, Italy
2
Li W, Almirantis Y, Provata A. Revisiting the neutral dynamics derived limiting guanine-cytosine content using human de novo point mutation data. Meta Gene 2022. [DOI: 10.1016/j.mgene.2021.100994]
3
Salgado-García R. Time-irreversibility test for random-length time series: The matching-time approach applied to DNA. Chaos 2021; 31:123126. [PMID: 34972331 DOI: 10.1063/5.0062805] [Received: 07/08/2021] [Accepted: 12/01/2021]
Abstract
In this work, we implement the so-called matching-time estimators for estimating the entropy rate as well as the entropy production rate for symbolic sequences. These estimators are based on recurrence properties of the system, which have been shown to be appropriate for testing irreversibility, especially when the sequences have large correlations or memory. Based on limit theorems for matching times, we derive a maximum likelihood estimator for the entropy rate by assuming that we have a set of moderately short symbolic time series of finite random duration. We show that the proposed estimator has several properties that make it adequate for estimating the entropy rate and entropy production rate (or for testing the irreversibility) when the sample sequences have different lengths, such as the coding sequences of DNA. We test our approach with controlled examples of Markov chains, non-linear chaotic maps, and linear and non-linear autoregressive processes. We also implement our estimators for genomic sequences to show that the degree of irreversibility of coding sequences in human DNA is significantly larger than that for the corresponding non-coding sequences.
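The abstract mentions controlled Markov-chain examples, and for those both target quantities have closed forms. Take a transition matrix P with stationary distribution π. The entropy rate is h = -Σ_i π_i Σ_j P_ij ln P_ij. The entropy production rate is e_p = Σ_ij π_i P_ij ln(π_i P_ij / π_j P_ji), which vanishes exactly when detailed balance holds. The Python sketch below computes these exact reference values, which any finite-sample estimator could be checked against. It is not the matching-time estimator itself, and the function names are hypothetical.

```python
import numpy as np


def stationary_distribution(P):
    """Left eigenvector of the transition matrix P for eigenvalue 1, normalised to sum 1."""
    eigvals, eigvecs = np.linalg.eig(P.T)
    k = np.argmin(np.abs(eigvals - 1.0))
    pi = np.real(eigvecs[:, k])
    return pi / pi.sum()


def entropy_rate(P):
    """h = -sum_i pi_i sum_j P_ij ln P_ij (nats per symbol)."""
    pi = stationary_distribution(P)
    with np.errstate(divide="ignore"):
        log_p = np.where(P > 0, np.log(P), 0.0)  # 0 ln 0 is taken as 0
    return float(-np.sum(pi[:, None] * P * log_p))


def entropy_production_rate(P):
    """e_p = sum_ij pi_i P_ij ln(pi_i P_ij / (pi_j P_ji)); zero iff detailed balance holds."""
    pi = stationary_distribution(P)
    flow = pi[:, None] * P  # stationary probability flow i -> j
    back = flow.T           # flow of the reversed transition j -> i
    if np.any((flow > 0) != (back > 0)):
        return float("inf")  # a transition without its reverse: production diverges
    mask = flow > 0
    return float(np.sum(flow[mask] * np.log(flow[mask] / back[mask])))


if __name__ == "__main__":
    p = 0.8
    # Biased three-state cycle: the direction 0 -> 1 -> 2 -> 0 is taken with probability p.
    cycle = np.array([[0, p, 1 - p], [1 - p, 0, p], [p, 1 - p, 0]])
    print("entropy rate:", entropy_rate(cycle))
    print("entropy production rate:", entropy_production_rate(cycle))
```

For this biased cycle the production rate reduces to (2p - 1) ln(p / (1 - p)). A symmetric transition matrix gives zero.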
Affiliation(s)
- R Salgado-García
- Centro de Investigación en Ciencias-IICBA, Physics Department, Universidad Autónoma del Estado de Morelos, Avenida Universidad 1001, colonia Chamilpa, CP 62209, Cuernavaca Morelos, Mexico
4
Spatial constrains and information content of sub-genomic regions of the human genome. iScience 2021; 24:102048. [PMID: 33554061 PMCID: PMC7843455 DOI: 10.1016/j.isci.2021.102048] [Received: 10/12/2020] [Revised: 11/30/2020] [Accepted: 01/06/2021]
Abstract
Complexity metrics and machine learning (ML) models have been used to analyze the lengths of segmental genomic entities of DNA sequences (exonic, intronic, intergenic, repeat, unique). The aim is to ask questions about the segmental organization of the human genome within the size distribution of these sequences. For this, we developed an integrated methodology based on the reconstructed phase space theorem, the non-extensive statistical theory of Tsallis, ML techniques, and a technical index that integrates the generated information. We introduce this index and name it the complexity factor (COFA). Our analysis revealed that the size distribution of the genomic regions within chromosomes is not random. Instead, it follows patterns with characteristic features that emerge through its complexity character and are part of the dynamics of the whole genome. Finally, this picture of dynamics in DNA is recognized with high accuracy using ML tools for clustering, classification, and prediction.
Highlights
- The lengths of DNA subgenomic entities satisfy the Tsallis non-extensive statistics
- The size distribution of the subgenomic entities within chromosomes follows specific patterns
- A technical index, COFA, was introduced to characterize the degree of complexity
- The degree of complexity behavior in DNA is identifiable using ML approaches
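The abstract does not spell out which Tsallis quantities feed into COFA, so the following is background only. The Tsallis entropy of a distribution is S_q = (1 - Σ_i p_i^q)/(q - 1), and it recovers the Shannon entropy as q → 1. For independent subsystems A and B it is non-additive: S_q(A+B) = S_q(A) + S_q(B) + (1 - q) S_q(A) S_q(B). This is the sense in which the statistics are called non-extensive. Below is a minimal Python sketch of the definition. The function name is hypothetical, and this is not the authors' COFA computation.

```python
import math


def tsallis_entropy(probs, q):
    """S_q = (1 - sum_i p_i**q) / (q - 1), with the Boltzmann constant set to 1.

    For q == 1 the limit, the Shannon entropy in nats, is returned.
    """
    p = [x for x in probs if x > 0]
    if abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    if q == 1.0:
        return -sum(x * math.log(x) for x in p)
    return (1.0 - sum(x ** q for x in p)) / (q - 1.0)


if __name__ == "__main__":
    a = [0.5, 0.5]
    b = [0.25] * 4
    joint = [x * y for x in a for y in b]  # two independent subsystems
    q = 2.0
    s_a, s_b, s_ab = (tsallis_entropy(d, q) for d in (a, b, joint))
    # Non-additivity: S_q(A+B) = S_q(A) + S_q(B) + (1 - q) S_q(A) S_q(B)
    print(s_ab, s_a + s_b + (1 - q) * s_a * s_b)
```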
5
Salgado-García R, Maldonado C. Estimating entropy rate from censored symbolic time series: A test for time-irreversibility. Chaos 2021; 31:013131. [PMID: 33754777 DOI: 10.1063/5.0032515] [Received: 10/10/2020] [Accepted: 12/28/2020]
Abstract
In this work, we introduce a method for estimating the entropy rate and the entropy production rate from a finite symbolic time series. From a statistical point of view, estimating entropy from a finite series can be interpreted as a problem of estimating parameters of a distribution with a censored or truncated sample. We use this point of view to estimate the entropy rate and the entropy production rate, assuming that they are parameters of a (limit) distribution. This assumption is a consequence of the fact that the distribution of estimates obtained from recurrence-time statistics satisfies the central limit theorem. We test our method using time series generated by Markov chain models, discrete-time chaotic maps, and a real DNA sequence from the human genome.
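Recurrence-time estimates rest on the Ornstein-Weiss theorem. For an ergodic source, the time R_n until the first n symbols reappear satisfies (1/n) ln R_n → h, the entropy rate. In a finite series, some blocks never reappear before the data end, so their return times are censored. That is the situation the abstract casts as parameter estimation from a censored sample. The Python sketch below shows only the naive version, which discards censored samples and averages ln(R_n)/n. It does not implement the authors' estimator. The function names, the block length and the synthetic coin-toss data are assumptions.

```python
import math
import random


def return_time(seq, start, n):
    """Steps until the block seq[start:start+n] next occurs, or None if censored by the end."""
    block = seq[start:start + n]
    nxt = seq.find(block, start + 1)  # overlapping occurrences are allowed
    return None if nxt == -1 else nxt - start


def recurrence_entropy_estimate(seq, n, n_samples, rng):
    """Mean of ln(R_n)/n over random start positions; censored samples are simply dropped."""
    estimates = []
    for _ in range(n_samples):
        start = rng.randrange(0, len(seq) - n)
        r = return_time(seq, start, n)
        if r is not None:
            estimates.append(math.log(r) / n)
    if not estimates:
        return float("nan"), 0
    return sum(estimates) / len(estimates), len(estimates)


if __name__ == "__main__":
    rng = random.Random(42)
    coin = "".join(rng.choice("01") for _ in range(200_000))  # fair coin: h = ln 2
    estimate, used = recurrence_entropy_estimate(coin, 12, 2000, rng)
    print(f"estimate {estimate:.3f} nats vs ln 2 = {math.log(2):.3f} ({used} uncensored samples)")
```

For moderate n the average typically sits below ln 2. The estimator converges slowly, and dropping the censored (long) return times biases it further downward.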
Affiliation(s)
- R Salgado-García
- Centro de Investigación en Ciencias-IICBA, Physics Department, Universidad Autónoma del Estado de Morelos, Avenida Universidad 1001 colonia Chamilpa, CP 62209, Cuernavaca Morelos, Mexico
- Cesar Maldonado
- IPICYT/División de Control y Sistemas Dinámicos, Camino a la Presa San José 2055, Lomas 4a. sección, C.P. 78216, San Luis Potosí, S.L.P., Mexico
6
Salgado-García R. Noise-induced rectification in out-of-equilibrium structures. Phys Rev E 2019; 99:012128. [PMID: 30780318 DOI: 10.1103/physreve.99.012128] [Received: 05/16/2018]
Abstract
We consider the motion of overdamped particles over random potentials. The particles are subjected to Gaussian white noise and a time-dependent periodic external forcing. The random potential is modeled as the potential resulting from the interaction of a point particle with a random polymer. The random polymer is built, by means of some stochastic process, from a finite set of possible monomer types. The process is assumed to reach a nonequilibrium stationary state, which means that every realization of a random polymer can be considered an out-of-equilibrium structure. We show that the net flux of particles over this random medium is nonvanishing when the potential profile on every monomer is symmetric. We prove that this ratchet-like phenomenon is a consequence of the irreversibility of the stochastic process generating the polymer. By contrast, when the process generating the polymer is at equilibrium (thus fulfilling the detailed balance condition), the system is unable to rectify the motion. We calculate the net flux of the particles in the adiabatic limit for a simple model and test our theoretical predictions by means of Langevin dynamics simulations. We also show that, beyond the adiabatic limit, the system exhibits current reversals as well as a nonmonotonic dependence of the diffusion coefficient on the forcing amplitude.
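The Langevin dynamics simulations mentioned in the abstract integrate an overdamped equation of motion of the form dx = [F(x) + A(t)] dt + sqrt(2D) dW. A minimal Euler-Maruyama sketch in Python follows. The potential in the example is a single symmetric cosine, a stand-in assumption rather than the random-polymer potential of the paper. The parameters and function names are illustrative. With a symmetric periodic potential and a symmetric, zero-mean forcing like this one, no net drift is expected, so the printed mean velocity should fluctuate around zero.

```python
import numpy as np


def simulate_overdamped(force, forcing, x0, dt, n_steps, diffusion, rng):
    """Euler-Maruyama for dx = [force(x) + forcing(t)] dt + sqrt(2 D) dW, friction set to 1."""
    x = np.array(x0, dtype=float)
    t = 0.0
    for _ in range(n_steps):
        drift = force(x) + forcing(t)
        noise = np.sqrt(2.0 * diffusion * dt) * rng.standard_normal(x.shape)
        x = x + drift * dt + noise
        t += dt
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    v0, amplitude, omega = 1.0, 2.0, 0.1

    def cosine_force(x):
        # Hypothetical symmetric periodic potential U(x) = -v0 cos(2 pi x); force = -U'(x).
        return -2.0 * np.pi * v0 * np.sin(2.0 * np.pi * x)

    def ac_forcing(t):
        return amplitude * np.cos(omega * t)

    dt, n_steps = 0.01, 2_000
    x_final = simulate_overdamped(cosine_force, ac_forcing, np.zeros(1_000), dt, n_steps, 0.5, rng)
    print("mean velocity estimate:", x_final.mean() / (dt * n_steps))
```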
Affiliation(s)
- R Salgado-García
- Centro de Investigación en Ciencias-IICBA, Universidad Autónoma del Estado de Morelos, Avenida Universidad 1001, Colonia Chamilpa, 62209, Cuernavaca Morelos, Mexico
7
Abstract
The kinetic equations of DNA replication are shown to be exactly solvable in terms of iterated function systems running along the template sequence. These give the statistical properties of the copy sequences as well as the kinetic and thermodynamic properties of the replication process. With this method, different effects due to sequence heterogeneity can be studied. These include a transition between linear and sublinear growth of the copies in time, and a transition between continuous and fractal distributions of the local velocities of the DNA polymerase along the template. The method is applied to the human mitochondrial DNA polymerase γ without and with exonuclease proofreading.
Affiliation(s)
- Pierre Gaspard
- Center for Nonlinear Phenomena and Complex Systems, Université libre de Bruxelles (ULB), Code Postal 231, Campus Plaine, B-1050 Brussels, Belgium
8
Song YS, Shu YG, Zhou X, Ou-Yang ZC, Li M. Proofreading of DNA polymerase: a new kinetic model with higher-order terminal effects. J Phys Condens Matter 2017; 29:025101. [PMID: 27842005 DOI: 10.1088/0953-8984/29/2/025101]
Abstract
The fidelity of DNA replication by DNA polymerase (DNAP) has long been an important issue in biology. DNAP consists of both a polymerase site and an exonuclease (proofreading) site. Numerous experiments have revealed details of its molecular structure and working mechanism, yet there have been relatively few theoretical studies of the fidelity issue. The first model that explicitly considered both sites was proposed in the 1970s, and its basic idea was widely accepted by later models. However, none of these models systematically investigated the dominant factor in DNAP fidelity, i.e., the higher-order terminal effects through which the polymerization pathway and the proofreading pathway coordinate to achieve high fidelity. In this paper, we propose a new and comprehensive kinetic model of DNAP based on some recent experimental observations, which includes previous models as special cases. We present a rigorous and unified treatment of the corresponding steady-state kinetic equations for terminal effects of any order. We also derive analytical expressions for fidelity in terms of kinetic parameters under biologically relevant conditions. These expressions offer new insights into how the higher-order terminal effects contribute substantially to the fidelity in an order-by-order way. They also show that the polymerization-and-proofreading mechanism is dominated by only a very few key parameters. We then apply these results to calculate the fidelity of some real DNAPs, and the results are in good agreement with previous intuitive estimates given by experimentalists.
Affiliation(s)
- Yong-Shun Song
- School of Physical Sciences, University of Chinese Academy of Sciences, No 19A Yuquan Road, Beijing 100049, People's Republic of China
9
Gaspard P. Kinetics and thermodynamics of living copolymerization processes. Philos Trans A Math Phys Eng Sci 2016; 374:rsta.2016.0147. [PMID: 27698043 PMCID: PMC5052731 DOI: 10.1098/rsta.2016.0147] [Accepted: 08/01/2016]
Abstract
Theoretical advances are reported on the kinetics and thermodynamics of free and template-directed living copolymerizations. Until recently, the kinetic theory of these processes had only been established in the fully irreversible regime, in which only the attachment rates are considered. However, the entropy production is infinite in this regime, and the approach to thermodynamic equilibrium cannot be investigated. For this purpose, the detachment rates must also be included. In spite of this complication, the kinetics can be exactly solved in the regimes of steady growth and depolymerization. In this way, analytical expressions are obtained for the mean growth velocity, the statistical properties of the copolymer sequences, and the thermodynamic entropy production. The results apply to DNA replication, transcription, and translation, allowing us to understand important aspects of molecular evolution. This article is part of the themed issue 'Multiscale modelling at the physics-chemistry-biology interface'.
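The simplest case behind these statements is a living polymer built from a single monomer species. There, the chain length performs a biased random walk with attachment rate w+ = k+[M] and detachment rate w- = k-. The mean growth velocity is then v = w+ - w-, and the entropy production rate is σ = v ln(w+/w-), in units of k_B per unit time. σ vanishes at equilibrium (w+ = w-) and diverges as w- → 0, matching the abstract's remark on the fully irreversible regime. The Python sketch below covers only this one-species case, not the copolymer theory of the paper, and its names are illustrative.

```python
import math


def homopolymer_growth(k_plus, concentration, k_minus):
    """Mean growth velocity and entropy production rate (k_B per unit time).

    Attachment occurs at rate w_plus = k_plus * concentration and detachment
    at rate w_minus = k_minus; the chain length is a biased random walk.
    """
    w_plus = k_plus * concentration
    w_minus = k_minus
    velocity = w_plus - w_minus
    if w_plus == 0.0 and w_minus == 0.0:
        return 0.0, 0.0              # no dynamics at all
    if w_plus == 0.0 or w_minus == 0.0:
        return velocity, math.inf    # one direction forbidden: the affinity diverges
    return velocity, velocity * math.log(w_plus / w_minus)


if __name__ == "__main__":
    for k_minus in (2.0, 1.0, 0.1, 0.0):
        v, sigma = homopolymer_growth(k_plus=1.0, concentration=2.0, k_minus=k_minus)
        print(f"k_minus={k_minus}: velocity={v:.2f}, entropy production={sigma:.3f}")
```

In the depolymerization regime (w+ < w-) both factors of σ are negative, so the entropy production stays non-negative, as it must.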
Affiliation(s)
- Pierre Gaspard
- Center for Nonlinear Phenomena and Complex Systems, Université Libre de Bruxelles, Code Postal 231, Campus Plaine, 1050 Brussels, Belgium
10
Nicolis G, Nicolis C. Detailed balance, nonequilibrium states, and dissipation in symbolic sequences. Phys Rev E 2016; 93:052134. [PMID: 27300856 DOI: 10.1103/physreve.93.052134] [Received: 02/03/2016]
Abstract
Symbolic sequences arising from the coarse graining of deterministic dynamical systems continuous in phase space are considered. The extent to which signatures of the time irreversibility and of the nonequilibrium constraints at the level of the original system, such as fluxes or dissipation, can be identified at the coarse-grained level is analyzed. The roles of the partition, of the time window, and of time averaging in distinguishing in a clear-cut way the equilibrium versus nonequilibrium character of the sequence are brought out.
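At the level of pairs of symbols, detailed balance means that every transition i → j is observed as often as its reverse, p_ij = p_ji. The net fluxes J_ij = p_ij - p_ji therefore measure the departure from equilibrium. The Python sketch below symbolizes a trajectory with a given partition and estimates these pair fluxes. The logistic map and the three-cell partition in the example are illustrative assumptions, not the systems studied in the paper, and the function names are hypothetical.

```python
import numpy as np


def coarse_grain(trajectory, thresholds):
    """Symbolize a real-valued trajectory: symbol k marks the k-th cell of the partition."""
    return np.digitize(trajectory, thresholds)


def pair_fluxes(symbols, n_symbols):
    """Empirical pair probabilities p[i, j] and net fluxes J[i, j] = p[i, j] - p[j, i]."""
    symbols = np.asarray(symbols)
    counts = np.zeros((n_symbols, n_symbols))
    np.add.at(counts, (symbols[:-1], symbols[1:]), 1.0)  # count overlapping pairs
    p = counts / counts.sum()
    return p, p - p.T


if __name__ == "__main__":
    # Logistic map x -> 4x(1 - x), coarse-grained with a three-cell partition (an assumption).
    x, orbit = 0.3, []
    for _ in range(100_000):
        x = 4.0 * x * (1.0 - x)
        orbit.append(x)
    symbols = coarse_grain(np.array(orbit), [1 / 3, 2 / 3])
    p, J = pair_fluxes(symbols, 3)
    print("net fluxes J:\n", np.round(J, 4))
```

The partition matters in an elementary way. With only two symbols, the pair fluxes vanish up to a single count for any sequence, because in a binary string the numbers of 01 and 10 transitions differ by at most one. Detecting irreversibility with a two-cell partition therefore requires longer blocks.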
Affiliation(s)
- G Nicolis
- Interdisciplinary Center for Nonlinear Phenomena and Complex Systems, Université Libre de Bruxelles, Campus Plaine, CP 231, Boulevard du Triomphe, 1050 Brussels, Belgium
- C Nicolis
- Institut Royal Météorologique de Belgique, 3 Avenue Circulaire, 1180 Brussels, Belgium
11
Gaspard P. Kinetics and thermodynamics of exonuclease-deficient DNA polymerases. Phys Rev E 2016; 93:042419. [PMID: 27176340 DOI: 10.1103/physreve.93.042419] [Received: 01/13/2016]
Abstract
A kinetic theory is developed for exonuclease-deficient DNA polymerases, based on the experimental observation that the rates depend not only on the newly incorporated nucleotide, but also on the previous one, leading to the growth of Markovian DNA sequences from a Bernoullian template. The dependencies on nucleotide concentrations and template sequence are explicitly taken into account. In this framework, the kinetic and thermodynamic properties of DNA replication, in particular, the mean growth velocity, the error probability, and the entropy production are calculated analytically in terms of the rate constants and the concentrations. Theory is compared with numerical simulations for the DNA polymerases of T7 viruses and human mitochondria.
Affiliation(s)
- Pierre Gaspard
- Center for Nonlinear Phenomena and Complex Systems, Université Libre de Bruxelles, Code Postal 231, Campus Plaine, B-1050 Brussels, Belgium
12
13
Provata A, Nicolis C, Nicolis G. Complexity measures for the evolutionary categorization of organisms. Comput Biol Chem 2014; 53 Pt A:5-14. [PMID: 25216557 DOI: 10.1016/j.compbiolchem.2014.08.004] [Accepted: 07/11/2014]
Abstract
Complexity measures are used to compare the genomic characteristics of five organisms belonging to distinct classes spanning the evolutionary tree: higher eukaryotes, amoebae, unicellular eukaryotes, and bacteria. The comparisons are undertaken using the full four-letter alphabet and the coarse-grained two-letter alphabets AG-CT and AT-CG. We show that the conditional probability matrices for the four-letter and AT-CG alphabets are markedly asymmetric in eukaryotes, while they are nearly symmetric in bacterial genomes. Spatial asymmetry is revealed in the four-letter alphabet, signifying that the probability fluxes are nonvanishing and thus the reading sense of a sequence is irreversible for all organisms. Calculations of the block entropy and excess entropy demonstrate that the human genome accommodates all possible block configurations better, especially for long blocks. With respect to point-to-point details and the spatial arrangement of blocks, the exit distance distributions from a particular letter demonstrate long-distance characteristics in the eukaryotic sequences for all three alphabets. The bacterial (prokaryotic) genomes deviate, indicating short-range characteristics. Overall, the conditional probability, the fluxes, the block entropy content, and the exit distance distributions can be used as markers discriminating between eukaryotic and prokaryotic DNA. In many cases, they also allow details related to finer classes to be discerned. In all cases, the reduction from four letters to two masks some important statistical and spatial properties, with the AT-CG alphabet having a higher discriminating ability than the AG-CT one. In particular, the AT-CG alphabet reduction accentuates the CpG-related properties (conditional probabilities w32, long-ranged exit distance distributions for A and T nucleotides), but masks sequence asymmetry and irreversibility in all examined organisms.
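The four-letter and two-letter analyses described above start from the same few counts. The Python sketch below does three things. It coarse-grains a sequence with the AT-CG and AG-CT reductions, written with the IUPAC codes W/S and R/Y. It estimates the conditional probability matrix w from overlapping pairs. It also computes the block entropy H(n) = -Σ p(block) log2 p(block) over overlapping blocks of length n. This is an illustrative sketch with hypothetical function names and a toy sequence. It does not reproduce the authors' estimators, excess-entropy calculation or exit-distance statistics.

```python
import math
from collections import Counter

# IUPAC two-letter reductions: W (weak: A, T) vs S (strong: C, G),
# and R (purines: A, G) vs Y (pyrimidines: C, T).
AT_CG = {"A": "W", "T": "W", "C": "S", "G": "S"}
AG_CT = {"A": "R", "G": "R", "C": "Y", "T": "Y"}


def reduce_alphabet(seq, mapping):
    """Coarse-grain a nucleotide string with a letter-to-letter mapping."""
    return "".join(mapping[base] for base in seq)


def block_entropy(seq, n):
    """Shannon entropy (bits) of the overlapping length-n blocks of seq."""
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def conditional_probabilities(seq, alphabet):
    """w[a][b]: estimated probability that letter b follows letter a."""
    pair_counts = Counter(zip(seq, seq[1:]))
    w = {}
    for a in alphabet:
        row_total = sum(pair_counts[(a, b)] for b in alphabet)
        w[a] = {b: pair_counts[(a, b)] / row_total if row_total else 0.0 for b in alphabet}
    return w


if __name__ == "__main__":
    toy = "ACGCGTTACGAATCG" * 20  # hypothetical toy sequence, not genomic data
    for name, mapping in (("AT-CG", AT_CG), ("AG-CT", AG_CT)):
        reduced = reduce_alphabet(toy, mapping)
        print(name, [round(block_entropy(reduced, n), 3) for n in (1, 2, 3)])
    print("w[C][G] =", conditional_probabilities(toy, "ACGT")["C"]["G"])
```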
Affiliation(s)
- A Provata
- Institute of Nanoscience and Nanotechnology, National Center for Scientific Research "Demokritos", 15310 Athens, Greece.
- C Nicolis
- Institut Royal Météorologique de Belgique, 3 Avenue Circulaire, 1180 Bruxelles, Belgium.
- G Nicolis
- Interdisciplinary Center for Nonlinear Phenomena and Complex Systems, Université Libre de Bruxelles, Campus Plaine, C.P. 231, 1050 Bruxelles, Belgium.