1
Tkačik G, Wolde PRT. Information Processing in Biochemical Networks. Annu Rev Biophys 2025; 54:249-274. [PMID: 39929539 DOI: 10.1146/annurev-biophys-060524-102720] [Indexed: 05/07/2025]
Abstract
Living systems are characterized by controlled flows of matter, energy, and information. While the biophysics community has productively engaged with the first two, addressing information flows has been more challenging, with some scattered success in evolutionary theory and a more coherent track record in neuroscience. Nevertheless, interdisciplinary work of the past two decades at the interface of biophysics, quantitative biology, and engineering has led to an emerging mathematical language for describing information flows at the molecular scale. This is where the central processes of life unfold: from detection and transduction of environmental signals to the readout or copying of genetic information and the triggering of adaptive cellular responses. Such processes are coordinated by complex biochemical reaction networks that operate at room temperature, are out of equilibrium, and use low copy numbers of diverse molecular species with limited interaction specificity. Here we review how flows of information through biochemical networks can be formalized using information-theoretic quantities, quantified from data, and computed within various modeling frameworks. Optimization of information flows is presented as a candidate design principle that navigates the relevant time, energy, crosstalk, and metabolic constraints to predict reliable cellular signaling and gene regulation architectures built of individually noisy components.
Affiliation(s)
- Gašper Tkačik
  - Institute of Science and Technology Austria, Klosterneuburg, Austria
2
Matheson J, Exposito-Alonso M, Masel J. Substitution load revisited: a high proportion of deaths can be selective. Genetics 2025; 229:iyaf011. [PMID: 39862233 PMCID: PMC12005247 DOI: 10.1093/genetics/iyaf011] [Received: 09/19/2024] [Accepted: 12/22/2024] [Indexed: 01/27/2025] Open
Abstract
Haldane's Dilemma refers to the concern that the need for many "selective deaths" to complete a substitution (i.e. selective sweep) creates a speed limit to adaptation. However, discussion of this concern has been marked by confusion, especially with respect to the term "substitution load". Here, we distinguish different historical lines of reasoning, and identify one, focused on finite reproductive excess and the proportion of deaths that are "selective" (i.e. causally contribute to adaptive allele frequency changes), that has not yet been fully addressed. We develop this into a more general theoretical model that can apply to populations with any life history, even those for which a generation or even an individual are not well defined. The actual speed of adaptive evolution is coupled to the proportion of deaths that are selective. The degree to which reproductive excess enables a high proportion of selective deaths depends on the details of when selection takes place relative to density regulation, and there is therefore no general expression for a speed limit. To make these concepts concrete, we estimate both reproductive excess, and the proportion of deaths that are selective, from a dataset measuring survival of 517 different genotypes of Arabidopsis thaliana grown in 8 different environmental conditions. In this dataset, a much higher proportion of deaths contribute to adaptation, in all environmental conditions, than the 10% cap that was anticipated as substantially restricting adaptation during historical discussions of speed limits.
Affiliation(s)
- Joseph Matheson
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA
  - Department of Ecology, Behavior, and Evolution, University of California San Diego, San Diego, CA 92093, USA
- Moises Exposito-Alonso
  - Departments of Plant Biology & Global Ecology, Carnegie Institution for Science, Stanford University, Stanford, CA 94305, USA
- Joanna Masel
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA
3
Grah R, Guet CC, Tkačik G, Lagator M. Linking molecular mechanisms to their evolutionary consequences: a primer. Genetics 2025; 229:iyae191. [PMID: 39601269 PMCID: PMC11796464 DOI: 10.1093/genetics/iyae191] [Received: 09/24/2024] [Accepted: 11/13/2024] [Indexed: 11/29/2024] Open
Abstract
A major obstacle to predictive understanding of evolution stems from the complexity of biological systems, which prevents detailed characterization of key evolutionary properties. Here, we highlight some of the major sources of complexity that arise when relating molecular mechanisms to their evolutionary consequences and ask whether accounting for every mechanistic detail is important to accurately predict evolutionary outcomes. To do this, we developed a mechanistic model of a bacterial promoter regulated by 2 proteins, allowing us to connect any promoter genotype to 6 phenotypes that capture the dynamics of gene expression following an environmental switch. Accounting for the mechanisms that govern how this system works enabled us to provide an in-depth picture of how regulated bacterial promoters might evolve. More importantly, we used the model to explore which factors that contribute to the complexity of this system are essential for understanding its evolution, and which can be simplified without information loss. We found that several key evolutionary properties (the distribution of phenotypic and fitness effects of mutations, the evolutionary trajectories during selection for regulation) can be accurately captured without accounting for all, or even most, parameters of the system. Our findings point to the need for a mechanistic approach to studying evolution, as it enables tackling biological complexity and in doing so improves the ability to predict evolutionary outcomes.
Affiliation(s)
- Rok Grah
  - Institute of Science and Technology Austria, Klosterneuburg AT-3400, Austria
- Calin C Guet
  - Institute of Science and Technology Austria, Klosterneuburg AT-3400, Austria
- Gašper Tkačik
  - Institute of Science and Technology Austria, Klosterneuburg AT-3400, Austria
- Mato Lagator
  - Division of Evolution, Infection and Genomic Sciences, School of Biological Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester M13 9PL, UK
4
Perkins ML, Crocker J, Tkačik G. Chromatin enables precise and scalable gene regulation with factors of limited specificity. Proc Natl Acad Sci U S A 2025; 122:e2411887121. [PMID: 39793086 PMCID: PMC11725945 DOI: 10.1073/pnas.2411887121] [Received: 06/13/2024] [Accepted: 11/22/2024] [Indexed: 01/12/2025] Open
Abstract
Biophysical constraints limit the specificity with which transcription factors (TFs) can target regulatory DNA. While individual nontarget binding events may be low affinity, the sheer number of such interactions could present a challenge for gene regulation by degrading its precision or possibly leading to an erroneous induction state. Chromatin can prevent nontarget binding by rendering DNA physically inaccessible to TFs, at the cost of energy-consuming remodeling orchestrated by pioneer factors (PFs). Under what conditions and by how much can chromatin reduce regulatory errors on a global scale? We use a theoretical approach to compare two scenarios for gene regulation: one that relies on TF binding to free DNA alone and one that uses a combination of TFs and chromatin-regulating PFs to achieve desired gene expression patterns. We find, first, that chromatin effectively silences groups of genes that should be simultaneously OFF, thereby allowing more accurate graded control of expression for the remaining ON genes. Second, chromatin buffers the deleterious consequences of nontarget binding as the number of OFF genes grows, permitting a substantial expansion in regulatory complexity. Third, chromatin-based regulation productively co-opts nontarget TF binding for ON genes in order to establish a "leaky" baseline expression level, which targeted activator or repressor binding subsequently up- or down-modulates. Thus, on a global scale, using chromatin simultaneously alleviates pressure for high specificity of regulatory interactions and enables an increase in genome size with minimal impact on global expression error.
Affiliation(s)
- Mindy Liu Perkins
  - Developmental Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany
- Justin Crocker
  - Developmental Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany
- Gašper Tkačik
  - Institute of Science and Technology Austria, AT-3400 Klosterneuburg, Austria
5
Sokolowski TR, Gregor T, Bialek W, Tkačik G. Deriving a genetic regulatory network from an optimization principle. Proc Natl Acad Sci U S A 2025; 122:e2402925121. [PMID: 39752518 PMCID: PMC11725783 DOI: 10.1073/pnas.2402925121] [Received: 02/13/2024] [Accepted: 11/13/2024] [Indexed: 01/11/2025] Open
Abstract
Many biological systems operate near the physical limits to their performance, suggesting that aspects of their behavior and underlying mechanisms could be derived from optimization principles. However, such principles have often been applied only in simplified models. Here, we explore a detailed mechanistic model of the gap gene network in the Drosophila embryo, optimizing its 50+ parameters to maximize the information that gene expression levels provide about nuclear positions. This optimization is conducted under realistic constraints, such as limits on the number of available molecules. Remarkably, the optimal networks we derive closely match the architecture and spatial gene expression profiles observed in the real organism. Our framework quantifies the tradeoffs involved in maximizing functional performance and allows for the exploration of alternative network configurations, addressing the question of which features are necessary and which are contingent. Our results suggest that multiple solutions to the optimization problem might exist across closely related organisms, offering insights into the evolution of gene regulatory networks.
Affiliation(s)
- Thomas R. Sokolowski
  - Institute of Science and Technology Austria, Klosterneuburg AT-3400, Austria
  - Frankfurt Institute for Advanced Studies, Frankfurt am Main DE-60438, Germany
- Thomas Gregor
  - Joseph Henry Laboratory of Physics and Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544
  - Department of Stem Cell and Developmental Biology, UMR3738, Institut Pasteur, Paris FR-75015, France
- William Bialek
  - Joseph Henry Laboratory of Physics and Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544
  - Center for Studies in Physics and Biology, Rockefeller University, New York, NY 10065
- Gašper Tkačik
  - Institute of Science and Technology Austria, Klosterneuburg AT-3400, Austria
6
Carvajal-Rodríguez A. On Non-Random Mating, Adaptive Evolution, and Information Theory. BIOLOGY 2024; 13:970. [PMID: 39765637 PMCID: PMC11673741 DOI: 10.3390/biology13120970] [Received: 10/31/2024] [Revised: 11/18/2024] [Accepted: 11/22/2024] [Indexed: 01/11/2025]
Abstract
Population genetics describes evolutionary processes, focusing on the variation within and between species and the forces shaping this diversity. Evolution reflects information accumulated in genomes, enhancing organisms' adaptation to their environment. In this paper, I propose a model that begins with the distribution of mating based on mutual fitness and progresses to viable adult genotype distribution. At each stage, the changes result in different measures of information. The evolutionary dynamics at each stage of the model correspond to certain aspects of interest, such as the type of mating, the distribution of genotypes in regard to mating, and the distribution of genotypes and haplotypes in the next generation. Changes to these distributions are caused by variations in fitness and result in Jeffreys divergence values other than zero. As an example, a model of hybrid sterility is developed for a biallelic locus, comparing the information indices associated with each stage of the evolutionary process. In conclusion, the informational perspective seems to facilitate the connection between cause and effect and allows the development of statistical tests to perform hypothesis testing against zero-information null models (random mating, no selection, etc.). The informational perspective could help to clarify, deepen, and expand the mathematical foundations of evolutionary theory.
Affiliation(s)
- Antonio Carvajal-Rodríguez
  - Centro de Investigación Mariña (CIM), Departamento de Bioquímica, Genética e Inmunología, Universidade de Vigo, 36310 Vigo, Spain
7
Carvajal-Rodríguez A. Unifying quantification methods for sexual selection and assortative mating using information theory. Theor Popul Biol 2024; 158:206-215. [PMID: 38917935 DOI: 10.1016/j.tpb.2024.06.007] [Received: 09/30/2023] [Revised: 06/19/2024] [Accepted: 06/20/2024] [Indexed: 06/27/2024]
Abstract
Sexual selection plays a crucial role in modern evolutionary theory, offering valuable insight into evolutionary patterns and species diversity. Recently, a comprehensive definition of sexual selection has been proposed, defining it as any selection that arises from fitness differences associated with nonrandom success in the competition for access to gametes for fertilization. Previous research on discrete traits demonstrated that non-random mating can be effectively quantified using Jeffreys (or symmetrized Kullback-Leibler) divergence, capturing information acquired through mating influenced by mutual mating propensities instead of random occurrences. This novel theoretical framework allows for detecting and assessing the strength of sexual selection and assortative mating. In this study, we aim to achieve two primary objectives. Firstly, we demonstrate the seamless alignment of the previous theoretical development, rooted in information theory and mutual mating propensity, with the aforementioned definition of sexual selection. Secondly, we extend the theory to encompass quantitative traits. Our findings reveal that sexual selection and assortative mating can be quantified effectively for quantitative traits by measuring the information gain relative to the random mating pattern. The connection of the information indices of sexual selection with the classical measures of sexual selection is established. Additionally, if mating traits are normally distributed, the measure capturing the underlying information of assortative mating is a function of the square of the correlation coefficient, taking values within the non-negative real number set [0, +∞). It is worth noting that the same divergence measure captures information acquired through mating for both discrete and quantitative traits. This is interesting as it provides a common context and can help simplify the study of sexual selection patterns.
Affiliation(s)
- A Carvajal-Rodríguez
  - Centro de Investigación Mariña (CIM), Departamento de Bioquímica, Genética e Inmunología, Universidade de Vigo, Vigo 36310, Spain
8
Smith E. Beyond fitness: The information imparted in population states by selection throughout lifecycles. Theor Popul Biol 2024; 157:86-117. [PMID: 38615922 DOI: 10.1016/j.tpb.2024.04.003] [Received: 02/22/2023] [Revised: 03/25/2024] [Accepted: 04/05/2024] [Indexed: 04/16/2024]
Abstract
We approach the questions, what part of evolutionary change results from selection, and what is the adaptive information flow into a population undergoing selection, as a problem of quantifying the divergence of typical trajectories realized under selection from the expected dynamics of their counterparts under a null stochastic-process model representing the absence of selection. This approach starts with a formulation of adaptation in terms of information and from that identifies selection from the genetic parameters that generate information flow; it is the reverse of a historical approach that defines selection in terms of fitness, and then identifies adaptive characters as those amplified in relative frequency by fitness. Adaptive information is a relative entropy on distributions of histories computed directly from the generators of stochastic evolutionary population processes, which in large population limits can be approximated by its leading exponential dependence as a large-deviation function. We study a particular class of generators that represent the genetic dependence of explicit transitions around reproductive cycles in terms of stoichiometry, familiar from chemical reaction networks. Following Smith (2023), which showed that partitioning evolutionary events among genetically distinct realizations of lifecycles yields a more consistent causal analysis through the Price equation than the construction from units of selection and fitness, here we show that it likewise yields more complete evolutionary information measures.
Affiliation(s)
- Eric Smith
  - Earth-Life Science Institute, Tokyo Institute of Technology, 2-12-1-IE-1 Ookayama, Meguro-ku, Tokyo, 152-8550, Japan
  - School of Chemistry and Biochemistry, Georgia Institute of Technology, 901 Atlantic Drive NW, Atlanta, GA, 30332, USA
  - Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM, 87501, USA
9
Vellnow N, Gossmann TI, Waxman D. The pseudoentropy of allele frequency trajectories, the persistence of variation, and the effective population size. Biosystems 2024; 238:105176. [PMID: 38479654 DOI: 10.1016/j.biosystems.2024.105176] [Received: 11/10/2023] [Revised: 03/01/2024] [Accepted: 03/01/2024] [Indexed: 03/24/2024]
Abstract
To concisely describe how genetic variation, at individual loci or across whole genomes, changes over time, and to follow transitory allelic changes, we introduce a quantity related to entropy, that we term pseudoentropy. This quantity emerges in a diffusion analysis of the mean time a mutation segregates in a population. For a neutral locus with an arbitrary number of alleles, the mean time of segregation is generally proportional to the pseudoentropy of initial allele frequencies. After the initial time point, pseudoentropy generally decreases, but other behaviours are possible, depending on the genetic diversity and selective forces present. For a biallelic locus, pseudoentropy and entropy coincide, but they are distinct quantities with more than two alleles. Thus for populations with multiple biallelic loci, the language of entropy suffices. Then entropy, combined across loci, serves as a concise description of genetic variation. We used individual based simulations to explore how this entropy behaves under different evolutionary scenarios. In agreement with predictions, the entropy associated with unlinked neutral loci decreases over time. However, deviations from free recombination and neutrality have clear and informative effects on the entropy's behaviour over time. Analysis of publicly available data of a natural D. melanogaster population, that had been sampled over seven years, using a sliding-window approach, yielded considerable variation in entropy trajectories of different genomic regions. These mostly follow a pattern that suggests a substantial effective population size and a limited effect of positive selection on genome-wide diversity over short time scales.
Affiliation(s)
- Nikolas Vellnow
  - TU Dortmund University, Computational Systems Biology, Faculty of Biochemical and Chemical Engineering, Emil-Figge-Str. 66, 44227 Dortmund, Germany
- Toni I Gossmann
  - TU Dortmund University, Computational Systems Biology, Faculty of Biochemical and Chemical Engineering, Emil-Figge-Str. 66, 44227 Dortmund, Germany
- David Waxman
  - Fudan University, Centre for Computational Systems Biology, ISTBI, 220 Handan Road, Shanghai 200433, People's Republic of China
10
Herbert A. The Intransitive Logic of Directed Cycles and Flipons Enhances the Evolution of Molecular Computers by Augmenting the Kolmogorov Complexity of Genomes. Int J Mol Sci 2023; 24:16482. [PMID: 38003672 PMCID: PMC10671625 DOI: 10.3390/ijms242216482] [Received: 09/18/2023] [Revised: 11/14/2023] [Accepted: 11/14/2023] [Indexed: 11/26/2023] Open
Abstract
Cell responses are usually viewed as transitive events with fixed inputs and outputs that are regulated by feedback loops. In contrast, directed cycles (DCs) have all nodes connected, and the flow is in a single direction. Consequently, DCs can regenerate themselves and implement intransitive logic. DCs are able to couple unrelated chemical reactions to each edge. The output depends upon which node is used as input. DCs can also undergo selection to minimize the loss of thermodynamic entropy while maximizing the gain of information entropy. The intransitive logic underlying DCs enhances their programmability and impacts their evolution. The natural selection of DCs favors the persistence, adaptability, and self-awareness of living organisms and does not depend solely on changes to coding sequences. Rather, the process can be RNA-directed. I use flipons, nucleic acid sequences that change conformation under physiological conditions, as a simple example and then describe more complex DCs. Flipons are often encoded by repeats and greatly increase the Kolmogorov complexity of genomes by adopting alternative structures. Other DCs allow cells to regenerate, recalibrate, reset, repair, and rewrite themselves, going far beyond the capabilities of current computational devices. Unlike Turing machines, cells are not designed to halt but rather to regenerate.
Affiliation(s)
- Alan Herbert
  - InsideOutBio, 42 8th Street, Charlestown, MA 02129, USA
11
Poulton JM, Altenberg L, Watkins C. Evolution with recombination as Gibbs sampling. Theor Popul Biol 2023; 151:28-43. [PMID: 37030660 DOI: 10.1016/j.tpb.2023.03.005] [Received: 11/09/2022] [Revised: 03/28/2023] [Accepted: 03/30/2023] [Indexed: 04/10/2023]
Abstract
This work presents a population genetic model of evolution, which includes haploid selection, mutation, recombination, and drift. The mutation-selection equilibrium can be expressed exactly in closed form for arbitrary fitness functions without resorting to diffusion approximations. Tractability is achieved by generating new offspring using n-parent rather than 2-parent recombination. While this enforces linkage equilibrium among offspring, it allows analysis of the whole population under linkage disequilibrium. We derive a general and exact relationship between fitness fluctuations and response to selection. Our assumptions allow analytical calculation of the stationary distribution of the model for a variety of non-trivial fitness functions. These results allow us to speak to genetic architecture, i.e., what stationary distributions result from different fitness functions. This paper presents methods for exactly deriving stationary states for finite and infinite populations. This method can be applied to many fitness functions, and we give exact calculations for four of these. These results allow us to investigate metastability, tradeoffs between fitness functions, and even consider error-correcting codes.
Affiliation(s)
- Jenny M Poulton
  - Foundation for Fundamental Research on Matter (FOM) Institute for Atomic and Molecular Physics (AMOLF), Amsterdam, 1098 XE, The Netherlands
- Lee Altenberg
  - Department of Mathematics, University of Hawai'i at Mānoa, 2565 McCarthy Mall (Keller Hall 401A), Honolulu, HI 96822, United States
- Chris Watkins
  - Department of Computer Science, Royal Holloway, University of London, Egham, Surrey, TW20 0EX, United Kingdom
12
Soriano J, Marzen S. How Well Can We Infer Selection Benefits and Mutation Rates from Allele Frequencies? ENTROPY (BASEL, SWITZERLAND) 2023; 25:e25040615. [PMID: 37190403 PMCID: PMC10137336 DOI: 10.3390/e25040615] [Received: 09/14/2022] [Revised: 03/24/2023] [Accepted: 03/26/2023] [Indexed: 05/17/2023]
Abstract
Experimentalists observe allele frequency distributions and try to infer mutation rates and selection coefficients. How easy is this? We calculate limits to their ability in the context of the Wright-Fisher model by first finding the maximal amount of information that can be acquired using allele frequencies about the mutation rate and selection coefficient (at least 2 bits per allele) and then by finding how the organisms would have shaped their mutation rates and selection coefficients so as to maximize the information transfer.
Affiliation(s)
- Jonathan Soriano
  - W. M. Keck Science Department, Pitzer, Scripps, and Claremont McKenna College, Claremont, CA 91711, USA
- Sarah Marzen
  - W. M. Keck Science Department, Pitzer, Scripps, and Claremont McKenna College, Claremont, CA 91711, USA