1
|
Goldenberg M, Mualem L, Shahar A, Snir S, Akavia A. Privacy-preserving biological age prediction over federated human methylation data using fully homomorphic encryption. Genome Res 2024; 34:1324-1333. [PMID: 39237299 PMCID: PMC11529865 DOI: 10.1101/gr.279071.124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/15/2024] [Accepted: 08/07/2024] [Indexed: 09/07/2024]
Abstract
DNA methylation data play a crucial role in estimating chronological age in mammals, offering real-time insights into an individual's aging process. The epigenetic pacemaker (EPM) model allows inference of the biological age as deviations from the population trend. Given the sensitivity of this data, it is essential to safeguard both inputs and outputs of the EPM model. A privacy-preserving approach for EPM computation utilizing fully homomorphic encryption was recently introduced. However, this method has limitations, including having high communication complexity and being impractical for large data sets. The current work presents a new privacy-preserving protocol for EPM computation, analytically improving both privacy and complexity. Notably, we employ a single server for the secure computation phase while ensuring privacy even in the event of server corruption (compared to requiring two noncolluding servers in prior work). Using techniques from symbolic algebra and number theory, the new protocol eliminates the need for communication during secure computation, significantly improves asymptotic runtime, and offers better compatibility to parallel computing for further time complexity reduction. We implemented our protocol, demonstrating its ability to produce results similar to the standard (insecure) EPM model with substantial performance improvement compared to prior work. These findings hold promise for enhancing data security in medical applications where personal privacy is paramount. The generality of both the new approach and the EPM suggests that this protocol may be useful in other applications employing similar expectation-maximization techniques.
Collapse
Affiliation(s)
- Meir Goldenberg
- Department of Computer Science, The University of Haifa, Haifa 3103301, Israel;
| | - Loay Mualem
- Department of Computer Science, The University of Haifa, Haifa 3103301, Israel;
| | - Amit Shahar
- Department of Computer Science, The University of Haifa, Haifa 3103301, Israel;
| | - Sagi Snir
- Department of Evolutionary and Environmental Biology, The University of Haifa, Haifa 3103301, Israel
| | - Adi Akavia
- Department of Computer Science, The University of Haifa, Haifa 3103301, Israel;
| |
Collapse
|
2
|
Epigenetic pacemaker: closed form algebraic solutions. BMC Genomics 2020; 21:257. [PMID: 32299339 PMCID: PMC7161103 DOI: 10.1186/s12864-020-6606-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/04/2022] Open
Abstract
Background DNA methylation is widely used as a biomarker in crucial medical applications as well as for human age prediction of very high accuracy. This biomarker is based on the methylation status of several hundred CpG sites. In a recent line of publications we have adapted a versatile concept from evolutionary biology - the Universal Pacemaker (UPM) - to the setting of epigenetic aging and denoted it the Epigenetic PaceMaker (EPM). The EPM, as opposed to other epigenetic clocks, is not confined to specific pattern of aging, and the epigenetic age of the individual is inferred independently of other individuals. This allows an explicit modeling of aging trends, in particular non linear relationship between chronological and epigenetic age. In one of these recent works, we have presented an algorithmic improvement based on a two-step conditional expectation maximization (CEM) algorithm to arrive at a critical point on the likelihood surface. The algorithm alternates between a time step and a site step while advancing on the likelihood surface. Results Here we introduce non trivial improvements to these steps that are essential for analyzing data sets of realistic magnitude in a manageable time and space. These structural improvements are based on insights from linear algebra and symbolic algebra tools, providing us greater understanding of the degeneracy of the complex problem space. This understanding in turn, leads to the complete elimination of the bottleneck of cumbersome matrix multiplication and inversion, yielding a fast closed form solution in both steps of the CEM.In the experimental results part, we compare the CEM algorithm over several data sets and demonstrate the speedup obtained by the closed form solutions. Our results support the theoretical analysis of this improvement. Conclusions These improvements enable us to increase substantially the scale of inputs analyzed by the method, allowing us to apply the new approach to data sets that could not be analyzed before.
Collapse
|
3
|
Sevillya G, Adato O, Snir S. Detecting horizontal gene transfer: a probabilistic approach. BMC Genomics 2020; 21:106. [PMID: 32138652 PMCID: PMC7057450 DOI: 10.1186/s12864-019-6395-5] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/07/2019] [Accepted: 12/12/2019] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Horizontal gene transfer (HGT) is the event of a DNA sequence being transferred between species not by inheritance. HGT is a crucial factor in prokaryotic evolution and is a significant source for genomic novelty resulting in antibiotic resistance or the outbreak of virulent strains. Detection of HGT and the mechanisms responsible and enabling it, is hence of prime importance.Existing algorithms rely on a strong phylogenetic signal distinguishing the transferred sequence from its recipient genome. Closely related species pose an even greater challenge as most genes are very similar and therefore, the phylogenetic signal is weak anyhow. Notwithstanding, the importance of detecting HGT between such organisms is extremely high for the role of HGT in the emergence of new highly virulent strains. RESULTS In a recent work we devised a novel technique that relies on loss of synteny around a gene as a witness for HGT. We used a novel heuristic for synteny measurement, SI (Syntent Index), and the technique was tested on both simulated and real data and was found to provide a greater sensitivity than other HGT techniques. This synteny-based approach suffers low specificity, in particular more closely related species. Here we devise an adaptive approach to cope with this by varying the criteria according to species distance. The new approach is doubly adaptive as it also considers the lengths of the genes being transferred. In particular, we use Chernoff bound to decree HGT both in simulations and real bacterial genomes taken from EggNog database. CONCLUSIONS Here we show empirically that this approach is more conservative than the previous χ2 based approach and provides a lower false positive rate, especially for closely related species and under wide range of genome parameters.
Collapse
Affiliation(s)
- Gur Sevillya
- Dept. of Evolutionary and Environmental Biology, University of Haifa, Haifa, 3498838, Israel
| | - Orit Adato
- Dept. of Evolutionary and Environmental Biology, University of Haifa, Haifa, 3498838, Israel
| | - Sagi Snir
- Dept. of Evolutionary and Environmental Biology, University of Haifa, Haifa, 3498838, Israel.
| |
Collapse
|
4
|
Duchêne DA, Tong KJ, Foster CSP, Duchêne S, Lanfear R, Ho SYW. Linking Branch Lengths across Sets of Loci Provides the Highest Statistical Support for Phylogenetic Inference. Mol Biol Evol 2019; 37:1202-1210. [DOI: 10.1093/molbev/msz291] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/30/2022] Open
Abstract
AbstractEvolution leaves heterogeneous patterns of nucleotide variation across the genome, with different loci subject to varying degrees of mutation, selection, and drift. In phylogenetics, the potential impacts of partitioning sequence data for the assignment of substitution models are well appreciated. In contrast, the treatment of branch lengths has received far less attention. In this study, we examined the effects of linking and unlinking branch-length parameters across loci or subsets of loci. By analyzing a range of empirical data sets, we find consistent support for a model in which branch lengths are proportionate between subsets of loci: gene trees share the same pattern of branch lengths, but form subsets that vary in their overall tree lengths. These models had substantially better statistical support than models that assume identical branch lengths across gene trees, or those in which genes form subsets with distinct branch-length patterns. We show using simulations and empirical data that the complexity of the branch-length model with the highest support depends on the length of the sequence alignment and on the numbers of taxa and loci in the data set. Our findings suggest that models in which branch lengths are proportionate between subsets have the highest statistical support under the conditions that are most commonly seen in practice. The results of our study have implications for model selection, computational efficiency, and experimental design in phylogenomics.
Collapse
Affiliation(s)
- David A Duchêne
- Research School of Biology, Australian National University, Canberra, ACT, Australia
- School of Life and Environmental Sciences, University of Sydney, Sydney, NSW, Australia
| | - K Jun Tong
- School of Life and Environmental Sciences, University of Sydney, Sydney, NSW, Australia
| | - Charles S P Foster
- School of Life and Environmental Sciences, University of Sydney, Sydney, NSW, Australia
| | - Sebastián Duchêne
- Department of Microbiology and Immunology, Peter Doherty Institute for Infection and Immunity, University of Melbourne, Melbourne, VIC, Australia
| | - Robert Lanfear
- Research School of Biology, Australian National University, Canberra, ACT, Australia
| | - Simon Y W Ho
- School of Life and Environmental Sciences, University of Sydney, Sydney, NSW, Australia
| |
Collapse
|
5
|
Mello B, Schrago CG. The Estimated Pacemaker for Great Apes Supports the Hominoid Slowdown Hypothesis. Evol Bioinform Online 2019; 15:1176934319855988. [PMID: 31223232 PMCID: PMC6566470 DOI: 10.1177/1176934319855988] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/26/2019] [Accepted: 05/17/2019] [Indexed: 11/16/2022] Open
Abstract
The recent surge of genomic data has prompted the investigation of substitution rate variation across the genome, as well as among lineages. Evolutionary trees inferred from distinct genomic regions may display branch lengths that differ between loci by simple proportionality constants, indicating that rate variation follows a pacemaker model, which may be attributed to lineage effects. Analyses of genes from diverse biological clades produced contrasting results, supporting either this model or alternative scenarios where multiple pacemakers exist. So far, an evaluation of the pacemaker hypothesis for all great apes has never been carried out. In this work, we tested whether the evolutionary rates of hominids conform to pacemakers, which were inferred accounting for gene tree/species tree discordance. For higher precision, substitution rates in branches were estimated with a calibration-free approach, the relative rate framework. A predominant evolutionary trend in great apes was evidenced by the recovery of a large pacemaker, encompassing most hominid genomic regions. In addition, the majority of genes followed a pace of evolution that was closely related to the strict molecular clock. However, slight rate decreases were recovered in the internal branches leading to humans, corroborating the hominoid slowdown hypothesis. Our findings suggest that in great apes, life history traits were the major drivers of substitution rate variation across the genome.
Collapse
Affiliation(s)
- Beatriz Mello
- Department of Genetics, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
| | - Carlos G Schrago
- Department of Genetics, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
| |
Collapse
|
6
|
Snir S, Farrell C, Pellegrini M. Human epigenetic ageing is logarithmic with time across the entire lifespan. Epigenetics 2019; 14:912-926. [PMID: 31138013 DOI: 10.1080/15592294.2019.1623634] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022] Open
Abstract
Epigenetic changes during ageing have been characterized by multiple epigenetic clocks that allow the prediction of chronological age based on methylation status. Despite their accuracy and utility, epigenetic age biomarkers leave many questions about epigenetic ageing unanswered. Specifically, they do not permit the unbiased characterization of non-linear epigenetic ageing trends across entire life spans, a critical question underlying this field of research. Here we provide an integrated framework to address this question. Our model, inspired from evolutionary models, is able to account for acceleration/deceleration in epigenetic changes by fitting an individual's model age, the epigenetic age, which is related to chronological age in a non-linear fashion. Application of this model to DNA methylation data measured across broad age ranges, from before birth to old age, and from two tissue types, suggests a universal logarithmic trend characterizes epigenetic ageing across entire lifespans.
Collapse
Affiliation(s)
- Sagi Snir
- a Department of Evolutionary Biology, University of Haifa , Haifa , Israel
| | - Colin Farrell
- b Department of Molecular, Cell and Developmental Biology, University of California , Los Angeles , CA , USA
| | - Matteo Pellegrini
- b Department of Molecular, Cell and Developmental Biology, University of California , Los Angeles , CA , USA
| |
Collapse
|
7
|
Snir S, Pellegrini M. An epigenetic pacemaker is detected via a fast conditional expectation maximization algorithm. Epigenomics 2019; 10:695-706. [PMID: 29979108 DOI: 10.2217/epi-2017-0130] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022] Open
Abstract
AIM DNA methylation has proven to be a remarkably accurate biomarker for human age, allowing the prediction of chronological age to within a couple of years. Recently, we proposed that the Universal PaceMaker (UPM), a flexible paradigm for modeling evolution, could be applied to epigenetic aging. Nevertheless, application to real data was restricted to small datasets for technical limitations. MATERIALS & METHODS We partition the set of variables into to two subsets and optimize the likelihood function on each set separately. This yields an extremely efficient Conditional Expectation Maximization algorithm, alternating between the two sets while increasing the overall likelihood. RESULTS Using the technique, we could reanalyze datasets of larger magnitude and show significant advantage to the UPM approach. CONCLUSION The UPM more faithfully models epigenetic aging than the time linear approach while methylated sites accelerate and decelerate jointly.
Collapse
Affiliation(s)
- Sagi Snir
- Department of Evolutionary Biology, University of Haifa, Haifa, 3498838, Israel
| | - Matteo Pellegrini
- Deptartment of Molecular, Cell & Developmental Biology, University of California, Los Angeles, CA 90095, USA
| |
Collapse
|
8
|
Abstract
Several studies have pointed out that the tight correlation between genes' evolutionary rate is better explained by a model denoted as the Universal PaceMaker (UPM) rather than by a simple rate constancy as manifested by the classical hypothesis of molecular clock (MC). Under UPM, each gene is associated with a single pacemaker (PM) and varies its evolutionary rate according to this PM ticks. Hence, the relative rates of all genes associated with the same PM remain nearly constant, whereas the absolute rates can change arbitrarily according to the PM ticks. A consequent question to that mentioned is finding the gene-PM association only from the gene sequence data. This, however, turns to be a nontrivial task and is affected by the number of variables, their random noise, and the amount of available information. To this end, a clustering heuristic was devised by exploiting the correlation between corresponding edge lengths across thousands of gene trees. Nevertheless, no theoretical study linking the relationship between the affecting parameters was done. We here study this question by providing theoretical bounds, expressed by the system parameters, on probabilities for positive and negative results. We corroborate these results by a simulation study that reveals the critical role of the variances.
Collapse
Affiliation(s)
- Sagi Snir
- The Department of Evolutionary and Environmental Biology, University of Haifa, Haifa, Israel
| |
Collapse
|
9
|
Petitjean C, Makarova KS, Wolf YI, Koonin EV. Extreme Deviations from Expected Evolutionary Rates in Archaeal Protein Families. Genome Biol Evol 2018; 9:2791-2811. [PMID: 28985292 PMCID: PMC5737733 DOI: 10.1093/gbe/evx189] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 09/12/2017] [Indexed: 02/07/2023] Open
Abstract
Origin of new biological functions is a complex phenomenon ranging from single-nucleotide substitutions to the gain of new genes via horizontal gene transfer or duplication. Neofunctionalization and subfunctionalization of proteins is often attributed to the emergence of paralogs that are subject to relaxed purifying selection or positive selection and thus evolve at accelerated rates. Such phenomena potentially could be detected as anomalies in the phylogenies of the respective gene families. We developed a computational pipeline to search for such anomalies in 1,834 orthologous clusters of archaeal genes, focusing on lineage-specific subfamilies that significantly deviate from the expected rate of evolution. Multiple potential cases of neofunctionalization and subfunctionalization were identified, including some ancient, house-keeping gene families, such as ribosomal protein S10, general transcription factor TFIIB and chaperone Hsp20. As expected, many cases of apparent acceleration of evolution are associated with lineage-specific gene duplication. On other occasions, long branches in phylogenetic trees correspond to horizontal gene transfer across long evolutionary distances. Significant deceleration of evolution is less common than acceleration, and the underlying causes are not well understood; functional shifts accompanied by increased constraints could be involved. Many gene families appear to be “highly evolvable,” that is, include both long and short branches. Even in the absence of precise functional predictions, this approach allows one to select targets for experimentation in search of new biology.
Collapse
Affiliation(s)
- Celine Petitjean
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland
| | - Kira S Makarova
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland
| | - Yuri I Wolf
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland
| | - Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland
| |
Collapse
|
10
|
Snir S, vonHoldt BM, Pellegrini M. A Statistical Framework to Identify Deviation from Time Linearity in Epigenetic Aging. PLoS Comput Biol 2016; 12:e1005183. [PMID: 27835646 PMCID: PMC5106012 DOI: 10.1371/journal.pcbi.1005183] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/19/2016] [Accepted: 10/05/2016] [Indexed: 01/09/2023] Open
Abstract
In multiple studies DNA methylation has proven to be an accurate biomarker of age. To develop these biomarkers, the methylation of multiple CpG sites is typically linearly combined to predict chronological age. By contrast, in this study we apply the Universal PaceMaker (UPM) model to investigate changes in DNA methylation during aging. The UPM was initially developed to study rate acceleration/deceleration in sequence evolution. Rather than identifying which linear combinations of sites predicts age, the UPM models the rates of change of multiple CpG sites, as well as their starting methylation levels, and estimates the age of each individual to optimize the model fit. We refer to the estimated age as the “epigenetic age”, which is in contrast to the known chronological age of each individual. We construct a statistical framework and devise an algorithm to determine whether a genomic pacemaker is in effect (i.e rates of change vary with age). The decision is made by comparing two competing likelihood based models, the molecular clock (MC) and UPM. For the molecular clock model, we use the known chronological age of each individual and fit the methylation rates at multiple sites, and express the problem as a linear least squares and solve it in polynomial time. For the UPM case, the search space is larger as we are fitting both the epigenetic age of each individual as well as the rates for each site, yet we succeed to reduce the problem to the space of individuals and polynomial in the more significant space—the methylated sites. We first tested our algorithm on simulated data to elucidate the factors affecting the identification of the pacemaker model. We find that, provided with enough data, our algorithm is capable of identifying a pacemaker even when a weak signal is present in the data. Based on these results, we applied our method to DNA methylation data from human blood from individuals of various ages. Although the improvement in variance across sites between the UPM and MC was small, the results suggest that the existence of a pacemaker is highly significant. The PaceMaker results also suggest a decay in the rate of change in DNA methylation with age. DNA methylation is an important component of the epigenetic code that defines and maintains the state of cells. Recently, it has been found that certain sites in the genome undergo methylation changes at different rates during aging. The seminal work of Steve Horvath found that the methylation of a couple hundred CpG sites could be linearly combined to accurately predict the age of an individual in a number of tissues. Such a pattern resembles the Molecular Clock (MC) concept prevailing in molecular evolution, which suggests that there are sites in the genome that change linearly with age. In this work, we adapt the Universal PaceMaker (UPM) model to the setting of DNA methylation changes during aging. UPM relaxes the rate constancy of MC and was found to provide a better statistical explanation for genome evolution across the entire tree of life. This adaptation requires the solution of a complex optimization problem. Nevertheless, in a series of observations we show that the problem can be solved efficiently under the MC model and slightly less efficiently under the UPM model. This allows us to solve problems of non-trivial size. We chose as a proof of concept to analyze DNA methylation data collected from the blood of humans of different ages. Our results show that, similarly to genome evolution, the UPM provided an improvement of about 2% in the fit to the data. The statistical significance of this improvement is very high. Although tested on a small data set, this improvement demonstrates that the UPM more accurately captures age related DNA methylation changes than the MC model.
Collapse
Affiliation(s)
- Sagi Snir
- Department of Evolutionary Biology, University of Haifa, Haifa, Israel
- Department of Computer Science, University of California, Los Angeles, Los Angeles, California, United States of America
- * E-mail:
| | - Bridgett M. vonHoldt
- Department of Ecology and Evolutionary Biology, Princeton University, Princeton, New Jersey, United States of America
| | - Matteo Pellegrini
- Department of Molecular, Cell and Developmental Biology, University of California, Los Angeles, Los Angeles, California, United States of America
| |
Collapse
|
11
|
Duchêne S, Foster CSP, Ho SYW. Estimating the number and assignment of clock models in analyses of multigene datasets. Bioinformatics 2016; 32:1281-5. [DOI: 10.1093/bioinformatics/btw005] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/16/2015] [Accepted: 01/04/2016] [Indexed: 11/14/2022] Open
|
12
|
Faure G, Koonin EV. Universal distribution of mutational effects on protein stability, uncoupling of protein robustness from sequence evolution and distinct evolutionary modes of prokaryotic and eukaryotic proteins. Phys Biol 2015; 12:035001. [PMID: 25927823 DOI: 10.1088/1478-3975/12/3/035001] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/04/2023]
Abstract
Robustness to destabilizing effects of mutations is thought of as a key factor of protein evolution. The connections between two measures of robustness, the relative core size and the computationally estimated effect of mutations on protein stability (ΔΔG), protein abundance and the selection pressure on protein-coding genes (dN/dS) were analyzed for the organisms with a large number of available protein structures including four eukaryotes, two bacteria and one archaeon. The distribution of the effects of mutations in the core on protein stability is universal and indistinguishable in eukaryotes and bacteria, centered at slightly destabilizing amino acid replacements, and with a heavy tail of more strongly destabilizing replacements. The distribution of mutational effects in the hyperthermophilic archaeon Thermococcus gammatolerans is significantly shifted toward strongly destabilizing replacements which is indicative of stronger constraints that are imposed on proteins in hyperthermophiles. The median effect of mutations is strongly, positively correlated with the relative core size, in evidence of the congruence between the two measures of protein robustness. However, both measures show only limited correlations to the expression level and selection pressure on protein-coding genes. Thus, the degree of robustness reflected in the universal distribution of mutational effects appears to be a fundamental, ancient feature of globular protein folds whereas the observed variations are largely neutral and uncoupled from short term protein evolution. A weak anticorrelation between protein core size and selection pressure is observed only for surface residues in prokaryotes but a stronger anticorrelation is observed for all residues in eukaryotic proteins. This substantial difference between proteins of prokaryotes and eukaryotes is likely to stem from the demonstrable higher compactness of prokaryotic proteins.
Collapse
Affiliation(s)
- Guilhem Faure
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| | | |
Collapse
|
13
|
Duchêne S, Ho SYW. Mammalian genome evolution is governed by multiple pacemakers. ACTA ACUST UNITED AC 2015; 31:2061-5. [PMID: 25725495 DOI: 10.1093/bioinformatics/btv121] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/14/2014] [Accepted: 02/20/2015] [Indexed: 11/14/2022]
Abstract
UNLABELLED Genomic evolution is shaped by a dynamic combination of mutation, selection and genetic drift. These processes lead to evolutionary rate variation across loci and among lineages. In turn, interactions between these two forms of rate variation can produce residual effects, whereby the pattern of among-lineage rate heterogeneity varies across loci. The nature of rate variation is encapsulated in the pacemaker models of genome evolution, which differ in the degree of importance assigned to residual effects: none (Universal Pacemaker), some (Multiple Pacemaker) or total (Degenerate Multiple Pacemaker). Here we use a phylogenetic method to partition the rate variation across loci, allowing comparison of these pacemaker models. Our analysis of 431 genes from 29 mammalian taxa reveals that rate variation across these genes can be explained by 13 pacemakers, consistent with the Multiple Pacemaker model. We find no evidence that these pacemakers correspond to gene function. Our results have important consequences for understanding the factors driving genomic evolution and for molecular-clock analyses. AVAILABILITY AND IMPLEMENTATION ClockstaR-G is freely available for download from github (https://github.com/sebastianduchene/clockstarg).
Collapse
Affiliation(s)
- Sebastián Duchêne
- School of Biological Sciences, University of Sydney, Sydney, NSW 2006, Australia
| | - Simon Y W Ho
- School of Biological Sciences, University of Sydney, Sydney, NSW 2006, Australia
| |
Collapse
|
14
|
Snir S. On the number of genomic pacemakers: a geometric approach. Algorithms Mol Biol 2014; 9:26. [PMID: 25648755 PMCID: PMC4301663 DOI: 10.1186/s13015-014-0026-0] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/21/2014] [Accepted: 11/11/2014] [Indexed: 11/13/2022] Open
Abstract
The universal pacemaker (UPM) model extends the classical molecular clock (MC) model, by allowing each gene, in addition to its individual intrinsic rate as in the MC, to accelerate or decelerate according to the universal pacemaker. Under UPM, the relative evolutionary rates of all genes remain nearly constant whereas the absolute rates can change arbitrarily. It was shown on several taxa groups spanning the entire tree of life that the UPM model describes the evolutionary process better than the MC model. In this work we provide a natural generalization to the UPM model that we denote multiple pacemakers (MPM). Under the MPM model every gene is still affected by a single pacemaker, however the number of pacemakers is not confined to one. Such a model induces a partition over the gene set where all the genes in one part are affected by the same pacemaker and task is to identify the pacemaker partition, or in other words, finding for each gene its associated pacemaker. We devise a novel heuristic procedure, relying on statistical and geometrical tools, to solve the problem and demonstrate by simulation that this approach can cope satisfactorily with considerable noise and realistic problem sizes. We applied this procedure to a set of over 2000 genes in 100 prokaryotes and demonstrated the significant existence of two pacemakers.
Collapse
|
15
|
Ho SYW. The changing face of the molecular evolutionary clock. Trends Ecol Evol 2014; 29:496-503. [PMID: 25086668 DOI: 10.1016/j.tree.2014.07.004] [Citation(s) in RCA: 77] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/04/2014] [Revised: 07/03/2014] [Accepted: 07/08/2014] [Indexed: 11/30/2022]
Abstract
The molecular clock has played an important role in biological research, both as a description of the evolutionary process and as a tool for inferring evolutionary timescales. Genomic data have provided valuable insights into the molecular clock, allowing the patterns and causes of evolutionary rate variation to be characterized in increasing detail. I explain how genome sequences offer exciting opportunities for estimating the timescale of the Tree of Life. I describe the different approaches that have been used to deal with the computational and statistical challenges encountered in molecular clock analyses of genomic data. Finally, I offer a perspective on the future of molecular clocks, highlighting some of the key limitations and the most promising research directions.
Collapse
Affiliation(s)
- Simon Y W Ho
- School of Biological Sciences, University of Sydney, Sydney, NSW, Australia.
| |
Collapse
|