1. Lu J, Cao X, Zhong S. A likelihood approach to testing hypotheses on the co-evolution of epigenome and genome. PLoS Comput Biol 2018; 14:e1006673. [PMID: 30586383 PMCID: PMC6324829 DOI: 10.1371/journal.pcbi.1006673] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 05/26/2018] [Revised: 01/08/2019] [Accepted: 11/26/2018] [Indexed: 01/03/2023] [Open Access]
Abstract
Central questions in epigenome evolution include whether interspecies changes of histone modifications are independent of evolutionary changes of DNA and, if they are not, whether they depend on specific types of DNA sequence changes. Here, we present a likelihood approach for testing hypotheses on the co-evolution of the genome and histone modifications. The gist of this approach is to convert evolutionary biology hypotheses into probabilistic forms by explicitly expressing the joint probability of multispecies DNA sequences and histone modifications, which we refer to as a class of Joint Evolutionary Model for the Genome and the Epigenome (JEMGE). JEMGE can be summarized as a mixture model of four components representing four evolutionary hypotheses, namely dependence or independence of interspecies epigenomic variations with respect to underlying sequence substitutions and with respect to underlying sequence insertions and deletions (indels). We implemented a maximum likelihood method to fit the models to the data. By comparing likelihoods, we inferred whether interspecies epigenomic variations depended on substitutions or indels in local genomic sequences, using DNase hypersensitivity and spermatid H3K4me3 ChIP-seq data from human and rhesus macaque. Approximately 5.5% of homologous regions in the genomes exhibited H3K4me3 modification in either species, among which approximately 67% of homologous regions exhibited local-sequence-dependent interspecies H3K4me3 variations. Substitutions accounted for fewer local-sequence-dependent H3K4me3 variations than indels did. Among transposon-mediated indels, ERV1 insertions and L1 insertions were most strongly associated with H3K4me3 gains and losses, respectively. By initiating a probabilistic formulation of the co-evolution of genomes and epigenomes, JEMGE helps to bring evolutionary biology principles to comparative epigenomic studies.
Epigenetic modifications play a significant role in gene regulation and thus heavily influence phenotypic outcomes. Whereas cross-species epigenomic comparisons have been fruitful in revealing the function of epigenetic modifications, it remains unclear how the epigenome changes across species. A central question in epigenome evolution studies is whether interspecies epigenomic variations rely on genomic changes in cis and, if they do in part, whether different genomic changes have distinct impacts. To tackle this question, we initiated a likelihood-based approach in which different hypotheses related to the co-evolution of the genome and the epigenome could be converted into probabilistic models. By fitting the models to actual data, each model yielded a likelihood, and the hypothesis corresponding to the largest likelihood was selected as the one most supported by the observed data. In this work, we focused on the influence of two types of underlying sequence changes: substitutions, and insertions and deletions (indels). We quantitatively assessed the dependence of H3K4me3 variations on substitutions and indels between human and rhesus, and separated their relative impacts within each genomic region with H3K4me3. The methodology presented here provides a framework for modeling the epigenome together with the genome and a quantitative approach to test different evolutionary hypotheses.
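The likelihood comparison and four-component mixture described above can be illustrated with a minimal sketch. This is not the JEMGE implementation: it assumes that a log-likelihood for each region under each of the four hypotheses has already been computed by some model, and the hypothesis labels, function names, and numbers are hypothetical. It shows only the two generic steps, picking the hypothesis with the largest likelihood for one region and estimating mixture weights by expectation-maximization.

```python
import math

# Hypothetical labels for the four hypotheses (dependence or independence
# on substitutions and on indels); these names are not from the paper.
HYPOTHESES = (
    "sub_dep_indel_dep",
    "sub_dep_indel_indep",
    "sub_indep_indel_dep",
    "sub_indep_indel_indep",
)


def select_hypothesis(loglik_by_hypothesis):
    """Return the hypothesis whose log-likelihood is largest."""
    return max(loglik_by_hypothesis, key=loglik_by_hypothesis.get)


def _safe_log(x):
    """log(x) that returns -inf for zero instead of raising."""
    return math.log(x) if x > 0.0 else float("-inf")


def em_mixture_weights(logliks, n_iter=200):
    """Estimate mixture weights given fixed per-region component log-likelihoods.

    logliks[i][j] is the (assumed precomputed) log-likelihood of region i
    under component j. Only the mixing proportions are estimated here.
    """
    k = len(logliks[0])
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        totals = [0.0] * k
        for row in logliks:
            # E-step: posterior responsibility of each component,
            # computed in log space for numerical stability.
            scores = [_safe_log(w) + ll for w, ll in zip(weights, row)]
            top = max(scores)
            exps = [math.exp(s - top) for s in scores]
            norm = sum(exps)
            for j in range(k):
                totals[j] += exps[j] / norm
        # M-step: new weight is the average responsibility.
        weights = [t / len(logliks) for t in totals]
    return weights


# Toy input: three regions favor the first component, one favors the last.
toy_logliks = [[-1.0, -5.0, -5.0, -5.0]] * 3 + [[-5.0, -5.0, -5.0, -1.0]]
toy_weights = em_mixture_weights(toy_logliks)
best = select_hypothesis(dict(zip(HYPOTHESES, toy_logliks[0])))
```

In this toy input, the first component receives the largest estimated weight because three of the four regions favor it.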
Affiliation(s)
- Jia Lu
  - Department of Bioengineering, University of California San Diego, La Jolla, California, United States of America
- Xiaoyi Cao
  - Department of Bioengineering, University of California San Diego, La Jolla, California, United States of America
- Sheng Zhong
  - Department of Bioengineering, University of California San Diego, La Jolla, California, United States of America
2. Koch C, Konieczka J, Delorey T, Lyons A, Socha A, Davis K, Knaack SA, Thompson D, O'Shea EK, Regev A, Roy S. Inference and Evolutionary Analysis of Genome-Scale Regulatory Networks in Large Phylogenies. Cell Syst 2017; 4:543-558.e8. [PMID: 28544882 PMCID: PMC5515301 DOI: 10.1016/j.cels.2017.04.010] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 09/23/2016] [Revised: 02/20/2017] [Accepted: 04/26/2017] [Indexed: 11/22/2022]
Abstract
Changes in transcriptional regulatory networks can significantly contribute to species evolution and adaptation. However, identification of genome-scale regulatory networks is an open challenge, especially in non-model organisms. Here, we introduce multi-species regulatory network learning (MRTLE), a computational approach that uses phylogenetic structure, sequence-specific motifs, and transcriptomic data, to infer the regulatory networks in different species. Using simulated data from known networks and transcriptomic data from six divergent yeasts, we demonstrate that MRTLE predicts networks with greater accuracy than existing methods because it incorporates phylogenetic information. We used MRTLE to infer the structure of the transcriptional networks that control the osmotic stress responses of divergent, non-model yeast species and then validated our predictions experimentally. Interrogating these networks reveals that gene duplication promotes network divergence across evolution. Taken together, our approach facilitates study of regulatory network evolutionary dynamics across multiple poorly studied species.
Affiliation(s)
- Christopher Koch
  - Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI, USA
- Jay Konieczka
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Toni Delorey
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Ana Lyons
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Amanda Socha
  - Dartmouth College, Biology Department, Hanover, NH 03755, USA
- Kathleen Davis
  - Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA
- Sara A Knaack
  - Wisconsin Institute for Discovery, 330 N. Orchard Street, Madison, WI, USA
- Dawn Thompson
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Erin K O'Shea
  - Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts, USA
  - Howard Hughes Medical Institute, Harvard University, Northwest Laboratory, Cambridge, Massachusetts, USA
  - Faculty of Arts and Sciences Center for Systems Biology, Harvard University, Northwest Laboratory, Cambridge, Massachusetts, USA
  - Department of Molecular and Cellular Biology, Harvard University, Northwest Laboratory, Cambridge, Massachusetts, USA
- Aviv Regev
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
  - Howard Hughes Medical Institute, Chevy Chase, Maryland, USA
- Sushmita Roy
  - Wisconsin Institute for Discovery, 330 N. Orchard Street, Madison, WI, USA
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
3. Roy S, Sridharan R. Chromatin module inference on cellular trajectories identifies key transition points and poised epigenetic states in diverse developmental processes. Genome Res 2017; 27:1250-1262. [PMID: 28424352 PMCID: PMC5495076 DOI: 10.1101/gr.215004.116] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 08/21/2016] [Accepted: 04/12/2017] [Indexed: 12/13/2022]
Abstract
Changes in chromatin state play important roles in cell fate transitions. Current computational approaches to analyze chromatin modifications across multiple cell types do not model how the cell types are related on a lineage or over time. To overcome this limitation, we developed a method called Chromatin Module INference on Trees (CMINT), a probabilistic clustering approach to systematically capture chromatin state dynamics across multiple cell types. Compared to existing approaches, CMINT can handle complex lineage topologies, capture higher quality clusters, and reliably detect chromatin transitions between cell types. We applied CMINT to gain novel insights in two complex processes: reprogramming to induced pluripotent stem cells (iPSCs) and hematopoiesis. In reprogramming, chromatin changes could occur without large gene expression changes, different combinations of activating marks were associated with specific reprogramming factors, there was an order of acquisition of chromatin marks at pluripotency loci, and multivalent states (comprising previously undetermined combinations of activating and repressive histone modifications) were enriched for CTCF. In the hematopoietic system, we defined critical decision points in the lineage tree, identified regulatory elements that were enriched in cell-type–specific regions, and found that the underlying chromatin state was achieved by specific erasure of preexisting chromatin marks in the precursor cell or by de novo assembly. Our method provides a systematic approach to model the dynamics of chromatin state to provide novel insights into the relationships among cell types in diverse cell-fate specification processes.
Affiliation(s)
- Sushmita Roy
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin 53715, USA
  - Wisconsin Institute for Discovery, Madison, Wisconsin 53715, USA
- Rupa Sridharan
  - Wisconsin Institute for Discovery, Madison, Wisconsin 53715, USA
  - Department of Cell and Regenerative Biology, University of Wisconsin-Madison, Madison, Wisconsin 53715, USA
4. Thompson D, Regev A, Roy S. Comparative analysis of gene regulatory networks: from network reconstruction to evolution. Annu Rev Cell Dev Biol 2015; 31:399-428. [PMID: 26355593 DOI: 10.1146/annurev-cellbio-100913-012908] [Citation(s) in RCA: 95] [Impact Index Per Article: 9.5] [Indexed: 11/09/2022]
Abstract
Regulation of gene expression is central to many biological processes. Although reconstruction of regulatory circuits from genomic data alone is therefore desirable, this remains a major computational challenge. Comparative approaches that examine the conservation and divergence of circuits and their components across strains and species can help reconstruct circuits as well as provide insights into the evolution of gene regulatory processes and their adaptive contribution. In recent years, advances in genomic and computational tools have led to a wealth of methods for such analysis at the sequence, expression, pathway, module, and entire network level. Here, we review computational methods developed to study transcriptional regulatory networks using comparative genomics, from sequence to functional data. We highlight how these methods use evolutionary conservation and divergence to reliably detect regulatory components as well as estimate the extent and rate of divergence. Finally, we discuss the promise and open challenges in linking regulatory divergence to phenotypic divergence and adaptation.
Affiliation(s)
- Dawn Thompson
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142
5. Xiao S, Cao X, Zhong S. Comparative epigenomics: defining and utilizing epigenomic variations across species, time-course, and individuals. Wiley Interdiscip Rev Syst Biol Med 2014; 6:345-52. [PMID: 25044241 DOI: 10.1002/wsbm.1274] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 01/05/2014] [Revised: 05/02/2014] [Accepted: 06/16/2014] [Indexed: 12/20/2022]
Abstract
Epigenomic profiling, by revealing genome-wide distributions of epigenetic modifications, has generated a large amount of structural information about the chromosomes. Epigenomic analysis has quickly become a big data science, posing tremendous challenges for its translation into knowledge. To meet this challenge, comparative analysis of epigenomes, dubbed comparative epigenomics, has emerged as an active research area. Here, we summarize the recent developments in comparative epigenomic analyses in three major directions, namely comparisons across species, across the time-course of a biological process, and across individuals. We review the main ideas, methods, and findings in each direction, and discuss the implications for understanding the regulatory functions of the genomes. Conflict of interest: The authors have declared no conflicts of interest for this article.
Affiliation(s)
- Shu Xiao
  - Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
6. Bykova NA, Favorov AV, Mironov AA. Hidden Markov models for evolution and comparative genomics analysis. PLoS One 2013; 8:e65012. [PMID: 23762278 PMCID: PMC3676395 DOI: 10.1371/journal.pone.0065012] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 08/21/2012] [Accepted: 04/23/2013] [Indexed: 12/21/2022] [Open Access]
Abstract
The problem of reconstructing ancestral states given a phylogeny and data from extant species arises in a wide range of biological studies. The continuous-time Markov model for the evolution of discrete states is generally used for the reconstruction of ancestral states. We modify this model to account for the case in which the states of the extant species are uncertain. This situation arises, for example, if the states for extant species are predicted by some program and thus are known only with some level of reliability; this is common in bioinformatics. The main idea is to formulate the problem as a hidden Markov model on a tree (tree HMM, tHMM), where the basic continuous-time Markov model is expanded with the introduction of emission probabilities of observed data (e.g. prediction scores) for each underlying discrete state. Our tHMM decoding algorithm allows us to predict states at the ancestral nodes as well as to refine states at the leaves on the basis of quantitative comparative genomics. The test on simulated data shows that the tHMM approach applied to the continuous variable reflecting the probabilities of the states (i.e. prediction score) appears to be more accurate than the reconstruction from the discrete state assignment defined by the best score threshold. We provide examples of applying our model to the evolutionary analysis of N-terminal signal peptides and transcription factor binding sites in bacteria. The program is freely available at http://bioinf.fbb.msu.ru/~nadya/tHMM and via web-service at http://bioinf.fbb.msu.ru/treehmmweb.
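The idea of a hidden Markov model on a tree can be sketched as an upward (pruning) pass in which each leaf contributes emission likelihoods rather than a hard state. The sketch below is not the authors' tHMM program: it assumes a two-state character, a stationary prior at the root, and a hypothetical nested-tuple tree encoding, and it computes only the posterior at the root rather than a full decoding of every node.

```python
import math


def transition_matrix(t, gain=1.0, loss=1.0):
    """Two-state continuous-time Markov chain transition probabilities.

    gain is the assumed 0->1 rate and loss the assumed 1->0 rate;
    entry [s][c] is P(child state c | parent state s) after time t.
    """
    total = gain + loss
    decay = math.exp(-total * t)
    pi0, pi1 = loss / total, gain / total  # stationary distribution
    return [
        [pi0 + pi1 * decay, pi1 * (1.0 - decay)],
        [pi0 * (1.0 - decay), pi1 + pi0 * decay],
    ]


def partial_likelihood(node, gain=1.0, loss=1.0):
    """Upward pass: likelihood of the data below a node for each node state.

    A leaf is ("leaf", [P(obs | state 0), P(obs | state 1)]).
    An internal node is ("node", [(child, branch_length), ...]).
    """
    kind, payload = node
    if kind == "leaf":
        return list(payload)
    result = [1.0, 1.0]
    for child, branch_length in payload:
        p = transition_matrix(branch_length, gain, loss)
        child_lik = partial_likelihood(child, gain, loss)
        for s in (0, 1):
            result[s] *= sum(p[s][c] * child_lik[c] for c in (0, 1))
    return result


def root_posterior(tree, gain=1.0, loss=1.0):
    """Posterior probability of each state at the root (stationary prior)."""
    total = gain + loss
    prior = [loss / total, gain / total]
    lik = partial_likelihood(tree, gain, loss)
    joint = [prior[s] * lik[s] for s in (0, 1)]
    norm = sum(joint)
    return [j / norm for j in joint]


# Toy tree: two leaves whose (hypothetical) scores both favor state 1.
toy_tree = ("node", [(("leaf", [0.1, 0.9]), 0.1), (("leaf", [0.2, 0.8]), 0.1)])
posterior = root_posterior(toy_tree)
```

With short branches and both leaves favoring state 1, the root posterior also favors state 1. Replacing the leaf emission vectors with `[0.0, 1.0]` or `[1.0, 0.0]` recovers the ordinary hard-state pruning calculation.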
Affiliation(s)
- Nadezda A Bykova
  - A.A. Kharkevich Institute for Information Transmission Problems RAS, Moscow, Russia
7. Feuer R, Gottlieb K, Viertel G, Klotz J, Schober S, Bossert M, Sawodny O, Sprenger G, Ederer M. Model-based analysis of an adaptive evolution experiment with Escherichia coli in a pyruvate limited continuous culture with glycerol. EURASIP J Bioinform Syst Biol 2012; 2012:14. [PMID: 23033959 PMCID: PMC3534542 DOI: 10.1186/1687-4153-2012-14] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 01/08/2012] [Accepted: 09/10/2012] [Indexed: 12/14/2022]
Abstract
Bacterial strains that were genetically blocked in important metabolic pathways and grown under selective conditions underwent a process of adaptive evolution: certain pathways may have been deregulated and therefore allowed for the circumvention of the given block. A block of endogenous pyruvate synthesis from glycerol was realized by a knockout of pyruvate kinase and phosphoenolpyruvate carboxylase in E. coli. The resulting mutant strain was able to grow on a medium containing glycerol and lactate, which served as an exogenous pyruvate source. Heterologous expression of a pyruvate carboxylase gene from Corynebacterium glutamicum was used for anaplerosis of the TCA cycle. Selective conditions were controlled in a continuous culture with limited lactate feed and an excess of glycerol feed. After 200–300 generations, pyruvate-prototrophic mutants were isolated. The genomic analysis of an evolved strain revealed that the genotypic basis for the regained pyruvate-prototrophy was not obvious. A constraint-based model of the metabolism was employed to compute all possible detours around the given metabolic block by solving a hierarchy of linear programming problems. The regulatory network was expected to be responsible for the adaptation process. Hence, a Boolean model of the transcription factor network was connected to the metabolic model. Our model analysis showed only a marginal impact of transcriptional control on the biomass yield on substrate, which is a key variable in the selection process. In our experiment, microarray analysis confirmed that transcriptional control probably played a minor role in the deregulation of the alternative pathways for the circumvention of the block.
Affiliation(s)
- Ronny Feuer
  - Institute for System Dynamics, University of Stuttgart, Pfaffenwaldring 9, 70569 Stuttgart, Germany
8. Enard W. Functional primate genomics—leveraging the medical potential. J Mol Med (Berl) 2012; 90:471-80. [DOI: 10.1007/s00109-012-0901-4] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Received: 03/05/2012] [Revised: 04/04/2012] [Accepted: 04/05/2012] [Indexed: 10/28/2022]