1
Little J, Chikina M, Clark NL. Evolutionary rate covariation is a reliable predictor of co-functional interactions but not necessarily physical interactions. eLife 2024; 12:RP93333. [PMID: 38415754 PMCID: PMC10942632 DOI: 10.7554/elife.93333] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/29/2024]
Abstract
Co-functional proteins tend to have rates of evolution that covary over time. This correlation between evolutionary rates can be measured over the branches of a phylogenetic tree through methods such as evolutionary rate covariation (ERC), and then used to construct gene networks by the identification of proteins with functional interactions. The cause of this correlation has been hypothesized to result from both compensatory coevolution at physical interfaces and nonphysical forces such as shared changes in selective pressure. This study explores whether coevolution due to compensatory mutations has a measurable effect on the ERC signal. We examined the difference in ERC signal between physically interacting protein domains within complexes compared to domains of the same proteins that do not physically interact. We found no generalizable relationship between physical interaction and high ERC, although a few complexes ranked physical interactions higher than nonphysical interactions. Therefore, we conclude that coevolution due to physical interaction is weak, but present in the signal captured by ERC, and we hypothesize that the stronger signal instead comes from selective pressures on the protein as a whole and maintenance of the general function.
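To make the ERC idea in this abstract concrete, here is a minimal sketch. It is not the authors' pipeline. It assumes each gene has a branch length on the same set of tree branches, converts these to relative rates by dividing by a hypothetical genome-wide average branch length, and reports the Pearson correlation of the two relative-rate vectors. Published ERC implementations use more elaborate normalization and statistics. The function names and numbers are illustrative only.

```python
from math import sqrt

def relative_rates(gene_branches, genome_avg_branches):
    """Divide each branch length by the (hypothetical) genome-wide average for
    that branch, so tree-wide rate effects shared by all genes are factored out."""
    return [g / avg for g, avg in zip(gene_branches, genome_avg_branches)]

def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def erc_score(gene_a, gene_b, genome_avg):
    """Simplified ERC: correlation of two genes' relative rates across shared branches."""
    return pearson(relative_rates(gene_a, genome_avg),
                   relative_rates(gene_b, genome_avg))

# Made-up branch lengths for six branches shared by all genes.
genome_avg = [0.10, 0.20, 0.15, 0.05, 0.30, 0.12]
gene_a = [0.12, 0.18, 0.30, 0.05, 0.33, 0.10]
gene_b = [0.24, 0.36, 0.60, 0.10, 0.66, 0.20]  # gene_a's pattern, twice as fast
gene_c = [0.10, 0.40, 0.15, 0.02, 0.30, 0.20]

print(round(erc_score(gene_a, gene_b, genome_avg), 3))  # → 1.0
print(round(erc_score(gene_a, gene_c, genome_avg), 3))
```

Genes whose rates speed up and slow down on the same branches score near 1 even if one evolves faster overall, which is the covariation signal the abstract describes.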
Affiliation(s)
- Jordan Little
  - Department of Human Genetics, University of Utah, Salt Lake City, United States
- Maria Chikina
  - Department of Computational Biology, University of Pittsburgh, Pittsburgh, United States
- Nathan L Clark
  - Department of Human Genetics, University of Utah, Salt Lake City, United States
  - Department of Biological Sciences, University of Pittsburgh, Pittsburgh, United States
2
Dimayacyac JR, Wu S, Jiang D, Pennell M. Evaluating the Performance of Widely Used Phylogenetic Models for Gene Expression Evolution. Genome Biol Evol 2023; 15:evad211. [PMID: 38000902 PMCID: PMC10709115 DOI: 10.1093/gbe/evad211] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 01/08/2023] [Revised: 11/09/2023] [Accepted: 11/17/2023] [Indexed: 11/26/2023]
Abstract
Phylogenetic comparative methods are increasingly used to test hypotheses about the evolutionary processes that drive divergence in gene expression among species. However, it is unknown whether the distributional assumptions of phylogenetic models designed for quantitative phenotypic traits are realistic for expression data and importantly, the reliability of conclusions of phylogenetic comparative studies of gene expression may depend on whether the data is well described by the chosen model. To evaluate this, we first fit several phylogenetic models of trait evolution to 8 previously published comparative expression datasets, comprising a total of 54,774 genes with 145,927 unique gene-tissue combinations. Using a previously developed approach, we then assessed how well the best model of the set described the data in an absolute (not just relative) sense. First, we find that Ornstein-Uhlenbeck models, in which expression values are constrained around an optimum, were the preferred models for 66% of gene-tissue combinations. Second, we find that for 61% of gene-tissue combinations, the best-fit model of the set was found to perform well; the rest were found to be performing poorly by at least one of the test statistics we examined. Third, we find that when simple models do not perform well, this appears to be typically a consequence of failing to fully account for heterogeneity in the rate of evolution. We advocate that assessment of model performance should become a routine component of phylogenetic comparative expression studies; doing so can improve the reliability of inferences and inspire the development of novel models.
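As a reminder of what "constrained around an optimum" means for the Ornstein-Uhlenbeck (OU) models mentioned above, the sketch below evaluates the standard OU moments. A trait starting at x0, with optimum theta, pull strength alpha, and diffusion rate sigma2, has expected value theta + (x0 - theta) * exp(-alpha * t) and variance sigma2 * (1 - exp(-2 * alpha * t)) / (2 * alpha) after time t. The variance levels off at sigma2 / (2 * alpha), whereas under Brownian motion it grows without bound. The parameter names are our own, and this is not the model-fitting or model-adequacy procedure used in the study.

```python
from math import exp

def ou_mean(x0, theta, alpha, t):
    """Expected trait value after time t under an OU process (pulled toward theta)."""
    return theta + (x0 - theta) * exp(-alpha * t)

def ou_variance(sigma2, alpha, t):
    """Trait variance after time t, starting from a fixed value (requires alpha > 0)."""
    return sigma2 * (1.0 - exp(-2.0 * alpha * t)) / (2.0 * alpha)

def ou_stationary_variance(sigma2, alpha):
    """Long-run variance the OU process settles at."""
    return sigma2 / (2.0 * alpha)

def bm_variance(sigma2, t):
    """Brownian motion (the alpha -> 0 limit): variance grows linearly in time."""
    return sigma2 * t

# Example: log-expression starting 2 units above a hypothetical optimum of 5.
for t in (0.0, 1.0, 10.0):
    print(t,
          round(ou_mean(7.0, 5.0, 0.5, t), 3),
          round(ou_variance(1.0, 0.5, t), 3),
          bm_variance(1.0, t))
```

Over long branches the OU mean converges to the optimum and the variance plateaus, which is why OU fits data in which expression stays bounded across species.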
Affiliation(s)
- Jose Rafael Dimayacyac
  - Department of Zoology, University of British Columbia, Vancouver, BC, Canada
  - Michael Smith Laboratories, University of British Columbia, Vancouver, BC, Canada
- Shanyun Wu
  - Department of Zoology, University of British Columbia, Vancouver, BC, Canada
  - Department of Developmental Biology, Washington University School of Medicine in St. Louis, St. Louis, MO, USA
- Daohan Jiang
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Matt Pennell
  - Department of Zoology, University of British Columbia, Vancouver, BC, Canada
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
  - Department of Biological Sciences, University of Southern California, Los Angeles, CA, USA
3
Bastide P, Soneson C, Stern DB, Lespinet O, Gallopin M. A Phylogenetic Framework to Simulate Synthetic Interspecies RNA-Seq Data. Mol Biol Evol 2023; 40:msac269. [PMID: 36508357 PMCID: PMC11249980 DOI: 10.1093/molbev/msac269] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/18/2021] [Revised: 11/14/2022] [Accepted: 12/07/2022] [Indexed: 12/14/2022]
Abstract
Interspecies RNA-Seq datasets are increasingly common, and have the potential to answer new questions about the evolution of gene expression. Single-species differential expression analysis is now a well-studied problem that benefits from sound statistical methods. Extensive reviews on biological or synthetic datasets have provided the community with a clear picture of the relative performance of the available methods in various settings. However, synthetic dataset simulation tools are still missing in the interspecies gene expression context. In this work, we develop and implement a new simulation framework. This tool builds on both the RNA-Seq and the phylogenetic comparative methods literature to generate realistic count datasets, while taking into account the phylogenetic relationships between the samples. We illustrate the usefulness of this new framework through a targeted simulation study that reproduces the features of a recently published dataset, containing gene expression data in adult eye tissue across blind and sighted freshwater crayfish species. Using our simulated datasets, we perform a fair comparison of several approaches used for differential expression analysis. This benchmark reveals some of the strengths and weaknesses of both the classical and phylogenetic approaches for interspecies differential expression analysis, and allows for a reanalysis of the crayfish dataset. The tool has been integrated in the R package compcodeR, freely available on Bioconductor.
Affiliation(s)
- Paul Bastide
  - IMAG, Université de Montpellier, CNRS, Montpellier, France
- Charlotte Soneson
  - Friedrich Miescher Institute for Biomedical Research, 4058 Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, 4058 Basel, Switzerland
- David B Stern
  - Department of Integrative Biology, University of Wisconsin-Madison, 430 Lincoln Drive, Madison, WI 53706, USA
- Olivier Lespinet
  - Institute for Integrative Biology of the Cell (I2BC), Université Paris-Saclay, CEA, CNRS, 91198 Gif-sur-Yvette, France
- Mélina Gallopin
  - Institute for Integrative Biology of the Cell (I2BC), Université Paris-Saclay, CEA, CNRS, 91198 Gif-sur-Yvette, France
4
Harrison BR, Hoffman JM, Samuelson A, Raftery D, Promislow DEL. Modular Evolution of the Drosophila Metabolome. Mol Biol Evol 2022; 39:msab307. [PMID: 34662414 PMCID: PMC8760934 DOI: 10.1093/molbev/msab307] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 01/04/2023]
Abstract
Comparative phylogenetic studies offer a powerful approach to study the evolution of complex traits. Although much effort has been devoted to the evolution of the genome and to organismal phenotypes, until now relatively little work has been done on the evolution of the metabolome, despite the fact that it is composed of the basic structural and functional building blocks of all organisms. Here we explore variation in metabolite levels across 50 My of evolution in the genus Drosophila, employing a common garden design to measure the metabolome within and among 11 species of Drosophila. We find that both sex and age have dramatic and evolutionarily conserved effects on the metabolome. We also find substantial evidence that many metabolite pairs covary after phylogenetic correction, and that such metabolome coevolution is modular. Some of these modules are enriched for specific biochemical pathways and show different evolutionary trajectories, with some showing signs of stabilizing selection. Both observations suggest that functional relationships may ultimately cause such modularity. These coevolutionary patterns also differ between sexes and are affected by age. We explore the relevance of modular evolution to fitness by associating modules with lifespan variation measured in the same common garden. We find several modules associated with lifespan, particularly in the metabolome of older flies. Oxaloacetate levels in older females appear to coevolve with lifespan, and a lifespan-associated module in older females suggests that metabolic associations could underlie 50 My of lifespan evolution.
Affiliation(s)
- Benjamin R Harrison
  - Department of Lab Medicine & Pathology, University of Washington School of Medicine, Seattle, WA, USA
- Jessica M Hoffman
  - Department of Biology, University of Alabama at Birmingham, Birmingham, AL, USA
- Ariana Samuelson
  - Department of Biology, University of Washington, Seattle, WA, USA
- Daniel Raftery
  - Department of Anesthesiology & Pain Medicine, University of Washington School of Medicine, Seattle, WA, USA
- Daniel E L Promislow
  - Department of Lab Medicine & Pathology, University of Washington School of Medicine, Seattle, WA, USA
  - Department of Biology, University of Washington, Seattle, WA, USA
5
Poudel S, Cope AL, O'Dell KB, Guss AM, Seo H, Trinh CT, Hettich RL. Identification and characterization of proteins of unknown function (PUFs) in Clostridium thermocellum DSM 1313 strains as potential genetic engineering targets. Biotechnol Biofuels 2021; 14:116. [PMID: 33971924 PMCID: PMC8112048 DOI: 10.1186/s13068-021-01964-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/29/2020] [Accepted: 04/26/2021] [Indexed: 05/13/2023]
Abstract
BACKGROUND Mass spectrometry-based proteomics can identify and quantify thousands of proteins from individual microbial species, but a significant percentage of these proteins are unannotated and hence classified as proteins of unknown function (PUFs). Due to the difficulty in extracting meaningful metabolic information, PUFs are often overlooked or discarded during data analysis, even though they might be critically important in functional activities, in particular for metabolic engineering research. RESULTS We optimized and employed a pipeline integrating various "guilt-by-association" (GBA) metrics, including differential expression and co-expression analyses of high-throughput mass spectrometry proteome data and phylogenetic coevolution analysis, and sequence homology-based approaches to determine putative functions for PUFs in Clostridium thermocellum. Our various analyses provided putative functional information for over 95% of the PUFs detected by mass spectrometry in a wild-type and/or an engineered strain of C. thermocellum. In particular, we validated a predicted acyltransferase PUF (WP_003519433.1) with functional activity towards 2-phenylethyl alcohol, consistent with our GBA and sequence homology-based predictions. CONCLUSIONS This work demonstrates the value of leveraging sequence homology-based annotations with empirical evidence based on the concept of GBA to broadly predict putative functions for PUFs, opening avenues to further interrogation via targeted experiments.
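The "guilt-by-association" (GBA) idea in this abstract can be illustrated with a toy co-expression sketch: a protein of unknown function inherits candidate annotations from the annotated proteins whose abundance profiles correlate most strongly with it. This is a simplified, hypothetical example. The protein IDs, annotations, abundance values, the helper `gba_candidates`, and the top-k cutoff are all invented for illustration. The authors' pipeline also integrates differential expression, phylogenetic coevolution, and sequence homology, none of which is modeled here.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length abundance profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def gba_candidates(puf_profile, annotated, k=2):
    """Hypothetical helper: rank annotated proteins by co-expression with the PUF
    and return (name, function, r) for the k most positively correlated ones."""
    scored = sorted(
        ((pearson(puf_profile, profile), name, function)
         for name, (profile, function) in annotated.items()),
        reverse=True,
    )
    return [(name, function, round(r, 3)) for r, name, function in scored[:k]]

# Hypothetical abundance profiles across five samples.
annotated = {
    "protA": ([1.0, 2.0, 3.0, 4.0, 5.0], "acyltransferase"),
    "protB": ([5.0, 4.0, 3.0, 2.0, 1.0], "transporter"),
    "protC": ([1.1, 2.1, 2.9, 4.2, 4.8], "alcohol dehydrogenase"),
}
puf = [0.9, 2.0, 3.1, 3.9, 5.2]

print(gba_candidates(puf, annotated))
```

The output only proposes hypotheses; as the abstract stresses, a predicted function still needs experimental validation.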
Affiliation(s)
- Suresh Poudel
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
  - The Center for Bioenergy Innovation at Oak Ridge National Laboratory, Oak Ridge, TN, USA
  - The Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN, USA
- Alexander L Cope
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
  - The Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN, USA
- Kaela B O'Dell
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
  - The Center for Bioenergy Innovation at Oak Ridge National Laboratory, Oak Ridge, TN, USA
  - The Bredesen Center, University of Tennessee, Knoxville, TN, USA
- Adam M Guss
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
  - The Bredesen Center, University of Tennessee, Knoxville, TN, USA
- Hyeongmin Seo
  - The Center for Bioenergy Innovation at Oak Ridge National Laboratory, Oak Ridge, TN, USA
  - Department of Chemical and Biomolecular Engineering, University of Tennessee, Knoxville, TN, USA
- Cong T Trinh
  - The Center for Bioenergy Innovation at Oak Ridge National Laboratory, Oak Ridge, TN, USA
  - The Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN, USA
  - The Bredesen Center, University of Tennessee, Knoxville, TN, USA
  - Department of Chemical and Biomolecular Engineering, University of Tennessee, Knoxville, TN, USA
- Robert L Hettich
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
6
Geng A, Jin M, Li N, Zhu D, Xie R, Wang Q, Lin H, Sun J. New Insights into the Co-Occurrences of Glycoside Hydrolase Genes among Prokaryotic Genomes through Network Analysis. Microorganisms 2021; 9:427. [PMID: 33669523 PMCID: PMC7922503 DOI: 10.3390/microorganisms9020427] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 01/17/2021] [Revised: 02/06/2021] [Accepted: 02/14/2021] [Indexed: 12/21/2022]
Abstract
Glycoside hydrolase (GH) represents a crucial category of enzymes for carbohydrate utilization in most organisms. A series of glycoside hydrolase families (GHFs) have been classified, with relevant information deposited in the CAZy database. Statistical analysis indicated that most GHFs (134 out of 154) were more prevalent in bacteria than in archaea, in terms of both occurrence frequencies and average gene numbers. Co-occurrence analysis suggested the existence of strong or moderate-strong correlations among 63 GHFs. A combination of network analysis by Gephi and functional classification among these GHFs demonstrated the presence of 12 functional categories (from group A to L), which were then used to label the corresponding microbial collections. Interestingly, a progressive enrichment of particular GHFs was found among several types of microbes, and type-L and type-E microbes were deemed functionally intensified species that arose during microbial evolution toward efficient decomposition of lignocellulose and pectin, respectively. Overall, by integrating network analysis and enzymatic functional classification, we were able to provide a new perspective on GHs from known prokaryotic genomes, and thus this study is likely to guide the selection of GHs and microbes for efficient biomass utilization.
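Co-occurrence of gene families across genomes, as analyzed in this abstract, is commonly quantified from a presence/absence matrix and then turned into a network whose edges link strongly co-occurring families. The sketch below uses the phi coefficient (the Pearson correlation of two binary vectors) and an edge threshold of 0.5 as assumptions; the abstract does not state which statistic or cutoff the authors used. The family labels and genome data are invented placeholders, not CAZy data.

```python
from math import sqrt
from itertools import combinations

def phi(x, y):
    """Phi coefficient between two 0/1 presence-absence vectors."""
    n11 = sum(1 for a, b in zip(x, y) if a and b)
    n10 = sum(1 for a, b in zip(x, y) if a and not b)
    n01 = sum(1 for a, b in zip(x, y) if not a and b)
    n00 = sum(1 for a, b in zip(x, y) if not a and not b)
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return 0.0 if denom == 0 else (n11 * n00 - n10 * n01) / denom

def cooccurrence_edges(presence, threshold=0.5):
    """Return family pairs whose phi coefficient reaches the (assumed) threshold,
    i.e. the edges of a simple co-occurrence network."""
    edges = []
    for f1, f2 in combinations(sorted(presence), 2):
        r = phi(presence[f1], presence[f2])
        if r >= threshold:
            edges.append((f1, f2, round(r, 3)))
    return edges

# Rows: hypothetical GH families; columns: presence (1) or absence (0) in six genomes.
presence = {
    "GH_a": [1, 1, 0, 1, 0, 1],
    "GH_b": [1, 1, 0, 1, 0, 0],
    "GH_c": [0, 0, 1, 0, 1, 0],
}
print(cooccurrence_edges(presence))  # → [('GH_a', 'GH_b', 0.707)]
```

The resulting edge list is the kind of input a tool such as Gephi can lay out and partition into modules.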
Affiliation(s)
- Alei Geng
  - Biofuels Institute, School of the Environment and Safety Engineering, Jiangsu University, Zhenjiang 212013, China
- Jianzhong Sun
  - Biofuels Institute, School of the Environment and Safety Engineering, Jiangsu University, Zhenjiang 212013, China
7
Lam TJ, Stamboulian M, Han W, Ye Y. Model-based and phylogenetically adjusted quantification of metabolic interaction between microbial species. PLoS Comput Biol 2020; 16:e1007951. [PMID: 33125363 PMCID: PMC7657538 DOI: 10.1371/journal.pcbi.1007951] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 05/11/2020] [Revised: 11/11/2020] [Accepted: 09/10/2020] [Indexed: 11/18/2022]
Abstract
Microbial community members exhibit various forms of interactions. Taking advantage of the increasing availability of microbiome data, many computational approaches have been developed to infer bacterial interactions from the co-occurrence of microbes across diverse microbial communities. Additionally, the introduction of genome-scale metabolic models has also enabled the inference of cooperative and competitive metabolic interactions between bacterial species. By nature, phylogenetically similar microbial species are more likely to share common functional profiles or biological pathways due to their genomic similarity. Without properly factoring out the phylogenetic relationship, any estimation of the competition and cooperation between species based on functional/pathway profiles may bias downstream applications. To address these challenges, we developed a novel approach for estimating the competition and complementarity indices for a pair of microbial species, adjusted by their phylogenetic distance. An automated pipeline, PhyloMint, was implemented to construct competition and complementarity indices from genome-scale metabolic models derived from microbial genomes. Application of our pipeline to 2,815 human-gut-associated bacteria showed high correlation between phylogenetic distance and metabolic competition/cooperation indices among bacteria. Using a discretization approach, we were able to detect pairs of bacterial species with cooperation scores significantly higher than the average for pairs of bacterial species with similar phylogenetic distances. A network community analysis of pairs with high metabolic cooperation but low competition reveals distinct modules of bacterial interactions. Our results suggest that niche differentiation plays a dominant role in microbial interactions, while habitat filtering also plays a role among certain clades of bacterial species.
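The competition and complementarity indices named in this abstract are, as far as we are aware, seed-set measures in the style of the NetCooperate framework: competition is the fraction of species A's seed compounds (those it must acquire from the environment) that are also seeds of species B, and complementarity is the fraction of A's seeds that B can produce (B's non-seed compounds). The sketch below assumes those definitions and uses made-up compound sets. It omits the construction of genome-scale metabolic models and the phylogenetic-distance adjustment that PhyloMint performs, and the function names are hypothetical.

```python
def competition_index(seeds_a, seeds_b):
    """Fraction of A's seed compounds that B also requires as seeds (assumed definition)."""
    return len(seeds_a & seeds_b) / len(seeds_a)

def complementarity_index(seeds_a, seeds_b, compounds_b):
    """Fraction of A's seed compounds that B synthesizes, i.e. that appear
    among B's non-seed compounds (assumed definition)."""
    nonseeds_b = compounds_b - seeds_b
    return len(seeds_a & nonseeds_b) / len(seeds_a)

# Hypothetical seed sets and full compound sets for two species.
seeds_a = {"glucose", "acetate", "thiamine", "biotin"}
seeds_b = {"glucose", "lactate"}
compounds_b = {"glucose", "lactate", "acetate", "thiamine", "pyruvate"}

print(competition_index(seeds_a, seeds_b))                   # → 0.25
print(complementarity_index(seeds_a, seeds_b, compounds_b))  # → 0.5
```

Both indices are directional (A relative to B), so a full analysis would compute them for each ordered pair of species before comparing pairs at similar phylogenetic distances.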
Affiliation(s)
- Tony J. Lam
  - Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN, USA
- Moses Stamboulian
  - Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN, USA
- Wontack Han
  - Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN, USA
- Yuzhen Ye
  - Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN, USA