1
Dennler O, Ryan CJ. Evaluating sequence and structural similarity metrics for predicting shared paralog functions. NAR Genom Bioinform 2025; 7:lqaf051. [PMID: 40290317 PMCID: PMC12034104 DOI: 10.1093/nargab/lqaf051] [Received: 11/13/2024] [Revised: 03/07/2025] [Accepted: 04/15/2025] [Indexed: 04/30/2025]
Abstract
Gene duplication is the primary source of new genes, resulting in most genes having identifiable paralogs. Over time, paralog pairs may diverge in some respects but many retain the ability to perform the same functional role. Protein sequence identity is often used as a proxy for functional similarity and can predict shared functions between paralogs as revealed by synthetic lethal experiments. However, the advent of alternative protein representations, including embeddings from protein language models (PLMs) and predicted structures from AlphaFold, raises the possibility that alternative similarity metrics could better capture functional similarity between paralogs. Here, using two species (budding yeast and human) and two different definitions of shared functionality (shared protein-protein interactions and synthetic lethality), we evaluated a variety of alternative similarity metrics. For some tasks, predicted structural similarity or PLM similarity outperform sequence identity, but more importantly these similarity metrics are not redundant with sequence identity, i.e. combining them with sequence identity leads to improved predictions of shared functionality. By adding contextual features, representing similarity to homologous proteins within and across species, we can significantly enhance our predictions of shared paralog functionality. Overall, our results suggest that alternative similarity metrics capture complementary aspects of functional similarity beyond sequence identity alone.
Affiliation(s)
- Olivier Dennler
- School of Medicine, University College Dublin, Dublin 4, D04 V1W8, Ireland
- School of Computer Science, University College Dublin, Dublin 4, D04 V1W8, Ireland
- Conway Institute, University College Dublin, Dublin 4, D04 V1W8, Ireland
- Colm J Ryan
- School of Medicine, University College Dublin, Dublin 4, D04 V1W8, Ireland
- School of Computer Science, University College Dublin, Dublin 4, D04 V1W8, Ireland
- Conway Institute, University College Dublin, Dublin 4, D04 V1W8, Ireland
2
Alvarez-Ponce D, Krishnamurthy S. Organismal complexity strongly correlates with the number of protein families and domains. Proc Natl Acad Sci U S A 2025; 122:e2404332122. [PMID: 39874285 PMCID: PMC11804679 DOI: 10.1073/pnas.2404332122] [Received: 02/29/2024] [Accepted: 12/25/2024] [Indexed: 01/30/2025]
Abstract
In the pregenomic era, scientists were puzzled by the observation that haploid genome size (the C-value) did not correlate well with organismal complexity. This phenomenon, called the "C-value paradox," is mostly explained by the fact that protein-coding genes occupy only a small fraction of eukaryotic genomes. When the first genome sequences became available, scientists were even more surprised by the fact that the number of genes (G-value) was also a poor predictor of complexity, which gave rise to the "G-value paradox." The proposed explanations usually invoke mechanisms that increase the information content of each individual gene (protein-protein interactions, intrinsic disorder, posttranslational modifications, alternative splicing, etc.). Less attention has been paid to mechanisms that increase the amount of genetic material but do not increase (or not to the same extent) the amount of information encoded in the genome, such as gene duplication and domain shuffling. Proteins belonging to the same family and/or sharing the same domains often carry out similar or even redundant functions. We thus hypothesized that an organism's number of different protein families and domains should be suitable predictors of organismal complexity. In agreement with our hypothesis, we observed that the number of protein families, clans, domains, and motifs increases from simple to progressively more complex organisms. In addition, these metrics correlate with the number of cell types better than and independently of the number of protein-coding genes and several previously proposed predictors of organismal complexity. Our observations have the potential to represent a resolution to the G-value paradox.
Affiliation(s)
- Subramanian Krishnamurthy
- Duncan and Nancy MacMillan Cancer Immunology and Metabolism Center of Excellence, Rutgers Cancer Institute of New Jersey, New Brunswick, NJ 08901
3
Prabakaran R, Bromberg Y. Functional profiling of the sequence stockpile: a protein pair-based assessment of in silico prediction tools. Bioinformatics 2025; 41:btaf035. [PMID: 39854283 PMCID: PMC11821270 DOI: 10.1093/bioinformatics/btaf035] [Received: 07/17/2024] [Revised: 11/04/2024] [Accepted: 01/22/2025] [Indexed: 01/26/2025]
Abstract
MOTIVATION In silico functional annotation of proteins is crucial to narrowing the sequencing-accelerated gap in our understanding of protein activities. Numerous function annotation methods exist, and their ranks have been growing, particularly so with the recent deep learning-based developments. However, it is unclear if these tools are truly predictive. As we are not aware of any methods that can identify new terms in functional ontologies, we ask if they can, at least, identify molecular functions of proteins that are non-homologous to or far-removed from known protein families. RESULTS Here, we explore the potential and limitations of the existing methods in predicting the molecular functions of thousands of such proteins. Lacking the "ground truth" functional annotations, we transformed the assessment of function prediction into evaluation of functional similarity of protein pairs that likely share function but are unlike any of the currently functionally annotated sequences. Notably, our approach transcends the limitations of functional annotation vocabularies, providing a means to assess different-ontology annotation methods. We find that most existing methods are limited to identifying functional similarity of homologous sequences and fail to predict the function of proteins lacking reference. Curiously, despite their seemingly unlimited by-homology scope, deep learning methods also have trouble capturing the functional signal encoded in protein sequence. We believe that our work will inspire the development of a new generation of methods that push boundaries and promote exploration and discovery in the molecular function domain. AVAILABILITY AND IMPLEMENTATION The data underlying this article are available at https://doi.org/10.6084/m9.figshare.c.6737127.v3. The code used to compute siblings is available openly at https://bitbucket.org/bromberglab/siblings-detector/.
Affiliation(s)
- R Prabakaran
- Department of Biology, Emory University, Atlanta, GA 30322, United States
- Department of Computer Science, Emory University, Atlanta, GA 30322, United States
- Yana Bromberg
- Department of Biology, Emory University, Atlanta, GA 30322, United States
- Department of Computer Science, Emory University, Atlanta, GA 30322, United States
4
Yang W, Ji J, Fang G. A metric and its derived protein network for evaluation of ortholog database inconsistency. BMC Bioinformatics 2025; 26:6. [PMID: 39773281 PMCID: PMC11707888 DOI: 10.1186/s12859-024-06023-x] [Received: 06/15/2023] [Accepted: 12/24/2024] [Indexed: 01/11/2025]
Abstract
BACKGROUND Ortholog prediction, essential for various genomic research areas, faces growing inconsistencies amidst the expanding array of ortholog databases. The common strategy of computing consensus orthologs introduces additional arbitrariness, emphasizing the need to examine the causes of such inconsistencies and identify proteins susceptible to prediction errors. RESULTS We introduce the Signal Jaccard Index (SJI), a novel metric rooted in unsupervised genome context clustering, designed to assess protein similarity. Leveraging SJI, we construct a protein network and reveal that peripheral proteins within the network are the primary contributors to inconsistencies in orthology predictions. Furthermore, we show that a protein's degree centrality in the network serves as a strong predictor of its reliability in consensus sets. CONCLUSIONS We present an objective, unsupervised SJI-based network encompassing all proteins, whose topological features elucidate ortholog prediction inconsistencies. The degree centrality (DC) effectively identifies error-prone orthology assignments without relying on arbitrary parameters. Notably, DC is stable, unaffected by species selection, and well-suited for ortholog benchmarking. This approach transcends the limitations of universal thresholds, offering a robust and quantitative framework to explore protein evolution and functional relationships.
Affiliation(s)
- Weijie Yang
- NYU-Shanghai, Shanghai, 200120, China
- Software Engineering Institute, East China Normal University, Shanghai, 200062, China
- Jingsi Ji
- NYU-Shanghai, Shanghai, 200120, China
- Software Engineering Institute, East China Normal University, Shanghai, 200062, China
- Gang Fang
- NYU-Shanghai, Shanghai, 200120, China
- Department of Biology, New York University, New York, NY, 10003, USA
- Software Engineering Institute, East China Normal University, Shanghai, 200062, China
5
Liao IT, Sears KE, Hileman LC, Nikolov LA. Different orthology inference algorithms generate similar predicted orthogroups among Brassicaceae species. Appl Plant Sci 2025; 13:e11627. [PMID: 39906489 PMCID: PMC11788906 DOI: 10.1002/aps3.11627] [Received: 05/23/2024] [Revised: 08/25/2024] [Accepted: 09/07/2024] [Indexed: 02/06/2025]
Abstract
PREMISE Orthology inference is crucial for comparative genomics, and multiple algorithms have been developed to identify putative orthologs for downstream analyses. Despite the abundance of proposed solutions, including publicly available benchmarks, it is difficult to assess which tool is most suitable for plant species, which commonly have complex genomic histories. METHODS We explored the performance of four orthology inference algorithms (OrthoFinder, SonicParanoid, Broccoli, and OrthNet) on eight Brassicaceae genomes in two groups: one comprising only diploids and another comprising the diploids, two mesopolyploids, and one recent hexaploid genome. RESULTS The composition of the orthogroups reflected the species' ploidy and genomic histories, with the diploid set having a higher proportion of identical orthogroups. While the diploid + higher ploidy set had a lower proportion of orthogroups with identical compositions, the average degree of similarity between its orthogroups did not differ from that of the diploid set. DISCUSSION Three algorithms (OrthoFinder, SonicParanoid, and Broccoli) are helpful for initial orthology predictions. Results produced using OrthNet were generally outliers but could still provide detailed information about gene colinearity. With our Brassicaceae dataset, slight discrepancies were found across the orthology inference algorithms, necessitating additional analyses such as tree inference to fine-tune results.
Affiliation(s)
- Irene T. Liao
- Department of Molecular, Cell, and Developmental Biology, University of California, Los Angeles, Los Angeles, California, USA
- Karen E. Sears
- Department of Molecular, Cell, and Developmental Biology, University of California, Los Angeles, Los Angeles, California, USA
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, California, USA
- Lena C. Hileman
- Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, Kansas, USA
6
Sarygina E, Kliuchnikova A, Tarbeeva S, Ilgisonis E, Ponomarenko E. Model Organisms in Aging Research: Evolution of Database Annotation and Ortholog Discovery. Genes (Basel) 2024; 16:8. [PMID: 39858555 PMCID: PMC11765380 DOI: 10.3390/genes16010008] [Received: 11/11/2024] [Revised: 12/14/2024] [Accepted: 12/16/2024] [Indexed: 01/27/2025]
Abstract
BACKGROUND This study aims to analyze the exploration degree of popular model organisms by utilizing annotations from the UniProtKB (Swiss-Prot) knowledge base. The research focuses on understanding the genomic and post-genomic data of various organisms, particularly in relation to aging as an integral model for studying the molecular mechanisms underlying pathological processes and physiological states. METHODS Having characterized the organisms by selected parameters (numbers of gene splice variants, post-translational modifications, etc.) using previously developed information models, we calculated proteome sizes: the number of possible proteoforms for each species. Our analysis also involved searching for orthologs of human aging genes within these model species. RESULTS Our findings indicate that genomic and post-genomic data for more primitive species, such as bacteria and fungi, are more comprehensively characterized compared to other organisms. This is attributed to their experimental accessibility and simplicity. Additionally, we discovered that the genomes of the most studied model organisms allow for a detailed analysis of the aging process, revealing a greater number of orthologous genes related to aging. CONCLUSIONS The results highlight the importance of annotating the genomes of less-studied species to identify orthologs of marker genes associated with complex physiological processes, including aging. Species that potentially possess unique traits associated with longevity and resilience to age-related changes require comprehensive genomic studies.
Affiliation(s)
- Ekaterina Ilgisonis
- Institute of Biomedical Chemistry, 119121 Moscow, Russia; (E.S.); (A.K.); (S.T.)
7
Kuvaeva EE, Cherezov RO, Kulikova DA, Mertsalov IB. The Drosophila toothrin Gene Related to the d4 Family Genes: An Evolutionary View on Origin and Function. Int J Mol Sci 2024; 25:13394. [PMID: 39769157 PMCID: PMC11678306 DOI: 10.3390/ijms252413394] [Received: 10/16/2024] [Revised: 12/05/2024] [Accepted: 12/10/2024] [Indexed: 01/11/2025]
Abstract
D. melanogaster has two paralogs, tth and dd4, related to the evolutionarily conserved d4 family genes. In mammals, the family consists of Dpf1-3, encoding transcription co-factors involved in the regulation of development and cell fate determination. The function of tth and dd4 in Drosophila remains unclear. The typical domain structure of the proteins encoded by the d4 family consists of an N-terminal 2/3 domain (Requiem_N), a central Kruppel-type zinc finger, and a C-terminal D4 domain of paired PHD zinc fingers (DPFs). In Drosophila, both paralogs lack the Kruppel-type ZF, and tth encodes a protein that contains only Requiem_N. In contrast, vertebrate Dpf1-3 paralogs encode all the domains, but some paralogs have specific splice isoforms. For example, the DPF3a isoform lacks the D4 domain necessary for histone reading. The occurrence of proteins without the D4 domain in mammals and flies implies functional significance and analogous roles across animal taxa. In this study, we reconstructed the evolutionary events that led to the emergence of Drosophila tth by analyzing the divergence of d4 paralogs across different evolutionary lineages. Our genomic and transcriptomic data analysis revealed duplications and gene copy loss events. Among insects, gene duplication was only observed in Diptera. In other lineages, we found the specialization of paralogs for producing isoforms and further specialization for coding proteins with specific domain organizations. We hypothesize that this pathway is a common mechanism for the emergence of paralogs lacking the D4 domain across different evolutionary lineages. We thus postulate that TTH may function as a splice isoform of the ancestral single-copy gene, possibly a DPF3a-like isoform characteristic of related insect species. Our analysis provides insights into the possible impact of paralog divergence, emphasizing the functional significance of the 2/3 domain and the potential roles of isoforms lacking the D4 domain.
Affiliation(s)
- Ilya B. Mertsalov
- Koltzov Institute of Developmental Biology of Russian Academy of Sciences, 26 Vavilov Street, 119334 Moscow, Russia; (E.E.K.); (R.O.C.); (D.A.K.)
8
Wong KH, Rodriguez NA, Traylor-Knowles N. Exploring the Unknown: How Can We Improve Single-cell RNAseq Cell Type Annotations in Non-model Organisms? Integr Comp Biol 2024; 64:1291-1299. [PMID: 39013613 DOI: 10.1093/icb/icae112] [Received: 03/16/2024] [Revised: 07/05/2024] [Accepted: 07/08/2024] [Indexed: 07/18/2024]
Abstract
Single-cell RNA sequencing (scRNAseq) is a powerful tool to describe cell types in multicellular organisms across the animal kingdom. In standard scRNAseq analysis pipelines, clusters of cells with similar transcriptional signatures are given cell type labels based on marker genes that indicate known specialized characteristics. Since these analyses are designed for model organisms, such as humans and mice, problems arise when attempting to label cell types of distantly related, non-model species that have unique or divergent cell types. Consequently, this leads to limited discovery of novel species-specific cell types and potential mis-annotation of cell types in non-model species while using scRNAseq. To address this problem, we discuss recently published approaches that help annotate scRNAseq clusters for any non-model organism. We first suggest that annotating with an evolutionary context of cell lineages will aid in the discovery of novel cell types and provide a marker-free approach to compare cell types across distantly related species. Secondly, machine learning has greatly improved bioinformatic analyses, so we highlight some open-source programs that use reference-free approaches to annotate cell clusters. Lastly, we propose the use of unannotated genes as potential cell markers for non-model organisms, as many do not have fully annotated genomes and these data are often disregarded. Improving single-cell annotations will aid the discovery of novel cell types and enhance our understanding of non-model organisms at a cellular level. By unifying approaches to annotate cell types in non-model organisms, we can increase the confidence of cell annotation label transfer and the flexibility to discover novel cell types.
Affiliation(s)
- Kevin H Wong
- Department of Marine Biology and Ecology, Rosenstiel School of Marine, Atmospheric, and Earth Science, University of Miami, Miami, Florida, USA, 33149
- Natalia Andrade Rodriguez
- Department of Marine Biology and Ecology, Rosenstiel School of Marine, Atmospheric, and Earth Science, University of Miami, Miami, Florida, USA, 33149
- Nikki Traylor-Knowles
- Department of Marine Biology and Ecology, Rosenstiel School of Marine, Atmospheric, and Earth Science, University of Miami, Miami, Florida, USA, 33149
9
Langleib M, Calvelo J, Costábile A, Castillo E, Tort JF, Hoffmann FG, Protasio AV, Koziol U, Iriarte A. Evolutionary analysis of species-specific duplications in flatworm genomes. Mol Phylogenet Evol 2024; 199:108141. [PMID: 38964593 DOI: 10.1016/j.ympev.2024.108141] [Received: 12/09/2023] [Revised: 06/15/2024] [Accepted: 07/01/2024] [Indexed: 07/06/2024]
Abstract
Platyhelminthes, also known as flatworms, is a phylum of bilaterian invertebrates infamous for their parasitic representatives. The classes Cestoda, Monogenea, and Trematoda comprise parasitic helminths inhabiting multiple hosts, including fishes, humans, and livestock, and are responsible for considerable economic damage and burden on human health. As in other animals, the genomes of flatworms have a wide variety of paralogs, genes related via duplication, whose origins could be mapped throughout the evolution of the phylum. Through in silico analysis, we studied inparalogs, i.e., species-specific duplications, focusing on their biological functions, expression changes, and evolutionary rate. These genes are thought to be key players in the adaptation of each species to its particular niche. Our results showed that genes associated with specific functional terms, such as response to stress, transferase activity, oxidoreductase activity, and peptidases, are overrepresented among inparalogs. This trend is conserved among species from different classes, including free-living species. Available expression data from Schistosoma mansoni, a parasite from the trematode class, demonstrated high conservation of expression patterns between inparalogs, but with notable exceptions, which also display evidence of rapid evolution. We discuss how natural selection may operate to maintain these genes and which duplication models best fit the observations. Our work supports the critical role of gene duplication in the evolution of flatworms and represents the first genome-wide study of inparalog evolution in this group.
Affiliation(s)
- Mauricio Langleib
- Laboratorio de Biología Computacional, Departamento de Desarrollo Biotecnológico, Instituto de Higiene, Facultad de Medicina, Universidad de la República, Montevideo, Uruguay; Departamento de Genética, Facultad de Medicina, Universidad de la República, Montevideo, Uruguay
- Javier Calvelo
- Laboratorio de Biología Computacional, Departamento de Desarrollo Biotecnológico, Instituto de Higiene, Facultad de Medicina, Universidad de la República, Montevideo, Uruguay
- Alicia Costábile
- Sección Bioquímica, Facultad de Ciencias, Universidad de la República, Montevideo, Uruguay
- Estela Castillo
- Laboratorio de Biología Parasitaria, Instituto de Higiene, Facultad de Ciencias, Universidad de la República, Montevideo, Uruguay
- José F Tort
- Departamento de Genética, Facultad de Medicina, Universidad de la República, Montevideo, Uruguay
- Federico G Hoffmann
- Department of Biochemistry, Molecular Biology, Entomology, and Plant Pathology, Mississippi State University, Mississippi, United States of America; Institute for Genomics, Biocomputing and Biotechnology, Mississippi State University, Mississippi, United States of America
- Anna V Protasio
- Department of Pathology, University of Cambridge, Tennis Court Road, CB2 1QP, Cambridge, United Kingdom
- Uriel Koziol
- Sección Biología Celular, Facultad de Ciencias, Universidad de la República, Montevideo, Uruguay
- Andrés Iriarte
- Laboratorio de Biología Computacional, Departamento de Desarrollo Biotecnológico, Instituto de Higiene, Facultad de Medicina, Universidad de la República, Montevideo, Uruguay
10
Barrios-Núñez I, Martínez-Redondo G, Medina-Burgos P, Cases I, Fernández R, Rojas A. Decoding functional proteome information in model organisms using protein language models. NAR Genom Bioinform 2024; 6:lqae078. [PMID: 38962255 PMCID: PMC11217674 DOI: 10.1093/nargab/lqae078] [Received: 02/18/2024] [Revised: 05/31/2024] [Accepted: 06/26/2024] [Indexed: 07/05/2024]
Abstract
Protein language models have been tested and proved to be reliable when used on curated datasets but have not yet been applied to full proteomes. Accordingly, we tested how two different machine learning-based methods performed when decoding functional information from the proteomes of selected model organisms. We found that protein language models are more precise and informative than deep learning methods for all the species tested and across the three gene ontologies studied, and that they better recover functional information from transcriptomic experiments. The results obtained indicate that these language models are likely to be suitable for large-scale annotation and downstream analyses, and we provide guidance for their use.
Affiliation(s)
- Israel Barrios-Núñez
- Computational Biology and Bioinformatics Group, Andalusian Center for Developmental Biology (CABD-CSIC), 41013 Sevilla, Spain
- Patricia Medina-Burgos
- Computational Biology and Bioinformatics Group, Andalusian Center for Developmental Biology (CABD-CSIC), 41013 Sevilla, Spain
- Ildefonso Cases
- Bioinformatics Unit, Andalusian Center for Developmental Biology (CABD-CSIC), 41013 Sevilla, Spain
- Rosa Fernández
- Metazoa Phylogenomics Lab, Institute of Evolutionary Biology (CSIC-UPF), 08003 Barcelona, Spain
- Ana M Rojas
- Computational Biology and Bioinformatics Group, Andalusian Center for Developmental Biology (CABD-CSIC), 41013 Sevilla, Spain
11
Yuan H, Mancuso CA, Johnson K, Braasch I, Krishnan A. Computational strategies for cross-species knowledge transfer and translational biomedicine. arXiv 2024:arXiv:2408.08503v1. [PMID: 39184546 PMCID: PMC11343225] [Indexed: 08/27/2024]
Abstract
Research organisms provide invaluable insights into human biology and diseases, serving as essential tools for functional experiments, disease modeling, and drug testing. However, evolutionary divergence between humans and research organisms hinders effective knowledge transfer across species. Here, we review state-of-the-art methods for computationally transferring knowledge across species, primarily focusing on methods that utilize transcriptome data and/or molecular networks. We introduce the term "agnology" to describe the functional equivalence of molecular components regardless of evolutionary origin, as this concept is becoming pervasive in integrative data-driven models where the role of evolutionary origin can become unclear. Our review addresses four key areas of information and knowledge transfer across species: (1) transferring disease and gene annotation knowledge, (2) identifying agnologous molecular components, (3) inferring equivalent perturbed genes or gene sets, and (4) identifying agnologous cell types. We conclude with an outlook on future directions and several key challenges that remain in cross-species knowledge transfer.
Affiliation(s)
- Hao Yuan
- Genetics and Genome Science Program; Ecology, Evolution, and Behavior Program, Michigan State University
- Christopher A. Mancuso
- Department of Biostatistics & Informatics, University of Colorado Anschutz Medical Campus
- Kayla Johnson
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus
- Ingo Braasch
- Department of Integrative Biology; Genetics and Genome Science Program; Ecology, Evolution, and Behavior Program, Michigan State University
- Arjun Krishnan
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus
12
Vandepoele K, Thierens S, Van Bel M. Application of orthology and network biology to infer gene functions in non-model plants. Physiol Plant 2024; 176:e14441. [PMID: 39019770 DOI: 10.1111/ppl.14441] [Received: 10/31/2023] [Revised: 02/12/2024] [Accepted: 02/13/2024] [Indexed: 07/19/2024]
Abstract
Approximately 60% of the genes and gene products in the model species Arabidopsis thaliana have been functionally characterized. In non-model plant species, the functional annotation of the gene space is largely based on homology, with the assumption that genes with shared common ancestry have conserved functions. However, the wide variety in possible morphological, physiological, and ecological differences between plant species gives rise to many species- and clade-specific genes, for which this transfer of knowledge is not possible. Other complications, such as difficulties with genetic transformation, the absence of large-scale mutagenesis methods, and long generation times, further lead to the slow characterization of genes in non-model species. Here, we discuss different resources that integrate plant gene function information. Different approaches that support the functional annotation of gene products, based on orthology or network biology, are described. While sequence-based tools to characterize the functional landscape in non-model species are maturing and becoming more readily available, easy-to-use network-based methods inferring plant gene functions are not as prevalent and have limited functionality.
Affiliation(s)
- Klaas Vandepoele
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
- Center for Plant Systems Biology, VIB, Ghent, Belgium
- VIB Center for AI & Computational Biology, VIB, Ghent, Belgium
- Sander Thierens
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
- Center for Plant Systems Biology, VIB, Ghent, Belgium
- Michiel Van Bel
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
- Center for Plant Systems Biology, VIB, Ghent, Belgium
13
Sierra NC, Olsman N, Yi L, Pachter L, Goentoro L, Gold DA. A Novel Approach to Comparative RNA-Seq Does Not Support a Conserved Set of Orthologs Underlying Animal Regeneration. Genome Biol Evol 2024; 16:evae120. [PMID: 38922665 PMCID: PMC11214158 DOI: 10.1093/gbe/evae120] [Received: 12/17/2023] [Revised: 05/23/2024] [Accepted: 06/05/2024] [Indexed: 06/27/2024]
Abstract
Molecular studies of animal regeneration typically focus on conserved genes and signaling pathways that underlie morphogenesis. To date, a holistic analysis of gene expression across animals has not been attempted, as it presents a suite of problems related to differences in experimental design and gene homology. By combining orthology analyses with a novel statistical method for testing gene enrichment across large data sets, we are able to test whether tissue regeneration across animals shares transcriptional regulation. We applied this method to a meta-analysis of six publicly available RNA-Seq data sets from diverse examples of animal regeneration. We recovered 160 conserved orthologous gene clusters, which are enriched in structural genes as opposed to those regulating morphogenesis. A breakdown of gene presence/absence provides limited support for the conservation of pathways typically implicated in regeneration, such as Wnt signaling and cell pluripotency pathways. Such pathways are only conserved if we permit large amounts of paralog switching through evolution. Overall, our analysis does not support the hypothesis that a shared set of ancestral genes underlie regeneration mechanisms in animals. After applying the same method to heat shock studies and getting similar results, we raise broader questions about the ability of comparative RNA-Seq to reveal conserved gene pathways across deep evolutionary relationships.
Affiliation(s)
- Noémie C Sierra
- Department of Earth and Planetary Sciences, University of California, Davis, Davis, CA 95616, USA
- Noah Olsman
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
- Lynn Yi
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
- Lior Pachter
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
- Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA 91125, USA
- Lea Goentoro
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
- David A Gold
- Department of Earth and Planetary Sciences, University of California, Davis, Davis, CA 95616, USA
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
14
Church SH, Mah JL, Dunn CW. Integrating phylogenies into single-cell RNA sequencing analysis allows comparisons across species, genes, and cells. PLoS Biol 2024; 22:e3002633. [PMID: 38787797 PMCID: PMC11125556 DOI: 10.1371/journal.pbio.3002633] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/26/2024] Open
Abstract
Comparisons of single-cell RNA sequencing (scRNA-seq) data across species can reveal links between cellular gene expression and the evolution of cell functions, features, and phenotypes. These comparisons evoke evolutionary histories, as depicted by phylogenetic trees, that define relationships between species, genes, and cells. This Essay considers each of these in turn, laying out challenges and solutions derived from a phylogenetic comparative approach and relating these solutions to previously proposed methods for the pairwise alignment of cellular dimensional maps. This Essay contends that species trees, gene trees, cell phylogenies, and cell lineages can all be reconciled as descriptions of the same concept: the tree of cellular life. By integrating phylogenetic approaches into scRNA-seq analyses, challenges for building informed comparisons across species can be overcome, and hypotheses about gene and cell evolution can be robustly tested.
Affiliation(s)
- Samuel H. Church
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, Connecticut, United States of America
- Jasmine L. Mah
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, Connecticut, United States of America
- Casey W. Dunn
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, Connecticut, United States of America
15
Bohutínská M, Peichel CL. Divergence time shapes gene reuse during repeated adaptation. Trends Ecol Evol 2024; 39:396-407. [PMID: 38155043 DOI: 10.1016/j.tree.2023.11.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2023] [Revised: 11/15/2023] [Accepted: 11/20/2023] [Indexed: 12/30/2023]
Abstract
When diverse lineages repeatedly adapt to similar environmental challenges, the extent to which the same genes are involved (gene reuse) varies across systems. We propose that divergence time among lineages is a key factor driving this variability: as lineages diverge, the extent of gene reuse should decrease due to reductions in allele sharing, functional differentiation among genes, and restructuring of genome architecture. Indeed, we show that many genomic studies of repeated adaptation find that more recently diverged lineages exhibit higher gene reuse during repeated adaptation, but the relationship becomes less clear at older divergence time scales. Thus, future research should explore the factors shaping gene reuse and their interplay across broad divergence time scales for a deeper understanding of evolutionary repeatability.
Affiliation(s)
- Magdalena Bohutínská
- Division of Evolutionary Ecology, Institute of Ecology and Evolution, University of Bern, Bern, 3012, Switzerland
- Department of Botany, Faculty of Science, Charles University, Prague, 12800, Czech Republic
- Catherine L Peichel
- Division of Evolutionary Ecology, Institute of Ecology and Evolution, University of Bern, Bern, 3012, Switzerland
16
Haag F, Frey T, Ball L, Hoffmann S, Krautwurst D. Petrol Note in Riesling - 1,1,6-Trimethyl-1,2-dihydronaphthalene (TDN) Selectively Activates Human Odorant Receptor OR8H1. J Agric Food Chem 2024; 72:4888-4896. [PMID: 38394621 PMCID: PMC10921549 DOI: 10.1021/acs.jafc.3c08230] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Revised: 01/30/2024] [Accepted: 02/01/2024] [Indexed: 02/25/2024]
Abstract
Grapevine (Vitis vinifera) is one of the most important perennial fruit plants. The Riesling variety stands out by developing a characteristic petrol-like odor note during aging, elicited by the aroma compound 1,1,6-trimethyl-1,2-dihydronaphthalene (TDN). UV-dependent TDN contents differ greatly between Rieslings grown in the northern and southern hemispheres. The highest TDN concentrations were found in Australian Rieslings, where TDN is a scoring ingredient. In contrast, in European Rieslings, for example, TDN may be a cause for rejection. A human receptor for TDN had been unknown. Here, we report the identification of OR8H1 as a TDN-selective odorant receptor out of a library of 766 odorant receptor variants. OR8H1 is selectively tuned to six-carbon ring structures, as identified by screening a collection of 180 key food odorants using a HEK-293 cell-based cAMP luminescence assay equipped with the GloSensor technology.
Affiliation(s)
- Franziska Haag
- Leibniz-Institute for Food Systems Biology at the Technical University of Munich, 85354 Freising, Germany
- Tim Frey
- Leibniz-Institute for Food Systems Biology at the Technical University of Munich, 85354 Freising, Germany
- TUM School of Life Sciences, Technical University of Munich, 85354 Freising, Germany
- Lena Ball
- Leibniz-Institute for Food Systems Biology at the Technical University of Munich, 85354 Freising, Germany
- TUM School of Life Sciences, Technical University of Munich, 85354 Freising, Germany
- Sandra Hoffmann
- Leibniz-Institute for Food Systems Biology at the Technical University of Munich, 85354 Freising, Germany
- Dietmar Krautwurst
- Leibniz-Institute for Food Systems Biology at the Technical University of Munich, 85354 Freising, Germany
17
Hellmuth M, Stadler PF. The Theory of Gene Family Histories. Methods Mol Biol 2024; 2802:1-32. [PMID: 38819554 DOI: 10.1007/978-1-0716-3838-5_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2024]
Abstract
Most genes are part of larger families of evolutionarily related genes. The history of gene families typically involves duplications and losses of genes as well as horizontal transfers into other organisms. The reconstruction of detailed gene family histories, i.e., the precise dating of evolutionary events relative to the phylogenetic tree of the underlying species, has remained a challenging topic despite its importance as a basis for detailed investigations into adaptation and functional evolution of individual members of the gene family. The identification of orthologs, moreover, is a particularly important subproblem of the more general setting considered here. In the last few years, an extensive body of mathematical results has appeared that tightly links orthology, a formal notion of best matches among genes, and horizontal gene transfer. The purpose of this chapter is to broadly outline some of the key mathematical insights and to discuss their implications for practical applications. In particular, we focus on tree-free methods, i.e., methods to infer orthology or horizontal gene transfer as well as gene trees, species trees, and reconciliations between them without using a priori knowledge of the underlying trees or statistical models for the inference of phylogenetic trees. Instead, the initial step aims to extract binary relations among genes.
Affiliation(s)
- Marc Hellmuth
- Department of Mathematics, Faculty of Science, Stockholm University, Stockholm, Sweden
- Peter F Stadler
- Bioinformatics Group, Department of Computer Science, Leipzig University, Leipzig, Germany.
- Interdisciplinary Center for Bioinformatics, Leipzig University, Leipzig, Germany.
- Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany.
- Universidad Nacional de Colombia, Bogotá, Colombia.
- Institute for Theoretical Chemistry, University of Vienna, Wien, Austria.
- Center for non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark.
- Santa Fe Institute, Santa Fe, NM, USA.
18
Jin L, Wang D, Zhang J, Liu P, Wang Y, Lin Y, Liu C, Han Z, Long K, Li D, Jiang Y, Li G, Zhang Y, Bai J, Li X, Li J, Lu L, Kong F, Wang X, Li H, Huang Z, Ma J, Fan X, Shen L, Zhu L, Jiang Y, Tang G, Feng B, Zeng B, Ge L, Li X, Tang Q, Zhang Z, Li M. Dynamic chromatin architecture of the porcine adipose tissues with weight gain and loss. Nat Commun 2023; 14:3457. [PMID: 37308492 PMCID: PMC10258790 DOI: 10.1038/s41467-023-39191-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 11/08/2022] [Accepted: 06/02/2023] [Indexed: 06/14/2023] Open
Abstract
Using an adult female miniature pig model with diet-induced weight gain/weight loss, we investigated the regulatory mechanisms of three-dimensional (3D) genome architecture in adipose tissues (ATs) associated with obesity. We generated 249 high-resolution in situ Hi-C chromatin contact maps of subcutaneous AT and three visceral ATs, analyzing transcriptomic and chromatin architectural changes under different nutritional treatments. We find that chromatin architecture remodeling underpins transcriptomic divergence in ATs, potentially linked to metabolic risks in obesity development. Analysis of chromatin architecture among subcutaneous ATs of different mammals suggests the presence of transcriptional regulatory divergence that could explain phenotypic, physiological, and functional differences in ATs. Regulatory element conservation analysis in pigs and humans reveals similarities in the regulatory circuitry of genes responsible for the obesity phenotype and identifies non-conserved elements in species-specific gene sets that underpin AT specialization. This work provides a data-rich tool for discovering obesity-related regulatory elements in humans and pigs.
Grants
- National Natural Science Foundation of China (National Science Foundation of China)
- the National Key R & D Program of China (2020YFA0509500), the Sichuan Science and Technology Program (2021YFYZ0009 and 2021YFYZ0030)
- the National Key R & D Program of China (2021YFA0805903), the Tackling Project for Agricultural Key Core Technologies of China (NK2022110602), the Sichuan Science and Technology Program (2021ZDZX0008, 2022NZZJ0028 and 2022JDJQ0054), the Ya’an Science and Technology Program (21SXHZ0022)
- the Sichuan Science and Technology Program (2022NSFSC0056)
- the Sichuan Science and Technology Program (2022NSFSC1618)
- the National Key R & D Program of China (2021YFD1300800), the Sichuan Science and Technology Program (2021YFS0008 and 2022YFQ0022)
- the Opening Foundation of Key Laboratory of Pig Industry Sciences (22519C)
- the Sichuan Science and Technology Program (2021YFH0033), the Major Science and Technology Projects of Tibet Autonomous Region (XZ202101ZD0005N)
- the China Agriculture Research System (CARS-35-01A)
- the National Key R & D Program of China (2022YFF1000100), the Sichuan Science and Technology Program (2021ZDZX0008, 2022NZZJ0028 and 2022JDJQ0054)
- the Strategic Priority Research Program of CAS (XDA24020307), the Special Investigation on Science and Technology Basic Resources of the MOST of China (2019FY100102), the Beijing Natural Science Foundation (Z200021)
Affiliation(s)
- Long Jin
- Livestock and Poultry Multi-omics Key Laboratory of Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, 611130, China
- Animal Breeding and Genetics Key Laboratory of Sichuan Province, Institute of Animal Genetics and Breeding, Sichuan Agricultural University, Chengdu, 611130, China
- Danyang Wang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, 100101, Beijing, China
- School of Life Science, University of Chinese Academy of Sciences, 100049, Beijing, China
- Sars-Fang Centre and MOE Key Laboratory of Marine Genetics and Breeding, College of Marine Life Sciences, Ocean University of China, Qingdao, 266100, China
- Jiaman Zhang
- Livestock and Poultry Multi-omics Key Laboratory of Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, 611130, China
- Pengliang Liu
- Livestock and Poultry Multi-omics Key Laboratory of Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, 611130, China
- Yujie Wang
- Livestock and Poultry Multi-omics Key Laboratory of Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, 611130, China
- Yu Lin
- Livestock and Poultry Multi-omics Key Laboratory of Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, 611130, China
- Can Liu
- Livestock and Poultry Multi-omics Key Laboratory of Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, 611130, China
- Ziyin Han
- Livestock and Poultry Multi-omics Key Laboratory of Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, 611130, China
- Animal Molecular Design and Precise Breeding Key Laboratory of Guangdong Province, School of Life Science and Engineering, Foshan University, Foshan, 528225, China
- Keren Long
- Livestock and Poultry Multi-omics Key Laboratory of Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, 611130, China
- Animal Breeding and Genetics Key Laboratory of Sichuan Province, Institute of Animal Genetics and Breeding, Sichuan Agricultural University, Chengdu, 611130, China
- Diyan Li
- School of Pharmacy, Chengdu University, Chengdu, 610106, China
- Yu Jiang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, 712100, China
- Guisen Li
- Institute of Nephrology, Sichuan Provincial People's Hospital, University of Electronic Science and Technology of China, Chengdu, 610072, China
- Yu Zhang
- Livestock and Poultry Multi-omics Key Laboratory of Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, 611130, China
- Jingyi Bai
- Livestock and Poultry Multi-omics Key Laboratory of Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, 611130, China
- Xiaokai Li
- Livestock and Poultry Multi-omics Key Laboratory of Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, 611130, China
- Jing Li
- Livestock and Poultry Multi-omics Key Laboratory of Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, 611130, China
- Animal Breeding and Genetics Key Laboratory of Sichuan Province, Institute of Animal Genetics and Breeding, Sichuan Agricultural University, Chengdu, 611130, China
- Lu Lu
- Livestock and Poultry Multi-omics Key Laboratory of Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, 611130, China
- Animal Breeding and Genetics Key Laboratory of Sichuan Province, Institute of Animal Genetics and Breeding, Sichuan Agricultural University, Chengdu, 611130, China
- Fanli Kong
- Livestock and Poultry Multi-omics Key Laboratory of Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, 611130, China
- Xun Wang
- Livestock and Poultry Multi-omics Key Laboratory of Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, 611130, China
- Hua Li
- Animal Molecular Design and Precise Breeding Key Laboratory of Guangdong Province, School of Life Science and Engineering, Foshan University, Foshan, 528225, China
- Zhiqing Huang
- Institute of Animal Nutrition, Sichuan Agricultural University, Chengdu, 611130, China
- Jideng Ma
- Livestock and Poultry Multi-omics Key Laboratory of Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, 611130, China
- Animal Breeding and Genetics Key Laboratory of Sichuan Province, Institute of Animal Genetics and Breeding, Sichuan Agricultural University, Chengdu, 611130, China
- Xiaolan Fan
- Livestock and Poultry Multi-omics Key Laboratory of Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, 611130, China
- Animal Breeding and Genetics Key Laboratory of Sichuan Province, Institute of Animal Genetics and Breeding, Sichuan Agricultural University, Chengdu, 611130, China
- Linyuan Shen
- Livestock and Poultry Multi-omics Key Laboratory of Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, 611130, China
- Animal Breeding and Genetics Key Laboratory of Sichuan Province, Institute of Animal Genetics and Breeding, Sichuan Agricultural University, Chengdu, 611130, China
- Li Zhu
- Livestock and Poultry Multi-omics Key Laboratory of Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, 611130, China
- Animal Breeding and Genetics Key Laboratory of Sichuan Province, Institute of Animal Genetics and Breeding, Sichuan Agricultural University, Chengdu, 611130, China
- Yanzhi Jiang
- Livestock and Poultry Multi-omics Key Laboratory of Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, 611130, China
- Guoqing Tang
- Livestock and Poultry Multi-omics Key Laboratory of Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, 611130, China
- Animal Breeding and Genetics Key Laboratory of Sichuan Province, Institute of Animal Genetics and Breeding, Sichuan Agricultural University, Chengdu, 611130, China
- Bin Feng
- Institute of Animal Nutrition, Sichuan Agricultural University, Chengdu, 611130, China
- Bo Zeng
- Livestock and Poultry Multi-omics Key Laboratory of Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, 611130, China
- Ya'an Digital Economy Operation Company, Ya'an, 625014, China
- Liangpeng Ge
- Pig Industry Sciences Key Laboratory of Ministry of Agriculture and Rural Affairs, Chongqing Academy of Animal Sciences, Chongqing, 402460, China
- Xuewei Li
- Livestock and Poultry Multi-omics Key Laboratory of Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, 611130, China
- Animal Breeding and Genetics Key Laboratory of Sichuan Province, Institute of Animal Genetics and Breeding, Sichuan Agricultural University, Chengdu, 611130, China
- Qianzi Tang
- Livestock and Poultry Multi-omics Key Laboratory of Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, 611130, China
- Animal Breeding and Genetics Key Laboratory of Sichuan Province, Institute of Animal Genetics and Breeding, Sichuan Agricultural University, Chengdu, 611130, China
- Zhihua Zhang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, 100101, Beijing, China.
- School of Life Science, University of Chinese Academy of Sciences, 100049, Beijing, China.
- Mingzhou Li
- Livestock and Poultry Multi-omics Key Laboratory of Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, 611130, China.
- Animal Breeding and Genetics Key Laboratory of Sichuan Province, Institute of Animal Genetics and Breeding, Sichuan Agricultural University, Chengdu, 611130, China.
19
Nevers Y, Glover NM, Dessimoz C, Lecompte O. Protein length distribution is remarkably uniform across the tree of life. Genome Biol 2023; 24:135. [PMID: 37291671 PMCID: PMC10251718 DOI: 10.1186/s13059-023-02973-2] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 02/03/2022] [Accepted: 05/16/2023] [Indexed: 06/10/2023] Open
Abstract
BACKGROUND In every living species, the function of a protein depends on its organization of structural domains, and the length of a protein is a direct reflection of this. Because every species evolved under different evolutionary pressures, the protein length distribution, much like other genomic features, is expected to vary across species but has so far been scarcely studied. RESULTS Here we evaluate this diversity by comparing protein length distribution across 2326 species (1688 bacteria, 153 archaea, and 485 eukaryotes). We find that proteins tend to be on average slightly longer in eukaryotes than in bacteria or archaea, but that the variation of length distribution across species is low, especially compared to the variation of other genomic features (genome size, number of proteins, gene length, GC content, isoelectric points of proteins). Moreover, most cases of atypical protein length distribution appear to be due to artifactual gene annotation, suggesting the actual variation of protein length distribution across species is even smaller. CONCLUSIONS These results open the way for developing a genome annotation quality metric based on protein length distribution to complement conventional quality measures. Overall, our findings show that protein length distribution between living species is more uniform than previously thought. Furthermore, we also provide evidence for a universal selection on protein length, yet its mechanism and fitness effect remain intriguing open questions.
Affiliation(s)
- Yannis Nevers
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland.
- Swiss Institute for Bioinformatics, University of Lausanne, Lausanne, Switzerland.
- Natasha M Glover
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Institute for Bioinformatics, University of Lausanne, Lausanne, Switzerland
- Christophe Dessimoz
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Institute for Bioinformatics, University of Lausanne, Lausanne, Switzerland
- Department of Computer Science, University College London, London, UK
- Centre for Life's Origins and Evolution, Department of Genetics, Evolution and Environment, University College London, London, UK
- Odile Lecompte
- Department of Computer Science, Centre de Recherche en Biomédecine de Strasbourg, ICube, UMR 7357, University of Strasbourg, CNRS, Strasbourg, France
20
Piya AA, DeGiorgio M, Assis R. Predicting gene expression divergence between single-copy orthologs in two species. Genome Biol Evol 2023; 15:evad078. [PMID: 37170892 PMCID: PMC10220509 DOI: 10.1093/gbe/evad078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2022] [Revised: 04/21/2023] [Accepted: 05/02/2023] [Indexed: 05/13/2023] Open
Abstract
Predicting gene expression divergence is integral to understanding the emergence of new biological functions and associated traits. Whereas several sophisticated methods have been developed for this task, their applications are either limited to duplicate genes or require expression data from more than two species. Thus, here we present PiXi, the first machine learning framework for predicting gene expression divergence between single-copy orthologs in two species. PiXi models gene expression evolution as an Ornstein-Uhlenbeck process, and overlays this model with multi-layer neural network, random forest, and support vector machine architectures for making predictions. It outputs the predicted class "conserved" or "diverged" for each pair of orthologs, as well as their predicted expression optima in the two species. We show that PiXi has high power and accuracy in predicting gene expression divergence between single-copy orthologs, as well as high accuracy and precision in estimating their expression optima in the two species, across a wide range of evolutionary scenarios, with the globally best performance achieved by a multi-layer neural network. Moreover, application of our best performing PiXi predictor to empirical gene expression data from single-copy orthologs residing at different loci in two species of Drosophila reveals that approximately 23% underwent expression divergence after positional relocation. Further analysis shows that several of these "diverged" genes are involved in the electron transport chain of the mitochondrial membrane, suggesting that new chromatin environments may impact energy production in Drosophila. Thus, by providing a toolkit for predicting gene expression divergence between single-copy orthologs in two species, PiXi can shed light on the origins of novel phenotypes across diverse biological processes and study systems.
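PiXi is described above as modeling expression evolution with an Ornstein-Uhlenbeck (OU) process. As a hedged illustration of what that model implies, and not PiXi's own code, the Python sketch below evaluates the standard closed-form mean and variance of an OU trait that starts at `x0` and is pulled toward an optimum `theta` at rate `alpha` with diffusion `sigma`. Reading "conserved" versus "diverged" orthologs as sharing or not sharing an optimum is our simplifying assumption; the function name `ou_moments` and all parameter values are hypothetical.

```python
import math


def ou_moments(x0: float, theta: float, alpha: float, sigma: float, t: float) -> tuple[float, float]:
    """Mean and variance of an OU process X(t) with X(0) = x0.

    Model: dX = alpha * (theta - X) dt + sigma dW, with alpha > 0.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive for an OU process")
    decay = math.exp(-alpha * t)
    # The mean relaxes exponentially from x0 toward the optimum theta.
    mean = theta + (x0 - theta) * decay
    # The variance approaches the stationary value sigma^2 / (2 * alpha).
    variance = sigma**2 / (2 * alpha) * (1 - decay**2)
    return mean, variance


# Hypothetical example: one ancestral expression level evolving in two species,
# with a shared optimum in species 1 and a shifted optimum in species 2.
ancestral = 2.0
mean_sp1, var_sp1 = ou_moments(ancestral, theta=2.0, alpha=1.0, sigma=0.5, t=10.0)
mean_sp2, var_sp2 = ou_moments(ancestral, theta=5.0, alpha=1.0, sigma=0.5, t=10.0)
print(f"species 1: mean={mean_sp1:.3f}, var={var_sp1:.3f}")
print(f"species 2: mean={mean_sp2:.3f}, var={var_sp2:.3f}")
```

After enough time both lineages reach the same stationary variance, so under this toy reading any systematic expression difference between the two orthologs comes from their optima rather than from drift.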
Affiliation(s)
- Antara Anika Piya
- Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, Florida, USA
- Michael DeGiorgio
- Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, Florida, USA
- Raquel Assis
- Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, Florida, USA
- Institute for Human Health and Disease Intervention, Florida Atlantic University, Boca Raton, Florida, USA
21
Titus-McQuillan JE, Nanni AV, McIntyre LM, Rogers RL. Estimating transcriptome complexities across eukaryotes. BMC Genomics 2023; 24:254. [PMID: 37170194 PMCID: PMC10173493 DOI: 10.1186/s12864-023-09326-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 02/02/2023] [Accepted: 04/20/2023] [Indexed: 05/13/2023] Open
Abstract
BACKGROUND Genomic complexity is a growing area of evolutionary research, with case studies for comparative evolutionary analyses in model and emerging non-model systems. Understanding complexity and the functional components of the genome is an untapped wealth of knowledge ripe for exploration. With the "remarkable lack of correspondence" between genome size and complexity, there needs to be a way to quantify complexity across organisms. In this study, we use a set of complexity metrics that allow for evaluating changes in complexity using TranD. RESULTS We ascertain whether complexity is increasing or decreasing across transcriptomes, and at what structural level complexity varies. In this study, we define three metrics (TpG, EpT, and EpG) to quantify transcriptome complexity in a way that encapsulates the dynamics of alternative splicing. Here we compare complexity metrics across 1) whole genome annotations, 2) a filtered subset of orthologs, and 3) novel genes to elucidate the impacts of orthologs and novel genes in transcript model analysis. Effective Exon Number (EEN) is used to compare the distribution of exon sizes within transcripts against random expectations of uniform exon placement. EEN accounts for differences in exon size, which is important because complexity comparisons among novel genes, orthologs, and whole transcriptomes are biased towards low-complexity genes with few exons and few alternative transcripts. CONCLUSIONS With our metric analyses, we are able to quantify changes in complexity across diverse lineages with greater precision and accuracy than previous cross-species comparisons under ortholog conditioning. These analyses represent a step toward whole-transcriptome analysis in the emerging field of non-model evolutionary genomics, with key insights for evolutionary inference of complexity changes on deep timescales across the tree of life.
We suggest a means to quantify biases generated in ortholog calling and correct complexity analysis for lineage-specific effects. With these metrics, we directly assay the quantitative properties of newly formed lineage-specific genes as they lower complexity.
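The abstract names TpG, EpT, and EpG without expanding them. Reading them as transcripts per gene, exons per transcript, and exons per gene (counting each distinct exon coordinate once) is our assumption. Under that assumption, a minimal Python sketch over a hypothetical toy annotation might look like the following; it is not the TranD implementation, the function `complexity_metrics` and the `toy` data are illustrative only, and EEN is not computed.

```python
from statistics import mean

# Hypothetical toy annotation: gene -> transcript ID -> exon (start, end) coordinates.
Annotation = dict[str, dict[str, list[tuple[int, int]]]]


def complexity_metrics(annotation: Annotation) -> dict[str, dict[str, float]]:
    """Per-gene TpG, mean EpT, and EpG under the assumed definitions above."""
    metrics: dict[str, dict[str, float]] = {}
    for gene, transcripts in annotation.items():
        exon_counts = [len(exons) for exons in transcripts.values()]
        # Exons shared by several transcripts are counted once per gene.
        distinct_exons = {exon for exons in transcripts.values() for exon in exons}
        metrics[gene] = {
            "TpG": len(transcripts),
            "EpT": mean(exon_counts),
            "EpG": len(distinct_exons),
        }
    return metrics


toy = {
    "geneA": {
        "tx1": [(1, 100), (200, 300), (400, 500)],
        "tx2": [(1, 100), (400, 500)],
    },
    "geneB": {"tx1": [(10, 90)]},
}
for gene, values in complexity_metrics(toy).items():
    print(gene, values)
```

In this toy case geneA's three-exon and two-exon isoforms give a mean EpT of 2.5 but an EpG of 3, because the exons shared by both isoforms are counted once at the gene level.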
Affiliation(s)
- James E Titus-McQuillan
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC, 28223, USA.
- Adalena V Nanni
- Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL, 32611, USA
- University of Florida Genetics Institute, University of Florida, Gainesville, FL, 32611, USA
- Lauren M McIntyre
- Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL, 32611, USA
- University of Florida Genetics Institute, University of Florida, Gainesville, FL, 32611, USA
- Rebekah L Rogers
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC, 28223, USA
22
Yi W, Luan A, Liu C, Wu J, Zhang W, Zhong Z, Wang Z, Yang M, Chen C, He Y. Genome-wide identification, phylogeny, and expression analysis of GRF transcription factors in pineapple (Ananas comosus). Front Plant Sci 2023; 14:1159223. [PMID: 37123828 PMCID: PMC10140365 DOI: 10.3389/fpls.2023.1159223] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/05/2023] [Accepted: 03/17/2023] [Indexed: 05/03/2023]
Abstract
Background Pineapple is the only commercially grown fruit crop in the Bromeliaceae family and has significant agricultural, industrial, economic, and ornamental value. GRF (growth-regulating factor) proteins are important transcription factors that have evolved in land plants (embryophytes). They contain two conserved domains, QLQ (Gln, Leu, Gln) and WRC (Trp, Arg, Cys), and regulate multiple aspects of plant growth and stress response, including floral organ development, leaf growth, and hormone responses. The GRF family has been characterized in a number of plant species, but little is known about this family in pineapple and other bromeliads. Main discoveries We identified eight GRF transcription factor genes in pineapple, and phylogenetic analysis placed them into five subfamilies (I, III, IV, V, VI). Segmental duplication appeared to be the major contributor to expansion of the AcGRF family, and the family has undergone strong purifying selection during evolution. Relative to that of other gene families, the gene structure of the GRF family showed less conservation. Analysis of promoter cis-elements suggested that AcGRF genes are widely involved in plant growth and development. Transcriptome data and qRT-PCR results showed that, with the exception of AcGRF5, the AcGRFs were preferentially expressed in the early stage of floral organ development, and AcGRF2 was strongly expressed in ovules. Gibberellin treatment significantly induced AcGRF7/8 expression, suggesting that these two genes may be involved in the molecular regulatory pathway by which gibberellin promotes pineapple fruit expansion. Conclusion AcGRF proteins appear to play a role in the regulation of floral organ development and the response to gibberellin. The information reported here provides a foundation for further study of the functions of AcGRF genes and the traits they regulate.
Affiliation(s)
- Wen Yi
- Key Laboratory of Biology and Germplasm Enhancement of Horticultural Crops in South China, Ministry of Agriculture and Rural Areas, College of Horticulture, South China Agricultural University, Guangzhou, China
- Aiping Luan
- Tropical Crops Genetic Resources Institute, Chinese Academy of Tropical Agricultural Sciences, Haikou, China
- Chaoyang Liu
- Key Laboratory of Biology and Germplasm Enhancement of Horticultural Crops in South China, Ministry of Agriculture and Rural Areas, College of Horticulture, South China Agricultural University, Guangzhou, China
- Jing Wu
- Key Laboratory of Biology and Germplasm Enhancement of Horticultural Crops in South China, Ministry of Agriculture and Rural Areas, College of Horticulture, South China Agricultural University, Guangzhou, China
- Wei Zhang
- Key Laboratory of Biology and Germplasm Enhancement of Horticultural Crops in South China, Ministry of Agriculture and Rural Areas, College of Horticulture, South China Agricultural University, Guangzhou, China
- Ziqin Zhong
- Key Laboratory of Biology and Germplasm Enhancement of Horticultural Crops in South China, Ministry of Agriculture and Rural Areas, College of Horticulture, South China Agricultural University, Guangzhou, China
- Zhengpeng Wang
- Key Laboratory of Biology and Germplasm Enhancement of Horticultural Crops in South China, Ministry of Agriculture and Rural Areas, College of Horticulture, South China Agricultural University, Guangzhou, China
- Mingzhe Yang
- Key Laboratory of Biology and Germplasm Enhancement of Horticultural Crops in South China, Ministry of Agriculture and Rural Areas, College of Horticulture, South China Agricultural University, Guangzhou, China
- Chengjie Chen
- Key Laboratory of Biology and Germplasm Enhancement of Horticultural Crops in South China, Ministry of Agriculture and Rural Areas, College of Horticulture, South China Agricultural University, Guangzhou, China
- Yehua He
- Key Laboratory of Biology and Germplasm Enhancement of Horticultural Crops in South China, Ministry of Agriculture and Rural Areas, College of Horticulture, South China Agricultural University, Guangzhou, China
23
Laslo M, Just J, Angelini DR. Theme and variation in the evolution of insect sex determination. JOURNAL OF EXPERIMENTAL ZOOLOGY. PART B, MOLECULAR AND DEVELOPMENTAL EVOLUTION 2023; 340:162-181. [PMID: 35239250 PMCID: PMC10078687 DOI: 10.1002/jez.b.23125] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/07/2021] [Revised: 11/24/2021] [Accepted: 01/03/2022] [Indexed: 11/07/2022]
Abstract
The development of dimorphic adult sexes is a critical process for most animals, one that is subject to intense selection. Work in vertebrate and insect model species has revealed that sex determination mechanisms vary widely among animal groups. However, this variation is not uniform, with a limited number of conserved factors. Therefore, sex determination offers an excellent context to consider themes and variations in gene network evolution. Here we review the literature describing sex determination in diverse insects. We have screened public genomic sequence databases for orthologs and duplicates of 25 genes involved in insect sex determination, identifying patterns of presence and absence. These genes and a reference set of 43 others were used to infer phylogenies, which were compared to accepted organismal relationships to examine patterns of congruence and divergence. The functions of candidate genes for roles in sex determination (virilizer, female-lethal-2-d, transformer-2) and sex chromosome dosage compensation (male specific lethal-1, msl-2, msl-3) were tested using RNA interference in the milkweed bug, Oncopeltus fasciatus. None of these candidate genes exhibited conserved roles in these processes. Amidst this variation we wish to highlight the following themes for the evolution of sex determination: (1) Unique features within taxa influence network evolution. (2) A component's position in the network influences its evolution. Our analyses also suggest an inverse association of protein sequence conservation with functional conservation.
Affiliation(s)
- Mara Laslo
- Department of Cell Biology, Curriculum Fellows Program, Harvard Medical School, 25 Shattuck St, Boston, Massachusetts, USA
- Josefine Just
- Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford St, Cambridge, Massachusetts, USA
- Department of Biology, Colby College, 5734 Mayflower Hill Dr, Waterville, Maine, USA
- David R. Angelini
- Department of Biology, Colby College, 5734 Mayflower Hill Dr, Waterville, Maine, USA
24
Rosenski J, Shifman S, Kaplan T. Predicting gene knockout effects from expression data. BMC Med Genomics 2023; 16:26. [PMID: 36803845 PMCID: PMC9938619 DOI: 10.1186/s12920-023-01446-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/29/2022] [Accepted: 01/27/2023] [Indexed: 02/19/2023]
Abstract
BACKGROUND The study of gene essentiality, which measures the importance of a gene for cell division and survival, is used for the identification of cancer drug targets and understanding of tissue-specific manifestation of genetic conditions. In this work, we analyze essentiality and gene expression data from over 900 cancer cell lines from the DepMap project to create predictive models of gene essentiality. METHODS We developed machine learning algorithms to identify those genes whose essentiality levels are explained by the expression of a small set of "modifier genes". To identify these gene sets, we developed an ensemble of statistical tests capturing linear and non-linear dependencies. We trained several regression models predicting the essentiality of each target gene, and used an automated model selection procedure to identify the optimal model and hyperparameters. Overall, we examined linear models, gradient boosted trees, Gaussian process regression models, and deep learning networks. RESULTS We identified nearly 3000 genes for which we accurately predict essentiality using gene expression data of a small set of modifier genes. We show that our model outperforms current state-of-the-art works both in the number of genes for which we successfully make predictions and in prediction accuracy. CONCLUSIONS Our modeling framework avoids overfitting by identifying the small set of modifier genes, which are of clinical and genetic importance, and ignores the expression of noisy and irrelevant genes. Doing so improves the accuracy of essentiality prediction in various conditions and provides interpretable models. Overall, we present an accurate computational approach, as well as interpretable modeling of essentiality in a wide range of cellular conditions, thus contributing to a better understanding of the molecular mechanisms that govern tissue-specific effects of genetic disease and cancer.
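The Rosenski et al. abstract describes selecting, for each target gene, a small set of "modifier genes" using an ensemble of statistical tests that capture linear and non-linear dependencies. The sketch below illustrates only that selection idea, under simplifying assumptions. It uses two tests, Pearson for linear association and Spearman for monotonic association. It scores each gene by the larger of the two absolute values and applies a fixed top-k cut. The helper names and toy data are hypothetical. This is not the authors' pipeline, which also trained regression models and used automated model selection.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (linear association)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant profile carries no signal
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson on ranks (monotonic, possibly non-linear)."""
    return pearson(ranks(x), ranks(y))


def select_modifiers(essentiality, expression, k=3):
    """Hypothetical helper: score each candidate gene by the stronger of |Pearson|
    and |Spearman| against the target's essentiality, then keep the top k."""
    scores = {
        gene: max(abs(pearson(expr, essentiality)), abs(spearman(expr, essentiality)))
        for gene, expr in expression.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data: essentiality of one target gene and expression of three candidate
# genes across six hypothetical cell lines.
essentiality = [-1.2, -0.8, -0.3, 0.1, 0.4, 0.9]
expression = {
    "GENE_A": [5.0, 4.1, 3.2, 2.0, 1.1, 0.2],  # decreases steadily
    "GENE_B": [1.0, 1.0, 1.1, 0.9, 1.0, 1.0],  # nearly flat
    "GENE_C": [0.1, 0.2, 0.4, 0.9, 2.5, 8.0],  # increases monotonically, non-linearly
}
print(sorted(select_modifiers(essentiality, expression, k=2)))  # ['GENE_A', 'GENE_C']
```

GENE_C has a clearly non-linear profile, but Spearman still scores it near 1. The ensemble therefore keeps it alongside the roughly linear GENE_A.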
Affiliation(s)
- Jonathan Rosenski
- School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
- Sagiv Shifman
- Department of Genetics, The Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem, Israel
- Tommy Kaplan
- School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
- Department of Developmental Biology and Cancer Research, Faculty of Medicine, The Hebrew University of Jerusalem, Jerusalem, Israel
25
Escorcia-Rodríguez JM, Esposito M, Freyre-González JA, Moreno-Hagelsieb G. Non-synonymous to synonymous substitutions suggest that orthologs tend to keep their functions, while paralogs are a source of functional novelty. PeerJ 2022; 10:e13843. [PMID: 36065404 PMCID: PMC9440661 DOI: 10.7717/peerj.13843] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/25/2020] [Accepted: 07/14/2022] [Indexed: 01/18/2023]
Abstract
Orthologs separate after lineages split from each other, and paralogs after gene duplications. Thus, orthologs are expected to remain more functionally coherent across lineages, while paralogs have been proposed as a source of new functions. Because protein functional divergence follows from non-synonymous substitutions, we performed an analysis based on the ratio of non-synonymous to synonymous substitutions (dN/dS) as a proxy for functional divergence. We used five working definitions of orthology, including reciprocal best hits (RBH), among other definitions based on network analyses and clustering. The results showed that orthologs, by all definitions tested, had noticeably lower dN/dS values than paralogs, suggesting that orthologs generally tend to be more functionally stable than paralogs. The differences in dN/dS ratios, and thus the suggested functional stability of orthologs, persisted after eliminating gene comparisons with potential problems, such as genes with high codon usage biases, low coverage of either of the aligned sequences, or sequences with very high similarities. Separating pairs by percent identity of the encoded proteins showed that the differences between the dN/dS ratios of orthologs and paralogs were most evident at high sequence identity and less so as identity dropped. These results suggest that the differences between dN/dS ratios were partially related to differences in protein identity. However, they also suggest that paralogs undergo functional divergence relatively early after duplication. Our analyses indicate that treating orthologs as probably functionally coherent remains the right approach in comparative genomics.
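The Escorcia-Rodríguez et al. abstract lists reciprocal best hits (RBH) as one of its five orthology definitions. RBH can be sketched as follows, assuming precomputed pairwise similarity scores in each direction (for example, BLAST bit scores). The function names and scores are hypothetical, and the dN/dS computation itself is not shown.

```python
def best_hits(scores, queries, targets):
    """Map each query to its highest-scoring target (None if it has no hit).
    scores: dict keyed by (query, target) -> similarity score."""
    best = {}
    for q in queries:
        hits = [(scores[(q, t)], t) for t in targets if (q, t) in scores]
        # max() on (score, name) tuples; exact score ties fall back to the name
        best[q] = max(hits)[1] if hits else None
    return best


def reciprocal_best_hits(scores_ab, scores_ba, genes_a, genes_b):
    """Pairs (a, b) where b is a's best hit in genome B and a is b's best hit in genome A."""
    a_to_b = best_hits(scores_ab, genes_a, genes_b)
    b_to_a = best_hits(scores_ba, genes_b, genes_a)
    return sorted((a, b) for a, b in a_to_b.items() if b is not None and b_to_a.get(b) == a)


# Toy scores: a1 and a2 are paralogs in genome A, and both hit b1 best.
scores_ab = {("a1", "b1"): 250.0, ("a1", "b2"): 90.0,
             ("a2", "b1"): 200.0, ("a2", "b2"): 120.0}
scores_ba = {("b1", "a1"): 245.0, ("b1", "a2"): 198.0,
             ("b2", "a1"): 88.0, ("b2", "a2"): 118.0}
print(reciprocal_best_hits(scores_ab, scores_ba, ["a1", "a2"], ["b1", "b2"]))
# [('a1', 'b1')]
```

Here b2's best hit is a2, but a2's best hit is b1, so (a2, b2) is not reported. In this example RBH pairs only one member of the paralog pair a1/a2.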
Affiliation(s)
- Juan M. Escorcia-Rodríguez
- Regulatory Systems Biology Research Group, Program of Systems Biology, Center for Genomic Sciences, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
- Mario Esposito
- Department of Biology, Wilfrid Laurier University, Waterloo, Canada
- Julio A. Freyre-González
- Regulatory Systems Biology Research Group, Program of Systems Biology, Center for Genomic Sciences, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
26
Garrido-Gala J, Higuera JJ, Rodríguez-Franco A, Muñoz-Blanco J, Amil-Ruiz F, Caballero JL. A Comprehensive Study of the WRKY Transcription Factor Family in Strawberry. PLANTS 2022; 11:plants11121585. [PMID: 35736736 PMCID: PMC9229891 DOI: 10.3390/plants11121585] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2022] [Revised: 06/10/2022] [Accepted: 06/11/2022] [Indexed: 11/16/2022]
Abstract
WRKY transcription factors play critical roles in plant growth, development, and stress responses. Using up-to-date genomic data, a total of 64 and 257 WRKY genes have been identified in the diploid woodland strawberry, Fragaria vesca, and the more complex allo-octoploid commercial strawberry, Fragaria × ananassa cv. Camarosa, respectively. The completeness of the new genomes and annotations has enabled us to perform a more detailed evolutionary and functional study of the strawberry WRKY family members, particularly in the cultivated hybrid, in which homoeologous and paralogous FaWRKY genes have been characterized. Analysis of the available expression profiles revealed that many strawberry WRKY genes show preferential or tissue-specific expression. Furthermore, significant differential expression of several FaWRKY genes was clearly detected in fruit receptacles and achenes during ripening and after pathogen challenge, supporting specific functional roles for these genes in these processes. We also present an extensive analysis of predicted development-, stress-, and hormone-responsive cis-acting elements in the strawberry WRKY family. Our results provide a deeper and more comprehensive knowledge of the WRKY gene family in strawberry.
Affiliation(s)
- José-Javier Higuera
- Departamento de Bioquímica y Biología Molecular, Campus Universitario de Rabanales y Campus de Excelencia Internacional Agroalimentario ceiA3, Edificio Severo Ochoa-C6, Universidad de Córdoba, 14071 Córdoba, Spain
- Antonio Rodríguez-Franco
- Departamento de Bioquímica y Biología Molecular, Campus Universitario de Rabanales y Campus de Excelencia Internacional Agroalimentario ceiA3, Edificio Severo Ochoa-C6, Universidad de Córdoba, 14071 Córdoba, Spain
- Juan Muñoz-Blanco
- Departamento de Bioquímica y Biología Molecular, Campus Universitario de Rabanales y Campus de Excelencia Internacional Agroalimentario ceiA3, Edificio Severo Ochoa-C6, Universidad de Córdoba, 14071 Córdoba, Spain
- Francisco Amil-Ruiz
- Unidad de Bioinformática, Servicio Central de Apoyo a la Investigación (SCAI), Universidad de Córdoba, 14071 Córdoba, Spain
- José L. Caballero
- Departamento de Bioquímica y Biología Molecular, Campus Universitario de Rabanales y Campus de Excelencia Internacional Agroalimentario ceiA3, Edificio Severo Ochoa-C6, Universidad de Córdoba, 14071 Córdoba, Spain
27
Heinzinger M, Littmann M, Sillitoe I, Bordin N, Orengo C, Rost B. Contrastive learning on protein embeddings enlightens midnight zone. NAR Genom Bioinform 2022; 4:lqac043. [PMID: 35702380 PMCID: PMC9188115 DOI: 10.1093/nargab/lqac043] [Citation(s) in RCA: 42] [Impact Index Per Article: 14.0] [Received: 11/22/2021] [Revised: 03/25/2022] [Accepted: 05/17/2022] [Indexed: 12/23/2022]
Abstract
Experimental structures are leveraged through multiple sequence alignments, or more generally through homology-based inference (HBI), facilitating the transfer of information from a protein with known annotation to a query without any annotation. A recent alternative expands the concept of HBI from sequence-distance lookup to embedding-based annotation transfer (EAT). These embeddings are derived from protein Language Models (pLMs). Here, we introduce the use of single-protein representations from pLMs for contrastive learning. This learning procedure creates a new set of embeddings that optimizes constraints captured by hierarchical classifications of protein 3D structures defined by the CATH resource. The approach, dubbed ProtTucker, is better able to recognize distant homologous relationships than more traditional techniques such as threading or fold recognition. Thus, these embeddings have allowed sequence comparison to step into the 'midnight zone' of protein similarity, i.e. the region in which distantly related sequences have a seemingly random pairwise sequence similarity. The novelty of this work lies in the particular combination of tools and sampling techniques that achieved performance comparable to or better than existing state-of-the-art sequence comparison methods. Additionally, since this method does not need to generate alignments, it is also orders of magnitude faster. The code is available at https://github.com/Rostlab/EAT.
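As the Heinzinger et al. abstract describes, embedding-based annotation transfer (EAT) replaces a sequence-distance lookup with a nearest-neighbour search in embedding space. The following is a minimal sketch under two assumptions. It uses Euclidean distance, and it takes the embeddings as precomputed (producing them with a pLM or ProtTucker is not shown). The vectors and labels are toy placeholders, not output from the Rostlab/EAT code.

```python
from math import dist


def embedding_annotation_transfer(query, lookup):
    """Return (annotation, distance) of the annotated protein whose embedding is
    closest to the query embedding (Euclidean distance, assumed here).
    lookup: list of (embedding, annotation) pairs."""
    nearest_embedding, annotation = min(lookup, key=lambda item: dist(query, item[0]))
    return annotation, dist(query, nearest_embedding)


# Toy 3-dimensional vectors; real pLM embeddings are much higher-dimensional.
lookup = [
    ([0.1, 0.9, 0.0], "fold_A"),
    ([0.8, 0.1, 0.2], "fold_B"),
]
label, distance = embedding_annotation_transfer([0.2, 0.8, 0.1], lookup)
print(label, round(distance, 3))  # fold_A 0.173
```

Returning the distance alongside the label lets a caller apply a cutoff and decline to transfer an annotation when even the nearest neighbour is far away.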
Affiliation(s)
- Michael Heinzinger
- TUM (Technical University of Munich) Dept Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr. 3, 85748 Garching/Munich, Germany
- TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748 Garching, Germany
- Maria Littmann
- TUM (Technical University of Munich) Dept Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr. 3, 85748 Garching/Munich, Germany
- Ian Sillitoe
- Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Nicola Bordin
- Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Christine Orengo
- Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Burkhard Rost
- TUM (Technical University of Munich) Dept Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr. 3, 85748 Garching/Munich, Germany
- Institute for Advanced Study (TUM-IAS), Lichtenbergstr. 2a, 85748 Garching, Germany & TUM School of Life Sciences Weihenstephan (WZW), Alte Akademie 8, Freising, Germany
28
Crow M, Suresh H, Lee J, Gillis J. Coexpression reveals conserved gene programs that co-vary with cell type across kingdoms. Nucleic Acids Res 2022; 50:4302-4314. [PMID: 35451481 PMCID: PMC9071420 DOI: 10.1093/nar/gkac276] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 05/29/2021] [Revised: 03/30/2022] [Accepted: 04/08/2022] [Indexed: 12/24/2022]
Abstract
What makes a mouse a mouse, and not a hamster? Differences in gene regulation between the two organisms play a critical role. Comparative analysis of gene coexpression networks provides a general framework for investigating the evolution of gene regulation across species. Here, we compare coexpression networks from 37 species and quantify the conservation of gene activity 1) as a function of evolutionary time, 2) across orthology prediction algorithms, and 3) with reference to cell- and tissue-specificity. We find that ancient genes are expressed in multiple cell types and have well-conserved coexpression patterns; however, they are expressed at different levels across cell types. Thus, differential regulation of ancient gene programs contributes to transcriptional cell identity. We propose that this differential regulation may play a role in cell diversification in both the animal and plant kingdoms.
Affiliation(s)
- Megan Crow
- Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor NY, USA
- Hamsini Suresh
- Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor NY, USA
- John Lee
- Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor NY, USA
- Jesse Gillis
- Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor NY, USA
29
Zhao C, Liu T, Wang Z. Functional Similarities of Protein-Coding Genes in Topologically Associating Domains and Spatially-Proximate Genomic Regions. Genes (Basel) 2022; 13:genes13030480. [PMID: 35328034 PMCID: PMC8951421 DOI: 10.3390/genes13030480] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/01/2022] [Revised: 02/26/2022] [Accepted: 03/05/2022] [Indexed: 02/01/2023]
Abstract
Topologically associating domains (TADs) are the structural and functional units of the genome. However, the functions of protein-coding genes located in the same or different TADs have not been fully investigated. We compared the functional similarities of protein-coding genes located in the same TAD and in different TADs, and also in the same gap region (the region between two consecutive TADs) and in different gap regions. We found that protein-coding genes from the same TAD or gap region are more likely to share similar protein functions, and this trend is more pronounced for TADs than for gap regions. We further created two types of gene–gene spatial interaction networks: the first type is based on Hi-C contacts, whereas the second type is based on both Hi-C contacts and the relationship of being in the same TAD. A graph auto-encoder was applied to learn the network topology, reconstruct the two types of networks, and predict the functions of the central genes/nodes based on the functions of the neighboring genes/nodes. Better performance was achieved with the second type of network. Furthermore, we detected long-range spatially-interactive regions based on Hi-C contacts and calculated the functional similarities of the gene pairs from these regions.
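The within-TAD versus between-TAD comparison in the Zhao et al. abstract can be illustrated with a small sketch. The abstract does not specify the similarity measure. Jaccard similarity over sets of functional annotation terms is therefore an assumption here, as are all gene names, terms, and TAD labels.

```python
from itertools import combinations
from statistics import mean


def jaccard(a, b):
    """Jaccard similarity of two annotation sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def within_vs_between(annotations, tad_of):
    """Mean pairwise functional similarity for gene pairs in the same TAD versus
    pairs in different TADs. Assumes both groups of pairs are non-empty."""
    within, between = [], []
    for g1, g2 in combinations(sorted(annotations), 2):
        sim = jaccard(annotations[g1], annotations[g2])
        (within if tad_of[g1] == tad_of[g2] else between).append(sim)
    return mean(within), mean(between)


# Hypothetical genes, functional terms, and TAD assignments.
annotations = {
    "g1": {"t1", "t2"},
    "g2": {"t1", "t2", "t3"},
    "g3": {"t7"},
    "g4": {"t7", "t8"},
}
tad_of = {"g1": "TAD1", "g2": "TAD1", "g3": "TAD2", "g4": "TAD2"}
print(within_vs_between(annotations, tad_of))
```

In this toy input the within-TAD mean is 7/12 (the average of 2/3 and 1/2) and the between-TAD mean is 0. The same function applies to gap regions if `tad_of` maps genes to gap-region labels instead.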
30
Sánchez AL, Lafond M. Colorful orthology clustering in bounded-degree similarity graphs. J Bioinform Comput Biol 2021; 19:2140010. [PMID: 34775924 DOI: 10.1142/s0219720021400102] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022]
Abstract
Clustering genes in similarity graphs is a popular approach for orthology prediction. Most algorithms group genes without considering their species, which results in clusters that contain several paralogous genes. Moreover, clustering is known to be problematic when in-paralogs arise from ancient duplications. Recently, we proposed a two-step process that avoids these problems. First, we infer clusters of only orthologs (i.e. with only genes from distinct species), and second, we infer the missing inter-cluster orthologs. In this paper, we focus on the first step, which leads to a problem we call Colorful Clustering. In general, this is as hard as classical clustering. However, in similarity graphs, the number of species is usually small, as is the number of neighbors each gene has in other species. We therefore study the problem of clustering in which the number of colors is bounded by [Formula: see text], and each gene has at most [Formula: see text] neighbors in another species. We show that the well-known cluster editing formulation remains NP-hard even when [Formula: see text] and [Formula: see text]. We then propose a fixed-parameter algorithm in [Formula: see text] to find the single best cluster in the graph. We implemented this algorithm and included it in the aforementioned two-step approach. Experiments on simulated data show that this approach performs favorably compared with applying only an unconstrained clustering step.
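The sketch below makes the Colorful Clustering setting in the Sánchez and Lafond abstract concrete. It checks whether clusters are colorful, meaning each species appears at most once, and scores a given partition under the cluster editing formulation. It does not solve the NP-hard optimization or implement the authors' fixed-parameter algorithm. The gene names, species, and graph are hypothetical.

```python
from itertools import combinations


def is_colorful(cluster, species_of):
    """True if no two genes in the cluster come from the same species (color)."""
    species = [species_of[g] for g in cluster]
    return len(species) == len(set(species))


def editing_cost(partition, edges):
    """Cluster-editing cost of a given partition: similarity edges that cross
    clusters (deleted) plus missing edges inside clusters (inserted)."""
    edge_set = {frozenset(e) for e in edges}
    cluster_of = {g: i for i, cluster in enumerate(partition) for g in cluster}
    deleted = sum(1 for e in edge_set if len({cluster_of[g] for g in e}) == 2)
    inserted = sum(
        1
        for cluster in partition
        for u, v in combinations(cluster, 2)
        if frozenset((u, v)) not in edge_set
    )
    return deleted + inserted


# Hypothetical genes colored by species, with a small similarity graph.
species_of = {"h1": "human", "h2": "human", "m1": "mouse", "y1": "yeast"}
edges = [("h1", "m1"), ("h2", "m1"), ("m1", "y1"), ("h1", "y1")]
partition = [["h1", "m1", "y1"], ["h2"]]
print(all(is_colorful(c, species_of) for c in partition), editing_cost(partition, edges))
# True 1
```

Merging all four genes into one cluster is not allowed here, because h1 and h2 share a color. The colorful constraint thus forces the in-paralog h2 into its own cluster, at the cost of deleting the h2–m1 edge.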
Affiliation(s)
- Alitzel López Sánchez
- Computer Science Department, Université de Sherbrooke, 2500 Boulevard de l'Université, Sherbrooke, Québec J1K 2R1, Canada
- Manuel Lafond
- Computer Science Department, Université de Sherbrooke, 2500 Boulevard de l'Université, Sherbrooke, Québec J1K 2R1, Canada
31
Begum T, Serrano‐Serrano ML, Robinson‐Rechavi M. Performance of a phylogenetic independent contrast method and an improved pairwise comparison under different scenarios of trait evolution after speciation and duplication. Methods Ecol Evol 2021. [DOI: 10.1111/2041-210x.13680] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/28/2022]
Affiliation(s)
- Tina Begum
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Martha Liliana Serrano‐Serrano
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Marc Robinson‐Rechavi
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
32
Begum T, Robinson-Rechavi M. Special Care Is Needed in Applying Phylogenetic Comparative Methods to Gene Trees with Speciation and Duplication Nodes. Mol Biol Evol 2021; 38:1614-1626. [PMID: 33169790 PMCID: PMC8042747 DOI: 10.1093/molbev/msaa288] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 12/03/2022]
Abstract
How gene function evolves is a central question of evolutionary biology. It can be investigated by comparing functional genomics results between species and between genes. Most comparative studies of functional genomics have used pairwise comparisons. Yet it has been shown that this can provide biased results, as genes, like species, are phylogenetically related. Phylogenetic comparative methods should be used to correct for this, but they depend on strong assumptions, including unbiased tree estimates relative to the hypothesis being tested. Such methods have recently been used to test the “ortholog conjecture,” the hypothesis that functional evolution is faster in paralogs than in orthologs. Although pairwise comparisons of tissue specificity (τ) provided support for the ortholog conjecture, phylogenetic independent contrasts did not. Our reanalysis of the same gene trees identified problems with the time calibration of duplication nodes. We find that the gene trees used suffer from important biases, due to the inclusion of trees with no duplication nodes, to the relative age of speciations and duplications, to systematic differences in branch lengths, and to non-Brownian motion of tissue specificity on many trees. We find that incorrect implementation of phylogenetic methods in empirical gene trees with duplications can be problematic. Controlling for biases allows successful use of phylogenetic methods to study the evolution of gene function and provides some support for the ortholog conjecture using three different phylogenetic approaches.
Affiliation(s)
- Tina Begum
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Marc Robinson-Rechavi
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
33
Tarashansky AJ, Musser JM, Khariton M, Li P, Arendt D, Quake SR, Wang B. Mapping single-cell atlases throughout Metazoa unravels cell type evolution. eLife 2021; 10:e66747. [PMID: 33944782 PMCID: PMC8139856 DOI: 10.7554/elife.66747] [Citation(s) in RCA: 134] [Impact Index Per Article: 33.5] [Received: 01/26/2021] [Accepted: 04/30/2021] [Indexed: 12/11/2022]
Abstract
Comparing single-cell transcriptomic atlases from diverse organisms can elucidate the origins of cellular diversity and assist the annotation of new cell atlases. Yet, comparison between distant relatives is hindered by complex gene histories and diversifications in expression programs. Previously, we introduced the self-assembling manifold (SAM) algorithm to robustly reconstruct manifolds from single-cell data (Tarashansky et al., 2019). Here, we build on SAM to map cell atlas manifolds across species. This new method, SAMap, identifies homologous cell types with shared expression programs across distant species within phyla, even in complex examples where homologous tissues emerge from distinct germ layers. SAMap also finds many genes with more similar expression to their paralogs than their orthologs, suggesting paralog substitution may be more common in evolution than previously appreciated. Lastly, comparing species across animal phyla, spanning sponge to mouse, reveals ancient contractile and stem cell families, which may have arisen early in animal evolution.
Affiliation(s)
- Jacob M Musser
- European Molecular Biology Laboratory, Developmental Biology Unit, Heidelberg, Germany
- Pengyang Li
- Department of Bioengineering, Stanford University, Stanford, United States
- Detlev Arendt
- European Molecular Biology Laboratory, Developmental Biology Unit, Heidelberg, Germany
- Centre for Organismal Studies, University of Heidelberg, Heidelberg, Germany
- Stephen R Quake
- Department of Bioengineering, Stanford University, Stanford, United States
- Department of Applied Physics, Stanford University, Stanford, United States
- Chan Zuckerberg Biohub, San Francisco, United States
- Bo Wang
- Department of Bioengineering, Stanford University, Stanford, United States
- Department of Developmental Biology, Stanford University School of Medicine, Stanford, United States
34
Berkemer SJ, McGlynn SE. A New Analysis of Archaea-Bacteria Domain Separation: Variable Phylogenetic Distance and the Tempo of Early Evolution. Mol Biol Evol 2021; 37:2332-2340. [PMID: 32316034 PMCID: PMC7403611 DOI: 10.1093/molbev/msaa089] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Indexed: 12/29/2022]
Abstract
Comparative genomics and molecular phylogenetics are foundational for understanding biological evolution. Although many studies have aimed to understand the genomic contents of early life, uncertainty remains. A study by Weiss et al. (Weiss MC, Sousa FL, Mrnjavac N, Neukirchen S, Roettger M, Nelson-Sathi S, Martin WF. 2016. The physiology and habitat of the last universal common ancestor. Nat Microbiol. 1(9):16116.) identified a number of protein families in the last universal common ancestor of archaea and bacteria (LUCA) which were not found in previous works. Here, we report new research that suggests the clustering approaches used in this previous study undersampled protein families, resulting in incomplete phylogenetic trees which do not reflect protein family evolution. Phylogenetic analysis of protein families which include more sequence homologs rejects a simple LUCA hypothesis based on phylogenetic separation of the bacterial and archaeal domains for a majority of the previously identified LUCA proteins (∼82%). To supplement limitations of phylogenetic inference derived from incompletely populated orthologous groups and to test the hypothesis of a period of rapid evolution preceding the separation of the domains, we compared phylogenetic distances both within and between domains, for thousands of orthologous groups. We find a substantial diversity of interdomain versus intradomain branch lengths, even among protein families which exhibit a single domain separating branch and are thought to be associated with the LUCA. Additionally, phylogenetic trees with long interdomain branches relative to intradomain branches are enriched in information categories of protein families in comparison to those associated with metabolic functions. These results provide a new view of protein family evolution and temper claims about the phenotype and habitat of the LUCA.
Affiliation(s)
- Sarah J Berkemer
- Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany
- Bioinformatics Group, Department of Computer Science, University Leipzig, Leipzig, Germany
- Competence Center for Scalable Data Services and Solutions, Dresden/Leipzig, Germany
- Shawn E McGlynn
- Earth-Life Science Institute, Tokyo Institute of Technology, Meguro, Tokyo, Japan
- Blue Marble Space Institute of Science, Seattle, WA
- RIKEN Center for Sustainable Resource Science (CSRS), Saitama, Japan
35
Schaller D, Geiß M, Stadler PF, Hellmuth M. Complete Characterization of Incorrect Orthology Assignments in Best Match Graphs. J Math Biol 2021; 82:20. [PMID: 33606106 PMCID: PMC7894253 DOI: 10.1007/s00285-021-01564-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 06/04/2020] [Revised: 09/23/2020] [Accepted: 12/21/2020] [Indexed: 02/06/2023]
Abstract
Genome-scale orthology assignments are usually based on reciprocal best matches. In the absence of horizontal gene transfer (HGT), every pair of orthologs forms a reciprocal best match. Incorrect orthology assignments therefore are always false positives in the reciprocal best match graph. We consider duplication/loss scenarios and characterize unambiguous false-positive (u-fp) orthology assignments, that is, edges in the best match graphs (BMGs) that cannot correspond to orthologs for any gene tree that explains the BMG. Moreover, we provide a polynomial-time algorithm to identify all u-fp orthology assignments in a BMG. Simulations show that at least [Formula: see text] of all incorrect orthology assignments can be detected in this manner. All results rely only on the structure of the BMGs and not on any a priori knowledge about underlying gene or species trees.
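To make the notion of a reciprocal best match concrete, the following is a minimal Python sketch, assuming a hypothetical table of pairwise similarity scores between genes of two species (higher means more similar). It only builds the reciprocal best match edges that orthology pipelines start from; it does not implement the authors' polynomial-time algorithm for detecting unambiguous false positives.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


def reciprocal_best_matches(
    scores: Dict[Tuple[str, str], float]
) -> Set[Tuple[str, str]]:
    """Return pairs (a, b) where b is a best match of a AND a is a best match of b.

    `scores` maps (gene in species A, gene in species B) to a similarity score.
    Ties are kept, so a gene may have several best matches.
    """
    best_for_a: Dict[str, float] = defaultdict(lambda: float("-inf"))
    best_for_b: Dict[str, float] = defaultdict(lambda: float("-inf"))
    # First pass: record the highest score seen for every gene on each side.
    for (a, b), s in scores.items():
        best_for_a[a] = max(best_for_a[a], s)
        best_for_b[b] = max(best_for_b[b], s)
    # Second pass: keep only pairs that are maximal from both directions.
    return {
        (a, b)
        for (a, b), s in scores.items()
        if s == best_for_a[a] and s == best_for_b[b]
    }


# Hypothetical scores: a2's best hit is b1, but b1's best hit is a1.
scores = {("a1", "b1"): 90.0, ("a1", "b2"): 40.0,
          ("a2", "b1"): 85.0, ("a2", "b2"): 60.0}
print(reciprocal_best_matches(scores))
```

Here only (a1, b1) is reciprocal; a2 gets no edge even though its best hit is b1. Each returned edge is a candidate orthology assignment, and in the BMG framework some such edges may still be false positives that only the graph structure can reveal.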
Affiliation(s)
- David Schaller
- Max-Planck-Institute for Mathematics in the Sciences, Inselstraße 22, D-04103 Leipzig, Germany
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center of Bioinformatics, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany
- Manuela Geiß
- Software Competence Center Hagenberg GmbH, Softwarepark 21, A-4232 Hagenberg, Austria
- Peter F. Stadler
- Max-Planck-Institute for Mathematics in the Sciences, Inselstraße 22, D-04103 Leipzig, Germany
- Bioinformatics Group, Department of Computer Science, Interdisciplinary Center of Bioinformatics, German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Competence Center for Scalable Data Services and Solutions, and Leipzig Research Center for Civilization Diseases, Leipzig University, Härtelstraße 16-18, D-04107 Leipzig, Germany
- Inst. f. Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Wien, Austria
- Facultad de Ciencias, Universidad Nacional de Colombia, Bogotá, Colombia
- Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM 87501 USA
- Marc Hellmuth
- Department of Mathematics, Faculty of Science, Stockholm University, SE 106 91 Stockholm, Sweden
36
Rosselli R, La Porta N, Muresu R, Stevanato P, Concheri G, Squartini A. Pangenomics of the Symbiotic Rhizobiales. Core and Accessory Functions Across a Group Endowed with High Levels of Genomic Plasticity. Microorganisms 2021; 9:microorganisms9020407. [PMID: 33669391 PMCID: PMC7920277 DOI: 10.3390/microorganisms9020407] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/04/2020] [Revised: 02/10/2021] [Accepted: 02/11/2021] [Indexed: 11/16/2022]
Abstract
Pangenome analyses provide major clues about evolutionary events and the conservation of the critical genome core. The order Rhizobiales encompasses several families with rather disparate ecological lifestyles. Among them, the Rhizobiaceae, Bradyrhizobiaceae, Phyllobacteriaceae and Xanthobacteraceae include members proficient in mutualistic symbioses with plants based on the bacterial conversion of N2 into ammonia (nitrogen fixation). The pangenome of 12 nitrogen-fixing plant symbionts of the Rhizobiales was analyzed, yielding a total of 37,364 loci, with a core genome of 700 genes. Core genes averaged 10.2% of the genes in single genomes, and between 5% and 7% were found to be plasmid-associated. A comparison between a representative reference genome and the core genome subset showed that the core genome is highly enriched in genes for macromolecule metabolism, ribosomal constituents and the overall translation machinery, while membrane/periplasm-associated genes and transport domains are under-represented. The analysis of protein functions revealed that between 1.7% and 4.9% of core proteins could putatively have different functions.
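As a rough illustration of how a core genome and a per-genome core percentage can be computed once loci have been clustered into gene families, here is a Python sketch. It assumes the family assignments already exist (the clustering step is not described in the abstract and is out of scope), and the genome, locus and family names are hypothetical.

```python
from typing import Dict, Set

# Each genome is a hypothetical mapping: locus ID -> gene-family ID.
Genome = Dict[str, str]


def core_families(genomes: Dict[str, Genome]) -> Set[str]:
    """Gene families present in every genome (the core genome)."""
    family_sets = [set(loci.values()) for loci in genomes.values()]
    return set.intersection(*family_sets) if family_sets else set()


def core_percentage(loci: Genome, core: Set[str]) -> float:
    """Percentage of a genome's loci that fall into a core family.

    Paralogous loci in the same core family are each counted.
    """
    if not loci:
        return 0.0
    in_core = sum(1 for family in loci.values() if family in core)
    return 100.0 * in_core / len(loci)


genomes = {
    "g1": {"l1": "F1", "l2": "F2", "l3": "F3", "l4": "F4"},
    "g2": {"m1": "F1", "m2": "F2", "m3": "F5"},
    "g3": {"n1": "F1", "n2": "F2", "n3": "F2", "n4": "F6"},
}
core = core_families(genomes)
for name, loci in genomes.items():
    print(name, round(core_percentage(loci, core), 1))
```

Families outside the intersection (F3 to F6 here) form the accessory genome; averaging the per-genome percentages gives a figure of the same kind as the 10.2% reported above.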
Affiliation(s)
- Riccardo Rosselli
- Department of Marine Microbiology and Biogeochemistry, NIOZ Royal Netherlands Institute of Sea Research, NL-1790 AB Den Burg, The Netherlands
- Departamento de Fisiología, Genética y Microbiología, Universidad de Alicante, 03690 Alicante, Spain
- Nicola La Porta
- Department of Sustainable Agrobiosystems and Bioresources, Research and Innovation Centre, Fondazione Edmund Mach, 38098 San Michele all’Adige, Italy
- MOUNTFOR Project Centre, European Forest Institute, 38098 San Michele all’Adige, Italy
- Rosella Muresu
- Institute of Animal Production Systems in Mediterranean Environments-National Research Council, 07040 Sassari, Italy
- Piergiorgio Stevanato
- Department of Agronomy, Food, Natural Resources, Animals and Environment, University of Padova, 35020 Legnaro, Italy
- Giuseppe Concheri
- Department of Agronomy, Food, Natural Resources, Animals and Environment, University of Padova, 35020 Legnaro, Italy
- Andrea Squartini
- Department of Agronomy, Food, Natural Resources, Animals and Environment, University of Padova, 35020 Legnaro, Italy
- Correspondence: Tel.: +39-049-8272-923
37
New Approaches for Inferring Phylogenies in the Presence of Paralogs. Trends Genet 2021; 37:174-187. [DOI: 10.1016/j.tig.2020.08.012] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 07/08/2020] [Revised: 08/13/2020] [Accepted: 08/19/2020] [Indexed: 12/18/2022]
38
Salmanian S, Pezeshk H, Sadeghi M. Inter-protein residue covariation information unravels physically interacting protein dimers. BMC Bioinformatics 2020; 21:584. [PMID: 33334319 PMCID: PMC7745481 DOI: 10.1186/s12859-020-03930-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 08/12/2020] [Accepted: 12/09/2020] [Indexed: 01/04/2023]
Abstract
BACKGROUND Predicting physical interactions between proteins is one of the greatest challenges in computational biology. There is a considerable variety of protein interactions, and a huge number of protein sequences and synthetic peptides have unknown interacting counterparts. Most co-evolutionary methods detect a combination of physical interplay and functional association; only a handful of approaches specifically infer physical interactions. Hybrid co-evolutionary methods exploit inter-protein residue coevolution to identify specific physically interacting proteins. In this study, we introduce a hybrid co-evolution-based approach to predict physical interplay between pairs of protein families, starting from protein sequences only. RESULTS In the present analysis, pairs of multiple sequence alignments are constructed for each dimer, and the covariation between residues in those pairs is calculated by CCMpred (Contacts from Correlated Mutations predicted) and three mutual information-based approaches for ten accessible surface area threshold groups. All residue couplings between the proteins of each dimer are then unified into a single Frobenius norm value. The norms of the residue contact matrices of all dimers at the different accessible surface area thresholds are fed into a support vector machine as single- or multiple-feature models. Training the classifiers on single features shows no apparent difference in accuracy between methods across accessible surface area thresholds. Nevertheless, the mutual information product and context likelihood of relatedness procedures may have roughly higher and lower overall performance, respectively, than the other two methods across accessible surface area cut-offs. The results also demonstrate that training the support vector machine with multiple norm features for several accessible surface area thresholds leads to a considerable improvement in prediction performance.
In this context, CCMpred achieves a roughly better overall performance than the mutual information-based approaches. The best accuracy, sensitivity, specificity, precision and negative predictive value for that method are 0.98, 1, 0.962, 0.96 and 0.962, respectively. CONCLUSIONS By feeding the norm values of protein dimers at different accessible surface area thresholds into support vector machines, we demonstrate that even a small number of proteins in pairs of multiple alignments can allow one to accurately discriminate between positive and negative dimers.
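Two numeric steps in this abstract are easy to state precisely: collapsing an inter-protein coupling matrix into one Frobenius norm, and computing the reported evaluation metrics from a confusion matrix. The Python sketch below shows both under stated assumptions: the coupling matrix and confusion counts are hypothetical, and the coupling scores themselves (from CCMpred or mutual information) and the SVM training are not reproduced.

```python
import math
from typing import Dict, Sequence


def frobenius_norm(couplings: Sequence[Sequence[float]]) -> float:
    """Collapse a residue coupling matrix (rows: residues of protein 1,
    columns: residues of protein 2) into one scalar: the square root of
    the sum of squared entries."""
    return math.sqrt(sum(c * c for row in couplings for c in row))


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard metrics for a binary classifier (interacting vs not)."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # recall on positive dimers
        "specificity": tn / (tn + fp),   # recall on negative dimers
        "precision": tp / (tp + fp),
        "npv": tn / (tn + fn),           # negative predictive value
    }


# Hypothetical 2x2 coupling matrix for a toy dimer.
print(frobenius_norm([[3.0, 0.0], [0.0, 4.0]]))
# Hypothetical confusion counts.
print(binary_metrics(tp=8, fp=2, tn=9, fn=1))
```

In the described pipeline, one such norm per accessible surface area threshold would become one feature, so a multiple-feature model sees a short vector of norms per dimer.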
Affiliation(s)
- Sara Salmanian
- Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
- Hamid Pezeshk
- School of Mathematics, Statistics and Computer Science, College of Science, University of Tehran, Tehran, Iran
- Present Address: Department of Mathematics and Statistics, Concordia University, Montreal, Canada
- School of Biological Sciences, Institute for Research in Fundamental Sciences, Tehran, Iran
- Mehdi Sadeghi
- National Institute of Genetic Engineering and Biotechnology, Tehran, Iran
39
Agarwal PR, Lahiri A. Comparative study of the SBP-box gene family in rice siblings. J Biosci 2020. [DOI: 10.1007/s12038-020-00048-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
40
Zhang P, Berardini TZ, Ebert D, Li Q, Mi H, Muruganujan A, Prithvi T, Reiser L, Sawant S, Thomas PD, Huala E. PhyloGenes: An online phylogenetics and functional genomics resource for plant gene function inference. Plant Direct 2020; 4:e00293. [PMID: 33392435 PMCID: PMC7773024 DOI: 10.1002/pld3.293] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 11/04/2020] [Accepted: 11/11/2020] [Indexed: 05/22/2023]
Abstract
We aim to enable the accurate and efficient transfer of knowledge about gene function gained from Arabidopsis thaliana and other model organisms to other plant species. This knowledge transfer is frequently challenging in plants due to duplications of individual genes and whole genomes in plant lineages. Such duplications result in complex evolutionary relationships between related genes, which may have similar sequences but highly divergent functions. In such cases, functional inference requires more than a simple sequence similarity calculation. We have developed an online resource, PhyloGenes (phylogenes.org), that displays precomputed phylogenetic trees for plant gene families along with experimentally validated function information for individual genes within the families. A total of 40 plant genomes and 10 non-plant model organisms are represented in over 8,000 gene families. Evolutionary events such as speciation and duplication are clearly labeled on gene trees to distinguish orthologs from paralogs. Nearly 6,000 families have at least one member with an experimentally supported annotation to a Gene Ontology (GO) molecular function or biological process term. By displaying experimentally validated gene functions associated with individual genes within a tree, PhyloGenes enables functional inference for genes of uncharacterized function, based on their evolutionary relationships to experimentally studied genes, in a visually traceable manner. For the many families containing genes that have evolved to perform different functions, PhyloGenes facilitates the use of evolutionary history to determine the most likely function of genes that have not been experimentally characterized. Future work will enrich the resource by incorporating additional gene function datasets such as plant gene expression atlas data.
Affiliation(s)
- Dustin Ebert
- Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA
- Qian Li
- Phoenix Bioinformatics, Fremont, CA, USA
- Huaiyu Mi
- Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA
- Anushya Muruganujan
- Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA
- Paul D. Thomas
- Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA
41
Ahrens JB, Teufel AI, Siltberg-Liberles J. A Phylogenetic Rate Parameter Indicates Different Sequence Divergence Patterns in Orthologs and Paralogs. J Mol Evol 2020; 88:720-730. [PMID: 33118098 DOI: 10.1007/s00239-020-09969-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2020] [Accepted: 10/15/2020] [Indexed: 10/23/2022]
Abstract
Heterotachy, the change in sequence evolutionary rate over time, is a common feature of protein molecular evolution. Decades of studies have shed light on the conditions under which heterotachy occurs, and there is evidence that site-specific evolutionary rate shifts are correlated with changes in protein function. Here, we present a large-scale, computational analysis using thousands of protein sequence alignments from animal and plant proteomes, representing genes related either by orthology (speciation events) or paralogy (gene duplication), to compare sequence divergence patterns in orthologous vs. paralogous sequence alignments. We use sequence-based phylogenetic analyses to infer overall sequence divergence (tree length/number of sequences) and to fit site-specific rates to a discrete gamma distribution with a shape parameter α. This inference method is applied to real protein sequence alignments, as well as alignments simulated under various models of protein sequence evolution. Our simulations indicate that sequence divergence and the α parameter are positively correlated when sequences evolve with heterotachy, meaning that inferred site rate distributions appear more uniform as sequences diverge. Divergence and α are also positively correlated in both orthologous and paralogous genes, but the average increase in α (as a function of divergence) is significantly higher in paralogous protein alignments than in orthologous alignments. This result is consistent with the widely held view that recently duplicated proteins initially evolve under relaxed selective pressure, promoting functional divergence by accumulation of amino acid replacements, and hence experience more evolutionary rate fluctuations than orthologous proteins. We discuss these findings in the context of the ortholog conjecture, a long-standing assumption in molecular evolution, which posits that protein sequences related by orthology tend to be more functionally conserved than paralogous proteins.
Affiliation(s)
- Joseph B Ahrens
- Department of Biological Sciences, Biomolecular Sciences Institute, Florida International University, Miami, FL, USA
- Department of Biochemistry and Molecular Genetics, Computational Bioscience Program, University of Colorado Denver, Aurora, CO, USA
- Ashley I Teufel
- Department of Integrative Biology, The University of Texas at Austin, Austin, TX, USA
- Santa Fe Institute, Santa Fe, NM, USA
- Jessica Siltberg-Liberles
- Department of Biological Sciences, Biomolecular Sciences Institute, Florida International University, Miami, FL, USA.
42
Baldwin MW, Ko MC. Functional evolution of vertebrate sensory receptors. Horm Behav 2020; 124:104771. [PMID: 32437717 DOI: 10.1016/j.yhbeh.2020.104771] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 01/01/2020] [Revised: 04/20/2020] [Accepted: 04/28/2020] [Indexed: 12/15/2022]
Abstract
Sensory receptors enable animals to perceive their external world, and functional properties of receptors evolve to detect the specific cues relevant for an organism's survival. Changes in sensory receptor function or tuning can directly impact an organism's behavior. Functional tests of receptors from multiple species and the generation of chimeric receptors between orthologs with different properties allow for the dissection of the molecular basis of receptor function and identification of the key residues that impart functional changes in different species. Knowledge of these functionally important sites facilitates investigation into questions regarding the role of epistasis and the extent of convergence, as well as the timing of sensory shifts relative to other phenotypic changes. However, as receptors can also play roles in non-sensory tissues, and receptor responses can be modulated by numerous other factors including varying expression levels, alternative splicing, and morphological features of the sensory cell, behavioral validation can be instrumental in confirming that responses observed in heterologous systems play a sensory role. Expression profiling of sensory cells and comparative genomics approaches can shed light on cell-type specific modifications and identify other proteins that may affect receptor function and can provide insight into the correlated evolution of complex suites of traits. Here we review the evolutionary history and diversity of functional responses of the major classes of sensory receptors in vertebrates, including opsins, chemosensory receptors, and ion channels involved in temperature-sensing, mechanosensation and electroreception.
Affiliation(s)
- Meng-Ching Ko
- Max Planck Institute for Ornithology, Seewiesen, Germany
43
Stamboulian M, Guerrero RF, Hahn MW, Radivojac P. The ortholog conjecture revisited: the value of orthologs and paralogs in function prediction. Bioinformatics 2020; 36:i219-i226. [PMID: 32657391 PMCID: PMC7355290 DOI: 10.1093/bioinformatics/btaa468] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Indexed: 11/18/2022]
Abstract
MOTIVATION The computational prediction of gene function is a key step in making full use of newly sequenced genomes. Function is generally predicted by transferring annotations from homologous genes or proteins for which experimental evidence exists. The 'ortholog conjecture' proposes that orthologous genes should be preferred when making such predictions, as they evolve functions more slowly than paralogous genes. Previous research has provided little support for the ortholog conjecture, though the incomplete nature of the data cast doubt on the conclusions. RESULTS We use experimental annotations from over 40 000 proteins, drawn from over 80 000 publications, to revisit the ortholog conjecture in two pairs of species: (i) Homo sapiens and Mus musculus and (ii) Saccharomyces cerevisiae and Schizosaccharomyces pombe. By making a distinction between questions about the evolution of function versus questions about the prediction of function, we find strong evidence against the ortholog conjecture in the context of function prediction, though questions about the evolution of function remain difficult to address. In both pairs of species, we quantify the amount of information that would be ignored if paralogs are discarded, as well as the resulting loss in prediction accuracy. Taken as a whole, our results support the view that the types of homologs used for function transfer are largely irrelevant to the task of function prediction. Maximizing the amount of data used for this task, regardless of whether it comes from orthologs or paralogs, is most likely to lead to higher prediction accuracy. AVAILABILITY AND IMPLEMENTATION https://github.com/predragradivojac/oc. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Moses Stamboulian
- Department of Computer Science, Indiana University, Bloomington, IN 47405, USA
- Rafael F Guerrero
- Department of Computer Science, Indiana University, Bloomington, IN 47405, USA
- Department of Biological Sciences, North Carolina State University, Raleigh, NC 27695, USA
- Matthew W Hahn
- Department of Computer Science, Indiana University, Bloomington, IN 47405, USA
- Department of Biology, Indiana University, Bloomington, IN 47405, USA
- Predrag Radivojac
- Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, USA
44
Nevers Y, Kress A, Defosset A, Ripp R, Linard B, Thompson JD, Poch O, Lecompte O. OrthoInspector 3.0: open portal for comparative genomics. Nucleic Acids Res 2019; 47:D411-D418. [PMID: 30380106 PMCID: PMC6323921 DOI: 10.1093/nar/gky1068] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 09/12/2018] [Accepted: 10/19/2018] [Indexed: 01/08/2023]
Abstract
OrthoInspector is one of the leading software suites for orthology relations inference. In this paper, we describe a major redesign of the OrthoInspector online resource along with a significant increase in the number of species: 4753 organisms are now covered across the three domains of life, making OrthoInspector the most exhaustive orthology resource to date in terms of covered species (excluding viruses). The new website integrates original data exploration and visualization tools in an ergonomic interface. Distributions of protein orthologs are represented by heatmaps summarizing their evolutionary histories, and proteins with similar profiles can be directly accessed. Two novel tools have been implemented for comparative genomics: a phylogenetic profile search that can be used to find proteins with a specific presence-absence profile and investigate their functions and, inversely, a GO profiling tool aimed at deciphering evolutionary histories of molecular functions, processes or cell components. In addition to the re-designed website, the OrthoInspector resource now provides a REST interface for programmatic access. OrthoInspector 3.0 is available at http://lbgi.fr/orthoinspectorv3.
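A phylogenetic profile search can be pictured as matching binary presence/absence vectors. The Python sketch below assumes profiles are tuples over a fixed, hypothetical species order and ranks proteins by Hamming distance to a query profile; it is not OrthoInspector's REST interface or its actual search criteria, which are not described here.

```python
from typing import Dict, List, Tuple

# 1 = an ortholog is present in that species, 0 = absent.
Profile = Tuple[int, ...]


def hamming(p: Profile, q: Profile) -> int:
    """Number of species where the two presence/absence profiles disagree."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same species")
    return sum(x != y for x, y in zip(p, q))


def profile_search(
    profiles: Dict[str, Profile], query: Profile, max_mismatches: int = 0
) -> List[str]:
    """Return proteins whose profile differs from `query` in at most
    `max_mismatches` species, closest first (ties broken by name)."""
    hits = sorted((hamming(prof, query), name) for name, prof in profiles.items())
    return [name for dist, name in hits if dist <= max_mismatches]


# Hypothetical species order: (human, yeast, E. coli, an archaeon).
profiles = {"P1": (1, 1, 0, 0), "P2": (1, 1, 0, 1), "P3": (0, 1, 1, 1)}
print(profile_search(profiles, (1, 1, 0, 0)))
print(profile_search(profiles, (1, 1, 0, 0), max_mismatches=1))
```

Allowing a small number of mismatches is one simple way to tolerate gene loss or annotation gaps in individual species; real resources may weight species or clades differently.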
Affiliation(s)
- Yannis Nevers
- Department of Computer Science, ICube, UMR 7357, University of Strasbourg, CNRS, Fédération de Médecine Translationnelle de Strasbourg, Strasbourg, France
- Arnaud Kress
- Department of Computer Science, ICube, UMR 7357, University of Strasbourg, CNRS, Fédération de Médecine Translationnelle de Strasbourg, Strasbourg, France
- Audrey Defosset
- Department of Computer Science, ICube, UMR 7357, University of Strasbourg, CNRS, Fédération de Médecine Translationnelle de Strasbourg, Strasbourg, France
- Raymond Ripp
- Department of Computer Science, ICube, UMR 7357, University of Strasbourg, CNRS, Fédération de Médecine Translationnelle de Strasbourg, Strasbourg, France
- Benjamin Linard
- LIRMM, Univ Montpellier, CNRS, Montpellier, France
- ISEM, Univ Montpellier, CNRS, IRD, EPHE, CIRAD, INRAP, Montpellier, France
- AGAP, Univ Montpellier, CIRAD, INRA, Montpellier Supagro, Montpellier, France
- Julie D Thompson
- Department of Computer Science, ICube, UMR 7357, University of Strasbourg, CNRS, Fédération de Médecine Translationnelle de Strasbourg, Strasbourg, France
- Olivier Poch
- Department of Computer Science, ICube, UMR 7357, University of Strasbourg, CNRS, Fédération de Médecine Translationnelle de Strasbourg, Strasbourg, France
- Odile Lecompte
- Department of Computer Science, ICube, UMR 7357, University of Strasbourg, CNRS, Fédération de Médecine Translationnelle de Strasbourg, Strasbourg, France
45
Laurent JM, Garge RK, Teufel AI, Wilke CO, Kachroo AH, Marcotte EM. Humanization of yeast genes with multiple human orthologs reveals functional divergence between paralogs. PLoS Biol 2020; 18:e3000627. [PMID: 32421706 PMCID: PMC7259792 DOI: 10.1371/journal.pbio.3000627] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 06/13/2019] [Revised: 05/29/2020] [Accepted: 04/14/2020] [Indexed: 01/17/2023]
Abstract
Despite over a billion years of evolutionary divergence, several thousand human genes possess clearly identifiable orthologs in yeast, and many have undergone lineage-specific duplications in one or both lineages. These duplicated genes may have been free to diverge in function since their expansion, and it is unclear how or at what rate ancestral functions are retained or partitioned among co-orthologs between species and within gene families. Thus, in order to investigate how ancestral functions are retained or lost post-duplication, we systematically replaced hundreds of essential yeast genes with their human orthologs from gene families that have undergone lineage-specific duplications, including those with single duplications (1 yeast gene to 2 human genes, 1:2) or higher-order expansions (1:>2) in the human lineage. We observe a variable pattern of replaceability across different ortholog classes, with an obvious trend toward differential replaceability inside gene families, and rarely observe replaceability by all members of a family. We quantify the ability of various properties of the orthologs to predict replaceability, showing that in the case of 1:2 orthologs, replaceability is predicted largely by the divergence and tissue-specific expression of the human co-orthologs, i.e., the human proteins that are less diverged from their yeast counterpart and more ubiquitously expressed across human tissues more often replace their single yeast ortholog. These trends were consistent with in silico simulations demonstrating that when only one ortholog can replace its corresponding yeast equivalent, it tends to be the least diverged of the pair. Replaceability of yeast genes having more than 2 human co-orthologs was marked by retention of orthologous interactions in functional or protein networks as well as by more ancestral subcellular localization. 
Overall, we performed >400 human gene replaceability assays, revealing 50 new human-yeast complementation pairs, thus opening up avenues to further functionally characterize these human genes in a simplified organismal context.
Affiliation(s)
- Jon M. Laurent
- Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas, United States of America
- Institute for Systems Genetics, NYU Langone Health, New York, New York, United States of America
- Riddhiman K. Garge
- Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas, United States of America
- Department of Molecular Biosciences, The University of Texas at Austin, Austin, Texas, United States of America
- Ashley I. Teufel
- Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas, United States of America
- Department of Integrative Biology, The University of Texas at Austin, Austin, Texas, United States of America
- Santa Fe Institute, Santa Fe, New Mexico, United States of America
- Claus O. Wilke
- Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas, United States of America
- Department of Integrative Biology, The University of Texas at Austin, Austin, Texas, United States of America
- Aashiq H. Kachroo
- The Department of Biology, Centre for Applied Synthetic Biology, Concordia University, Montreal, Quebec, Canada
- Edward M. Marcotte
- Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas, United States of America
- Department of Molecular Biosciences, The University of Texas at Austin, Austin, Texas, United States of America
46
Kim S, Park J, Kim T, Lee JS. The functional study of human proteins using humanized yeast. J Microbiol 2020; 58:343-349. [PMID: 32342338 DOI: 10.1007/s12275-020-0136-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 03/12/2020] [Revised: 04/13/2020] [Accepted: 04/13/2020] [Indexed: 12/18/2022]
Abstract
The functional and optimal expression of genes is crucial for the survival of all living organisms. Numerous experiments have been performed to reveal the mechanisms required for the functional and optimal expression of human genes. The yeast Saccharomyces cerevisiae has evolved independently of humans for over a billion years. Nevertheless, S. cerevisiae has many conserved genes and expression mechanisms that are similar to those in humans. Yeast is the most commonly used model organism for studying the function and expression mechanisms of human genes because its relatively simple genome is easy to manipulate. Many previous studies have sought to understand the functions and mechanisms of human proteins using orthologous genes and biological systems of yeast. In this review, we mainly introduce two recent studies that replaced yeast genes and nucleosomes with their human counterparts. We suggest that, although yeast is a relatively small eukaryotic cell, its humanization is useful for the direct study of human proteins. In addition, yeast can be used as a model organism in a broader range of studies, including drug screening.
Affiliation(s)
- Seho Kim
- Department of Molecular Bioscience, College of Biomedical Science, Kangwon National University, Chuncheon, 24341, Republic of Korea
- Juhee Park
- Department of Molecular Bioscience, College of Biomedical Science, Kangwon National University, Chuncheon, 24341, Republic of Korea
- Taekyung Kim
- Department of Biology Education, Pusan National University, Busan, 26241, Republic of Korea
- Jung-Shin Lee
- Department of Molecular Bioscience, College of Biomedical Science, Kangwon National University, Chuncheon, 24341, Republic of Korea
47
David KT, Oaks JR, Halanych KM. Patterns of gene evolution following duplications and speciations in vertebrates. PeerJ 2020; 8:e8813. [PMID: 32266119 PMCID: PMC7120047 DOI: 10.7717/peerj.8813] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 10/10/2019] [Accepted: 02/27/2020] [Indexed: 11/24/2022]
Abstract
BACKGROUND Eukaryotic genes typically form independent evolutionary lineages through either speciation or gene duplication events. Generally, gene copies resulting from speciation events (orthologs) are expected to maintain similarity over time with regard to sequence, structure and function. After a duplication event, however, resulting gene copies (paralogs) may experience a broader set of possible fates, including partial (subfunctionalization) or complete loss of function, as well as gain of new function (neofunctionalization). This assumption, known as the Ortholog Conjecture, is prevalent throughout molecular biology and notably plays an important role in many functional annotation methods. Unfortunately, studies that explicitly compare evolutionary processes between speciation and duplication events are rare and conflicting. METHODS To provide an empirical assessment of ortholog/paralog evolution, we estimated ratios of nonsynonymous to synonymous substitutions (ω = dN/dS) for 251,044 lineages in 6,244 gene trees across 77 vertebrate taxa. RESULTS Overall, we found ω to be more similar between lineages descended from speciation events (p < 0.001) than lineages descended from duplication events, providing strong support for the Ortholog Conjecture. The asymmetry in ω following duplication events appears to be largely driven by an increase along one of the paralogous lineages, while the other remains similar to the parent. This trend is commonly associated with neofunctionalization, suggesting that gene duplication is a significant mechanism for generating novel gene functions.
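The quantity compared in this study is the ratio ω = dN/dS per lineage. The Python sketch below computes ω from per-lineage dN and dS estimates and then scores how different two sister lineages are. The asymmetry score |log(ω1/ω2)| is a hypothetical illustrative choice, not the statistic used by the authors, and estimating dN and dS themselves (from codon alignments and a gene tree) is out of scope.

```python
import math


def omega(dn: float, ds: float) -> float:
    """Ratio of nonsynonymous to synonymous substitution rates (dN/dS)."""
    if ds <= 0:
        raise ValueError("dS must be positive for omega to be defined")
    return dn / ds


def omega_asymmetry(omega_1: float, omega_2: float) -> float:
    """Hypothetical asymmetry score for two sister lineages: |log(w1 / w2)|.

    0 means identical omega; larger values mean one lineage evolves under
    noticeably different selective constraint than its sister.
    """
    if omega_1 <= 0 or omega_2 <= 0:
        raise ValueError("omega values must be positive")
    return abs(math.log(omega_1 / omega_2))


# Hypothetical values: sister lineages after a speciation vs a duplication.
after_speciation = omega_asymmetry(0.10, 0.12)
after_duplication = omega_asymmetry(0.10, 0.40)
print(round(after_speciation, 3), round(after_duplication, 3))
```

A log ratio treats a doubling and a halving of ω symmetrically, which suits the pattern described above where one paralog's ω rises while the other stays close to the parent.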
Affiliation(s)
- Kyle T. David
- Department of Biological Sciences, Auburn University, Auburn, AL, USA
- Jamie R. Oaks
- Department of Biological Sciences, Auburn University, Auburn, AL, USA
48
Bick JT, Zeng S, Robinson MD, Ulbrich SE, Bauersachs S. Mammalian Annotation Database for improved annotation and functional classification of Omics datasets from less well-annotated organisms. Database: The Journal of Biological Databases and Curation 2020; 2019:5539597. [PMID: 31353404 PMCID: PMC6661403 DOI: 10.1093/database/baz086] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 08/03/2018] [Revised: 03/08/2019] [Accepted: 06/05/2019] [Indexed: 02/02/2023]
Abstract
Next-generation sequencing technologies and the availability of an increasing number of mammalian and other genomes allow gene expression studies, particularly RNA sequencing, in many non-model organisms. However, incomplete genome annotation and incomplete assignment of genes to functional annotation databases can lead to a substantial loss of information in downstream data analysis. To overcome this, we developed the Mammalian Annotation Database tool (MAdb, https://madb.ethz.ch) to conveniently provide homologous gene information for selected mammalian species. The assignment between species is performed in three steps: (i) matching official gene symbols, (ii) using ortholog information contained in Ensembl Compara and (iii) pairwise BLAST comparisons of all transcripts. In addition, we developed a new tool (AnnOverlappeR) for the reliable assignment of National Center for Biotechnology Information (NCBI) and Ensembl gene IDs. Gene lists translated to gene IDs of well-annotated species such as human can be used for improved functional annotation with relevant tools based on Gene Ontology and molecular pathway information. We tested MAdb on a published RNA-seq data set for the pig and showed clearly improved overrepresentation analysis results based on the assigned human homologous gene identifiers. Using MAdb yielded a similar list of human homologous genes and similar functional annotation results regardless of whether we started with gene IDs from NCBI or Ensembl. MAdb is accessible via a web interface and a Galaxy application.
Affiliation(s)
- Jochen T Bick
- Animal Physiology, Institute of Agricultural Sciences, ETH Zurich, Zurich, Switzerland
- Shuqin Zeng
- Animal Physiology, Institute of Agricultural Sciences, ETH Zurich, Zurich, Switzerland; Genetics and Functional Genomics, Vetsuisse Faculty Zurich, University of Zurich, Zurich, Switzerland
- Mark D Robinson
- Institute of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zurich, Zurich, Switzerland
- Susanne E Ulbrich
- Animal Physiology, Institute of Agricultural Sciences, ETH Zurich, Zurich, Switzerland
- Stefan Bauersachs
- Animal Physiology, Institute of Agricultural Sciences, ETH Zurich, Zurich, Switzerland; Genetics and Functional Genomics, Vetsuisse Faculty Zurich, University of Zurich, Zurich, Switzerland
49
Glover N, Dessimoz C, Ebersberger I, Forslund SK, Gabaldón T, Huerta-Cepas J, Martin MJ, Muffato M, Patricio M, Pereira C, da Silva AS, Wang Y, Sonnhammer E, Thomas PD. Advances and Applications in the Quest for Orthologs. Mol Biol Evol 2020; 36:2157-2164. [PMID: 31241141 PMCID: PMC6759064 DOI: 10.1093/molbev/msz150] [Citation(s) in RCA: 50] [Impact Index Per Article: 10.0] [Indexed: 01/02/2023]
Abstract
Gene families evolve by the processes of speciation (creating orthologs), gene duplication (paralogs), and horizontal gene transfer (xenologs), in addition to sequence divergence and gene loss. Orthologs in particular play an essential role in comparative genomics and phylogenomic analyses. With the continued sequencing of organisms across the tree of life, the data are available to reconstruct the unique evolutionary histories of tens of thousands of gene families. Accurate reconstruction of these histories, however, is a challenging computational problem, and the focus of the Quest for Orthologs Consortium. We review the recent advances and outstanding challenges in this field, as revealed at a symposium and meeting held at the University of Southern California in 2017. Key advances have been made both at the level of orthology algorithm development and with respect to coordination across the community of algorithm developers and orthology end-users. Applications spanned a broad range, including gene function prediction, phylostratigraphy, genome evolution, and phylogenomics. The meetings highlighted the increasing use of meta-analyses integrating results from multiple different algorithms, and discussed ongoing challenges in orthology inference as well as the next steps toward improvement and integration of orthology resources.
Affiliation(s)
- Natasha Glover
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland; Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
- Christophe Dessimoz
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland; Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland; Department of Genetics, Evolution & Environment, University College London, London, United Kingdom; Department of Computer Science, University College London, London, United Kingdom
- Ingo Ebersberger
- Applied Bioinformatics Group, Institute of Cell Biology and Neuroscience, Goethe University Frankfurt, Frankfurt, Germany; Senckenberg Biodiversity and Climate Research Centre (BIK-F), Frankfurt, Germany; LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), Frankfurt, Germany
- Sofia K Forslund
- Experimental and Clinical Research Center, A Cooperation of Charité-Universitätsmedizin Berlin and Max Delbrück Center for Molecular Medicine, Berlin, Germany; Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin, Germany; Charité-Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, Berlin, Germany; Berlin Institute of Health (BIH), Berlin, Germany; Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Toni Gabaldón
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain; ICREA, Barcelona, Spain
- Jaime Huerta-Cepas
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany; Centro de Biotecnología y Genómica de Plantas, Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Universidad Politécnica de Madrid (UPM), Madrid, Spain
- Maria-Jesus Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
- Matthieu Muffato
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
- Mateus Patricio
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
- Cécile Pereira
- Eura Nova, Marseille, France; Department of Microbiology and Cell Science, Institute for Food and Agricultural Sciences, University of Florida, Gainesville, FL
- Alan Sousa da Silva
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
- Yan Wang
- Department of Microbiology and Plant Pathology, Institute for Integrative Genome Biology, University of California-Riverside, Riverside, CA
- Erik Sonnhammer
- Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, Solna, Sweden
- Paul D Thomas
- Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA
50
Evolution of vascular plants through redeployment of ancient developmental regulators. Proc Natl Acad Sci U S A 2019; 117:733-740. [PMID: 31874927 DOI: 10.1073/pnas.1912470117] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Indexed: 12/16/2022]
Abstract
Vascular plants provide most of the biomass, food, and feed on Earth, yet the molecular innovations that led to the evolution of their conductive tissues are unknown. Here, we reveal the evolutionary trajectory of the heterodimeric TMO5/LHW transcription factor complex, which is rate-limiting for vascular cell proliferation in Arabidopsis thaliana. Both regulators have origins predating the emergence of vascular tissue, and even terrestrialization. We further show that TMO5 evolved its modern function, including dimerization with LHW, at the origin of land plants. A second innovation in LHW, coinciding with the emergence of vascular plants, conditioned obligate heterodimerization and generated its critical function in vascular development in Arabidopsis. In summary, our results suggest that the division potential of vascular cells may have been an important factor contributing to the evolution of vascular plants.