1
Majidian S, Nevers Y, Yazdizadeh Kharrazi A, Warwick Vesztrocy A, Pascarelli S, Moi D, Glover N, Altenhoff AM, Dessimoz C. Orthology inference at scale with FastOMA. Nat Methods 2025; 22:269-272. [PMID: 39753922 PMCID: PMC11810774 DOI: 10.1038/s41592-024-02552-8] [Received: 01/16/2024] [Accepted: 10/29/2024] [Indexed: 02/12/2025]
Abstract
The surge in genome data, with ongoing efforts aiming to sequence 1.5 M eukaryotes in a decade, could revolutionize genomics, revealing the origins, evolution and genetic innovations of biological processes. Yet, traditional genomics methods scale poorly with such large datasets. Here, addressing this, 'FastOMA' provides linear scalability for orthology inference, enabling the processing of thousands of eukaryotic genomes within a day. FastOMA maintains the high accuracy and resolution of the well-established Orthologous Matrix (OMA) approach in benchmarks. FastOMA is available via GitHub at https://github.com/DessimozLab/FastOMA/ .
Affiliation(s)
- Sina Majidian
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Yannis Nevers
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Alex Warwick Vesztrocy
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Stefano Pascarelli
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- David Moi
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Natasha Glover
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Adrian M Altenhoff
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Department of Computer Science, ETH Zurich, Zurich, Switzerland
- Christophe Dessimoz
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
2
Tegenfeldt F, Kuznetsov D, Manni M, Berkeley M, Zdobnov EM, Kriventseva EV. OrthoDB and BUSCO update: annotation of orthologs with wider sampling of genomes. Nucleic Acids Res 2025; 53:D516-D522. [PMID: 39535043 PMCID: PMC11701741 DOI: 10.1093/nar/gkae987] [Received: 09/15/2024] [Revised: 10/09/2024] [Accepted: 11/12/2024] [Indexed: 11/16/2024]
Abstract
OrthoDB (https://www.orthodb.org) offers evolutionary and functional annotations of orthologous genes in the widest sampling of eukaryotes, prokaryotes, and viruses, extending experimental gene function knowledge to newly sequenced genomes. We collect gene annotations, delineate hierarchical gene orthology and annotate the orthologous groups (OGs) with functional and evolutionary traits. OrthoDB is the leading resource for species diversity, striving to sample the most diverse and well-researched organisms with the highest quality genomic data. This update expands to include 5827 eukaryotic genomes. We have also added coding DNA sequences (CDSs) and gene loci coordinates. OrthoDB can be browsed, downloaded, or accessed using REST API, SPARQL/RDF and now also via API packages for Python and R Bioconductor. OrthoLoger (https://orthologer.ezlab.org), the tool used for inferring orthologs in OrthoDB, is now available as a Conda package and through BioContainers. ODB-mapper, a component of OrthoLoger, streamlines annotation of genes from newly sequenced genomes with OrthoDB evolutionary and functional descriptors. The benchmarking sets of universal single-copy orthologs (BUSCO), derived from OrthoDB, had correspondingly a major update. The BUSCO tool (https://busco.ezlab.org) has become a standard in genomics, uniquely capable of assessing both eukaryotic and prokaryotic species. It is applicable to gene sets, transcriptomes, genome assemblies and metagenomic bins.
Affiliation(s)
- Fredrik Tegenfeldt
- Department of Genetic Medicine and Development, University of Geneva Medical School, rue Michel-Servet 1, 1211 Geneva, Switzerland, and Swiss Institute of Bioinformatics, rue Michel-Servet 1, 1211 Geneva, Switzerland
- Dmitry Kuznetsov
- Department of Genetic Medicine and Development, University of Geneva Medical School, rue Michel-Servet 1, 1211 Geneva, Switzerland, and Swiss Institute of Bioinformatics, rue Michel-Servet 1, 1211 Geneva, Switzerland
- Mosè Manni
- Department of Genetic Medicine and Development, University of Geneva Medical School, rue Michel-Servet 1, 1211 Geneva, Switzerland, and Swiss Institute of Bioinformatics, rue Michel-Servet 1, 1211 Geneva, Switzerland
- Matthew Berkeley
- Department of Genetic Medicine and Development, University of Geneva Medical School, rue Michel-Servet 1, 1211 Geneva, Switzerland, and Swiss Institute of Bioinformatics, rue Michel-Servet 1, 1211 Geneva, Switzerland
- Evgeny M Zdobnov
- Department of Genetic Medicine and Development, University of Geneva Medical School, rue Michel-Servet 1, 1211 Geneva, Switzerland, and Swiss Institute of Bioinformatics, rue Michel-Servet 1, 1211 Geneva, Switzerland
- Evgenia V Kriventseva
- Department of Genetic Medicine and Development, University of Geneva Medical School, rue Michel-Servet 1, 1211 Geneva, Switzerland, and Swiss Institute of Bioinformatics, rue Michel-Servet 1, 1211 Geneva, Switzerland
3
Langschied F, Bordin N, Cosentino S, Fuentes-Palacios D, Glover N, Hiller M, Hu Y, Huerta-Cepas J, Coelho LP, Iwasaki W, Majidian S, Manzano-Morales S, Persson E, Richards TA, Gabaldón T, Sonnhammer E, Thomas PD, Dessimoz C, Ebersberger I. Quest for Orthologs in the Era of Biodiversity Genomics. Genome Biol Evol 2024; 16:evae224. [PMID: 39404012 PMCID: PMC11523110 DOI: 10.1093/gbe/evae224] [Accepted: 10/11/2024] [Indexed: 11/01/2024]
Abstract
The era of biodiversity genomics is characterized by large-scale genome sequencing efforts that aim to represent each living taxon with an assembled genome. Generating knowledge from this wealth of data has not kept up with this pace. We here discuss major challenges to integrating these novel genomes into a comprehensive functional and evolutionary network spanning the tree of life. In summary, the expanding datasets create a need for scalable gene annotation methods. To trace gene function across species, new methods must seek to increase the resolution of ortholog analyses, e.g. by extending analyses to the protein domain level and by accounting for alternative splicing. Additionally, the scope of orthology prediction should be pushed beyond well-investigated proteomes. This demands the development of specialized methods for the identification of orthologs to short proteins and noncoding RNAs and for the functional characterization of novel gene families. Furthermore, protein structures predicted by machine learning are now readily available, but this new information is yet to be integrated with orthology-based analyses. Finally, an increasing focus should be placed on making orthology assignments adhere to the findable, accessible, interoperable, and reusable (FAIR) principles. This fosters green bioinformatics by avoiding redundant computations and helps integrating diverse scientific communities sharing the need for comparative genetics and genomics information. It should also help with communicating orthology-related concepts in a format that is accessible to the public, to counteract existing misinformation about evolution.
Affiliation(s)
- Felix Langschied
- Department for Applied Bioinformatics, Institute of Cell Biology and Neuroscience, Goethe University, Frankfurt, Germany
- Nicola Bordin
- Institute of Structural and Molecular Biology, University College London, WC1E 6BT, London, UK
- Salvatore Cosentino
- Department of Integrated Biosciences, The University of Tokyo, 277-0882 Tokyo, Japan
- Diego Fuentes-Palacios
- Barcelona Supercomputing Center (BSC-CNS), 08034 Barcelona, Spain
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, 08028 Barcelona, Spain
- Natasha Glover
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Michael Hiller
- Department of Comparative Genomics, Institute of Cell Biology and Neuroscience, Goethe University, Frankfurt, Germany
- Yanhui Hu
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Drosophila RNAi Screening Center, Harvard Medical School, Boston, MA 02115, USA
- Jaime Huerta-Cepas
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Campus de Montegancedo-UPM, Madrid, Spain
- Luis Pedro Coelho
- Centre for Microbiome Research, School of Biomedical Sciences, Queensland University of Technology, Translational Research Institute, Woolloongabba, Queensland, Australia
- Wataru Iwasaki
- Department of Integrated Biosciences, University of Tokyo, 277-0882 Tokyo, Japan
- Sina Majidian
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Saioa Manzano-Morales
- Barcelona Supercomputing Center (BSC-CNS), 08034 Barcelona, Spain
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, 08028 Barcelona, Spain
- Emma Persson
- Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Solna, Sweden
- Toni Gabaldón
- Barcelona Supercomputing Center (BSC-CNS), 08034 Barcelona, Spain
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, 08028 Barcelona, Spain
- Catalan Institution for Research and Advanced Studies (ICREA), Barcelona, Spain
- CIBER de Enfermedades Infecciosas, Instituto de Salud Carlos III, Madrid, Spain
- Erik Sonnhammer
- Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Solna, Sweden
- Paul D Thomas
- Department of Population and Public Health Sciences, University of Southern California, Los Angeles, CA, USA
- Christophe Dessimoz
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Ingo Ebersberger
- Department for Applied Bioinformatics, Institute of Cell Biology and Neuroscience, Goethe University, Frankfurt, Germany
- LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany
- Senckenberg Biodiversity and Climate Research Centre (S-BIK-F), Frankfurt am Main, Germany
4
Barrios-Núñez I, Martínez-Redondo G, Medina-Burgos P, Cases I, Fernández R, Rojas A. Decoding functional proteome information in model organisms using protein language models. NAR Genom Bioinform 2024; 6:lqae078. [PMID: 38962255 PMCID: PMC11217674 DOI: 10.1093/nargab/lqae078] [Received: 02/18/2024] [Revised: 05/31/2024] [Accepted: 06/26/2024] [Indexed: 07/05/2024]
Abstract
Protein language models have been tested and proved to be reliable when used on curated datasets but have not yet been applied to full proteomes. Accordingly, we tested how two different machine learning-based methods performed when decoding functional information from the proteomes of selected model organisms. We found that protein language models are more precise and informative than deep learning methods for all the species tested and across the three gene ontologies studied, and that they better recover functional information from transcriptomic experiments. The results obtained indicate that these language models are likely to be suitable for large-scale annotation and downstream analyses, and we recommend a guide for their use.
Affiliation(s)
- Israel Barrios-Núñez
- Computational Biology and Bioinformatics Group, Andalusian Center for Developmental Biology (CABD-CSIC), 41013 Sevilla, Spain
- Patricia Medina-Burgos
- Computational Biology and Bioinformatics Group, Andalusian Center for Developmental Biology (CABD-CSIC), 41013 Sevilla, Spain
- Ildefonso Cases
- Bioinformatics Unit, Andalusian Center for Developmental Biology (CABD-CSIC), 41013 Sevilla, Spain
- Rosa Fernández
- Metazoa Phylogenomics Lab, Institute of Evolutionary Biology (CSIC-UPF), 08003 Barcelona, Spain
- Ana M Rojas
- Computational Biology and Bioinformatics Group, Andalusian Center for Developmental Biology (CABD-CSIC), 41013 Sevilla, Spain
5
Ciach MA, Pawłowska J, Górecki P, Muszewska A. The interkingdom horizontal gene transfer in 44 early diverging fungi boosted their metabolic, adaptive, and immune capabilities. Evol Lett 2024; 8:526-538. [PMID: 39100235 PMCID: PMC11291939 DOI: 10.1093/evlett/qrae009] [Received: 07/20/2023] [Revised: 02/14/2024] [Accepted: 02/20/2024] [Indexed: 08/06/2024]
Abstract
Numerous studies have been devoted to individual cases of horizontally acquired genes in fungi. It has been shown that such genes expand the hosts' metabolic capabilities and contribute to their adaptations as parasites or symbionts. Some studies have provided an extensive characterization of the horizontal gene transfer (HGT) in Dikarya. However, in the early diverging fungi (EDF), a similar characterization is still missing. In order to fill this gap, we have designed a computational pipeline to obtain a statistical sample of reliable HGT events with a low false discovery rate. We have analyzed 44 EDF proteomes and identified 829 xenologs in fungi ranging from Chytridiomycota to Mucoromycota. We have identified several patterns and statistical properties of EDF HGT. We show that HGT is driven by bursts of gene exchange and duplication, resulting in highly divergent numbers and molecular properties of xenologs between fungal lineages. Ancestrally aquatic fungi are generally more likely to acquire foreign genetic material than terrestrial ones. Endosymbiotic bacteria can be a source of useful xenologs, as exemplified by NOD-like receptors transferred to Mortierellomycota. Closely related fungi have similar rates of intronization of xenologs. Posttransfer gene fusions and losses of protein domains are common and may influence the encoded proteins' functions. We argue that there is no universal approach for HGT identification and inter- and intra-kingdom transfers require tailored identification methods. Our results help to better understand how and to what extent HGT has shaped the metabolic, adaptive, and immune capabilities of fungi.
Affiliation(s)
- Michał Aleksander Ciach
- Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland
- Institute of Biochemistry and Biophysics, Polish Academy of Sciences, Warsaw, Poland
- Julia Pawłowska
- Institute of Evolutionary Biology, Faculty of Biology, Biological and Chemical Research Centre, University of Warsaw, Warsaw, Poland
- Paweł Górecki
- Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland
- Anna Muszewska
- Institute of Biochemistry and Biophysics, Polish Academy of Sciences, Warsaw, Poland
6
Rahiminejad S, De Sanctis B, Pevzner P, Mushegian A. Synthetic lethality and the minimal genome size problem. mSphere 2024; 9:e0013924. [PMID: 38904396 PMCID: PMC11288024 DOI: 10.1128/msphere.00139-24] [Received: 02/19/2024] [Accepted: 05/13/2024] [Indexed: 06/22/2024]
Abstract
Gene knockout studies suggest that ~300 genes in a bacterial genome and ~1,100 genes in a yeast genome cannot be deleted without loss of viability. These single-gene knockout experiments do not account for negative genetic interactions, when two or more genes can each be deleted without effect, but their joint deletion is lethal. Thus, large-scale single-gene deletion studies underestimate the size of a minimal gene set compatible with cell survival. In yeast Saccharomyces cerevisiae, the viability of all possible deletions of gene pairs (2-tuples), and of some deletions of gene triplets (3-tuples), has been experimentally tested. To estimate the size of a yeast minimal genome from that data, we first established that finding the size of a minimal gene set is equivalent to finding the minimum vertex cover in the lethality (hyper)graph, where the vertices are genes and (hyper)edges connect k-tuples of genes whose joint deletion is lethal. Using the Lovász-Johnson-Chvatal greedy approximation algorithm, we computed the minimum vertex cover of the synthetic-lethal 2-tuples graph to be 1,723 genes. We next simulated the genetic interactions in 3-tuples, extrapolating from the existing triplet sample, and again estimated minimum vertex covers. The size of a minimal gene set in yeast rapidly approaches the size of the entire genome even when considering only synthetic lethalities in k-tuples with small k. In contrast, several studies reported successful experimental reductions of yeast and bacterial genomes by simultaneous deletions of hundreds of genes, without eliciting synthetic lethality. We discuss possible reasons for this apparent contradiction. IMPORTANCE: How can we estimate the smallest number of genes sufficient for a unicellular organism to survive on a rich medium? One approach is to remove genes one at a time and count how many of such deletion strains are unable to grow. However, the single-gene knockout data are insufficient, because joint gene deletions may result in negative genetic interactions, also known as synthetic lethality. We used a technique from graph theory to estimate the size of minimal yeast genome from partial data on synthetic lethality. The number of potential synthetic lethal interactions grows very fast when multiple genes are deleted, revealing a paradoxical contrast with the experimental reductions of yeast genome by ~100 genes, and of bacterial genomes by several hundreds of genes.
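The vertex-cover formulation in this abstract can be illustrated with a small sketch. The code below is a minimal Python illustration of a greedy vertex-cover heuristic on a (hyper)graph, not the authors' implementation: the function name `greedy_vertex_cover`, the toy gene labels, and the alphabetical tie-breaking rule are assumptions made for the example. Each tuple lists genes whose joint deletion is lethal, so at least one gene from every tuple has to stay in the retained gene set.

```python
from collections import Counter


def greedy_vertex_cover(lethal_tuples):
    """Greedy approximation of a minimum vertex cover of a (hyper)graph.

    lethal_tuples: iterable of gene tuples whose joint deletion is lethal
    (pairs, triplets, ...). Returns a set of genes that hits every tuple.
    """
    uncovered = [frozenset(t) for t in lethal_tuples if t]
    cover = set()
    while uncovered:
        # Count in how many still-uncovered tuples each gene appears.
        counts = Counter(gene for t in uncovered for gene in t)
        # Pick the gene hitting the most tuples; ties are broken
        # alphabetically (an arbitrary choice made for this sketch).
        best = min(counts, key=lambda g: (-counts[g], g))
        cover.add(best)
        # Drop every tuple that now contains a retained gene.
        uncovered = [t for t in uncovered if best not in t]
    return cover


# Toy data (hypothetical gene names): four lethal pairs and one lethal triplet.
toy_tuples = [("A", "B"), ("A", "C"), ("A", "D"), ("E", "F"), ("B", "C", "G")]
toy_cover = greedy_vertex_cover(toy_tuples)
print(sorted(toy_cover))  # ['A', 'B', 'E']
```

The greedy rule gives an approximation only, so the size of the returned set is an upper bound on the true minimum vertex cover.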
Affiliation(s)
- Sara Rahiminejad
- Department of Bioengineering, University of California—San Diego, La Jolla, California, USA
- Bianca De Sanctis
- Department of Genetics, University of Cambridge, Cambridge, United Kingdom
- Department of Ecology and Evolutionary Biology, University of California—Santa Cruz, Santa Cruz, California, USA
- Pavel Pevzner
- Department of Computer Science and Engineering, University of California—San Diego, La Jolla, California, USA
- Arcady Mushegian
- Molecular and Cellular Biosciences Division, National Science Foundation, Alexandria, Virginia, USA
- Clare Hall College, Cambridge, United Kingdom
7
Cosentino S, Sriswasdi S, Iwasaki W. SonicParanoid2: fast, accurate, and comprehensive orthology inference with machine learning and language models. Genome Biol 2024; 25:195. [PMID: 39054525 PMCID: PMC11270883 DOI: 10.1186/s13059-024-03298-4] [Received: 07/11/2023] [Accepted: 06/04/2024] [Indexed: 07/27/2024]
Abstract
Accurate inference of orthologous genes constitutes a prerequisite for comparative and evolutionary genomics. SonicParanoid is one of the fastest tools for orthology inference; however, its scalability and accuracy have been hampered by time-consuming all-versus-all alignments and the existence of proteins with complex domain architectures. Here, we present a substantial update of SonicParanoid, where a gradient boosting predictor halves the execution time and a language model doubles the recall. Application to empirical large-scale and standardized benchmark datasets shows that SonicParanoid2 is much faster than comparable methods and also the most accurate. SonicParanoid2 is available at https://gitlab.com/salvo981/sonicparanoid2 and https://zenodo.org/doi/10.5281/zenodo.11371108 .
Affiliation(s)
- Salvatore Cosentino
- Department of Integrated Biosciences, Graduate School of Frontier Sciences, the University of Tokyo, Kashiwa, Japan
- Sira Sriswasdi
- Center of Excellence in Computational Molecular Biology, Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand
- Wataru Iwasaki
- Department of Integrated Biosciences, Graduate School of Frontier Sciences, the University of Tokyo, Kashiwa, Japan
- Department of Biological Sciences, Graduate School of Science, the University of Tokyo, Bunkyo-ku, Japan
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, the University of Tokyo, Kashiwa, Japan
- Atmosphere and Ocean Research Institute, the University of Tokyo, Kashiwa, Japan
- Institute for Quantitative Biosciences, the University of Tokyo, Bunkyo-ku, Japan
- Collaborative Research Institute for Innovative Microbiology, the University of Tokyo, Bunkyo-ku, Japan
8
Fenster JA, Azzinaro PA, Dinhobl M, Borca MV, Spinard E, Gladue DP. African Swine Fever Virus Protein-Protein Interaction Prediction. Viruses 2024; 16:1170. [PMID: 39066332 PMCID: PMC11281715 DOI: 10.3390/v16071170] [Received: 06/04/2024] [Revised: 07/05/2024] [Accepted: 07/12/2024] [Indexed: 07/28/2024]
Abstract
The African swine fever virus (ASFV) causes an often deadly disease in swine and poses a threat to swine livestock and swine producers. With its complex genome containing more than 150 coding regions, developing effective vaccines for this virus remains a challenge due to a lack of basic knowledge about viral protein function and protein-protein interactions between viral proteins and between viral and host proteins. In this work, we identified ASFV-ASFV protein-protein interactions (PPIs) using artificial intelligence-powered protein structure prediction tools. We benchmarked our PPI identification workflow on the Vaccinia virus, a widely studied nucleocytoplasmic large DNA virus, and found that it could identify gold-standard PPIs that have been validated in vitro in a genome-wide computational screening. We applied this workflow to more than 18,000 pairwise combinations of ASFV proteins and were able to identify seventeen novel PPIs, many of which have corroborating experimental or bioinformatic evidence, further validating their relevance. Two of these interactions, between I267L and I8L (I267L__I8L) and between B175L and DP79L (B175L__DP79L), are novel PPIs involving viral proteins known to modulate host immune response.
Affiliation(s)
- Jacob A. Fenster
- Oak Ridge Institute for Science and Education (ORISE), Oak Ridge, TN 37830, USA
- Plum Island Animal Disease Center, Foreign Animal Disease Research Unit, Agricultural Research Service, U.S. Department of Agriculture, Orient, NY 11957, USA
- National Bio and Agro-Defense Facility, Foreign Animal Disease Research Unit, Agricultural Research Service, U.S. Department of Agriculture, Manhattan, KS 66502, USA
- Paul A. Azzinaro
- Plum Island Animal Disease Center, Foreign Animal Disease Research Unit, Agricultural Research Service, U.S. Department of Agriculture, Orient, NY 11957, USA
- National Bio and Agro-Defense Facility, Foreign Animal Disease Research Unit, Agricultural Research Service, U.S. Department of Agriculture, Manhattan, KS 66502, USA
- Mark Dinhobl
- Plum Island Animal Disease Center, Foreign Animal Disease Research Unit, Agricultural Research Service, U.S. Department of Agriculture, Orient, NY 11957, USA
- National Bio and Agro-Defense Facility, Foreign Animal Disease Research Unit, Agricultural Research Service, U.S. Department of Agriculture, Manhattan, KS 66502, USA
- Manuel V. Borca
- Plum Island Animal Disease Center, Foreign Animal Disease Research Unit, Agricultural Research Service, U.S. Department of Agriculture, Orient, NY 11957, USA
- National Bio and Agro-Defense Facility, Foreign Animal Disease Research Unit, Agricultural Research Service, U.S. Department of Agriculture, Manhattan, KS 66502, USA
- Edward Spinard
- Plum Island Animal Disease Center, Foreign Animal Disease Research Unit, Agricultural Research Service, U.S. Department of Agriculture, Orient, NY 11957, USA
- National Bio and Agro-Defense Facility, Foreign Animal Disease Research Unit, Agricultural Research Service, U.S. Department of Agriculture, Manhattan, KS 66502, USA
- Douglas P. Gladue
- Plum Island Animal Disease Center, Foreign Animal Disease Research Unit, Agricultural Research Service, U.S. Department of Agriculture, Orient, NY 11957, USA
- National Bio and Agro-Defense Facility, Foreign Animal Disease Research Unit, Agricultural Research Service, U.S. Department of Agriculture, Manhattan, KS 66502, USA
9
Bonello J, Orengo C. FunPredCATH: An ensemble method for predicting protein function using CATH. Biochim Biophys Acta Proteins Proteom 2024; 1872:140985. [PMID: 38122964 DOI: 10.1016/j.bbapap.2023.140985] [Received: 08/07/2023] [Revised: 12/05/2023] [Accepted: 12/06/2023] [Indexed: 12/23/2023]
Abstract
MOTIVATION: The growth of unannotated proteins in UniProt increases at a very high rate every year due to more efficient sequencing methods. However, the experimental annotation of proteins is a lengthy and expensive process. Using computational techniques to narrow the search can speed up the process by providing highly specific Gene Ontology (GO) terms. METHODOLOGY: We propose an ensemble approach that combines three generic base predictors that predict Gene Ontology (BP, CC and MF) terms from sequences across different species. We train our models on UniProtGOA annotation data and use the CATH domain resources to identify the protein families. We then calculate a score based on the prevalence of individual GO terms in the functional families that is then used as an indicator of confidence when assigning the GO term to an uncharacterised protein. METHODS: In the ensemble, we use a statistics-based method that scores the occurrence of GO terms in a CATH FunFam against a background set of proteins annotated by the same GO term. We also developed a set-based method that uses Set Intersection and Set Union to score the occurrence of GO terms within the same CATH FunFam. Finally, we also use FunFams-Plus, a predictor method developed by the Orengo Group at UCL to predict GO terms for uncharacterised proteins in the CAFA3 challenge. EVALUATION: We evaluated the methods against the CAFA3 benchmark and DomFun. We used the Precision, Recall and Fmax metrics and the benchmark datasets that are used in CAFA3 to evaluate our models and compare them to the CAFA3 results. Our results show that FunPredCATH compares well with top CAFA methods in the different ontologies and benchmarks. CONTRIBUTIONS: FunPredCATH compares well with other prediction methods on CAFA3, and the ensemble approach outperforms the base methods. We show that non-IEA models obtain higher Fmax scores than the IEA counterparts, while the models including IEA annotations have higher coverage at the expense of a lower Fmax score.
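The Fmax metric named in the evaluation can be made concrete with a short sketch. The code below is a simplified, protein-centric Fmax in Python, written under stated assumptions rather than taken from FunPredCATH or the CAFA evaluation code: the function name `fmax`, the input layout, the threshold grid of 0.01 to 1.00, and the toy GO identifiers are all hypothetical, and the propagation of terms up the ontology is left out. At each score threshold, precision is averaged over proteins with at least one prediction, recall is averaged over all benchmark proteins, and Fmax is the largest harmonic mean of the two.

```python
def fmax(predictions, truth, thresholds=None):
    """Simplified protein-centric Fmax.

    predictions: {protein: {go_term: score in [0, 1]}}
    truth:       {protein: set of true go_terms}
    """
    if thresholds is None:
        thresholds = [i / 100 for i in range(1, 101)]
    # Only proteins with at least one true term can be evaluated.
    benchmark = {p: t for p, t in truth.items() if t}
    best = 0.0
    for cutoff in thresholds:
        precisions = []
        recall_sum = 0.0
        for protein, true_terms in benchmark.items():
            scores = predictions.get(protein, {})
            predicted = {term for term, s in scores.items() if s >= cutoff}
            hits = len(predicted & true_terms)
            if predicted:
                # Precision is averaged only over proteins with predictions.
                precisions.append(hits / len(predicted))
            recall_sum += hits / len(true_terms)
        if not precisions:
            continue
        precision = sum(precisions) / len(precisions)
        recall = recall_sum / len(benchmark)
        if precision + recall > 0:
            best = max(best, 2 * precision * recall / (precision + recall))
    return best


# Toy data with hypothetical identifiers.
toy_predictions = {"P1": {"GO:1": 0.9, "GO:2": 0.4}, "P2": {"GO:3": 0.8}}
toy_truth = {"P1": {"GO:1"}, "P2": {"GO:3", "GO:4"}}
print(round(fmax(toy_predictions, toy_truth), 3))  # 0.857
```

In the toy data the best trade-off occurs at thresholds between 0.41 and 0.80, where precision is 1.0 and recall is 0.75.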
Affiliation(s)
- Joseph Bonello
- Department of Structural and Molecular Biology, University College London, Gower Street, London WC1E 6BT, United Kingdom
- Department of Computer Information Systems, University of Malta, Faculty of ICT, Msida, MSD 2080, Malta
- Christine Orengo
- Department of Structural and Molecular Biology, University College London, Gower Street, London WC1E 6BT, United Kingdom
10
Thiébaut A, Altenhoff AM, Campli G, Glover N, Dessimoz C, Waterhouse RM. DrosOMA: the Drosophila Orthologous Matrix browser. F1000Res 2024; 12:936. [PMID: 38434623 PMCID: PMC10905159 DOI: 10.12688/f1000research.135250.2] [Accepted: 01/12/2024] [Indexed: 03/05/2024]
Abstract
Background: Comparative genomic analyses to delineate gene evolutionary histories inform the understanding of organismal biology by characterising gene and gene family origins, trajectories, and dynamics, as well as enabling the tracing of speciation, duplication, and loss events, and facilitating the transfer of gene functional information across species. Genomic data are available for an increasing number of species from the genus Drosophila; however, a dedicated resource exploiting these data to provide the research community with browsable results from genus-wide orthology delineation has been lacking. Methods: Using the OMA Orthologous Matrix orthology inference approach and browser deployment framework, we catalogued orthologues across a selected set of Drosophila species with high-quality annotated genomes. We developed and deployed a dedicated instance of the OMA browser to facilitate intuitive exploration, visualisation, and downloading of the genus-wide orthology delineation results. Results: DrosOMA, the Drosophila Orthologous Matrix browser, accessible from https://drosoma.dcsr.unil.ch/, presents the results of orthology delineation for 36 drosophilids from across the genus and four outgroup dipterans. It enables querying and browsing of the orthology data through a feature-rich web interface, with gene-view, orthologous group-view, and genome-view pages, including comprehensive gene name and identifier cross-references together with available functional annotations and protein domain architectures, as well as tools to visualise local and global synteny conservation. Conclusions: The DrosOMA browser demonstrates the deployability of the OMA browser framework for building user-friendly orthology databases with dense sampling of a selected taxonomic group. It provides the Drosophila research community with a tailored resource of browsable results from genus-wide orthology delineation.
Affiliation(s)
- Antonin Thiébaut
- Department of Ecology and Evolution, SIB Swiss Institute of Bioinformatics, University of Lausanne, Lausanne, Switzerland
| | - Adrian M. Altenhoff
- Department of Computer Science, SIB Swiss Institute of Bioinformatics, ETH Zurich, Zurich, Switzerland
| | - Giulia Campli
- Department of Ecology and Evolution, SIB Swiss Institute of Bioinformatics, University of Lausanne, Lausanne, Switzerland
| | - Natasha Glover
- Department of Computational Biology, SIB Swiss Institute of Bioinformatics, University of Lausanne, Lausanne, Switzerland
| | - Christophe Dessimoz
- Department of Computational Biology, SIB Swiss Institute of Bioinformatics, University of Lausanne, Lausanne, Switzerland
| | - Robert M. Waterhouse
- Department of Ecology and Evolution, SIB Swiss Institute of Bioinformatics, University of Lausanne, Lausanne, Switzerland
|
11
|
Carhuaricra-Huaman D, Setubal JC. Protein-Coding Gene Families in Prokaryote Genome Comparisons. Methods Mol Biol 2024; 2802:33-55. [PMID: 38819555 DOI: 10.1007/978-1-0716-3838-5_2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2024]
Abstract
The identification of orthologous genes is relevant for comparative genomics, phylogenetic analysis, and functional annotation. There are many computational tools for the prediction of orthologous groups as well as web-based resources that offer orthology datasets for download and online analysis. This chapter presents a simple and practical guide to the process of orthologous group prediction, using a dataset of 10 prokaryotic proteomes as example. The orthology methods covered are OrthoMCL, COGtriangles, OrthoFinder2, and OMA. The authors compare the number of orthologous groups predicted by these various methods, and present a brief workflow for the functional annotation and reconstruction of phylogenies from inferred single-copy orthologous genes. The chapter also demonstrates how to explore two orthology databases: eggNOG6 and OrthoDB.
Affiliation(s)
- Dennis Carhuaricra-Huaman
- Programa de Pós-Graduação Interunidades em Bioinformática, Instituto de Matemática e Estatística, Universidade de São Paulo, São Paulo, SP, Brazil
- Research Group in Biotechnology Applied to Animal Health, Production and Conservation (SANIGEN), Laboratory of Biology and Molecular Genetics, Faculty of Veterinary Medicine, Universidad Nacional Mayor de San Marcos, Lima, Peru
| | - João Carlos Setubal
- Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo, São Paulo, SP, Brazil.
|
12
|
Lyubetsky VA, Rubanov LI, Tereshina MB, Ivanova AS, Araslanova KR, Uroshlev LA, Goremykina GI, Yang JR, Kanovei VG, Zverkov OA, Shitikov AD, Korotkova DD, Zaraisky AG. Wide-scale identification of novel/eliminated genes responsible for evolutionary transformations. Biol Direct 2023; 18:45. [PMID: 37568147 PMCID: PMC10416458 DOI: 10.1186/s13062-023-00405-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2023] [Accepted: 08/07/2023] [Indexed: 08/13/2023] Open
Abstract
BACKGROUND: It is generally accepted that most evolutionary transformations at the phenotype level are associated either with rearrangements of genomic regulatory elements, which control the activity of gene networks, or with changes in the amino acid contents of proteins. Recently, evidence has accumulated that significant evolutionary transformations could also be associated with the loss/emergence of whole genes. The targeted identification of such genes is a challenging problem for both bioinformatics and evo-devo research. RESULTS: To solve this problem we propose the WINEGRET method, named after the first letters of the title. Its main idea is to search for genes that satisfy two requirements: first, the desired genes were lost/emerged at the same evolutionary stage at which the phenotypic trait of interest was lost/emerged, and second, the expression of these genes changes significantly during the development of the trait of interest in the model organism. To verify the first requirement, we do not use existing databases of orthologs, but rely purely on gene homology and local synteny by using some novel quickly computable conditions. Genes satisfying the second requirement are found by deep RNA sequencing. As a proof of principle, we used our method to find genes absent in extant amniotes (reptiles, birds, mammals) but present in anamniotes (fish and amphibians), in which these genes are involved in the regeneration of large body appendages. As a result, 57 genes were identified. For three of them, c-c motif chemokine 4, eotaxin-like, and a previously unknown gene called here sod4, essential roles for tail regeneration were demonstrated. Notably, we established that the latter gene belongs to a novel family of Cu/Zn-superoxide dismutases lost by amniotes, SOD4. CONCLUSIONS: We present a method for targeted identification of genes whose loss/emergence in evolution could be associated with the loss/emergence of a phenotypic trait of interest. In a proof-of-principle study, we identified genes absent in amniotes that participate in body appendage regeneration in anamniotes. Our method provides a wide range of opportunities for studying the relationship between the loss/emergence of phenotypic traits and the loss/emergence of specific genes in evolution.
Affiliation(s)
- Vassily A Lyubetsky
- Institute for Information Transmission Problems of the Russian Academy of Sciences (Kharkevich Institute), 19 Build. 1, Bolshoy Karetny per., Moscow, Russia, 127051
- Department of Mechanics and Mathematics, Lomonosov Moscow State University, Kolmogorova Str., 1, Moscow, Russia, 119234
| | - Lev I Rubanov
- Institute for Information Transmission Problems of the Russian Academy of Sciences (Kharkevich Institute), 19 Build. 1, Bolshoy Karetny per., Moscow, Russia, 127051
| | - Maria B Tereshina
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, 16/10, Miklukho-Maklaya Str., Moscow, Russia, 117997
- Pirogov Russian National Research Medical University, Moscow, Russia
| | - Anastasiya S Ivanova
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, 16/10, Miklukho-Maklaya Str., Moscow, Russia, 117997
- Department of Molecular Medicine, The Scripps Research Institute, La Jolla, USA
| | - Karina R Araslanova
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, 16/10, Miklukho-Maklaya Str., Moscow, Russia, 117997
| | - Leonid A Uroshlev
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 32, Vavilova Str., Moscow, Russia, 119991
| | - Galina I Goremykina
- Plekhanov Russian University of Economics, Stremyanny Lane 36, Moscow, Russia
| | - Jian-Rong Yang
- Advanced Medical Technology Center, The First Affiliated Hospital, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, 510080, China
- Department of Genetics and Biomedical Informatics, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, 510080, China
| | - Vladimir G Kanovei
- Institute for Information Transmission Problems of the Russian Academy of Sciences (Kharkevich Institute), 19 Build. 1, Bolshoy Karetny per., Moscow, Russia, 127051
| | - Oleg A Zverkov
- Institute for Information Transmission Problems of the Russian Academy of Sciences (Kharkevich Institute), 19 Build. 1, Bolshoy Karetny per., Moscow, Russia, 127051
| | - Alexander D Shitikov
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, 16/10, Miklukho-Maklaya Str., Moscow, Russia, 117997
| | - Daria D Korotkova
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, 16/10, Miklukho-Maklaya Str., Moscow, Russia, 117997
- Global Health Institute, School of Life Sciences, EPFL, Lausanne, Switzerland
| | - Andrey G Zaraisky
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, 16/10, Miklukho-Maklaya Str., Moscow, Russia, 117997.
- Pirogov Russian National Research Medical University, Moscow, Russia.
|
13
|
Imbert B, Kreplak J, Flores RG, Aubert G, Burstin J, Tayeh N. Development of a knowledge graph framework to ease and empower translational approaches in plant research: a use-case on grain legumes. Front Artif Intell 2023; 6:1191122. [PMID: 37601035 PMCID: PMC10435283 DOI: 10.3389/frai.2023.1191122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2023] [Accepted: 07/10/2023] [Indexed: 08/22/2023] Open
Abstract
While the continuing decline in genotyping and sequencing costs has largely benefited plant research, some key species for meeting the challenges of agriculture remain mostly understudied. As a result, heterogeneous datasets for different traits are available for a significant number of these species. As gene structures and functions are to some extent conserved through evolution, comparative genomics can be used to transfer available knowledge from one species to another. However, such a translational research approach is complex due to the multiplicity of data sources and the non-harmonized description of the data. Here, we provide two pipelines, referred to as structural and functional pipelines, to create a framework for a NoSQL graph-database (Neo4j) to integrate and query heterogeneous data from multiple species. We call this framework Orthology-driven knowledge base framework for translational research (Ortho_KB). The structural pipeline builds bridges across species based on orthology. The functional pipeline integrates biological information, including QTL and RNA-sequencing datasets, and uses the backbone from the structural pipeline to connect orthologs in the database. Queries can be written using the Neo4j Cypher language and can, for instance, help identify genes controlling a common trait across species. To explore the possibilities offered by such a framework, we populated Ortho_KB to obtain OrthoLegKB, an instance dedicated to legumes. The proposed model was evaluated by studying the conservation of a flowering-promoting gene. Through a series of queries, we have demonstrated that our knowledge graph base provides an intuitive and powerful platform to support research and development programmes.
Affiliation(s)
- Baptiste Imbert
- Agroécologie, INRAE, Institut Agro, Univ. Bourgogne, Univ. Bourgogne Franche-Comté, Dijon, France
| | - Jonathan Kreplak
- Agroécologie, INRAE, Institut Agro, Univ. Bourgogne, Univ. Bourgogne Franche-Comté, Dijon, France
| | - Raphaël-Gauthier Flores
- Université Paris-Saclay, INRAE, URGI, Versailles, France
- Université Paris-Saclay, INRAE, BioinfOmics, Plant Bioinformatics Facility, Versailles, France
| | - Grégoire Aubert
- Agroécologie, INRAE, Institut Agro, Univ. Bourgogne, Univ. Bourgogne Franche-Comté, Dijon, France
| | - Judith Burstin
- Agroécologie, INRAE, Institut Agro, Univ. Bourgogne, Univ. Bourgogne Franche-Comté, Dijon, France
| | - Nadim Tayeh
- Agroécologie, INRAE, Institut Agro, Univ. Bourgogne, Univ. Bourgogne Franche-Comté, Dijon, France
|
14
|
Langschied F, Leisegang MS, Brandes RP, Ebersberger I. ncOrtho: efficient and reliable identification of miRNA orthologs. Nucleic Acids Res 2023; 51:e71. [PMID: 37260093 PMCID: PMC10359484 DOI: 10.1093/nar/gkad467] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 12/06/2022] [Revised: 05/04/2023] [Accepted: 05/30/2023] [Indexed: 06/02/2023] Open
Abstract
MicroRNAs (miRNAs) are post-transcriptional regulators that fine-tune gene expression via translational repression or degradation of their target mRNAs. Despite their functional relevance, frameworks for the scalable and accurate detection of miRNA orthologs are missing. Consequently, there is still no comprehensive picture of how miRNAs and their associated regulatory networks have evolved. Here we present ncOrtho, a synteny-informed pipeline for the targeted search of miRNA orthologs in unannotated genome sequences. ncOrtho matches miRNA annotations from multi-tissue transcriptomes in precision, while scaling to the analysis of hundreds of custom-selected species. The presence-absence pattern of orthologs to 266 human miRNA families across 402 vertebrate species reveals four bursts of miRNA acquisition, of which the most recent event occurred in the last common ancestor of higher primates. miRNA families are rarely modified or lost, but notable exceptions for both events exist. miRNA co-ortholog numbers faithfully indicate lineage-specific whole genome duplications, and miRNAs are powerful markers for phylogenomic analyses. Their exceptionally low genetic diversity makes them suitable to resolve clades where the phylogenetic signal is blurred by incomplete lineage sorting of ancestral alleles. In summary, ncOrtho makes it possible to routinely consider miRNAs in evolutionary analyses that were thus far reserved for protein-coding genes.
Affiliation(s)
- Felix Langschied
- Applied Bioinformatics Group, Institute of Cell Biology and Neuroscience, Goethe University, Frankfurt, Germany
| | - Matthias S Leisegang
- Institute for Cardiovascular Physiology, Goethe University, Frankfurt, Germany
- German Center of Cardiovascular Research (DZHK), Partner site RheinMain, Frankfurt, Germany
| | - Ralf P Brandes
- Institute for Cardiovascular Physiology, Goethe University, Frankfurt, Germany
- German Center of Cardiovascular Research (DZHK), Partner site RheinMain, Frankfurt, Germany
| | - Ingo Ebersberger
- Applied Bioinformatics Group, Institute of Cell Biology and Neuroscience, Goethe University, Frankfurt, Germany
- Senckenberg Biodiversity and Climate Research Centre (S-BIK-F), Frankfurt am Main, Germany
- LOEWE Centre for Translational Biodiversity Genomics (TBG), Frankfurt am Main, Germany
|
15
|
Sun J, Lu F, Luo Y, Bie L, Xu L, Wang Y. OrthoVenn3: an integrated platform for exploring and visualizing orthologous data across genomes. Nucleic Acids Res 2023:7146343. [PMID: 37114999 DOI: 10.1093/nar/gkad313] [Citation(s) in RCA: 177] [Impact Index Per Article: 88.5] [Received: 03/08/2023] [Revised: 04/07/2023] [Accepted: 04/13/2023] [Indexed: 04/29/2023] Open
Abstract
Advancements in comparative genomics research have led to a growing interest in studying species evolution and genetic diversity. To facilitate this research, OrthoVenn3 has been developed as a powerful, web-based tool that enables users to efficiently identify and annotate orthologous clusters and infer phylogenetic relationships across a range of species. The latest upgrade of OrthoVenn adds several important new features, including enhanced orthologous cluster identification accuracy, improved visualization capabilities for numerous sets of data, and wrapped phylogenetic analysis. Furthermore, OrthoVenn3 now provides gene family contraction and expansion analysis to help researchers better understand the evolutionary history of gene families, as well as collinearity analysis to detect conserved and variable genomic structures. With its intuitive user interface and robust functionality, OrthoVenn3 is a valuable resource for comparative genomics research. The tool is freely accessible at https://orthovenn3.bioinfotoolkits.net.
Affiliation(s)
- Jiahe Sun
- Integrative Science Center of Germplasm Creation in Western China (CHONGQING) Science City, Biological Science Research Center, Southwest University, Chongqing, China
| | - Fang Lu
- Integrative Science Center of Germplasm Creation in Western China (CHONGQING) Science City, Biological Science Research Center, Southwest University, Chongqing, China
| | - Yongjiang Luo
- Integrative Science Center of Germplasm Creation in Western China (CHONGQING) Science City, Biological Science Research Center, Southwest University, Chongqing, China
| | - Lingzi Bie
- Integrative Science Center of Germplasm Creation in Western China (CHONGQING) Science City, Biological Science Research Center, Southwest University, Chongqing, China
| | - Ling Xu
- State Key Laboratory of Plant Environmental Resilience, College of Biological Sciences, China Agricultural University, Beijing, China
| | - Yi Wang
- Integrative Science Center of Germplasm Creation in Western China (CHONGQING) Science City, Biological Science Research Center, Southwest University, Chongqing, China
|
16
|
Vinodh Kumar PN, Mallikarjuna MG, Jha SK, Mahato A, Lal SK, K R Y, Lohithaswa HC, Chinnusamy V. Unravelling structural, functional, evolutionary and genetic basis of SWEET transporters regulating abiotic stress tolerance in maize. Int J Biol Macromol 2023; 229:539-560. [PMID: 36603713 DOI: 10.1016/j.ijbiomac.2022.12.326] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/25/2022] [Revised: 12/11/2022] [Accepted: 12/28/2022] [Indexed: 01/04/2023]
Abstract
Sugars Will Eventually be Exported Transporters (SWEETs) are novel sugar transporters widely distributed among living systems. SWEETs play a crucial role in various bio-physiological processes, viz., plant development, nectar secretion, pollen development, and regulation of biotic and abiotic stresses, in addition to their prime sugar-transporting activity. Thus, in-depth structural, evolutionary, and functional characterization of maize SWEET transporters was performed for their utility in maize improvement. The mining of SWEET genes in the latest maize genome release (v.5) showed an uneven distribution of 20 ZmSWEETs. The comprehensive structural analyses and docking of ZmSWEETs with four sugars, viz., fructose, galactose, glucose, and sucrose, revealed frequent amino acid residues forming hydrogen (asparagine, valine, serine) and hydrophobic (tryptophan, glycine, and phenylalanine) interactions. Evolutionary analyses of SWEETs showed a mixed lineage with 50-100% commonality of ortho-groups and -sequences evolved under strong purifying selection (Ka/Ks < 0.5). The duplication analysis showed non-functionalization (ZmSWEET18 in B73) and neo- and sub-functionalization (ZmSWEET3, ZmSWEET6, ZmSWEET9, ZmSWEET19, and ZmSWEET20) events in maize. Functional analyses of ZmSWEET genes through co-expression, in silico expression and qRT-PCR assays showed the relevance of ZmSWEET expression in regulating drought, heat, and waterlogging stress tolerances in maize. The first ever ZmSWEET-regulatory network revealed 286 direct (ZmSWEET-TF: 140; ZmSWEET-miRNA: 146) and 1226 indirect (TF-TF: 597; TF-miRNA: 629) edges. The present investigation has given new insights into the complex transcriptional and post-transcriptional regulation and the regulatory and functional relevance of ZmSWEETs in assigning stress tolerance in maize.
Affiliation(s)
- P N Vinodh Kumar
- Division of Genetics, ICAR - Indian Agricultural Research Institute, New Delhi 110012, India; ICAR - Indian Agricultural Research Institute, Jharkhand, India
| | | | - Shailendra Kumar Jha
- Division of Genetics, ICAR - Indian Agricultural Research Institute, New Delhi 110012, India
| | - Anima Mahato
- ICAR - Indian Agricultural Research Institute, Jharkhand, India
| | - Shambhu Krishan Lal
- School of Genetic Engineering, ICAR - Indian Institute of Agricultural Biotechnology, Ranchi 834003, India
| | - Yathish K R
- Winter Nursery Centre, ICAR-Indian Institute of Maize Research, Hyderabad, India
| | | | - Viswanathan Chinnusamy
- Division of Plant Physiology, ICAR - Indian Agricultural Research Institute, New Delhi 110012, India
|
17
|
Hernández-Plaza A, Szklarczyk D, Botas J, Cantalapiedra C, Giner-Lamia J, Mende DR, Kirsch R, Rattei T, Letunic I, Jensen L, Bork P, von Mering C, Huerta-Cepas J. eggNOG 6.0: enabling comparative genomics across 12 535 organisms. Nucleic Acids Res 2023; 51:D389-D394. [PMID: 36399505 PMCID: PMC9825578 DOI: 10.1093/nar/gkac1022] [Citation(s) in RCA: 107] [Impact Index Per Article: 53.5] [Received: 09/15/2022] [Revised: 10/17/2022] [Accepted: 10/24/2022] [Indexed: 11/19/2022] Open
Abstract
The eggNOG (evolutionary gene genealogy Non-supervised Orthologous Groups) database is a bioinformatics resource providing orthology data and comprehensive functional information for organisms from all domains of life. Here, we present a major update of the database and website (version 6.0), which increases the number of covered organisms to 12 535 reference species, expands functional annotations, and implements new functionality. In total, eggNOG 6.0 provides a hierarchy of over 17M orthologous groups (OGs) computed at 1601 taxonomic levels, spanning 10 756 bacterial, 457 archaeal and 1322 eukaryotic organisms. OGs have been thoroughly annotated using recent knowledge from functional databases, including KEGG, Gene Ontology, UniProtKB, BiGG, CAZy, CARD, PFAM and SMART. eggNOG also offers phylogenetic trees for all OGs, maximising utility and versatility for end users while allowing researchers to investigate the evolutionary history of speciation and duplication events as well as the phylogenetic distribution of functional terms within each OG. Furthermore, the eggNOG 6.0 website contains new functionality to mine orthology and functional data with ease, including the possibility of generating phylogenetic profiles for multiple OGs across species or identifying single-copy OGs at custom taxonomic levels. eggNOG 6.0 is available at http://eggnog6.embl.de.
Affiliation(s)
- Ana Hernández-Plaza
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Campus de Montegancedo-UPM, 28223 Pozuelo de Alarcón, Madrid, Spain
| | - Damian Szklarczyk
- Department of Molecular Life Sciences, University of Zurich, 8057 Zurich, Switzerland
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Jorge Botas
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Campus de Montegancedo-UPM, 28223 Pozuelo de Alarcón, Madrid, Spain
| | - Carlos P Cantalapiedra
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Campus de Montegancedo-UPM, 28223 Pozuelo de Alarcón, Madrid, Spain
| | - Joaquín Giner-Lamia
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Campus de Montegancedo-UPM, 28223 Pozuelo de Alarcón, Madrid, Spain
- Departamento de Biotecnología-Biología Vegetal, Escuela Técnica Superior de Ingeniería Agronómica, Alimentaria y de Biosistemas, Universidad Politécnica de Madrid (UPM), Madrid 28040, Spain
| | - Daniel R Mende
- Department of Medical Microbiology, Amsterdam University Medical Centers, Amsterdam, The Netherlands
| | - Rebecca Kirsch
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen N, Denmark
| | - Thomas Rattei
- University of Vienna, Centre for Microbiology and Environmental Systems Science, Djerassiplatz 11030, Vienna, Austria
| | - Ivica Letunic
- Biobyte solutions GmbH, Bothestr. 142, 69117 Heidelberg, Germany
| | - Lars J Jensen
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen N, Denmark
| | - Peer Bork
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany
- Yonsei Frontier Lab (YFL), Yonsei University, 03722 Seoul, South Korea
- Department of Bioinformatics, Biocenter, University of Würzburg, 97074 Würzburg, Germany
| | - Christian von Mering
- Department of Molecular Life Sciences, University of Zurich, 8057 Zurich, Switzerland
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Jaime Huerta-Cepas
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Campus de Montegancedo-UPM, 28223 Pozuelo de Alarcón, Madrid, Spain
|
18
|
Kuznetsov D, Tegenfeldt F, Manni M, Seppey M, Berkeley M, Kriventseva E, Zdobnov EM. OrthoDB v11: annotation of orthologs in the widest sampling of organismal diversity. Nucleic Acids Res 2022; 51:D445-D451. [PMID: 36350662 PMCID: PMC9825584 DOI: 10.1093/nar/gkac998] [Citation(s) in RCA: 187] [Impact Index Per Article: 62.3] [Received: 09/15/2022] [Revised: 10/15/2022] [Accepted: 10/26/2022] [Indexed: 11/10/2022] Open
Abstract
OrthoDB provides evolutionary and functional annotations of genes in a diverse sampling of eukaryotes, prokaryotes, and viruses. Genomics continues to accelerate our exploration of gene diversity and orthology is the most precise way of bridging gene functional knowledge with the rapidly expanding universe of genomic sequences. OrthoDB samples the most diverse organisms with the best quality genomics data to provide the leading coverage of species diversity. This update of the underlying data to over 18 000 prokaryotes and almost 2000 eukaryotes with over 100 million genes propels the coverage to another level. This achievement also demonstrates the scalability of the underlying OrthoLoger software for delineation of orthologs, freely available from https://orthologer.ezlab.org. In addition to the ab-initio computations of gene orthology used for the OrthoDB release, the OrthoLoger software allows mapping of novel gene sets to precomputed orthologs and thereby links to their annotations. The LEMMI-style benchmarking of OrthoLoger ensures its state-of-the-art performance and is available from https://lemortho.ezlab.org. The OrthoDB web interface has been further developed to include a pairwise orthology view from any gene to any other sampled species. OrthoDB-computed evolutionary annotations as well as extensively collated functional annotations can be accessed via REST API or SPARQL/RDF, downloaded or browsed online from https://www.orthodb.org.
Affiliation(s)
| | | | - Mosè Manni
- Department of Genetic Medicine and Development, University of Geneva Medical School, Swiss Institute of Bioinformatics, rue Michel-Servet 1, 1211 Geneva, Switzerland
| | - Mathieu Seppey
- Department of Genetic Medicine and Development, University of Geneva Medical School, Swiss Institute of Bioinformatics, rue Michel-Servet 1, 1211 Geneva, Switzerland
| | - Matthew Berkeley
- Department of Genetic Medicine and Development, University of Geneva Medical School, Swiss Institute of Bioinformatics, rue Michel-Servet 1, 1211 Geneva, Switzerland
| | | | - Evgeny M Zdobnov
- To whom correspondence should be addressed. Tel: +41 22 379 59 73;
|
19
|
Duan G, Wu G, Chen X, Tian D, Li Z, Sun Y, Du Z, Hao L, Song S, Gao Y, Xiao J, Zhang Z, Bao Y, Tang B, Zhao W. HGD: an integrated homologous gene database across multiple species. Nucleic Acids Res 2022; 51:D994-D1002. [PMID: 36318261 PMCID: PMC9825607 DOI: 10.1093/nar/gkac970] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2022] [Revised: 09/28/2022] [Accepted: 10/17/2022] [Indexed: 11/06/2022] Open
Abstract
Homology is fundamental to inferring genes' evolutionary processes and relationships with shared ancestry. Existing homologous gene resources vary in inference methods, homologous relationships and identifiers, posing inevitable difficulties for choosing and mapping homology results from one to another. Here, we present HGD (Homologous Gene Database, https://ngdc.cncb.ac.cn/hgd), a comprehensive homolog resource integrating multi-species, multi-resource and multi-omics data, as a complement to existing resources, providing a public, one-stop data service. Currently, HGD houses a total of 112 383 644 homologous pairs for 37 species, including 19 animals, 16 plants and 2 microorganisms. Meanwhile, HGD integrates various annotations from public resources, including 16 909 homologs with traits, 276 670 homologs with variants, 398 573 homologs with expression and 536 852 homologs with gene ontology (GO) annotations. HGD provides a wide range of omics gene function annotations to help users gain a deeper understanding of gene function.
Affiliation(s)
| | | | - Xiaoning Chen
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Dongmei Tian
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
| | - Zhaohua Li
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Yanling Sun
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
| | - Zhenglin Du
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
| | - Lili Hao
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
| | - Shuhui Song
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Yuan Gao
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Jingfa Xiao
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Zhang Zhang
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Yiming Bao
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Bixia Tang
- Correspondence may also be addressed to Bixia Tang.
| | - Wenming Zhao
- To whom correspondence should be addressed. Tel: +86 1084097636; Fax: +86 1084097720.
| |
|
20
|
Colbourne JK, Shaw JR, Sostare E, Rivetti C, Derelle R, Barnett R, Campos B, LaLone C, Viant MR, Hodges G. Toxicity by descent: A comparative approach for chemical hazard assessment. Environmental Advances 2022; 9:100287. [PMID: 39228468 PMCID: PMC11370884 DOI: 10.1016/j.envadv.2022.100287] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Indexed: 09/05/2024]
Abstract
Toxicology is traditionally divided between human and eco-toxicology. In the shared pursuit of environmental health, this separation does not account for discoveries made in the comparative studies of animal genomes. Here, we provide evidence on the feasibility of understanding the health impact of chemicals on all animals, including ecological keystone species and humans, based on a significant number of conserved genes and their functional associations to health-related outcomes across much of animal diversity. We test four conditions to understand the value of comparative genomics data to inform mechanism-based human and environmental hazard assessment: (1) genes that are most fundamental for health evolved early during animal evolution; (2) the molecular functions of pathways are better conserved among distantly related species than the individual genes that are members of these pathways; (3) the most conserved pathways among animals are those that cause adverse health outcomes when disrupted; (4) gene sets that serve as molecular signatures of biological processes or disease-states are largely enriched by evolutionarily conserved genes across the animal phylogeny. The concept of homology is applied in a comparative analysis of gene families and pathways among invertebrate and vertebrate species compared with humans. Results show that over 70% of gene families associated with disease are shared among the greatest variety of animal species through evolution. Pathway conservation between invertebrates and humans is based on the degree of conservation within vertebrates and the number of interacting genes within the human network. Human gene sets that already serve as biomarkers are enriched by evolutionarily conserved genes across the animal phylogeny. 
By implementing a comparative method for chemical hazard assessment, human and eco-toxicology converge towards a more holistic and mechanistic understanding of toxicity disrupting biological processes that are important for health and shared among animals (including humans).
Affiliation(s)
- John K. Colbourne
- Michabo Health Science Ltd, Coventry CV1 2NT, UK
- School of Biosciences, University of Birmingham, Edgbaston B15 2TT, UK
| | - Joseph R. Shaw
- O’Neill School of Public and Environmental Affairs, Indiana University, Bloomington 47405, USA
| | | | - Claudia Rivetti
- Safety and Environmental Assurance Centre, Unilever, Colworth Science Park, Sharnbrook MK44 1LQ, UK
| | - Romain Derelle
- School of Biosciences, University of Birmingham, Edgbaston B15 2TT, UK
| | | | - Bruno Campos
- Safety and Environmental Assurance Centre, Unilever, Colworth Science Park, Sharnbrook MK44 1LQ, UK
| | - Carlie LaLone
- US Environmental Protection Agency, Duluth 55804, USA
| | - Mark R. Viant
- Michabo Health Science Ltd, Coventry CV1 2NT, UK
- School of Biosciences, University of Birmingham, Edgbaston B15 2TT, UK
| | - Geoff Hodges
- Safety and Environmental Assurance Centre, Unilever, Colworth Science Park, Sharnbrook MK44 1LQ, UK
| |
|
21
|
Nevers Y, Jones TEM, Jyothi D, Yates B, Ferret M, Portell-Silva L, Codo L, Cosentino S, Marcet-Houben M, Vlasova A, Poidevin L, Kress A, Hickman M, Persson E, Piližota I, Guijarro-Clarke C, Iwasaki W, Lecompte O, Sonnhammer E, Roos DS, Gabaldón T, Thybert D, Thomas PD, Hu Y, Emms DM, Bruford E, Capella-Gutierrez S, Martin MJ, Dessimoz C, Altenhoff A. The Quest for Orthologs orthology benchmark service in 2022. Nucleic Acids Res 2022; 50:W623-W632. [PMID: 35552456 PMCID: PMC9252809 DOI: 10.1093/nar/gkac330] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Received: 02/23/2022] [Revised: 04/07/2022] [Accepted: 04/30/2022] [Indexed: 11/15/2022] Open
Abstract
The Orthology Benchmark Service (https://orthology.benchmarkservice.org) is the gold standard for orthology inference evaluation, supported and maintained by the Quest for Orthologs consortium. It is an essential resource to compare existing and new methods of orthology inference (the bedrock for many comparative genomics and phylogenetic analysis) over a standard dataset and through common procedures. The Quest for Orthologs Consortium is dedicated to maintaining the resource up to date, through regular updates of the Reference Proteomes and increasingly accessible data through the OpenEBench platform. For this update, we have added a new benchmark based on curated orthology assertion from the Vertebrate Gene Nomenclature Committee, and provided an example meta-analysis of the public predictions present on the platform.
Affiliation(s)
- Yannis Nevers
- To whom correspondence should be addressed. Tel: +41 21 692 5449.
| | - Tamsin E M Jones
- HUGO Gene Nomenclature Committee (HGNC), European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
| | - Dushyanth Jyothi
- Protein Function development, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
| | - Bethan Yates
- HUGO Gene Nomenclature Committee (HGNC), European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
| | - Meritxell Ferret
- Barcelona Supercomputing Centre (BSC-CNS). Plaça Eusebi Güell, 1-3 08034 Barcelona, Spain
| | - Laura Portell-Silva
- Barcelona Supercomputing Centre (BSC-CNS). Plaça Eusebi Güell, 1-3 08034 Barcelona, Spain
| | - Laia Codo
- Barcelona Supercomputing Centre (BSC-CNS). Plaça Eusebi Güell, 1-3 08034 Barcelona, Spain
| | - Salvatore Cosentino
- Department of Biological Sciences, Graduate School of Science, the University of Tokyo, Tokyo, Japan
| | - Marina Marcet-Houben
- Barcelona Supercomputing Centre (BSC-CNS). Plaça Eusebi Güell, 1-3 08034 Barcelona, Spain; Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Baldiri Reixac, 10, 08028 Barcelona, Spain
| | - Anna Vlasova
- Barcelona Supercomputing Centre (BSC-CNS). Plaça Eusebi Güell, 1-3 08034 Barcelona, Spain; Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Baldiri Reixac, 10, 08028 Barcelona, Spain
| | - Laetitia Poidevin
- Department of Computer Science, ICube, UMR 7357, Centre de Recherche en Biomédecine de Strasbourg, University of Strasbourg, CNRS, Strasbourg, France; BiGEst-ICube Platform, ICube, UMR 7357, Centre de Recherche en Biomédecine de Strasbourg, University of Strasbourg, CNRS, Strasbourg, France
| | - Arnaud Kress
- Department of Computer Science, ICube, UMR 7357, Centre de Recherche en Biomédecine de Strasbourg, University of Strasbourg, CNRS, Strasbourg, France; BiGEst-ICube Platform, ICube, UMR 7357, Centre de Recherche en Biomédecine de Strasbourg, University of Strasbourg, CNRS, Strasbourg, France
| | - Mark Hickman
- Department of Biology, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Emma Persson
- Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, Solna, Sweden
| | - Ivana Piližota
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
| | - Cristina Guijarro-Clarke
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
| | | | - Wataru Iwasaki
- Department of Biological Sciences, Graduate School of Science, the University of Tokyo, Tokyo, Japan; Department of Integrated Biosciences, Graduate School of Frontier Sciences, the University of Tokyo, Kashiwa, Japan
| | - Odile Lecompte
- Department of Computer Science, ICube, UMR 7357, Centre de Recherche en Biomédecine de Strasbourg, University of Strasbourg, CNRS, Strasbourg, France
| | - Erik Sonnhammer
- Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, Solna, Sweden
| | - David S Roos
- Department of Biology, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Toni Gabaldón
- Barcelona Supercomputing Centre (BSC-CNS). Plaça Eusebi Güell, 1-3 08034 Barcelona, Spain; Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Baldiri Reixac, 10, 08028 Barcelona, Spain; Catalan Institution for Research and Advanced Studies (ICREA), Barcelona, Spain; Centro de Investigaciones Biomédicas en Red de Enfermedades Infecciosas, Barcelona, Spain
| | - David Thybert
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
| | - Paul D Thomas
- Department of Population and Public Health Sciences, University of Southern California, Los Angeles, CA 90032, USA
| | - Yanhui Hu
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Harvard University, Boston, MA 02115, USA
| | - David M Emms
- Department of Plant Sciences, University of Oxford, Oxford OX1 3RB, UK
| | - Elspeth Bruford
- HUGO Gene Nomenclature Committee (HGNC), European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK; Department of Haematology, University of Cambridge School of Clinical Medicine, Cambridge, UK
| | | | - Maria J Martin
- Protein Function development, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
| | - Christophe Dessimoz
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland; Swiss Institute for Bioinformatics, University of Lausanne, Lausanne, Switzerland; Department of Computer Science, University College London, London, UK; Centre for Life's Origins and Evolution, Department of Genetics, Evolution and Environment, University College London, London, UK
| | - Adrian Altenhoff
- Swiss Institute for Bioinformatics, University of Lausanne, Lausanne, Switzerland; Computer Science Department, ETH Zurich, Zurich, Switzerland
| |
|
22
|
Engel SR, Wong ED, Nash RS, Aleksander S, Alexander M, Douglass E, Karra K, Miyasato SR, Simison M, Skrzypek MS, Weng S, Cherry JM. New data and collaborations at the Saccharomyces Genome Database: updated reference genome, alleles, and the Alliance of Genome Resources. Genetics 2022; 220:iyab224. [PMID: 34897464 PMCID: PMC9209811 DOI: 10.1093/genetics/iyab224] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 09/15/2021] [Accepted: 11/11/2021] [Indexed: 02/03/2023] Open
Abstract
Saccharomyces cerevisiae is used to provide fundamental understanding of eukaryotic genetics, gene product function, and cellular biological processes. Saccharomyces Genome Database (SGD) has been supporting the yeast research community since 1993, serving as its de facto hub. Over the years, SGD has maintained the genetic nomenclature, chromosome maps, and functional annotation, and developed various tools and methods for analysis and curation of a variety of emerging data types. More recently, SGD and six other model organism focused knowledgebases have come together to create the Alliance of Genome Resources to develop sustainable genome information resources that promote and support the use of various model organisms to understand the genetic and genomic bases of human biology and disease. Here we describe recent activities at SGD, including the latest reference genome annotation update, the development of a curation system for mutant alleles, and new pages addressing homology across model organisms as well as the use of yeast to study human disease.
Affiliation(s)
- Stacia R Engel
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
| | - Edith D Wong
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
| | - Robert S Nash
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
| | - Suzi Aleksander
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
| | - Micheal Alexander
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
| | - Eric Douglass
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
| | - Kalpana Karra
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
| | - Stuart R Miyasato
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
| | - Matt Simison
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
| | - Marek S Skrzypek
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
| | - Shuai Weng
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
| | - J Michael Cherry
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
| |
|
23
|
Agapite J, Albou LP, Aleksander SA, Alexander M, Anagnostopoulos AV, Antonazzo G, Argasinska J, Arnaboldi V, Attrill H, Becerra A, Bello SM, Blake JA, Blodgett O, Bradford YM, Bult CJ, Cain S, Calvi BR, Carbon S, Chan J, Chen WJ, Michael Cherry J, Cho J, Christie KR, Crosby MA, Davis P, da Veiga Beltrame E, De Pons JL, D’Eustachio P, Diamantakis S, Dolan ME, dos Santos G, Douglass E, Dunn B, Eagle A, Ebert D, Engel SR, Fashena D, Foley S, Frazer K, Gao S, Gibson AC, Gondwe F, Goodman J, Sian Gramates L, Grove CA, Hale P, Harris T, Thomas Hayman G, Hill DP, Howe DG, Howe KL, Hu Y, Jha S, Kadin JA, Kaufman TC, Kalita P, Karra K, Kishore R, Kwitek AE, Laulederkind SJF, Lee R, Longden I, Luypaert M, MacPherson KA, Martin R, Marygold SJ, Matthews B, McAndrews MS, Millburn G, Miyasato S, Motenko H, Moxon S, Muller HM, Mungall CJ, Muruganujan A, Mushayahama T, Nalabolu HS, Nash RS, Ng P, Nuin P, Paddock H, Paulini M, Perrimon N, Pich C, Quinton-Tulloch M, Raciti D, Ramachandran S, Richardson JE, Gelbart SR, Ruzicka L, Schaper K, Schindelman G, Shimoyama M, Simison M, Shaw DR, Shrivatsav A, Singer A, Skrzypek M, Smith CM, Smith CL, Smith JR, Stein L, Sternberg PW, Tabone CJ, Thomas PD, Thorat K, Thota J, Toro S, Tomczuk M, Trovisco V, Tutaj MA, Tutaj M, Urbano JM, Van Auken K, Van Slyke CE, Wang Q, Wang SJ, Weng S, Westerfield M, Williams G, Wilming LG, Wong ED, Wright A, Yook K, Zarowiecki M, Zhou P, Zytkovicz M. Harmonizing model organism data in the Alliance of Genome Resources. Genetics 2022; 220:iyac022. [PMID: 35380658 PMCID: PMC8982023 DOI: 10.1093/genetics/iyac022] [Citation(s) in RCA: 59] [Impact Index Per Article: 19.7] [Received: 12/12/2021] [Accepted: 01/26/2022] [Indexed: 02/06/2023] Open
Abstract
The Alliance of Genome Resources (the Alliance) is a combined effort of 7 knowledgebase projects: Saccharomyces Genome Database, WormBase, FlyBase, Mouse Genome Database, the Zebrafish Information Network, Rat Genome Database, and the Gene Ontology Resource. The Alliance seeks to provide several benefits: better service to the various communities served by these projects; a harmonized view of data for all biomedical researchers, bioinformaticians, clinicians, and students; and a more sustainable infrastructure. The Alliance has harmonized cross-organism data to provide useful comparative views of gene function, gene expression, and human disease relevance. The basis of the comparative views is shared calls of orthology relationships and the use of common ontologies. The key types of data are alleles and variants, gene function based on gene ontology annotations, phenotypes, association to human disease, gene expression, protein-protein and genetic interactions, and participation in pathways. The information is presented on uniform gene pages that allow facile summarization of information about each gene in each of the 7 organisms covered (budding yeast, roundworm Caenorhabditis elegans, fruit fly, house mouse, zebrafish, brown rat, and human). The harmonized knowledge is freely available on the alliancegenome.org portal, as downloadable files, and by APIs. We expect other existing and emerging knowledge bases to join in the effort to provide the union of useful data and features that each knowledge base currently provides.
|
24
|
Fuentes D, Molina M, Chorostecki U, Capella-Gutiérrez S, Marcet-Houben M, Gabaldón T. PhylomeDB V5: an expanding repository for genome-wide catalogues of annotated gene phylogenies. Nucleic Acids Res 2021; 50:D1062-D1068. [PMID: 34718760 PMCID: PMC8728271 DOI: 10.1093/nar/gkab966] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 09/20/2021] [Revised: 10/02/2021] [Accepted: 10/05/2021] [Indexed: 12/20/2022] Open
Abstract
PhylomeDB is a unique knowledge base providing public access to minable and browsable catalogues of pre-computed genome-wide collections of annotated sequences, alignments and phylogenies (i.e. phylomes) of homologous genes, as well as to their corresponding phylogeny-based orthology and paralogy relationships. In addition, PhylomeDB trees and alignments can be downloaded for further processing to detect and date gene duplication events, infer past events of inter-species hybridization and horizontal gene transfer, as well as to uncover footprints of selection, introgression, gene conversion, or other relevant evolutionary processes in the genes and organisms of interest. Here, we describe the latest evolution of PhylomeDB (version 5). This new version includes a newly implemented web interface and several new functionalities such as optimized searching procedures, the possibility to create user-defined phylome collections, and a fully redesigned data structure. This release also represents a significant core data expansion, with the database providing access to 534 phylomes, comprising over 8 million trees, and homology relationships for genes in over 6000 species. This makes PhylomeDB the largest and most comprehensive public repository of gene phylogenies. PhylomeDB is available at http://www.phylomedb.org.
Affiliation(s)
- Diego Fuentes
- Barcelona Supercomputing Centre (BSC-CNS). Jordi Girona 29, 08034 Barcelona, Spain; Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Baldiri Reixac 10, 08028 Barcelona, Spain
| | - Manuel Molina
- Barcelona Supercomputing Centre (BSC-CNS). Jordi Girona 29, 08034 Barcelona, Spain; Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Baldiri Reixac 10, 08028 Barcelona, Spain
| | - Uciel Chorostecki
- Barcelona Supercomputing Centre (BSC-CNS). Jordi Girona 29, 08034 Barcelona, Spain; Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Baldiri Reixac 10, 08028 Barcelona, Spain
| | | | - Marina Marcet-Houben
- Barcelona Supercomputing Centre (BSC-CNS). Jordi Girona 29, 08034 Barcelona, Spain; Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Baldiri Reixac 10, 08028 Barcelona, Spain
| | - Toni Gabaldón
- Barcelona Supercomputing Centre (BSC-CNS). Jordi Girona 29, 08034 Barcelona, Spain; Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Baldiri Reixac 10, 08028 Barcelona, Spain; Catalan Institution for Research and Advanced Studies (ICREA), Barcelona, Spain
| |
|