1
Brown T, Mishra K, Elewa A, Iarovenko S, Subramanian E, Araus AJ, Petzold A, Fromm B, Friedländer MR, Rikk L, Suzuki M, Suzuki KIT, Hayashi T, Toyoda A, Oliveira CR, Osipova E, Leigh ND, Yun MH, Simon A. Chromosome-scale genome assembly reveals how repeat elements shape non-coding RNA landscapes active during newt limb regeneration. Cell Genom 2025; 5:100761. [PMID: 39874962 PMCID: PMC11872487 DOI: 10.1016/j.xgen.2025.100761] [Received: 07/02/2024] [Revised: 12/04/2024] [Accepted: 01/03/2025] [Indexed: 01/30/2025]
Abstract
Newts have large genomes harboring many repeat elements. How these elements shape the genome and relate to newts' unique regeneration ability remains unknown. We present here the chromosome-scale assembly of the 20.3 Gb genome of the Iberian ribbed newt, Pleurodeles waltl, with a hitherto unprecedented contiguity and completeness among giant genomes. Utilizing this assembly, we demonstrate conserved synteny as well as genetic rearrangements, such as in the major histocompatibility complex locus. We provide evidence suggesting that intronic repeat elements drive newt-specific circular RNA (circRNA) biogenesis and show their regeneration-specific expression. We also present a comprehensive in-depth annotation and chromosomal mapping of microRNAs, highlighting genomic expansion profiles as well as a distinct regulatory pattern in the regenerating limb. These data reveal links between repeat elements, non-coding RNAs, and adult regeneration and provide key resources for addressing developmental, regenerative, and evolutionary principles.
Affiliation(s)
- Thomas Brown
- DRESDEN-concept Genome Center (DcGC), Center for Molecular and Cellular Bioengineering, Technische Universität Dresden, 01307 Dresden, Germany; Max Planck Institute of Molecular Cell Biology and Genetics, 01307 Dresden, Germany
- Ketan Mishra
- Department of Cell and Molecular Biology, Karolinska Institute, 171 65 Stockholm, Sweden
- Ahmed Elewa
- Department of Biology, Augsburg University, Minneapolis, MN 55454, USA
- Svetlana Iarovenko
- CRTD Center for Regenerative Therapies Dresden, Technische Universität Dresden, 01307 Dresden, Germany
- Elaiyaraja Subramanian
- Department of Cell and Molecular Biology, Karolinska Institute, 171 65 Stockholm, Sweden
- Alberto Joven Araus
- Department of Cell and Molecular Biology, Karolinska Institute, 171 65 Stockholm, Sweden
- Andreas Petzold
- DRESDEN-concept Genome Center (DcGC), Center for Molecular and Cellular Bioengineering, Technische Universität Dresden, 01307 Dresden, Germany
- Bastian Fromm
- The Arctic University Museum of Norway, UiT - The Arctic University of Norway, 9006 Tromsø, Norway
- Marc R Friedländer
- Science for Life Laboratory, Department of Molecular Biosciences, The Wenner-Gren Institute, Stockholm University, 114 18 Stockholm, Sweden
- Lennart Rikk
- Molecular Medicine and Gene Therapy, Wallenberg Centre for Molecular Medicine, Lund Stem Cell Center, Lund University, 221 84 Lund, Sweden
- Miyuki Suzuki
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
- Ken-Ichi T Suzuki
- Emerging Model Organisms Facility, Trans-scale Biology Center, National Institute for Basic Biology, Okazaki, Aichi 444-8585, Japan
- Toshinori Hayashi
- Program of Biomedical Science, Graduate School of Integrated Sciences for Life, Hiroshima University, Higashi-Hiroshima, Hiroshima 739-8511, Japan; Amphibian Research Center, Hiroshima University, Higashi-Hiroshima, Hiroshima 739-8511, Japan
- Atsushi Toyoda
- Comparative Genomics Laboratory, Department of Genomics and Evolutionary Biology, National Institute of Genetics, Mishima, Shizuoka 411-0801, Japan
- Catarina R Oliveira
- CRTD Center for Regenerative Therapies Dresden, Technische Universität Dresden, 01307 Dresden, Germany
- Ekaterina Osipova
- LOEWE Centre for Translational Biodiversity Genomics, Senckenberganlage 25, 60325 Frankfurt, Germany
- Nicholas D Leigh
- Molecular Medicine and Gene Therapy, Wallenberg Centre for Molecular Medicine, Lund Stem Cell Center, Lund University, 221 84 Lund, Sweden
- Maximina H Yun
- Max Planck Institute of Molecular Cell Biology and Genetics, 01307 Dresden, Germany; CRTD Center for Regenerative Therapies Dresden, Technische Universität Dresden, 01307 Dresden, Germany; Physics of Life Excellence Cluster Dresden, 01307 Dresden, Germany
- András Simon
- Department of Cell and Molecular Biology, Karolinska Institute, 171 65 Stockholm, Sweden
2
Majidian S, Nevers Y, Yazdizadeh Kharrazi A, Warwick Vesztrocy A, Pascarelli S, Moi D, Glover N, Altenhoff AM, Dessimoz C. Orthology inference at scale with FastOMA. Nat Methods 2025; 22:269-272. [PMID: 39753922 PMCID: PMC11810774 DOI: 10.1038/s41592-024-02552-8] [Received: 01/16/2024] [Accepted: 10/29/2024] [Indexed: 02/12/2025]
Abstract
The surge in genome data, with ongoing efforts aiming to sequence 1.5 M eukaryotes in a decade, could revolutionize genomics, revealing the origins, evolution and genetic innovations of biological processes. Yet, traditional genomics methods scale poorly with such large datasets. Here, addressing this, 'FastOMA' provides linear scalability for orthology inference, enabling the processing of thousands of eukaryotic genomes within a day. FastOMA maintains the high accuracy and resolution of the well-established Orthologous Matrix (OMA) approach in benchmarks. FastOMA is available via GitHub at https://github.com/DessimozLab/FastOMA/ .
Affiliation(s)
- Sina Majidian
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Yannis Nevers
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Alex Warwick Vesztrocy
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Stefano Pascarelli
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- David Moi
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Natasha Glover
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Adrian M Altenhoff
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Department of Computer Science, ETH Zurich, Zurich, Switzerland
- Christophe Dessimoz
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
3
OMArk, a tool for gene annotation quality control, reveals erroneous gene inference. Nat Biotechnol 2025; 43:40-41. [PMID: 38383604 DOI: 10.1038/s41587-024-02155-w] [Indexed: 02/23/2024]
4
Nevers Y, Warwick Vesztrocy A, Rossier V, Train CM, Altenhoff A, Dessimoz C, Glover NM. Quality assessment of gene repertoire annotations with OMArk. Nat Biotechnol 2025; 43:124-133. [PMID: 38383603 PMCID: PMC11738984 DOI: 10.1038/s41587-024-02147-w] [Received: 12/16/2022] [Accepted: 01/17/2024] [Indexed: 02/23/2024]
Abstract
In the era of biodiversity genomics, it is crucial to ensure that annotations of protein-coding gene repertoires are accurate. State-of-the-art tools to assess genome annotations measure the completeness of a gene repertoire but are blind to other errors, such as gene overprediction or contamination. We introduce OMArk, a software package that relies on fast, alignment-free sequence comparisons between a query proteome and precomputed gene families across the tree of life. OMArk assesses not only the completeness but also the consistency of the gene repertoire as a whole relative to closely related species and reports likely contamination events. Analysis of 1,805 UniProt Eukaryotic Reference Proteomes with OMArk demonstrated strong evidence of contamination in 73 proteomes and identified error propagation in avian gene annotation resulting from the use of a fragmented zebra finch proteome as a reference. This study illustrates the importance of comparing and prioritizing proteomes based on their quality measures.
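The two summary measures described above can be illustrated with a minimal Python sketch under simplifying assumptions: query proteins have already been placed into precomputed gene families by some upstream step, each family is tagged with the clades in which it occurs, and the clade's conserved families are known. The function names, family IDs and data layout are hypothetical, and OMArk's real categories and thresholds are richer than this.

```python
from collections import Counter
from typing import Dict, List, Optional, Set


def completeness(conserved: List[str], placements: Dict[str, Optional[str]]) -> Dict[str, float]:
    """Share of the clade's conserved families found once, more than once, or not at all."""
    hits = Counter(fam for fam in placements.values() if fam is not None)
    tally = Counter()
    for fam in conserved:
        n = hits[fam]  # Counter returns 0 for families with no placed protein
        tally["missing" if n == 0 else "single" if n == 1 else "duplicated"] += 1
    return {k: tally[k] / len(conserved) for k in ("single", "duplicated", "missing")}


def consistency(placements: Dict[str, Optional[str]],
                family_clades: Dict[str, Set[str]],
                expected_clade: str) -> Dict[str, float]:
    """Share of query proteins whose family fits the expected clade, fits only other clades, or is absent."""
    tally = Counter()
    for fam in placements.values():
        if fam is None:
            tally["unknown"] += 1
        elif expected_clade in family_clades.get(fam, set()):
            tally["consistent"] += 1
        else:
            tally["inconsistent"] += 1  # candidate contamination or misannotation
    return {k: tally[k] / len(placements) for k in ("consistent", "inconsistent", "unknown")}


# Hypothetical toy data: four conserved vertebrate families and five query proteins.
conserved = ["F1", "F2", "F3", "F4"]
placements = {"p1": "F1", "p2": "F2", "p3": "F2", "p4": "X9", "p5": None}
family_clades = {f: {"Vertebrata"} for f in conserved}
family_clades["X9"] = {"Bacteria"}

print(completeness(conserved, placements))
# {'single': 0.25, 'duplicated': 0.25, 'missing': 0.5}
print(consistency(placements, family_clades, "Vertebrata"))
# {'consistent': 0.6, 'inconsistent': 0.2, 'unknown': 0.2}
```

In this toy case F3 and F4 are missing, F2 is duplicated, and p4 falls in a family seen only in bacteria, which is the kind of signal a contamination report would flag. Completeness alone would miss that last point.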
Affiliation(s)
- Yannis Nevers
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Alex Warwick Vesztrocy
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Victor Rossier
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
- Clément-Marie Train
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Adrian Altenhoff
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Department of Computer Science, ETH Zurich, Zurich, Switzerland
- Christophe Dessimoz
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Natasha M Glover
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
5
Langschied F, Bordin N, Cosentino S, Fuentes-Palacios D, Glover N, Hiller M, Hu Y, Huerta-Cepas J, Coelho LP, Iwasaki W, Majidian S, Manzano-Morales S, Persson E, Richards TA, Gabaldón T, Sonnhammer E, Thomas PD, Dessimoz C, Ebersberger I. Quest for Orthologs in the Era of Biodiversity Genomics. Genome Biol Evol 2024; 16:evae224. [PMID: 39404012 PMCID: PMC11523110 DOI: 10.1093/gbe/evae224] [Accepted: 10/11/2024] [Indexed: 11/01/2024]
Abstract
The era of biodiversity genomics is characterized by large-scale genome sequencing efforts that aim to represent each living taxon with an assembled genome. Generating knowledge from this wealth of data has not kept up with this pace. We here discuss major challenges to integrating these novel genomes into a comprehensive functional and evolutionary network spanning the tree of life. In summary, the expanding datasets create a need for scalable gene annotation methods. To trace gene function across species, new methods must seek to increase the resolution of ortholog analyses, e.g. by extending analyses to the protein domain level and by accounting for alternative splicing. Additionally, the scope of orthology prediction should be pushed beyond well-investigated proteomes. This demands the development of specialized methods for the identification of orthologs to short proteins and noncoding RNAs and for the functional characterization of novel gene families. Furthermore, protein structures predicted by machine learning are now readily available, but this new information is yet to be integrated with orthology-based analyses. Finally, an increasing focus should be placed on making orthology assignments adhere to the findable, accessible, interoperable, and reusable (FAIR) principles. This fosters green bioinformatics by avoiding redundant computations and helps integrate diverse scientific communities sharing the need for comparative genetics and genomics information. It should also help with communicating orthology-related concepts in a format that is accessible to the public, to counteract existing misinformation about evolution.
Affiliation(s)
- Felix Langschied
- Department for Applied Bioinformatics, Institute of Cell Biology and Neuroscience, Goethe University, Frankfurt, Germany
- Nicola Bordin
- Institute of Structural and Molecular Biology, University College London, WC1E 6BT, London, UK
- Salvatore Cosentino
- Department of Integrated Biosciences, The University of Tokyo, 277-0882 Tokyo, Japan
- Diego Fuentes-Palacios
- Barcelona Supercomputing Center (BSC-CNS), 08034 Barcelona, Spain
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, 08028 Barcelona, Spain
- Natasha Glover
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Michael Hiller
- Department of Comparative Genomics, Institute of Cell Biology and Neuroscience, Goethe University, Frankfurt, Germany
- Yanhui Hu
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Drosophila RNAi Screening Center, Harvard Medical School, Boston, MA 02115, USA
- Jaime Huerta-Cepas
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Campus de Montegancedo-UPM, Madrid, Spain
- Luis Pedro Coelho
- Centre for Microbiome Research, School of Biomedical Sciences, Queensland University of Technology, Translational Research Institute, Woolloongabba, Queensland, Australia
- Wataru Iwasaki
- Department of Integrated Biosciences, University of Tokyo, 277-0882 Tokyo, Japan
- Sina Majidian
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Saioa Manzano-Morales
- Barcelona Supercomputing Center (BSC-CNS), 08034 Barcelona, Spain
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, 08028 Barcelona, Spain
- Emma Persson
- Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Solna, Sweden
- Toni Gabaldón
- Barcelona Supercomputing Center (BSC-CNS), 08034 Barcelona, Spain
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, 08028 Barcelona, Spain
- Catalan Institution for Research and Advanced Studies (ICREA), Barcelona, Spain
- CIBER de Enfermedades Infecciosas, Instituto de Salud Carlos III, Madrid, Spain
- Erik Sonnhammer
- Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Solna, Sweden
- Paul D Thomas
- Department of Population and Public Health Sciences, University of Southern California, Los Angeles, CA, USA
- Christophe Dessimoz
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Ingo Ebersberger
- Department for Applied Bioinformatics, Institute of Cell Biology and Neuroscience, Goethe University, Frankfurt, Germany
- LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany
- Senckenberg Biodiversity and Climate Research Centre (S-BIK-F), Frankfurt am Main, Germany
6
Rossier V, Train C, Nevers Y, Robinson-Rechavi M, Dessimoz C. Matreex: Compact and Interactive Visualization for Scalable Studies of Large Gene Families. Genome Biol Evol 2024; 16:evae100. [PMID: 38742690 PMCID: PMC11149776 DOI: 10.1093/gbe/evae100] [Received: 11/15/2023] [Revised: 04/17/2024] [Accepted: 05/03/2024] [Indexed: 05/16/2024]
Abstract
Studying gene family evolution strongly benefits from insightful visualizations. However, the ever-growing number of sequenced genomes is leading to increasingly larger gene families, which challenges existing gene tree visualizations. Indeed, most of them present users with a dilemma: display complete but intractable gene trees, or collapse subtrees, thereby hiding their children's information. Here, we introduce Matreex, a new dynamic tool to scale up the visualization of gene families. Matreex's key idea is to use "phylogenetic" profiles, which are dense representations of gene repertoires, to minimize the information loss when collapsing subtrees. We illustrate Matreex's usefulness with three biological applications. First, we demonstrate on the MutS family the power of combining gene trees and phylogenetic profiles to delve into precise evolutionary analyses of large multicopy gene families. Second, by displaying 22 intraflagellar transport gene families across 622 species cumulating 5,500 representatives, we show how Matreex can be used to automate large-scale analyses of gene presence-absence. Notably, we report for the first time the complete loss of intraflagellar transport in the myxozoan Thelohanellus kitauei. Finally, using the textbook example of visual opsins, we show Matreex's potential to create easily interpretable figures for teaching and outreach. Matreex is available from the Python Package Index (pip install Matreex) with the source code and documentation available at https://github.com/DessimozLab/matreex.
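The idea of collapsing a subtree into a profile can be illustrated with a minimal Python sketch. It assumes a phylogenetic profile is a per-species count of gene copies under a subtree, which is one plausible reading of "dense representations of gene repertoires"; the `Node` class and `phylogenetic_profile` function are hypothetical and are not Matreex's API.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Node:
    """Hypothetical gene-tree node; leaves record the species of their gene."""
    name: str
    species: Optional[str] = None
    children: List[Node] = field(default_factory=list)


def leaves(node: Node) -> Iterator[Node]:
    """Yield every leaf (extant gene) below a node, or the node itself if it is a leaf."""
    if not node.children:
        yield node
        return
    for child in node.children:
        yield from leaves(child)


def phylogenetic_profile(subtree: Node, species_order: List[str]) -> List[int]:
    """Collapse a subtree into a copy-number vector with one entry per species."""
    counts = Counter(leaf.species for leaf in leaves(subtree))
    return [counts.get(sp, 0) for sp in species_order]


species_order = ["human", "mouse", "fly"]
tree = Node("root", children=[
    Node("clade_a", children=[Node("g1", "human"), Node("g2", "human"), Node("g3", "mouse")]),
    Node("g4", "fly"),
])

# Collapsing clade_a keeps its gene counts instead of hiding them.
print(phylogenetic_profile(tree.children[0], species_order))  # [2, 1, 0]
print(phylogenetic_profile(tree, species_order))              # [2, 1, 1]
```

Stacking one such vector per collapsed subtree gives a subtree-by-species matrix that can be drawn beside the remaining tree, so duplications (the two human copies) and losses (no fly copy in clade_a) stay visible after collapsing.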
Affiliation(s)
- Victor Rossier
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, Comparative Genomics, Lausanne, Switzerland
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
- Clement Train
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Yannis Nevers
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, Comparative Genomics, Lausanne, Switzerland
- Marc Robinson-Rechavi
- SIB Swiss Institute of Bioinformatics, Comparative Genomics, Lausanne, Switzerland
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
- Christophe Dessimoz
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, Comparative Genomics, Lausanne, Switzerland
7
Altenhoff AM, Warwick Vesztrocy A, Bernard C, Train CM, Nicheperovich A, Prieto Baños S, Julca I, Moi D, Nevers Y, Majidian S, Dessimoz C, Glover NM. OMA orthology in 2024: improved prokaryote coverage, ancestral and extant GO enrichment, a revamped synteny viewer and more in the OMA Ecosystem. Nucleic Acids Res 2024; 52:D513-D521. [PMID: 37962356 PMCID: PMC10767875 DOI: 10.1093/nar/gkad1020] [Received: 09/15/2023] [Revised: 10/17/2023] [Accepted: 10/23/2023] [Indexed: 11/15/2023]
Abstract
In this update paper, we present the latest developments in the OMA browser knowledgebase, which aims to provide high-quality orthology inferences and facilitate the study of gene families, genomes and their evolution. First, we discuss the addition of new species in the database, particularly an expanded representation of prokaryotic species. The OMA browser now offers Ancestral Genome pages and an Ancestral Gene Order viewer, allowing users to explore the evolutionary history and gene content of ancestral genomes. We also introduce a revamped Local Synteny Viewer to compare genomic neighborhoods across both extant and ancestral genomes. Hierarchical Orthologous Groups (HOGs) are now annotated with Gene Ontology annotations, and users can easily perform extant or ancestral GO enrichments. Finally, we recap new tools in the OMA Ecosystem, including OMAmer for proteome mapping, OMArk for proteome quality assessment, OMAMO for model organism selection and Read2Tree for phylogenetic species tree construction from reads. These new features provide exciting opportunities for orthology analysis and comparative genomics. OMA is accessible at https://omabrowser.org.
Affiliation(s)
- Adrian M Altenhoff
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- ETH Zurich, Computer Science, Universitätstr. 6, 8092 Zurich, Switzerland
- Alex Warwick Vesztrocy
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Charles Bernard
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Clement-Marie Train
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Alina Nicheperovich
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Silvia Prieto Baños
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Irene Julca
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- David Moi
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Yannis Nevers
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Sina Majidian
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Christophe Dessimoz
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Natasha M Glover
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland