1
Percudani R, De Rito C. Predicting Protein Function in the AI and Big Data Era. Biochemistry 2025. [PMID: 40380914 DOI: 10.1021/acs.biochem.5c00186] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/19/2025]
Abstract
It is an exciting time for researchers working to link proteins to their functions. Most techniques for extracting functional information from genomic sequences were developed several years ago, with major progress driven by the availability of big data. Now, groundbreaking advances in deep-learning and AI-based methods have enriched protein databases with three-dimensional information and offer the potential to predict biochemical properties and biomolecular interactions, providing key functional insights. This progress is expected to increase the proportion of functionally bright proteins in databases and deepen our understanding of life at the molecular level.
Affiliation(s)
- Riccardo Percudani
  - Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, 43124 Parma, Italy
- Carlo De Rito
  - Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, 43124 Parma, Italy
2
Aguirre-Carvajal K, Cárdenas S, Munteanu CR, Armijos-Jaramillo V. Rampant Interkingdom Horizontal Gene Transfer in Pezizomycotina? An Updated Inspection of Anomalous Phylogenies. Int J Mol Sci 2025; 26:1795. [PMID: 40076423 PMCID: PMC11898892 DOI: 10.3390/ijms26051795] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2025] [Revised: 02/11/2025] [Accepted: 02/18/2025] [Indexed: 03/14/2025]
Abstract
Horizontal gene transfer (HGT) is a significant source of diversity in prokaryotes and a key factor in their genome evolution. Although similar processes have been postulated for eukaryotes, the validity of HGT's impact remains contested, particularly between long-distance-related organisms like those from different kingdoms. Among eukaryotes, the fungal subphylum Pezizomycotina has been frequently cited in the literature for experiencing HGT events, with over 600 publications on the subject. The proteomes of 421 Pezizomycotina species were meticulously examined to identify potential instances of interkingdom HGT. Furthermore, the phylogenies of over 275 HGT candidates previously reported were revisited. Manual scrutiny of 521 anomalous phylogenies revealed that only 1.5% display patterns indicative of interkingdom HGT. Moreover, novel interkingdom HGT searches within Pezizomycotina yielded few new contenders, casting doubt on the prevalence of such events within this subphylum. Although the detailed examination of phylogenies suggested interkingdom HGT, the evidence for lateral gene transfer is not conclusive. The findings suggest that expanding the number of homologous sequences could uncover vertical inheritance patterns that have been misclassified as HGT. Consequently, this research supports the notion that interkingdom HGT may be an extraordinary occurrence rather than a significant evolutionary driver in eukaryotic genomes.
Affiliation(s)
- Kevin Aguirre-Carvajal
  - Computer Science Faculty, University of A Coruña, CITIC-Research Center of Information and Communication Technologies, 15071 A Coruña, Spain
  - Bio-Cheminformatics Research Group, Universidad de Las Américas, Quito 170513, Ecuador
- Sebastián Cárdenas
  - Carrera de Ingeniería en Biotecnología, Facultad de Ingeniería y Ciencias Aplicadas, Universidad de Las Américas, Quito 170513, Ecuador
- Cristian R. Munteanu
  - Computer Science Faculty, University of A Coruña, CITIC-Research Center of Information and Communication Technologies, 15071 A Coruña, Spain
- Vinicio Armijos-Jaramillo
  - Bio-Cheminformatics Research Group, Universidad de Las Américas, Quito 170513, Ecuador
  - Carrera de Ingeniería en Biotecnología, Facultad de Ingeniería y Ciencias Aplicadas, Universidad de Las Américas, Quito 170513, Ecuador
3
Roberts NG, Gilmore MJ, Struck TH, Kocot KM. Multiple Displacement Amplification Facilitates SMRT Sequencing of Microscopic Animals and the Genome of the Gastrotrich Lepidodermella squamata (Dujardin 1841). Genome Biol Evol 2024; 16:evae254. [PMID: 39590608 DOI: 10.1093/gbe/evae254] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2024] [Revised: 11/11/2024] [Accepted: 11/14/2024] [Indexed: 11/28/2024]
Abstract
Obtaining adequate DNA for long-read genome sequencing remains a roadblock to producing contiguous genomes from small-bodied organisms, hindering understanding of phylogenetic relationships and genome evolution. Multiple displacement amplification leverages Phi29 DNA polymerase to produce micrograms of DNA from picograms of input. However, multiple displacement amplification's inherent biases in amplification related to guanine and cytosine (GC) content, repeat content and chimera production are a problem for long-read genome assembly, which has been little investigated. We explored the utility of multiple displacement amplification for generating template DNA for High Fidelity (HiFi) sequencing directly from living cells of Caenorhabditis elegans (Nematoda) and Lepidodermella squamata (Gastrotricha) containing one order of magnitude less DNA than required for the PacBio Ultra-Low DNA Input Workflow. High Fidelity sequencing of libraries prepared from multiple displacement amplification products resulted in highly contiguous and complete genomes for both C. elegans (102 Mbp assembly; 336 contigs; N50 = 868 kbp; L50 = 39; BUSCO_nematoda_nucleotide: S:96.1%, D:2.8%) and L. squamata (122 Mbp assembly; 157 contigs; N50 = 3.9 Mbp; L50 = 13; BUSCO_metazoa_nucleotide: S:80.8%, D:2.8%). Coverage uniformity for reads from multiple displacement amplification DNA (Gini Index: 0.14, normalized mean across all 100 kbp blocks: 0.49) and reads from pooled nematode DNA (Gini Index: 0.16, normalized mean across all 100 kbp blocks: 0.49) proved similar. Using this approach, we sequenced the genome of the microscopic invertebrate L. squamata (Gastrotricha), the first of its phylum. Using the newly sequenced genome, we infer Gastrotricha's long-debated phylogenetic position as the sister taxon of Platyhelminthes and conduct a comparative analysis of the Hox cluster.
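The contiguity figures quoted in this abstract (N50 and L50) follow a widely used definition: sort contigs from longest to shortest, accumulate lengths until at least half of the total assembly size is covered, then report the length of the last contig added (N50) and the number of contigs needed (L50). The sketch below illustrates that definition only; the function name `n50_l50` and the toy contig lengths are hypothetical and are not taken from the paper or its pipeline.

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a list of contig lengths in base pairs.

    Illustrative helper (hypothetical name); assumes the common
    'at least half of the total length' definition.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig")
    # Longest contigs first.
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # The first contig that pushes the running sum to >= 50% defines N50/L50.
        if running >= half_total:
            return length, count


# Toy example with made-up lengths (total 290, half 145):
print(n50_l50([80, 70, 50, 40, 30, 20]))  # (70, 2)
```

Read this way, the L. squamata values reported above (N50 = 3.9 Mbp, L50 = 13) mean that the 13 longest contigs, each at least 3.9 Mbp long, together cover at least half of the 122 Mbp assembly.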
Affiliation(s)
- Nickellaus G Roberts
  - Department of Biological Sciences, The University of Alabama, Tuscaloosa, Alabama, USA
- Michael J Gilmore
  - Department of Biological Sciences, The University of Alabama, Tuscaloosa, Alabama, USA
- Kevin M Kocot
  - Department of Biological Sciences, The University of Alabama, Tuscaloosa, Alabama, USA
  - Alabama Museum of Natural History, The University of Alabama, Tuscaloosa, Alabama, USA
4
Rönkä K, Eroukhmanoff F, Kulmuni J, Nouhaud P, Thorogood R. Beyond genes-for-behaviour: The potential for genomics to resolve long-standing questions in avian brood parasitism. Ecol Evol 2024; 14:e70335. [PMID: 39575141 PMCID: PMC11581780 DOI: 10.1002/ece3.70335] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2024] [Revised: 08/28/2024] [Accepted: 09/07/2024] [Indexed: 11/24/2024]
Abstract
Behavioural ecology by definition of its founding 'Tinbergian framework' is an integrative field, however, it lags behind in incorporating genomic methods. 'Finding the gene/s for a behaviour' is still rarely feasible or cost-effective in the wild but as we show here, genomic data can be used to address broader questions. Here we use avian brood parasitism, a model system in behavioural ecology as a case study to highlight how behavioural ecologists could use the full potential of state-of-the-art genomic tools. Brood parasite-host interactions are one of the most easily observable and amenable natural laboratories of antagonistic coevolution, and as such have intrigued evolutionary biologists for decades. Using worked examples, we demonstrate how genomic data can be used to study the causes and mechanisms of (co)evolutionary adaptation and answer three key questions for the field: (i) Where and when should brood parasitism evolve?, (ii) When and how should hosts defend?, and (iii) Will coevolution persist with ecological change? In doing so, we discuss how behavioural and molecular ecologists can collaborate to integrate Tinbergen's questions and achieve the coherent science that he promoted to solve the mysteries of nature.
Affiliation(s)
- Katja Rönkä
  - HiLIFE Helsinki Institute of Life Sciences, University of Helsinki, Helsinki, Finland
  - Research Programme in Organismal & Evolutionary Biology, Faculty of Biological and Environmental Sciences, University of Helsinki, Helsinki, Finland
- Fabrice Eroukhmanoff
  - Centre for Ecological and Evolutionary Synthesis, Department of Biology, University of Oslo, Oslo, Norway
- Jonna Kulmuni
  - Research Programme in Organismal & Evolutionary Biology, Faculty of Biological and Environmental Sciences, University of Helsinki, Helsinki, Finland
  - Department of Evolution and Population Biology, Institute for Biodiversity and Ecosystem Dynamics, University of Amsterdam, Amsterdam, The Netherlands
- Pierre Nouhaud
  - Research Programme in Organismal & Evolutionary Biology, Faculty of Biological and Environmental Sciences, University of Helsinki, Helsinki, Finland
  - CBGP, INRAE, CIRAD, IRD, Montpellier SupAgro, Univ Montpellier, Montpellier, France
- Rose Thorogood
  - HiLIFE Helsinki Institute of Life Sciences, University of Helsinki, Helsinki, Finland
  - Research Programme in Organismal & Evolutionary Biology, Faculty of Biological and Environmental Sciences, University of Helsinki, Helsinki, Finland
5
Perez-Enriquez R, Juárez OE, Galindo-Torres P, Vargas-Aguilar AL, Llera-Herrera R. Improved genome assembly of the whiteleg shrimp Penaeus (Litopenaeus) vannamei using long- and short-read sequences from public databases. J Hered 2024; 115:302-310. [PMID: 38451162 DOI: 10.1093/jhered/esae015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2023] [Accepted: 02/28/2024] [Indexed: 03/08/2024]
Abstract
The Pacific whiteleg shrimp Penaeus (Litopenaeus) vannamei is a highly relevant species for the world's aquaculture development, for which an incomplete genome is available in public databases. In this work, PacBio long-reads from 14 publicly available genomic libraries (131.2 Gb) were mined to improve the reference genome assembly. The libraries were assembled, polished using Illumina short-reads, and scaffolded with P. vannamei, Fenneropenaeus chinensis, and Penaeus monodon genomes. The reference-guided assembly, organized into 44 pseudo-chromosomes and 15,682 scaffolds, showed an improvement from previous reference genomes with a genome size of 2.055 Gb, N50 of 40.14 Mb, L50 of 21, and the longest scaffold of 65.79 Mb. Most orthologous genes (92.6%) of the Arthropoda_odb10 database were detected as "complete," and BRAKER predicted 21,816 gene models; from these, we detected 1,814 single-copy orthologues conserved across the genomic references for Marsupenaeus japonicus, F. chinensis, and P. monodon. Transcriptomic-assembly data aligned in more than 99% to the new reference-guided assembly. The collinearity analysis of the assembled pseudo-chromosomes against the P. vannamei and P. monodon reference genomes showed high conservation in different sets of pseudo-chromosomes. In addition, more than 21,000 publicly available genetic marker sequences were mapped to single-site positions. This new assembly represents a step forward from previously reported P. vannamei assemblies. It will be helpful as a reference genome for future studies on the evolutionary history of the species, the genetic architecture of physiological and sex-determination traits, and the analysis of the changes in genetic diversity and composition of cultivated stocks.
Affiliation(s)
- Ricardo Perez-Enriquez
  - Aquaculture Program, Centro de Investigaciones Biológicas del Noroeste, S.C., La Paz, B.C.S., Mexico
- Oscar E Juárez
  - Aquaculture Program, Centro de Investigaciones Biológicas del Noroeste, S.C., La Paz, B.C.S., Mexico
  - Dirección de Investigación en Acuacultura, Programa de Recursos Genéticos, Instituto Mexicano de Investigación en Pesca y Acuacultura Sustentables, Coyoacán, Ciudad de México, Mexico
- Pavel Galindo-Torres
  - Aquaculture Program, Centro de Investigaciones Biológicas del Noroeste, S.C., La Paz, B.C.S., Mexico
- Ana Luisa Vargas-Aguilar
  - Aquaculture Program, Centro de Investigaciones Biológicas del Noroeste, S.C., La Paz, B.C.S., Mexico
  - Departamento de Recursos del Mar, Centro de Investigación y de Estudios Avanzados, Unidad Mérida, Mérida, Yucatán, Mexico
- Raúl Llera-Herrera
  - Functional Genomics Laboratory, Unidad Académica Mazatlán, Instituto de Ciencias del Mar y Limnología, Universidad Nacional Autónoma de México, Mazatlán, Sinaloa, Mexico
6
de Potter B, Vallee I, Camacho N, Filipe Costa Póvoas L, Bonsembiante A, Pons i Pons A, Eckhard U, Gomis-Rüth FX, Yang XL, Schimmel P, Kuhle B, Ribas de Pouplana L. Domain collapse and active site ablation generate a widespread animal mitochondrial seryl-tRNA synthetase. Nucleic Acids Res 2023; 51:10001-10010. [PMID: 37638745 PMCID: PMC10570016 DOI: 10.1093/nar/gkad696] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/23/2023] [Revised: 08/03/2023] [Accepted: 08/14/2023] [Indexed: 08/29/2023]
Abstract
Through their aminoacylation reactions, aminoacyl tRNA-synthetases (aaRS) establish the rules of the genetic code throughout all of nature. During their long evolution in eukaryotes, additional domains and splice variants were added to what is commonly a homodimeric or monomeric structure. These changes confer orthogonal functions in cellular activities that have recently been uncovered. An unusual exception to the familiar architecture of aaRSs is the heterodimeric metazoan mitochondrial SerRS. In contrast to domain additions or alternative splicing, here we show that heterodimeric metazoan mitochondrial SerRS arose from its homodimeric ancestor not by domain additions, but rather by collapse of an entire domain (in one subunit) and an active site ablation (in the other). The collapse/ablation retains aminoacylation activity while creating a new surface, which is necessary for its orthogonal function. The results highlight a new paradigm for repurposing a member of the ancient tRNA synthetase family.
Affiliation(s)
- Bastiaan de Potter
  - Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
  - Utrecht University Faculty of Science, Department of Biology, Theoretical Biology and Bioinformatics Utrecht, Utrecht, The Netherlands
- Ingrid Vallee
  - The Scripps Research Institute, Department of Molecular Medicine, La Jolla, CA, USA
- Noelia Camacho
  - Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
- Luís Filipe Costa Póvoas
  - Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
- Aureliano Bonsembiante
  - Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
- Alba Pons i Pons
  - Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
- Ulrich Eckhard
  - Molecular Biology Institute of Barcelona, Department of Structural Biology, Barcelona, Catalunya, Spain
- Xiang-Lei Yang
  - The Scripps Research Institute, Department of Molecular Medicine, La Jolla, CA, USA
- Paul Schimmel
  - The Scripps Research Institute, Department of Molecular Medicine, La Jolla, CA, USA
- Bernhard Kuhle
  - The Scripps Research Institute, Department of Molecular Medicine, La Jolla, CA, USA
- Lluís Ribas de Pouplana
  - Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
  - ICREA, Catalan Institution for Research and Advanced Studies, Barcelona, Catalonia, Spain
7
Lyubetsky VA, Rubanov LI, Tereshina MB, Ivanova AS, Araslanova KR, Uroshlev LA, Goremykina GI, Yang JR, Kanovei VG, Zverkov OA, Shitikov AD, Korotkova DD, Zaraisky AG. Wide-scale identification of novel/eliminated genes responsible for evolutionary transformations. Biol Direct 2023; 18:45. [PMID: 37568147 PMCID: PMC10416458 DOI: 10.1186/s13062-023-00405-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2023] [Accepted: 08/07/2023] [Indexed: 08/13/2023]
Abstract
BACKGROUND: It is generally accepted that most evolutionary transformations at the phenotype level are associated either with rearrangements of genomic regulatory elements, which control the activity of gene networks, or with changes in the amino acid contents of proteins. Recently, evidence has accumulated that significant evolutionary transformations could also be associated with the loss/emergence of whole genes. The targeted identification of such genes is a challenging problem for both bioinformatics and evo-devo research.
RESULTS: To solve this problem we propose the WINEGRET method, named after the first letters of the title. Its main idea is to search for genes that satisfy two requirements: first, the desired genes were lost/emerged at the same evolutionary stage at which the phenotypic trait of interest was lost/emerged, and second, the expression of these genes changes significantly during the development of the trait of interest in the model organism. To verify the first requirement, we do not use existing databases of orthologs, but rely purely on gene homology and local synteny by using some novel quickly computable conditions. Genes satisfying the second requirement are found by deep RNA sequencing. As a proof of principle, we used our method to find genes absent in extant amniotes (reptiles, birds, mammals) but present in anamniotes (fish and amphibians), in which these genes are involved in the regeneration of large body appendages. As a result, 57 genes were identified. For three of them, c-c motif chemokine 4, eotaxin-like, and a previously unknown gene called here sod4, essential roles for tail regeneration were demonstrated. Noteworthy, we established that the latter gene belongs to a novel family of Cu/Zn-superoxide dismutases lost by amniotes, SOD4.
CONCLUSIONS: We present a method for targeted identification of genes whose loss/emergence in evolution could be associated with the loss/emergence of a phenotypic trait of interest. In a proof-of-principle study, we identified genes absent in amniotes that participate in body appendage regeneration in anamniotes. Our method provides a wide range of opportunities for studying the relationship between the loss/emergence of phenotypic traits and the loss/emergence of specific genes in evolution.
Affiliation(s)
- Vassily A Lyubetsky
  - Institute for Information Transmission Problems of the Russian Academy of Sciences (Kharkevich Institute), 19 Build. 1, Bolshoy Karetny per., Moscow, Russia, 127051
  - Department of Mechanics and Mathematics, Lomonosov Moscow State University, Kolmogorova Str., 1, Moscow, Russia, 119234
- Lev I Rubanov
  - Institute for Information Transmission Problems of the Russian Academy of Sciences (Kharkevich Institute), 19 Build. 1, Bolshoy Karetny per., Moscow, Russia, 127051
- Maria B Tereshina
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, 16/10, Miklukho-Maklaya Str., Moscow, Russia, 117997
  - Pirogov Russian National Research Medical University, Moscow, Russia
- Anastasiya S Ivanova
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, 16/10, Miklukho-Maklaya Str., Moscow, Russia, 117997
  - Department of Molecular Medicine, The Scripps Research Institute, La Jolla, USA
- Karina R Araslanova
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, 16/10, Miklukho-Maklaya Str., Moscow, Russia, 117997
- Leonid A Uroshlev
  - Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 32, Vavilova Str., Moscow, Russia, 119991
- Galina I Goremykina
  - Plekhanov Russian University of Economics, Stremyanny Lane 36, Moscow, Russia
- Jian-Rong Yang
  - Advanced Medical Technology Center, The First Affiliated Hospital, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, 510080, China
  - Department of Genetics and Biomedical Informatics, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, 510080, China
- Vladimir G Kanovei
  - Institute for Information Transmission Problems of the Russian Academy of Sciences (Kharkevich Institute), 19 Build. 1, Bolshoy Karetny per., Moscow, Russia, 127051
- Oleg A Zverkov
  - Institute for Information Transmission Problems of the Russian Academy of Sciences (Kharkevich Institute), 19 Build. 1, Bolshoy Karetny per., Moscow, Russia, 127051
- Alexander D Shitikov
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, 16/10, Miklukho-Maklaya Str., Moscow, Russia, 117997
- Daria D Korotkova
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, 16/10, Miklukho-Maklaya Str., Moscow, Russia, 117997
  - Global Health Institute, School of Life Sciences, EPFL, Lausanne, Switzerland
- Andrey G Zaraisky
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, 16/10, Miklukho-Maklaya Str., Moscow, Russia, 117997
  - Pirogov Russian National Research Medical University, Moscow, Russia
8
Vuruputoor VS, Monyak D, Fetter KC, Webster C, Bhattarai A, Shrestha B, Zaman S, Bennett J, McEvoy SL, Caballero M, Wegrzyn JL. Welcome to the big leaves: Best practices for improving genome annotation in non-model plant genomes. Applications in Plant Sciences 2023; 11:e11533. [PMID: 37601314 PMCID: PMC10439824 DOI: 10.1002/aps3.11533] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 10/04/2022] [Revised: 02/04/2023] [Accepted: 02/10/2023] [Indexed: 08/22/2023]
Abstract
Premise: Robust standards to evaluate quality and completeness are lacking in eukaryotic structural genome annotation, as genome annotation software is developed using model organisms and typically lacks benchmarking to comprehensively evaluate the quality and accuracy of the final predictions. The annotation of plant genomes is particularly challenging due to their large sizes, abundant transposable elements, and variable ploidies. This study investigates the impact of genome quality, complexity, sequence read input, and method on protein-coding gene predictions.
Methods: The impact of repeat masking, long-read and short-read inputs, and de novo and genome-guided protein evidence was examined in the context of the popular BRAKER and MAKER workflows for five plant genomes. The annotations were benchmarked for structural traits and sequence similarity.
Results: Benchmarks that reflect gene structures, reciprocal similarity search alignments, and mono-exonic/multi-exonic gene counts provide a more complete view of annotation accuracy. Transcripts derived from RNA-read alignments alone are not sufficient for genome annotation. Gene prediction workflows that combine evidence-based and ab initio approaches are recommended, and a combination of short and long reads can improve genome annotation. Adding protein evidence from de novo assemblies, genome-guided transcriptome assemblies, or full-length proteins from OrthoDB generates more putative false positives as implemented in the current workflows. Post-processing with functional and structural filters is highly recommended.
Discussion: While the annotation of non-model plant genomes remains complex, this study provides recommendations for inputs and methodological approaches. We discuss a set of best practices to generate an optimal plant genome annotation and present a more robust set of metrics to evaluate the resulting predictions.
Affiliation(s)
- Vidya S. Vuruputoor
  - Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, Connecticut 06269, USA
- Daniel Monyak
  - Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, Connecticut 06269, USA
- Karl C. Fetter
  - Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, Connecticut 06269, USA
- Cynthia Webster
  - Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, Connecticut 06269, USA
- Akriti Bhattarai
  - Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, Connecticut 06269, USA
- Bikash Shrestha
  - Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, Connecticut 06269, USA
- Sumaira Zaman
  - Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, Connecticut 06269, USA
- Jeremy Bennett
  - Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, Connecticut 06269, USA
- Susan L. McEvoy
  - Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, Connecticut 06269, USA
- Madison Caballero
  - Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, Connecticut 06269, USA
- Jill L. Wegrzyn
  - Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, Connecticut 06269, USA
9
Hara Y, Kuraku S. The impact of local genomic properties on the evolutionary fate of genes. eLife 2023; 12:82290. [PMID: 37223962 DOI: 10.7554/elife.82290] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2022] [Accepted: 04/25/2023] [Indexed: 05/25/2023]
Abstract
Functionally indispensable genes are likely to be retained and otherwise to be lost during evolution. This evolutionary fate of a gene can also be affected by factors independent of gene dispensability, including the mutability of genomic positions, but such features have not been examined well. To uncover the genomic features associated with gene loss, we investigated the characteristics of genomic regions where genes have been independently lost in multiple lineages. With a comprehensive scan of gene phylogenies of vertebrates with a careful inspection of evolutionary gene losses, we identified 813 human genes whose orthologs were lost in multiple mammalian lineages: designated 'elusive genes.' These elusive genes were located in genomic regions with rapid nucleotide substitution, high GC content, and high gene density. A comparison of the orthologous regions of such elusive genes across vertebrates revealed that these features had been established before the radiation of the extant vertebrates approximately 500 million years ago. The association of human elusive genes with transcriptomic and epigenomic characteristics illuminated that the genomic regions containing such genes were subject to repressive transcriptional regulation. Thus, the heterogeneous genomic features driving gene fates toward loss have been in place and may sometimes have relaxed the functional indispensability of such genes. This study sheds light on the complex interplay between gene function and local genomic properties in shaping gene evolution that has persisted since the vertebrate ancestor.
Affiliation(s)
- Yuichiro Hara
  - Research Center for Genome & Medical Sciences, Tokyo Metropolitan Institute of Medical Science, Tokyo, Japan
- Shigehiro Kuraku
  - Molecular Life History Laboratory, Department of Genomics and Evolutionary Biology, National Institute of Genetics, Mishima, Japan
  - Department of Genetics, Sokendai (Graduate University for Advanced Studies), Mishima, Japan
  - RIKEN Center for Biosystems Dynamics Research, Kobe, Japan
10
van Rooijen LE, Tromer EC, van Hooff JJE, Kops GJPL, Snel B. Increased Sampling and Intracomplex Homologies Favor Vertical Over Horizontal Inheritance of the Dam1 Complex. Genome Biol Evol 2023; 15:evad017. [PMID: 36790109 PMCID: PMC9998035 DOI: 10.1093/gbe/evad017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2022] [Revised: 12/23/2022] [Accepted: 01/21/2023] [Indexed: 02/16/2023]
Abstract
Kinetochores connect chromosomes to spindle microtubules to ensure their correct segregation during cell division. Kinetochores of human and yeasts are largely homologous, their ability to track depolymerizing microtubules, however, is carried out by the nonhomologous complexes Ska1-C and Dam1-C, respectively. We previously reported the unique anti-correlating phylogenetic profiles of Dam1-C and Ska-C found among a wide variety of eukaryotes. Based on these profiles and the limited presence of Dam1-C, we speculated that horizontal gene transfer could have played a role in the evolutionary history of Dam1-C. Here, we present an expanded analysis of Dam1-C evolution, using additional genome as well as transcriptome sequences and recently published 3D structures. This analysis revealed a wider and more complete presence of Dam1-C in Cryptista, Rhizaria, Ichthyosporea, CRuMs, and Colponemidia. The fungal Dam1-C cryo-EM structure supports earlier hypothesized intracomplex homologies, which enables the reconstruction of rooted and unrooted phylogenies. The rooted tree of concatenated Dam1-C subunits is statistically consistent with the species tree of eukaryotes, suggesting that Dam1-C is ancient, and that the present-day phylogenetic distribution is best explained by multiple, independent losses and no horizontal gene transfer was involved. Furthermore, we investigated the ancient origin of Dam1-C via profile-versus-profile searches. Homology among 8 out of the 10 Dam1-C subunits suggests that the complex largely evolved from a single multimerizing subunit that diversified into a hetero-octameric core via stepwise subunit duplication and subfunctionalization of the subunits before the origin of the last eukaryotic common ancestor.
Affiliation(s)
- Laura E van Rooijen
- Theoretical Biology and Bioinformatics, Department of Biology, Science Faculty, Utrecht University, Utrecht, The Netherlands
| | - Eelco C Tromer
- Cell Biochemistry, Groningen Biomolecular Sciences and Biotechnology Institute, Faculty of Science and Engineering, University of Groningen, Groningen, The Netherlands
| | - Jolien J E van Hooff
- Ecologie Systématique Evolution, CNRS, Université Paris-Saclay, AgroParisTech, Gif-sur-Yvette, France
| | - Geert J P L Kops
- Oncode Institute, Hubrecht Institute, Royal Netherlands Academy of Arts and Sciences, Utrecht, The Netherlands
- Molecular Cancer Research, University Medical Center Utrecht, Utrecht, The Netherlands
| | - Berend Snel
- Theoretical Biology and Bioinformatics, Department of Biology, Science Faculty, Utrecht University, Utrecht, The Netherlands
| |
|
11
|
Vosseberg J, Stolker D, von der Dunk SHA, Snel B. Integrating Phylogenetics With Intron Positions Illuminates the Origin of the Complex Spliceosome. Mol Biol Evol 2023; 40:msad011. [PMID: 36631250 PMCID: PMC9887622 DOI: 10.1093/molbev/msad011] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 09/08/2022] [Revised: 12/09/2022] [Accepted: 12/29/2022] [Indexed: 01/13/2023] Open
Abstract
Eukaryotic genes are characterized by the presence of introns that are removed from pre-mRNA by a spliceosome. This ribonucleoprotein complex is comprised of multiple RNA molecules and over a hundred proteins, which makes it one of the most complex molecular machines that originated during the prokaryote-to-eukaryote transition. Previous works have established that these introns and the spliceosomal core originated from self-splicing introns in prokaryotes. Yet, how the spliceosomal core expanded by recruiting many additional proteins remains largely elusive. In this study, we use phylogenetic analyses to infer the evolutionary history of 145 proteins that we could trace back to the spliceosome in the last eukaryotic common ancestor. We found that an overabundance of proteins derived from ribosome-related processes was added to the prokaryote-derived core. Extensive duplications of these proteins substantially increased the complexity of the emerging spliceosome. By comparing the intron positions between spliceosomal paralogs, we infer that most spliceosomal complexity postdates the spread of introns through the proto-eukaryotic genome. The reconstruction of early spliceosomal evolution provides insight into the driving forces behind the emergence of complexes with many proteins during eukaryogenesis.
Affiliation(s)
- Julian Vosseberg
- Theoretical Biology and Bioinformatics, Department of Biology, Faculty of Science, Utrecht University, 3584 CH Utrecht, the Netherlands
- Laboratory of Microbiology, Wageningen University & Research, 6700 EH Wageningen, the Netherlands
| | - Daan Stolker
- Theoretical Biology and Bioinformatics, Department of Biology, Faculty of Science, Utrecht University, 3584 CH Utrecht, the Netherlands
| | - Samuel H A von der Dunk
- Theoretical Biology and Bioinformatics, Department of Biology, Faculty of Science, Utrecht University, 3584 CH Utrecht, the Netherlands
| | - Berend Snel
- Theoretical Biology and Bioinformatics, Department of Biology, Faculty of Science, Utrecht University, 3584 CH Utrecht, the Netherlands
| |
|
12
|
Kress A, Poch O, Lecompte O, Thompson JD. Real or fake? Measuring the impact of protein annotation errors on estimates of domain gain and loss events. Frontiers in Bioinformatics 2023; 3:1178926. [PMID: 37151482 PMCID: PMC10158824 DOI: 10.3389/fbinf.2023.1178926] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2023] [Accepted: 04/05/2023] [Indexed: 05/09/2023] Open
Abstract
Protein annotation errors can have significant consequences in a wide range of fields, ranging from protein structure and function prediction to biomedical research, drug discovery, and biotechnology. By comparing the domains of different proteins, scientists can identify common domains, classify proteins based on their domain architecture, and highlight proteins that have evolved differently in one or more species or clades. However, genome-wide identification of different protein domain architectures involves a complex error-prone pipeline that includes genome sequencing, prediction of gene exon/intron structures, and inference of protein sequences and domain annotations. Here we developed an automated fact-checking approach to distinguish true domain loss/gain events from false events caused by errors that occur during the annotation process. Using genome-wide ortholog sets and taking advantage of the high-quality human and Saccharomyces cerevisiae genome annotations, we analyzed the domain gain and loss events in the predicted proteomes of 9 non-human primates (NHP) and 20 non-S. cerevisiae fungi (NSF) as annotated in the UniProt and InterPro databases. Our approach allowed us to quantify the impact of errors on estimates of protein domain gains and losses, and we show that domain losses are over-estimated ten-fold and three-fold in the NHP and NSF proteins respectively. This is in line with previous studies of gene-level losses, where issues with genome sequencing or gene annotation led to genes being falsely inferred as absent. In addition, we show that inconsistent protein domain annotations are a major factor contributing to the false events. For the first time, to our knowledge, we show that domain gains are also over-estimated by three-fold and two-fold respectively in NHP and NSF proteins. Based on our more accurate estimates, we infer that true domain losses and gains in NHP with respect to humans are observed at similar rates, while domain gains in the more divergent NSF are observed twice as frequently as domain losses with respect to S. cerevisiae. This study highlights the need to critically examine the scientific validity of protein annotations, and represents a significant step toward scalable computational fact-checking methods that may one day mitigate the propagation of wrong information in protein databases.
|
13
|
Barlow LD, Maciejowski W, More K, Terry K, Vargová R, Záhonová K, Dacks JB. Comparative Genomics for Evolutionary Cell Biology Using AMOEBAE: Understanding the Golgi and Beyond. Methods Mol Biol 2022; 2557:431-452. [PMID: 36512230 DOI: 10.1007/978-1-0716-2639-9_26] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/15/2022]
Abstract
Taking an evolutionary approach to cell biology can yield important new information about how the cell works and how it evolved to do so. This is true of the Golgi apparatus, as it is of all systems within the cell. Comparative genomics is one of the crucial first steps to this line of research, but comes with technical challenges that must be overcome for rigor and robustness. We here introduce AMOEBAE, a workflow for mid-range scale comparative genomic analyses. It allows for customization of parameters, queries, and taxonomic sampling of genomic and transcriptomics data. This protocol article covers the rationale for an evolutionary approach to cell biological study (i.e., when would AMOEBAE be useful), how to use AMOEBAE, and discussion of limitations. It also provides an example dataset, which demonstrates that the Golgi protein AP4 Epsilon is present as the sole retained subunit of the AP4 complex in basidiomycete fungi. AMOEBAE can facilitate comparative genomic studies by balancing reproducibility and speed with user-input and interpretation. It is hoped that AMOEBAE or similar tools will encourage cell biologists to incorporate an evolutionary context into their research.
Affiliation(s)
- Lael D Barlow
- Department of Biological Sciences, University of Alberta, Edmonton, AB, Canada; Division of Biological Chemistry and Drug Discovery, School of Life Sciences, University of Dundee, Dundee, UK
| | - William Maciejowski
- Department of Biological Sciences, University of Alberta, Edmonton, AB, Canada
| | - Kiran More
- Department of Biological Sciences, University of Alberta, Edmonton, AB, Canada
| | - Kara Terry
- Division of Infectious Diseases, Department of Medicine, University of Alberta, Edmonton, AB, Canada
| | - Romana Vargová
- Department of Biology and Ecology, Faculty of Science, University of Ostrava, Ostrava, Czechia
| | - Kristína Záhonová
- Department of Parasitology, Faculty of Science, Charles University, BIOCEV, Vestec, Czechia; Institute of Parasitology, Biology Centre, Czech Academy of Sciences, České Budějovice, Czechia
| | - Joel B Dacks
- Department of Biological Sciences, University of Alberta, Edmonton, AB, Canada; Division of Infectious Diseases, Department of Medicine, University of Alberta, Edmonton, AB, Canada; Department of Parasitology, Faculty of Science, Charles University, BIOCEV, Vestec, Czechia; Centre for Life's Origin and Evolution, Department of Genetics, Evolution and Environment, University College of London, London, UK
| |
|
14
|
Abstract
Carotenoids constitute an essential dietary component of animals and other non-carotenogenic species which use these pigments in both their modified and unmodified forms. Animals utilize uncleaved carotenoids to mitigate light damage and oxidative stress and to signal fitness and health. Carotenoids also serve as precursors of apocarotenoids including retinol, and its retinoid metabolites, which carry out essential functions in animals by forming the visual chromophore 11-cis-retinaldehyde. Retinoids, such as all-trans-retinoic acid, can also act as ligands of nuclear hormone receptors. The fact that enzymes and biochemical pathways responsible for the metabolism of carotenoids in animals bear resemblance to the ones in plants and other carotenogenic species suggests an evolutionary relationship. We will explore some of the modes of transmission of carotenoid genes from carotenogenic species to metazoans. This apparent relationship has been successfully exploited in the past to identify and characterize new carotenoid and retinoid modifying enzymes. We will review approaches used to identify putative animal carotenoid enzymes, and we will describe methods used to functionally validate and analyze the biochemistry of carotenoid modifying enzymes encoded by animals.
Affiliation(s)
- Alexander R Moise
- Northern Ontario School of Medicine, Sudbury, ON, Canada; Department of Chemistry and Biochemistry, Biology and Biomolecular Sciences Program, Laurentian University, Sudbury, ON, Canada.
| | - Sepalika Bandara
- Department of Pharmacology, School of Medicine, Case Western Reserve University, Cleveland, OH, United States
| | - Johannes von Lintig
- Department of Pharmacology, School of Medicine, Case Western Reserve University, Cleveland, OH, United States
| |
|
15
|
Vosseberg J, Schinkel M, Gremmen S, Snel B. The spread of the first introns in proto-eukaryotic paralogs. Commun Biol 2022; 5:476. [PMID: 35589959 PMCID: PMC9120149 DOI: 10.1038/s42003-022-03426-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/11/2021] [Accepted: 04/27/2022] [Indexed: 11/23/2022] Open
Abstract
Spliceosomal introns are a unique feature of eukaryotic genes. Previous studies have established that many introns were present in the protein-coding genes of the last eukaryotic common ancestor (LECA). Intron positions shared between genes that duplicated before LECA could in principle provide insight into the emergence of the first introns. In this study we use ancestral intron position reconstructions in two large sets of duplicated families to systematically identify these ancient paralogous intron positions. We found that 20-35% of introns inferred to have been present in LECA were shared between paralogs. These shared introns, which likely preceded ancient duplications, were widespread across different functions, with the notable exception of nuclear transport. Since we observed a clear signal of pervasive intron loss prior to LECA, it is likely that substantially more introns were shared at the time of duplication than we can detect in LECA. The large extent of shared introns indicates an early origin of introns during eukaryogenesis and suggests an early origin of a nuclear structure, before most of the other complex eukaryotic features were established.
Affiliation(s)
- Julian Vosseberg
- Theoretical Biology and Bioinformatics, Department of Biology, Faculty of Science, Utrecht University, Utrecht, the Netherlands
| | - Michelle Schinkel
- Theoretical Biology and Bioinformatics, Department of Biology, Faculty of Science, Utrecht University, Utrecht, the Netherlands
- Department of Medical Microbiology, Radboud University Medical Center, Radboud Institute for Molecular Life Sciences, Nijmegen, the Netherlands
| | - Sjoerd Gremmen
- Theoretical Biology and Bioinformatics, Department of Biology, Faculty of Science, Utrecht University, Utrecht, the Netherlands
| | - Berend Snel
- Theoretical Biology and Bioinformatics, Department of Biology, Faculty of Science, Utrecht University, Utrecht, the Netherlands.
| |
|
16
|
Repetti SI, Iha C, Uthanumallian K, Jackson CJ, Chen Y, Chan CX, Verbruggen H. Nuclear genome of a pedinophyte pinpoints genomic innovation and streamlining in the green algae. The New Phytologist 2022; 233:2144-2154. [PMID: 34923642 DOI: 10.1111/nph.17926] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/26/2021] [Accepted: 12/06/2021] [Indexed: 06/14/2023]
Abstract
The genomic diversity underpinning high ecological and species diversity in the green algae (Chlorophyta) remains little known. Here, we aimed to track genome evolution in the Chlorophyta, focusing on loss and gain of homologous genes, and lineage-specific innovations of the core Chlorophyta. We generated a high-quality nuclear genome for pedinophyte YPF701, a sister lineage to others in the core Chlorophyta and incorporated this genome in a comparative analysis with 25 other genomes from diverse Viridiplantae taxa. The nuclear genome of pedinophyte YPF701 has an intermediate size and gene number between those of most prasinophytes and the remainder of the core Chlorophyta. Our results suggest positive selection for genome streamlining in the Pedinophyceae, independent from genome minimisation observed among prasinophyte lineages. Genome expansion was predicted along the branch leading to the UTC clade (classes Ulvophyceae, Trebouxiophyceae and Chlorophyceae) after divergence from their last common ancestor with pedinophytes, with genomic novelty implicated in a range of basic biological functions. Results emphasise multiple independent signals of genome minimisation within the Chlorophyta, as well as the genomic novelty arising before diversification in the UTC clade, which may underpin the success of this species-rich clade in a diversity of habitats.
Affiliation(s)
- Sonja I Repetti
- School of BioSciences, University of Melbourne, Melbourne, Vic, 3010, Australia
| | - Cintia Iha
- School of BioSciences, University of Melbourne, Melbourne, Vic, 3010, Australia
| | | | | | - Yibi Chen
- School of Chemistry and Molecular Biosciences, Australian Centre for Ecogenomics, The University of Queensland, Brisbane, Qld, 4072, Australia
| | - Cheong Xin Chan
- School of Chemistry and Molecular Biosciences, Australian Centre for Ecogenomics, The University of Queensland, Brisbane, Qld, 4072, Australia
| | - Heroen Verbruggen
- School of BioSciences, University of Melbourne, Melbourne, Vic, 3010, Australia
| |
|
17
|
Juhász A, Lawton SP. Toll like receptors and their evolution in the lymnaeid freshwater snail species Radix auricularia and Lymnaea stagnalis, key intermediate hosts for zoonotic trematodes. Developmental and Comparative Immunology 2022; 127:104297. [PMID: 34662684 DOI: 10.1016/j.dci.2021.104297] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2021] [Revised: 10/11/2021] [Accepted: 10/14/2021] [Indexed: 06/13/2023]
Abstract
One of the major evolutionarily conserved pathways in innate immunity of invertebrates is the toll-like receptor (TLR) pathway. However, little is known of the TLR protein family in gastropod molluscs despite their role in the transmission of human diseases, especially the common lymnaeid freshwater snail species Radix auricularia and Lymnaea stagnalis, key intermediate hosts of zoonotic trematodes. Using comparative genomics and gene prediction approaches utilising the freshwater snail Biomphalaria glabrata genome as a reference, ten putative TLR proteins were identified in both R. auricularia and L. stagnalis. Phylogenetic analyses revealed that, unlike other molluscs, the lymnaeid species also possessed class 1 TLRs, previously thought to be unique to B. glabrata. Gene duplication events were also seen across the TLR classes in the lymnaeids, with several of the genes appearing to exist as potential tandem elements in R. auricularia. Each predicted TLR was shown to possess the typical leucine-rich repeat extracellular and TIR intracellular domains, and both single cysteine cluster and multiple cysteine cluster TLRs were identified in both lymnaeid species. Principal component analyses of 3D models of the predicted TLRs showed that class 1 and 5 proteins did not cluster based on similarity of structure, which was suggested to be a potential adaptation to a range of pathogens. This study provides the first detailed account of TLRs in lymnaeids and affords a platform for further research into the role of these proteins in the susceptibility and compatibility of these snails with trematodes and their role in transmission.
Affiliation(s)
- Alexandra Juhász
- Institute of Medical Microbiology, Semmelweis University, H-1089, Budapest, Hungary; Department of Tropical Disease Biology, Liverpool School of Tropical Medicine, Liverpool, L3 5QA, UK
| | - Scott P Lawton
- Epidemiology Research Unit (ERU) Department of Veterinary and Animal Sciences, Northern Faculty, Scotland's Rural College (SRUC), An Lòchran, 10 Inverness Campus, Inverness, IV2 5NA, UK.
| |
|
18
|
Fang Y, Li M, Li X, Yang Y. GFICLEE: ultrafast tree-based phylogenetic profile method inferring gene function at the genomic-wide level. BMC Genomics 2021; 22:774. [PMID: 34715785 PMCID: PMC8557005 DOI: 10.1186/s12864-021-08070-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/16/2021] [Accepted: 10/10/2021] [Indexed: 11/25/2022] Open
Abstract
Background: Phylogenetic profiling is widely used to predict novel members of large protein complexes and biological pathways. Although methods combined with phylogenetic trees have significantly improved prediction accuracy, computational efficiency is still an issue that limits their genome-wide application. Results: Here we introduce a new tree-based phylogenetic profiling algorithm named GFICLEE, which infers common single and continuous loss (SCL) events in the evolutionary patterns. We validated our algorithm with human pathways from three databases and compared its computational efficiency with that of current tree-based methods on genome datasets of 10 different scales. Our algorithm has better predictive performance with high computational efficiency. Conclusions: GFICLEE is a new method to infer gene function genome-wide. Its accuracy and computational efficiency make it possible to explore gene functions at the genome-wide level on a personal computer. Supplementary Information: The online version contains supplementary material available at 10.1186/s12864-021-08070-7.
Affiliation(s)
- Yang Fang
- Key Laboratory of Bio-Resources and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, People's Republic of China
| | - Menglong Li
- College of Chemistry, Sichuan University, Chengdu, People's Republic of China
| | - Xufeng Li
- Key Laboratory of Bio-Resources and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, People's Republic of China
| | - Yi Yang
- Key Laboratory of Bio-Resources and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, People's Republic of China.
| |
|
19
|
|
20
|
Tromer EC, Wemyss TA, Ludzia P, Waller RF, Akiyoshi B. Repurposing of synaptonemal complex proteins for kinetochores in Kinetoplastida. Open Biol 2021; 11:210049. [PMID: 34006126 PMCID: PMC8131943 DOI: 10.1098/rsob.210049] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 02/28/2021] [Accepted: 04/16/2021] [Indexed: 12/25/2022] Open
Abstract
Chromosome segregation in eukaryotes is driven by the kinetochore, a macromolecular complex that connects centromeric DNA to microtubules of the spindle apparatus. Kinetochores in well-studied model eukaryotes consist of a core set of proteins that are broadly conserved among distant eukaryotic phyla. By contrast, unicellular flagellates of the class Kinetoplastida have a unique set of 36 kinetochore components. The evolutionary origin and history of these kinetochores remain unknown. Here, we report evidence of homology between axial element components of the synaptonemal complex and three kinetoplastid kinetochore proteins KKT16-18. The synaptonemal complex is a zipper-like structure that assembles between homologous chromosomes during meiosis to promote recombination. By using sensitive homology detection protocols, we identify divergent orthologues of KKT16-18 in most eukaryotic supergroups, including experimentally established chromosomal axis components, such as Red1 and Rec10 in budding and fission yeast, ASY3-4 in plants and SYCP2-3 in vertebrates. Furthermore, we found 12 recurrent duplications within this ancient eukaryotic SYCP2-3 gene family, providing opportunities for new functional complexes to arise, including KKT16-18 in the kinetoplastid parasite Trypanosoma brucei. We propose the kinetoplastid kinetochore system evolved by repurposing meiotic components of the chromosome synapsis and homologous recombination machinery that were already present in early eukaryotes.
Affiliation(s)
- Eelco C. Tromer
- Department of Biochemistry, University of Cambridge, Cambridge, UK
- Cell Biochemistry, Groningen Institute of Biomolecular Sciences & Biotechnology, University of Groningen, Groningen, The Netherlands
| | - Thomas A. Wemyss
- Department of Biochemistry, University of Cambridge, Cambridge, UK
| | - Patryk Ludzia
- Department of Biochemistry, University of Oxford, Oxford, UK
| | - Ross F. Waller
- Department of Biochemistry, University of Cambridge, Cambridge, UK
| | - Bungo Akiyoshi
- Department of Biochemistry, University of Oxford, Oxford, UK
| |
|
21
|
Zajac N, Zoller S, Seppälä K, Moi D, Dessimoz C, Jokela J, Hartikainen H, Glover N. Gene Duplication and Gain in the Trematode Atriophallophorus winterbourni Contributes to Adaptation to Parasitism. Genome Biol Evol 2021; 13:evab010. [PMID: 33484570 PMCID: PMC7936022 DOI: 10.1093/gbe/evab010] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Accepted: 01/10/2021] [Indexed: 01/10/2023] Open
Abstract
Gene duplications and novel genes have been shown to play a major role in helminth adaptation to a parasitic lifestyle because they provide the novelty necessary for adaptation to a changing environment, such as living in multiple hosts. Here we present the de novo sequenced and annotated genome of the parasitic trematode Atriophallophorus winterbourni and its comparative genomic analysis to other major parasitic trematodes. First, we reconstructed the species phylogeny, and dated the split of A. winterbourni from the Opisthorchiata suborder to approximately 237.4 Ma (±120.4 Myr). We then addressed the question of which expanded gene families and gained genes are potentially involved in adaptation to parasitism. To do this, we used hierarchical orthologous groups to reconstruct three ancestral genomes on the phylogeny leading to A. winterbourni and performed a GO (Gene Ontology) enrichment analysis of the gene composition of each ancestral genome, allowing us to characterize the subsequent genomic changes. Out of the 11,499 genes in the A. winterbourni genome, as much as 24% have arisen through duplication events since the speciation of A. winterbourni from the Opisthorchiata, and as much as 31.9% appear to be novel, that is, newly acquired. We found 13 gene families in A. winterbourni to have had more than ten genes arising through these recent duplications; all of which have functions potentially relating to host behavioral manipulation, host tissue penetration, and hiding from host immunity through antigen presentation. We identified several families with genes evolving under positive selection. Our results provide a valuable resource for future studies on the genomic basis of adaptation to parasitism and point to specific candidate genes putatively involved in antagonistic host-parasite adaptation.
Affiliation(s)
- Natalia Zajac
- Eawag, Swiss Federal Institute of Aquatic Science and Technology, Dübendorf, Switzerland
- ETH Zurich, Department of Environmental Systems Science, Institute of Integrative Biology, Zurich, Switzerland
| | - Stefan Zoller
- ETH Zurich, Department of Environmental Systems Science, Institute of Integrative Biology, Zurich, Switzerland
| | - Katri Seppälä
- Eawag, Swiss Federal Institute of Aquatic Science and Technology, Dübendorf, Switzerland
- Research Department for Limnology, University of Innsbruck, Mondsee, Austria
| | - David Moi
- Department of Computational Biology, University of Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Center for Integrative Genomics, Lausanne, Switzerland
| | - Christophe Dessimoz
- Department of Computational Biology, University of Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Center for Integrative Genomics, Lausanne, Switzerland
- Centre for Life’s Origins and Evolution, Department of Genetics Evolution and Environment, University College London, United Kingdom
- Department of Computer Science, University College London, United Kingdom
| | - Jukka Jokela
- Eawag, Swiss Federal Institute of Aquatic Science and Technology, Dübendorf, Switzerland
- ETH Zurich, Department of Environmental Systems Science, Institute of Integrative Biology, Zurich, Switzerland
| | - Hanna Hartikainen
- Eawag, Swiss Federal Institute of Aquatic Science and Technology, Dübendorf, Switzerland
- ETH Zurich, Department of Environmental Systems Science, Institute of Integrative Biology, Zurich, Switzerland
- School of Life Sciences, University of Nottingham, University Park, United Kingdom
| | - Natasha Glover
- Department of Computational Biology, University of Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Center for Integrative Genomics, Lausanne, Switzerland
| |
|
22
|
Meyer C, Scalzitti N, Jeannin-Girardon A, Collet P, Poch O, Thompson JD. Understanding the causes of errors in eukaryotic protein-coding gene prediction: a case study of primate proteomes. BMC Bioinformatics 2020; 21:513. [PMID: 33172385 PMCID: PMC7656754 DOI: 10.1186/s12859-020-03855-1] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 07/28/2020] [Accepted: 10/30/2020] [Indexed: 11/10/2022] Open
Abstract
Background: Recent advances in sequencing technologies have led to an explosion in the number of genomes available, but accurate genome annotation remains a major challenge. The prediction of protein-coding genes in eukaryotic genomes is especially problematic, due to their complex exon–intron structures. Even the best eukaryotic gene prediction algorithms can make serious errors that will significantly affect subsequent analyses. Results: We first investigated the prevalence of gene prediction errors in a large set of 176,478 proteins from ten primate proteomes available in public databases. Using the well-studied human proteins as a reference, a total of 82,305 potential errors were detected, including 44,001 deletions, 27,289 insertions and 11,015 mismatched segments where part of the correct protein sequence is replaced with an alternative erroneous sequence. We then focused on the mismatched sequence errors that cause particular problems for downstream applications. A detailed characterization allowed us to identify the potential causes for the gene misprediction in approximately half (5446) of these cases. As a proof-of-concept, we also developed a simple method which allowed us to propose improved sequences for 603 primate proteins. Conclusions: Gene prediction errors in primate proteomes affect up to 50% of the sequences. Major causes of errors include undetermined genome regions, genome sequencing or assembly issues, and limitations in the models used to represent gene exon–intron structures. Nevertheless, existing genome sequences can still be exploited to improve protein sequence quality. Perspectives of the work include the characterization of other types of gene prediction errors, as well as the development of a more comprehensive algorithm for protein sequence error correction.
Affiliation(s)
- Corentin Meyer
- Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France
| | - Nicolas Scalzitti
- Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France
| | - Anne Jeannin-Girardon
- Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France
| | - Pierre Collet
- Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France
| | - Olivier Poch
- Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France
| | - Julie D Thompson
- Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France.
| |
|
23
|
Feng S, Stiller J, Deng Y, Armstrong J, Fang Q, Reeve AH, Xie D, Chen G, Guo C, Faircloth BC, Petersen B, Wang Z, Zhou Q, Diekhans M, Chen W, Andreu-Sánchez S, Margaryan A, Howard JT, Parent C, Pacheco G, Sinding MHS, Puetz L, Cavill E, Ribeiro ÂM, Eckhart L, Fjeldså J, Hosner PA, Brumfield RT, Christidis L, Bertelsen MF, Sicheritz-Ponten T, Tietze DT, Robertson BC, Song G, Borgia G, Claramunt S, Lovette IJ, Cowen SJ, Njoroge P, Dumbacher JP, Ryder OA, Fuchs J, Bunce M, Burt DW, Cracraft J, Meng G, Hackett SJ, Ryan PG, Jønsson KA, Jamieson IG, da Fonseca RR, Braun EL, Houde P, Mirarab S, Suh A, Hansson B, Ponnikas S, Sigeman H, Stervander M, Frandsen PB, van der Zwan H, van der Sluis R, Visser C, Balakrishnan CN, Clark AG, Fitzpatrick JW, Bowman R, Chen N, Cloutier A, Sackton TB, Edwards SV, Foote DJ, Shakya SB, Sheldon FH, Vignal A, Soares AER, Shapiro B, González-Solís J, Ferrer-Obiol J, Rozas J, Riutort M, Tigano A, Friesen V, Dalén L, Urrutia AO, Székely T, Liu Y, Campana MG, Corvelo A, Fleischer RC, Rutherford KM, Gemmell NJ, Dussex N, Mouritsen H, Thiele N, Delmore K, Liedvogel M, Franke A, Hoeppner MP, Krone O, et alFeng S, Stiller J, Deng Y, Armstrong J, Fang Q, Reeve AH, Xie D, Chen G, Guo C, Faircloth BC, Petersen B, Wang Z, Zhou Q, Diekhans M, Chen W, Andreu-Sánchez S, Margaryan A, Howard JT, Parent C, Pacheco G, Sinding MHS, Puetz L, Cavill E, Ribeiro ÂM, Eckhart L, Fjeldså J, Hosner PA, Brumfield RT, Christidis L, Bertelsen MF, Sicheritz-Ponten T, Tietze DT, Robertson BC, Song G, Borgia G, Claramunt S, Lovette IJ, Cowen SJ, Njoroge P, Dumbacher JP, Ryder OA, Fuchs J, Bunce M, Burt DW, Cracraft J, Meng G, Hackett SJ, Ryan PG, Jønsson KA, Jamieson IG, da Fonseca RR, Braun EL, Houde P, Mirarab S, Suh A, Hansson B, Ponnikas S, Sigeman H, Stervander M, Frandsen PB, van der Zwan H, van der Sluis R, Visser C, Balakrishnan CN, Clark AG, Fitzpatrick JW, Bowman R, Chen N, Cloutier A, Sackton TB, Edwards SV, Foote DJ, Shakya SB, Sheldon FH, Vignal A, Soares AER, 
Shapiro B, González-Solís J, Ferrer-Obiol J, Rozas J, Riutort M, Tigano A, Friesen V, Dalén L, Urrutia AO, Székely T, Liu Y, Campana MG, Corvelo A, Fleischer RC, Rutherford KM, Gemmell NJ, Dussex N, Mouritsen H, Thiele N, Delmore K, Liedvogel M, Franke A, Hoeppner MP, Krone O, Fudickar AM, Milá B, Ketterson ED, Fidler AE, Friis G, Parody-Merino ÁM, Battley PF, Cox MP, Lima NCB, Prosdocimi F, Parchman TL, Schlinger BA, Loiselle BA, Blake JG, Lim HC, Day LB, Fuxjager MJ, Baldwin MW, Braun MJ, Wirthlin M, Dikow RB, Ryder TB, Camenisch G, Keller LF, DaCosta JM, Hauber ME, Louder MIM, Witt CC, McGuire JA, Mudge J, Megna LC, Carling MD, Wang B, Taylor SA, Del-Rio G, Aleixo A, Vasconcelos ATR, Mello CV, Weir JT, Haussler D, Li Q, Yang H, Wang J, Lei F, Rahbek C, Gilbert MTP, Graves GR, Jarvis ED, Paten B, Zhang G. Dense sampling of bird diversity increases power of comparative genomics. Nature 2020; 587:252-257. [PMID: 33177665 PMCID: PMC7759463 DOI: 10.1038/s41586-020-2873-9] [Citation(s) in RCA: 234] [Impact Index Per Article: 46.8] [Received: 08/09/2019] [Accepted: 07/27/2020] [Indexed: 12/13/2022]
Abstract
Whole-genome sequencing projects are increasingly populating the tree of life and characterizing biodiversity [1-4]. Sparse taxon sampling has previously been proposed to confound phylogenetic inference [5], and captures only a fraction of the genomic diversity. Here we report a substantial step towards the dense representation of avian phylogenetic and molecular diversity, by analysing 363 genomes from 92.4% of bird families, including 267 newly sequenced genomes produced for phase II of the Bird 10,000 Genomes (B10K) Project. We use this comparative genome dataset in combination with a pipeline that leverages a reference-free whole-genome alignment to identify orthologous regions in greater numbers than has previously been possible and to recognize genomic novelties in particular bird lineages. The densely sampled alignment provides a single-base-pair map of selection, has more than doubled the fraction of bases that are confidently predicted to be under conservation and reveals extensive patterns of weak selection in predominantly non-coding DNA. Our results demonstrate that increasing the diversity of genomes used in comparative studies can reveal more shared and lineage-specific variation, and improve the investigation of genomic characteristics. We anticipate that this genomic resource will offer new perspectives on evolutionary processes in cross-species comparative analyses and assist in efforts to conserve species.
Affiliation(s)
- Shaohong Feng
- China National GeneBank, BGI-Shenzhen, Shenzhen, China
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- BGI-Shenzhen, Shenzhen, China
| | - Josefin Stiller
- Villum Centre for Biodiversity Genomics, Section for Ecology and Evolution, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Yuan Deng
- China National GeneBank, BGI-Shenzhen, Shenzhen, China
- BGI-Shenzhen, Shenzhen, China
- Villum Centre for Biodiversity Genomics, Section for Ecology and Evolution, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Joel Armstrong
- UC Santa Cruz Genomics Institute, UC Santa Cruz, Santa Cruz, CA, USA
| | - Qi Fang
- China National GeneBank, BGI-Shenzhen, Shenzhen, China
- BGI-Shenzhen, Shenzhen, China
- Villum Centre for Biodiversity Genomics, Section for Ecology and Evolution, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Andrew Hart Reeve
- Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark
| | - Duo Xie
- China National GeneBank, BGI-Shenzhen, Shenzhen, China
- BGI-Shenzhen, Shenzhen, China
- BGI Education Center, University of Chinese Academy of Sciences, Shenzhen, China
| | - Guangji Chen
- China National GeneBank, BGI-Shenzhen, Shenzhen, China
- BGI-Shenzhen, Shenzhen, China
- BGI Education Center, University of Chinese Academy of Sciences, Shenzhen, China
| | - Chunxue Guo
- China National GeneBank, BGI-Shenzhen, Shenzhen, China
- BGI-Shenzhen, Shenzhen, China
| | - Brant C Faircloth
- Department of Biological Sciences, Louisiana State University, Baton Rouge, LA, USA
- Museum of Natural Science, Louisiana State University, Baton Rouge, LA, USA
| | - Bent Petersen
- Centre of Excellence for Omics-Driven Computational Biodiscovery (COMBio), Faculty of Applied Sciences, AIMST University, Kedah, Malaysia
- Section for Evolutionary Genomics, The GLOBE Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Zongji Wang
- China National GeneBank, BGI-Shenzhen, Shenzhen, China
- BGI-Shenzhen, Shenzhen, China
- MOE Laboratory of Biosystems Homeostasis and Protection, Life Sciences Institute, Zhejiang University, Hangzhou, China
- Department of Neuroscience and Developmental Biology, University of Vienna, Vienna, Austria
| | - Qi Zhou
- MOE Laboratory of Biosystems Homeostasis and Protection, Life Sciences Institute, Zhejiang University, Hangzhou, China
- Department of Neuroscience and Developmental Biology, University of Vienna, Vienna, Austria
- Center for Reproductive Medicine, The 2nd Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
| | - Mark Diekhans
- UC Santa Cruz Genomics Institute, UC Santa Cruz, Santa Cruz, CA, USA
| | - Wanjun Chen
- China National GeneBank, BGI-Shenzhen, Shenzhen, China
- BGI-Shenzhen, Shenzhen, China
| | - Sergio Andreu-Sánchez
- Villum Centre for Biodiversity Genomics, Section for Ecology and Evolution, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Ashot Margaryan
- Section for Evolutionary Genomics, The GLOBE Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Institute of Molecular Biology, National Academy of Sciences, Yerevan, Armenia
| | | | | | - George Pacheco
- Section for Evolutionary Genomics, The GLOBE Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Mikkel-Holger S Sinding
- Section for Evolutionary Genomics, The GLOBE Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Lara Puetz
- Section for Evolutionary Genomics, The GLOBE Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Emily Cavill
- Section for Evolutionary Genomics, The GLOBE Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Ângela M Ribeiro
- Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark
| | - Leopold Eckhart
- Department of Dermatology, Medical University of Vienna, Vienna, Austria
| | - Jon Fjeldså
- Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark
- Center for Macroecology, Evolution, and Climate, GLOBE Institute, University of Copenhagen, Copenhagen, Denmark
| | - Peter A Hosner
- Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark
- Center for Macroecology, Evolution, and Climate, GLOBE Institute, University of Copenhagen, Copenhagen, Denmark
| | - Robb T Brumfield
- Department of Biological Sciences, Louisiana State University, Baton Rouge, LA, USA
- Museum of Natural Science, Louisiana State University, Baton Rouge, LA, USA
| | - Les Christidis
- Southern Cross University, Coffs Harbour, New South Wales, Australia
| | - Mads F Bertelsen
- Centre for Zoo and Wild Animal Health, Copenhagen Zoo, Frederiksberg, Denmark
| | - Thomas Sicheritz-Ponten
- Centre of Excellence for Omics-Driven Computational Biodiscovery (COMBio), Faculty of Applied Sciences, AIMST University, Kedah, Malaysia
- Section for Evolutionary Genomics, The GLOBE Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | | | | | - Gang Song
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Environmental Futures Research Institute, Griffith University, Nathan, Queensland, Australia
| | - Gerald Borgia
- Department of Biology, University of Maryland, College Park, MD, USA
| | - Santiago Claramunt
- Department of Natural History, Royal Ontario Museum, Toronto, Ontario, Canada
- Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Ontario, Canada
| | - Irby J Lovette
- Cornell Lab of Ornithology, Cornell University, Ithaca, NY, USA
| | - Saul J Cowen
- Biodiversity and Conservation Science, Department of Biodiversity Conservation and Attractions, Perth, Western Australia, Australia
| | - Peter Njoroge
- Ornithology Section, Zoology Department, National Museums of Kenya, Nairobi, Kenya
| | | | - Oliver A Ryder
- San Diego Zoo Institute for Conservation Research, Escondido, CA, USA
- Evolution, Behavior, and Ecology, Division of Biology, University of California San Diego, La Jolla, CA, USA
| | - Jérôme Fuchs
- Institut de Systématique, Evolution, Biodiversité (ISYEB), Muséum National d'Histoire Naturelle, CNRS, Sorbonne Université, EPHE, Université des Antilles, Paris, France
| | - Michael Bunce
- Trace and Environmental DNA (TrEnD) Laboratory, School of Molecular and Life Sciences, Curtin University, Western Australia, Perth, Australia
| | - David W Burt
- UQ Genomics, University of Queensland, Brisbane, Queensland, Australia
| | - Joel Cracraft
- Department of Ornithology, American Museum of Natural History, New York, NY, USA
| | | | - Shannon J Hackett
- Integrative Research Center, Field Museum of Natural History, Chicago, IL, USA
| | - Peter G Ryan
- FitzPatrick Institute of African Ornithology, University of Cape Town, Cape Town, South Africa
| | - Knud Andreas Jønsson
- Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark
| | - Ian G Jamieson
- Department of Zoology, University of Otago, Dunedin, New Zealand
| | - Rute R da Fonseca
- Center for Macroecology, Evolution, and Climate, GLOBE Institute, University of Copenhagen, Copenhagen, Denmark
| | - Edward L Braun
- Department of Biology, University of Florida, Gainesville, FL, USA
| | - Peter Houde
- Department of Biology, New Mexico State University, Las Cruces, NM, USA
| | - Siavash Mirarab
- Department of Electrical and Computer Engineering, University of California San Diego, La Jolla, CA, USA
| | - Alexander Suh
- Department of Ecology and Genetics - Evolutionary Biology, Evolutionary Biology Centre (EBC), Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Department of Organismal Biology - Systematic Biology, Evolutionary Biology Centre (EBC), Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- School of Biological Sciences, University of East Anglia, Norwich, UK
| | - Bengt Hansson
- Department of Biology, Lund University, Lund, Sweden
| | - Suvi Ponnikas
- Department of Biology, Lund University, Lund, Sweden
| | - Hanna Sigeman
- Department of Biology, Lund University, Lund, Sweden
| | - Martin Stervander
- Department of Biology, Lund University, Lund, Sweden
- Institute of Ecology and Evolution, University of Oregon, Eugene, OR, USA
| | - Paul B Frandsen
- Department of Plant and Wildlife Sciences, Brigham Young University, Provo, UT, USA
- Data Science Lab, Office of the Chief Information Officer, Smithsonian Institution, Washington, DC, USA
| | | | - Rencia van der Sluis
- Focus Area for Human Metabolomics, North-West University, Potchefstroom, South Africa
| | - Carina Visser
- Department of Animal Sciences, University of Pretoria, Pretoria, South Africa
| | | | - Andrew G Clark
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY, USA
| | | | - Reed Bowman
- Avian Ecology Program, Archbold Biological Station, Venus, FL, USA
| | - Nancy Chen
- Department of Biology, University of Rochester, Rochester, NY, USA
| | - Alison Cloutier
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Museum of Comparative Zoology, Harvard University, Cambridge, MA, USA
| | | | - Scott V Edwards
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Museum of Comparative Zoology, Harvard University, Cambridge, MA, USA
| | - Dustin J Foote
- Department of Biology, East Carolina University, Greenville, NC, USA
- Sylvan Heights Bird Park, Scotland Neck, NC, USA
| | - Subir B Shakya
- Department of Biological Sciences, Louisiana State University, Baton Rouge, LA, USA
- Museum of Natural Science, Louisiana State University, Baton Rouge, LA, USA
| | - Frederick H Sheldon
- Department of Biological Sciences, Louisiana State University, Baton Rouge, LA, USA
- Museum of Natural Science, Louisiana State University, Baton Rouge, LA, USA
| | - Alain Vignal
- GenPhySE, INRA, INPT, INP-ENVT, Université de Toulouse, Castanet-Tolosan, France
| | - André E R Soares
- Laboratório Nacional de Computação Científica, Petrópolis, Brazil
- Department of Ecology and Evolutionary Biology, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Beth Shapiro
- Department of Ecology and Evolutionary Biology, University of California Santa Cruz, Santa Cruz, CA, USA
- Howard Hughes Medical Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Jacob González-Solís
- Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona, Barcelona, Spain
- Departament de Biologia Evolutiva, Ecologia i Ciències Ambientals (BEECA), Universitat de Barcelona, Barcelona, Spain
| | - Joan Ferrer-Obiol
- Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona, Barcelona, Spain
- Departament de Genètica, Microbiologia i Estadística, Universitat de Barcelona, Barcelona, Spain
| | - Julio Rozas
- Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona, Barcelona, Spain
- Departament de Genètica, Microbiologia i Estadística, Universitat de Barcelona, Barcelona, Spain
| | - Marta Riutort
- Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona, Barcelona, Spain
- Departament de Genètica, Microbiologia i Estadística, Universitat de Barcelona, Barcelona, Spain
| | - Anna Tigano
- Department of Molecular, Cellular and Biomedical Sciences, University of New Hampshire, Durham, NH, USA
- Department of Biology, Queen's University, Kingston, Ontario, Canada
| | - Vicki Friesen
- Department of Biology, Queen's University, Kingston, Ontario, Canada
| | - Love Dalén
- Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Stockholm, Sweden
- Centre for Palaeogenetics, Stockholm, Sweden
| | - Araxi O Urrutia
- Milner Centre for Evolution, University of Bath, Bath, UK
- Instituto de Ecologia, UNAM, Mexico City, Mexico
| | - Tamás Székely
- Milner Centre for Evolution, University of Bath, Bath, UK
| | - Yang Liu
- State Key Laboratory of Biocontrol, School of Ecology, Sun Yat-sen University, Guangzhou, China
| | - Michael G Campana
- Center for Conservation Genomics, Smithsonian Conservation Biology Institute, Smithsonian Institution, Washington, DC, USA
| | | | - Robert C Fleischer
- Center for Conservation Genomics, Smithsonian Conservation Biology Institute, Smithsonian Institution, Washington, DC, USA
| | - Kim M Rutherford
- Department of Anatomy, University of Otago, Dunedin, New Zealand
| | - Neil J Gemmell
- Department of Anatomy, University of Otago, Dunedin, New Zealand
| | - Nicolas Dussex
- Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Stockholm, Sweden
- Centre for Palaeogenetics, Stockholm, Sweden
- Department of Anatomy, University of Otago, Dunedin, New Zealand
| | - Henrik Mouritsen
- AG Neurosensory Sciences, Institut für Biologie und Umweltwissenschaften, University of Oldenburg, Oldenburg, Germany
| | - Nadine Thiele
- AG Neurosensory Sciences, Institut für Biologie und Umweltwissenschaften, University of Oldenburg, Oldenburg, Germany
| | - Kira Delmore
- Biology Department, Texas A&M University, College Station, TX, USA
- MPRG Behavioural Genomics, Max Planck Institute for Evolutionary Biology, Plön, Germany
| | - Miriam Liedvogel
- MPRG Behavioural Genomics, Max Planck Institute for Evolutionary Biology, Plön, Germany
| | - Andre Franke
- Institute of Clinical Molecular Biology, Christian-Albrechts-University of Kiel, Kiel, Germany
| | - Marc P Hoeppner
- Institute of Clinical Molecular Biology, Christian-Albrechts-University of Kiel, Kiel, Germany
| | - Oliver Krone
- Department of Wildlife Diseases, Leibniz Institute for Zoo and Wildlife Research, Berlin, Germany
| | - Adam M Fudickar
- Environmental Resilience Institute, Indiana University, Bloomington, IN, USA
| | - Borja Milá
- National Museum of Natural Sciences, Spanish National Research Council (CSIC), Madrid, Spain
| | | | - Andrew Eric Fidler
- Institute of Marine Science, University of Auckland, Auckland, New Zealand
| | - Guillermo Friis
- Center for Genomics and Systems Biology, Department of Biology, New York University - Abu Dhabi, Abu Dhabi, UAE
| | | | - Phil F Battley
- Wildlife and Ecology Group, Massey University, Palmerston North, New Zealand
| | - Murray P Cox
- School of Fundamental Sciences, Massey University, Palmerston North, New Zealand
| | - Nicholas Costa Barroso Lima
- Laboratório Nacional de Computação Científica, Petrópolis, Brazil
- Departamento de Bioquímica e Biologia Molecular, Centro de Ciências, Universidade Federal do Ceará, Fortaleza, Brazil
| | - Francisco Prosdocimi
- Laboratório de Genômica e Biodiversidade, Instituto de Bioquímica Médica Leopoldo de Meis, Rio de Janeiro, Brazil
| | | | - Barney A Schlinger
- Department of Integrative Biology and Physiology, UCLA, Los Angeles, CA, USA
- Smithsonian Tropical Research Institute, Panama City, Panama
| | - Bette A Loiselle
- Department of Wildlife Ecology and Conservation, University of Florida, Gainesville, FL, USA
- Center for Latin American Studies, University of Florida, Gainesville, FL, USA
| | - John G Blake
- Department of Wildlife Ecology and Conservation, University of Florida, Gainesville, FL, USA
| | - Haw Chuan Lim
- Center for Conservation Genomics, Smithsonian Conservation Biology Institute, Smithsonian Institution, Washington, DC, USA
- Department of Biology, George Mason University, Fairfax, VA, USA
| | - Lainy B Day
- Department of Biology and Neuroscience Minor, University of Mississippi, University, MS, USA
| | - Matthew J Fuxjager
- Department of Ecology and Evolutionary Biology, Brown University, Providence, RI, USA
| | | | - Michael J Braun
- Department of Vertebrate Zoology, National Museum of Natural History, Smithsonian Institution, Washington, DC, USA
- Behavior, Ecology, Evolution and Systematics Program, University of Maryland, College Park, MD, USA
| | - Morgan Wirthlin
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Rebecca B Dikow
- Data Science Lab, Office of the Chief Information Officer, Smithsonian Institution, Washington, DC, USA
| | - T Brandt Ryder
- Migratory Bird Center, Smithsonian National Zoological Park and Conservation Biology Institute, Washington, DC, USA
| | - Glauco Camenisch
- Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland
| | - Lukas F Keller
- Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland
| | | | - Mark E Hauber
- Department of Evolution, Ecology, and Behavior, School of Integrative Biology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
| | - Matthew I M Louder
- Department of Biology, East Carolina University, Greenville, NC, USA
- Department of Evolution, Ecology, and Behavior, School of Integrative Biology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- International Research Center for Neurointelligence, University of Tokyo, Tokyo, Japan
| | - Christopher C Witt
- Museum of Southwestern Biology, Department of Biology, University of New Mexico, Albuquerque, NM, USA
| | - Jimmy A McGuire
- Museum of Vertebrate Zoology, Department of Integrative Biology, University of California, Berkeley, Berkeley, CA, USA
| | - Joann Mudge
- National Center for Genome Resources, Santa Fe, NM, USA
| | - Libby C Megna
- Department of Zoology and Physiology, University of Wyoming, Laramie, WY, USA
| | - Matthew D Carling
- Department of Zoology and Physiology, University of Wyoming, Laramie, WY, USA
| | - Biao Wang
- School of BioSciences, The University of Melbourne, Melbourne, Victoria, Australia
| | - Scott A Taylor
- Department of Ecology and Evolutionary Biology, University of Colorado Boulder, Boulder, CO, USA
| | - Glaucia Del-Rio
- Museum of Natural Science, Louisiana State University, Baton Rouge, LA, USA
| | - Alexandre Aleixo
- Finnish Museum of Natural History, University of Helsinki, Helsinki, Finland
| | | | - Claudio V Mello
- Department of Behavioral Neuroscience, Oregon Health and Science University, Portland, OR, USA
| | - Jason T Weir
- Department of Natural History, Royal Ontario Museum, Toronto, Ontario, Canada
- Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Ontario, Canada
- Department of Biological Sciences, University of Toronto Scarborough, Toronto, Ontario, Canada
| | - David Haussler
- UC Santa Cruz Genomics Institute, UC Santa Cruz, Santa Cruz, CA, USA
| | - Qiye Li
- China National GeneBank, BGI-Shenzhen, Shenzhen, China
- BGI-Shenzhen, Shenzhen, China
| | - Huanming Yang
- BGI-Shenzhen, Shenzhen, China
- James D. Watson Institute of Genome Sciences, Hangzhou, China
| | | | - Fumin Lei
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China
| | - Carsten Rahbek
- Center for Macroecology, Evolution, and Climate, GLOBE Institute, University of Copenhagen, Copenhagen, Denmark
- Danish Institute for Advanced Study, University of Southern Denmark, Odense, Denmark
- Institute of Ecology, Peking University, Beijing, China
- Department of Life Sciences, Imperial College London, Ascot, UK
| | - M Thomas P Gilbert
- Section for Evolutionary Genomics, The GLOBE Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- University Museum, Norwegian University of Science and Technology, Trondheim, Norway
| | - Gary R Graves
- Center for Macroecology, Evolution, and Climate, GLOBE Institute, University of Copenhagen, Copenhagen, Denmark
- Department of Vertebrate Zoology, National Museum of Natural History, Smithsonian Institution, Washington, DC, USA
| | - Erich D Jarvis
- Duke University Medical Center, Durham, NC, USA
- The Rockefeller University, New York, NY, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, UC Santa Cruz, Santa Cruz, CA, USA
| | - Guojie Zhang
- China National GeneBank, BGI-Shenzhen, Shenzhen, China
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- Villum Centre for Biodiversity Genomics, Section for Ecology and Evolution, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China
| |
|
24
|
Vosseberg J, van Hooff JJE, Marcet-Houben M, van Vlimmeren A, van Wijk LM, Gabaldón T, Snel B. Timing the origin of eukaryotic cellular complexity with ancient duplications. Nat Ecol Evol 2020; 5:92-100. [PMID: 33106602 PMCID: PMC7610411 DOI: 10.1038/s41559-020-01320-z] [Citation(s) in RCA: 55] [Impact Index Per Article: 11.0] [Received: 01/10/2020] [Accepted: 08/28/2020] [Indexed: 11/29/2022]
Abstract
Eukaryogenesis is one of the most enigmatic evolutionary transitions, during which simple prokaryotic cells gave rise to complex eukaryotic cells. While evolutionary intermediates are lacking, gene duplications provide information on the order of events by which eukaryotes originated. Here we use a phylogenomics approach to reconstruct successive steps during eukaryogenesis. We found that gene duplications roughly doubled the proto-eukaryotic gene repertoire, with families inherited from the Asgard archaea-related host being duplicated most. By relatively timing events using phylogenetic distances we inferred that duplications in cytoskeletal and membrane trafficking families were among the earliest events, whereas most other families expanded predominantly after mitochondrial endosymbiosis. Altogether, we infer that the host that engulfed the proto-mitochondrion had some eukaryote-like complexity, which drastically increased upon mitochondrial acquisition. This scenario bridges the signs of complexity observed in Asgard archaeal genomes to the proposed role of mitochondria in triggering eukaryogenesis.
Affiliation(s)
- Julian Vosseberg
- Theoretical Biology and Bioinformatics, Department of Biology, Faculty of Science, Utrecht University, Utrecht, the Netherlands
| | - Jolien J E van Hooff
- Theoretical Biology and Bioinformatics, Department of Biology, Faculty of Science, Utrecht University, Utrecht, the Netherlands
- Ecologie Systématique Evolution, CNRS, Université Paris-Saclay, AgroParisTech, Orsay, France
| | - Marina Marcet-Houben
- Centre for Genomic Regulation, The Barcelona Institute of Science and Technology, Barcelona, Spain
- Life Sciences Department, Barcelona Supercomputing Center, Barcelona, Spain
- Mechanisms of Disease, Institute for Research in Biomedicine, Barcelona, Spain
| | - Anne van Vlimmeren
- Theoretical Biology and Bioinformatics, Department of Biology, Faculty of Science, Utrecht University, Utrecht, the Netherlands
- Department of Biological Sciences, Columbia University, New York City, NY, USA
| | - Leny M van Wijk
- Theoretical Biology and Bioinformatics, Department of Biology, Faculty of Science, Utrecht University, Utrecht, the Netherlands
| | - Toni Gabaldón
- Centre for Genomic Regulation, The Barcelona Institute of Science and Technology, Barcelona, Spain
- Life Sciences Department, Barcelona Supercomputing Center, Barcelona, Spain
- Mechanisms of Disease, Institute for Research in Biomedicine, Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats, Barcelona, Spain
| | - Berend Snel
- Theoretical Biology and Bioinformatics, Department of Biology, Faculty of Science, Utrecht University, Utrecht, the Netherlands
| |
|
25
|
Deutekom ES, Snel B, van Dam TJP. Benchmarking orthology methods using phylogenetic patterns defined at the base of Eukaryotes. Brief Bioinform 2020; 22:5906198. [PMID: 32935832 PMCID: PMC8138875 DOI: 10.1093/bib/bbaa206] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 05/12/2020] [Revised: 08/10/2020] [Accepted: 08/11/2020] [Indexed: 12/26/2022] Open
Abstract
Insights into the evolution of ancestral complexes and pathways are generally achieved through careful and time-intensive manual analysis, often using phylogenetic profiles of the constituent proteins. This manual analysis limits the possibility of including more protein-complex components, repeating the analyses for updated genome sets or expanding the analyses to larger scales. Automated orthology inference should allow such large-scale analyses, but substantial differences between orthologous groups generated by different approaches are observed. We evaluate orthology methods for their ability to recapitulate a number of observations that have been made with regard to genome evolution in eukaryotes. Specifically, we investigate phylogenetic profile similarity (co-occurrence of complexes), the last eukaryotic common ancestor’s gene content, pervasiveness of gene loss and the overlap with manually determined orthologous groups. Moreover, we compare the inferred orthologies to each other. We find that most orthology methods reconstruct a large last eukaryotic common ancestor, with substantial gene loss, and can predict interacting proteins reasonably well when applying phylogenetic co-occurrence. At the same time, derived orthologous groups show imperfect overlap with manually curated orthologous groups. There is no strong indication of which orthology method performs better than another on individual or all of these aspects. Counterintuitively, despite the orthology methods behaving similarly regarding large-scale evaluation, the obtained orthologous groups differ vastly from one another. Availability and implementation: The data and code underlying this article are available on GitHub and/or upon reasonable request to the corresponding author: https://github.com/ESDeutekom/ComparingOrthologies.
Affiliation(s)
| | - Berend Snel
- Corresponding author: Berend Snel, Padualaan 8, 3584 CH Utrecht, The Netherlands. Tel.: +31(0)30 253 8102; E-mail:
| | | |
|
26
|
Baggs EL, Monroe JG, Thanki AS, O'Grady R, Schudoma C, Haerty W, Krasileva KV. Convergent Loss of an EDS1/PAD4 Signaling Pathway in Several Plant Lineages Reveals Coevolved Components of Plant Immunity and Drought Response. The Plant Cell 2020; 32:2158-2177. [PMID: 32409319 PMCID: PMC7346574 DOI: 10.1105/tpc.19.00903] [Citation(s) in RCA: 58] [Impact Index Per Article: 11.6] [Received: 12/03/2019] [Revised: 04/28/2020] [Accepted: 05/12/2020] [Indexed: 05/19/2023]
Abstract
Plant innate immunity relies on nucleotide binding leucine-rich repeat receptors (NLRs) that recognize pathogen-derived molecules and activate downstream signaling pathways. We analyzed the variation in NLR gene copy number and identified plants with a low number of NLR genes relative to sister species. We specifically focused on four plants from two distinct lineages, one monocot lineage (Alismatales) and one eudicot lineage (Lentibulariaceae). In these lineages, the loss of NLR genes coincides with loss of the well-known downstream immune signaling complex ENHANCED DISEASE SUSCEPTIBILITY 1 (EDS1)/PHYTOALEXIN DEFICIENT 4 (PAD4). We expanded our analysis across whole proteomes and found that other characterized immune genes were absent only in Lentibulariaceae and Alismatales. Additionally, we identified genes of unknown function that were convergently lost together with EDS1/PAD4 in five plant species. Gene expression analyses in Arabidopsis (Arabidopsis thaliana) and Oryza sativa revealed that several homologs of the candidates are differentially expressed during pathogen infection, drought, and abscisic acid treatment. Our analysis provides evolutionary evidence for the rewiring of plant immunity in some plant lineages, as well as the coevolution of the EDS1/PAD4 pathway and drought responses.
Affiliation(s)
- Erin L Baggs
- Earlham Institute, Norwich Research Park, Norwich NR4 7UZ, United Kingdom
- University of California Berkeley, Berkeley, California 94720
| | - J Grey Monroe
- University of California Davis, Davis, California 95616
- Max Planck Institute for Developmental Biology, 72076 Tübingen, Germany
| | - Anil S Thanki
- Earlham Institute, Norwich Research Park, Norwich NR4 7UZ, United Kingdom
| | - Ruby O'Grady
- The Sainsbury Laboratory, Norwich Research Park, Norwich NR4 7UH, United Kingdom
| | - Christian Schudoma
- Earlham Institute, Norwich Research Park, Norwich NR4 7UZ, United Kingdom
| | - Wilfried Haerty
- Earlham Institute, Norwich Research Park, Norwich NR4 7UZ, United Kingdom
| | - Ksenia V Krasileva
- Earlham Institute, Norwich Research Park, Norwich NR4 7UZ, United Kingdom
- University of California Berkeley, Berkeley, California 94720
- The Sainsbury Laboratory, Norwich Research Park, Norwich NR4 7UH, United Kingdom
| |
|
27
|
de Paula Freitas FC, Lourenço AP, Nunes FMF, Paschoal AR, Abreu FCP, Barbin FO, Bataglia L, Cardoso-Júnior CAM, Cervoni MS, Silva SR, Dalarmi F, Del Lama MA, Depintor TS, Ferreira KM, Gória PS, Jaskot MC, Lago DC, Luna-Lucena D, Moda LM, Nascimento L, Pedrino M, Oliveira FR, Sanches FC, Santos DE, Santos CG, Vieira J, Barchuk AR, Hartfelder K, Simões ZLP, Bitondi MMG, Pinheiro DG. The nuclear and mitochondrial genomes of Frieseomelitta varia - a highly eusocial stingless bee (Meliponini) with a permanently sterile worker caste. BMC Genomics 2020; 21:386. [PMID: 32493270 PMCID: PMC7268684 DOI: 10.1186/s12864-020-06784-8] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 11/18/2019] [Accepted: 05/14/2020] [Indexed: 12/26/2022] Open
Abstract
BACKGROUND Most of our understanding of the social behavior and genomics of bees and other social insects is centered on the Western honey bee, Apis mellifera. The genus Apis, however, is a highly derived branch comprising less than a dozen species, four of which have been genomically characterized. In contrast, for the equally highly eusocial, yet taxonomically and biologically more diverse Meliponini, a full genome sequence has so far been available for a single Melipona species only. We present here the genome sequence of Frieseomelitta varia, a stingless bee that has, as a peculiarity, a completely sterile worker caste. RESULTS The assembly of 243,974,526 high-quality Illumina reads resulted in a predicted assembled genome size of 275 Mb composed of 2173 scaffolds. A BUSCO analysis for the 10,526 predicted genes showed that these represent 96.6% of the expected hymenopteran orthologs. We also predicted 169,371 repetitive genomic components, 2083 putative transposable elements, and 1946 genes for non-coding RNAs, largely long non-coding RNAs. The mitochondrial genome comprises 15,144 bp, encoding 13 proteins, 22 tRNAs and 2 rRNAs. We observed considerable rearrangement in the mitochondrial gene order compared to other bees. For an in-depth analysis of genes related to social biology, we manually checked the annotations for 533 automatically predicted gene models, including 127 genes related to reproductive processes, 104 to development, and 174 immunity-related genes. We also performed specific searches for genes containing transcription factor domains and genes related to neurogenesis and chemosensory communication. CONCLUSIONS The total genome size for F. varia is similar to the sequenced genomes of other bees. Using specific prediction methods, we identified a large number of repetitive genome components and long non-coding RNAs, which could provide the molecular basis for gene regulatory plasticity, including worker reproduction. The remarkable reshuffling in gene order in the mitochondrial genome suggests that stingless bees may be a hotspot for mtDNA evolution. Hence, although this is only the second stingless bee genome sequenced, we expect that subsequent targeting of a selected set of species from this diverse clade of highly eusocial bees will reveal relevant evolutionary signals and trends related to eusociality in these important pollinators.
Affiliation(s)
- Flávia C. de Paula Freitas
- Departamento de Genética, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, SP Brazil
- Departamento de Biologia Celular e do Desenvolvimento, Instituto de Ciências Biomédicas, Universidade Federal de Alfenas, Alfenas, MG Brazil
| | - Anete P. Lourenço
- Departamento de Genética, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, SP Brazil
- Departamento de Ciências Biológicas, Faculdade de Ciências Biológicas e da Saúde, Universidade Federal dos Vales do Jequitinhonha e Mucuri, Diamantina, MG Brazil
| | - Francis M. F. Nunes
- Departamento de Genética, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, SP Brazil
- Departamento de Genética e Evolução, Centro de Ciências Biológicas e da Saúde, Universidade Federal de São Carlos, São Carlos, SP Brazil
| | | | - Fabiano C. P. Abreu
- Departamento de Genética, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, SP Brazil
| | - Fábio O. Barbin
- Departamento de Genética, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, SP Brazil
| | - Luana Bataglia
- Departamento de Genética, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, SP Brazil
| | - Carlos A. M. Cardoso-Júnior
- Departamento de Biologia Celular e Molecular e Bioagentes Patogênicos, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Av. Bandeirantes 3900, Ribeirão Preto, SP 14049-900 Brazil
| | - Mário S. Cervoni
- Departamento de Biologia Celular e Molecular e Bioagentes Patogênicos, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Av. Bandeirantes 3900, Ribeirão Preto, SP 14049-900 Brazil
| | - Saura R. Silva
- Departamento de Tecnologia, Faculdade de Ciências Agrárias e Veterinárias, Universidade Estadual Paulista “Júlio de Mesquita Filho”, Jaboticabal, SP Brazil
| | - Fernanda Dalarmi
- Departamento de Biologia, Faculdade de Filosofia, Ciências e Letras de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, SP Brazil
| | - Marco A. Del Lama
- Departamento de Genética e Evolução, Centro de Ciências Biológicas e da Saúde, Universidade Federal de São Carlos, São Carlos, SP Brazil
| | - Thiago S. Depintor
- Departamento de Genética, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, SP Brazil
| | - Kátia M. Ferreira
- Departamento de Genética e Evolução, Centro de Ciências Biológicas e da Saúde, Universidade Federal de São Carlos, São Carlos, SP Brazil
| | - Paula S. Gória
- Departamento de Genética e Evolução, Centro de Ciências Biológicas e da Saúde, Universidade Federal de São Carlos, São Carlos, SP Brazil
| | - Michael C. Jaskot
- Departamento de Genética e Evolução, Centro de Ciências Biológicas e da Saúde, Universidade Federal de São Carlos, São Carlos, SP Brazil
| | - Denyse C. Lago
- Departamento de Genética, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, SP Brazil
| | - Danielle Luna-Lucena
- Departamento de Genética, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, SP Brazil
| | - Livia M. Moda
- Departamento de Biologia Celular e do Desenvolvimento, Instituto de Ciências Biomédicas, Universidade Federal de Alfenas, Alfenas, MG Brazil
| | - Leonardo Nascimento
- Departamento de Biologia, Faculdade de Filosofia, Ciências e Letras de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, SP Brazil
| | - Matheus Pedrino
- Departamento de Genética e Evolução, Centro de Ciências Biológicas e da Saúde, Universidade Federal de São Carlos, São Carlos, SP Brazil
| | - Franciene Rabiço Oliveira
- Departamento de Genética, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, SP Brazil
| | - Fernanda C. Sanches
- Departamento de Genética, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, SP Brazil
- Departamento de Genética e Evolução, Centro de Ciências Biológicas e da Saúde, Universidade Federal de São Carlos, São Carlos, SP Brazil
| | - Douglas E. Santos
- Departamento de Biologia Celular e Molecular e Bioagentes Patogênicos, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Av. Bandeirantes 3900, Ribeirão Preto, SP 14049-900 Brazil
| | - Carolina G. Santos
- Departamento de Biologia Celular e Molecular e Bioagentes Patogênicos, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Av. Bandeirantes 3900, Ribeirão Preto, SP 14049-900 Brazil
| | - Joseana Vieira
- Departamento de Biologia Celular e do Desenvolvimento, Instituto de Ciências Biomédicas, Universidade Federal de Alfenas, Alfenas, MG Brazil
| | - Angel R. Barchuk
- Departamento de Biologia Celular e do Desenvolvimento, Instituto de Ciências Biomédicas, Universidade Federal de Alfenas, Alfenas, MG Brazil
| | - Klaus Hartfelder
- Departamento de Biologia Celular e Molecular e Bioagentes Patogênicos, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Av. Bandeirantes 3900, Ribeirão Preto, SP 14049-900 Brazil
| | - Zilá L. P. Simões
- Departamento de Biologia, Faculdade de Filosofia, Ciências e Letras de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, SP Brazil
| | - Márcia M. G. Bitondi
- Departamento de Biologia, Faculdade de Filosofia, Ciências e Letras de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, SP Brazil
| | - Daniel G. Pinheiro
- Departamento de Tecnologia, Faculdade de Ciências Agrárias e Veterinárias, Universidade Estadual Paulista “Júlio de Mesquita Filho”, Jaboticabal, SP Brazil
| |
|
28
|
Greshake Tzovaras B, Segers FHID, Bicker A, Dal Grande F, Otte J, Anvar SY, Hankeln T, Schmitt I, Ebersberger I. What Is in Umbilicaria pustulata? A Metagenomic Approach to Reconstruct the Holo-Genome of a Lichen. Genome Biol Evol 2020; 12:309-324. [PMID: 32163141 PMCID: PMC7186782 DOI: 10.1093/gbe/evaa049] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Accepted: 03/09/2020] [Indexed: 12/29/2022] Open
Abstract
Lichens are valuable models in symbiosis research and promising sources of biosynthetic genes for biotechnological applications. Most lichenized fungi grow slowly, resist aposymbiotic cultivation, and are poor candidates for experimentation. Obtaining contiguous, high-quality genomes for such symbiotic communities is technically challenging. Here, we present the first assembly of a lichen holo-genome from metagenomic whole-genome shotgun data comprising both PacBio long reads and Illumina short reads. The nuclear genomes of the two primary components of the lichen symbiosis, the fungus Umbilicaria pustulata (33 Mb) and the green alga Trebouxia sp. (53 Mb), were assembled at contiguities comparable to single-species assemblies. The analysis of the read coverage pattern revealed a relative abundance of fungal to algal nuclei of ∼20:1. Gap-free, circular sequences for all organellar genomes were obtained. The bacterial community is dominated by Acidobacteriaceae and encompasses strains closely related to bacteria isolated from other lichens. Gene set analyses showed no evidence of horizontal gene transfer from algae or bacteria into the fungal genome. Our data suggest a lineage-specific loss of a putative gibberellin-20-oxidase in the fungus, a gene fusion in the fungal mitochondrion, and a relocation of an algal chloroplast gene to the algal nucleus. Major technical obstacles during reconstruction of the holo-genome were coverage differences among individual genomes surpassing three orders of magnitude. Moreover, we show that GC-rich inverted repeats paired with nonrandom sequencing error in PacBio data can result in missing gene predictions. This likely poses a general problem for genome assemblies based on long reads.
Affiliation(s)
- Bastian Greshake Tzovaras
- Applied Bioinformatics Group, Institute of Cell Biology and Neuroscience, Goethe University Frankfurt, Germany
- Lawrence Berkeley National Laboratory, Berkeley, California
- Center for Research & Interdisciplinarity, Université de Paris, France
| | - Francisca H I D Segers
- Applied Bioinformatics Group, Institute of Cell Biology and Neuroscience, Goethe University Frankfurt, Germany
- LOEWE Center for Translational Biodiversity Genomics, Frankfurt, Germany
| | - Anne Bicker
- Institute for Organismic and Molecular Evolution, Molecular Genetics and Genome Analysis, Johannes Gutenberg University Mainz, Germany
| | - Francesco Dal Grande
- LOEWE Center for Translational Biodiversity Genomics, Frankfurt, Germany
- Senckenberg Biodiversity and Climate Research Centre (SBiK-F), Frankfurt, Germany
| | - Jürgen Otte
- Senckenberg Biodiversity and Climate Research Centre (SBiK-F), Frankfurt, Germany
| | - Seyed Yahya Anvar
- Department of Human Genetics, Leiden University Medical Center, The Netherlands
| | - Thomas Hankeln
- Institute for Organismic and Molecular Evolution, Molecular Genetics and Genome Analysis, Johannes Gutenberg University Mainz, Germany
| | - Imke Schmitt
- LOEWE Center for Translational Biodiversity Genomics, Frankfurt, Germany
- Senckenberg Biodiversity and Climate Research Centre (SBiK-F), Frankfurt, Germany
- Molecular Evolutionary Biology Group, Institute of Ecology, Diversity, and Evolution, Goethe University Frankfurt, Germany
| | - Ingo Ebersberger
- Applied Bioinformatics Group, Institute of Cell Biology and Neuroscience, Goethe University Frankfurt, Germany
- LOEWE Center for Translational Biodiversity Genomics, Frankfurt, Germany
- Senckenberg Biodiversity and Climate Research Centre (SBiK-F), Frankfurt, Germany
| |
|
29
|
Brueckner J, Martin WF. Bacterial Genes Outnumber Archaeal Genes in Eukaryotic Genomes. Genome Biol Evol 2020; 12:282-292. [PMID: 32142116 PMCID: PMC7151554 DOI: 10.1093/gbe/evaa047] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Accepted: 03/02/2020] [Indexed: 12/13/2022] Open
Abstract
Eukaryotes are typically depicted as descendants of archaea, but their genomes are evolutionary chimeras with genes stemming from archaea and bacteria. Which prokaryotic heritage predominates? Here, we have clustered 19,050,992 protein sequences from 5,443 bacteria and 212 archaea with 3,420,731 protein sequences from 150 eukaryotes spanning six eukaryotic supergroups. By downsampling, we obtain estimates for the bacterial and archaeal proportions. Eukaryotic genomes possess a bacterial majority of genes. On average, the bacterial majority is 56% overall, 53% in eukaryotes that never possessed plastids, and 61% in photosynthetic eukaryotic lineages, where the cyanobacterial ancestor of plastids contributed additional genes to the eukaryotic lineage. Intracellular parasites, which undergo reductive evolution in adaptation to the nutrient-rich environment of the cells that they infect, relinquish bacterial genes for metabolic processes. Such adaptive gene loss is most pronounced in the human parasite Encephalitozoon intestinalis with 86% archaeal-derived and 14% bacterial-derived genes. The most bacterial eukaryote genome sampled is rice, with 67% bacterial and 33% archaeal genes. The functional dichotomy, initially described for yeast, of archaeal genes being involved in genetic information processing and bacterial genes being involved in metabolic processes is conserved across all eukaryotic supergroups.
Affiliation(s)
- Julia Brueckner
- Institute for Molecular Evolution, Heinrich Heine University Düsseldorf, Germany
| | - William F Martin
- Institute for Molecular Evolution, Heinrich Heine University Düsseldorf, Germany
| |
|
30
|
Nagy LG, Merényi Z, Hegedüs B, Bálint B. Novel phylogenetic methods are needed for understanding gene function in the era of mega-scale genome sequencing. Nucleic Acids Res 2020; 48:2209-2219. [PMID: 31943056 PMCID: PMC7049691 DOI: 10.1093/nar/gkz1241] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 10/23/2019] [Revised: 12/15/2019] [Accepted: 12/31/2019] [Indexed: 12/21/2022] Open
Abstract
Ongoing large-scale genome sequencing projects are forecasting a data deluge that will almost certainly overwhelm current analytical capabilities of evolutionary genomics. In contrast to population genomics, there are no standardized methods in evolutionary genomics for extracting evolutionary and functional (e.g. gene-trait association) signal from genomic data. Here, we examine how current practices of multi-species comparative genomics perform in this respect and point out that many genomic datasets are under-utilized due to the lack of powerful methodologies. As a result, many current analyses emphasize gene families for which some functional data is already available, resulting in a growing gap between functionally well-characterized genes/organisms and the universe of unknowns. This leaves unknown genes on the 'dark side' of genomes, a problem that will not be mitigated by sequencing more and more genomes, unless we develop tools to infer functional hypotheses for unknown genes in a systematic manner. We provide an inventory of recently developed methods capable of predicting gene-gene and gene-trait associations based on comparative data, then argue that realizing the full potential of whole genome datasets requires the integration of phylogenetic comparative methods into genomics, a rich but underutilized toolbox for looking into the past.
Affiliation(s)
- László G Nagy
- Synthetic and Systems Biology Unit, Institute of Biochemistry, Biological Research Centre, Temesvari krt 62. Szeged 6726, Hungary
| | - Zsolt Merényi
- Synthetic and Systems Biology Unit, Institute of Biochemistry, Biological Research Centre, Temesvari krt 62. Szeged 6726, Hungary
| | - Botond Hegedüs
- Synthetic and Systems Biology Unit, Institute of Biochemistry, Biological Research Centre, Temesvari krt 62. Szeged 6726, Hungary
| | - Balázs Bálint
- Synthetic and Systems Biology Unit, Institute of Biochemistry, Biological Research Centre, Temesvari krt 62. Szeged 6726, Hungary
| |
|