1
Mikhailova AA, Dohmen E, Harrison MC. Major changes in domain arrangements are associated with the evolution of termites. J Evol Biol 2024; 37:758-769. [PMID: 38630634 DOI: 10.1093/jeb/voae047] [Received: 05/31/2023] [Revised: 12/18/2023] [Accepted: 04/12/2024] [Indexed: 04/19/2024]
Abstract
Domains, as functional protein units, and their rearrangements along a phylogeny can shed light on the functional changes in proteomes associated with the evolution of complex traits such as eusociality. Eusociality is characterized by sterile soldiers and workers and by long-lived, highly fecund reproductives. Unlike in Hymenoptera (ants, bees, and wasps), the evolution of eusociality within Blattodea, where termites evolved from within cockroaches, was accompanied by a reduction in proteome size, raising the question of whether functional novelty was achieved with existing rather than novel proteins. To address this, we investigated the role of domain rearrangements during the evolution of termite eusociality. Analysing domain rearrangements in the proteomes of three solitary cockroaches and five eusocial termites, we inferred more than 5,000 rearrangements over the phylogeny of Blattodea. The 90 novel domain arrangements that emerged at the origin of termites were enriched for several functions related to longevity, such as protein homeostasis, DNA repair, mitochondrial activity, and nutrient sensing. Many domain rearrangements were related to changes in developmental pathways, which are important for the emergence of novel castes. Along with the elaboration of social complexity, including permanently sterile workers and larger, foraging colonies, we found 110 further domain arrangements with functions related to protein glycosylation and ion transport. Rearranged genes were enriched for caste-biased expression and splicing, highlighting their importance for the evolution of castes. Furthermore, DNA methylation levels were higher in rearranged than in non-rearranged genes, suggesting fundamental differences in their regulation. Our findings indicate the importance of domain rearrangements in generating the functional novelty necessary for the evolution of termite eusociality.
Affiliation(s)
- Alina A Mikhailova
- Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
- Elias Dohmen
- Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
- Mark C Harrison
- Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
2
Dikmen F, Dabak T, Özgişi BD, Özenirler Ç, Kuralay SC, Çay SB, Çınar YU, Obut O, Balcı MA, Akbaba P, Aksel EG, Zararsız G, Solares E, Eldem V. Transcriptome-wide analysis uncovers regulatory elements of the antennal transcriptome repertoire of bumblebee at different life stages. Insect Mol Biol 2024. [PMID: 38676460 DOI: 10.1111/imb.12914] [Received: 01/16/2024] [Accepted: 04/09/2024] [Indexed: 04/29/2024]
Abstract
Bumblebees are crucial pollinators that provide essential ecosystem services and support global food production. The success of pollination depends on the interaction between sensory organs and the environment. The antenna functions as a versatile multi-sensory organ: it is pivotal in mediating chemosensory and olfactory information and governs adaptive responses to environmental change. Despite an increasing number of RNA-sequencing studies on insect antennae, antennal transcriptomes across different life stages have not been characterized systematically. Here, we quantified the expression profiles and dynamics of coding and microRNA genes in larval head tissue and in antennal tissue from early- and late-stage pupae to adults of Bombus terrestris, a suitable model organism among pollinators. We further performed Pearson correlation analyses on the antennal gene expression profiles from larval head tissue to adult stages, exploring both positive and negative expression trends. Positively correlated coding genes were primarily enriched in sensory perception of chemical stimuli, ion transport, transmembrane transport and olfactory receptor activity. Negatively correlated genes were mainly enriched in organic substance biosynthesis and in regulatory mechanisms underlying larval body patterning and the formation of juvenile antennal structures. Among the post-transcriptional regulators, miR-1000-5p, miR-13b-3p, miR-263-5p and miR-252-5p showed positive correlations, whereas miR-315-5p, miR-92b-3p, miR-137-3p, miR-11-3p and miR-10-3p showed negative correlations in antennal tissue. Notably, based on inverse expression relationships, the positively and negatively correlated microRNA (miRNA)-mRNA target pairs revealed that differentially expressed miRNAs are predicted to target genes involved in antennal development, the shaping of antennal structures and the regulation of antenna-specific functions. Our data serve as a foundation for understanding stage-specific antennal transcriptomes and for large-scale comparative analyses of transcriptomes across insects.
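The stage-wise correlation analysis described above can be illustrated with a minimal Python sketch. For illustration only, it assumes that each gene's expression profile is correlated against an ordinal stage index (larval head = 0, early pupa = 1, late pupa = 2, adult = 3). The stage encoding, the ±0.8 cut-off, the gene names and the expression values are all hypothetical and are not taken from the study.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")  # constant profile: correlation undefined
    return cov / (sx * sy)

# Hypothetical ordinal encoding: larval head, early pupa, late pupa, adult
STAGES = [0, 1, 2, 3]

def classify_trend(profile, cutoff=0.8):
    """Label a profile as a positive, negative or no trend along the stage order.
    The cutoff is a hypothetical choice, not a value from the study."""
    r = pearson_r(STAGES, profile)
    if r >= cutoff:
        return "positive", r
    if r <= -cutoff:
        return "negative", r
    return "none", r

# Hypothetical normalized expression values, one per stage
profiles = {
    "gene_up": [1.0, 4.0, 9.0, 16.0],
    "gene_down": [20.0, 12.0, 5.0, 1.0],
    "gene_flat": [5.0, 6.0, 5.0, 6.0],
}

for gene, values in profiles.items():
    label, r = classify_trend(values)
    print(f"{gene}: r = {r:.3f} -> {label}")
```

Treating stage as an ordinal index is only one way to define a "trend". With real data one would also test significance and correct for multiple testing before calling a gene positively or negatively correlated.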
Affiliation(s)
- Fatih Dikmen
- Department of Biology, Istanbul University, İstanbul, Turkey
- Tunç Dabak
- Department of Biology, The Pennsylvania State University, State College, Pennsylvania, USA
- Onur Obut
- Department of Biology, Istanbul University, İstanbul, Turkey
- Pınar Akbaba
- Department of Biology, Istanbul University, İstanbul, Turkey
- Esma Gamze Aksel
- Faculty of Veterinary Medicine, Department of Genetics, Erciyes University, Kayseri, Turkey
- Gökmen Zararsız
- Department of Biostatistics, Erciyes University, Kayseri, Turkey
- Drug Application and Research Center (ERFARMA), Erciyes University, Kayseri, Turkey
- Edwin Solares
- Computer Science & Engineering Department, University of California, San Diego, California, USA
- Vahap Eldem
- Department of Biology, Istanbul University, İstanbul, Turkey
3
Nevers Y, Warwick Vesztrocy A, Rossier V, Train CM, Altenhoff A, Dessimoz C, Glover NM. Quality assessment of gene repertoire annotations with OMArk. Nat Biotechnol 2024. [PMID: 38383603 DOI: 10.1038/s41587-024-02147-w] [Received: 12/16/2022] [Accepted: 01/17/2024] [Indexed: 02/23/2024]
Abstract
In the era of biodiversity genomics, it is crucial to ensure that annotations of protein-coding gene repertoires are accurate. State-of-the-art tools to assess genome annotations measure the completeness of a gene repertoire but are blind to other errors, such as gene overprediction or contamination. We introduce OMArk, a software package that relies on fast, alignment-free sequence comparisons between a query proteome and precomputed gene families across the tree of life. OMArk assesses not only the completeness but also the consistency of the gene repertoire as a whole relative to closely related species and reports likely contamination events. Analysis of 1,805 UniProt Eukaryotic Reference Proteomes with OMArk demonstrated strong evidence of contamination in 73 proteomes and identified error propagation in avian gene annotation resulting from the use of a fragmented zebra finch proteome as a reference. This study illustrates the importance of comparing and prioritizing proteomes based on their quality measures.
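The phrase "alignment-free sequence comparison" can be made concrete with a toy Python sketch. It places a query protein into whichever precomputed family shares the largest fraction of its k-mers. This is not OMArk's or OMAmer's actual placement algorithm. The values k = 3 and the 0.5 threshold, the family names and the sequences are hypothetical choices for illustration.

```python
def kmers(seq, k=3):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def build_index(families, k=3):
    """Precompute one k-mer set per family from its member sequences."""
    return {name: set().union(*(kmers(s, k) for s in members))
            for name, members in families.items()}

def place(query, index, k=3, min_fraction=0.5):
    """Assign the query to the family holding the largest fraction of its k-mers.
    Returns (family, fraction), or (None, fraction) if below the hypothetical threshold."""
    q = kmers(query, k)
    if not q:
        return None, 0.0
    best, best_frac = None, 0.0
    for name, fam_kmers in index.items():
        frac = len(q & fam_kmers) / len(q)
        if frac > best_frac:
            best, best_frac = name, frac
    if best_frac < min_fraction:
        return None, best_frac  # no confident placement
    return best, best_frac

# Hypothetical precomputed gene families
FAMILIES = {
    "famA": ["MKTAYIAKQR", "MKTAYIAKQL"],
    "famB": ["GSHMLEDPVR", "GSHMLEDPVK"],
}
index = build_index(FAMILIES)
print(place("MKTAYIAKQ", index))  # ('famA', 1.0)
print(place("WWWWWW", index))     # (None, 0.0)
```

Only set intersections are computed, so no pairwise alignment is needed. That property is what makes such approaches fast on whole proteomes.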
Affiliation(s)
- Yannis Nevers
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Alex Warwick Vesztrocy
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Victor Rossier
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
- Clément-Marie Train
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Adrian Altenhoff
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Department of Computer Science, ETH Zurich, Zurich, Switzerland
- Christophe Dessimoz
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Natasha M Glover
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
4
Wang Y, Xu S. A high-quality genome assembly of the waterlily aphid Rhopalosiphum nymphaeae. Sci Data 2024; 11:194. [PMID: 38351256 PMCID: PMC10864314 DOI: 10.1038/s41597-024-03043-3] [Received: 10/02/2023] [Accepted: 02/03/2024] [Indexed: 02/16/2024]
Abstract
The waterlily aphid, Rhopalosiphum nymphaeae (Linnaeus), is a host-alternating aphid known to feed on both terrestrial and aquatic hosts. It causes damage through direct herbivory and by acting as a vector for plant viruses, affecting Prunus spp. fruit crops and aquatic plants worldwide. Notably, the ability of R. nymphaeae to thrive in both aquatic and terrestrial conditions sets it apart from other aphids, offering a unique perspective on adaptation. We present the first high-quality R. nymphaeae genome assembly, 324.4 Mb in size, generated with PacBio long-read sequencing. The assembly is highly contiguous, with a contig N50 of 12.7 Mb, and BUSCO evaluation indicated 97.5% completeness. The R. nymphaeae genome consists of 16.9% repetitive elements and contains 16,834 predicted protein-coding genes. Phylogenetic analysis placed R. nymphaeae within the tribe Aphidini, closely related to R. maidis and R. padi. This high-quality reference genome provides a unique resource for understanding genome evolution in aphids and lays the foundation for understanding host plant adaptation mechanisms and developing pest control strategies.
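The contig N50 quoted above is a standard contiguity statistic. To compute it, sort contigs from longest to shortest and report the length at which the running total first reaches half of the assembly size. A minimal Python sketch follows. The contig lengths in the example are made up and do not come from the R. nymphaeae assembly.

```python
def n50(lengths):
    """Return the N50: the length L such that contigs of length >= L
    together contain at least half of the total assembly length."""
    if not lengths:
        raise ValueError("no contigs")
    ordered = sorted(lengths, reverse=True)  # longest contigs first
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:  # first contig at which half the assembly is covered
            return length

# Hypothetical contig lengths (total 30; 10 + 6 = 16 >= 15)
print(n50([2, 3, 4, 5, 6, 10]))  # 6
```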
Affiliation(s)
- Yangzi Wang
- Institute of Organismic and Molecular Evolution (iomE), Johannes Gutenberg University Mainz, 55128, Mainz, Germany
- Institute for Evolution and Biodiversity, University of Münster, 48161, Münster, Germany
- Shuqing Xu
- Institute of Organismic and Molecular Evolution (iomE), Johannes Gutenberg University Mainz, 55128, Mainz, Germany
5
Feldmeyer B, Bornberg-Bauer E, Dohmen E, Fouks B, Heckenhauer J, Huylmans AK, Jones ARC, Stolle E, Harrison MC. Comparative Evolutionary Genomics in Insects. Methods Mol Biol 2024; 2802:473-514. [PMID: 38819569 DOI: 10.1007/978-1-0716-3838-5_16] [Indexed: 06/01/2024]
Abstract
Genome sequencing quality, in terms of both read length and accuracy, is constantly improving. By combining long-read sequencing technologies with various scaffolding techniques, chromosome-level genome assemblies are now achievable at an affordable price for non-model organisms. Insects are an exciting taxon for studying the genomic underpinnings of evolutionary innovations, owing to their ancient origins, immense species richness, and broad phenotypic diversity. Here we summarize some of the most important methods for carrying out a comparative genomics study on insects. We describe available tools and offer concrete tips for all stages of such an endeavor, from DNA extraction through genome sequencing and annotation to several evolutionary analyses. Along the way we describe important insect-specific aspects, such as difficulties in DNA extraction or gene families that are particularly hard to annotate, and offer solutions. We present results from several example comparative genomics analyses on insects to illustrate the fascinating questions that can now be addressed in this new age of genomics research.
Affiliation(s)
- Barbara Feldmeyer
- Senckenberg Biodiversity and Climate Research Centre (SBiK-F), Molecular Ecology, Frankfurt, Germany
- Erich Bornberg-Bauer
- Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
- Department of Protein Evolution, Max Planck Institute for Developmental Biology, Tübingen, Germany
- Elias Dohmen
- Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
- Bertrand Fouks
- Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
- Jacqueline Heckenhauer
- LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), Frankfurt, Germany
- Department of Terrestrial Zoology, Senckenberg Research Institute and Natural History Museum Frankfurt, Frankfurt, Germany
- Ann Kathrin Huylmans
- Institute of Organismic and Molecular Evolution, Johannes Gutenberg University, Mainz, Germany
- Alun R C Jones
- Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
- Eckart Stolle
- Museum Koenig, Leibniz Institute for the Analysis of Biodiversity Change (LIB), Bonn, Germany
- Mark C Harrison
- Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
6
Feron R, Waterhouse RM. Assessing species coverage and assembly quality of rapidly accumulating sequenced genomes. Gigascience 2022; 11:6537158. [PMID: 35217859 PMCID: PMC8881204 DOI: 10.1093/gigascience/giac006] [Received: 10/27/2021] [Revised: 12/12/2021] [Accepted: 01/13/2022] [Indexed: 11/14/2022]
Abstract
BACKGROUND: Ambitious initiatives to coordinate genome sequencing of Earth's biodiversity mean that genomic data are accumulating rapidly. In addition to cataloguing biodiversity, these data provide the basis for understanding biological function and evolution. Accurate and complete genome assemblies offer a comprehensive and reliable foundation upon which to advance our understanding of organismal biology at genetic, species, and ecosystem levels. However, ever-changing sequencing technologies and analysis methods mean that available data are often heterogeneous in quality. To guide forthcoming genome generation efforts and promote efficient prioritization of resources, it is therefore essential to define and monitor the taxonomic coverage and quality of the data. FINDINGS: Here we present an automated analysis workflow that surveys genome assemblies from the US National Center for Biotechnology Information (NCBI), assesses their completeness using the relevant BUSCO datasets, and collates the results into an interactively browsable resource. We apply our workflow to produce a community resource of available assemblies from the phylum Arthropoda, the Arthropoda Assembly Assessment Catalogue. Using this resource, we survey current taxonomic coverage and assembly quality at the NCBI, examine how key assembly metrics relate to gene content completeness, and compare results obtained with different BUSCO lineage datasets. CONCLUSIONS: These results demonstrate how the workflow can be used to build a community resource that enables large-scale assessments of species coverage and data quality of available genome assemblies, and that can guide prioritization for ongoing and future sampling, sequencing, and genome generation initiatives.
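BUSCO completeness, as used in this workflow and in several other abstracts in this list, is usually reported as a one-line summary of complete (single-copy and duplicated), fragmented and missing marker genes. The Python sketch below only reproduces that summary format from category counts. It does not run BUSCO itself. The `BuscoCounts` container and the example counts are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class BuscoCounts:
    """Hypothetical container for BUSCO marker-gene counts per category."""
    single: int      # complete, single-copy
    duplicated: int  # complete, duplicated
    fragmented: int
    missing: int

    @property
    def total(self) -> int:
        return self.single + self.duplicated + self.fragmented + self.missing

def busco_summary(counts: BuscoCounts) -> str:
    """Format counts in the style of BUSCO's one-line C/S/D/F/M/n summary."""
    n = counts.total
    if n == 0:
        raise ValueError("no BUSCO genes counted")

    def pct(x: int) -> float:
        return 100.0 * x / n

    complete = counts.single + counts.duplicated  # C = S + D
    return (
        f"C:{pct(complete):.1f}%[S:{pct(counts.single):.1f}%,D:{pct(counts.duplicated):.1f}%],"
        f"F:{pct(counts.fragmented):.1f}%,M:{pct(counts.missing):.1f}%,n:{n}"
    )

example = BuscoCounts(single=900, duplicated=50, fragmented=30, missing=20)
print(busco_summary(example))
# C:95.0%[S:90.0%,D:5.0%],F:3.0%,M:2.0%,n:1000
```

The "completeness" figures quoted in the abstracts correspond to the C value, that is, single-copy plus duplicated complete markers as a percentage of n.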
Affiliation(s)
- Romain Feron
- Department of Ecology and Evolution, Le Biophore UNIL-Sorge, University of Lausanne, Lausanne 1015, Switzerland
- Evolutionary-Functional Genomics Group, L'Amphipole UNIL-Sorge, Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland
- Robert M Waterhouse
- Department of Ecology and Evolution, Le Biophore UNIL-Sorge, University of Lausanne, Lausanne 1015, Switzerland
- Evolutionary-Functional Genomics Group, L'Amphipole UNIL-Sorge, Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland
7
Hernández-Fernández J, Pinzón Velasco AM, López Barrera EA, Rodríguez Becerra MDP, Villanueva-Cañas JL, Alba MM, Mariño Ramírez L. De novo assembly and functional annotation of blood transcriptome of loggerhead turtle, and in silico characterization of peroxiredoxins and thioredoxins. PeerJ 2021; 9:e12395. [PMID: 34820176 PMCID: PMC8606161 DOI: 10.7717/peerj.12395] [Received: 01/14/2021] [Accepted: 10/06/2021] [Indexed: 12/21/2022]
Abstract
The aim of this study was to generate and analyze an atlas of the loggerhead turtle blood transcriptome by RNA-seq, and to identify and characterize thioredoxins (Txns) and peroxiredoxins (Prdxs), antioxidant enzymes of great interest for their role in controlling peroxide levels and for other biological functions. The loggerhead turtle transcriptome was sequenced on the Illumina HiSeq 2000 platform and assembled de novo with the Trinity pipeline. The assembly comprised 515,597 contigs with an N50 of 2,631 bp. Clustering the contigs with CD-HIT yielded 374,545 unigenes, of which 165,676 had ORFs encoding putative proteins longer than 100 amino acids. A total of 52,147 (31.5%) of these transcripts had significant homology matches in at least one of the five databases used. Based on GO term enrichment, 180 proteins with antioxidant activity were identified, among them 28 Prdxs and 50 putative Txns. The putative loggerhead turtle proteins encoded by the genes Prdx1, Prdx3, Prdx5, Prdx6, Txn and Txnip were predicted and characterized in silico. Compared with their human homologs, these proteins showed 18 (9%), 52 (18%), 94 (43%), 36 (16%), 35 (33%) and 74 (19%) amino acid differences, respectively. However, they showed high conservation (98%) in active sites and structural motifs, with few specific modifications. Of these differences, Prdx1, Prdx3, Prdx5, Prdx6, Txn and Txnip presented 0, 25, 18, 3, 6 and 2 deleterious changes, respectively. This study provides a high-quality blood transcriptome and functional annotation of the loggerhead sea turtle.
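The ORF-length filter mentioned above (putative proteins longer than 100 amino acids) can be sketched in Python as follows. This simplified version is an assumption, not the authors' pipeline. It makes four simplifications:

- It scans only the three forward reading frames.
- It requires an ATG start codon.
- It uses the standard stop codons.
- It also counts ORFs that run off the 3' end of a contig.

Dedicated ORF predictors typically also scan the reverse strand and score candidate ORFs.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code

def longest_orf_aa(seq):
    """Length (in amino acids) of the longest ATG-initiated ORF on the forward strand.
    ORFs that reach the end of the contig without a stop codon are also counted,
    because assembled transcripts are often incomplete."""
    seq = seq.upper()
    best = 0
    for frame in range(3):
        codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
        start = None  # codon index of the currently open ATG, if any
        for idx, codon in enumerate(codons):
            if start is None and codon == "ATG":
                start = idx
            elif start is not None and codon in STOP_CODONS:
                best = max(best, idx - start)  # codons from ATG up to, not including, the stop
                start = None
        if start is not None:  # ORF left open at the 3' end
            best = max(best, len(codons) - start)
    return best

def passes_filter(seq, min_aa=100):
    """True if the contig encodes a putative protein longer than min_aa amino acids."""
    return longest_orf_aa(seq) > min_aa

print(longest_orf_aa("ATGAAATTTTAA"))  # 3
```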
Affiliation(s)
- Javier Hernández-Fernández
- Department of Natural and Environmental Sciences, Faculty of Science and Engineering, Genetics, Molecular Biology and Bioinformatic Research Group-GENBIMOL, Universidad Jorge Tadeo Lozano, Bogotá, D.C., Colombia
- Faculty of Sciences, Department of Biology, Pontificia Universidad Javeriana, Bogotá, D.C., Colombia
- Ellie Anne López Barrera
- Institute of Environmental Studies and Services. IDEASA Research Group-IDEASA, Sergio Arboleda University, Bogotá, D.C., Colombia
- María Del Pilar Rodríguez Becerra
- Department of Natural and Environmental Sciences, Faculty of Science and Engineering, Genetics, Molecular Biology and Bioinformatic Research Group-GENBIMOL, Universidad Jorge Tadeo Lozano, Bogotá, D.C., Colombia
- M Mar Alba
- Evolutionary Genomics Group, Research Program on Biomedical Informatics (GRIB), Hospital del Mar Research Institute (IMIM), Universitat Pompeu Fabra, Barcelona, Spain
- Catalan Institution for Research and Advanced Studies (ICREA), Barcelona, Spain
8
Dasgupta MG, Parveen AM, Rajasugunasekar D, Ulaganathan K. Wood transcriptome analysis and expression variation of lignin biosynthetic pathway transcripts in Ailanthus excelsa Roxb., a multi-purpose tropical tree species. J Biosci 2021. [DOI: 10.1007/s12038-021-00218-7] [Indexed: 11/28/2022]
9
Hernández-Fernández J, Pinzón-Velasco A, López EA, Rodríguez-Becerra P, Mariño-Ramírez L. Transcriptional Analyses of Acute Exposure to Methylmercury on Erythrocytes of Loggerhead Sea Turtle. Toxics 2021; 9:70. [PMID: 33805397 PMCID: PMC8066450 DOI: 10.3390/toxics9040070] [Received: 01/13/2021] [Revised: 03/11/2021] [Accepted: 03/17/2021] [Indexed: 01/09/2023]
Abstract
To understand changes in enzyme activity and gene expression as biomarkers of exposure to methylmercury (MeHg), we exposed loggerhead turtle erythrocytes (RBCs) to 0, 1, and 5 mg L⁻¹ MeHg and assembled a de novo transcriptome using RNA-seq. Analysis of differentially expressed genes (DEGs) indicated that 79 unique genes were dysregulated (39 upregulated and 44 downregulated genes). The results showed that MeHg altered gene expression patterns in response to the cellular stress it produced, as reflected in cell cycle regulation, lysosomal activity, autophagy, calcium regulation, mitochondrial regulation, apoptosis, and regulation of transcription and translation. The DEG analysis also showed a weak response of the antioxidant machinery to MeHg, as evidenced by the fact that early oxidative-stress response genes were not dysregulated. The RBCs maintained constitutive expression of proteins that accounted for much of the defense against reactive oxygen species (ROS) induced by MeHg.
Affiliation(s)
- Javier Hernández-Fernández
- Department of Natural and Environmental Science, Marine Biology Program, Faculty of Science and Engineering, Genetics, Molecular Biology and Bioinformatic Research Group–GENBIMOL, Jorge Tadeo Lozano University, Cra. 4 No 22-61, Bogotá 110311, Colombia
- Faculty of Sciences, Department of Biology, Pontificia Universidad Javeriana, Calle 45, Cra. 7, Bogotá 110231, Colombia
- Andrés Pinzón-Velasco
- Bioinformática y Biología de Sistemas, Universidad Nacional de Colombia, Calle 45, Cra. 30, Bogotá 111321, Colombia
- Ellie Anne López
- IDEASA Research Group-Environment and Sustainability, Institute of Environmental Studies and Services, Sergio Arboleda University, Bogotá 111711, Colombia
- Pilar Rodríguez-Becerra
- Department of Natural and Environmental Science, Marine Biology Program, Faculty of Science and Engineering, Genetics, Molecular Biology and Bioinformatic Research Group–GENBIMOL, Jorge Tadeo Lozano University, Cra. 4 No 22-61, Bogotá 110311, Colombia
- Leonardo Mariño-Ramírez
- NCBI, NLM, NIH Computational Biology Branch, Building 38A, Room 6S614M 8600 Rockville Pike, MSC 6075, Bethesda, MD 20894-6075, USA
10
Bohn J, Halabian R, Schrader L, Shabardina V, Steffen R, Suzuki Y, Ernst UR, Gadau J, Makałowski W. Genome assembly and annotation of the California harvester ant Pogonomyrmex californicus. G3 (Bethesda) 2021; 11:jkaa019. [PMID: 33561225 PMCID: PMC8022709 DOI: 10.1093/g3journal/jkaa019] [Received: 09/02/2020] [Accepted: 11/18/2020] [Indexed: 11/12/2022]
Abstract
The harvester ant genus Pogonomyrmex is endemic to arid and semiarid habitats and deserts of North and South America. The California harvester ant Pogonomyrmex californicus is the most widely distributed Pogonomyrmex species in North America. Pogonomyrmex californicus colonies are usually monogynous, i.e., each colony has a single queen. However, primary polygyny has evolved in a few populations in California: several queens cooperate in founding a colony after their mating flights and continue to coexist in mature colonies. Here, we present a genome assembly and annotation of P. californicus. The assembly is 241 Mb in size, in agreement with the previously estimated genome size. We annotated 17,889 genes in total, including 15,688 protein-coding genes, with a BUSCO (Benchmarking Universal Single-Copy Orthologs) completeness of 95%. The P. californicus genome assembly presented here will pave the way for investigations of the genomic underpinnings of social polymorphism in queen number, the regulation of aggression, and the evolution of adaptations to dry habitats.
Affiliation(s)
- Jonas Bohn
- Faculty of Medicine, Institute of Bioinformatics, University of Münster, 48149 Münster, Germany
- Reza Halabian
- Faculty of Medicine, Institute of Bioinformatics, University of Münster, 48149 Münster, Germany
- Lukas Schrader
- Faculty of Biology, Institute for Evolution and Biodiversity, University of Münster, 48149 Münster, Germany
- Victoria Shabardina
- Faculty of Medicine, Institute of Bioinformatics, University of Münster, 48149 Münster, Germany
- Raphael Steffen
- Faculty of Biology, Institute for Evolution and Biodiversity, University of Münster, 48149 Münster, Germany
- Yutaka Suzuki
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, Chiba 277-8562, Japan
- Ulrich R Ernst
- Faculty of Biology, Institute for Evolution and Biodiversity, University of Münster, 48149 Münster, Germany
- Jürgen Gadau
- Faculty of Biology, Institute for Evolution and Biodiversity, University of Münster, 48149 Münster, Germany
- Wojciech Makałowski
- Faculty of Medicine, Institute of Bioinformatics, University of Münster, 48149 Münster, Germany
11
Scalzitti N, Jeannin-Girardon A, Collet P, Poch O, Thompson JD. A benchmark study of ab initio gene prediction methods in diverse eukaryotic organisms. BMC Genomics 2020; 21:293. [PMID: 32272892 PMCID: PMC7147072 DOI: 10.1186/s12864-020-6707-9] [Received: 11/29/2019] [Accepted: 03/30/2020] [Indexed: 02/02/2023]
Abstract
Background: The draft genome assemblies produced by new sequencing technologies present important challenges for automatic gene prediction pipelines, leading to less accurate gene models. New benchmark methods are needed to evaluate the accuracy of gene prediction methods in the face of incomplete genome assemblies, low genome coverage and quality, complex gene structures, or a lack of suitable sequences for evidence-based annotation. Results: We describe the construction of a new benchmark, called G3PO (benchmark for Gene and Protein Prediction PrOgrams), designed to represent many of the typical challenges faced by current genome annotation projects. The benchmark is based on a carefully validated and curated set of real eukaryotic genes from 147 phylogenetically diverse organisms, and a number of test sets are defined to evaluate the effects of different features, including genome sequence quality, gene structure complexity and protein length. We used the benchmark to perform an independent comparative analysis of the most widely used ab initio gene prediction programs and identified their main strengths and weaknesses. More importantly, we highlight a number of features that could be exploited to improve the accuracy of current prediction tools. Conclusions: The experiments showed that ab initio gene structure prediction is a very challenging task that warrants further investigation. We believe that the baseline results associated with the complex gene test sets in G3PO provide useful guidelines for future studies.
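Benchmarks of gene predictors typically compare predicted gene structures with curated reference structures at several levels. As an illustration only, the Python sketch below computes nucleotide-level sensitivity and specificity. It uses the gene-prediction convention in which specificity means TP / (TP + FP), that is, precision. Exons are given as hypothetical 1-based, inclusive (start, end) pairs on one strand. This is not G3PO's own scoring code.

```python
def coding_positions(exons):
    """Expand 1-based inclusive (start, end) exon intervals into a set of positions."""
    positions = set()
    for start, end in exons:
        if end < start:
            raise ValueError(f"invalid exon ({start}, {end})")
        positions.update(range(start, end + 1))
    return positions

def nucleotide_accuracy(reference_exons, predicted_exons):
    """Return (sensitivity, specificity) at the nucleotide level.

    sensitivity = TP / (TP + FN): fraction of reference coding bases that were predicted
    specificity = TP / (TP + FP): fraction of predicted coding bases that are in the reference
    """
    ref = coding_positions(reference_exons)
    pred = coding_positions(predicted_exons)
    tp = len(ref & pred)
    sn = tp / len(ref) if ref else 0.0
    sp = tp / len(pred) if pred else 0.0
    return sn, sp

# Hypothetical gene: the second predicted exon is shifted by 50 bp
reference = [(1, 100), (201, 300)]
predicted = [(1, 100), (251, 350)]
print(nucleotide_accuracy(reference, predicted))  # (0.75, 0.75)
```

Exon-level scores, which require exact boundary matches, are usually much stricter than this nucleotide-level measure. The shifted exon above would count as a complete miss at the exon level.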
Affiliation(s)
- Nicolas Scalzitti
- Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France
- Anne Jeannin-Girardon
- Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France
- Pierre Collet
- Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France
- Olivier Poch
- Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France
- Julie D Thompson
- Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France