1
Paniagua A, Agustín-García C, Pardo-Palacios FJ, Brown T, De Maria M, Denslow ND, Mazzoni CJ, Conesa A. Evaluation of strategies for evidence-driven genome annotation using long-read RNA-seq. Genome Res 2025; 35:1053-1064. [PMID: 39715684 PMCID: PMC12047274 DOI: 10.1101/gr.279864.124] [Received: 07/31/2024] [Accepted: 12/12/2024] [Indexed: 12/25/2024]
Abstract
While the production of a draft genome has become more accessible due to long-read sequencing, the annotation of these new genomes has not been developed at the same pace. Long-read RNA sequencing offers a promising solution for enhancing gene annotation. In this study, we explore how sequencing platforms, Oxford Nanopore R9.4.1 chemistry or Pacific Biosciences (PacBio) Sequel II CCS, and data processing methods influence evidence-driven genome annotation using long reads. Incorporating PacBio transcripts into our annotation pipeline significantly outperformed traditional methods, such as ab initio predictions and short-read-based annotations. We applied this strategy to a nonmodel species, the Florida manatee, and compared our results to existing short-read-based annotation. At the loci level, both annotations were highly concordant, with 90% agreement. However, at the transcript level, the agreement was only 35%. We identified 4906 novel loci, represented by 5707 isoforms, with 64% of these isoforms matching known sequences in other mammalian species. Overall, our findings underscore the importance of using high-quality curated transcript models in combination with ab initio methods for effective genome annotation.
Affiliation(s)
- Alejandro Paniagua
- Institute for Integrative Systems Biology, Spanish National Research Council, Paterna 46980, Spain
- Department of Computer Science, Universitat de València, Valencia 46100, Spain
- Cristina Agustín-García
- Institute for Integrative Systems Biology, Spanish National Research Council, Paterna 46980, Spain
- Thomas Brown
- Department of Evolutionary Genetics, Leibniz Institute for Zoo and Wildlife Research, 10315 Berlin, Germany
- Berlin Center for Genomics in Biodiversity Research, 14195 Berlin, Germany
- Maite De Maria
- Department of Physiological Sciences, Center for Environmental and Human Toxicology, University of Florida, Gainesville, Florida 32611, USA
- Nancy D Denslow
- Department of Physiological Sciences, Center for Environmental and Human Toxicology, University of Florida, Gainesville, Florida 32611, USA
- Camila J Mazzoni
- Department of Evolutionary Genetics, Leibniz Institute for Zoo and Wildlife Research, 10315 Berlin, Germany
- Berlin Center for Genomics in Biodiversity Research, 14195 Berlin, Germany
- Ana Conesa
- Institute for Integrative Systems Biology, Spanish National Research Council, Paterna 46980, Spain
2
Salamzade R, Tran PQ, Martin C, Manson AL, Gilmore MS, Earl AM, Anantharaman K, Kalan LR. zol and fai: large-scale targeted detection and evolutionary investigation of gene clusters. Nucleic Acids Res 2025; 53:gkaf045. [PMID: 39907107 PMCID: PMC11795205 DOI: 10.1093/nar/gkaf045] [Received: 09/21/2024] [Revised: 12/06/2024] [Accepted: 01/24/2025] [Indexed: 02/06/2025]
Abstract
Many universally and conditionally important genes are genomically aggregated within clusters. Here, we introduce fai and zol, which together enable large-scale comparative analysis of different types of gene clusters and mobile-genetic elements, such as biosynthetic gene clusters (BGCs) or viruses. Fundamentally, they overcome a current bottleneck to reliably perform comprehensive orthology inference at large scale across broad taxonomic contexts and thousands of genomes. First, fai allows the identification of orthologous instances of a query gene cluster of interest amongst a database of target genomes. Subsequently, zol enables reliable, context-specific inference of ortholog groups for individual protein-encoding genes across gene cluster instances. In addition, zol performs functional annotation and computes a variety of evolutionary statistics for each inferred ortholog group. Importantly, in comparison to tools for visual exploration of homologous relationships between gene clusters, zol can scale to handle thousands of gene cluster instances and produce detailed reports that are easy to digest. To showcase fai and zol, we apply them for: (i) longitudinal tracking of a virus in metagenomes, (ii) performing population genetic investigations of BGCs for a fungal species, and (iii) uncovering evolutionary trends for a virulence-associated gene cluster across thousands of genomes from a diverse bacterial genus.
Affiliation(s)
- Rauf Salamzade
- Department of Medical Microbiology and Immunology, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, WI, 53706, United States
- Microbiology Doctoral Training Program, University of Wisconsin-Madison, Madison, WI, 53706, United States
- Patricia Q Tran
- Department of Bacteriology, University of Wisconsin-Madison, Madison, WI, 53706, United States
- Freshwater and Marine Science Doctoral Program, University of Wisconsin-Madison, Madison, WI, 53706, United States
- Cody Martin
- Microbiology Doctoral Training Program, University of Wisconsin-Madison, Madison, WI, 53706, United States
- Department of Bacteriology, University of Wisconsin-Madison, Madison, WI, 53706, United States
- Abigail L Manson
- Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, United States
- Michael S Gilmore
- Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, United States
- Department of Ophthalmology, Harvard Medical School and Massachusetts Eye and Ear, Boston, MA, 02114, United States
- Department of Microbiology, Harvard Medical School and Massachusetts Eye and Ear, Boston, MA, 02115, United States
- Ashlee M Earl
- Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, United States
- Karthik Anantharaman
- Department of Bacteriology, University of Wisconsin-Madison, Madison, WI, 53706, United States
- Lindsay R Kalan
- Department of Medical Microbiology and Immunology, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, WI, 53706, United States
- Department of Medicine, Division of Infectious Disease, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, WI, 53705, United States
- M.G. DeGroote Institute for Infectious Disease Research, David Braley Centre for Antibiotic Discovery, McMaster University, Hamilton, Ontario, L8S 4L8, Canada
- Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, Ontario, L8S 4K1, Canada
3
Salamzade R, Tran PQ, Martin C, Manson AL, Gilmore MS, Earl AM, Anantharaman K, Kalan LR. zol & fai: large-scale targeted detection and evolutionary investigation of gene clusters. bioRxiv [Preprint] 2024:2023.06.07.544063. [PMID: 37333121 PMCID: PMC10274777 DOI: 10.1101/2023.06.07.544063] [Indexed: 06/20/2023]
Abstract
Many universally and conditionally important genes are genomically aggregated within clusters. Here, we introduce fai and zol, which together enable large-scale comparative analysis of different types of gene clusters and mobile-genetic elements (MGEs), such as biosynthetic gene clusters (BGCs) or viruses. Fundamentally, they overcome a current bottleneck to reliably perform comprehensive orthology inference at large scale across broad taxonomic contexts and thousands of genomes. First, fai allows the identification of orthologous instances of a query gene cluster of interest amongst a database of target genomes. Subsequently, zol enables reliable, context-specific inference of ortholog groups for individual protein-encoding genes across gene cluster instances. In addition, zol performs functional annotation and computes a variety of evolutionary statistics for each inferred ortholog group. Importantly, in comparison to tools for visual exploration of homologous relationships between gene clusters, zol can scale to thousands of gene cluster instances and produce detailed reports that are easy to digest. To showcase fai and zol, we apply them for: (i) longitudinal tracking of a virus in metagenomes, (ii) discovering novel population-level genetic insights of two common BGCs in the fungal species Aspergillus flavus, and (iii) uncovering large-scale evolutionary trends of a virulence-associated gene cluster across thousands of genomes from a diverse bacterial genus.
Affiliation(s)
- Rauf Salamzade
- Department of Medical Microbiology and Immunology, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, WI, USA
- Microbiology Doctoral Training Program, University of Wisconsin-Madison, Madison, WI, USA
- Patricia Q. Tran
- Department of Bacteriology, University of Wisconsin-Madison, Madison, WI, USA
- Freshwater and Marine Science Doctoral Program, University of Wisconsin-Madison, Madison, WI, USA
- Cody Martin
- Microbiology Doctoral Training Program, University of Wisconsin-Madison, Madison, WI, USA
- Department of Bacteriology, University of Wisconsin-Madison, Madison, WI, USA
- Abigail L. Manson
- Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Michael S. Gilmore
- Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Department of Ophthalmology, Harvard Medical School and Mass Eye and Ear, Boston, Massachusetts, USA
- Department of Microbiology, Harvard Medical School and Mass Eye and Ear, Boston, Massachusetts, USA
- Ashlee M. Earl
- Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Lindsay R. Kalan
- Department of Medical Microbiology and Immunology, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, WI, USA
- Department of Medicine, Division of Infectious Disease, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, WI, USA
- M.G. DeGroote Institute for Infectious Disease Research, David Braley Centre for Antibiotic Discovery, Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, Ontario, Canada
4
Bianco L, Fontana P, Marchesini A, Torre S, Moser M, Piazza S, Alessandri S, Pavese V, Pollegioni P, Vernesi C, Malnoy M, Torello Marinoni D, Murolo S, Dondini L, Mattioni C, Botta R, Sebastiani F, Micheletti D, Palmieri L. The de novo, chromosome-level genome assembly of the sweet chestnut (Castanea sativa Mill.) Cv. Marrone Di Chiusa Pesio. BMC Genom Data 2024; 25:64. [PMID: 38909221 PMCID: PMC11193896 DOI: 10.1186/s12863-024-01245-7] [Received: 04/12/2024] [Accepted: 06/17/2024] [Indexed: 06/24/2024]
Abstract
OBJECTIVES The sweet chestnut Castanea sativa Mill. is the only native Castanea species in Europe, and it is a tree of high economic value that provides appreciated fruits and valuable wood. In this study, we assembled a high-quality nuclear genome of the ancient Italian chestnut variety 'Marrone di Chiusa Pesio' using a combination of Oxford Nanopore Technologies long reads and Illumina whole-genome and Omni-C short reads. DATA DESCRIPTION The genome was assembled into 238 scaffolds with an N50 size of 21.8 Mb and an N80 size of 7.1 Mb, for a total assembled sequence of 750 Mb. The BUSCO assessment found 98.6% of the genes in the embryophyte dataset in the assembly, indicating good completeness of the gene space. After chromosome-level scaffolding, 12 chromosomes with a total length of 715.8 and 713.0 Mb were constructed for haplotype 1 and haplotype 2, respectively. Repetitive elements represented 37.3% and 37.4% of the total assembled genome in haplotype 1 and haplotype 2, respectively. A total of 57,653 and 58,146 genes were predicted in the two haplotypes, and approximately 73% of the genes were functionally annotated using EggNOG-mapper. The assembled genome will be a valuable resource and reference for future chestnut breeding and genetic improvement.
Affiliation(s)
- Luca Bianco
- Research and Innovation Center, Edmund Mach Foundation, via Mach 1, San Michele all'Adige, TN, 38098, Italy
- Paolo Fontana
- Research and Innovation Center, Edmund Mach Foundation, via Mach 1, San Michele all'Adige, TN, 38098, Italy
- Alexis Marchesini
- Research Institute on Terrestrial Ecosystem, National Research Council, Via Marconi 2, Porano, TR, 05010, Italy
- NBFC, National Biodiversity Future Center, Palermo, Italy
- Sara Torre
- Institute for Sustainable Plant Protection, National Research Council, Via Madonna del Piano 10, 50019, Sesto Fiorentino FI, Italy
- Mirko Moser
- Research and Innovation Center, Edmund Mach Foundation, via Mach 1, San Michele all'Adige, TN, 38098, Italy
- Stefano Piazza
- Research and Innovation Center, Edmund Mach Foundation, via Mach 1, San Michele all'Adige, TN, 38098, Italy
- Sara Alessandri
- Dept. of Agricultural and Food Sciences, University of Bologna, Via Zamboni 33, Bologna, BO, 40126, Italy
- Vera Pavese
- Dept. of Agricultural, Forest and Food Sci, University of Turin, L.go P. Braccini 2, Grugliasco, TO, 10095, Italy
- Paola Pollegioni
- Research Institute on Terrestrial Ecosystem, National Research Council, Via Marconi 2, Porano, TR, 05010, Italy
- NBFC, National Biodiversity Future Center, Palermo, Italy
- Cristiano Vernesi
- Research and Innovation Center, Edmund Mach Foundation, via Mach 1, San Michele all'Adige, TN, 38098, Italy
- Mickael Malnoy
- Research and Innovation Center, Edmund Mach Foundation, via Mach 1, San Michele all'Adige, TN, 38098, Italy
- Daniela Torello Marinoni
- Dept. of Agricultural, Forest and Food Sci, University of Turin, L.go P. Braccini 2, Grugliasco, TO, 10095, Italy
- Sergio Murolo
- Dep. of Agricultural, Food and Env. Sci, Marche Polytechnic University, via Brecce Bianche, Ancona, AN, 60131, Italy
- Luca Dondini
- Dept. of Agricultural and Food Sciences, University of Bologna, Via Zamboni 33, Bologna, BO, 40126, Italy
- Claudia Mattioni
- Research Institute on Terrestrial Ecosystem, National Research Council, Via Marconi 2, Porano, TR, 05010, Italy
- NBFC, National Biodiversity Future Center, Palermo, Italy
- Roberto Botta
- Dept. of Agricultural, Forest and Food Sci, University of Turin, L.go P. Braccini 2, Grugliasco, TO, 10095, Italy
- Federico Sebastiani
- Institute for Sustainable Plant Protection, National Research Council, Via Madonna del Piano 10, 50019, Sesto Fiorentino FI, Italy
- Diego Micheletti
- Research and Innovation Center, Edmund Mach Foundation, via Mach 1, San Michele all'Adige, TN, 38098, Italy
- Luisa Palmieri
- Research and Innovation Center, Edmund Mach Foundation, via Mach 1, San Michele all'Adige, TN, 38098, Italy
5
Ariffin N, Newman DW, Nelson MG, O’cualain R, Hubbard SJ. Proteogenomic Gene Structure Validation in the Pineapple Genome. J Proteome Res 2024; 23:1583-1592. [PMID: 38651221 PMCID: PMC11077482 DOI: 10.1021/acs.jproteome.3c00675] [Received: 10/13/2023] [Revised: 03/15/2024] [Accepted: 04/12/2024] [Indexed: 04/25/2024]
Abstract
MD2 pineapple (Ananas comosus) is the second most important tropical crop that preserves crassulacean acid metabolism (CAM), which has high water-use efficiency, and it is fast becoming the most consumed fresh fruit worldwide. Despite its environmental efficiency and popularity, until very recently its genome sequence had not been determined and a high-quality annotated proteome had not been available. Here, we have undertaken a pilot proteogenomic study, analyzing the proteome of MD2 pineapple leaves using liquid chromatography-mass spectrometry (LC-MS/MS), which validates 1781 predicted proteins in the annotated F153 (V3) genome. In addition, a further 603 peptide identifications are found that map exclusively to an independent MD2 transcriptome-derived database but are not found in the standard F153 (V3) annotated proteome. Peptide identifications derived from these MD2 transcripts are also cross-referenced to a more recent and complete MD2 genome annotation, resulting in 402 nonoverlapping peptides, which in turn support 30 high-quality gene candidates novel to both pineapple genomes. Many of the validated F153 (V3) genes are also supported by an independent proteomics data set collected for an ornamental pineapple variety. The contigs and peptides have been mapped to the current F153 genome build and are available as bed files to display a custom gene track on the Ensembl Plants region viewer. These analyses add to the knowledge of experimentally validated pineapple genes and demonstrate the utility of transcript-derived proteomics to discover both novel genes and genetic structure in a plant genome, adding value to its annotation.
Affiliation(s)
- Norazrin Ariffin
- School of Biological Sciences, Faculty of Biology Medicine and Health, MAHSC, University of Manchester, Michael Smith Building, Oxford Road, Manchester M13 9PT, United Kingdom
- Department of Agriculture Technology, Faculty of Agriculture, Universiti Putra Malaysia, Serdang 43400, Selangor Darul Ehsan, Malaysia
- David Wells Newman
- School of Biological Sciences, Faculty of Biology Medicine and Health, MAHSC, University of Manchester, Michael Smith Building, Oxford Road, Manchester M13 9PT, United Kingdom
- Michael G. Nelson
- School of Biological Sciences, Faculty of Biology Medicine and Health, MAHSC, University of Manchester, Michael Smith Building, Oxford Road, Manchester M13 9PT, United Kingdom
- Ronan O’cualain
- School of Biological Sciences, Faculty of Biology Medicine and Health, MAHSC, University of Manchester, Michael Smith Building, Oxford Road, Manchester M13 9PT, United Kingdom
- Simon J. Hubbard
- School of Biological Sciences, Faculty of Biology Medicine and Health, MAHSC, University of Manchester, Michael Smith Building, Oxford Road, Manchester M13 9PT, United Kingdom
6
Bryant AS, Akimori D, Stoltzfus JDC, Hallem EA. A standard workflow for community-driven manual curation of Strongyloides genome annotations. Philos Trans R Soc Lond B Biol Sci 2024; 379:20220443. [PMID: 38008112 PMCID: PMC10676816 DOI: 10.1098/rstb.2022.0443] [Received: 03/14/2023] [Accepted: 07/18/2023] [Indexed: 11/28/2023]
Abstract
Advances in the functional genomics and bioinformatics toolkits for Strongyloides species have positioned these species as genetically tractable model systems for gastrointestinal parasitic nematodes. As community interest in mechanistic studies of Strongyloides species continues to grow, publicly accessible reference genomes and associated genome annotations are critical resources for researchers. Genome annotations for multiple Strongyloides species are broadly available via the WormBase and WormBase ParaSite online repositories. However, a recent phylogenetic analysis of the receptor-type guanylate cyclase (rGC) gene family in two Strongyloides species highlights the potential for errors in a large percentage of current Strongyloides gene models. Here, we present three examples of gene annotation updates within the Strongyloides rGC gene family; each example illustrates a type of error that may occur frequently within the annotation data for Strongyloides genomes. We also extend our analysis to 405 previously curated Strongyloides genes to confirm that gene model errors are found at high rates across gene families. Finally, we introduce a standard manual curation workflow for assessing gene annotation quality and generating corrections, and we discuss how it may be used to facilitate community-driven curation of parasitic nematode biodata. This article is part of the Theo Murphy meeting issue 'Strongyloides: omics to worm-free populations'.
Affiliation(s)
- Astra S. Bryant
- Department of Physiology and Biophysics, University of Washington, Seattle, WA 98195, USA
- Department of Microbiology, Immunology, and Molecular Genetics, University of California, Los Angeles, CA 90095, USA
- Damia Akimori
- Department of Microbiology, Immunology, and Molecular Genetics, University of California, Los Angeles, CA 90095, USA
- Molecular Biology Interdepartmental PhD Program, University of California, Los Angeles, CA 90095, USA
- Elissa A. Hallem
- Department of Microbiology, Immunology, and Molecular Genetics, University of California, Los Angeles, CA 90095, USA
- Molecular Biology Institute, University of California, Los Angeles, CA 90095, USA
7
Brůna T, Li H, Guhlin J, Honsel D, Herbold S, Stanke M, Nenasheva N, Ebel M, Gabriel L, Hoff KJ. Galba: genome annotation with miniprot and AUGUSTUS. BMC Bioinformatics 2023; 24:327. [PMID: 37653395 PMCID: PMC10472564 DOI: 10.1186/s12859-023-05449-z] [Received: 04/18/2023] [Accepted: 08/21/2023] [Indexed: 09/02/2023]
Abstract
BACKGROUND The Earth Biogenome Project has rapidly increased the number of available eukaryotic genomes, but most released genomes continue to lack annotation of protein-coding genes. In addition, no transcriptome data is available for some genomes. RESULTS Various gene annotation tools have been developed but each has its limitations. Here, we introduce GALBA, a fully automated pipeline that utilizes miniprot, a rapid protein-to-genome aligner, in combination with AUGUSTUS to predict genes with high accuracy. Accuracy results indicate that GALBA is particularly strong in the annotation of large vertebrate genomes. We also present use cases in insects, vertebrates, and a land plant. GALBA is fully open source and available as a docker image for easy execution with Singularity in high-performance computing environments. CONCLUSIONS Our pipeline addresses the critical need for accurate gene annotation in newly sequenced genomes, and we believe that GALBA will greatly facilitate genome annotation for diverse organisms.
Affiliation(s)
- Tomáš Brůna
- U.S. Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Heng Li
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, 02215, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, 02215, MA, USA
- Joseph Guhlin
- Genomics Aotearoa and Laboratory for Evolution and Development, Department of Biochemistry, University of Otago, Dunedin, 9016, New Zealand
- Daniel Honsel
- Institute of Computer Science, University of Göttingen, 37077, Göttingen, Germany
- Steffen Herbold
- Faculty for Computer Science and Mathematics, University of Passau, 94032, Passau, Germany
- Mario Stanke
- Institute of Mathematics and Computer Science, and Center for Functional Genomics of Microbes, University of Greifswald, 17489, Greifswald, Germany
- Natalia Nenasheva
- Institute of Mathematics and Computer Science, and Center for Functional Genomics of Microbes, University of Greifswald, 17489, Greifswald, Germany
- Matthis Ebel
- Institute of Mathematics and Computer Science, and Center for Functional Genomics of Microbes, University of Greifswald, 17489, Greifswald, Germany
- Lars Gabriel
- Institute of Mathematics and Computer Science, and Center for Functional Genomics of Microbes, University of Greifswald, 17489, Greifswald, Germany
- Katharina J Hoff
- Institute of Mathematics and Computer Science, and Center for Functional Genomics of Microbes, University of Greifswald, 17489, Greifswald, Germany
8
Li JX, Fernandez KX, Ritland C, Jancsik S, Engelhardt DB, Coombe L, Warren RL, van Belkum MJ, Carroll AL, Vederas JC, Bohlmann J, Birol I. Genomic virulence features of Beauveria bassiana as a biocontrol agent for the mountain pine beetle population. BMC Genomics 2023; 24:390. [PMID: 37430186 DOI: 10.1186/s12864-023-09473-4] [Received: 03/21/2023] [Accepted: 06/21/2023] [Indexed: 07/12/2023]
Abstract
BACKGROUND The mountain pine beetle, Dendroctonus ponderosae, is an irruptive bark beetle that causes extensive mortality to many pine species within the forests of western North America. Driven by climate change and wildfire suppression, a recent mountain pine beetle (MPB) outbreak has spread across more than 18 million hectares, including areas to the east of the Rocky Mountains that comprise populations and species of pines not previously affected. Despite its impacts, there are few tactics available to control MPB populations. Beauveria bassiana is an entomopathogenic fungus used as a biological agent in agriculture and forestry and has potential as a management tactic for the mountain pine beetle population. This work investigates the phenotypic and genomic variation between B. bassiana strains to identify optimal strains against a specific insect. RESULTS Using comparative genome and transcriptome analyses of eight B. bassiana isolates, we have identified the genetic basis of virulence, which includes oosporein production. Genes unique to the more virulent strains included functions in biosynthesis of mycotoxins, membrane transporters, and transcription factors. Significant differential expression of genes related to virulence, transmembrane transport, and stress response was identified between the different strains, as well as up to nine-fold upregulation of genes involved in the biosynthesis of oosporein. Differential correlation analysis revealed transcription factors that may be involved in regulating oosporein production. CONCLUSION This study provides a foundation for the selection and/or engineering of the most effective strain of B. bassiana for the biological control of mountain pine beetle and other insect pest populations.
Affiliation(s)
- Janet X Li
- Michael Smith Laboratories, University of British Columbia, 2185 East Mall, Vancouver, BC, V6T 1Z4, Canada
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, 570 W 7th Ave #100, Vancouver, BC, V5Z 4S6, Canada
- Kleinberg X Fernandez
- Department of Chemistry, University of Alberta, 11227 Saskatchewan Drive NW, Edmonton, AB, T6G 2G2, Canada
- Carol Ritland
- Michael Smith Laboratories, University of British Columbia, 2185 East Mall, Vancouver, BC, V6T 1Z4, Canada
- Department of Forest and Conservation Sciences, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
- Sharon Jancsik
- Michael Smith Laboratories, University of British Columbia, 2185 East Mall, Vancouver, BC, V6T 1Z4, Canada
- Daniel B Engelhardt
- Department of Chemistry, University of Alberta, 11227 Saskatchewan Drive NW, Edmonton, AB, T6G 2G2, Canada
- Lauren Coombe
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, 570 W 7th Ave #100, Vancouver, BC, V5Z 4S6, Canada
- René L Warren
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, 570 W 7th Ave #100, Vancouver, BC, V5Z 4S6, Canada
- Marco J van Belkum
- Department of Chemistry, University of Alberta, 11227 Saskatchewan Drive NW, Edmonton, AB, T6G 2G2, Canada
- Allan L Carroll
- Department of Forest and Conservation Sciences, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
- John C Vederas
- Department of Chemistry, University of Alberta, 11227 Saskatchewan Drive NW, Edmonton, AB, T6G 2G2, Canada
- Joerg Bohlmann
- Michael Smith Laboratories, University of British Columbia, 2185 East Mall, Vancouver, BC, V6T 1Z4, Canada
- Department of Forest and Conservation Sciences, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
- Department of Botany, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
- Inanc Birol
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, 570 W 7th Ave #100, Vancouver, BC, V5Z 4S6, Canada
9
Goodswen SJ, Kennedy PJ, Ellis JT. A state-of-the-art methodology for high-throughput in silico vaccine discovery against protozoan parasites and exemplified with discovered candidates for Toxoplasma gondii. Sci Rep 2023; 13:8243. [PMID: 37217589 DOI: 10.1038/s41598-023-34863-9] [Received: 02/08/2023] [Accepted: 05/09/2023] [Indexed: 05/24/2023]
Abstract
Vaccine discovery against eukaryotic parasites is not trivial, as highlighted by the limited number of known vaccines compared to the number of protozoal diseases that need one. Only three of 17 priority diseases have commercial vaccines. Live and attenuated vaccines have proved to be more effective than subunit vaccines but pose more unacceptable risks. One promising approach for subunit vaccines is in silico vaccine discovery, which predicts protein vaccine candidates from thousands of target organism protein sequences. This approach, nonetheless, is an overarching concept with no standardised guidebook on implementation. No known subunit vaccines against protozoan parasites exist as a result of this approach, and consequently there are none to emulate. The study goal was to combine current in silico discovery knowledge specific to protozoan parasites and develop a workflow representing a state-of-the-art approach. This approach reflectively integrates a parasite's biology, a host's immune system defences, and, importantly, the bioinformatics programs needed to predict vaccine candidates. To demonstrate the workflow's effectiveness, every Toxoplasma gondii protein was ranked by its capacity to provide long-term protective immunity. Although testing in animal models is required to validate these predictions, most of the top-ranked candidates are supported by publications, reinforcing our confidence in the approach.
Affiliation(s)
- Stephen J Goodswen
- School of Life Sciences, University of Technology Sydney, 15 Broadway, Ultimo, NSW, 2007, Australia
- Paul J Kennedy
- School of Computer Science, Faculty of Engineering and Information Technology and the Australian Artificial Intelligence Institute, University of Technology Sydney, 15 Broadway, Ultimo, NSW, 2007, Australia
- John T Ellis
- School of Life Sciences, University of Technology Sydney, 15 Broadway, Ultimo, NSW, 2007, Australia

10

Brůna T, Li H, Guhlin J, Honsel D, Herbold S, Stanke M, Nenasheva N, Ebel M, Gabriel L, Hoff KJ. GALBA: Genome Annotation with Miniprot and AUGUSTUS. bioRxiv: The Preprint Server for Biology 2023:2023.04.10.536199. [PMID: 37090650 PMCID: PMC10120627 DOI: 10.1101/2023.04.10.536199] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/25/2023]
Abstract
The Earth Biogenome Project has rapidly increased the number of available eukaryotic genomes, but most released genomes continue to lack annotation of protein-coding genes. In addition, no transcriptome data are available for some genomes. Various gene annotation tools have been developed, but each has its limitations. Here, we introduce GALBA, a fully automated pipeline that utilizes miniprot, a rapid protein-to-genome aligner, in combination with AUGUSTUS to predict genes with high accuracy. Accuracy results indicate that GALBA is particularly strong in the annotation of large vertebrate genomes. We also present use cases in insects, vertebrates, and a previously unannotated land plant. GALBA is fully open source and available as a Docker image for easy execution with Singularity in high-performance computing environments. Our pipeline addresses the critical need for accurate gene annotation in newly sequenced genomes, and we believe that GALBA will greatly facilitate genome annotation for diverse organisms.
Affiliation(s)
- Tomáš Brůna
- US Department of Energy Joint Genome Institute, Berkeley, CA 94720, USA
- Heng Li
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA 02215, USA & Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
- Joseph Guhlin
- Genomics Aotearoa and Laboratory for Evolution and Development, Department of Biochemistry, University of Otago, PO Box 56, Dunedin 9016, New Zealand
- Daniel Honsel
- Institute of Computer Science, University of Göttingen, 37077 Göttingen, Germany
- Steffen Herbold
- Faculty for Computer Science and Mathematics, University of Passau, 94032 Passau, Germany
- Mario Stanke
- Institute of Mathematics and Computer Science & Center for Functional Genomics of Microbes, University of Greifswald, 17489 Greifswald, Germany
- Natalia Nenasheva
- Institute of Mathematics and Computer Science & Center for Functional Genomics of Microbes, University of Greifswald, 17489 Greifswald, Germany
- Matthis Ebel
- Institute of Mathematics and Computer Science & Center for Functional Genomics of Microbes, University of Greifswald, 17489 Greifswald, Germany
- Lars Gabriel
- Institute of Mathematics and Computer Science & Center for Functional Genomics of Microbes, University of Greifswald, 17489 Greifswald, Germany
- Katharina J. Hoff
- Institute of Mathematics and Computer Science & Center for Functional Genomics of Microbes, University of Greifswald, 17489 Greifswald, Germany

11

Mayer C, Vogt A, Uslu T, Scalzitti N, Chennen K, Poch O, Thompson JD. CeGAL: Redefining a Widespread Fungal-Specific Transcription Factor Family Using an In Silico Error-Tracking Approach. J Fungi (Basel) 2023; 9:jof9040424. [PMID: 37108879 PMCID: PMC10141177 DOI: 10.3390/jof9040424] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2023] [Revised: 03/21/2023] [Accepted: 03/28/2023] [Indexed: 03/31/2023]
Abstract
In fungi, the most abundant transcription factor (TF) class contains a fungal-specific ‘GAL4-like’ Zn2C6 DNA binding domain (DBD), while the second class contains another fungal-specific domain, known as ‘fungal_trans’ or middle homology domain (MHD), whose function remains largely uncharacterized. Remarkably, almost a third of MHD-containing TFs in public sequence databases apparently lack DNA binding activity, since they are not predicted to contain a DBD. Here, we reassess the domain organization of these ‘MHD-only’ proteins using an in silico error-tracking approach. In a large-scale analysis of ~17,000 MHD-only TF sequences present in all fungal phyla except Microsporidia and Cryptomycota, we show that the vast majority (>90%) result from genome annotation errors and we are able to predict a new DBD sequence for 14,261 of them. Most of these sequences correspond to a Zn2C6 domain (82%), with a small proportion of C2H2 domains (4%) found only in Dikarya. Our results contradict previous findings that the MHD-only TF are widespread in fungi. In contrast, we show that they are exceptional cases, and that the fungal-specific Zn2C6–MHD domain pair represents the canonical domain signature defining the most predominant fungal TF family. We call this family CeGAL, after the highly characterized members: Cep3, whose 3D structure is determined, and GAL4, a eukaryotic TF archetype. We believe that this will not only improve the annotation and classification of the Zn2C6 TF but will also provide critical guidance for future fungal gene regulatory network analyses.
Affiliation(s)
- Claudine Mayer
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000 Strasbourg, France
- Faculté des Sciences, Université Paris Cité, UFR Sciences du Vivant, 75013 Paris, France
- Correspondence: (C.M.); (J.D.T.)
- Arthur Vogt
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000 Strasbourg, France
- Tuba Uslu
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000 Strasbourg, France
- Nicolas Scalzitti
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000 Strasbourg, France
- Kirsley Chennen
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000 Strasbourg, France
- Olivier Poch
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000 Strasbourg, France
- Julie D. Thompson
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000 Strasbourg, France
- Correspondence: (C.M.); (J.D.T.)

12

Khodji H, Collet P, Thompson JD, Jeannin-Girardon A. De-MISTED: Image-based classification of erroneous multiple sequence alignments using convolutional neural networks. APPL INTELL 2023. [DOI: 10.1007/s10489-022-04390-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/11/2023]

13

Fisher CR, Wilson M, Scott JG. A chromosome-level assembly of the widely used Rockefeller strain of Aedes aegypti, the yellow fever mosquito. G3 GENES|GENOMES|GENETICS 2022; 12:6695221. [PMID: 36086997 PMCID: PMC9635639 DOI: 10.1093/g3journal/jkac242] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2022] [Accepted: 08/23/2022] [Indexed: 12/03/2022]
Abstract
Aedes aegypti is the vector of important human diseases, and genomic resources are crucial in facilitating the study of A. aegypti and its ecosystem interactions. Several laboratory-acclimated strains of this mosquito have been established, but the most used strain in toxicology studies is “Rockefeller,” which was originally collected and established in Cuba 130 years ago. A full-length genome assembly of another reference strain, “Liverpool,” was published in 2018 and is the reference genome for the species (AaegL5). However, genetic studies with the Rockefeller strain are complicated by the availability of only the Liverpool strain as the reference genome. Differences between Liverpool and Rockefeller have been known for decades, particularly in the expression of genes relevant to mosquito behavior and vector control (e.g. olfactory). These differences indicate that AaegL5 is likely not fully representative of the Rockefeller genome, presenting potential impediments to research. Here, we present a chromosomal-level assembly and annotation of the Rockefeller genome and a comparative characterization vs the Liverpool genome. Our results set the stage for a pan-genomic approach to understanding evolution and diversity within this important disease vector.
Affiliation(s)
- Cera R Fisher
- Department of Entomology, Comstock Hall, Cornell University, Ithaca, NY 14853, USA
- Michael Wilson
- Center for Cell Analysis & Modeling, University of Connecticut Health Center, Farmington, CT 06030, USA
- Jeffrey G Scott
- Department of Entomology, Comstock Hall, Cornell University, Ithaca, NY 14853, USA

14

Chromosome-scale Echinococcus granulosus (genotype G1) genome reveals the Eg95 gene family and conservation of the EG95-vaccine molecule. Commun Biol 2022; 5:199. [PMID: 35241789 PMCID: PMC8894454 DOI: 10.1038/s42003-022-03125-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/10/2021] [Accepted: 02/04/2022] [Indexed: 11/23/2022]
Abstract
Cystic echinococcosis is a socioeconomically important parasitic disease caused by the larval stage of the canid tapeworm Echinococcus granulosus, afflicting millions of humans and animals worldwide. The development of a vaccine (called EG95) has been the most notable translational advance in the fight against this disease in animals. However, almost nothing is known about the genomic organisation/location of the family of genes encoding EG95 and related molecules, the extent of their conservation or their functions. The lack of a complete reference genome for E. granulosus genotype G1 has been a major obstacle to addressing these areas. Here, we assembled a chromosomal-scale genome for this genotype by scaffolding to a high quality genome for the congener E. multilocularis, localised Eg95 gene family members in this genome, and evaluated the conservation of the EG95 vaccine molecule. These results have marked implications for future explorations of aspects such as developmentally-regulated gene transcription/expression (using replicate samples) for all E. granulosus stages; structural and functional roles of non-coding genome regions; molecular ‘cross-talk’ between oncosphere and the immune system; and defining the precise function(s) of EG95. Applied aspects should include developing improved tools for the diagnosis and chemotherapy of cystic echinococcosis of humans. A high-quality genome for the parasitic tapeworm, Echinococcus granulosus, provides further insight into the EG95 vaccine target for cystic echinococcosis.

15

Young ND, Stroehlein AJ, Wang T, Korhonen PK, Mentink-Kane M, Stothard JR, Rollinson D, Gasser RB. Nuclear genome of Bulinus truncatus, an intermediate host of the carcinogenic human blood fluke Schistosoma haematobium. Nat Commun 2022; 13:977. [PMID: 35190553 PMCID: PMC8861042 DOI: 10.1038/s41467-022-28634-9] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 06/21/2021] [Accepted: 02/02/2022] [Indexed: 02/07/2023]
Abstract
Some snails act as intermediate hosts (vectors) for parasitic flatworms (flukes) that cause neglected tropical diseases, such as schistosomiases. Schistosoma haematobium is a blood fluke that causes urogenital schistosomiasis and induces bladder cancer and increased risk of HIV infection. Understanding the molecular biology of the snail and its relationship with the parasite could guide development of an intervention approach that interrupts transmission. Here, we define the genome for a key intermediate host of S. haematobium, called Bulinus truncatus, and explore protein groups inferred to play an integral role in the snail's biology and its relationship with the schistosome parasite. Bu. truncatus shared many orthologous protein groups with Biomphalaria glabrata, the key snail vector for S. mansoni, which causes hepatointestinal schistosomiasis in people. Conspicuous were expansions in signalling and membrane trafficking proteins, peptidases and their inhibitors, as well as gene families linked to immune response regulation, such as a large repertoire of lectin-like molecules. This work provides a sound basis for further studies of snail-parasite interactions in the search for targets to block schistosomiasis transmission.
Affiliation(s)
- Neil D Young
- Faculty of Veterinary and Agricultural Sciences, The University of Melbourne, Parkville, VIC, Australia
- Andreas J Stroehlein
- Faculty of Veterinary and Agricultural Sciences, The University of Melbourne, Parkville, VIC, Australia
- Tao Wang
- Faculty of Veterinary and Agricultural Sciences, The University of Melbourne, Parkville, VIC, Australia
- Pasi K Korhonen
- Faculty of Veterinary and Agricultural Sciences, The University of Melbourne, Parkville, VIC, Australia
- Margaret Mentink-Kane
- NIH-NIAID Schistosomiasis Resource Center, Biomedical Research Institute (BRI), Rockville, MD, USA
- J Russell Stothard
- Department of Parasitology, Liverpool School of Tropical Medicine, Liverpool, UK
- David Rollinson
- Department of Life Sciences, Natural History Museum, London, UK
- London Centre for Neglected Tropical Disease Research, London, UK
- Robin B Gasser
- Faculty of Veterinary and Agricultural Sciences, The University of Melbourne, Parkville, VIC, Australia

16

Meyer C, Scalzitti N, Jeannin-Girardon A, Collet P, Poch O, Thompson JD. Understanding the causes of errors in eukaryotic protein-coding gene prediction: a case study of primate proteomes. BMC Bioinformatics 2020; 21:513. [PMID: 33172385 PMCID: PMC7656754 DOI: 10.1186/s12859-020-03855-1] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 07/28/2020] [Accepted: 10/30/2020] [Indexed: 11/10/2022]
Abstract
Background: Recent advances in sequencing technologies have led to an explosion in the number of genomes available, but accurate genome annotation remains a major challenge. The prediction of protein-coding genes in eukaryotic genomes is especially problematic, due to their complex exon–intron structures. Even the best eukaryotic gene prediction algorithms can make serious errors that will significantly affect subsequent analyses.
Results: We first investigated the prevalence of gene prediction errors in a large set of 176,478 proteins from ten primate proteomes available in public databases. Using the well-studied human proteins as a reference, a total of 82,305 potential errors were detected, including 44,001 deletions, 27,289 insertions and 11,015 mismatched segments where part of the correct protein sequence is replaced with an alternative erroneous sequence. We then focused on the mismatched sequence errors that cause particular problems for downstream applications. A detailed characterization allowed us to identify the potential causes for the gene misprediction in approximately half (5446) of these cases. As a proof-of-concept, we also developed a simple method which allowed us to propose improved sequences for 603 primate proteins.
Conclusions: Gene prediction errors in primate proteomes affect up to 50% of the sequences. Major causes of errors include undetermined genome regions, genome sequencing or assembly issues, and limitations in the models used to represent gene exon–intron structures. Nevertheless, existing genome sequences can still be exploited to improve protein sequence quality. Perspectives of the work include the characterization of other types of gene prediction errors, as well as the development of a more comprehensive algorithm for protein sequence error correction.
Affiliation(s)
- Corentin Meyer
- Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France
- Nicolas Scalzitti
- Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France
- Anne Jeannin-Girardon
- Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France
- Pierre Collet
- Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France
- Olivier Poch
- Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France
- Julie D Thompson
- Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France

17

Real FM, Haas SA, Franchini P, Xiong P, Simakov O, Kuhl H, Schöpflin R, Heller D, Moeinzadeh MH, Heinrich V, Krannich T, Bressin A, Hartmann MF, Wudy SA, Dechmann DKN, Hurtado A, Barrionuevo FJ, Schindler M, Harabula I, Osterwalder M, Hiller M, Wittler L, Visel A, Timmermann B, Meyer A, Vingron M, Jiménez R, Mundlos S, Lupiáñez DG. The mole genome reveals regulatory rearrangements associated with adaptive intersexuality. Science 2020; 370:208-214. [PMID: 33033216 PMCID: PMC8243244 DOI: 10.1126/science.aaz2582] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 08/27/2019] [Revised: 04/19/2020] [Accepted: 08/17/2020] [Indexed: 01/01/2023]
Abstract
Linking genomic variation to phenotypical traits remains a major challenge in evolutionary genetics. In this study, we use phylogenomic strategies to investigate a distinctive trait among mammals: the development of masculinizing ovotestes in female moles. By combining a chromosome-scale genome assembly of the Iberian mole, Talpa occidentalis, with transcriptomic, epigenetic, and chromatin interaction datasets, we identify rearrangements altering the regulatory landscape of genes with distinct gonadal expression patterns. These include a tandem triplication involving CYP17A1, a gene controlling androgen synthesis, and an intrachromosomal inversion involving the pro-testicular growth factor gene FGF9, which is heterochronically expressed in mole ovotestes. Transgenic mice with a knock-in mole CYP17A1 enhancer or overexpressing FGF9 showed phenotypes recapitulating mole sexual features. Our results highlight how integrative genomic approaches can reveal the phenotypic impact of noncoding sequence changes.
Affiliation(s)
- Francisca M Real
- RG Development & Disease, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Institute for Medical and Human Genetics, Charité - Universitätsmedizin Berlin, Berlin, Germany
- Stefan A Haas
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Paolo Franchini
- Chair in Zoology and Evolutionary Biology, Department of Biology, University of Konstanz, 78457 Konstanz, Germany
- Peiwen Xiong
- Chair in Zoology and Evolutionary Biology, Department of Biology, University of Konstanz, 78457 Konstanz, Germany
- Oleg Simakov
- Department of Molecular Evolution and Development, University of Vienna, 1090 Vienna, Austria
- Heiner Kuhl
- Department of Ecophysiology and Aquaculture, Leibniz-Institute of Freshwater Ecology and Inland Fisheries, Berlin, Germany
- Robert Schöpflin
- RG Development & Disease, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Institute for Medical and Human Genetics, Charité - Universitätsmedizin Berlin, Berlin, Germany
- David Heller
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany
- M-Hossein Moeinzadeh
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Verena Heinrich
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Thomas Krannich
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Annkatrin Bressin
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Michaela F Hartmann
- Steroid Research & Mass Spectrometry Unit, Laboratory for Translational Hormone Analytics in Paediatric Endocrinology, Division of Paediatric Endocrinology & Diabetology, Center of Child and Adolescent Medicine, Justus Liebig University, Giessen, Germany
- Stefan A Wudy
- Steroid Research & Mass Spectrometry Unit, Laboratory for Translational Hormone Analytics in Paediatric Endocrinology, Division of Paediatric Endocrinology & Diabetology, Center of Child and Adolescent Medicine, Justus Liebig University, Giessen, Germany
- Dina K N Dechmann
- Department of Migration and Immuno-Ecology, Max Planck Institute for Animal Behavior, Radolfzell, Germany
- Department of Biology, University of Konstanz, Konstanz, Germany
- Alicia Hurtado
- Departamento de Genética, Universidad de Granada, Granada, Spain
- Instituto de Biotecnología, Centro de Investigación Biomédica, Universidad de Granada, Armilla, Granada, Spain
- Francisco J Barrionuevo
- Departamento de Genética, Universidad de Granada, Granada, Spain
- Instituto de Biotecnología, Centro de Investigación Biomédica, Universidad de Granada, Armilla, Granada, Spain
- Magdalena Schindler
- RG Development & Disease, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Institute for Medical and Human Genetics, Charité - Universitätsmedizin Berlin, Berlin, Germany
- Izabela Harabula
- RG Development & Disease, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Marco Osterwalder
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
- Department for BioMedical Research (DBMR), University of Bern, 3008 Bern, Switzerland
- Michael Hiller
- Max Planck Institute of Molecular Cell Biology and Genetics, 01307 Dresden, Germany
- Max Planck Institute for the Physics of Complex Systems, 01187 Dresden, Germany
- Center for Systems Biology Dresden, 01307 Dresden, Germany
- Lars Wittler
- Department of Developmental Genetics, Transgenic Unit, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Axel Visel
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
- U.S. Department of Energy Joint Genome Institute, Berkeley, CA 94720, USA
- School of Natural Sciences, University of California, Merced, CA 95343, USA
- Bernd Timmermann
- RG Development & Disease, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Axel Meyer
- Chair in Zoology and Evolutionary Biology, Department of Biology, University of Konstanz, 78457 Konstanz, Germany
- Martin Vingron
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Rafael Jiménez
- Departamento de Genética, Universidad de Granada, Granada, Spain
- Instituto de Biotecnología, Centro de Investigación Biomédica, Universidad de Granada, Armilla, Granada, Spain
- Stefan Mundlos
- RG Development & Disease, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Institute for Medical and Human Genetics, Charité - Universitätsmedizin Berlin, Berlin, Germany
- Berlin-Brandenburg Center for Regenerative Therapies (BCRT), Charité - Universitätsmedizin Berlin, Berlin, Germany
- Darío G Lupiáñez
- RG Development & Disease, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Institute for Medical and Human Genetics, Charité - Universitätsmedizin Berlin, Berlin, Germany
- Berlin-Brandenburg Center for Regenerative Therapies (BCRT), Charité - Universitätsmedizin Berlin, Berlin, Germany
- Epigenetics and Sex Development Group, Berlin Institute for Medical Systems Biology, Max-Delbrück Center for Molecular Medicine, Berlin, Germany

18

High-Quality Assemblies for Three Invasive Social Wasps from the Vespula Genus. G3-GENES GENOMES GENETICS 2020; 10:3479-3488. [PMID: 32859687 PMCID: PMC7534447 DOI: 10.1534/g3.120.401579] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 11/18/2022]
Abstract
Social wasps of the genus Vespula have spread to nearly all landmasses worldwide and have become significant pests in their introduced ranges, affecting economies and biodiversity. Comprehensive genome assemblies and annotations for these species are required to develop the next generation of control strategies and monitor existing chemical control. We sequenced and annotated the genomes of the common wasp (Vespula vulgaris), German wasp (Vespula germanica), and the western yellowjacket (Vespula pensylvanica). Our chromosome-level Vespula assemblies each contain 176–179 Mb of total sequence assembled into 25 scaffolds, with 10–200 unanchored scaffolds, and 16,566–18,948 genes. We annotated gene sets relevant to the applied management of invasive wasp populations, including genes associated with spermatogenesis and development, pesticide resistance, olfactory receptors, immunity and venom. These genomes provide evidence for active DNA methylation in Vespidae and tandem duplications of venom genes. Our genomic resources will contribute to the development of next-generation control strategies and to monitoring potential resistance to chemical control.

19

Scalzitti N, Jeannin-Girardon A, Collet P, Poch O, Thompson JD. A benchmark study of ab initio gene prediction methods in diverse eukaryotic organisms. BMC Genomics 2020; 21:293. [PMID: 32272892 PMCID: PMC7147072 DOI: 10.1186/s12864-020-6707-9] [Citation(s) in RCA: 53] [Impact Index Per Article: 10.6] [Received: 11/29/2019] [Accepted: 03/30/2020] [Indexed: 02/02/2023]
Abstract
Background: The draft genome assemblies produced by new sequencing technologies present important challenges for automatic gene prediction pipelines, leading to less accurate gene models. New benchmark methods are needed to evaluate the accuracy of gene prediction methods in the face of incomplete genome assemblies, low genome coverage and quality, complex gene structures, or a lack of suitable sequences for evidence-based annotations.
Results: We describe the construction of a new benchmark, called G3PO (benchmark for Gene and Protein Prediction PrOgrams), designed to represent many of the typical challenges faced by current genome annotation projects. The benchmark is based on a carefully validated and curated set of real eukaryotic genes from 147 phylogenetically dispersed organisms, and a number of test sets are defined to evaluate the effects of different features, including genome sequence quality, gene structure complexity, protein length, etc. We used the benchmark to perform an independent comparative analysis of the most widely used ab initio gene prediction programs and identified the main strengths and weaknesses of the programs. More importantly, we highlight a number of features that could be exploited in order to improve the accuracy of current prediction tools.
Conclusions: The experiments showed that ab initio gene structure prediction is a very challenging task, which should be further investigated. We believe that the baseline results associated with the complex gene test sets in G3PO provide useful guidelines for future studies.
Affiliation(s)
- Nicolas Scalzitti
- Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France
- Anne Jeannin-Girardon
- Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France
- Pierre Collet
- Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France
- Olivier Poch
- Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France
- Julie D Thompson
- Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France

20

Prevalence and Implications of Contamination in Public Genomic Resources: A Case Study of 43 Reference Arthropod Assemblies. G3-GENES GENOMES GENETICS 2020; 10:721-730. [PMID: 31862787 PMCID: PMC7003083 DOI: 10.1534/g3.119.400758] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Indexed: 02/02/2023]
Abstract
Thanks to huge advances in sequencing technologies, genomic resources are increasingly being generated and shared by the scientific community. The quality of such public resources is therefore of critical importance. Errors due to contamination are particularly worrying; they are widespread, propagate across databases, and can compromise downstream analyses, especially the detection of horizontally-transferred sequences. However, we still lack consistent and comprehensive assessments of contamination prevalence in public genomic data. Here we applied a standardized procedure for foreign sequence annotation to 43 published arthropod genomes from the widely used Ensembl Metazoa database. This method combines information on sequence similarity and synteny to identify contaminant and putative horizontally-transferred sequences in any genome assembly, provided that an adequate reference database is available. We uncovered considerable heterogeneity in quality among arthropod assemblies, some being devoid of contaminant sequences, whereas others included hundreds of contaminant genes. Contaminants far outnumbered horizontally-transferred genes and were a major confounder of their detection, quantification and analysis. We strongly recommend that automated standardized decontamination procedures be systematically embedded into the submission process to genomic databases.

21

Wilbrandt J, Misof B, Panfilio KA, Niehuis O. Repertoire-wide gene structure analyses: a case study comparing automatically predicted and manually annotated gene models. BMC Genomics 2019; 20:753. [PMID: 31623555 PMCID: PMC6798390 DOI: 10.1186/s12864-019-6064-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 04/18/2018] [Accepted: 08/27/2019] [Indexed: 02/06/2023]
Abstract
Background: The location and modular structure of eukaryotic protein-coding genes in genomic sequences can be automatically predicted by gene annotation algorithms. These predictions are often used for comparative studies on gene structure, gene repertoires, and genome evolution. However, automatic annotation algorithms do not yet correctly identify all genes within a genome, and manual annotation is often necessary to obtain accurate gene models and gene sets. As manual annotation is time-consuming, only a fraction of the gene models in a genome is typically manually annotated, and this fraction often differs between species. To assess the impact of manual annotation efforts on genome-wide analyses of gene structural properties, we compared the structural properties of protein-coding genes in seven diverse insect species sequenced by the i5k initiative.
Results: Our results show that the subset of genes chosen for manual annotation by a research community (3.5–7% of gene models) may have structural properties (e.g., lengths and exon counts) that are not necessarily representative of a species’ gene set as a whole. Nonetheless, the structural properties of automatically generated gene models are only altered marginally (if at all) through manual annotation. Major correlative trends, for example a negative correlation between genome size and exonic proportion, can be inferred from the automatically predicted and manually annotated gene models alike. Conversely, some previously reported trends did not appear in either the automatic or manually annotated gene sets, pointing towards insect-specific gene structural peculiarities.
Conclusions: In our analysis of gene structural properties, automatically predicted gene models proved to be sufficiently reliable to recover the same gene-repertoire-wide correlative trends that we found when focusing on manually annotated gene models only. We acknowledge that analyses on the individual gene level clearly benefit from manual curation. However, as genome sequencing and annotation projects often differ in the extent of their manual annotation and curation efforts, our results indicate that comparative studies analyzing gene structural properties in these genomes can nonetheless be justifiable and informative.
Affiliation(s)
- Jeanne Wilbrandt
- Center for Molecular Biodiversity Research, Zoological Research Museum Alexander Koenig (ZFMK), Adenauerallee 160, 53113, Bonn, Germany; present address: Hoffmann Research Group, Leibniz Institute on Aging - Fritz Lipmann Institute, Beutenbergstraße 11, 07745, Jena, Germany
- Bernhard Misof
- Center for Molecular Biodiversity Research, Zoological Research Museum Alexander Koenig (ZFMK), Adenauerallee 160, 53113, Bonn, Germany
- Kristen A Panfilio
- School of Life Sciences, University of Warwick, Gibbet Hill Campus, Coventry, CV4 7AL, UK
- Oliver Niehuis
- Evolutionary Biology and Ecology, Institute of Biology I (Zoology), Albert Ludwig University, Hauptstr. 1, 79104, Freiburg, Germany
22
Dhaygude K, Nair A, Johansson H, Wurm Y, Sundström L. The first draft genomes of the ant Formica exsecta, and its Wolbachia endosymbiont reveal extensive gene transfer from endosymbiont to host. BMC Genomics 2019; 20:301. [PMID: 30991952 PMCID: PMC6469114 DOI: 10.1186/s12864-019-5665-6] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 11/17/2018] [Accepted: 04/02/2019] [Indexed: 02/05/2023]
Abstract
Background: Adapting to changes in the environment is the foundation of species survival and is usually thought to be a gradual process. However, transposable elements (TEs), epigenetic modifications, and/or genetic material acquired from other organisms through horizontal gene transfer (HGT) can also lead to novel adaptive traits. Social insects form dense societies that attract and maintain extra- and intracellular accessory inhabitants, which may facilitate gene transfer between species. The wood ant Formica exsecta (Formicidae; Hymenoptera) is a common ant species throughout the Palearctic region and a well-established model for studies of ecological characteristics and evolutionary conflict.
Results: In this study, we sequenced and assembled draft genomes for F. exsecta and its endosymbiont Wolbachia. The F. exsecta draft genome is 277.7 Mb long; we identified 13,767 protein-coding genes, for which we provide gene ontology and protein domain annotations. This is also the first report of a Wolbachia genome from ants, and it provides insights into the phylogenetic position of this endosymbiont. We also identified multiple HGT events from Wolbachia to F. exsecta. Some of these HGTs have also occurred in parallel in multiple other insect genomes, highlighting the extent of HGT in eukaryotes.
Conclusion: We present the first draft genomes of the ant F. exsecta and its endosymbiont Wolbachia (wFex), and show considerable rates of gene transfer from the symbiont to the host. We expect the F. exsecta genome in particular to be a valuable resource for further exploration of the molecular basis of the evolution of social organization.
Affiliation(s)
- Kishor Dhaygude
- Organismal and Evolutionary Biology Research Programme, Faculty of Biological and Environmental Sciences, University of Helsinki, P.O. Box 65, FI-00014, Helsinki, Finland
- Abhilash Nair
- Organismal and Evolutionary Biology Research Programme, Faculty of Biological and Environmental Sciences, University of Helsinki, P.O. Box 65, FI-00014, Helsinki, Finland
- Helena Johansson
- Organismal and Evolutionary Biology Research Programme, Faculty of Biological and Environmental Sciences, University of Helsinki, P.O. Box 65, FI-00014, Helsinki, Finland
- Yannick Wurm
- Organismal Biology Department, School of Biological and Chemical Sciences, Queen Mary University of London, Mile End Road, London, E1 4NS, UK
- Liselotte Sundström
- Organismal and Evolutionary Biology Research Programme, Faculty of Biological and Environmental Sciences, University of Helsinki, P.O. Box 65, FI-00014, Helsinki, Finland; Tvärminne Zoological Station, University of Helsinki, J.A. Palménin tie 260, FI-10900, Hanko, Finland
23
Stroehlein AJ, Young ND, Gasser RB. Improved strategy for the curation and classification of kinases, with broad applicability to other eukaryotic protein groups. Sci Rep 2018; 8:6808. [PMID: 29717207 PMCID: PMC5931623 DOI: 10.1038/s41598-018-25020-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 01/16/2018] [Accepted: 04/12/2018] [Indexed: 12/20/2022]
Abstract
Despite the substantial amount of genomic and transcriptomic data available for a wide range of eukaryotic organisms, most genomes are still in a draft state and can have inaccurate gene predictions. To gain a sound understanding of the biology of an organism, it is crucial that inferred protein sequences are accurately identified and annotated. However, this can be challenging to achieve, particularly for organisms such as parasitic worms (helminths), as most gene prediction approaches do not account for substantial phylogenetic divergence from model organisms, such as Caenorhabditis elegans and Drosophila melanogaster, whose genomes are well-curated. In this paper, we describe a bioinformatic strategy for the curation of gene families and subsequent annotation of encoded proteins. This strategy relies on pairwise gene curation between at least two closely related species using genomic and transcriptomic data sets, and is built on recent work on kinase complements of parasitic worms. Here, we discuss salient technical aspects of this strategy and its implications for the curation of protein families more generally.
Affiliation(s)
- Andreas J Stroehlein
- Melbourne Veterinary School, Department of Veterinary Biosciences, Faculty of Veterinary and Agricultural Sciences, The University of Melbourne, Parkville, Victoria, 3010, Australia
- Neil D Young
- Melbourne Veterinary School, Department of Veterinary Biosciences, Faculty of Veterinary and Agricultural Sciences, The University of Melbourne, Parkville, Victoria, 3010, Australia
- Robin B Gasser
- Melbourne Veterinary School, Department of Veterinary Biosciences, Faculty of Veterinary and Agricultural Sciences, The University of Melbourne, Parkville, Victoria, 3010, Australia
24
Hammond SA, Warren RL, Vandervalk BP, Kucuk E, Khan H, Gibb EA, Pandoh P, Kirk H, Zhao Y, Jones M, Mungall AJ, Coope R, Pleasance S, Moore RA, Holt RA, Round JM, Ohora S, Walle BV, Veldhoen N, Helbing CC, Birol I. The North American bullfrog draft genome provides insight into hormonal regulation of long noncoding RNA. Nat Commun 2017; 8:1433. [PMID: 29127278 PMCID: PMC5681567 DOI: 10.1038/s41467-017-01316-7] [Citation(s) in RCA: 78] [Impact Index Per Article: 9.8] [Received: 02/22/2017] [Accepted: 09/07/2017] [Indexed: 12/16/2022]
Abstract
Frogs play important ecological roles, and several species are important model organisms for scientific research. The globally distributed Ranidae (true frogs) are the largest frog family and are evolutionarily distant from the laboratory model Xenopus frog species. Unfortunately, there are currently no genomic resources for this important group of amphibians. More widely applicable amphibian genomic data are urgently needed, as more than two-thirds of known species are currently threatened or undergoing population declines. We report a 5.8 Gbp (NG50 = 69 kbp) genome assembly of a representative North American bullfrog (Rana [Lithobates] catesbeiana). The genome contains over 22,000 predicted protein-coding genes and 6,223 candidate long noncoding RNAs (lncRNAs). RNA-Seq experiments show that thyroid hormone causes widespread transcriptional change among protein-coding and putative lncRNA genes. This initial bullfrog draft genome will serve as a key resource with broad utility in amphibian research, developmental biology, and environmental research.
Affiliation(s)
- S Austin Hammond
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, 570 West 7th Ave - Suite 100, Vancouver, BC, Canada, V5Z 4S6
- René L Warren
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, 570 West 7th Ave - Suite 100, Vancouver, BC, Canada, V5Z 4S6
- Benjamin P Vandervalk
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, 570 West 7th Ave - Suite 100, Vancouver, BC, Canada, V5Z 4S6
- Erdi Kucuk
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, 570 West 7th Ave - Suite 100, Vancouver, BC, Canada, V5Z 4S6
- Hamza Khan
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, 570 West 7th Ave - Suite 100, Vancouver, BC, Canada, V5Z 4S6
- Ewan A Gibb
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, 570 West 7th Ave - Suite 100, Vancouver, BC, Canada, V5Z 4S6
- Pawan Pandoh
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, 570 West 7th Ave - Suite 100, Vancouver, BC, Canada, V5Z 4S6
- Heather Kirk
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, 570 West 7th Ave - Suite 100, Vancouver, BC, Canada, V5Z 4S6
- Yongjun Zhao
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, 570 West 7th Ave - Suite 100, Vancouver, BC, Canada, V5Z 4S6
- Martin Jones
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, 570 West 7th Ave - Suite 100, Vancouver, BC, Canada, V5Z 4S6
- Andrew J Mungall
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, 570 West 7th Ave - Suite 100, Vancouver, BC, Canada, V5Z 4S6
- Robin Coope
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, 570 West 7th Ave - Suite 100, Vancouver, BC, Canada, V5Z 4S6
- Stephen Pleasance
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, 570 West 7th Ave - Suite 100, Vancouver, BC, Canada, V5Z 4S6
- Richard A Moore
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, 570 West 7th Ave - Suite 100, Vancouver, BC, Canada, V5Z 4S6
- Robert A Holt
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, 570 West 7th Ave - Suite 100, Vancouver, BC, Canada, V5Z 4S6
- Jessica M Round
- Department of Biochemistry and Microbiology, University of Victoria, Petch Bldg Room 207, 3800 Finnerty Road, Victoria, BC, Canada, V8P 5C2
- Sara Ohora
- Department of Biochemistry and Microbiology, University of Victoria, Petch Bldg Room 207, 3800 Finnerty Road, Victoria, BC, Canada, V8P 5C2
- Branden V Walle
- Department of Biochemistry and Microbiology, University of Victoria, Petch Bldg Room 207, 3800 Finnerty Road, Victoria, BC, Canada, V8P 5C2
- Nik Veldhoen
- Department of Biochemistry and Microbiology, University of Victoria, Petch Bldg Room 207, 3800 Finnerty Road, Victoria, BC, Canada, V8P 5C2
- Caren C Helbing
- Department of Biochemistry and Microbiology, University of Victoria, Petch Bldg Room 207, 3800 Finnerty Road, Victoria, BC, Canada, V8P 5C2
- Inanc Birol
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, 570 West 7th Ave - Suite 100, Vancouver, BC, Canada, V5Z 4S6
25
Singh A, Mishra A, Khosravi A, Khandelwal G, Jayaram B. Physico-chemical fingerprinting of RNA genes. Nucleic Acids Res 2017; 45:e47. [PMID: 27932456 PMCID: PMC5397174 DOI: 10.1093/nar/gkw1236] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 11/09/2016] [Accepted: 11/29/2016] [Indexed: 12/13/2022]
Abstract
Here we propose a novel concept for characterizing different classes of RNA genes on the basis of the physico-chemical properties of DNA sequences. Because knowledge-based approaches can yield unsatisfactory outcomes owing to the limitations of training on available experimental data sets, alternative approaches that use properties intrinsic to DNA are needed to supplement training-based methods and, eventually, to provide molecular insights into genome organization. From a comprehensive series of molecular dynamics simulations by the Ascona B-DNA Consortium, we extracted the hydrogen bonding, stacking and solvation energies of all combinations of DNA sequences at the dinucleotide level and calculated these properties for different types of RNA genes. Considering ∼7.3 million mRNA, 255,524 tRNA, 40,649 rRNA (different subunits), 5,250 miRNA and 3,747 snRNA gene sequences from 9,282 complete genome chromosomes of all prokaryotes and eukaryotes available at NCBI, we observed that different functional units on genomic DNA have distinct physico-chemical signatures.
Affiliation(s)
- Ankita Singh
- Supercomputing Facility for Bioinformatics & Computational Biology, Indian Institute of Technology, Hauz Khas, New Delhi-110016, India
- Akhilesh Mishra
- Supercomputing Facility for Bioinformatics & Computational Biology, Indian Institute of Technology, Hauz Khas, New Delhi-110016, India; Kusuma School of Biological Sciences, Indian Institute of Technology, Hauz Khas, New Delhi-110016, India
- Ali Khosravi
- Ale-Taha Institute of Higher Education, Tehran, Iran
- Garima Khandelwal
- Cancer Research UK Manchester Institute, The University of Manchester, Wilmslow Road, Manchester M20 4BX, UK
- B Jayaram
- Supercomputing Facility for Bioinformatics & Computational Biology, Indian Institute of Technology, Hauz Khas, New Delhi-110016, India; Kusuma School of Biological Sciences, Indian Institute of Technology, Hauz Khas, New Delhi-110016, India; Department of Chemistry, Indian Institute of Technology, Hauz Khas, New Delhi-110016, India
26
Robledo D, Hermida M, Rubiolo JA, Fernández C, Blanco A, Bouza C, Martínez P. Integrating genomic resources of flatfish (Pleuronectiformes) to boost aquaculture production. Comp Biochem Physiol Part D Genomics Proteomics 2016; 21:41-55. [PMID: 28063346 DOI: 10.1016/j.cbd.2016.12.001] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 09/28/2016] [Revised: 12/09/2016] [Accepted: 12/13/2016] [Indexed: 12/15/2022]
Abstract
Flatfish have high market acceptance and therefore represent a profitable aquaculture product. The main farmed species is the turbot (Scophthalmus maximus), followed by the Japanese flounder (Paralichthys olivaceus) and the tongue sole (Cynoglossus semilaevis), but other species such as the Atlantic halibut (Hippoglossus hippoglossus), Senegalese sole (Solea senegalensis) and common sole (Solea solea) also have significant production and are very promising for farming. Important genomic resources, including whole-genome sequencing projects, genetic maps and transcriptomes, are available for most of these species. In this work, we integrate all available genomic information for these species within a common framework, taking as reference the fully assembled genomes of turbot and tongue sole (>210× coverage). We obtained new insights into the genetic basis of productive traits, as well as new data useful for understanding the evolutionary origin and diversification of this group. Despite a general 1:1 chromosomal syntenic relationship between species, the comparison of the turbot and tongue sole genomes showed extensive intrachromosomal reorganizations. Integrating the available mapping information supported specific chromosome fusions during flatfish evolution and facilitated the cross-species comparison of previously reported genetic associations for productive traits. When comparing the transcriptomic resources of the six species, we identified a common set of ~2,500 orthologues and ~150 common miRNAs, and detected specific sets of putative missing genes in flatfish transcriptomes, likely reflecting their evolutionary diversification.
Affiliation(s)
- Diego Robledo
- Department of Zoology, Genetics and Physical Anthropology, Faculty of Biology (CIBUS), Universidade de Santiago de Compostela, 15782 Santiago de Compostela, Spain
- Miguel Hermida
- Department of Zoology, Genetics and Physical Anthropology, Faculty of Veterinary, Universidade de Santiago de Compostela, 27002 Lugo, Spain
- Juan A Rubiolo
- Department of Zoology, Genetics and Physical Anthropology, Faculty of Veterinary, Universidade de Santiago de Compostela, 27002 Lugo, Spain
- Carlos Fernández
- Department of Zoology, Genetics and Physical Anthropology, Faculty of Veterinary, Universidade de Santiago de Compostela, 27002 Lugo, Spain
- Andrés Blanco
- Department of Zoology, Genetics and Physical Anthropology, Faculty of Veterinary, Universidade de Santiago de Compostela, 27002 Lugo, Spain
- Carmen Bouza
- Department of Zoology, Genetics and Physical Anthropology, Faculty of Veterinary, Universidade de Santiago de Compostela, 27002 Lugo, Spain
- Paulino Martínez
- Department of Zoology, Genetics and Physical Anthropology, Faculty of Veterinary, Universidade de Santiago de Compostela, 27002 Lugo, Spain