1
Schell T, Greve C, Podsiadlowski L. Establishing genome sequencing and assembly for non-model and emerging model organisms: a brief guide. Front Zool 2025; 22:7. [PMID: 40247279 PMCID: PMC12004614 DOI: 10.1186/s12983-025-00561-7] [Received: 07/10/2024] [Accepted: 03/23/2025] [Indexed: 04/19/2025] [Open access]
Abstract
Reference genome assemblies are the basis for comprehensive genomic analyses and comparisons. Due to declining sequencing costs and growing computational power, genome projects are now feasible in smaller labs. De novo genome sequencing for non-model or emerging model organisms requires knowledge about genome size and techniques for extracting high molecular weight DNA. Besides quality, the amount of DNA obtained from single individuals is crucial, especially when dealing with small organisms. Long-read sequencing technologies are the methods of choice for creating high-quality genome assemblies; pure short-read assemblies may contain most of the coding parts of a genome but are usually much more fragmented and do not resolve repeat elements or structural variants well. Several genome initiatives produce more and more non-model organism genomes and provide standards for genome sequencing and assembly. However, sometimes the organism of choice is not part of such an initiative or does not meet its standards. Therefore, if the scientific question can be answered with a genome of low contiguity in intergenic regions, falling short of the high standard of a chromosome-scale assembly should not prevent publication. This review describes how to set up an animal genome sequencing project in the lab, how to estimate costs and resources, and how to deal with suboptimal conditions. We aim to suggest optimal strategies for genome sequencing that fulfil the needs of specific research questions, e.g. "How are species related to each other based on whole genomes?" (phylogenomics), "How do genomes of populations within a species differ?" (population genomics), "Are differences between populations relevant for conservation?" (conservation genomics), "Which selection pressure is acting on certain genes?" (identification of genes under selection), "Did repeats expand or contract recently?" (repeat dynamics).
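As a rough illustration of the resource estimate this abstract refers to, the sketch below uses the standard relation depth = total sequenced bases / genome size to work out how much raw sequence a project needs. The genome size, target depth, and per-run yield are hypothetical placeholders, not figures from the review, and the function names are invented for this example.

```python
import math


def required_yield_mb(genome_size_mb: int, target_depth: int) -> int:
    """Raw sequence (in Mb) needed to reach target_depth on a genome of genome_size_mb."""
    return genome_size_mb * target_depth


def runs_needed(required_mb: int, yield_per_run_mb: int) -> int:
    """Number of sequencing runs, rounded up, for a given per-run yield."""
    return math.ceil(required_mb / yield_per_run_mb)


# Hypothetical project: 1.2 Gb genome, 30x depth, 25 Gb usable yield per run.
genome_size_mb = 1200
target_depth = 30
yield_per_run_mb = 25_000

need = required_yield_mb(genome_size_mb, target_depth)
print(f"Required yield: {need} Mb")
print(f"Runs needed: {runs_needed(need, yield_per_run_mb)}")
```

Multiplying the number of runs by a per-run price then gives a first-order sequencing budget; library preparation, failed runs, and compute time would have to be added separately.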
Affiliation(s)
- Tilman Schell
  - LOEWE Centre for Translational Biodiversity Genomics, Senckenberganlage 25, 60325, Frankfurt, Germany
  - Senckenberg Research Institute, Senckenberganlage 25, 60325, Frankfurt, Germany
- Carola Greve
  - LOEWE Centre for Translational Biodiversity Genomics, Senckenberganlage 25, 60325, Frankfurt, Germany
  - Senckenberg Research Institute, Senckenberganlage 25, 60325, Frankfurt, Germany
- Lars Podsiadlowski
  - LIB, Museum Koenig Bonn, Centre for Molecular Biodiversity Research (zmb), Adenauerallee 127, 53113, Bonn, Germany

2
Jain A, Li T, Huston DC, Kaur J, Trollip C, Wainer J, Hodda M, Linsell K, Riley IT, Toktay H, Olowu EA, Edwards J, Rodoni B, Sawbridge T. Insights from draft genomes of Heterodera species isolated from field soil samples. BMC Genomics 2025; 26:158. [PMID: 39966714 PMCID: PMC11834393 DOI: 10.1186/s12864-025-11351-0] [Received: 09/19/2024] [Accepted: 02/11/2025] [Indexed: 02/20/2025] [Open access]
Abstract
BACKGROUND The nematode phylum includes many species key to soil food webs with trophic behaviours extending from feeding on microbes to macrofauna and plant roots. Among these, the plant parasitic cyst nematodes retain their eggs in protective cysts, prolonging their survival under harsh conditions. These nematodes, including those from the genus Heterodera, cause significant economic losses in agricultural systems. Understanding of nematode diversity and ecology has expanded through application of genomic research; however, for Heterodera species there are very few available whole genome sequences. Sequencing and assembling Heterodera genomes is challenging due to various technical limitations imposed by the biology of Heterodera. Overcoming these limitations is essential for comprehensive insights into Heterodera parasitic interactions with plants, population studies, and for Australian biosecurity implications.
RESULTS We present draft genomes of six species: Heterodera australis, H. humuli, H. mani and H. trifolii, which are presently recorded in Australia, and H. avenae and H. filipjevi, which are currently absent from Australia. The draft genomes were sequenced from genomic DNA isolated from 50 cysts each using an Illumina NovaSeq short-read sequencing platform. The data revealed disparity in sequencing yield between species. What was previously identified as H. avenae in Australia using morphological traits is now confirmed as H. australis; this may have consequences for wheat breeding programs in Australia that are breeding for resistance to H. avenae. A multigene phylogeny placed the sequenced species into taxonomic phylogenetic perspective. Genomic comparisons within the Avenae species group revealed orthologous gene clusters within the species, emphasising the shared and unique features of the group. The data also revealed the presence of a Wolbachia species, a putative bacterial endosymbiont, in the Heterodera humuli short-read sequencing data.
CONCLUSION Genomic research holds immense significance for agriculture, for understanding pest species diversity and the development of effective management strategies. This study provides insight into the genomics of Heterodera cyst nematodes and their associated symbionts, and will serve as a baseline for further genomic analyses in this economically important nematode group.
Affiliation(s)
- Akshita Jain
  - School of Applied Systems Biology, La Trobe University, Bundoora, VIC, 3083, Australia
  - Centre for AgriBioscience, Agriculture Victoria Research, Department of Energy, Environment and Climate Action (DEECA), Bundoora, VIC, 3083, Australia
- Tongda Li
  - Centre for AgriBioscience, Agriculture Victoria Research, Department of Energy, Environment and Climate Action (DEECA), Bundoora, VIC, 3083, Australia
- Daniel C Huston
  - Australian National Insect Collection, National Research Collection Australia, CSIRO, PO Box 1700, Canberra, ACT, 2601, Australia
- Jatinder Kaur
  - School of Applied Systems Biology, La Trobe University, Bundoora, VIC, 3083, Australia
  - Centre for AgriBioscience, Agriculture Victoria Research, Department of Energy, Environment and Climate Action (DEECA), Bundoora, VIC, 3083, Australia
- Conrad Trollip
  - Forest Science, NSW Department of Primary Industries, Parramatta, NSW, 2150, Australia
- John Wainer
  - Centre for AgriBioscience, Agriculture Victoria Research, Department of Energy, Environment and Climate Action (DEECA), Bundoora, VIC, 3083, Australia
- Mike Hodda
  - Australian National Insect Collection, National Research Collection Australia, CSIRO, PO Box 1700, Canberra, ACT, 2601, Australia
- Katherine Linsell
  - South Australian Research and Development Institute, Adelaide, SA, 5064, Australia
- Ian T Riley
  - School of Agriculture, Food and Wine, The University of Adelaide, PMB 1, Glen Osmond, SA, 5064, Australia
- Halil Toktay
  - Department of Plant Production and Technologies, Faculty of Agricultural Science and Technologies, Niğde Ömer Halisdemir University, Niğde, Turkey
- Eniola Ajibola Olowu
  - Department of Plant Production and Technologies, Faculty of Agricultural Science and Technologies, Niğde Ömer Halisdemir University, Niğde, Turkey
- Jacqueline Edwards
  - School of Applied Systems Biology, La Trobe University, Bundoora, VIC, 3083, Australia
  - Centre for AgriBioscience, Agriculture Victoria Research, Department of Energy, Environment and Climate Action (DEECA), Bundoora, VIC, 3083, Australia
- Brendan Rodoni
  - School of Applied Systems Biology, La Trobe University, Bundoora, VIC, 3083, Australia
  - Centre for AgriBioscience, Agriculture Victoria Research, Department of Energy, Environment and Climate Action (DEECA), Bundoora, VIC, 3083, Australia
- Timothy Sawbridge
  - School of Applied Systems Biology, La Trobe University, Bundoora, VIC, 3083, Australia
  - Centre for AgriBioscience, Agriculture Victoria Research, Department of Energy, Environment and Climate Action (DEECA), Bundoora, VIC, 3083, Australia

3
Krabberød AK, Stokke E, Thoen E, Skrede I, Kauserud H. The Ribosomal Operon Database: A Full-Length rDNA Operon Database Derived From Genome Assemblies. Mol Ecol Resour 2025; 25:e14031. [PMID: 39428982 DOI: 10.1111/1755-0998.14031] [Received: 04/27/2024] [Revised: 06/27/2024] [Accepted: 09/27/2024] [Indexed: 10/22/2024]
Abstract
Current rDNA reference sequence databases are tailored towards shorter DNA markers, such as parts of the 16/18S marker or the internally transcribed spacer (ITS) region. However, due to advances in long-read DNA sequencing technologies, longer stretches of the rDNA operon are increasingly used in environmental sequencing studies to increase the phylogenetic resolution. There is, therefore, a growing need for longer rDNA reference sequences. Here, we present the ribosomal operon database (ROD), which includes eukaryotic full-length rDNA operons fished from publicly available genome assemblies. Full-length operons were detected in 34.1% of the 34,701 examined eukaryotic genome assemblies from NCBI. In most cases (53.1%), more than one operon variant was detected, which can be due to intragenomic operon copy variability, allelic variation in non-haploid genomes, or technical errors from the sequencing and assembly process. The highest copy number found was 5947 in Zea mays. In total, 453,697 unique operons were detected, with 69,480 operon variant clusters remaining after intragenomic clustering at 99% sequence identity. The operon length varied extensively across eukaryotes, ranging from 4136 to 16,463 bp, which will lead to considerable polymerase chain reaction (PCR) bias during amplification of the entire operon. Clustering the full-length operons revealed that the different parts (i.e., 18S, 28S, and the hypervariable regions V4 and V9 of 18S) provide divergent taxonomic resolution, with 18S, the V4 and V9 regions being the most conserved. The ROD will be updated regularly to provide an increasing number of full-length rDNA operons to the scientific community.
Affiliation(s)
- Anders K Krabberød
  - Department of Biosciences, Section for Genetics and Evolutionary Biology, University of Oslo, Oslo, Norway
- Embla Stokke
  - Department of Biosciences, Section for Genetics and Evolutionary Biology, University of Oslo, Oslo, Norway
- Ella Thoen
  - Department of Biosciences, Section for Genetics and Evolutionary Biology, University of Oslo, Oslo, Norway
- Inger Skrede
  - Department of Biosciences, Section for Genetics and Evolutionary Biology, University of Oslo, Oslo, Norway
- Håvard Kauserud
  - Department of Biosciences, Section for Genetics and Evolutionary Biology, University of Oslo, Oslo, Norway

4
Zhang X, Chen H, Ni Y, Wu B, Li J, Burzyński A, Liu C. Plant mitochondrial genome map (PMGmap): A software tool for the comprehensive visualization of coding, noncoding and genome features of plant mitochondrial genomes. Mol Ecol Resour 2024; 24:e13952. [PMID: 38523350 DOI: 10.1111/1755-0998.13952] [Received: 10/18/2023] [Revised: 12/18/2023] [Accepted: 03/08/2024] [Indexed: 03/26/2024]
Abstract
Tools for visualizing genomes are essential for investigating genomic features and their interactions. Currently, tools designed originally for animal mitogenomes and plant plastomes are used to visualize the mitogenomes of plants but cannot accurately display features specific to plant mitogenomes, such as nonlinear exon arrangement for genes, the prevalence of functional noncoding features and complex chromosomal architecture. To address these problems, a software package, plant mitochondrial genome map (PMGmap), was developed using the Python programming language. PMGmap can draw genes at exon levels; draw cis- and trans-splicing gene maps, noncoding features and repetitive sequences; and scale genic regions by using the scaling of the genic regions on the mitogenome (SAGM) algorithm. It can also draw multiple chromosomes simultaneously. Compared with other state-of-the-art tools, PMGmap showed better performance in visualizing 405 plant mitogenomes, showing potential as an invaluable tool for plant mitogenome research. The web and container versions and the source code of PMGmap can be accessed through the following link: http://www.1kmpg.cn/pmgmap.
Affiliation(s)
- Xinyi Zhang
  - Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Haimei Chen
  - Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Yang Ni
  - Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Bin Wu
  - Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Jingling Li
  - Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Artur Burzyński
  - Institute of Oceanology, Polish Academy of Sciences, Sopot, Poland
- Chang Liu
  - Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China

5
Grant AR, Johnson KP, Stanley EL, Baldwin-Brown J, Kolenčík S, Allen JM. Rapid Targeted Assembly of the Proteome Reveals Evolutionary Variation of GC Content in Avian Lice. Bioinform Biol Insights 2024; 18:11779322241257991. [PMID: 38860163 PMCID: PMC11163934 DOI: 10.1177/11779322241257991] [Received: 09/25/2023] [Accepted: 05/02/2024] [Indexed: 06/12/2024] [Open access]
Abstract
Nucleotide base composition plays an influential role in the molecular mechanisms involved in gene function, phenotype, and amino acid composition. GC content (proportion of guanine and cytosine in DNA sequences) shows a high level of variation within and among species. Many studies measure GC content in a small number of genes, which may not be representative of genome-wide GC variation. One challenge when assembling extensive genomic data sets for these studies is the significant amount of resources (monetary and computational) associated with data processing, and many bioinformatic tools have not been optimized for resource efficiency. Using a high-performance computing (HPC) cluster, we manipulated resources provided to the targeted gene assembly program, automated target restricted assembly method (aTRAM), to determine an optimum way to run the program to maximize resource use. Using our optimum assembly approach, we assembled and measured GC content of all of the protein-coding genes of a diverse group of parasitic feather lice. Of the 499 426 genes assembled across 57 species, feather lice were GC-poor (mean GC = 42.96%) with a significant amount of variation within and between species (GC range = 19.57%-73.33%). We found a significant correlation between GC content and standard deviation per taxon for overall GC and GC3, which could indicate selection for G and C nucleotides in some species. Phylogenetic signal of GC content was detected in both GC and GC3. This research provides a large-scale investigation of GC content in parasitic lice laying the foundation for understanding the basis of variation in base composition across species.
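The two measures named in this abstract can be sketched in a few lines. The example below computes overall GC content and GC3 (GC content at third codon positions) for a single coding sequence. It is an illustrative sketch only, not the authors' pipeline or part of aTRAM; it assumes an in-frame coding sequence, and the toy sequence is made up.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C among unambiguous bases (A, C, G, T) in seq."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def gc3_content(cds: str) -> float:
    """GC percentage at third codon positions; assumes cds starts in frame."""
    third_positions = cds[2::3]  # bases at index 2, 5, 8, ...
    return gc_content(third_positions)


# Toy coding sequence (hypothetical): codons ATG GCC AAA TAA
toy_cds = "ATGGCCAAATAA"
print(f"GC  = {gc_content(toy_cds):.2f}%")
print(f"GC3 = {gc3_content(toy_cds):.2f}%")
```

Applying both functions to every assembled gene and summarising per taxon would give the kind of per-species mean and standard deviation the abstract reports.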
Affiliation(s)
- Avery R Grant
  - Department of Biology, University of Nevada, Reno, Reno, NV, USA
- Kevin P Johnson
  - Illinois Natural History Survey, Prairie Research Institute, University of Illinois at Urbana-Champaign, Champaign, IL, USA
- Edward L Stanley
  - Department of Natural History, Florida Museum of Natural History, University of Florida, Gainesville, FL, USA
- Stanislav Kolenčík
  - Faculty of Mathematics, Natural Sciences, and Information Technologies, University of Primorska, Koper, Slovenia
- Julie M Allen
  - Department of Biological Sciences, Virginia Tech, Blacksburg, VA, USA

6
Schiebelhut LM, Guillaume AS, Kuhn A, Schweizer RM, Armstrong EE, Beaumont MA, Byrne M, Cosart T, Hand BK, Howard L, Mussmann SM, Narum SR, Rasteiro R, Rivera-Colón AG, Saarman N, Sethuraman A, Taylor HR, Thomas GWC, Wellenreuther M, Luikart G. Genomics and conservation: Guidance from training to analyses and applications. Mol Ecol Resour 2024; 24:e13893. [PMID: 37966259 DOI: 10.1111/1755-0998.13893] [Received: 06/10/2022] [Revised: 10/25/2023] [Accepted: 10/30/2023] [Indexed: 11/16/2023]
Abstract
Environmental change is intensifying the biodiversity crisis and threatening species across the tree of life. Conservation genomics can help inform conservation actions and slow biodiversity loss. However, more training, appropriate use of novel genomic methods and communication with managers are needed. Here, we review practical guidance to improve applied conservation genomics. We share insights aimed at ensuring effectiveness of conservation actions around three themes: (1) improving pedagogy and training in conservation genomics including for online global audiences, (2) conducting rigorous population genomic analyses properly considering theory, marker types and data interpretation and (3) facilitating communication and collaboration between managers and researchers. We aim to update students and professionals and expand their conservation toolkit with genomic principles and recent approaches for conserving and managing biodiversity. The biodiversity crisis is a global problem and, as such, requires international involvement, training, collaboration and frequent reviews of the literature and workshops as we do here.
Affiliation(s)
- Lauren M Schiebelhut
  - Life and Environmental Sciences, University of California, Merced, California, USA
- Annie S Guillaume
  - Geospatial Molecular Epidemiology group (GEOME), Laboratory for Biological Geochemistry (LGB), École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Arianna Kuhn
  - Department of Biological Sciences, University of Lethbridge, Lethbridge, Alberta, Canada
  - Virginia Museum of Natural History, Martinsville, Virginia, USA
- Rena M Schweizer
  - Division of Biological Sciences, University of Montana, Missoula, Montana, USA
- Mark A Beaumont
  - School of Biological Sciences, University of Bristol, Bristol, UK
- Margaret Byrne
  - Department of Biodiversity, Conservation and Attractions, Biodiversity and Conservation Science, Perth, Western Australia, Australia
- Ted Cosart
  - Flathead Lake Biology Station, University of Montana, Missoula, Montana, USA
- Brian K Hand
  - Flathead Lake Biological Station, University of Montana, Polson, Montana, USA
- Leif Howard
  - Flathead Lake Biology Station, University of Montana, Missoula, Montana, USA
- Steven M Mussmann
  - Southwestern Native Aquatic Resources and Recovery Center, U.S. Fish & Wildlife Service, Dexter, New Mexico, USA
- Shawn R Narum
  - Hagerman Genetics Lab, University of Idaho, Hagerman, Idaho, USA
- Rita Rasteiro
  - MRC Integrative Epidemiology Unit, University of Bristol, Bristol, UK
- Angel G Rivera-Colón
  - Department of Evolution, Ecology, and Behavior, University of Illinois at Urbana-Champaign, Champaign, Illinois, USA
- Norah Saarman
  - Department of Biology and Ecology Center, Utah State University, Logan, Utah, USA
- Arun Sethuraman
  - Department of Biology, San Diego State University, San Diego, California, USA
- Helen R Taylor
  - Royal Zoological Society of Scotland, Edinburgh, Scotland
- Gregg W C Thomas
  - Informatics Group, Harvard University, Cambridge, Massachusetts, USA
- Maren Wellenreuther
  - Plant and Food Research, Nelson, New Zealand
  - University of Auckland, Auckland, New Zealand
- Gordon Luikart
  - Division of Biological Sciences, University of Montana, Missoula, Montana, USA
  - Flathead Lake Biology Station, University of Montana, Missoula, Montana, USA

7
Ousmael K, Whetten RW, Xu J, Nielsen UB, Lamour K, Hansen OK. Identification and high-throughput genotyping of single nucleotide polymorphism markers in a non-model conifer (Abies nordmanniana (Steven) Spach). Sci Rep 2023; 13:22488. [PMID: 38110478 PMCID: PMC10728141 DOI: 10.1038/s41598-023-49462-x] [Received: 05/01/2023] [Accepted: 12/08/2023] [Indexed: 12/20/2023] [Open access]
Abstract
Single nucleotide polymorphism (SNP) markers are powerful tools for investigating population structures, linkage analysis, and genome-wide association studies, as well as for breeding and population management. The availability of SNP markers has been limited to the most commercially important timber species, primarily due to the cost of genome sequencing required for SNP discovery. In this study, a combination of reference-based and reference-free approaches was used to identify SNPs in Nordmann fir (Abies nordmanniana), a species previously lacking genomic sequence information. Using a combination of a genome assembly of the closely related Silver fir (Abies alba) species and a de novo assembly of low-copy regions of the Nordmann fir genome, we identified a high density of reliable SNPs. Reference-based approaches identified two million SNPs in common between the Silver fir genome and low-copy regions of Nordmann fir. A combination of one reference-free and two reference-based approaches identified 250 shared SNPs. A subset of 200 SNPs was used to genotype 342 individuals and thereby tested and validated in the context of identity analysis and/or clone identification. The tested SNPs successfully identified all ramets per clone and five mislabeled individuals via identity and genomic relatedness analysis. The identified SNPs will be used in ad hoc breeding of Nordmann fir in Denmark.
Affiliation(s)
- Kedra Ousmael
  - Department of Geosciences and Natural Resource Management, University of Copenhagen, Rolighedsvej 23, 1958, Frederiksberg C, Denmark
- Ross W Whetten
  - Department of Forestry and Environmental Resources, North Carolina State University, Raleigh, NC, 27606, USA
- Jing Xu
  - Department of Geosciences and Natural Resource Management, University of Copenhagen, Rolighedsvej 23, 1958, Frederiksberg C, Denmark
- Ulrik B Nielsen
  - Department of Geosciences and Natural Resource Management, University of Copenhagen, Rolighedsvej 23, 1958, Frederiksberg C, Denmark
- Kurt Lamour
  - Department of Entomology and Plant Pathology, University of Tennessee, Knoxville, TN, USA
- Ole K Hansen
  - Department of Geosciences and Natural Resource Management, University of Copenhagen, Rolighedsvej 23, 1958, Frederiksberg C, Denmark

8
Nestor BJ, Bayer PE, Fernandez CGT, Edwards D, Finnegan PM. Approaches to increase the validity of gene family identification using manual homology search tools. Genetica 2023; 151:325-338. [PMID: 37817002 PMCID: PMC10692271 DOI: 10.1007/s10709-023-00196-8] [Received: 06/07/2023] [Accepted: 10/01/2023] [Indexed: 10/12/2023]
Abstract
Identifying homologs is an important process in the analysis of genetic patterns underlying traits and evolutionary relationships among species. Analysis of gene families is often used to form and support hypotheses on genetic patterns such as gene presence, absence, or functional divergence which underlie traits examined in functional studies. These analyses often require precise identification of all members in a targeted gene family. Manual pipelines where homology search and orthology assignment tools are used separately are the most common approach for identifying small gene families where accurate identification of all members is important. The ability to curate sequences between steps in manual pipelines allows for simple and precise identification of all possible gene family members. However, the validity of such manual pipeline analyses is often decreased by inappropriate approaches to homology searches including too relaxed or stringent statistical thresholds, inappropriate query sequences, homology classification based on sequence similarity alone, and low-quality proteome or genome sequences. In this article, we propose several approaches to mitigate these issues and allow for precise identification of gene family members and support for hypotheses linking genetic patterns to functional traits.
Affiliation(s)
- Benjamin J Nestor
  - School of Biological Sciences, University of Western Australia, Perth, WA, 6009, Australia
  - Centre for Applied Bioinformatics, University of Western Australia, Perth, WA, 6009, Australia
- Philipp E Bayer
  - School of Biological Sciences, University of Western Australia, Perth, WA, 6009, Australia
  - Centre for Applied Bioinformatics, University of Western Australia, Perth, WA, 6009, Australia
- Cassandria G Tay Fernandez
  - School of Biological Sciences, University of Western Australia, Perth, WA, 6009, Australia
  - Centre for Applied Bioinformatics, University of Western Australia, Perth, WA, 6009, Australia
- David Edwards
  - School of Biological Sciences, University of Western Australia, Perth, WA, 6009, Australia
  - Centre for Applied Bioinformatics, University of Western Australia, Perth, WA, 6009, Australia
- Patrick M Finnegan
  - School of Biological Sciences, University of Western Australia, Perth, WA, 6009, Australia
  - Centre for Applied Bioinformatics, University of Western Australia, Perth, WA, 6009, Australia

9
Narh Mensah DL, Wingfield BD, Coetzee MP. A practical approach to genome assembly and annotation of Basidiomycota using the example of Armillaria. Biotechniques 2023; 75:115-128. [PMID: 37681497 DOI: 10.2144/btn-2023-0023] [Indexed: 09/09/2023] [Open access]
Abstract
Technological advancements in genome sequencing, assembly and annotation platforms and algorithms that resulted in several genomic studies have created an opportunity to further our understanding of the biology of phytopathogens, including Armillaria species. Most Armillaria species are facultative necrotrophs that cause root- and stem-rot, usually on woody plants, significantly impacting agriculture and forestry worldwide. Genome sequencing, assembly and annotation in terms of samples used and methods applied in Armillaria genome projects are evaluated in this review. Infographic guidelines and a database of resources to facilitate future Armillaria genome projects were developed. Knowledge gained from genomic studies of Armillaria species is summarized and prospects for further research are provided. This guide can be applied to other diploid and dikaryotic fungal genomics.
Affiliation(s)
- Deborah L Narh Mensah
  - Department of Biochemistry, Genetics & Microbiology, Forestry & Agricultural Biotechnology Institute (FABI), Faculty of Natural & Agricultural Sciences, University of Pretoria, Pretoria, Gauteng, South Africa
  - Council for Scientific and Industrial Research - Food Research Institute (CSIR-FRI), PO Box M20, Accra, Ghana
- Brenda D Wingfield
  - Department of Biochemistry, Genetics & Microbiology, Forestry & Agricultural Biotechnology Institute (FABI), Faculty of Natural & Agricultural Sciences, University of Pretoria, Pretoria, Gauteng, South Africa
- Martin Pa Coetzee
  - Department of Biochemistry, Genetics & Microbiology, Forestry & Agricultural Biotechnology Institute (FABI), Faculty of Natural & Agricultural Sciences, University of Pretoria, Pretoria, Gauteng, South Africa

10
Tan MCY, Zakaria MR, Liew KJ, Chong CS. Draft genome sequence of Hahella sp. CR1 and its ability in producing cellulases for saccharifying agricultural biomass. Arch Microbiol 2023; 205:278. [PMID: 37420023 DOI: 10.1007/s00203-023-03617-6] [Received: 04/21/2023] [Revised: 06/20/2023] [Accepted: 06/23/2023] [Indexed: 07/09/2023]
Abstract
Hahella is a genus that has not been well-studied, with only two identified species. The potential of this genus to produce cellulases is yet to be fully explored. The present study isolated Hahella sp. CR1 from mangrove soil in Tanjung Piai National Park, Malaysia, and performed whole genome sequencing (WGS) using NovaSeq 6000. The final assembled genome consists of 62 contigs totalling 7,106,771 bp, with a GC content of 53.5%, and encodes 6,397 genes. The CR1 strain exhibited the highest similarity with Hahella sp. HN01 compared to other available genomes, where the ANI, dDDH, AAI, and POCP were 97.04%, 75.2%, 97.95%, and 91.0%, respectively. In addition, the CAZymes analysis identified 88 GTs, 54 GHs, 11 CEs, 7 AAs, 2 PLs, and 48 CBMs in the genome of strain CR1. Among these proteins, 11 are related to cellulose degradation. The cellulases produced from strain CR1 were characterized and demonstrated optimal activity at 60 °C, pH 7.0, and 15% (w/v) sodium chloride. The enzyme was activated by K+, Fe2+, Mg2+, Co2+, and Tween 40. Furthermore, cellulases from strain CR1 improved the saccharification efficiency of a commercial cellulase blend on the tested agricultural wastes, including empty fruit bunch, coconut husk, and sugarcane bagasse. This study provides new insights into the cellulases produced by strain CR1 and their potential to be used in lignocellulosic biomass pre-treatment.
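The assembly figures quoted in this abstract (contig count, total length, GC content) are the kind of summary that can be derived directly from a FASTA file. The sketch below shows one way to compute them with the Python standard library; it is not the authors' workflow, the helper names are invented for illustration, and the two-contig input is a toy example rather than the CR1 assembly.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def assembly_summary(handle):
    """Return contig count, total length in bp, and GC percentage."""
    n_contigs = total_bp = gc_bp = 0
    for _, seq in read_fasta(handle):
        seq = seq.upper()
        n_contigs += 1
        total_bp += len(seq)
        gc_bp += seq.count("G") + seq.count("C")
    # GC is taken over the full length here, so N bases count in the denominator.
    gc_percent = 100.0 * gc_bp / total_bp if total_bp else 0.0
    return {"contigs": n_contigs, "total_bp": total_bp, "gc_percent": gc_percent}


# Toy assembly with two contigs (hypothetical data).
toy = StringIO(">contig_1\nATGC\nGGCC\n>contig_2\nATAT\n")
print(assembly_summary(toy))
```

For a real assembly, the same function can be called on an open file handle, for example `with open(path) as fh: assembly_summary(fh)`, where `path` points to the assembly FASTA.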
Affiliation(s)
- Melvin Chun Yun Tan
  - Department of Biosciences, Faculty of Science, Universiti Teknologi Malaysia, 81310, Skudai, Johor, Malaysia
- Muhammad Ramziuddin Zakaria
  - Department of Biosciences, Faculty of Science, Universiti Teknologi Malaysia, 81310, Skudai, Johor, Malaysia
- Kok Jun Liew
  - Department of Biosciences, Faculty of Science, Universiti Teknologi Malaysia, 81310, Skudai, Johor, Malaysia
- Chun Shiong Chong
  - Department of Biosciences, Faculty of Science, Universiti Teknologi Malaysia, 81310, Skudai, Johor, Malaysia

11
Tympakianakis S, Trantas E, Avramidou EV, Ververidis F. Vitis vinifera genotyping toolbox to highlight diversity and germplasm identification. FRONTIERS IN PLANT SCIENCE 2023; 14:1139647. [PMID: 37180393 PMCID: PMC10169827 DOI: 10.3389/fpls.2023.1139647] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2023] [Accepted: 03/27/2023] [Indexed: 05/16/2023]
Abstract
The contribution of vine cultivation to human welfare, as well as to the stimulation of basic social and cultural features of civilization, has been great. The wide temporal and regional distribution created a wide array of genetic variants that have been used as propagating material to promote cultivation. Information on the origin and relationships among cultivars is of great interest from a phylogenetics and biotechnology perspective. Fingerprinting and exploration of the complicated genetic background of varieties may contribute to future breeding programs. In this review, we present the most frequently used molecular markers that have been applied to Vitis germplasm. We discuss the scientific progress that led to the new strategies being implemented using state-of-the-art next-generation sequencing technologies. Additionally, we attempt to delimit the discussion on the algorithms used in phylogenetic analyses and differentiation of grape varieties. Lastly, the contribution of epigenetics is highlighted to tackle future roadmaps for breeding and exploitation of Vitis germplasm. The latter will remain at the cutting edge of future breeding and cultivation, and the molecular tools presented herein will serve as a reference point in the challenging years to come.
Affiliation(s)
- Stylianos Tympakianakis
  - Laboratory of Biological and Biotechnological Applications, Department of Agriculture, School of Agricultural Sciences, Hellenic Mediterranean University, Heraklion, Greece
- Emmanouil Trantas
  - Laboratory of Biological and Biotechnological Applications, Department of Agriculture, School of Agricultural Sciences, Hellenic Mediterranean University, Heraklion, Greece
  - Institute of Agri-Food and Life Sciences, Research Center of the Hellenic Mediterranean University, Heraklion, Greece
- Evangelia V. Avramidou
  - Institute of Mediterranean Forest Ecosystems, Hellenic Agricultural Organisation “DIMITRA”, Athens, Greece
- Filippos Ververidis
  - Laboratory of Biological and Biotechnological Applications, Department of Agriculture, School of Agricultural Sciences, Hellenic Mediterranean University, Heraklion, Greece
  - Institute of Agri-Food and Life Sciences, Research Center of the Hellenic Mediterranean University, Heraklion, Greece
|
12
|
Cristina Diaconu C, Madalina Pitica I, Chivu-Economescu M, Georgiana Necula L, Botezatu A, Virginia Iancu I, Iulia Neagu A, L. Radu E, Matei L, Maria Ruta S, Bleotu C. SARS-CoV-2 Variant Surveillance in Genomic Medicine Era. Infect Dis (Lond) 2023. [DOI: 10.5772/intechopen.107137] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/26/2024] Open
Abstract
In the genomic medicine era, the emergence of SARS-CoV-2 was immediately followed by viral genome sequencing and worldwide sequence sharing. Almost in real time, resources based on these sequences were developed and applied around the world, such as molecular diagnostic tests, informed public health decisions, and vaccines. Molecular SARS-CoV-2 variant surveillance was a normal approach in this context; yet, because modification of the viral genome occurs commonly during viral replication, the challenge is to identify the modifications that significantly affect virulence or transmissibility, reduce the effectiveness of vaccines and therapeutics, or cause diagnostic tests to fail. However, assessing the importance of newly emerging mutations and linking them to epidemiological trends is still a laborious process, and faster phenotypic evaluation approaches, in conjunction with genomic data, are required in order to release timely and efficient control measures.
|
13
|
Raiyemo DA, Bobadilla LK, Tranel PJ. Genomic profiling of dioecious Amaranthus species provides novel insights into species relatedness and sex genes. BMC Biol 2023; 21:37. [PMID: 36804015 PMCID: PMC9940365 DOI: 10.1186/s12915-023-01539-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/21/2022] [Accepted: 02/08/2023] [Indexed: 02/21/2023] Open
Abstract
BACKGROUND Amaranthus L. is a diverse genus consisting of domesticated, weedy, and non-invasive species distributed around the world. Nine species are dioecious, of which Amaranthus palmeri S. Watson and Amaranthus tuberculatus (Moq.) J.D. Sauer are troublesome weeds of agronomic crops in the USA and elsewhere. Shallow relationships among the dioecious Amaranthus species and the conservation of candidate genes within previously identified A. palmeri and A. tuberculatus male-specific regions of the Y (MSYs) in other dioecious species are poorly understood. In this study, seven genomes of dioecious amaranths were obtained by paired-end short-read sequencing and combined with short reads of seventeen species in the family Amaranthaceae from the NCBI database. The species were phylogenomically analyzed to understand their relatedness. Genome characteristics for the dioecious species were evaluated and coverage analysis was used to investigate the conservation of sequences within the MSY regions. RESULTS We provide genome size, heterozygosity, and ploidy level inference for seven newly sequenced dioecious Amaranthus species and two additional dioecious species from the NCBI database. We report a pattern of transposable element proliferation in the species, in which seven species had more Ty3 elements than copia elements while A. palmeri and A. watsonii had more copia elements than Ty3 elements, similar to the TE pattern in some monoecious amaranths. Using a Mash-based phylogenomic analysis, we accurately recovered taxonomic relationships among the dioecious Amaranthus species that were previously identified based on comparative morphology. Coverage analysis revealed eleven candidate gene models within the A. palmeri MSY region with male-enriched coverages, as well as regions on scaffold 19 with female-enriched coverage, based on A. watsonii read alignments. A previously reported FLOWERING LOCUS T (FT) within the A. tuberculatus MSY contig was also found to exhibit male-enriched coverages for three species closely related to A. tuberculatus but not for A. watsonii reads. Additional characterization of the A. palmeri MSY region revealed that 78% of the region is made of repetitive elements, typical of a sex determination region with reduced recombination. CONCLUSIONS The results of this study further increase our understanding of the relationships among the dioecious species of the Amaranthus genus and reveal genes with potential roles in sex function in the species.
Affiliation(s)
- Damilola A Raiyemo
  - Department of Crop Sciences, University of Illinois, Urbana, IL, 61801, USA
- Lucas K Bobadilla
  - Department of Crop Sciences, University of Illinois, Urbana, IL, 61801, USA
- Patrick J Tranel
  - Department of Crop Sciences, University of Illinois, Urbana, IL, 61801, USA.
|
14
|
González-Plaza JJ, Furlan C, Rijavec T, Lapanje A, Barros R, Tamayo-Ramos JA, Suarez-Diez M. Advances in experimental and computational methodologies for the study of microbial-surface interactions at different omics levels. Front Microbiol 2022; 13:1006946. [PMID: 36519168 PMCID: PMC9744117 DOI: 10.3389/fmicb.2022.1006946] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/29/2022] [Accepted: 11/02/2022] [Indexed: 08/31/2023] Open
Abstract
The study of the biological response of microbial cells interacting with natural and synthetic interfaces has acquired a new dimension with the development and constant progress of advanced omics technologies. New methods allow the isolation and analysis of nucleic acids, proteins and metabolites from complex samples, of interest in diverse research areas, such as materials sciences, biomedical sciences, forensic sciences, biotechnology and archeology, among others. The study of the bacterial recognition and response to surface contact or the diagnosis and evolution of ancient pathogens contained in archeological tissues require, in many cases, the availability of specialized methods and tools. The current review describes advances in in vitro and in silico approaches to tackle existing challenges (e.g., low-quality sample, low amount, presence of inhibitors, chelators, etc.) in the isolation of high-quality samples and in the analysis of microbial cells at genomic, transcriptomic, proteomic and metabolomic levels, when present in complex interfaces. From the experimental point of view, tailored manual and automatized methodologies, commercial and in-house developed protocols, are described. The computational level focuses on the discussion of novel tools and approaches designed to solve associated issues, such as sample contamination, low quality reads, low coverage, etc. Finally, approaches to obtain a systems level understanding of these complex interactions by integrating multi omics datasets are presented.
Affiliation(s)
- Juan José González-Plaza
  - International Research Centre in Critical Raw Materials-ICCRAM, University of Burgos, Burgos, Spain
- Cristina Furlan
  - Laboratory of Systems and Synthetic Biology, Wageningen University and Research, Wageningen, Netherlands
- Tomaž Rijavec
  - Department of Environmental Sciences, Jožef Stefan Institute, Ljubljana, Slovenia
- Aleš Lapanje
  - Department of Environmental Sciences, Jožef Stefan Institute, Ljubljana, Slovenia
- Rocío Barros
  - International Research Centre in Critical Raw Materials-ICCRAM, University of Burgos, Burgos, Spain
- Maria Suarez-Diez
  - Laboratory of Systems and Synthetic Biology, Wageningen University and Research, Wageningen, Netherlands
|
15
|
Characterization, Comparison of Two New Mitogenomes of Crocodile Newts Tylototriton (Caudata: Salamandridae), and Phylogenetic Implications. Genes (Basel) 2022; 13:genes13101878. [PMID: 36292763 PMCID: PMC9601590 DOI: 10.3390/genes13101878] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/05/2022] [Revised: 10/12/2022] [Accepted: 10/14/2022] [Indexed: 11/17/2022] Open
Abstract
Mitochondrial genomes (mitogenomes) are valuable resources in molecular and evolutionary studies, such as phylogeny and population genetics. The complete mitogenomes of two crocodile newts, Tylototriton broadoridgus and Tylototriton gaowangjienensis, were sequenced, assembled, and annotated for the first time using next-generation sequencing. The complete mitogenomes of T. broadoridgus and T. gaowangjienensis were 16,265 bp and 16,259 bp in length, respectively, and both comprised 13 protein-coding genes (PCGs), 2 rRNA genes, 22 tRNA genes, and 1 control region. The two mitogenomes had high A + T content with positive AT-skew and negative GC-skew patterns. The ratio of non-synonymous to synonymous substitutions showed that, relatively, the ATP8 gene evolved the fastest and COI evolved the slowest among the 13 PCGs. Phylogenetic trees from BI and ML analyses resulted in identical topologies, in which Tylototriton split into two groups corresponding to two subgenera. Both T. broadoridgus and T. gaowangjienensis sequenced here belonged to the subgenus Yaotriton, and these two species shared a tentative sister group relationship. The two mitogenomes reported in this study provide valuable data for future molecular and evolutionary studies of the genus Tylototriton and other salamanders.
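As a brief illustration of the skew statistics mentioned in this abstract, the sketch below uses the commonly cited definitions AT-skew = (A − T)/(A + T) and GC-skew = (G − C)/(G + C), computed on a single strand. The function name `skews` and the toy sequence are hypothetical and are not taken from the cited study; the authors' own software and settings are not stated here.

```python
from collections import Counter


def skews(seq: str) -> tuple[float, float]:
    """Return (AT-skew, GC-skew) for one strand of a DNA sequence.

    Assumes the sequence contains at least one A or T and one G or C;
    otherwise the divisions below raise ZeroDivisionError.
    """
    counts = Counter(seq.upper())
    a, t, g, c = (counts[base] for base in "ATGC")
    at_skew = (a - t) / (a + t)  # positive when A outnumbers T
    gc_skew = (g - c) / (g + c)  # negative when C outnumbers G
    return at_skew, gc_skew


# Toy sequence (hypothetical, not from either mitogenome): 4 A, 2 T, 1 G, 3 C
at, gc = skews("AAAATTGCCC")
print(f"AT-skew = {at:+.3f}, GC-skew = {gc:+.3f}")
```

For the toy input, the AT-skew is positive and the GC-skew is negative, which is the same sign pattern the abstract reports for the two mitogenomes.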
|
16
|
Turudić A, Liber Z, Grdiša M, Jakše J, Varga F, Šatović Z. Chloroplast Genome Annotation Tools: Prolegomena to the Identification of Inverted Repeats. Int J Mol Sci 2022; 23:10804. [PMID: 36142721 PMCID: PMC9503105 DOI: 10.3390/ijms231810804] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 07/08/2022] [Revised: 09/01/2022] [Accepted: 09/13/2022] [Indexed: 12/31/2022] Open
Abstract
The development of next-generation sequencing technology and the increasing amount of sequencing data have brought the bioinformatic tools used in genome assembly into focus. The final step of the process is genome annotation, which works on assembled genome sequences to identify the location of genome features. In the case of organelle genomes, specialized annotation tools are used to identify organelle genes and structural features. Numerous annotation tools target chloroplast sequences. Most chloroplast DNA genomes have a quadripartite structure caused by two copies of a large inverted repeat. We investigated the strategies of six annotation tools (Chloë, Chloroplot, GeSeq, ORG.Annotate, PGA, Plann) for identifying inverted repeats and analyzed their success using publicly available complete chloroplast sequences of taxa belonging to the asterid and rosid clades. The annotation tools use two different approaches to identify inverted repeats, using existing general search tools or implementing stand-alone solutions. The chloroplast sequences studied show that there are different types of imperfections in the assembled data and that each tool performs better on some sequences than the others.
Affiliation(s)
- Ante Turudić
  - Centre of Excellence for Biodiversity and Molecular Plant Breeding (CoE CroP-BioDiv), Svetošimunska cesta 25, 10000 Zagreb, Croatia
  - Faculty of Agriculture, University of Zagreb, Svetošimunska cesta 25, 10000 Zagreb, Croatia
- Zlatko Liber
  - Faculty of Agriculture, University of Zagreb, Svetošimunska cesta 25, 10000 Zagreb, Croatia
  - Faculty of Science, University of Zagreb, Marulićev trg 9a, 10000 Zagreb, Croatia
- Martina Grdiša
  - Centre of Excellence for Biodiversity and Molecular Plant Breeding (CoE CroP-BioDiv), Svetošimunska cesta 25, 10000 Zagreb, Croatia
  - Faculty of Agriculture, University of Zagreb, Svetošimunska cesta 25, 10000 Zagreb, Croatia
- Jernej Jakše
  - Biotechnical Faculty, University of Ljubljana, Jamnikarjeva 101, 1000 Ljubljana, Slovenia
- Filip Varga
  - Centre of Excellence for Biodiversity and Molecular Plant Breeding (CoE CroP-BioDiv), Svetošimunska cesta 25, 10000 Zagreb, Croatia
  - Faculty of Agriculture, University of Zagreb, Svetošimunska cesta 25, 10000 Zagreb, Croatia
- Zlatko Šatović
  - Centre of Excellence for Biodiversity and Molecular Plant Breeding (CoE CroP-BioDiv), Svetošimunska cesta 25, 10000 Zagreb, Croatia
  - Faculty of Agriculture, University of Zagreb, Svetošimunska cesta 25, 10000 Zagreb, Croatia
|
17
|
da Silva Moia G, Sérgio Cruz Gaia A, de Oliveira MS, Dos Santosa VC, Thyeska Castro Alves J, de Sá PHCG, de Oliveira Veras AA. ReNoteWeb - Web platform for the improvement of assembly result and annotation of prokaryotic genomes. Gene 2022; 844:146819. [PMID: 36029977 DOI: 10.1016/j.gene.2022.146819] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2021] [Revised: 07/25/2022] [Accepted: 08/12/2022] [Indexed: 11/04/2022]
Abstract
The reduction in the cost of DNA sequencing and the total time to perform this process has resulted in a significant increase in the deposit of biological information in public databases such as the NCBI (National Center for Biotechnology Information). The production of large volumes of data per run has culminated in the need to develop algorithms capable of handling data with this new feature and assisting in analyses such as the assembly and annotation of prokaryotic genomes. Over the years, several pipelines and computational tools have been developed to automate this task and consequently reduce the total time needed to determine the genetic content of a given organism, especially non-model organisms, contributing to the identification of possible targets with biotechnological applicability. In the case of automatic annotation tools, the accuracy of the results is widely observed in the literature; however, this does not exclude the manual curation process, where the information inferred in the automatic process is verified and enriched by the curators. This task requires a time which is directly proportional to the number of gene products of the target organism under study. To assist in this process, we present the ReNoteWeb web tool, endowed with a simple and intuitive interface, to perform the assembly enhancement process, with the possibility of identifying the missing products in the original genomic sequence. In addition, ReNoteWeb is capable of performing the annotation process for all products, based on information obtained from highly accurate external databases. The engine responsible for performing the data processing was developed in Java and the web platform uses the resources of the Yii framework. The annotation produced by this platform aims to reduce the overall time in the manual curation process. Twenty-three organisms were used to validate the tool. The efficiency was verified by comparing the annotation of these same organisms available in the NCBI database and the annotation performed on the RAST platform. The tool is available at: http://biod.ufpa.br/renoteweb/.
Affiliation(s)
- Gislenne da Silva Moia
  - Faculty of Computer Engineering, Federal University of Pará campus Tucuruí (CAMTUC-UFPA), Pará, Brazil
- Mônica Silva de Oliveira
  - Nucleus of Amazonian Development in Engineering, Federal University of Pará campus Tucuruí (NDAE-UFPA), Pará, Brazil
|
18
|
Cuesta-Morrondo S, Redondo C, Palacio-Bielsa A, Garita-Cambronero J, Cubero J. Complete Genome Sequence Resources of Six Strains of the Most Virulent Pathovars of Xanthomonas arboricola Using Long- and Short-Read Sequencing Approaches. PHYTOPATHOLOGY 2022; 112:1808-1813. [PMID: 35522570 DOI: 10.1094/phyto-10-21-0436-a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Affiliation(s)
- Sara Cuesta-Morrondo
  - Departamento de Protección Vegetal, Laboratorio Bacteriología, Centro Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA/CSIC), Madrid 28040, Spain
  - Departamento de Biotecnología-Biología Vegetal, Escuela Técnica Superior de Ingeniería Agronómica, Alimentaria y de Biosistemas, Universidad Politécnica de Madrid, 28040 Madrid, Spain
- Cristina Redondo
  - Departamento de Protección Vegetal, Laboratorio Bacteriología, Centro Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA/CSIC), Madrid 28040, Spain
- Ana Palacio-Bielsa
  - Departamento de Sistemas Agrícolas, Forestales y Medio Ambiente, Centro de Investigación y Tecnología Agroalimentaria de Aragón, Instituto Agroalimentario de Aragón-IA2 (CITA-Universidad de Zaragoza), Avda. Montañana 930, 50059, Zaragoza, Spain
- Jaime Cubero
  - Departamento de Protección Vegetal, Laboratorio Bacteriología, Centro Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA/CSIC), Madrid 28040, Spain
|
19
|
Rossi N, Colautti A, Iacumin L, Piazza C. WGA-LP: a pipeline for whole genome assembly of contaminated reads. Bioinformatics 2022; 38:846-848. [PMID: 34668528 DOI: 10.1093/bioinformatics/btab719] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/27/2021] [Revised: 09/22/2021] [Accepted: 10/15/2021] [Indexed: 02/03/2023] Open
Abstract
SUMMARY Whole genome assembly (WGA) of bacterial genomes with short reads is a common task, as DNA sequencing has become cheaper with advances in the technology. The process of assembling a genome has no absolute gold standard and requires performing a sequence of steps, each of which can involve combinations of many different tools. However, the quality of the final assembly is always strongly related to the quality of the input data. With this in mind, we built WGA-LP, a package that connects state-of-the-art programs for microbial analysis and novel scripts to check and improve the quality of both samples and resulting assemblies. WGA-LP, with its conservative decontamination approach, has been shown to be capable of creating high-quality assemblies even in the case of contaminated reads. AVAILABILITY AND IMPLEMENTATION WGA-LP is available on GitHub (https://github.com/redsnic/WGA-LP) and Docker Hub (https://hub.docker.com/r/redsnic/wgalp). The web app for node visualization is hosted by shinyapps.io (https://redsnic.shinyapps.io/ContigCoverageVisualizer/). SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- N Rossi
  - Department of Mathematics, Computer Science, and Physics, University of Udine, 33100 Udine, Italy
- A Colautti
  - Dipartimento di Scienze Agroalimentari, Ambientali e Animali, University of Udine, 33100 Udine, Italy
- L Iacumin
  - Dipartimento di Scienze Agroalimentari, Ambientali e Animali, University of Udine, 33100 Udine, Italy
- C Piazza
  - Department of Mathematics, Computer Science, and Physics, University of Udine, 33100 Udine, Italy
|
20
|
Hurgobin B. Annotation of Protein-Coding Genes in Plant Genomes. Methods Mol Biol 2022; 2443:309-326. [PMID: 35037214 DOI: 10.1007/978-1-0716-2067-0_17] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Advances in next-generation sequencing technologies and lower sequencing costs are paving the way to more plant genome sequencing, assembly, and annotation projects. While genome assembly is the first step toward elucidating the genome structure of a species, it is the annotation of the protein-coding genes that provides meaningful information to biologists. However, genome annotation is not a trivial task. Therefore, the aim of this chapter is to provide a detailed view of this important process, including tools and commands that can be used to carry it out.
Affiliation(s)
- Bhavna Hurgobin
  - La Trobe Institute for Agriculture and Food, Department of Animal, Plant and Soil Sciences, School of Life Sciences, AgriBio Building, La Trobe University, Bundoora, VIC, Australia.
  - Australian Research Council Research Hub for Medicinal Agriculture, AgriBio Building, La Trobe University, Bundoora, VIC, Australia.
|
21
|
Vlasova A, Hermoso Pulido T, Camara F, Ponomarenko J, Guigó R. FA-nf: A Functional Annotation Pipeline for Proteins from Non-Model Organisms Implemented in Nextflow. Genes (Basel) 2021; 12:genes12101645. [PMID: 34681040 PMCID: PMC8535801 DOI: 10.3390/genes12101645] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/24/2021] [Revised: 10/12/2021] [Accepted: 10/14/2021] [Indexed: 11/17/2022] Open
Abstract
Functional annotation allows adding biologically relevant information to predicted features in genomic sequences, and it is, therefore, an important procedure of any de novo genome sequencing project. It is also useful for proofreading and improving gene structural annotation. Here, we introduce FA-nf, a pipeline implemented in Nextflow, a versatile computational workflow management engine. The pipeline integrates different annotation approaches, such as NCBI BLAST+, DIAMOND, InterProScan, and KEGG. It starts from a protein sequence FASTA file and, optionally, a structural annotation file in GFF format, and produces several files, such as GO assignments, output summaries of the abovementioned programs and final annotation reports. The pipeline can be broken easily into smaller processes for the purpose of parallelization and easily deployed in a Linux computational environment, thanks to software containerization, thus helping to ensure full reproducibility.
Affiliation(s)
- Anna Vlasova
  - Barcelona Supercomputing Centre (BSC-CNS), Jordi Girona, 29, 08034 Barcelona, Spain
  - Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Baldiri Reixac, 10, 08028 Barcelona, Spain
- Toni Hermoso Pulido
  - Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Dr. Aiguader 88, 08003 Barcelona, Spain
  - Correspondence:
- Francisco Camara
  - Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Dr. Aiguader 88, 08003 Barcelona, Spain
- Julia Ponomarenko
  - Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Dr. Aiguader 88, 08003 Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain
- Roderic Guigó
  - Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Dr. Aiguader 88, 08003 Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain
|
22
|
Manual Annotation Studio (MAS): a collaborative platform for manual functional annotation of viral and microbial genomes. BMC Genomics 2021; 22:733. [PMID: 34627149 PMCID: PMC8501643 DOI: 10.1186/s12864-021-08029-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/23/2021] [Accepted: 09/22/2021] [Indexed: 11/10/2022] Open
Abstract
Background Functional genome annotation is the process of labelling functional genomic regions with descriptive information. Manual curation can produce higher quality genome annotations than fully automated methods. Manual annotation efforts are time-consuming and complex; however, software can help reduce these drawbacks. Results We created Manual Annotation Studio (MAS) to improve the efficiency of manual functional annotation of prokaryotic and viral genomes. MAS allows users to upload unannotated genomes, provides an interface to edit and upload annotations, tracks annotation history and progress, and saves data to a relational database. MAS provides users with pertinent information through a simple point-and-click interface to execute and visualize results for multiple homology search tools (blastp, rpsblast, and HHsearch) against multiple databases (Swiss-Prot, nr, CDD, PDB, and an internally generated database). MAS was designed to accept connections over the local area network (LAN) of a lab or organization so multiple users can access it simultaneously. MAS can take advantage of high-performance computing (HPC) clusters by interfacing with SGE or SLURM, and data can be exported from MAS in a variety of formats (FASTA, GenBank, GFF, and Excel). Conclusions MAS streamlines and provides structure to manual functional annotation projects. MAS enhances the ability of users to generate, interpret, and compare results from multiple tools. The structure that MAS provides can improve project organization and reduce annotation errors. MAS is ideal for team-based annotation projects because it facilitates collaboration. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-021-08029-8.
|
23
|
Johnson LK, Sahasrabudhe R, Gill JA, Roach JL, Froenicke L, Brown CT, Whitehead A. Draft genome assemblies using sequencing reads from Oxford Nanopore Technology and Illumina platforms for four species of North American Fundulus killifish. Gigascience 2021; 9:5859380. [PMID: 32556169 PMCID: PMC7301629 DOI: 10.1093/gigascience/giaa067] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 10/04/2019] [Revised: 04/16/2020] [Accepted: 05/27/2020] [Indexed: 01/04/2023] Open
Abstract
BACKGROUND Whole-genome sequencing data from wild-caught individuals of closely related North American killifish species (Fundulus xenicus, Fundulus catenatus, Fundulus nottii, and Fundulus olivaceus) were obtained using long-read Oxford Nanopore Technology (ONT) PromethION and short-read Illumina platforms. FINDINGS Draft de novo reference genome assemblies were generated using a combination of long and short sequencing reads. For each species, the PromethION platform was used to generate 30-45× sequence coverage, and the Illumina platform was used to generate 50-160× sequence coverage. Illumina-only assemblies were fragmented with high numbers of contigs, while ONT-only assemblies were error prone with low BUSCO scores. The highest N50 values, ranging from 0.4 to 2.7 Mb, were from assemblies generated using a combination of short- and long-read data. BUSCO scores were consistently >90% complete using the Eukaryota database. CONCLUSIONS High-quality genomes can be obtained by using short-read Illumina data to polish assemblies generated with long-read ONT data. Draft assemblies and raw sequencing data are available for public use. We encourage use and reuse of these data for assembly benchmarking and other analyses.
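Since this abstract compares assemblies by N50, a minimal sketch of how that statistic is usually computed may help: N50 is the length of the contig at which the running total of contig lengths, sorted from longest to shortest, first reaches half of the total assembly length. The function name `n50` and the toy lengths are hypothetical; the abstract does not state which tool the authors used to compute their values.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a list of contig lengths (in bp)."""
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("N50 is undefined for an empty list of contigs")


# Toy contig lengths (hypothetical): total 290, half 145.
# Cumulative sums from the longest contig: 80, 150 -> N50 is 70.
print(n50([80, 70, 50, 40, 30, 20]))
```

A fragmented assembly with many short contigs yields a small N50, which is why the Illumina-only assemblies described above would score lower on this measure than the combined assemblies.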
Affiliation(s)
- Lisa K Johnson
  - Department of Environmental Toxicology, University of California. 1 Shields Avenue, Davis, CA 95616, Davis, CA, USA
  - Department of Population Health & Reproduction, School of Veterinary Medicine, University of California. 1 Shields Avenue, Davis, CA 95616, Davis, CA, USA
- Ruta Sahasrabudhe
  - DNA Technologies Core, Genome Center, University of California, 1 Shields Avenue, Davis, CA 95616
- James Anthony Gill
  - Department of Environmental Toxicology, University of California. 1 Shields Avenue, Davis, CA 95616, Davis, CA, USA
- Jennifer L Roach
  - Department of Environmental Toxicology, University of California. 1 Shields Avenue, Davis, CA 95616, Davis, CA, USA
- Lutz Froenicke
  - DNA Technologies Core, Genome Center, University of California, 1 Shields Avenue, Davis, CA 95616
- C Titus Brown
  - Department of Population Health & Reproduction, School of Veterinary Medicine, University of California. 1 Shields Avenue, Davis, CA 95616, Davis, CA, USA
- Andrew Whitehead
  - Correspondence address. Andrew Whitehead, Department of Environmental Toxicology, University of California. 1 Shields Avenue, Davis, CA 95616, USA, Davis, CA, USA. E-mail:
|
24
|
Tsai H, Kippes N, Firl A, Lieberman M, Comai L, Henry IM. Efficient construction of a linkage map and haplotypes for Mentha suaveolens using sequence capture. G3-GENES GENOMES GENETICS 2021; 11:6321234. [PMID: 34544134 PMCID: PMC8496254 DOI: 10.1093/g3journal/jkab232] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2021] [Accepted: 06/25/2021] [Indexed: 11/12/2022]
Abstract
The sustainability of many crops is hindered by the lack of genomic resources and a poor understanding of natural genetic diversity. Particularly, application of modern breeding requires high-density linkage maps that are integrated into a highly contiguous reference genome. Here, we present a rapid method for deriving haplotypes and developing linkage maps, and its application to Mentha suaveolens, one of the diploid progenitors of cultivated mints. Using sequence-capture via DNA hybridization to target single nucleotide polymorphisms (SNPs), we successfully genotyped ∼5000 SNPs within the genome of >400 individuals derived from a self cross. After stringent quality control, and identification of nonredundant SNPs, 1919 informative SNPs were retained for linkage map construction. The resulting linkage map defined a total genetic space of 942.17 cM divided among 12 linkage groups, ranging from 56.32 to 122.61 cM in length. The linkage map is in good agreement with pseudomolecules from our preliminary genome assembly, proving this resource effective for the correction and validation of the reference genome. We discuss the advantages of this method for the rapid creation of linkage maps.
Affiliation(s)
- Helen Tsai
- Department of Plant Biology and Genome Center, University of California, Davis, Davis, CA 95616, USA
- Nestor Kippes
- Department of Plant Biology and Genome Center, University of California, Davis, Davis, CA 95616, USA
- Alana Firl
- Department of Plant Biology and Genome Center, University of California, Davis, Davis, CA 95616, USA
- Meric Lieberman
- Department of Plant Biology and Genome Center, University of California, Davis, Davis, CA 95616, USA
- Luca Comai
- Department of Plant Biology and Genome Center, University of California, Davis, Davis, CA 95616, USA
- Isabelle M Henry
- Department of Plant Biology and Genome Center, University of California, Davis, Davis, CA 95616, USA
|
25
|
Yao R, Heinrich M, Wei J, Xiao P. Cross-Cultural Ethnobotanical Assembly as a New Tool for Understanding Medicinal and Culinary Values-The Genus Lycium as A Case Study. Front Pharmacol 2021; 12:708518. [PMID: 34335270 PMCID: PMC8322658 DOI: 10.3389/fphar.2021.708518] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/12/2021] [Accepted: 07/07/2021] [Indexed: 11/22/2022] Open
Abstract
Ethnobotanical knowledge is indispensable for the conservation of global biological integrity, and could provide irreplaceable clues for bioprospecting aiming at new food crops and medicines. This biocultural diversity requires a comprehensive documentation of such intellectual knowledge at local levels. However, without systematically capturing the data, those regional records are fragmented and can hardly be used. In this study, we develop a framework to assemble the cross-cultural ethnobotanical knowledge at a genus level, including capturing the species’ diversity and their cultural importance, integrating their traditional uses, and revealing the intercultural relationship of ethnobotanical data quantitatively. Using such a cross-cultural ethnobotanical assembly, the medicinal and culinary values of the genus Lycium are evaluated. Simultaneously, the analysis highlights the problems and options for a systematic cross-cultural ethnobotanical knowledge assembly. The framework used here could generate baseline data relevant for conservation and sustainable use of plant diversity as well as for bioprospecting within targeting taxa.
Affiliation(s)
- Ruyu Yao
- Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Michael Heinrich
- Research Group "Pharmacognosy and Phytotherapy", UCL School of Pharmacy, Univ. London, London, United Kingdom
- Jianhe Wei
- Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Peigen Xiao
- Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
|
26
|
Saenko SV, Groenenberg DSJ, Davison A, Schilthuizen M. The draft genome sequence of the grove snail Cepaea nemoralis. G3-GENES GENOMES GENETICS 2021; 11:6080775. [PMID: 33604668 PMCID: PMC8022989 DOI: 10.1093/g3journal/jkaa071] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 09/17/2020] [Accepted: 12/22/2020] [Indexed: 12/14/2022]
Abstract
Studies on the shell color and banding polymorphism of the grove snail Cepaea nemoralis and the sister taxon Cepaea hortensis have provided compelling evidence for the fundamental role of natural selection in promoting and maintaining intraspecific variation. More recently, Cepaea has been the focus of citizen science projects on shell color evolution in relation to climate change and urbanization. C. nemoralis is particularly useful for studies on the genetics of shell polymorphism and the evolution of "supergenes," as well as evo-devo studies of shell biomineralization, because it is relatively easily maintained in captivity. However, an absence of genomic resources for C. nemoralis has generally hindered detailed genetic and molecular investigations. We therefore generated ∼23× coverage long-read data for the ∼3.5 Gb genome, and produced a draft assembly composed of 28,537 contigs with the N50 length of 333 kb. Genome completeness, estimated by BUSCO using the metazoa dataset, was 91%. Repetitive regions cover over 77% of the genome. A total of 43,519 protein-coding genes were predicted in the assembled genome, and 97.3% of these were functionally annotated from either sequence homology or protein signature searches. This first assembled and annotated genome sequence for a helicoid snail, a large group that includes edible species, agricultural pests, and parasite hosts, will be a core resource for identifying the loci that determine the shell polymorphism, as well as in a wide range of analyses in evolutionary and developmental biology, and snail biology in general.
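The coverage figure in this abstract can be related to sequencing yield with simple arithmetic: mean coverage is the total number of sequenced bases divided by the genome size. The sketch below works through the abstract's approximate numbers (about 23× over an approximately 3.5 Gb genome); the function names are illustrative assumptions, and the resulting yield is only an estimate derived from those rounded figures, not a value reported by the authors.

```python
def bases_needed(genome_size_bp, coverage):
    """Total sequenced bases required for a target mean coverage."""
    return genome_size_bp * coverage


def coverage_from_yield(total_bases, genome_size_bp):
    """Mean coverage implied by a given sequencing yield."""
    return total_bases / genome_size_bp


# Rounded figures from the abstract: ~3.5 Gb genome, ~23x long-read coverage.
genome_size = 3_500_000_000
target_coverage = 23

required = bases_needed(genome_size, target_coverage)
print(f"{required / 1e9:.1f} Gb of reads")  # 80.5 Gb of reads
print(coverage_from_yield(required, genome_size))
```

The same two functions can be run in reverse when planning a project: given a quote for a flow cell's expected yield, `coverage_from_yield` indicates whether the target depth is within reach.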
Affiliation(s)
- Suzanne V Saenko
- Evolutionary Ecology, Naturalis Biodiversity Center, Leiden 2333CR, the Netherlands; Animal Sciences, Institute of Biology Leiden, Leiden University, Leiden 2333BE, the Netherlands
- Dick S J Groenenberg
- Evolutionary Ecology, Naturalis Biodiversity Center, Leiden 2333CR, the Netherlands
- Angus Davison
- School of Life Sciences, University of Nottingham, Nottingham NG7 2RD, UK
- Menno Schilthuizen
- Evolutionary Ecology, Naturalis Biodiversity Center, Leiden 2333CR, the Netherlands; Animal Sciences, Institute of Biology Leiden, Leiden University, Leiden 2333BE, the Netherlands
|
27
|
Generalovic TN, McCarthy SA, Warren IA, Wood JMD, Torrance J, Sims Y, Quail M, Howe K, Pipan M, Durbin R, Jiggins CD. A high-quality, chromosome-level genome assembly of the Black Soldier Fly (Hermetia illucens L.). G3 (BETHESDA, MD.) 2021; 11:jkab085. [PMID: 33734373 PMCID: PMC8104945 DOI: 10.1093/g3journal/jkab085] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Received: 12/10/2020] [Accepted: 03/09/2021] [Indexed: 01/15/2023]
Abstract
Hermetia illucens L. (Diptera: Stratiomyidae), the Black Soldier Fly (BSF), is an increasingly important species for bioconversion of organic material into animal feed. We generated a high-quality chromosome-scale genome assembly of the BSF using Pacific Biosciences, 10X Genomics linked-read, and high-throughput chromosome conformation capture (Hi-C) sequencing technologies. Scaffolding the final assembly with Hi-C data produced a highly contiguous 1.01 Gb genome with 99.75% of scaffolds assembled into pseudochromosomes representing seven chromosomes with 16.01 Mb contig and 180.46 Mb scaffold N50 values. The highly complete genome obtained a Benchmarking Universal Single-Copy Orthologs (BUSCO) completeness of 98.6%. We masked 67.32% of the genome as repetitive sequences and annotated a total of 16,478 protein-coding genes using the BRAKER2 pipeline. We analyzed an established lab population to investigate the genomic variation and architecture of the BSF, revealing six autosomes and an X chromosome. Additionally, we estimated the inbreeding coefficient (1.9%) of the lab population by assessing runs of homozygosity. This provided evidence for inbreeding events, including long runs of homozygosity on chromosome 5. The release of this novel chromosome-scale BSF genome assembly will provide an improved resource for further genomic studies, functional characterization of genes of interest and genetic modification of this economically important species.
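The abstract reports an inbreeding coefficient estimated from runs of homozygosity (ROH) but does not state the formula. One widely used estimator, often written F_ROH, divides the summed length of ROH segments by the length of the genome surveyed. The sketch below assumes that estimator; the function name `f_roh`, the segment coordinates, and the genome length are hypothetical and are not data from the study.

```python
def f_roh(roh_segments, genome_length):
    """Fraction of the surveyed genome lying in runs of homozygosity.

    roh_segments: iterable of (start, end) coordinates in bp,
                  assumed non-overlapping and half-open.
    genome_length: length in bp of the genome surveyed.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    total_roh = sum(end - start for start, end in roh_segments)
    return total_roh / genome_length


# Hypothetical ROH calls on a hypothetical 100 Mb genome.
segments = [(0, 1_500_000), (4_000_000, 4_500_000)]
print(f_roh(segments, 100_000_000))
```

In practice the segments would come from an ROH caller, and the result is sensitive to the minimum segment length that the caller is configured to report.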
Affiliation(s)
- Shane A McCarthy
- Wellcome Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Department of Genetics, University of Cambridge, Cambridge CB2 3EH, UK
- Ian A Warren
- Department of Zoology, University of Cambridge, Cambridge CB2 3EJ, UK
- Jonathan M D Wood
- Wellcome Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- James Torrance
- Wellcome Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Ying Sims
- Wellcome Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Michael Quail
- Wellcome Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Kerstin Howe
- Wellcome Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Miha Pipan
- Better Origin, Entomics Biosystems Limited, Cambridge CB3 0ES, UK
- Richard Durbin
- Wellcome Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Department of Genetics, University of Cambridge, Cambridge CB2 3EH, UK
- Chris D Jiggins
- Department of Zoology, University of Cambridge, Cambridge CB2 3EJ, UK
|
28
|
Banerjee S, Bhandary P, Woodhouse M, Sen TZ, Wise RP, Andorf CM. FINDER: an automated software package to annotate eukaryotic genes from RNA-Seq data and associated protein sequences. BMC Bioinformatics 2021; 22:205. [PMID: 33879057 PMCID: PMC8056616 DOI: 10.1186/s12859-021-04120-9] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 02/05/2021] [Accepted: 04/07/2021] [Indexed: 12/23/2022] Open
Abstract
BACKGROUND Gene annotation in eukaryotes is a non-trivial task that requires meticulous analysis of accumulated transcript data. Challenges include transcriptionally active regions of the genome that contain overlapping genes, genes that produce numerous transcripts, transposable elements and numerous diverse sequence repeats. Currently available gene annotation software applications depend on pre-constructed full-length gene sequence assemblies which are not guaranteed to be error-free. The origins of these sequences are often uncertain, making it difficult to identify and rectify errors in them. This hinders the creation of an accurate and holistic representation of the transcriptomic landscape across multiple tissue types and experimental conditions. Therefore, to gauge the extent of diversity in gene structures, a comprehensive analysis of genome-wide expression data is imperative. RESULTS We present FINDER, a fully automated computational tool that optimizes the entire process of annotating genes and transcript structures. Unlike current state-of-the-art pipelines, FINDER automates the RNA-Seq pre-processing step by working directly with raw sequence reads and optimizes gene prediction from BRAKER2 by supplementing these reads with associated proteins. The FINDER pipeline (1) reports transcripts and recognizes genes that are expressed under specific conditions, (2) generates all possible alternatively spliced transcripts from expressed RNA-Seq data, (3) analyzes read coverage patterns to modify existing transcript models and create new ones, and (4) scores genes as high- or low-confidence based on the available evidence across multiple datasets. We demonstrate the ability of FINDER to automatically annotate a diverse pool of genomes from eight species. CONCLUSIONS FINDER takes a completely automated approach to annotate genes directly from raw expression data. 
It is capable of processing eukaryotic genomes of all sizes and requires no manual supervision, which makes it well suited to bench researchers with limited experience in handling computational tools.
Affiliation(s)
- Sagnik Banerjee
- Program in Bioinformatics and Computational Biology, Iowa State University, Ames, IA, 50011, USA
- Department of Statistics, Iowa State University, Ames, IA, 50011, USA
- Priyanka Bhandary
- Program in Bioinformatics and Computational Biology, Iowa State University, Ames, IA, 50011, USA
- Department of Genetics, Developmental and Cell Biology, Iowa State University, Ames, IA, 50011, USA
- Margaret Woodhouse
- Corn Insects and Crop Genetics Research Unit, USDA-Agricultural Research Service, Ames, IA, 50011, USA
- Taner Z Sen
- Crop Improvement and Genetics Research Unit, USDA-Agricultural Research Service, Albany, CA, 94710, USA
- Roger P Wise
- Corn Insects and Crop Genetics Research Unit, USDA-Agricultural Research Service, Ames, IA, 50011, USA
- Department of Plant Pathology and Microbiology, Iowa State University, Ames, IA, 50011, USA
- Carson M Andorf
- Corn Insects and Crop Genetics Research Unit, USDA-Agricultural Research Service, Ames, IA, 50011, USA
- Department of Computer Science, Iowa State University, Ames, IA, 50011, USA
|
29
|
Rodriguez-Anaya LZ, Félix-Sastré ÁJ, Lares-Villa F, Lares-Jiménez LF, Gonzalez-Galaviz JR. Application of the omics sciences to the study of Naegleria fowleri, Acanthamoeba spp., and Balamuthia mandrillaris: current status and future projections. Parasite 2021; 28:36. [PMID: 33843581 PMCID: PMC8040595 DOI: 10.1051/parasite/2021033] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/06/2020] [Accepted: 03/15/2021] [Indexed: 12/12/2022] Open
Abstract
In this review, we focus on the sequenced genomes of the pathogens Naegleria fowleri, Acanthamoeba spp. and Balamuthia mandrillaris, and the remarkable discoveries regarding the pathogenicity and genetic information of these organisms, using techniques related to the various omics branches like genomics, transcriptomics, and proteomics. Currently, novel data produced through comparative genomics analyses and both differential gene and protein expression in these free-living amoebas have allowed for breakthroughs to identify genes unique to N. fowleri, genes with active transcriptional activity, and their differential expression in conditions of modified virulence. Furthermore, orthologous genes of the various nuclear genomes within the Naegleria and Acanthamoeba genera have been clustered. The proteome of B. mandrillaris has been reconstructed through transcriptome data, and its mitochondrial genome structure has been thoroughly described with a unique characteristic that has come to light: a type I intron with the capacity of interrupting genes through its self-splicing ribozymes activity. With the integration of data derived from the diverse omic sciences, there is a potential approximation that reflects the molecular complexity required for the identification of virulence factors, as well as crucial information regarding the comprehension of the molecular mechanisms with which these interact. Altogether, these breakthroughs could contribute to radical advances in both the fields of therapy design and medical diagnosis in the foreseeable future.
Affiliation(s)
- Ángel Josué Félix-Sastré
- Departamento de Biotecnología y Ciencias Alimentarias, Instituto Tecnológico de Sonora Ciudad Obregón 85000 Sonora México
- Fernando Lares-Villa
- Departamento de Ciencias Agronómicas y Veterinarias, Instituto Tecnológico de Sonora Ciudad Obregón 85000 Sonora México
- Luis Fernando Lares-Jiménez
- Departamento de Ciencias Agronómicas y Veterinarias, Instituto Tecnológico de Sonora Ciudad Obregón 85000 Sonora México
|
30
|
Challenges of automation and scale: Bioinformatics and the evaluation of proteins to support genetically modified product safety assessments. J Invertebr Pathol 2021; 186:107587. [PMID: 33838205 DOI: 10.1016/j.jip.2021.107587] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 07/08/2020] [Revised: 03/25/2021] [Accepted: 03/30/2021] [Indexed: 11/24/2022]
Abstract
Bioinformatic analyses of protein sequences play an important role in the discovery and subsequent safety assessment of insect control proteins in Genetically Modified (GM) crops. Due to the rapid adoption of high-throughput sequencing methods over the last decade, the number of protein sequences in GenBank and other public databases has increased dramatically. Many of these protein sequences are the product of whole genome sequencing efforts, coupled with automated protein sequence prediction and annotation pipelines. Published genome sequencing studies provide a rich and expanding foundation of new source organisms and proteins for insect control or other desirable traits in GM products. However, data generated by automated pipelines can also confound regulatory safety assessments that employ bioinformatics. Largely this issue does not arise due to underlying sequence, but rather its annotation or associated metadata, and the downstream integration of that data into existing repositories. Observations made during bioinformatic safety assessments are described.
|
31
|
Chiara M, D’Erchia AM, Gissi C, Manzari C, Parisi A, Resta N, Zambelli F, Picardi E, Pavesi G, Horner DS, Pesole G. Next generation sequencing of SARS-CoV-2 genomes: challenges, applications and opportunities. Brief Bioinform 2021; 22:616-630. [PMID: 33279989 PMCID: PMC7799330 DOI: 10.1093/bib/bbaa297] [Citation(s) in RCA: 138] [Impact Index Per Article: 34.5] [Received: 08/08/2020] [Revised: 09/27/2020] [Accepted: 10/07/2020] [Indexed: 12/31/2022] Open
Abstract
Various next generation sequencing (NGS) based strategies have been successfully used in the recent past for tracing origins and understanding the evolution of infectious agents, investigating the spread and transmission chains of outbreaks, as well as facilitating the development of effective and rapid molecular diagnostic tests and contributing to the hunt for treatments and vaccines. The ongoing COVID-19 pandemic poses one of the greatest global threats in modern history and has already caused severe social and economic costs. The development of efficient and rapid sequencing methods to reconstruct the genomic sequence of SARS-CoV-2, the etiological agent of COVID-19, has been fundamental for the design of diagnostic molecular tests and to devise effective measures and strategies to mitigate the diffusion of the pandemic. Diverse approaches and sequencing methods can, as testified by the number of available sequences, be applied to SARS-CoV-2 genomes. However, each technology and sequencing approach has its own advantages and limitations. In the current review, we will provide a brief, but hopefully comprehensive, account of currently available platforms and methodological approaches for the sequencing of SARS-CoV-2 genomes. We also present an outline of current repositories and databases that provide access to SARS-CoV-2 genomic data and associated metadata. Finally, we offer general advice and guidelines for the appropriate sharing and deposition of SARS-CoV-2 data and metadata, and suggest that more efficient and standardized integration of current and future SARS-CoV-2-related data would greatly facilitate the struggle against this new pathogen. We hope that our 'vademecum' for the production and handling of SARS-CoV-2-related sequencing data, will contribute to this objective.
Affiliation(s)
- Matteo Chiara
- molecular biology and bioinformatics at the University of Milan
- Anna Maria D’Erchia
- molecular biology at the University of Bari and research associate at the Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies of the National Research Council in Bari
- Carmela Gissi
- molecular biology at the University of Bari and research associate at the Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies of the National Research Council in Bari
- Caterina Manzari
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies of the National Research Council in Bari
- Antonio Parisi
- Genetic and Molecular Epidemiology Laboratory at the Experimental Zooprophylactic Institute of Apulia and Basilicata
- Nicoletta Resta
- Medical Genetics at the University of Bari. She heads the Laboratory Unit of Medical Genetics and the School of Specialization in Medical Genetics
- Ernesto Picardi
- molecular biology and bioinformatics at the University of Bari and research associate at the Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies of the National Research Council in Bari
- Giulio Pavesi
- Associate Professor of bioinformatics at the University of Milan (Italy)
- David S Horner
- molecular biology and bioinformatics at the University of Milan
- Graziano Pesole
- molecular biology at the University of Bari and Research Associate at the Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies of the National Research Council in Bari
|
32
|
Maboko BB, Featherston J, Sibeko-Matjila KP, Mans BJ. Whole genome sequencing of Theileria parva using target capture. Genomics 2020; 113:429-438. [PMID: 33370583 DOI: 10.1016/j.ygeno.2020.12.033] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/18/2020] [Revised: 12/02/2020] [Accepted: 12/22/2020] [Indexed: 10/22/2022]
Abstract
Protozoan parasite isolation and purification are laborious and time-consuming processes required for high quality genomic DNA used in whole genome sequencing. The objective of this study was to capture whole Theileria parva genomes directly from cell cultures and blood samples using RNA baits. Cell culture material was bait captured or sequenced directly, while blood samples were all captured. Baits had variable success in capturing T. parva genomes from blood samples but were successful in cell cultures. Genome mapping uncovered extensive host contamination in blood samples compared to cell cultures. Captured cell cultures had over 81 fold coverage for the reference genome compared to 0-33 fold for blood samples. Results indicate that baits are specific to T. parva, are a good alternative to conventional methods and thus ideal for genomic studies. This study also reports the first whole genome sequencing of South African T. parva.
Affiliation(s)
- Boitumelo B Maboko
- Agricultural Research Council, Onderstepoort Veterinary Research, Private Bag X05, Onderstepoort, 0110 Pretoria, South Africa; Department of Veterinary Tropical Diseases, Vector and Vector-borne Disease Research Programme, University of Pretoria, Private Bag X04, Onderstepoort, 0110 Pretoria, South Africa
- Jonathan Featherston
- Agricultural Research Council, Biotechnology Platform, Private Bag X05, Onderstepoort, 0110 Pretoria, South Africa
- Kgomotso P Sibeko-Matjila
- Department of Veterinary Tropical Diseases, Vector and Vector-borne Disease Research Programme, University of Pretoria, Private Bag X04, Onderstepoort, 0110 Pretoria, South Africa
- Ben J Mans
- Agricultural Research Council, Onderstepoort Veterinary Research, Private Bag X05, Onderstepoort, 0110 Pretoria, South Africa; Department of Veterinary Tropical Diseases, Vector and Vector-borne Disease Research Programme, University of Pretoria, Private Bag X04, Onderstepoort, 0110 Pretoria, South Africa; School of Life Sciences, University of KwaZulu-Natal, Private Bag X54001, Durban 4000, South Africa; Department of Life and Consumer Sciences, University of South Africa, Florida 1709, South Africa
|
33
|
Muggia L, Ametrano CG, Sterflinger K, Tesei D. An Overview of Genomics, Phylogenomics and Proteomics Approaches in Ascomycota. Life (Basel) 2020; 10:E356. [PMID: 33348904 PMCID: PMC7765829 DOI: 10.3390/life10120356] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 11/12/2020] [Revised: 12/10/2020] [Accepted: 12/12/2020] [Indexed: 12/26/2022] Open
Abstract
Fungi are among the most successful eukaryotes on Earth: they have evolved strategies to survive in the most diverse environments and stressful conditions and have been selected and exploited for multiple aims by humans. The characteristic features intrinsic to Fungi have required evolutionary changes and adaptations at deep molecular levels. Omics approaches, nowadays including genomics, metagenomics, phylogenomics, transcriptomics, metabolomics, and proteomics, have enormously advanced the understanding of fungal diversity at diverse taxonomic levels, under changeable conditions and in still under-investigated environments. These approaches can be applied both to environmental communities and to individual organisms, either in nature or in axenic culture, and have led traditional morphology-based fungal systematics to increasingly implement molecular-based approaches. The advent of next-generation sequencing technologies was key to boosting advances in fungal genomics and proteomics research. Much effort has also been directed towards the development of methodologies for optimal genomic DNA and protein extraction and separation. To date, the number of proteomics investigations in Ascomycetes exceeds that carried out in any other fungal group. This is primarily due to the preponderance of their involvement in plant and animal diseases and multiple industrial applications, and therefore the need to understand the biological basis of the infectious process to develop mechanisms for biologic control, as well as to detect key proteins with roles in stress survival. Here we present as comprehensive an overview as possible of the major advances, mainly of the past decade, in the fields of genomics (including phylogenomics) and proteomics of Ascomycota, focusing particularly on those reporting on opportunistic pathogenic, extremophilic, polyextremotolerant and lichenized fungi. We also present a review of the most widely used genome sequencing technologies and methods for DNA sequence and protein analyses applied so far to fungi.
Affiliation(s)
- Lucia Muggia
- Department of Life Sciences, University of Trieste, 34127 Trieste, Italy
- Claudio G. Ametrano
- Grainger Bioinformatics Center, Department of Science and Education, The Field Museum, Chicago, IL 60605, USA
- Katja Sterflinger
- Academy of Fine Arts Vienna, Institute of Natural Sciences and Technology in the Arts, 1090 Vienna, Austria
- Donatella Tesei
- Department of Biotechnology, University of Natural Resources and Life Sciences, 1190 Vienna, Austria
|
34
|
Abstract
Read alignment is the central step of many analytic pipelines that perform variant calling. To reduce error, it is common practice to pre-process raw sequencing reads to remove low-quality bases and residual adapter contamination, a procedure collectively known as ‘trimming’. Trimming is widely assumed to increase the accuracy of variant calling, although there are relatively few systematic evaluations of its effects and no clear consensus on its efficacy. As sequencing datasets increase both in number and size, it is worthwhile reappraising computational operations of ambiguous benefit, particularly when the scope of many analyses now routinely incorporates thousands of samples, increasing the time and cost required. Using a curated set of 17 Gram-negative bacterial genomes, this study initially evaluated the impact of four read-trimming utilities (Atropos, fastp, Trim Galore and Trimmomatic), each used with a range of stringencies, on the accuracy and completeness of three bacterial SNP-calling pipelines. It was found that read trimming made only small, and statistically insignificant, increases in SNP-calling accuracy even when using the highest-performing pre-processor in this study, fastp. To extend these findings, >6500 publicly archived sequencing datasets from Escherichia coli, Mycobacterium tuberculosis and Staphylococcus aureus were re-analysed using a common analytic pipeline. Of the approximately 125 million SNPs and 1.25 million indels called across all samples, the same bases were called in 98.8 and 91.9 % of cases, respectively, irrespective of whether raw reads or trimmed reads were used. Nevertheless, the proportion of mixed calls (i.e. calls where <100 % of the reads support the variant allele; considered a proxy of false positives) was significantly reduced after trimming, which suggests that while trimming rarely alters the set of variant bases, it can affect the proportion of reads supporting each call. 
It was concluded that read quality- and adapter-trimming add relatively little value to a SNP-calling pipeline and may only be necessary if small differences in the absolute number of SNP calls, or the false call rate, are critical. Broadly similar conclusions can be drawn about the utility of trimming to an indel-calling pipeline. Read trimming remains routinely performed prior to variant calling likely out of concern that doing otherwise would typically have negative consequences. While historically this may have been the case, the data in this study suggests that read trimming is not always a practical necessity.
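For readers unfamiliar with what quality trimming does to a read, the sketch below removes low-quality bases from the 3' end of a single FASTQ record. It is a deliberately simplified illustration and is not the algorithm used by Atropos, fastp, Trim Galore or Trimmomatic; the function name `trim_3prime`, the Phred+33 encoding, and the threshold of Q20 are assumptions made for the example.

```python
def trim_3prime(seq, qual, min_q=20, offset=33):
    """Trim trailing bases whose Phred quality is below min_q.

    seq:    base string of the read
    qual:   quality string of the same length (assumed Phred+33)
    Returns the trimmed (seq, qual) pair.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    end = len(seq)
    # Walk inward from the 3' end while the last base is below threshold.
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


# Hypothetical read: 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
print(trim_3prime("ACGTAC", "IIII##"))  # ('ACGT', 'IIII')
```

Real trimmers typically add sliding-window averaging, adapter detection and minimum-length filters on top of this basic idea, which is part of why the study compares several tools at a range of stringencies.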
Affiliation(s)
- Stephen J Bush
- Nuffield Department of Medicine, University of Oxford, Oxford, UK
|
35
|
Jung H, Ventura T, Chung JS, Kim WJ, Nam BH, Kong HJ, Kim YO, Jeon MS, Eyun SI. Twelve quick steps for genome assembly and annotation in the classroom. PLoS Comput Biol 2020; 16:e1008325. [PMID: 33180771 PMCID: PMC7660529 DOI: 10.1371/journal.pcbi.1008325] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Indexed: 12/13/2022] Open
Abstract
Eukaryotic genome sequencing and de novo assembly, once the exclusive domain of well-funded international consortia, have become increasingly affordable, thus fitting the budgets of individual research groups. Third-generation long-read DNA sequencing technologies are increasingly used, providing extensive genomic toolkits that were once reserved for a few select model organisms. Generating high-quality genome assemblies and annotations for many aquatic species still presents significant challenges due to their large genome sizes, complexity, and high chromosome numbers. Indeed, selecting the most appropriate sequencing and software platforms and annotation pipelines for a new genome project can be daunting because tools often only work in limited contexts. In genomics, generating a high-quality genome assembly/annotation has become an indispensable tool for better understanding the biology of any species. Herein, we state 12 steps to help researchers get started in genome projects by presenting guidelines that are broadly applicable (to any species), sustainable over time, and cover all aspects of genome assembly and annotation projects from start to finish. We review some commonly used approaches, including practical methods to extract high-quality DNA and choices for the best sequencing platforms and library preparations. In addition, we discuss the range of potential bioinformatics pipelines, including structural and functional annotations (e.g., transposable elements and repetitive sequences). This paper also includes information on how to build a wide community for a genome project, the importance of data management, and how to make the data and results Findable, Accessible, Interoperable, and Reusable (FAIR) by submitting them to a public repository and sharing them with the research community.
Affiliation(s)
- Hyungtaek Jung
  - School of Biological Sciences, The University of Queensland, St Lucia, Queensland, Australia
  - Centre for Agriculture and Bioeconomy, Queensland University of Technology, Brisbane, Queensland, Australia
- Tomer Ventura
  - Genecology Research Centre, School of Science and Engineering, University of the Sunshine Coast, Sippy Downs, Queensland, Australia
- J. Sook Chung
  - Institute of Marine and Environmental Technology, University of Maryland Center for Environmental Science, Baltimore, Maryland, United States of America
- Woo-Jin Kim
  - Genetics and Breeding Research Center, National Institute of Fisheries Science, Geoje, Korea
- Bo-Hye Nam
  - Biotechnology Research Division, National Institute of Fisheries Science, Busan, Korea
- Hee Jeong Kong
  - Biotechnology Research Division, National Institute of Fisheries Science, Busan, Korea
- Young-Ok Kim
  - Biotechnology Research Division, National Institute of Fisheries Science, Busan, Korea
- Min-Seung Jeon
  - Department of Life Science, Chung-Ang University, Seoul, Korea
- Seong-il Eyun
  - Department of Life Science, Chung-Ang University, Seoul, Korea
|
36
|
Oosting T, Hilario E, Wellenreuther M, Ritchie PA. DNA degradation in fish: Practical solutions and guidelines to improve DNA preservation for genomic research. Ecol Evol 2020; 10:8643-8651. [PMID: 32884647 PMCID: PMC7452763 DOI: 10.1002/ece3.6558] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 03/24/2020] [Revised: 05/26/2020] [Accepted: 06/10/2020] [Indexed: 12/02/2022] Open
Abstract
The more demanding requirements of DNA preservation for genomic research can be difficult to meet when field conditions limit the methodological approaches that can be used or cause samples to be stored in suboptimal conditions. Such limitations may increase rates of DNA degradation, potentially rendering samples unusable for applications such as genome-wide sequencing. Nonetheless, little is known about the impact of suboptimal sampling conditions. We evaluated the performance of two widely used preservation solutions (1. DESS: 20% DMSO, 0.25 M EDTA, NaCl saturated solution, and 2. Ethanol >99.5%) under a range of storage conditions over a three-month period (sampling at 1 day, 1 week, 2 weeks, 1 month, and 3 months) to provide practical guidelines for DNA preservation. DNA degradation was quantified as the reduction in average DNA fragment size over time (DNA fragmentation) because the size distribution of DNA segments plays a key role in generating genomic datasets. Tissues were collected from a marine teleost species, the Australasian snapper, Chrysophrys auratus. We found that the storage solution has a strong effect on DNA preservation. In DESS, DNA was only moderately degraded after three months of storage while DNA stored in ethanol showed high levels of DNA degradation already within 24 hr, making samples unsuitable for next-generation sequencing. Here, we conclude that DESS was the most promising solution when storing samples for genomic applications. We recognize that the best preservation protocol is highly dependent on the organism, tissue type, and study design. We highly recommend performing similar experiments before beginning a study. This study highlights the importance of testing sample preservation protocols and provides both practical and economical advice to improve DNA preservation when sampling for genome-wide applications.
Affiliation(s)
- Tom Oosting
  - School of Biological Sciences, Victoria University of Wellington, Wellington, New Zealand
- Elena Hilario
  - The New Zealand Institute for Plant & Food Research Ltd, Auckland, New Zealand
- Maren Wellenreuther
  - Nelson Seafood Research Unit, The New Zealand Institute for Plant & Food Research Ltd, Nelson, New Zealand
  - School of Biological Sciences, The University of Auckland, Auckland, New Zealand
- Peter A. Ritchie
  - School of Biological Sciences, Victoria University of Wellington, Wellington, New Zealand
|
37
|
Asalone KC, Ryan KM, Yamadi M, Cohen AL, Farmer WG, George DJ, Joppert C, Kim K, Mughal MF, Said R, Toksoz-Exley M, Bisk E, Bracht JR. Regional sequence expansion or collapse in heterozygous genome assemblies. PLoS Comput Biol 2020; 16:e1008104. [PMID: 32735589 PMCID: PMC7423139 DOI: 10.1371/journal.pcbi.1008104] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 11/01/2019] [Revised: 08/12/2020] [Accepted: 06/29/2020] [Indexed: 12/13/2022] Open
Abstract
High levels of heterozygosity present a unique genome assembly challenge and can adversely impact downstream analyses, yet is common in sequencing datasets obtained from non-model organisms. Here we show that by re-assembling a heterozygous dataset with variant parameters and different assembly algorithms, we are able to generate assemblies whose protein annotations are statistically enriched for specific gene ontology categories. While total assembly length was not significantly affected by assembly methodologies tested, the assemblies generated varied widely in fragmentation level and we show local assembly collapse or expansion underlying the enrichment or depletion of specific protein functional groups. We show that these statistically significant deviations in gene ontology groups can occur in seemingly high-quality assemblies, and result from difficult-to-detect local sequence expansion or contractions. Given the unpredictable interplay between assembly algorithm, parameter, and biological sequence data heterozygosity, we highlight the need for better measures of assembly quality than N50 value, including methods for assessing local expansion and collapse. In the genomic era, genomes must be reconstructed from fragments using computational methods, or assemblers. How do we know that a new genome assembly is correct? This is important because errors in assembly can lead to downstream problems in gene predictions and these inaccurate results can contaminate databases, affecting later comparative studies. A particular challenge occurs when a diploid organism inherits two highly divergent genome copies from its parents. While it is widely appreciated that this type of data is difficult for assemblers to handle properly, here we show that the process is prone to more errors than previously appreciated. 
Specifically, we document examples of regional expansion and collapse, affecting downstream gene prediction accuracy, but without changing the overall genome assembly size or other metrics of accuracy. Our results suggest that assembly evaluation methods should be altered to identify whether regional expansions and collapses are present in the genome assembly.
Affiliation(s)
- Kathryn C. Asalone
  - Biology Department, American University, Washington DC, United States of America
- Kara M. Ryan
  - Biology Department, American University, Washington DC, United States of America
- Maryam Yamadi
  - Biology Department, American University, Washington DC, United States of America
- Annastelle L. Cohen
  - Biology Department, American University, Washington DC, United States of America
- William G. Farmer
  - Biology Department, American University, Washington DC, United States of America
- Deborah J. George
  - Biology Department, American University, Washington DC, United States of America
- Claudia Joppert
  - Biology Department, American University, Washington DC, United States of America
- Kaitlyn Kim
  - Biology Department, American University, Washington DC, United States of America
- Madeeha Froze Mughal
  - Biology Department, American University, Washington DC, United States of America
- Rana Said
  - Biology Department, American University, Washington DC, United States of America
- Metin Toksoz-Exley
  - Mathematics and Statistics Department, American University, Washington DC, United States of America
- Evgeny Bisk
  - Office of Information Technology, American University, Washington DC, United States of America
- John R. Bracht
  - Biology Department, American University, Washington DC, United States of America
|
38
|
Verlinden H, Sterck L, Li J, Li Z, Yssel A, Gansemans Y, Verdonck R, Holtof M, Song H, Behmer ST, Sword GA, Matheson T, Ott SR, Deforce D, Van Nieuwerburgh F, Van de Peer Y, Vanden Broeck J. First draft genome assembly of the desert locust, Schistocerca gregaria. F1000Res 2020; 9:775. [PMID: 33163158 PMCID: PMC7607483 DOI: 10.12688/f1000research.25148.2] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Accepted: 05/13/2021] [Indexed: 12/31/2022] Open
Abstract
Background: At the time of publication, the most devastating desert locust crisis in decades is affecting East Africa, the Arabian Peninsula and South-West Asia. The situation is extremely alarming in East Africa, where Kenya, Ethiopia and Somalia face an unprecedented threat to food security and livelihoods. Most of the time, however, locusts do not occur in swarms, but live as relatively harmless solitary insects. The phenotypically distinct solitarious and gregarious locust phases differ markedly in many aspects of behaviour, physiology and morphology, making them an excellent model to study how environmental factors shape behaviour and development. A better understanding of the extreme phenotypic plasticity in desert locusts will offer new, more environmentally sustainable ways of fighting devastating swarms. Methods: High molecular weight DNA derived from two adult males was used for Mate Pair and Paired End Illumina sequencing and PacBio sequencing. A reliable reference genome of Schistocerca gregaria was assembled using the ABySS pipeline, scaffolding was improved using LINKS. Results: In total, 1,316 Gb Illumina reads and 112 Gb PacBio reads were produced and assembled. The resulting draft genome consists of 8,817,834,205 bp organised in 955,015 scaffolds with an N50 of 157,705 bp, making the desert locust genome the largest insect genome sequenced and assembled to date. In total, 18,815 protein-encoding genes are predicted in the desert locust genome, of which 13,646 (72.53%) obtained at least one functional assignment based on similarity to known proteins. Conclusions: The desert locust genome data will contribute greatly to studies of phenotypic plasticity, physiology, neurobiology, molecular ecology, evolutionary genetics and comparative genomics, and will promote the desert locust's use as a model system. The data will also facilitate the development of novel, more sustainable strategies for preventing or combating swarms of these infamous insects.
Affiliation(s)
- Heleen Verlinden
  - Laboratory of Molecular Developmental Physiology and Signal Transduction, KU Leuven, Leuven, 3000, Belgium
- Lieven Sterck
  - Laboratory of Bioinformatics and Evolutionary Genomics, Ghent University, Ghent, 9000, Belgium
  - Center for Plant Systems Biology, Ghent University - VIB, Ghent, 9052, Belgium
- Jia Li
  - Laboratory of Bioinformatics and Evolutionary Genomics, Ghent University, Ghent, 9000, Belgium
  - Center for Plant Systems Biology, Ghent University - VIB, Ghent, 9052, Belgium
- Zhen Li
  - Laboratory of Bioinformatics and Evolutionary Genomics, Ghent University, Ghent, 9000, Belgium
  - Center for Plant Systems Biology, Ghent University - VIB, Ghent, 9052, Belgium
- Anna Yssel
  - Department of Biochemistry, Genetics and Microbiology, University of Pretoria, Pretoria, 0002, South Africa
- Yannick Gansemans
  - Laboratory of Pharmaceutical Biotechnology, Ghent University, Ghent, 9000, Belgium
  - NXTGNT, Ghent University, Ghent, 9000, Belgium
- Rik Verdonck
  - Laboratory of Molecular Developmental Physiology and Signal Transduction, KU Leuven, Leuven, 3000, Belgium
  - Station d'Ecologie Théorique et Expérimentale, UMR 5321 CNRS et Université Paul Sabatier, Moulis, 09200, France
- Michiel Holtof
  - Laboratory of Molecular Developmental Physiology and Signal Transduction, KU Leuven, Leuven, 3000, Belgium
- Hojun Song
  - Department of Entomology, Texas A&M University, College Station, Texas, TX 77843-2475, USA
- Spencer T Behmer
  - Department of Entomology, Texas A&M University, College Station, Texas, TX 77843-2475, USA
- Gregory A Sword
  - Department of Entomology, Texas A&M University, College Station, Texas, TX 77843-2475, USA
- Tom Matheson
  - Department of Neuroscience, Psychology and Behaviour, University of Leicester, Leicester, LE1 7RH, UK
- Swidbert R Ott
  - Department of Neuroscience, Psychology and Behaviour, University of Leicester, Leicester, LE1 7RH, UK
- Dieter Deforce
  - Laboratory of Pharmaceutical Biotechnology, Ghent University, Ghent, 9000, Belgium
  - NXTGNT, Ghent University, Ghent, 9000, Belgium
- Filip Van Nieuwerburgh
  - Laboratory of Pharmaceutical Biotechnology, Ghent University, Ghent, 9000, Belgium
  - NXTGNT, Ghent University, Ghent, 9000, Belgium
- Yves Van de Peer
  - Laboratory of Bioinformatics and Evolutionary Genomics, Ghent University, Ghent, 9000, Belgium
  - Center for Plant Systems Biology, Ghent University - VIB, Ghent, 9052, Belgium
  - Department of Biochemistry, Genetics and Microbiology, University of Pretoria, Pretoria, 0002, South Africa
- Jozef Vanden Broeck
  - Laboratory of Molecular Developmental Physiology and Signal Transduction, KU Leuven, Leuven, 3000, Belgium
|
39
|
Verlinden H, Sterck L, Li J, Li Z, Yssel A, Gansemans Y, Verdonck R, Holtof M, Song H, Behmer ST, Sword GA, Matheson T, Ott SR, Deforce D, Van Nieuwerburgh F, Van de Peer Y, Vanden Broeck J. First draft genome assembly of the desert locust, Schistocerca gregaria. F1000Res 2020; 9:775. [PMID: 33163158 DOI: 10.12688/f1000research.25148.1] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Accepted: 07/20/2020] [Indexed: 12/22/2022] Open
|
40
|
Gurwitz KT, Singh Gaur P, Bellis LJ, Larcombe L, Alloza E, Balint BL, Botzki A, Dimec J, Dominguez del Angel V, Fernandes PL, Korpelainen E, Krause R, Kuzak M, Le Pera L, Leskošek B, Lindvall JM, Marek D, Martinez PA, Muyldermans T, Nygård S, Palagi PM, Peterson H, Psomopoulos F, Spiwok V, van Gelder CWG, Via A, Vidak M, Wibberg D, Morgan SL, Rustici G. A framework to assess the quality and impact of bioinformatics training across ELIXIR. PLoS Comput Biol 2020; 16:e1007976. [PMID: 32702016 PMCID: PMC7377377 DOI: 10.1371/journal.pcbi.1007976] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 12/15/2022] Open
Abstract
ELIXIR is a pan-European intergovernmental organisation for life science that aims to coordinate bioinformatics resources in a single infrastructure across Europe; bioinformatics training is central to its strategy, which aims to develop a training community that spans all ELIXIR member states. In an evidence-based approach for strengthening bioinformatics training programmes across Europe, the ELIXIR Training Platform, led by the ELIXIR EXCELERATE Quality and Impact Assessment Subtask in collaboration with the ELIXIR Training Coordinators Group, has implemented an assessment strategy to measure quality and impact of its entire training portfolio. Here, we present ELIXIR’s framework for assessing training quality and impact, which includes the following: specifying assessment aims, determining what data to collect in order to address these aims, and our strategy for centralised data collection to allow for ELIXIR-wide analyses. In addition, we present an overview of the ELIXIR training data collected over the past 4 years. We highlight the importance of a coordinated and consistent data collection approach and the relevance of defining specific metrics and answer scales for consortium-wide analyses as well as for comparison of data across iterations of the same course.
Affiliation(s)
- Kim T. Gurwitz
  - Department of Genetics, University of Cambridge, Cambridge, United Kingdom
- Louisa J. Bellis
  - Department of Genetics, University of Cambridge, Cambridge, United Kingdom
- Lee Larcombe
  - MRC Human Genetics Unit, The Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, United Kingdom
- Eva Alloza
  - Barcelona Supercomputing Center (BSC), INB Coordination node, Life Sciences Department, Barcelona, Spain
- Balint Laszlo Balint
  - University of Debrecen, Medical Faculty, Department of Biochemistry and Molecular Biology, Debrecen, Hungary
- Alexander Botzki
  - VIB Flanders Institute for Biotechnology, VIB Bioinformatics Core, Ghent, Belgium
- Jure Dimec
  - Faculty of Medicine, Institute for Biostatistics and Medical Informatics (IBMI), University of Ljubljana, Ljubljana, Slovenia
- Roland Krause
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Mateusz Kuzak
  - DTL Dutch Techcentre for Life Sciences, Utrecht, the Netherlands
- Loredana Le Pera
  - Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies (IBIOM), National Research Council of Italy (CNR), Bari, Italy
- Brane Leskošek
  - Faculty of Medicine, Institute for Biostatistics and Medical Informatics (IBMI), University of Ljubljana, Ljubljana, Slovenia
- Jessica M. Lindvall
  - National Bioinformatics Infrastructure Sweden (NBIS), Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
- Diana Marek
  - SIB Training, SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Paula A. Martinez
  - VIB Flanders Institute for Biotechnology, VIB Bioinformatics Core, Ghent, Belgium
- Tuur Muyldermans
  - VIB Flanders Institute for Biotechnology, VIB Bioinformatics Core, Ghent, Belgium
- Ståle Nygård
  - Department of Informatics, University of Oslo, Oslo, Norway
- Patricia M. Palagi
  - SIB Training, SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Hedi Peterson
  - Institute of Computer Science, University of Tartu, Tartu, Estonia
- Fotis Psomopoulos
  - Institute of Applied Biosciences (INAB), Center for Research and Technology Hellas (CERTH), Thessaloniki, Greece
- Vojtech Spiwok
  - Department of Biochemistry and Microbiology, University of Chemistry and Technology, Prague, Czech Republic
- Allegra Via
  - Institute of Molecular Biology and Pathology (IBPM), National Research Council of Italy (CNR), Rome, Italy
- Marko Vidak
  - Faculty of Medicine, Institute for Biostatistics and Medical Informatics (IBMI), University of Ljubljana, Ljubljana, Slovenia
- Daniel Wibberg
  - Genome Research of Industrial Microorganisms, Center for Biotechnology, Bielefeld University, Bielefeld, Germany
- Sarah L. Morgan
  - EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
- Gabriella Rustici
  - Department of Genetics, University of Cambridge, Cambridge, United Kingdom
|
41
|
Vizueta J, Sánchez‐Gracia A, Rozas J. bitacora: A comprehensive tool for the identification and annotation of gene families in genome assemblies. Mol Ecol Resour 2020; 20:1445-1452. [DOI: 10.1111/1755-0998.13202] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 06/03/2019] [Revised: 05/17/2020] [Accepted: 05/27/2020] [Indexed: 02/06/2023]
Affiliation(s)
- Joel Vizueta
  - Departament de Genètica, Microbiologia i Estadística and Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona, Barcelona, Spain
- Alejandro Sánchez‐Gracia
  - Departament de Genètica, Microbiologia i Estadística and Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona, Barcelona, Spain
- Julio Rozas
  - Departament de Genètica, Microbiologia i Estadística and Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona, Barcelona, Spain
|
42
|
de Sales RO, Migliorini LB, Puga R, Kocsis B, Severino P. A Core Genome Multilocus Sequence Typing Scheme for Pseudomonas aeruginosa. Front Microbiol 2020; 11:1049. [PMID: 32528447 PMCID: PMC7264379 DOI: 10.3389/fmicb.2020.01049] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 12/06/2019] [Accepted: 04/28/2020] [Indexed: 12/15/2022] Open
Abstract
Pseudomonas aeruginosa is a ubiquitous microorganism and an important opportunistic pathogen responsible for a broad spectrum of infections mainly in immunosuppressed and critically ill patients. Molecular investigations traditionally rely on pulsed field gel electrophoresis (PFGE) and multilocus sequence typing (MLST). In this work we propose a core genome multilocus sequence typing (cgMLST) scheme for P. aeruginosa, a methodology that combines traditional MLST principles with whole genome sequencing data. All publicly available complete P. aeruginosa genomes, representing the diversity of this species, were used to establish a cgMLST scheme targeting 2,653 genes. The scheme was then tested using genomes available at contig, chromosome and scaffold levels. The proposed cgMLST scheme for P. aeruginosa typed over 99% (2,314/2,325) of the genomes available for this study considering at least 95% of the cgMLST target genes present. The absence of a certain number gene targets at the threshold considered for both the creation and validation steps due to low genome sequence quality is possibly the main reason for this result. The cgMLST scheme was compared with previously published whole genome single nucleotide polymorphism analysis for the characterization of the population structure of the epidemic clone ST235 and results were highly similar. In order to evaluate the typing resolution of the proposed scheme, collections of isolates belonging to two important STs associated with cystic fibrosis, ST146 and ST274, were typed using this scheme, and ST235 isolates associated with an outbreak were evaluated. Besides confirming the relatedness of all the isolates, earlier determined by MLST, the higher resolution of cgMLST denotes that it may be suitable for surveillance programs, overcoming possible shortcomings of classical MLST. The proposed scheme is publicly available at: https://github.com/BioinformaticsHIAEMolecularMicrobiology/cgMLST-Pseudomonas-aeruginosa.
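The typing criterion summarized in this abstract (a genome counts as typed when at least 95% of the 2,653 cgMLST target genes are present) can be illustrated with a minimal Python sketch. The function names are hypothetical and are not taken from the published scheme or its repository; only the threshold, the target count, and the 2,314/2,325 figures come from the abstract.

```python
# Illustrative sketch only: function names are hypothetical, not part of the
# published cgMLST scheme. Threshold and target count are from the abstract.

TOTAL_TARGETS = 2653   # genes targeted by the cgMLST scheme
MIN_FRACTION = 0.95    # minimum share of targets that must be present


def is_typable(targets_found, total_targets=TOTAL_TARGETS,
               min_fraction=MIN_FRACTION):
    """Return True if enough cgMLST target genes were found in a genome."""
    return targets_found / total_targets >= min_fraction


def typed_percentage(typed_genomes, all_genomes):
    """Percentage of genomes that passed the typing threshold."""
    return 100.0 * typed_genomes / all_genomes


# Figures reported in the abstract: 2,314 of 2,325 genomes were typed.
reported_rate = typed_percentage(2314, 2325)
```

With these numbers a genome needs at least 2,521 of the 2,653 targets to pass, and the reported typing rate works out to roughly 99.5%, consistent with the "over 99%" stated in the abstract.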
Affiliation(s)
- Romário Oliveira de Sales
  - Hospital Israelita Albert Einstein, Albert Einstein Research and Education Institute, São Paulo, Brazil
- Letícia Busato Migliorini
  - Hospital Israelita Albert Einstein, Albert Einstein Research and Education Institute, São Paulo, Brazil
- Renato Puga
  - Hospital Israelita Albert Einstein, Albert Einstein Research and Education Institute, São Paulo, Brazil
- Bela Kocsis
  - Institute of Medical Microbiology, Semmelweis University, Budapest, Hungary
- Patricia Severino
  - Hospital Israelita Albert Einstein, Albert Einstein Research and Education Institute, São Paulo, Brazil
|
43
|
Whole Genome Sequences of 23 Species from the Drosophila montium Species Group (Diptera: Drosophilidae): A Resource for Testing Evolutionary Hypotheses. G3-GENES GENOMES GENETICS 2020; 10:1443-1455. [PMID: 32220952 PMCID: PMC7202002 DOI: 10.1534/g3.119.400959] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 01/24/2023]
Abstract
Large groups of species with well-defined phylogenies are excellent systems for testing evolutionary hypotheses. In this paper, we describe the creation of a comparative genomic resource consisting of 23 genomes from the species-rich Drosophila montium species group, 22 of which are presented here for the first time. The montium group is well-positioned for clade genomics. Within the montium clade, evolutionary distances are such that large numbers of sequences can be accurately aligned while also recovering strong signals of divergence; and the distance between the montium group and D. melanogaster is short enough so that orthologous sequence can be readily identified. All genomes were assembled from a single, small-insert library using MaSuRCA, before going through an extensive post-assembly pipeline. Estimated genome sizes within the montium group range from 155 Mb to 223 Mb (mean = 196 Mb). The absence of long-distance information during the assembly process resulted in fragmented assemblies, with the scaffold NG50s varying widely based on repeat content and sample heterozygosity (min = 18 kb, max = 390 kb, mean = 74 kb). The total scaffold length for most assemblies is also shorter than the estimated genome size, typically by 5-15%. However, subsequent analysis showed that our assemblies are highly complete. Despite large differences in contiguity, all assemblies contain at least 96% of known single-copy Dipteran genes (BUSCOs, n = 2,799). Similarly, by aligning our assemblies to the D. melanogaster genome and remapping coordinates for a large set of transcriptional enhancers (n = 3,457), we showed that each montium assembly contains orthologs for at least 91% of D. melanogaster enhancers. Importantly, the genic and enhancer contents of our assemblies are comparable to that of far more contiguous Drosophila assemblies. The alignment of our own D. serrata assembly to a previously published PacBio D. serrata assembly also showed that our longest scaffolds (up to 1 Mb) are free of large-scale misassemblies. Our genome assemblies are a valuable resource that can be used to further resolve the montium group phylogeny; study the evolution of protein-coding genes and cis-regulatory sequences; and determine the genetic basis of ecological and behavioral adaptations.
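This abstract reports scaffold NG50 rather than N50, which matters when the total scaffold length is shorter than the estimated genome size. As a minimal sketch of the difference, assuming the usual definitions (N50 is measured against half of the assembly length, NG50 against half of the estimated genome size), both can be computed as follows. The function names and the toy scaffold lengths are illustrative assumptions, not data from the paper.

```python
# Illustrative sketch: N50 vs NG50 on invented scaffold lengths.
# Function names are hypothetical helpers, not from any assembly tool.

def nx(lengths, reference_total, fraction=0.5):
    """Length of the scaffold at which the cumulative sum of scaffolds,
    sorted longest first, reaches `fraction` of `reference_total`.
    Returns None if the scaffolds never reach that amount."""
    threshold = reference_total * fraction
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length
    return None


def n50(lengths):
    """N50: the reference is the total assembly length."""
    return nx(lengths, sum(lengths))


def ng50(lengths, genome_size):
    """NG50: the reference is the estimated genome size."""
    return nx(lengths, genome_size)


# Toy example: an 800-unit assembly of a genome estimated at 1,200 units.
toy_scaffolds = [390, 200, 100, 74, 18, 18]
toy_n50 = n50(toy_scaffolds)          # threshold is 400
toy_ng50 = ng50(toy_scaffolds, 1200)  # threshold is 600
```

In the toy example the NG50 is smaller than the N50 because the assembly is shorter than the genome estimate, which is why NG50 is the more conservative figure for incomplete assemblies.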
|
44
|
Greshake Tzovaras B, Segers FHID, Bicker A, Dal Grande F, Otte J, Anvar SY, Hankeln T, Schmitt I, Ebersberger I. What Is in Umbilicaria pustulata? A Metagenomic Approach to Reconstruct the Holo-Genome of a Lichen. Genome Biol Evol 2020; 12:309-324. [PMID: 32163141 PMCID: PMC7186782 DOI: 10.1093/gbe/evaa049] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Accepted: 03/09/2020] [Indexed: 12/29/2022] Open
Abstract
Lichens are valuable models in symbiosis research and promising sources of biosynthetic genes for biotechnological applications. Most lichenized fungi grow slowly, resist aposymbiotic cultivation, and are poor candidates for experimentation. Obtaining contiguous, high-quality genomes for such symbiotic communities is technically challenging. Here, we present the first assembly of a lichen holo-genome from metagenomic whole-genome shotgun data comprising both PacBio long reads and Illumina short reads. The nuclear genomes of the two primary components of the lichen symbiosis-the fungus Umbilicaria pustulata (33 Mb) and the green alga Trebouxia sp. (53 Mb)-were assembled at contiguities comparable to single-species assemblies. The analysis of the read coverage pattern revealed a relative abundance of fungal to algal nuclei of ∼20:1. Gap-free, circular sequences for all organellar genomes were obtained. The bacterial community is dominated by Acidobacteriaceae and encompasses strains closely related to bacteria isolated from other lichens. Gene set analyses showed no evidence of horizontal gene transfer from algae or bacteria into the fungal genome. Our data suggest a lineage-specific loss of a putative gibberellin-20-oxidase in the fungus, a gene fusion in the fungal mitochondrion, and a relocation of an algal chloroplast gene to the algal nucleus. Major technical obstacles during reconstruction of the holo-genome were coverage differences among individual genomes surpassing three orders of magnitude. Moreover, we show that GC-rich inverted repeats paired with nonrandom sequencing error in PacBio data can result in missing gene predictions. This likely poses a general problem for genome assemblies based on long reads.
Affiliation(s)
- Bastian Greshake Tzovaras
  - Applied Bioinformatics Group, Institute of Cell Biology and Neuroscience, Goethe University Frankfurt, Germany
  - Lawrence Berkeley National Laboratory, Berkeley, California
  - Center for Research & Interdisciplinarity, Université de Paris, France
- Francisca H I D Segers
  - Applied Bioinformatics Group, Institute of Cell Biology and Neuroscience, Goethe University Frankfurt, Germany
  - LOEWE Center for Translational Biodiversity Genomics, Frankfurt, Germany
- Anne Bicker
  - Institute for Organismic and Molecular Evolution, Molecular Genetics and Genome Analysis, Johannes Gutenberg University Mainz, Germany
- Francesco Dal Grande
  - LOEWE Center for Translational Biodiversity Genomics, Frankfurt, Germany
  - Senckenberg Biodiversity and Climate Research Centre (SBiK-F), Frankfurt, Germany
- Jürgen Otte
  - Senckenberg Biodiversity and Climate Research Centre (SBiK-F), Frankfurt, Germany
- Seyed Yahya Anvar
  - Department of Human Genetics, Leiden University Medical Center, The Netherlands
- Thomas Hankeln
  - Institute for Organismic and Molecular Evolution, Molecular Genetics and Genome Analysis, Johannes Gutenberg University Mainz, Germany
- Imke Schmitt
  - LOEWE Center for Translational Biodiversity Genomics, Frankfurt, Germany
  - Senckenberg Biodiversity and Climate Research Centre (SBiK-F), Frankfurt, Germany
  - Molecular Evolutionary Biology Group, Institute of Ecology, Diversity, and Evolution, Goethe University Frankfurt, Germany
- Ingo Ebersberger
  - Applied Bioinformatics Group, Institute of Cell Biology and Neuroscience, Goethe University Frankfurt, Germany
  - LOEWE Center for Translational Biodiversity Genomics, Frankfurt, Germany
  - Senckenberg Biodiversity and Climate Research Centre (SBiK-F), Frankfurt, Germany
|
45
|
Inderbitzin P, Robbertse B, Schoch CL. Species Identification in Plant-Associated Prokaryotes and Fungi Using DNA. Phytobiomes Journal 2020; 4:103-114. [PMID: 35265781 PMCID: PMC8903201 DOI: 10.1094/pbiomes-12-19-0067-rvw] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 06/03/2023]
Abstract
Species names are fundamental to managing biological information. The surge of interest in microbial diversity has resulted in an increase in the number of microbes that need to be identified and assigned a species name. This article provides an introduction to the principles of DNA-based identification of Archaea and Bacteria, traditionally known as prokaryotes, and of Fungi, Oomycetes, and other protists, collectively referred to as fungi. The prokaryotes and fungi are the most commonly studied microbes from plants, and we introduce the most relevant concepts of prokaryote and fungal taxonomy and nomenclature. We first explain how prokaryote and fungal species are defined, delimited, and named, and then summarize the criteria and methods used to identify prokaryote and fungal organisms to species.
Affiliation(s)
- Barbara Robbertse
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 45 Center Drive, Bethesda, MD 20892
- Conrad L. Schoch
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 45 Center Drive, Bethesda, MD 20892
|
46
|
Till 2018: a survey of biomolecular sequences in genus Panax. J Ginseng Res 2020; 44:33-43. [PMID: 32095095 PMCID: PMC7033366 DOI: 10.1016/j.jgr.2019.06.004] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 02/07/2019] [Revised: 06/07/2019] [Accepted: 06/12/2019] [Indexed: 12/22/2022] Open
Abstract
Ginseng is popularly known as the king of ancient medicines and is widely used in traditional medicinal compositions because of its various pharmaceutical properties. Numerous studies focus on this plant's curative effects to discover their potential health benefits in most human diseases, including cancer, the most life-threatening disease worldwide. Modern pharmacological research has focused mainly on ginsenosides, the major bioactive compounds of ginseng, because of their multiple therapeutic applications. Ginseng plant development, physiological processes, and agricultural issues have also been studied widely through state-of-the-art, high-throughput sequencing technologies. Since the beginning of the 21st century, the number of publications on ginseng has rapidly increased, with a recent count of more than 6,000 articles and reviews focusing notably on ginseng. Owing to the implementation of various technologies and continuous efforts, the ginseng plant genomes have been decoded effectively in recent years. Therefore, this review focuses mainly on the cellular biomolecular sequences in ginseng plants from the perspective of the central molecular dogma, with an emphasis on genomes, transcriptomes, and proteomes, together with a few other related studies.
|
47
|
Wey B, Heavner ME, Wittmeyer KT, Briese T, Hopper KR, Govind S. Immune Suppressive Extracellular Vesicle Proteins of Leptopilina heterotoma Are Encoded in the Wasp Genome. G3 (Bethesda, Md.) 2020; 10:1-12. [PMID: 31676506 PMCID: PMC6945029 DOI: 10.1534/g3.119.400349] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 05/15/2019] [Accepted: 10/22/2019] [Indexed: 12/29/2022]
Abstract
Leptopilina heterotoma are obligate parasitoid wasps that develop in the body of their Drosophila hosts. During oviposition, female wasps introduce venom into the larval hosts' body cavity. The venom contains discrete, 300 nm-wide, mixed-strategy extracellular vesicles (MSEVs), until recently referred to as virus-like particles. While the crucial immune suppressive functions of L. heterotoma MSEVs have remained undisputed, their biotic nature and origin still remain controversial. In recent proteomics analyses of L. heterotoma MSEVs, we identified 161 proteins in three classes: conserved eukaryotic proteins, infection and immunity related proteins, and proteins without clear annotation. Here we report 246 additional proteins from the L. heterotoma MSEV proteome. An enrichment analysis of the entire proteome supports the vesicular nature of these structures. Sequences for more than 90% of these proteins are present in the whole-body transcriptome. Sequencing and de novo assembly of the 460 Mb-sized L. heterotoma genome revealed that 90% of MSEV proteins have coding regions within the genomic scaffolds. Altogether, these results explain the stable association of MSEVs with their wasps, and like other wasp structures, their vertical inheritance. While our results do not rule out a viral origin of MSEVs, they suggest that a similar strategy for co-opting cellular machinery for immune suppression may be shared by other wasps to gain advantage over their hosts. These results are relevant to our understanding of the evolution of figitid and related wasp species.
Affiliation(s)
- Brian Wey
  - Biology Department, The City College of New York, 160 Convent Avenue, New York, 10031
  - PhD Program in Biology, The Graduate Center of the City University of New York
- Mary Ellen Heavner
  - Biology Department, The City College of New York, 160 Convent Avenue, New York, 10031
  - PhD Program in Biochemistry, The Graduate Center of the City University of New York, 365 Fifth Avenue, New York, 10016
  - Laboratory of Host-Pathogen Biology, Rockefeller University, 1230 York Ave, New York, 10065
- Kameron T Wittmeyer
  - USDA-ARS, Beneficial Insect Introductions Research Unit, Newark, DE 19713
- Thomas Briese
  - Center of Infection and Immunity, and Department of Epidemiology, Mailman School of Public Health, Columbia University, New York, 10032
- Keith R Hopper
  - USDA-ARS, Beneficial Insect Introductions Research Unit, Newark, DE 19713
- Shubha Govind
  - Biology Department, The City College of New York, 160 Convent Avenue, New York, 10031
  - PhD Program in Biology, The Graduate Center of the City University of New York
  - PhD Program in Biochemistry, The Graduate Center of the City University of New York, 365 Fifth Avenue, New York, 10016
|
48
|
Norsigian CJ, Fang X, Seif Y, Monk JM, Palsson BO. A workflow for generating multi-strain genome-scale metabolic models of prokaryotes. Nat Protoc 2020; 15:1-14. [PMID: 31863076 PMCID: PMC7017905 DOI: 10.1038/s41596-019-0254-3] [Citation(s) in RCA: 43] [Impact Index Per Article: 8.6] [Received: 05/16/2019] [Accepted: 10/08/2019] [Indexed: 11/09/2022]
Abstract
Genome-scale models (GEMs) of bacterial strains' metabolism have been formulated and used over the past 20 years. Recently, with the number of genome sequences exponentially increasing, multi-strain GEMs have proved valuable to define the properties of a species. Here, through four major stages, we extend the original Protocol used to generate a GEM for a single strain to enable multi-strain GEMs: (i) obtain or generate a high-quality model of a reference strain; (ii) compare the genome sequence between a reference strain and target strains to generate a homology matrix; (iii) generate draft strain-specific models from the homology matrix; and (iv) manually curate draft models. These multi-strain GEMs can be used to study pan-metabolic capabilities and strain-specific differences across a species, thus providing insights into its range of lifestyles. Unlike the original Protocol, this procedure is scalable and can be partly automated with the Supplementary Jupyter notebook Tutorial. This Protocol Extension joins the ranks of other comparable methods for generating models such as CarveMe and KBase. This extension of the original Protocol takes on the order of weeks to multiple months to complete depending on the availability of a suitable reference model.
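Stages (ii) and (iii) of the workflow can be illustrated with a toy sketch. It is not the Protocol's code. It assumes a reference model reduced to a mapping from reaction to associated genes, a hypothetical 80% identity cutoff for calling a homolog, and a simplified rule that keeps a reaction if any of its genes has a homolog in the target strain. All identifiers and values are invented for illustration.

```python
from typing import Dict, Set

# Hypothetical reference model: reaction id -> genes associated with it.
# Genes of one reaction are treated as isozymes (OR rule) for simplicity.
REFERENCE_MODEL: Dict[str, Set[str]] = {
    "RXN_A": {"gene1"},
    "RXN_B": {"gene2", "gene3"},
    "RXN_C": {"gene4"},
}

# Hypothetical best-hit percent identity of each reference gene in each strain.
PERCENT_IDENTITY: Dict[str, Dict[str, float]] = {
    "strain_1": {"gene1": 99.0, "gene2": 45.0, "gene3": 92.0, "gene4": 30.0},
    "strain_2": {"gene1": 97.5, "gene2": 88.0, "gene3": 85.0, "gene4": 91.0},
}


def build_homology_matrix(
    identity: Dict[str, Dict[str, float]], threshold: float = 80.0
) -> Dict[str, Dict[str, bool]]:
    """Stage (ii): mark a reference gene as present if identity >= threshold.

    The 80% default is an assumed cutoff chosen for this example.
    """
    return {
        strain: {gene: pid >= threshold for gene, pid in genes.items()}
        for strain, genes in identity.items()
    }


def draft_strain_models(
    reference: Dict[str, Set[str]], homology: Dict[str, Dict[str, bool]]
) -> Dict[str, Set[str]]:
    """Stage (iii): keep each reaction that retains at least one homologous gene."""
    return {
        strain: {
            rxn
            for rxn, genes in reference.items()
            if any(presence.get(gene, False) for gene in genes)
        }
        for strain, presence in homology.items()
    }


homology = build_homology_matrix(PERCENT_IDENTITY)
drafts = draft_strain_models(REFERENCE_MODEL, homology)
for strain in sorted(drafts):
    print(strain, sorted(drafts[strain]))
```

Real gene-protein-reaction rules mix AND and OR logic, and the drafts produced this way would still need the manual curation described in stage (iv).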
Affiliation(s)
- Charles J Norsigian
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA
- Xin Fang
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA
- Yara Seif
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA
- Jonathan M Monk
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA
- Bernhard O Palsson
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA
  - Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Lyngby, Denmark
|
49
|
Mittal P, Jaiswal SK, Vijay N, Saxena R, Sharma VK. Comparative analysis of corrected tiger genome provides clues to its neuronal evolution. Sci Rep 2019; 9:18459. [PMID: 31804567 PMCID: PMC6895189 DOI: 10.1038/s41598-019-54838-z] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 03/14/2019] [Accepted: 11/14/2019] [Indexed: 01/01/2023] Open
Abstract
The availability of completed and draft genome assemblies of tiger, leopard, and other felids provides an opportunity to gain comparative insights on their unique evolutionary adaptations. However, genome-wide comparative analyses are susceptible to errors in genome sequences and thus require accurate genome assemblies for reliable evolutionary insights. In this study, while analyzing the tiger genome, we found almost one million erroneous substitutions in the coding and non-coding regions of the genome affecting 4,472 genes, hence biasing the current understanding of tiger evolution. Moreover, these errors produced several misleading observations in previous studies. Thus, to gain insights into tiger evolution, we corrected the erroneous bases in the genome assembly and gene set of tiger using the ‘SeqBug’ approach developed in this study. We sequenced the first Bengal tiger genome and transcriptome from India to validate these corrections. A comprehensive evolutionary analysis was performed using 10,920 orthologs from nine mammalian species including the corrected gene sets of tiger and leopard and using five different methods at three hierarchical levels, i.e. felids, Panthera, and tiger. The unique genetic changes in tiger revealed that the genes showing signatures of adaptation in tiger were enriched in development and neuronal functioning. Specifically, the genes belonging to the Notch signalling pathway, which is among the most conserved pathways involved in embryonic and neuronal development, were found to have significantly diverged in tiger in comparison to the other mammals. Our findings suggest the role of adaptive evolution in neuronal functions and development processes, which correlates well with the presence of exceptional traits such as sensory perception, strong neuro-muscular coordination, and hypercarnivorous behaviour in tiger.
Affiliation(s)
- Parul Mittal
  - Metaomics and Systems Biology Group, Department of Biological Sciences, Indian Institute of Science Education and Research Bhopal, Bhopal, India
- Shubham K Jaiswal
  - Metaomics and Systems Biology Group, Department of Biological Sciences, Indian Institute of Science Education and Research Bhopal, Bhopal, India
- Nagarjun Vijay
  - Computational Evolutionary Genomics Lab, Department of Biological Sciences, Indian Institute of Science Education and Research Bhopal, Bhopal, India
- Rituja Saxena
  - Metaomics and Systems Biology Group, Department of Biological Sciences, Indian Institute of Science Education and Research Bhopal, Bhopal, India
- Vineet K Sharma
  - Metaomics and Systems Biology Group, Department of Biological Sciences, Indian Institute of Science Education and Research Bhopal, Bhopal, India
|
50
|
Giani AM, Gallo GR, Gianfranceschi L, Formenti G. Long walk to genomics: History and current approaches to genome sequencing and assembly. Comput Struct Biotechnol J 2019; 18:9-19. [PMID: 31890139 PMCID: PMC6926122 DOI: 10.1016/j.csbj.2019.11.002] [Citation(s) in RCA: 137] [Impact Index Per Article: 22.8] [Received: 08/29/2019] [Revised: 11/03/2019] [Accepted: 11/06/2019] [Indexed: 12/13/2022] Open
Abstract
Genomes represent the starting point of genetic studies. Since the discovery of DNA structure, scientists have devoted great efforts to determine their sequence in an exact way. In this review we provide a comprehensive historical background of the improvements in DNA sequencing technologies that have accompanied the major milestones in genome sequencing and assembly, ranging from early sequencing methods to Next-Generation Sequencing platforms. We then focus on the advantages and challenges of the current technologies and approaches, collectively known as Third Generation Sequencing. As these technical advancements have been accompanied by progress in analytical methods, we also review the bioinformatic tools currently employed in de novo genome assembly, as well as some applications of Third Generation Sequencing technologies and high-quality reference genomes.
Key Words
- BAC, Bacterial Artificial Chromosome
- Bioinformatics
- Genome assembly
- HGP, Human Genome Project
- HMW, high molecular weight
- HapMap, haplotype map
- NGS, Next Generation Sequencing
- Next-generation
- OLC, Overlap-Layout-Consensus
- QV, Quality Value
- Reference
- SBS, Sequencing by Synthesis
- SMRT, Single Molecule Real-Time
- SNPs, Single Nucleotide Polymorphisms
- SRA, Short Read Archive
- SV, Structural Variant
- Sequencing
- TGS, Third Generation Sequencing
- Third-generation
- WGS, Whole Genome Sequencing
- ZMW, Zero-Mode Waveguide
- bp, base pair
- dNTPs, deoxynucleoside triphosphates
- ddNTP, 2′,3′-dideoxynucleoside triphosphate
Affiliation(s)
- Alice Maria Giani
  - Department of Surgery, Weill Cornell Medical College, New York, NY, USA
|