1
Nestor BJ, Bayer PE, Fernandez CGT, Edwards D, Finnegan PM. Approaches to increase the validity of gene family identification using manual homology search tools. Genetica 2023; 151:325-338. [PMID: 37817002 PMCID: PMC10692271 DOI: 10.1007/s10709-023-00196-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2023] [Accepted: 10/01/2023] [Indexed: 10/12/2023]
Abstract
Identifying homologs is an important process in the analysis of genetic patterns underlying traits and evolutionary relationships among species. Analysis of gene families is often used to form and support hypotheses on genetic patterns such as gene presence, absence, or functional divergence which underlie traits examined in functional studies. These analyses often require precise identification of all members in a targeted gene family. Manual pipelines where homology search and orthology assignment tools are used separately are the most common approach for identifying small gene families where accurate identification of all members is important. The ability to curate sequences between steps in manual pipelines allows for simple and precise identification of all possible gene family members. However, the validity of such manual pipeline analyses is often decreased by inappropriate approaches to homology searches including too relaxed or stringent statistical thresholds, inappropriate query sequences, homology classification based on sequence similarity alone, and low-quality proteome or genome sequences. In this article, we propose several approaches to mitigate these issues and allow for precise identification of gene family members and support for hypotheses linking genetic patterns to functional traits.
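The abstract's concern about thresholds and similarity-only classification can be illustrated with a small filter over tabular homology-search hits. This is a minimal sketch, assuming the common 12-column tab-separated layout (query, subject, percent identity, alignment length, and so on, ending with e-value and bit score). The `Hit` type, the function names, the cutoff values, and the example hits are all hypothetical and are not taken from the article. Hits that pass are only candidates; orthology would still need to be checked separately, for example with a phylogenetic tree.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    query: str
    subject: str
    pident: float
    aln_len: int
    evalue: float
    bitscore: float


def parse_tabular(lines):
    """Parse 12-column tab-separated hits, skipping blank and comment lines."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        hits.append(Hit(f[0], f[1], float(f[2]), int(f[3]), float(f[10]), float(f[11])))
    return hits


def candidate_homologs(hits, query_len, max_evalue=1e-5, min_query_cov=0.5):
    """Keep hits that pass both an e-value cutoff and a rough coverage cutoff.

    Both cutoffs are illustrative assumptions. Coverage is approximated as
    alignment length / query length, which slightly overestimates true
    coverage when the alignment contains gaps.
    """
    kept = []
    for h in hits:
        coverage = h.aln_len / query_len
        if h.evalue <= max_evalue and coverage >= min_query_cov:
            kept.append(h)
    return sorted(kept, key=lambda h: h.evalue)


# Hypothetical hits for a 200-residue query.
example = [
    "q1\tgeneA\t62.0\t180\t40\t2\t1\t180\t5\t184\t1e-60\t210",
    "q1\tgeneB\t35.0\t40\t20\t1\t10\t50\t3\t42\t1e-7\t45",
    "q1\tgeneC\t28.0\t150\t90\t5\t1\t150\t1\t150\t0.5\t30",
]
hits = parse_tabular(example)
kept = candidate_homologs(hits, query_len=200)
print([h.subject for h in kept])
```

In this toy input, `geneB` has a significant e-value but aligns to only a short stretch of the query, and `geneC` aligns over most of the query but with a weak e-value, so neither is retained.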
Affiliation(s)
- Benjamin J Nestor
  - School of Biological Sciences, University of Western Australia, Perth, WA, 6009, Australia
  - Centre for Applied Bioinformatics, University of Western Australia, Perth, WA, 6009, Australia
- Philipp E Bayer
  - School of Biological Sciences, University of Western Australia, Perth, WA, 6009, Australia
  - Centre for Applied Bioinformatics, University of Western Australia, Perth, WA, 6009, Australia
- Cassandria G Tay Fernandez
  - School of Biological Sciences, University of Western Australia, Perth, WA, 6009, Australia
  - Centre for Applied Bioinformatics, University of Western Australia, Perth, WA, 6009, Australia
- David Edwards
  - School of Biological Sciences, University of Western Australia, Perth, WA, 6009, Australia
  - Centre for Applied Bioinformatics, University of Western Australia, Perth, WA, 6009, Australia
- Patrick M Finnegan
  - School of Biological Sciences, University of Western Australia, Perth, WA, 6009, Australia
  - Centre for Applied Bioinformatics, University of Western Australia, Perth, WA, 6009, Australia

2
Manni M, Berkeley MR, Seppey M, Zdobnov EM. BUSCO: Assessing Genomic Data Quality and Beyond. Curr Protoc 2021; 1:e323. [PMID: 34936221 DOI: 10.1002/cpz1.323] [Citation(s) in RCA: 472] [Impact Index Per Article: 118.0] [Indexed: 12/13/2022]
Abstract
Evaluation of the quality of genomic "data products" such as genome assemblies or gene sets is of critical importance in order to recognize possible issues and correct them during the generation of new data. It is equally essential to guide subsequent or comparative analyses with existing data, as the correct interpretation of the results necessarily requires knowledge about the quality level and reliability of the inputs. Using datasets of near universal single-copy orthologs derived from OrthoDB, BUSCO can estimate the completeness and redundancy of genomic data by providing biologically meaningful metrics based on expected gene content. These can complement technical metrics such as contiguity measures (e.g., number of contigs/scaffolds, and N50 values). Here, we describe the use of the BUSCO tool suite to assess different data types that can range from genome assemblies of single isolates and assembled transcriptomes and annotated gene sets to metagenome-assembled genomes where the taxonomic origin of the species is unknown. BUSCO is the only tool capable of assessing all these types of sequences from both eukaryotic and prokaryotic species. The protocols detail the various BUSCO running modes and the novel workflows introduced in versions 4 and 5, including the batch analysis on multiple inputs, the auto-lineage workflow to run assessments without specifying a dataset, and a workflow for the evaluation of (large) eukaryotic genomes. The protocols further cover the BUSCO setup, guidelines to interpret the results, and BUSCO "plugin" workflows for performing common operations in genomics using BUSCO results, such as building phylogenomic trees and visualizing syntenies. © 2021 The Authors. Current Protocols published by Wiley Periodicals LLC. 
- Basic Protocol 1: Assessing an input sequence with a BUSCO dataset specified manually
- Basic Protocol 2: Assessing an input sequence with a dataset automatically selected by BUSCO
- Basic Protocol 3: Assessing multiple inputs
- Alternate Protocol: Decreasing analysis runtime when assessing a large number of small genomes with BUSCO auto-lineage workflow and Snakemake
- Support Protocol 1: BUSCO setup
- Support Protocol 2: Visualizing BUSCO results
- Support Protocol 3: Building phylogenomic trees
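The abstract contrasts BUSCO's gene-content metrics with technical contiguity measures such as N50. As a reminder of what that contiguity measure captures, here is a minimal sketch of an N50 calculation; the function name and the example contig lengths are hypothetical, and this is not part of the BUSCO tool suite.

```python
def n50(lengths):
    """Return the N50 of a list of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical assembly of seven contigs, 300 bp in total.
contigs = [100, 80, 50, 30, 20, 10, 10]
print(n50(contigs))  # 100 + 80 = 180 >= 150, so the N50 is 80
```

A high N50 says nothing about whether the expected genes are present, which is why the abstract presents the two kinds of metric as complementary.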
Affiliation(s)
- Mosè Manni
  - Department of Genetic Medicine and Development, University of Geneva, Geneva, Switzerland
  - Swiss Institute of Bioinformatics, Geneva, Switzerland
- Matthew R Berkeley
  - Department of Genetic Medicine and Development, University of Geneva, Geneva, Switzerland
  - Swiss Institute of Bioinformatics, Geneva, Switzerland
- Mathieu Seppey
  - Department of Genetic Medicine and Development, University of Geneva, Geneva, Switzerland
  - Swiss Institute of Bioinformatics, Geneva, Switzerland
- Evgeny M Zdobnov
  - Department of Genetic Medicine and Development, University of Geneva, Geneva, Switzerland
  - Swiss Institute of Bioinformatics, Geneva, Switzerland

3
Mitchell LJ, Cheney KL, Luehrmann M, Marshall NJ, Michie K, Cortesi F. Molecular evolution of ultraviolet visual opsins and spectral tuning of photoreceptors in anemonefishes (Amphiprioninae). Genome Biol Evol 2021; 13:6347585. [PMID: 34375382 PMCID: PMC8511661 DOI: 10.1093/gbe/evab184] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Accepted: 08/05/2021] [Indexed: 11/29/2022]
Abstract
Many animals including birds, reptiles, insects, and teleost fishes can see ultraviolet (UV) light (shorter than 400 nm), which has functional importance for foraging and communication. For coral reef fishes, shallow reef environments transmit a broad spectrum of light, rich in UV, driving the evolution of diverse spectral sensitivities. However, the identities and sites of the specific visual genes that underlie vision in reef fishes remain elusive, although they are useful in determining how evolution has tuned vision to suit life on the reef. We investigated the visual systems of 11 anemonefish (Amphiprioninae) species, specifically probing for the molecular pathways that facilitate UV-sensitivity. Searching the genomes of anemonefishes, we identified a total of eight functional opsin genes from all five vertebrate visual opsin subfamilies. We found rare instances of teleost UV-sensitive SWS1 opsin gene duplications that produced two functionally coding paralogs (SWS1α and SWS1β) and a pseudogene. We also found separate green sensitive RH2A opsin gene duplicates not yet reported in the family Pomacentridae. Transcriptome analysis revealed that the false clown anemonefish (Amphiprion ocellaris) expressed one rod opsin (RH1) and six cone opsins (SWS1β, SWS2B, RH2B, RH2A-1, RH2A-2, LWS) in the retina. Fluorescent in situ hybridization highlighted the (co-)expression of SWS1β with SWS2B in single cones, and either RH2B, RH2A, or RH2A together with LWS in different members of double cone photoreceptors (two single cones fused together). Our study provides the first in-depth characterization of visual opsin genes found in anemonefishes and provides a useful basis for the further study of UV-vision in reef fishes.
Affiliation(s)
- Laurie J Mitchell
  - School of Biological Sciences, The University of Queensland, Brisbane, QLD 4072, Australia
- Karen L Cheney
  - School of Biological Sciences, The University of Queensland, Brisbane, QLD 4072, Australia
- Martin Luehrmann
  - Queensland Brain Institute, The University of Queensland, Brisbane, QLD 4072, Australia
- N Justin Marshall
  - Queensland Brain Institute, The University of Queensland, Brisbane, QLD 4072, Australia
- Kyle Michie
  - Queensland Brain Institute, The University of Queensland, Brisbane, QLD 4072, Australia
  - King's College, Cambridge, CB2 1ST, UK
- Fabio Cortesi
  - Queensland Brain Institute, The University of Queensland, Brisbane, QLD 4072, Australia

4
Banerjee S, Bhandary P, Woodhouse M, Sen TZ, Wise RP, Andorf CM. FINDER: an automated software package to annotate eukaryotic genes from RNA-Seq data and associated protein sequences. BMC Bioinformatics 2021; 22:205. [PMID: 33879057 PMCID: PMC8056616 DOI: 10.1186/s12859-021-04120-9] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 02/05/2021] [Accepted: 04/07/2021] [Indexed: 12/23/2022]
Abstract
BACKGROUND Gene annotation in eukaryotes is a non-trivial task that requires meticulous analysis of accumulated transcript data. Challenges include transcriptionally active regions of the genome that contain overlapping genes, genes that produce numerous transcripts, transposable elements and numerous diverse sequence repeats. Currently available gene annotation software applications depend on pre-constructed full-length gene sequence assemblies which are not guaranteed to be error-free. The origins of these sequences are often uncertain, making it difficult to identify and rectify errors in them. This hinders the creation of an accurate and holistic representation of the transcriptomic landscape across multiple tissue types and experimental conditions. Therefore, to gauge the extent of diversity in gene structures, a comprehensive analysis of genome-wide expression data is imperative. RESULTS We present FINDER, a fully automated computational tool that optimizes the entire process of annotating genes and transcript structures. Unlike current state-of-the-art pipelines, FINDER automates the RNA-Seq pre-processing step by working directly with raw sequence reads and optimizes gene prediction from BRAKER2 by supplementing these reads with associated proteins. The FINDER pipeline (1) reports transcripts and recognizes genes that are expressed under specific conditions, (2) generates all possible alternatively spliced transcripts from expressed RNA-Seq data, (3) analyzes read coverage patterns to modify existing transcript models and create new ones, and (4) scores genes as high- or low-confidence based on the available evidence across multiple datasets. We demonstrate the ability of FINDER to automatically annotate a diverse pool of genomes from eight species. CONCLUSIONS FINDER takes a completely automated approach to annotate genes directly from raw expression data. 
It is capable of processing eukaryotic genomes of all sizes and requires no manual supervision, making it ideal for bench researchers with limited experience in handling computational tools.
Affiliation(s)
- Sagnik Banerjee
  - Program in Bioinformatics and Computational Biology, Iowa State University, Ames, IA, 50011, USA
  - Department of Statistics, Iowa State University, Ames, IA, 50011, USA
- Priyanka Bhandary
  - Program in Bioinformatics and Computational Biology, Iowa State University, Ames, IA, 50011, USA
  - Department of Genetics, Developmental and Cell Biology, Iowa State University, Ames, IA, 50011, USA
- Margaret Woodhouse
  - Corn Insects and Crop Genetics Research Unit, USDA-Agricultural Research Service, Ames, IA, 50011, USA
- Taner Z Sen
  - Crop Improvement and Genetics Research Unit, USDA-Agricultural Research Service, Albany, CA, 94710, USA
- Roger P Wise
  - Corn Insects and Crop Genetics Research Unit, USDA-Agricultural Research Service, Ames, IA, 50011, USA
  - Department of Plant Pathology and Microbiology, Iowa State University, Ames, IA, 50011, USA
- Carson M Andorf
  - Corn Insects and Crop Genetics Research Unit, USDA-Agricultural Research Service, Ames, IA, 50011, USA
  - Department of Computer Science, Iowa State University, Ames, IA, 50011, USA

5
Sanderson BJ, DiFazio SP, Cronk QCB, Ma T, Olson MS. A targeted sequence capture array for phylogenetics and population genomics in the Salicaceae. Applications in Plant Sciences 2020; 8:e11394. [PMID: 33163293 PMCID: PMC7598885 DOI: 10.1002/aps3.11394] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 05/08/2020] [Accepted: 08/12/2020] [Indexed: 05/03/2023]
Abstract
PREMISE The family Salicaceae has proved taxonomically challenging, especially in the genus Salix, which is speciose and features frequent hybridization and polyploidy. Past efforts to reconstruct the phylogeny with molecular barcodes have failed to resolve the species relationships of many sections of the genus. METHODS We used the wealth of sequence data in the family to design sequence capture probes to target regions of 300-1200 bp of exonic regions of 972 genes. RESULTS We recovered sequence data for nearly all of the targeted genes in three species of Populus and three species of Salix. We present a species tree, discuss concordance among gene trees, and present population genomic summary statistics for these loci. CONCLUSIONS Our sequence capture array has extremely high capture efficiency within the genera Populus and Salix, resulting in abundant phylogenetic information. Additionally, these loci show promise for population genomic studies.
Affiliation(s)
- Brian J. Sanderson
  - Department of Biological Sciences, Texas Tech University, Lubbock, Texas 79409‐3131, USA
  - Department of Biology, West Virginia University, Morgantown, West Virginia 26506, USA
- Stephen P. DiFazio
  - Department of Biology, West Virginia University, Morgantown, West Virginia 26506, USA
- Quentin C. B. Cronk
  - Department of Botany, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
- Tao Ma
  - Key Laboratory of Bio‐Resource and Eco‐Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu 610065, People’s Republic of China
- Matthew S. Olson
  - Department of Biological Sciences, Texas Tech University, Lubbock, Texas 79409‐3131, USA

6
Badet T, Croll D. The rise and fall of genes: origins and functions of plant pathogen pangenomes. Current Opinion in Plant Biology 2020; 56:65-73. [PMID: 32480355 DOI: 10.1016/j.pbi.2020.04.009] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Received: 02/06/2020] [Revised: 04/14/2020] [Accepted: 04/18/2020] [Indexed: 06/11/2023]
Abstract
Plant pathogens can rapidly overcome resistance of their hosts by mutating key pathogenicity genes encoding effectors. Pathogen adaptation is fuelled by extensive genetic variability in populations, and different strains may not share the same set of genes. Recently, such intra-specific variation in gene content became formalized as pangenomes distinguishing core genes (i.e. shared) and accessory genes (i.e. lineage or strain-specific). Across pathogen species, key effectors tend to be part of the rapidly evolving accessory genome. Here, we show how the construction and analysis of pathogen pangenomes provide deep insights into the dynamic host adaptation process. We also discuss how pangenomes should ideally be built and how geography, niche and lifestyle likely determine pangenome sizes.
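The core/accessory distinction described in the abstract can be made concrete with a small presence/absence example. This is a minimal sketch under stated assumptions: the strain names, gene identifiers, and the function `split_pangenome` are hypothetical, and "core" is taken strictly as "present in every strain", which is one possible definition rather than the one the review necessarily uses.

```python
def split_pangenome(gene_sets):
    """Split a pangenome into core and accessory genes.

    gene_sets maps each strain name to the set of gene (family) IDs it carries.
    Core genes are present in every strain; accessory genes are present in
    at least one strain but not all of them.
    """
    if not gene_sets:
        raise ValueError("need at least one strain")
    per_strain = [set(genes) for genes in gene_sets.values()]
    pan = set().union(*per_strain)
    core = set.intersection(*per_strain)
    accessory = pan - core
    return core, accessory


# Hypothetical gene content for three strains of one pathogen species.
strains = {
    "strain_1": {"g1", "g2", "effector_a"},
    "strain_2": {"g1", "g2", "effector_b"},
    "strain_3": {"g1", "g2"},
}
core, accessory = split_pangenome(strains)
print(sorted(core))       # ['g1', 'g2']
print(sorted(accessory))  # ['effector_a', 'effector_b']
```

In this toy input the two effector genes fall into the accessory set, mirroring the abstract's observation that key effectors tend to sit in the accessory genome.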
Affiliation(s)
- Thomas Badet
  - Laboratory of Evolutionary Genetics, Institute of Biology, University of Neuchâtel, Switzerland
- Daniel Croll
  - Laboratory of Evolutionary Genetics, Institute of Biology, University of Neuchâtel, Switzerland

7
Mangelson H, Jarvis DE, Mollinedo P, Rollano‐Penaloza OM, Palma‐Encinas VD, Gomez‐Pando LR, Jellen EN, Maughan PJ. The genome of Chenopodium pallidicaule: An emerging Andean super grain. Applications in Plant Sciences 2019; 7:e11300. [PMID: 31832282 PMCID: PMC6858295 DOI: 10.1002/aps3.11300] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 07/09/2019] [Accepted: 09/24/2019] [Indexed: 05/28/2023]
Abstract
PREMISE Cañahua is a semi-domesticated crop grown in high-altitude regions of the Andes. It is an A-genome diploid (2n = 2x = 18) relative of the allotetraploid (AABB) Chenopodium quinoa and shares many of its nutritional benefits. Cañahua seed contains a complete protein, has a low glycemic index, and offers a wide variety of nutritionally important vitamins and minerals. METHODS The reference assembly was developed using a combination of short- and long-read sequencing techniques, including multiple rounds of Hi-C-based proximity-guided assembly. RESULTS The final assembly of the ~363-Mbp genome consists of 4633 scaffolds, with 96.6% of the assembly contained in nine scaffolds representing the nine haploid chromosomes of the species. Repetitive element analysis classified 52.3% of the assembly as repetitive, with the most common repeat identified as long terminal repeat retrotransposons. MAKER annotation of the final assembly yielded 22,832 putative gene models. DISCUSSION When compared with quinoa, strong patterns of synteny support the hypothesis that cañahua is a close A-genome diploid relative, and thus potentially a simplified model diploid species for genetic analysis and improvement of quinoa. Resequencing and phylogenetic analysis of a diversity panel of cañahua accessions suggest that coordinated efforts are needed to enhance genetic diversity conservation within ex situ germplasm collections.
Affiliation(s)
- Hayley Mangelson
  - Department of Plant and Wildlife Sciences, Brigham Young University, 5144 LSB, Provo, Utah 84602, USA
- David E. Jarvis
  - Department of Plant and Wildlife Sciences, Brigham Young University, 5144 LSB, Provo, Utah 84602, USA
- Patricia Mollinedo
  - Institute of Natural Product Research, Universidad Mayor de San Andrés, La Paz, Bolivia
- Luz Rayda Gomez‐Pando
  - Departamento de Fitotecnia, Facultad de Agronomía, Universidad Nacional Agraria de La Molina, La Molina, Peru
- Eric N. Jellen
  - Department of Plant and Wildlife Sciences, Brigham Young University, 5144 LSB, Provo, Utah 84602, USA
- Peter J. Maughan
  - Department of Plant and Wildlife Sciences, Brigham Young University, 5144 LSB, Provo, Utah 84602, USA

8
Cooke I, Mead O, Whalen C, Boote C, Moya A, Ying H, Robbins S, Strugnell JM, Darling A, Miller D, Voolstra CR, Adamska M. Molecular techniques and their limitations shape our view of the holobiont. Zoology 2019; 137:125695. [PMID: 31759226 DOI: 10.1016/j.zool.2019.125695] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 03/29/2019] [Revised: 07/08/2019] [Accepted: 07/12/2019] [Indexed: 11/26/2022]
Abstract
It is now recognised that the biology of almost any organism cannot be fully understood without recognising the existence and potential functional importance of associated microbes. Arguably, the emergence of this holistic viewpoint may never have occurred without the development of a crucial molecular technique, 16S rDNA amplicon sequencing, which allowed microbial communities to be easily profiled across a broad range of contexts. A diverse array of molecular techniques are now used to profile microbial communities, infer their evolutionary histories, visualise them in host tissues, and measure their molecular activity. In this review, we examine each of these categories of measurement and inference with a focus on the questions they make tractable, and the degree to which their capabilities and limitations shape our view of the holobiont.
Affiliation(s)
- Ira Cooke
  - Department of Molecular and Cell Biology, James Cook University, Townsville, QLD, 4811, Australia
  - Centre for Tropical Bioinformatics and Molecular Biology, James Cook University, Townsville, QLD, 4811, Australia
- Oliver Mead
  - ARC Centre of Excellence for Coral Reef Studies, Australian National University, Canberra, ACT, 2601, Australia
  - Research School of Biology, Australian National University, Canberra, ACT, 2601, Australia
- Casey Whalen
  - Department of Molecular and Cell Biology, James Cook University, Townsville, QLD, 4811, Australia
  - Centre for Tropical Bioinformatics and Molecular Biology, James Cook University, Townsville, QLD, 4811, Australia
  - ARC Centre of Excellence for Coral Reef Studies, James Cook University, Townsville, QLD, 4811, Australia
- Chloë Boote
  - Department of Molecular and Cell Biology, James Cook University, Townsville, QLD, 4811, Australia
  - Centre for Tropical Bioinformatics and Molecular Biology, James Cook University, Townsville, QLD, 4811, Australia
  - ARC Centre of Excellence for Coral Reef Studies, James Cook University, Townsville, QLD, 4811, Australia
- Aurelie Moya
  - Department of Molecular and Cell Biology, James Cook University, Townsville, QLD, 4811, Australia
  - Centre for Tropical Bioinformatics and Molecular Biology, James Cook University, Townsville, QLD, 4811, Australia
  - ARC Centre of Excellence for Coral Reef Studies, James Cook University, Townsville, QLD, 4811, Australia
- Hua Ying
  - Research School of Biology, Australian National University, Canberra, ACT, 2601, Australia
- Steven Robbins
  - Australian Center for Ecogenomics, University of Queensland, St. Lucia, QLD, 4072, Australia
- Jan M Strugnell
  - Centre for Tropical Bioinformatics and Molecular Biology, James Cook University, Townsville, QLD, 4811, Australia
  - Centre of Sustainable Tropical Fisheries and Aquaculture, James Cook University, Townsville, 4810, QLD, Australia
  - Department of Ecology, Environment and Evolution, School of Life Sciences, La Trobe University, Melbourne, 3083, Australia
- Aaron Darling
  - The ithree institute, University of Technology Sydney, Ultimo, NSW, 2007, Australia
- David Miller
  - Department of Molecular and Cell Biology, James Cook University, Townsville, QLD, 4811, Australia
  - Centre for Tropical Bioinformatics and Molecular Biology, James Cook University, Townsville, QLD, 4811, Australia
  - ARC Centre of Excellence for Coral Reef Studies, James Cook University, Townsville, QLD, 4811, Australia
- Maja Adamska
  - ARC Centre of Excellence for Coral Reef Studies, Australian National University, Canberra, ACT, 2601, Australia
  - Research School of Biology, Australian National University, Canberra, ACT, 2601, Australia

9
McCarthy TW, Chou HC, Brendel VP. SRAssembler: Selective Recursive local Assembly of homologous genomic regions. BMC Bioinformatics 2019; 20:371. [PMID: 31266441 PMCID: PMC6604332 DOI: 10.1186/s12859-019-2949-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 10/19/2018] [Accepted: 06/13/2019] [Indexed: 11/16/2022]
Abstract
Background The falling cost of next-generation sequencing technology has allowed deep sequencing across related species and of individuals within species. Whole genome assemblies from these data remain high time- and resource-consuming computational tasks, particularly if best solutions are sought using different assembly strategies and parameter sets. However, in many cases, the underlying research questions are not genome-wide but rather target specific genes or sets of genes. We describe a novel assembly tool, SRAssembler, that efficiently assembles only contigs containing potential homologs of a gene or protein query, thus enabling gene-specific genome studies over large numbers of short read samples. Results We demonstrate the functionality of SRAssembler with examples largely drawn from plant genomics. The workflow implements a recursive strategy by which relevant reads are successively pulled from the input sets based on overlapping significant matches, resulting in virtual chromosome walking. The typical workflow behavior is illustrated with assembly of simulated reads. Applications to real data show that SRAssembler produces homologous contigs of equivalent quality to whole genome assemblies. Settings can be chosen to not only assemble presumed orthologs but also paralogous gene loci in distinct contigs. A key application is assembly of the same locus in many individuals from population genome data, which provides assessment of structural variation beyond what can be inferred from read mapping to a reference genome alone. SRAssembler can be used on modest computing resources or used in parallel on high performance computing clusters (most easily by invoking a dedicated Singularity image). Conclusions SRAssembler offers an efficient tool to complement whole genome assembly software. 
It can be used to solve gene-specific research questions based on large genomic read samples from multiple sources and would be an expedient choice when whole genome assembly from the reads is either not feasible, too costly, or unnecessary. The program can also aid decision making on the depth of sequencing in an ongoing novel genome sequencing project or with respect to ultimate whole genome assembly strategies.
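The recursive strategy described in the abstract, in which reads are pulled in round after round because they overlap what has already been matched, can be illustrated with a toy example. This is only a sketch of the general idea: it uses exact shared k-mers as a stand-in for "significant matches", it does not assemble contigs, and it does not reproduce SRAssembler's actual matching or assembly steps. The sequences, the function names, and the parameters `k` and `max_rounds` are hypothetical.

```python
def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def recruit_reads(query, reads, k=6, max_rounds=10):
    """Iteratively recruit reads that share a k-mer with the growing bait set.

    Round 1 uses only the query as bait; each later round also uses the
    reads recruited so far, so recruitment can walk outward from the query.
    """
    bait = kmers(query, k)
    recruited = []
    remaining = list(reads)
    for _ in range(max_rounds):
        new, rest = [], []
        for read in remaining:
            (new if kmers(read, k) & bait else rest).append(read)
        if not new:
            break  # nothing new matched, so the walk has converged
        recruited.extend(new)
        for read in new:
            bait |= kmers(read, k)
        remaining = rest
    return recruited


# Hypothetical locus; the query covers only its first 12 bases.
locus = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
query = locus[0:12]
r1, r2, r3 = locus[6:18], locus[12:24], locus[18:30]
unrelated = "TTTTTTCACACA"

recruited = recruit_reads(query, [unrelated, r3, r1, r2])
print(recruited == [r1, r2, r3])
```

Only `r1` overlaps the query directly; `r2` is reached through `r1` and `r3` through `r2`, which is the "virtual chromosome walking" behaviour the abstract refers to, while the unrelated read is never recruited.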
Affiliation(s)
- Thomas W McCarthy
  - Department of Biology, Indiana University, Bloomington, 47405, Indiana, USA
- Hsien-Chao Chou
  - Department of Oncology, St Jude Children's Research Hospital, Memphis, 38105, Tennessee, USA
- Volker P Brendel
  - Department of Biology, Indiana University, Bloomington, 47405, Indiana, USA
  - Department of Computer Science, Indiana University, Bloomington, 47405, Indiana, USA

10
Drukewitz SH, von Reumont BM. The Significance of Comparative Genomics in Modern Evolutionary Venomics. Front Ecol Evol 2019. [DOI: 10.3389/fevo.2019.00163] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Indexed: 12/21/2022]

11
Pengelly RJ, Collins A. Linkage disequilibrium maps to guide contig ordering for genome assembly. Bioinformatics 2018; 35:541-545. [DOI: 10.1093/bioinformatics/bty687] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 04/11/2018] [Revised: 07/13/2018] [Accepted: 08/03/2018] [Indexed: 11/12/2022]
Affiliation(s)
- Reuben J Pengelly
  - Genetic Epidemiology & Bioinformatics, Faculty of Medicine, University of Southampton, Southampton, UK
- Andrew Collins
  - Genetic Epidemiology & Bioinformatics, Faculty of Medicine, University of Southampton, Southampton, UK

12
Brand P, Robertson HM, Lin W, Pothula R, Klingeman WE, Jurat-Fuentes JL, Johnson BR. The origin of the odorant receptor gene family in insects. eLife 2018; 7:e38340. [PMID: 30063003 PMCID: PMC6080948 DOI: 10.7554/elife.38340] [Citation(s) in RCA: 96] [Impact Index Per Article: 13.7] [Received: 05/13/2018] [Accepted: 07/24/2018] [Indexed: 02/04/2023]
Abstract
The origin of the insect odorant receptor (OR) gene family has been hypothesized to have coincided with the evolution of terrestriality in insects. Missbach et al. (2014) suggested that ORs instead evolved with an ancestral OR co-receptor (Orco) after the origin of terrestriality and the OR/Orco system is an adaptation to winged flight in insects. We investigated genomes of the Collembola, Diplura, Archaeognatha, Zygentoma, Odonata, and Ephemeroptera, and find ORs present in all insect genomes but absent from lineages predating the evolution of insects. Orco is absent only in the ancestrally wingless insect lineage Archaeognatha. Our new genome sequence of the zygentoman firebrat Thermobia domestica reveals a full OR/Orco system. We conclude that ORs evolved before winged flight, perhaps as an adaptation to terrestriality, representing a key evolutionary novelty in the ancestor of all insects, and hence a molecular synapomorphy for the Class Insecta.
Affiliation(s)
- Philipp Brand
  - Department of Evolution and Ecology, Center for Population Biology, University of California, Davis, Davis, United States
- Hugh M Robertson
  - Department of Entomology, University of Illinois at Urbana-Champaign, Urbana, United States
- Wei Lin
  - Department of Entomology and Nematology, University of California, Davis, Davis, United States
- Ratnasri Pothula
  - Department of Entomology and Plant Pathology, University of Tennessee, Knoxville, United States
- Brian R Johnson
  - Department of Entomology and Nematology, University of California, Davis, Davis, United States