1
López-Solano A, Doadrio I, Nester TL, Perea S. De novo genome hybrid assembly and annotation of the endangered and euryhaline fish Aphanius iberus (Valenciennes, 1846) with identification of genes potentially involved in salinity adaptation. BMC Genomics 2025; 26:136. [PMID: 39939939 PMCID: PMC11817801 DOI: 10.1186/s12864-025-11327-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2024] [Accepted: 02/05/2025] [Indexed: 02/14/2025]
Abstract
BACKGROUND The sequencing of non-model species has increased exponentially in recent years, largely owing to the advent of new sequencing technologies. In this study, we construct the Reference Genome of the Spanish toothcarp (Aphanius iberus (Valenciennes, 1846)), a renowned euryhaline fish species. This species is native to the marshes along the Mediterranean coast of Spain and has been threatened with extinction as a result of habitat modification caused by urbanization and agriculture, and of its popularity among aquarium hobbyists since the mid-twentieth century. It is also one of the first Reference Genomes for a Eurasian species within the globally distributed order Cyprinodontiformes. Additionally, this effort aims to improve our understanding of the species' evolutionary ecology and history, particularly the remarkable adaptations that enable it to thrive in diverse and constantly changing inland aquatic environments.
RESULTS A hybrid assembly approach was employed, integrating PacBio long-read sequencing with Illumina short-read data. In addition to the assembly, we provide an extensive functional annotation of the genome using AUGUSTUS and two different approaches (InterProScan and Sma3s). The genome size (1.15 Gb) is consistent with that of the most closely related species, and its quality and completeness, assessed with various methods, exceeded the suggested minimum thresholds, confirming the robustness of the assembly. An orthology analysis showed that nearly all genes were grouped in orthogroups that included genes of genetically similar species. GO term annotation revealed, among others, categories related to salinity regulation processes (ion transport, transmembrane transport, membrane-related terms and calcium ion binding).
CONCLUSIONS The integration of genomic data with predicted genes presents future research opportunities across multiple disciplines, such as physiology, reproduction and disease, and opens new avenues for comparative genomic studies. Of particular interest is the investigation of the genes potentially associated with salinity adaptation identified in this study. Overall, this study contributes to the growing database of Reference Genomes, provides valuable information that enhances knowledge of the order Cyprinodontiformes, and aids in improving the conservation status of threatened species by facilitating a better understanding of their behavior in nature and optimizing the allocation of resources toward their preservation.
Affiliation(s)
- Alfonso López-Solano
  - Museo Nacional de Ciencias Naturales, C/ José Gutiérrez Abascal, 2, 28006, Madrid, Spain
- Ignacio Doadrio
  - Museo Nacional de Ciencias Naturales, C/ José Gutiérrez Abascal, 2, 28006, Madrid, Spain
- Tessa Lynn Nester
  - Museo Nacional de Ciencias Naturales, C/ José Gutiérrez Abascal, 2, 28006, Madrid, Spain
- Silvia Perea
  - Museo Nacional de Ciencias Naturales, C/ José Gutiérrez Abascal, 2, 28006, Madrid, Spain
  - Tragsatec. Grupo Tragsa, C/ Julián Camarillo 6B, Madrid, 28037, Spain
2
Zhao M, Oswald JA, Allen JM, Owens HL, Hosner PA, Guralnick RP, Braun EL, Kimball RT. A phylogenomic tree of wood-warblers (Aves: Parulidae): Dealing with good, bad, and ugly samples. Mol Phylogenet Evol 2025; 202:108235. [PMID: 39542406 DOI: 10.1016/j.ympev.2024.108235] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2024] [Revised: 10/26/2024] [Accepted: 11/10/2024] [Indexed: 11/17/2024]
Abstract
The New World warblers (Parulidae) are a model group for ecological and evolutionary analyses. However, current phylogenetic relationships across this family are based on few loci. Here we use ultraconserved elements (UCEs) to estimate a rigorous species-level phylogeny for the family. As is true for many groups, high-quality tissues were unavailable for some taxa. Thus, we explored methods for incorporating sequences derived from historical (toe pad) samples to expand the phylogenetic datasets. We recovered an average of 4,186 UCE loci and mitochondrial bycatch data (supplemented with published mitochondrial data) from 96% of all currently recognized species. We found that the UCE phylogeny built from alignments with less than 70% gaps and ambiguities recovered the most robust phylogenetic relationships for this family, representing 101 species. Using this phylogeny as a topological backbone and adding ten fair-quality "bad" samples effectively generated an overall well-supported phylogeny, representing 108 species (∼90% of all species). Based on this tree, we then added seven poor-quality "ugly" samples, six of which were placed within their expected genera. We also explored the phylogenetic positions of the likely extinct Leucopeza semperi and the endangered Catharopeza bishopi, for which limited data were obtained. Overall, taxonomic placements in our UCE trees largely correspond to previously published studies, with all currently recognized genera recovered as monophyletic except Basileuterus, which was rendered paraphyletic by B. lachrymosus. Our study provides insights into the phylogenetic relationships of a model passerine family and outlines effective practices for managing sparse genomic data sourced from historical museum specimens. Variable topological arrangements across datasets and analyses reflect the evolutionary complexity of this group and suggest topics for future in-depth studies.
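The alignment filter mentioned above (keeping alignments with less than 70% gaps and ambiguities) can be illustrated with a minimal Python sketch. It assumes the fraction is taken over all cells of an alignment and that gaps, '?' and IUPAC ambiguity codes all count as missing; the authors' exact definition may differ, and the function names and toy loci are hypothetical.

```python
# Characters counted as missing data: gaps, unknowns and IUPAC ambiguity codes.
# This set is an assumption for illustration; published pipelines may differ.
MISSING = set("-?NRYSWKMBDHV")


def missing_fraction(alignment: dict[str, str]) -> float:
    """Fraction of all alignment cells that are gaps or ambiguous bases."""
    total = sum(len(seq) for seq in alignment.values())
    missing = sum(base in MISSING for seq in alignment.values() for base in seq.upper())
    return missing / total if total else 1.0


def filter_loci(loci: dict[str, dict[str, str]], threshold: float = 0.70) -> list[str]:
    """Keep loci whose missing-data fraction is strictly below the threshold."""
    return [name for name, aln in loci.items() if missing_fraction(aln) < threshold]


# Hypothetical two-taxon alignments for two UCE loci.
loci = {
    "uce-1": {"taxonA": "ACGTACGT", "taxonB": "ACGTAC--"},
    "uce-2": {"taxonA": "ACNN----", "taxonB": "--------"},
}
print(filter_loci(loci))
```

Here "uce-1" has 2 missing cells out of 16 (12.5%) and is kept, while "uce-2" has 14 of 16 (87.5%) and is dropped, so the script prints `['uce-1']`.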
Affiliation(s)
- Min Zhao
  - Department of Biology, University of Florida, Gainesville, FL 32611, USA; Florida Museum of Natural History, University of Florida, Gainesville, FL 32611, USA
- Jessica A Oswald
  - Florida Museum of Natural History, University of Florida, Gainesville, FL 32611, USA; U.S. Fish and Wildlife Service, National Fish and Wildlife Forensic Laboratory, Ashland, OR 97520, USA
- Julie M Allen
  - Department of Biological Sciences, Virginia Tech, Blacksburg, VA 24060, USA
- Hannah L Owens
  - Florida Museum of Natural History, University of Florida, Gainesville, FL 32611, USA; Center for Global Mountain Biodiversity, Section for Biodiversity, Globe Institute, University of Copenhagen, København Ø, Denmark
- Peter A Hosner
  - Center for Global Mountain Biodiversity, Section for Biodiversity, Globe Institute, University of Copenhagen, København Ø, Denmark; Natural History Museum Denmark, University of Copenhagen, København Ø, Denmark
- Robert P Guralnick
  - Florida Museum of Natural History, University of Florida, Gainesville, FL 32611, USA
- Edward L Braun
  - Department of Biology, University of Florida, Gainesville, FL 32611, USA
- Rebecca T Kimball
  - Department of Biology, University of Florida, Gainesville, FL 32611, USA
3
Forthman M, Downie C, Miller CW, Kimball RT. Evolution of stridulatory mechanisms: vibroacoustic communication may be common in leaf-footed bugs and allies (Heteroptera: Coreoidea). Royal Society Open Science 2023; 10:221348. [PMID: 37122949 PMCID: PMC10130729 DOI: 10.1098/rsos.221348] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2022] [Accepted: 02/07/2023] [Indexed: 05/03/2023]
Abstract
Intra- and interspecific communication is crucial to fitness via its role in facilitating mating, territoriality and defence. Yet the evolution of animal communication systems is puzzling: how do they originate and change over time? Studying stridulatory morphology provides a tractable opportunity to deduce the origin and diversification of a communication mechanism. Stridulation occurs when two sclerotized structures rub together to produce vibratory and acoustic (vibroacoustic) signals, such as a cricket's 'chirp'. We investigated the evolution of stridulatory mechanisms in the superfamily Coreoidea (Hemiptera: Heteroptera), a group of insects known for elaborate male fighting behaviours and enlarged hindlegs. We surveyed a large sample of taxa and used a phylogenomic dataset to investigate the evolution of stridulatory mechanisms. We identified four mechanisms, with at least five evolutionary gains. One mechanism, occurring only in male Harmostini (Rhopalidae), is described for the first time. Some stridulatory mechanisms appear to be non-homoplastic apomorphies within Rhopalidae, while others are homoplastic within Coreidae or potentially homoplastic within Alydidae. We detected no losses of these mechanisms once they had evolved, suggesting that they are adaptive. Our work sets the stage for further behavioural, evolutionary and ecological studies to better understand the context in which these traits evolve and change.
Affiliation(s)
- Michael Forthman
  - California State Collection of Arthropods, Plant Pest Diagnostics Branch, California Department of Food & Agriculture, 3294 Meadowview Road, Sacramento, CA 95832, USA
  - Entomology & Nematology Department, University of Florida, 1881 Natural Area Drive, Gainesville, FL 32611, USA
- Christine W. Miller
  - Entomology & Nematology Department, University of Florida, 1881 Natural Area Drive, Gainesville, FL 32611, USA
- Rebecca T. Kimball
  - Department of Biology, University of Florida, 876 Newell Drive, Gainesville, FL 32611, USA
4
Guo A, Salzberg SL, Zimin AV. JASPER: A fast genome polishing tool that improves accuracy of genome assemblies. PLoS Comput Biol 2023; 19:e1011032. [PMID: 37000853 PMCID: PMC10096238 DOI: 10.1371/journal.pcbi.1011032] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/22/2022] [Revised: 04/12/2023] [Accepted: 03/16/2023] [Indexed: 04/03/2023]
Abstract
Advances in long-read sequencing technologies have dramatically improved the contiguity and completeness of genome assemblies. Using the latest nanopore-based sequencers, we can generate enough data for the assembly of a human genome from a single flow cell. With the long-read data from these sequencers, we can now routinely produce de novo genome assemblies in which half or more of a genome is contained in megabase-scale contigs. Assemblies produced from nanopore data alone, though, have relatively high error rates and can benefit from a process called polishing, in which more accurate reads are used to correct errors in the consensus sequence. In this manuscript, we present a novel tool for genome polishing called JASPER (Jellyfish-based Assembly Sequence Polisher for Error Reduction). In contrast to many other polishing methods, JASPER gains efficiency by avoiding the alignment of reads to the assembly. Instead, JASPER uses a database of k-mer counts that it creates from the reads to detect and correct errors in the consensus. Our experiments demonstrate that JASPER is faster than alignment-based polishers, and both faster and more accurate than other k-mer-based polishing methods. We also introduce the idea of using a polishing tool to create population-specific reference genomes, and illustrate this idea using sequence data from multiple individuals from Tokyo, Japan.
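The general idea of detecting consensus errors from k-mer counts instead of read alignments can be illustrated with a toy Python sketch. This is not JASPER's algorithm (which builds a Jellyfish database and also corrects errors); it only flags consensus bases that are not covered by any "solid" k-mer. The k value, the `min_count` threshold, the forward-strand-only counting and the function names are all assumptions for illustration.

```python
from collections import Counter


def count_kmers(reads: list[str], k: int) -> Counter:
    """Count every k-mer in the accurate reads (forward strand only, for simplicity)."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def weak_positions(consensus: str, counts: Counter, k: int, min_count: int = 2) -> list[int]:
    """Return consensus positions not covered by any 'solid' k-mer.

    A k-mer is solid if it occurs at least min_count times in the reads; a base
    covered only by rare or absent k-mers is flagged as a candidate error.
    """
    solid = [False] * len(consensus)
    for i in range(len(consensus) - k + 1):
        if counts[consensus[i:i + k]] >= min_count:
            for j in range(i, i + k):
                solid[j] = True
    return [pos for pos, ok in enumerate(solid) if not ok]


# Toy data: three error-free reads and a consensus with one substitution (C -> A at index 6).
reads = ["ACGTTGCATGCA"] * 3
consensus = "ACGTTGAATGCA"
print(weak_positions(consensus, count_kmers(reads, k=4), k=4))
```

The four k-mers overlapping the substituted base never occur in the reads, so only index 6 is left uncovered and the script prints `[6]`. A real polisher would also use canonical (strand-independent) k-mers and then test candidate replacements for each weak base.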
Affiliation(s)
- Alina Guo
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland, United States of America
  - Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, Maryland, United States of America
- Steven L. Salzberg
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland, United States of America
  - Center for Computational Biology, Johns Hopkins University, Baltimore, Maryland, United States of America
  - Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, United States of America
  - Department of Biostatistics, Johns Hopkins University, Baltimore, Maryland, United States of America
- Aleksey V. Zimin
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland, United States of America
  - Center for Computational Biology, Johns Hopkins University, Baltimore, Maryland, United States of America
5
Karam Q, Kumar V, Shajan AB, Al-Nuaimi S, Sattari Z, El-Dakour S. De-novo genome assembly and annotation of sobaity seabream Sparidentex hasta. Front Genet 2022; 13:988488. [PMID: 36386818 PMCID: PMC9659893 DOI: 10.3389/fgene.2022.988488] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/07/2022] [Accepted: 10/06/2022] [Indexed: 11/30/2022]
Abstract
Sparidentex hasta (Valenciennes, 1830), of the family Sparidae, is an economically important fish species. However, genomic studies on S. hasta have been limited by the absence of a complete genome. The goal of the current study was to sequence, assemble, and annotate the genome of S. hasta to fuel further research on this seabream. The assembled draft genome of S. hasta was 686 Mb with an N50 of 80 kb. The draft genome contained approximately 22% repeats and 41,201 genes coding for 44,555 transcripts. Assembly completeness was supported by the detection of ∼93% of BUSCOs at the protein level and by the alignment of >99% of the filtered reads to the assembled genome. Around 68% of the predicted proteins (n = 30,545) had significant BLAST matches, and 30,473 and 13,244 sequences were mapped to Gene Ontology annotations and different enzyme classes, respectively. The comparative genomics analysis indicated that S. hasta is closely related to Acanthopagrus latus. The current assembly provides a solid foundation for future population and conservation studies of S. hasta, as well as for investigations of environmental adaptation in the Sparidae family of fishes.
Value of the Data: This draft genome of S. hasta should be useful for molecular characterization, gene expression studies, and addressing various problems associated with pathogen-associated immune response, climate adaptability, and comparative genomics. The availability of the draft genome sequence should help in understanding pathways and functions at the molecular level, which may further help improve the economic value and conservation of the species.
Affiliation(s)
- Qusaie Karam
  - Crises Management and Decision Support Program, Environment and Life Sciences Research Center, Kuwait Institute for Scientific Research, Kuwait City, Kuwait
- Vinod Kumar
  - Biotechnology Program, Environment and Life Sciences Research Center, Kuwait Institute for Scientific Research, Kuwait City, Kuwait
- Anisha B. Shajan
  - Biotechnology Program, Environment and Life Sciences Research Center, Kuwait Institute for Scientific Research, Kuwait City, Kuwait
- Sabeeka Al-Nuaimi
  - Crises Management and Decision Support Program, Environment and Life Sciences Research Center, Kuwait Institute for Scientific Research, Kuwait City, Kuwait
- Zainab Sattari
  - Aquaculture Program, Environment and Life Sciences Research Center, Kuwait Institute for Scientific Research, Kuwait City, Kuwait
- Saleem El-Dakour
  - Aquaculture Program, Environment and Life Sciences Research Center, Kuwait Institute for Scientific Research, Kuwait City, Kuwait
6
Genome sequence assembly algorithms and misassembly identification methods. Mol Biol Rep 2022; 49:11133-11148. [PMID: 36151399 DOI: 10.1007/s11033-022-07919-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/20/2022] [Accepted: 09/05/2022] [Indexed: 10/14/2022]
Abstract
Sequence assembly algorithms have evolved rapidly alongside the vigorous growth of genome sequencing technology over the past two decades. Assembly mainly relies on iteratively extending overlap relationships between sequences to reconstruct the target genome. Assembly algorithms can typically be classified into several categories, such as the Greedy strategy, the Overlap-Layout-Consensus (OLC) strategy, and the de Bruijn graph (DBG) strategy. In particular, owing to the rapid development of third-generation sequencing (TGS) technology, several prevalent assembly algorithms have been proposed to generate high-quality chromosome-level assemblies. However, because of genome complexity, the limited length of short reads, and the high error rate of long reads, contigs produced by assembly may contain misassemblies that adversely affect downstream data analysis. Therefore, several read-based and reference-based methods for misassembly identification have been developed to improve assembly quality. This work reviews the development of DNA sequencing technologies and summarizes sequencing data simulation methods, sequencing error correction methods, various mainstream sequence assembly algorithms, and misassembly identification methods. The large amount of computation involved makes the sequence assembly problem more challenging; therefore, it is necessary to develop more efficient and accurate assembly algorithms and alternative algorithms.
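To make the de Bruijn graph (DBG) strategy concrete, the following minimal Python sketch builds a graph whose nodes are (k-1)-mers and whose edges come from the k-mers in the reads, then spells out a sequence by walking a non-branching path. It assumes error-free reads, a single strand and a repeat-free toy genome so that the path is unambiguous; real assemblers must cope with errors, repeats and branching, which is where misassemblies arise. The function names and data are hypothetical.

```python
from collections import defaultdict


def build_dbg(reads: list[str], k: int) -> dict[str, set[str]]:
    """Build a de Bruijn graph: nodes are (k-1)-mers, each k-mer adds an edge prefix -> suffix."""
    graph: dict[str, set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


def walk_unitig(graph: dict[str, set[str]]) -> str:
    """Spell the sequence along the non-branching path from a node with no incoming edge."""
    targets = {t for successors in graph.values() for t in successors}
    start = next(node for node in graph if node not in targets)
    sequence, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        node = next(iter(graph[node]))
        if node in seen:  # stop if the path loops back (a repeat)
            break
        seen.add(node)
        sequence += node[-1]  # each new node extends the sequence by one base
    return sequence


# Overlapping, error-free reads from the toy genome "ACGTGCATTGAC".
reads = ["ACGTGCA", "GTGCATT", "CATTGAC"]
print(walk_unitig(build_dbg(reads, k=4)))
```

With k = 4 every 3-mer of the toy genome is unique, so the walk reconstructs `ACGTGCATTGAC`. Lowering k until some (k-1)-mer repeats would create a branching node and stop the walk early, which is a small illustration of why repeats complicate DBG assembly.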
7
Porrelli S, Gerbault-Seureau M, Rozzi R, Chikhi R, Curaudeau M, Ropiquet A, Hassanin A. Draft genome of the lowland anoa (Bubalus depressicornis) and comparison with buffalo genome assemblies (Bovidae, Bubalina). G3 Genes|Genomes|Genetics 2022; 12:6701968. [PMID: 36111873 PMCID: PMC9635665 DOI: 10.1093/g3journal/jkac234] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/14/2022] [Accepted: 08/11/2022] [Indexed: 11/24/2022]
Abstract
Genomic data for wild species of the genus Bubalus (Asian buffaloes) are still lacking, while several whole genomes are currently available for domestic water buffaloes. To address this, we sequenced the genome of a wild, endangered dwarf buffalo, the lowland anoa (Bubalus depressicornis), produced a draft genome assembly and compared it with published buffalo genomes. The lowland anoa genome assembly was 2.56 Gbp long and contained 103,135 contigs, the longest contig being 337.39 kbp long. N50 and L50 values were 38.73 and 19.83 kbp, respectively, mean coverage was 44× and GC content was 41.74%. Two strategies were adopted to evaluate genome completeness: (1) determination of genomic features with de novo and homology-based predictions using annotations of the chromosome-level genome assembly of the river buffalo and (2) benchmarking against universal single-copy orthologs (BUSCO). Homology-based predictions identified 94.51% complete and 3.65% partial genomic features. De novo gene predictions identified 32,393 genes, representing 97.14% of the reference’s annotated genes, whilst a BUSCO search against the mammalian orthologs database identified 71.1% complete, 11.7% fragmented, and 17.2% missing orthologs, indicating a good level of completeness for downstream analyses. Repeat analyses indicated that repetitive regions make up 42.12% of the lowland anoa genome. The genome assembly of the lowland anoa is expected to contribute to comparative genome analyses among bovid species.
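Contiguity statistics such as N50 follow a simple recipe: sort contigs from longest to shortest and accumulate their lengths until at least half of the total assembly length is reached. Under the common convention, N50 is the length of the contig at which this happens and L50 is the number of contigs needed to get there. A minimal Python sketch, using hypothetical contig lengths rather than data from any assembly listed here:

```python
def n50_l50(lengths: list[int]) -> tuple[int, int]:
    """Return (N50, L50) for a list of contig lengths."""
    if not lengths:
        raise ValueError("no contigs")
    total = sum(lengths)
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if 2 * running >= total:  # reached at least half of the assembly length
            return length, count
    raise AssertionError("unreachable: the loop always reaches the total")


# Hypothetical contig lengths in bp.
print(n50_l50([100, 80, 60, 40, 20]))
```

The total is 300 bp; the two longest contigs already sum to 180 bp (at least 150 bp), so the call returns `(80, 2)`. Comparing `2 * running` with `total` avoids rounding issues when the total length is odd.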
Affiliation(s)
- Stefano Porrelli
  - Department of Natural Sciences, Faculty of Science and Technology, Middlesex University, London NW4 4BT, UK
- Michèle Gerbault-Seureau
  - Institut Systématique Evolution Biodiversité (ISYEB), Sorbonne Université, MNHN, CNRS, EPHE, UA, 75005 Paris, France
- Roberto Rozzi
  - Museum für Naturkunde, Leibniz-Institut für Evolutions- und Biodiversitätsforschung, 10115 Berlin, Germany
  - German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, 04103 Leipzig, Germany
- Rayan Chikhi
  - Institut Pasteur, Université Paris Cité, Sequence Bioinformatics, 75015 Paris, France
- Manon Curaudeau
  - Institut Systématique Evolution Biodiversité (ISYEB), Sorbonne Université, MNHN, CNRS, EPHE, UA, 75005 Paris, France
- Anne Ropiquet
  - Department of Natural Sciences, Faculty of Science and Technology, Middlesex University, London NW4 4BT, UK
- Alexandre Hassanin
  - Institut Systématique Evolution Biodiversité (ISYEB), Sorbonne Université, MNHN, CNRS, EPHE, UA, 75005 Paris, France
8
Proteotranscriptomics - A facilitator in omics research. Comput Struct Biotechnol J 2022; 20:3667-3675. [PMID: 35891789 PMCID: PMC9293588 DOI: 10.1016/j.csbj.2022.07.007] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/15/2022] [Revised: 07/04/2022] [Accepted: 07/04/2022] [Indexed: 11/26/2022]
Abstract
Applications in omics research, such as comparative transcriptomics and proteomics, require knowledge of species-specific gene sequences and benefit from a comprehensive, high-quality annotation of the coding genes to achieve high coverage. While protein-coding genes can, in simple cases, be detected by scanning the genome for open reading frames, in more complex genomes exonic sequences are separated by introns. Despite advances in sequencing technologies that allow for ever-growing numbers of genomes, many of the available genome assemblies do not reach reference quality. These non-contiguous assemblies with gaps, together with the need to predict splice sites, limit accurate gene annotation from genomic data alone. In contrast, the transcriptome contains only transcribed gene regions, is devoid of introns and thus provides an optimal basis for the identification of open reading frames. The additional integration of proteomics data to validate predicted protein-coding genes further enriches for accurate gene models. This review outlines the principles of the proteotranscriptomics approach, discusses common challenges and suggests methods for improvement.
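Because a transcript is free of introns, candidate coding regions can be found by a plain scan for open reading frames (ORFs). The minimal Python sketch below assumes the standard genetic code (ATG start; TAA, TAG and TGA stops), scans only the three forward frames, and uses a hypothetical `min_codons` cutoff; practical ORF predictors also consider the reverse strand, alternative start codons and coding potential.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(transcript: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return 0-based, half-open (start, end) coordinates of forward-strand ORFs.

    An ORF runs from an ATG to the next in-frame stop codon (inclusive). Only the
    first ATG before each stop is used, so nested ORFs are not reported separately.
    min_codons counts codons from ATG up to, but excluding, the stop codon.
    """
    seq = transcript.upper()
    orfs: list[tuple[int, int]] = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


print(find_orfs("CCATGAAACCCTAGGATGTGA"))
```

The call prints `[(2, 14)]`, i.e. the ORF `ATGAAACCCTAG` in the third frame; the shorter `ATGTGA` in the first frame is discarded because it has only one codon before the stop.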
9
Khan AL, Al-Harrasi A, Wang JP, Asaf S, Riethoven JJM, Shehzad T, Liew CS, Song XM, Schachtman DP, Liu C, Yu JG, Zhang ZK, Meng FB, Yuan JQ, Wei CD, Guo H, Wang X, Al-Rawahi A, Lee IJ, Bennetzen JL, Wang XY. Genome structure and evolutionary history of frankincense producing Boswellia sacra. iScience 2022; 25:104574. [PMID: 35789857 PMCID: PMC9249616 DOI: 10.1016/j.isci.2022.104574] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/21/2021] [Revised: 03/01/2022] [Accepted: 06/07/2022] [Indexed: 12/20/2022]
Abstract
The Boswellia sacra Flueck (family Burseraceae) tree is wounded to produce frankincense. We report its de novo assembled genome (667.8 Mb), comprising 18,564 high-confidence protein-encoding genes. A comparison of conserved single-copy genes across eudicots suggests >97% gene-space assembly of the B. sacra genome. Evolutionary history shows that B. sacra gene duplications derive both from recent paralogous events and from retention after the ancient hexaploidy shared with other eudicots. The genome indicated a major expansion of Gypsy retroelements in the last 2 million years. B. sacra genetic diversity showed four clades intermixed with a primary genotype that dominates most resin-productive trees. Further, the stem transcriptome revealed that wounding concurrently activates phytohormone signaling, cell wall fortification, and resin terpenoid biosynthesis pathways, leading to the synthesis of boswellic acid, a key chemotaxonomic marker of Boswellia. The sequence datasets reported here will serve as a foundation to investigate the genetic determinants of frankincense and other resin-producing species in Burseraceae.
Highlights:
- Assembly and architecture of frankincense-producing Boswellia sacra Flueck
- Comparative genomics and evolutionary history of the frankincense tree within orders
- Transcriptome of the stem and gene expression patterns upon wounding of the tree
- Resin biosynthesis pathway and related CYP450 enzymes and gene families
10
Cuevas-Caballé C, Ferrer Obiol J, Vizueta J, Genovart M, Gonzalez-Solís J, Riutort M, Rozas J. The First Genome of the Balearic Shearwater (Puffinus mauretanicus) Provides a Valuable Resource for Conservation Genomics and Sheds Light on Adaptation to a Pelagic lifestyle. Genome Biol Evol 2022; 14:evac067. [PMID: 35524941 PMCID: PMC9117697 DOI: 10.1093/gbe/evac067] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Accepted: 05/03/2022] [Indexed: 11/27/2022]
Abstract
The Balearic shearwater (Puffinus mauretanicus) is the most threatened seabird in Europe and a member of the most speciose group of pelagic seabirds, the order Procellariiformes, which exhibit extreme adaptations to a pelagic lifestyle. The fossil record suggests that human colonisation of the Balearic Islands resulted in a sharp decrease in the Balearic shearwater population size. Currently, populations of the species continue to be decimated, mainly by predation by introduced mammals and bycatch in longline fisheries, with some studies predicting its extinction by 2070. Here, using a combination of short and long reads, we generate the first high-quality reference genome for the Balearic shearwater, with a completeness amongst the highest of all available avian species. We used this reference genome to study critical aspects relevant to the conservation status of the species and to gain insights into the adaptation of the order Procellariiformes to a pelagic lifestyle. We detected relatively high levels of genome-wide heterozygosity in the Balearic shearwater despite its reduced population size. However, the reconstruction of its historical demography uncovered an abrupt population decline potentially linked to a reduction of the neritic zone during the Penultimate Glacial Period (∼194-135 ka). Comparative genomics analyses uncovered a set of candidate genes that may have played an important role in the adaptation of Procellariiformes to a pelagic lifestyle, including genes for the enhancement of fishing capabilities, night vision, and the development of natriuresis. The reference genome obtained will be crucial for the future development of genetic tools for the conservation of this Critically Endangered species.
Affiliation(s)
- Cristian Cuevas-Caballé
  - Departament de Genètica, Microbiologia i Estadística, Facultat de Biologia & Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona, Spain
- Joan Ferrer Obiol
  - Departament de Genètica, Microbiologia i Estadística, Facultat de Biologia & Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona, Spain
  - Department of Environmental Science and Policy, Università degli Studi di Milano (UniMi), Milan, Italy
- Joel Vizueta
  - Departament de Genètica, Microbiologia i Estadística, Facultat de Biologia & Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona, Spain
  - Villum Centre for Biodiversity Genomics, Section for Ecology and Evolution, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Meritxell Genovart
  - Mediterranean Institute for Advanced Studies (IMEDEA), CSIC-UIB & Centre for Advanced Studies of Blanes (CEAB), CSIC, Esporles, Spain
- Jacob Gonzalez-Solís
  - Departament de Biologia Evolutiva, Ecologia i Ciències Ambientals, Facultat de Biologia & Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona, Barcelona, Spain
- Marta Riutort
  - Departament de Genètica, Microbiologia i Estadística, Facultat de Biologia & Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona, Spain
- Julio Rozas
  - Departament de Genètica, Microbiologia i Estadística, Facultat de Biologia & Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona, Spain
11
Whitacre LK, Wildhaber ML, Johnson GS, Durbin HJ, Rowan TN, Tribe P, Schnabel RD, Mhlanga-Mutangadura T, Tabor VM, Fenner D, Decker JE. Exploring genetic variation and population structure in a threatened species, Noturus placidus, with whole-genome sequence data. G3 (Bethesda) 2022; 12:jkac046. [PMID: 35188205 PMCID: PMC8982419 DOI: 10.1093/g3journal/jkac046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2021] [Accepted: 01/06/2022] [Indexed: 06/14/2023]
Abstract
The Neosho madtom (Noturus placidus) is a small catfish, generally less than 3 inches in length, unique to the Neosho-Spring River system within the Arkansas River Basin. It was federally listed as threatened in 1990, largely due to habitat loss. To support conservation efforts, we generated whole-genome sequence data from 10 Neosho madtom individuals originating from 3 geographically separated populations to evaluate genetic diversity and population structure. A Neosho madtom genome was assembled de novo, and genome size and content were assessed. Single nucleotide polymorphisms were identified from de Bruijn graphs and via reference alignment to both the channel catfish (Ictalurus punctatus) reference genome and the Neosho madtom reference genome. Principal component analysis and structure analysis indicated weak population structure, suggesting that fish from the 3 locations represent a single population. Using a novel method, genome-wide conservation and divergence between the Neosho madtom, channel catfish, and zebrafish (Danio rerio) were assessed by pairwise contig alignment, which demonstrated that genes important to embryonic development frequently had conserved sequences. This research in a threatened species with no previously published genomic resources provides novel genetic information to guide current and future conservation efforts, and demonstrates that whole-genome sequencing can provide detailed information on population structure and demography using only a limited number of rare and valuable samples.
Collapse
Affiliation(s)
- Lynsey K Whitacre
- Division of Animal Sciences, University of Missouri, Columbia, MO 65211, USA
- Mark L Wildhaber
- U.S. Geological Survey, Columbia Environmental Research Center, Columbia, MO 65201, USA
- Gary S Johnson
- Department of Veterinary Pathobiology, College of Veterinary Medicine, University of Missouri, Columbia, MO 65211, USA
- Harly J Durbin
- Division of Animal Sciences, University of Missouri, Columbia, MO 65211, USA
- Troy N Rowan
- Division of Animal Sciences, University of Missouri, Columbia, MO 65211, USA
- Peoria Tribe
- The Peoria Tribe of Indians of Oklahoma, Miami, OK 74354, USA
- Robert D Schnabel
- Division of Animal Sciences, University of Missouri, Columbia, MO 65211, USA
- Tendai Mhlanga-Mutangadura
- Department of Veterinary Pathobiology, College of Veterinary Medicine, University of Missouri, Columbia, MO 65211, USA
- Vernon M Tabor
- U.S. Fish and Wildlife Service, Kansas Ecological Services Field Office, Manhattan, KS 66502, USA
- Daniel Fenner
- U.S. Fish and Wildlife Service, Oklahoma Ecological Services Field Office, Tulsa, OK 74129, USA
- Jared E Decker
- Division of Animal Sciences, University of Missouri, Columbia, MO 65211, USA
12
Schultz CJ, Wu Y, Baumann U. A targeted bioinformatics approach identifies highly variable cell surface proteins that are unique to Glomeromycotina. Mycorrhiza 2022; 32:45-66. [PMID: 35031894 PMCID: PMC8786786 DOI: 10.1007/s00572-021-01066-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/07/2021] [Accepted: 12/24/2021] [Indexed: 06/14/2023]
Abstract
Diversity in arbuscular mycorrhizal fungi (AMF) contributes to biodiversity and resilience in natural environments and healthy agricultural systems. Functional complementarity exists among species of AMF in symbiosis with their plant hosts, but its molecular basis is not known. We hypothesise that this gap is partly due to the difficulty current sequence assembly methods have in assembling sequences for intrinsically disordered proteins (IDPs), which have low sequence complexity. IDPs are potential candidates for functional complementarity because they often exist as extended (non-globular) proteins, providing additional amino acids for molecular interactions. Rhizophagus irregularis arabinogalactan-protein-like proteins (AGLs) are small secreted IDPs with no known orthologues in AMF or other fungi. We developed a targeted bioinformatics approach to identify highly variable AGLs/IDPs in RNA-sequence datasets. The approach combines a modified multiple-k-mer assembly (Oases) to identify candidate sequences with targeted sequence capture and assembly (mirabait-mira). All AMF species analysed, including the ancestral family Paraglomeraceae, have small families of proteins rich in disorder-promoting amino acids such as proline and glycine, or glycine and asparagine. Glycine- and asparagine-rich proteins were also found in Geosiphon pyriformis (an obligate symbiont of a cyanobacterium), from the same subphylum (Glomeromycotina) as AMF. The sequence diversity of AGLs likely translates to functional diversity, based on predicted physical properties of tandem repeats (elastic, amyloid, or interchangeable) and their broad pI ranges. We envisage that AGLs/IDPs could contribute to functional complementarity in AMF through processes such as self-recognition, retention of nutrients, soil stability, and water movement.
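Bait-based capture tools such as mirabait select reads that share k-mers with known bait sequences. A minimal sketch of that k-mer baiting idea follows, assuming exact k-mer matches on both strands and a simple count threshold; it does not reproduce mirabait's options or implementation, the Oases and MIRA assembly steps are not shown, and all names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq, k):
    """Set of all distinct k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_bait_index(baits, k):
    """Collect k-mers from every bait on both strands."""
    index = set()
    for bait in baits:
        bait = bait.upper()
        index |= kmers(bait, k)
        index |= kmers(revcomp(bait), k)
    return index


def bait_reads(reads, index, k, min_hits=1):
    """Return names of reads sharing at least min_hits distinct k-mers with the baits."""
    selected = []
    for name, seq in reads:
        hits = len(kmers(seq.upper(), k) & index)
        if hits >= min_hits:
            selected.append(name)
    return selected


if __name__ == "__main__":
    index = build_bait_index(["ACGTACGTTTGACCA"], k=5)
    reads = [("r1", "GGACGTACGTG"), ("r2", "CCTGGTCAACC"), ("r3", "GGGGGGGGGG")]
    print(bait_reads(reads, index, k=5))  # ['r1', 'r2']
```

Read `r2` is captured only because the index includes the bait's reverse complement. For low-complexity IDP sequences, a higher `min_hits` or a longer k would help avoid pulling in unrelated repetitive reads.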
Affiliation(s)
- Carolyn J Schultz
- School of Agriculture, Food, and Wine, Waite Research Institute, University of Adelaide, Adelaide, SA, Australia
- Yue Wu
- School of Agriculture, Food, and Wine, Waite Research Institute, University of Adelaide, Adelaide, SA, Australia
- Ute Baumann
- School of Agriculture, Food, and Wine, Waite Research Institute, University of Adelaide, Adelaide, SA, Australia
13
Miller CD, Forthman M, Miller CW, Kimball RT. Extracting ‘legacy loci’ from an invertebrate sequence capture data set. Zool Scr 2021. [DOI: 10.1111/zsc.12513] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 01/04/2023]
Affiliation(s)
- Caroline D. Miller
- Department of Entomology & Nematology, University of Florida, Gainesville, FL, USA
- Michael Forthman
- Department of Entomology & Nematology, University of Florida, Gainesville, FL, USA
- California State Collection of Arthropods, Plant Pest Diagnostics Branch, California Department of Food & Agriculture, Sacramento, CA, USA
- Christine W. Miller
- Department of Entomology & Nematology, University of Florida, Gainesville, FL, USA
14
Heo Y, Manikandan G, Ramachandran A, Chen D. Comprehensive Evaluation of Error-Correction Methodologies for Genome Sequencing Data. Bioinformatics 2021. [DOI: 10.36255/exonpublications.bioinformatics.2021.ch6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
15
Schultz CJ, Goonetilleke SN, Liang J, Lahnstein J, Levin KA, Bianco-Miotto T, Burton RA, Mather DE, Chalmers KJ. Analysis of Genetic Diversity in the Traditional Chinese Medicine Plant 'Kushen' (Sophora flavescens Ait.). Front Plant Sci 2021; 12:704201. [PMID: 34413868 PMCID: PMC8369264 DOI: 10.3389/fpls.2021.704201] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/11/2021] [Accepted: 07/14/2021] [Indexed: 05/13/2023]
Abstract
Kushen root, from the woody legume Sophora flavescens, is a traditional Chinese medicine that is a key ingredient in several promising cancer treatments. This activity is attributed in part to two quinolizidine alkaloids (QAs), oxymatrine and matrine, that have a variety of therapeutic activities in vitro. Genetic selection is needed to adapt S. flavescens for cultivation and to improve productivity and product quality. Genetic diversity of S. flavescens was investigated using genotyping-by-sequencing (GBS) on 85 plants grown from seeds collected from 9 provinces of China. DArTSeq provided over 10,000 single nucleotide polymorphism (SNP) markers, 1636 of which were used in phylogenetic analysis to reveal clear regional differences for S. flavescens. One accession from each region was selected for PCR-sequencing to identify gene-specific SNPs in the first two QA pathway genes, lysine decarboxylase (LDC) and copper amine oxidase (CAO). To obtain SfCAO sequence for primer design, we used a targeted transcript capture and assembly strategy using publicly available RNA sequencing data. Partial gene sequence analysis of SfCAO revealed two recently duplicated genes, SfCAO1 and SfCAO2, in contrast to the single gene found in the QA-producing legume Lupinus angustifolius. We demonstrate high efficiency in converting SNPs to Kompetitive Allele Specific PCR (KASP) markers, developing 27 new KASP markers: 17 from DArTSeq data, 7 for SfLDC, and 3 for SfCAO1. To complement this genetic diversity analysis, a field trial site has been established in South Australia, providing access to diverse S. flavescens material for morphological, transcriptomic, and QA metabolite analysis. Analysis of dissected flower buds revealed that anthesis occurs before buds fully open, suggesting that S. flavescens may be an inbreeding species; however, this is not supported by the relatively high level of heterozygosity observed.
Two plants from the field trial site were analysed by quantitative real-time PCR and levels of matrine and oxymatrine were assessed in a variety of tissues. We are now in a strong position to select diverse plants for crosses to accelerate the process of genetic selection needed to adapt kushen to cultivation and improve productivity and product quality.
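The inbreeding argument rests on comparing observed heterozygosity (Ho) with the heterozygosity expected under random mating (He); a common summary is Wright's inbreeding coefficient F = 1 - Ho/He, which approaches 1 in a strongly selfing species and sits near 0 (or below) under outcrossing. A minimal sketch follows, assuming biallelic SNPs coded as alternate-allele counts (0, 1, 2) with `None` for missing calls, and He = 2p(1 - p) without small-sample correction. It is illustrative only, not the analysis reported in the study, and the function name is hypothetical.

```python
def heterozygosity_summary(genotypes):
    """Return (Ho, He, F) for a samples x loci genotype matrix.

    genotypes: list of per-sample lists; each entry is 0, 1, 2
    (number of alternate alleles) or None for a missing call.
    """
    n_loci = len(genotypes[0])
    het = called = 0
    he_values = []
    for j in range(n_loci):
        calls = [row[j] for row in genotypes if row[j] is not None]
        if not calls:
            continue  # skip loci with no data
        called += len(calls)
        het += sum(1 for g in calls if g == 1)
        p = sum(calls) / (2 * len(calls))  # alternate-allele frequency
        he_values.append(2 * p * (1 - p))
    ho = het / called
    he = sum(he_values) / len(he_values)
    f = 1 - ho / he if he > 0 else float("nan")
    return ho, he, f


if __name__ == "__main__":
    # Hypothetical data: 4 plants x 2 SNPs, one missing call.
    matrix = [[0, 1], [1, 1], [1, 0], [2, None]]
    print(tuple(round(v, 3) for v in heterozygosity_summary(matrix)))
    # (0.571, 0.472, -0.21)
```

A negative F, as here, indicates more heterozygotes than random mating predicts, which is the opposite of the pattern an inbreeding species would show.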
Affiliation(s)
- Carolyn J. Schultz
- School of Agriculture, Food and Wine, Waite Research Institute, University of Adelaide, Adelaide, SA, Australia
- Shashi N. Goonetilleke
- School of Agriculture, Food and Wine, Waite Research Institute, University of Adelaide, Adelaide, SA, Australia
- Jianping Liang
- Department of Chinese Medicine, College of Life Sciences, Shanxi Agricultural University, Shanxi, China
- *Correspondence: Jianping Liang
- Jelle Lahnstein
- School of Agriculture, Food and Wine, Waite Research Institute, University of Adelaide, Adelaide, SA, Australia
- Kara A. Levin
- School of Agriculture, Food and Wine, Waite Research Institute, University of Adelaide, Adelaide, SA, Australia
- Tina Bianco-Miotto
- School of Agriculture, Food and Wine, Waite Research Institute, University of Adelaide, Adelaide, SA, Australia
- Rachel A. Burton
- School of Agriculture, Food and Wine, Waite Research Institute, University of Adelaide, Adelaide, SA, Australia
- Diane E. Mather
- School of Agriculture, Food and Wine, Waite Research Institute, University of Adelaide, Adelaide, SA, Australia
- Kenneth J. Chalmers
- School of Agriculture, Food and Wine, Waite Research Institute, University of Adelaide, Adelaide, SA, Australia
16
Abstract
Read alignment is the central step of many analytic pipelines that perform variant calling. To reduce error, it is common practice to pre-process raw sequencing reads to remove low-quality bases and residual adapter contamination, a procedure collectively known as ‘trimming’. Trimming is widely assumed to increase the accuracy of variant calling, although there are relatively few systematic evaluations of its effects and no clear consensus on its efficacy. As sequencing datasets increase both in number and size, it is worthwhile reappraising computational operations of ambiguous benefit, particularly when the scope of many analyses now routinely incorporates thousands of samples, increasing the time and cost required. Using a curated set of 17 Gram-negative bacterial genomes, this study initially evaluated the impact of four read-trimming utilities (Atropos, fastp, Trim Galore and Trimmomatic), each used with a range of stringencies, on the accuracy and completeness of three bacterial SNP-calling pipelines. It was found that read trimming made only small, and statistically insignificant, increases in SNP-calling accuracy even when using the highest-performing pre-processor in this study, fastp. To extend these findings, >6500 publicly archived sequencing datasets from Escherichia coli, Mycobacterium tuberculosis and Staphylococcus aureus were re-analysed using a common analytic pipeline. Of the approximately 125 million SNPs and 1.25 million indels called across all samples, the same bases were called in 98.8% and 91.9% of cases, respectively, irrespective of whether raw reads or trimmed reads were used. Nevertheless, the proportion of mixed calls (i.e. calls where <100% of the reads support the variant allele; considered a proxy for false positives) was significantly reduced after trimming, which suggests that while trimming rarely alters the set of variant bases, it can affect the proportion of reads supporting each call.
It was concluded that read quality- and adapter-trimming add relatively little value to a SNP-calling pipeline and may only be necessary if small differences in the absolute number of SNP calls, or the false call rate, are critical. Broadly similar conclusions can be drawn about the utility of trimming to an indel-calling pipeline. Read trimming remains routinely performed prior to variant calling likely out of concern that doing otherwise would typically have negative consequences. While historically this may have been the case, the data in this study suggests that read trimming is not always a practical necessity.
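For reference, 3'-end quality trimming can be as simple as the algorithm that the Cutadapt documentation attributes to BWA: subtract each base's quality from a cutoff, accumulate from the 3' end, stop when the running sum turns negative, and cut where the sum peaked. Trim Galore wraps Cutadapt, but the other tools evaluated may apply different trimming rules, so treat this as one representative sketch, assuming Phred+33 quality strings. The second helper computes the "mixed call" proportion as the abstract defines it, from hypothetical (alt_depth, total_depth) pairs; both function names are hypothetical.

```python
def quality_trim_index(qualities, cutoff, base=33):
    """Return the index at which to cut the read's 3' end.

    Scans from the 3' end, accumulating (cutoff - quality); stops when the
    running sum goes negative and cuts where the sum was largest.
    """
    running = 0
    best = 0
    cut = len(qualities)
    for i in range(len(qualities) - 1, -1, -1):
        running += cutoff - (ord(qualities[i]) - base)
        if running < 0:
            break
        if running > best:
            best = running
            cut = i
    return cut


def mixed_call_fraction(calls):
    """Fraction of variant calls where fewer than 100% of reads support the alt allele.

    calls: iterable of (alt_depth, total_depth) pairs (hypothetical input format).
    """
    calls = list(calls)
    mixed = sum(1 for alt, total in calls if alt < total)
    return mixed / len(calls) if calls else 0.0


if __name__ == "__main__":
    seq, qual = "ACGTACGT", "IIIIII##"  # 'I' = Q40, '#' = Q2 in Phred+33
    cut = quality_trim_index(qual, cutoff=20)
    print(seq[:cut])  # ACGTAC
    print(mixed_call_fraction([(10, 10), (9, 10), (5, 5)]))
```

Because the scan tolerates a few low-quality bases as long as the running sum stays non-negative, an isolated poor base inside an otherwise good 3' tail does not cause the whole tail to be discarded.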
Affiliation(s)
- Stephen J Bush
- Nuffield Department of Medicine, University of Oxford, Oxford, UK
17
Andrade CM, Fleckenstein H, Thomson-Luque R, Doumbo S, Lima NF, Anderson C, Hibbert J, Hopp CS, Tran TM, Li S, Niangaly M, Cisse H, Doumtabe D, Skinner J, Sturdevant D, Ricklefs S, Virtaneva K, Asghar M, Homann MV, Turner L, Martins J, Allman EL, N'Dri ME, Winkler V, Llinás M, Lavazec C, Martens C, Färnert A, Kayentao K, Ongoiba A, Lavstsen T, Osório NS, Otto TD, Recker M, Traore B, Crompton PD, Portugal S. Increased circulation time of Plasmodium falciparum underlies persistent asymptomatic infection in the dry season. Nat Med 2020; 26:1929-1940. [PMID: 33106664 DOI: 10.1038/s41591-020-1084-0] [Citation(s) in RCA: 77] [Impact Index Per Article: 15.4] [Received: 04/02/2020] [Accepted: 08/27/2020] [Indexed: 12/25/2022]
Abstract
The dry season is a major challenge for Plasmodium falciparum parasites in many malaria endemic regions, where water availability limits mosquito vectors to only part of the year. How P. falciparum bridges two transmission seasons months apart, without being cleared by the human host or compromising host survival, is poorly understood. Here we show that low levels of P. falciparum parasites persist in the blood of asymptomatic Malian individuals during the 5- to 6-month dry season, rarely causing symptoms and minimally affecting the host immune response. Parasites isolated during the dry season are transcriptionally distinct from those of individuals with febrile malaria in the transmission season, coinciding with longer circulation within each replicative cycle of parasitized erythrocytes without adhering to the vascular endothelium. Low parasite levels during the dry season are not due to impaired replication but rather to increased splenic clearance of longer-circulating infected erythrocytes, which likely maintain parasitemias below clinical and immunological radar. We propose that P. falciparum virulence in areas of seasonal malaria transmission is regulated so that the parasite decreases its endothelial binding capacity, allowing increased splenic clearance and enabling several months of subclinical parasite persistence.
Affiliation(s)
- Carolina M Andrade
- Center for Infectious Diseases, Parasitology, Heidelberg University Hospital, Heidelberg, Germany
- Hannah Fleckenstein
- Center for Infectious Diseases, Parasitology, Heidelberg University Hospital, Heidelberg, Germany
- Richard Thomson-Luque
- Center for Infectious Diseases, Parasitology, Heidelberg University Hospital, Heidelberg, Germany
- Safiatou Doumbo
- Mali International Center of Excellence in Research, University of Sciences, Techniques and Technologies of Bamako, Bamako, Mali
- Nathalia F Lima
- Center for Infectious Diseases, Parasitology, Heidelberg University Hospital, Heidelberg, Germany
- Carrie Anderson
- Center for Infectious Diseases, Parasitology, Heidelberg University Hospital, Heidelberg, Germany
- Julia Hibbert
- Center for Infectious Diseases, Parasitology, Heidelberg University Hospital, Heidelberg, Germany
- Christine S Hopp
- Laboratory of Immunogenetics, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Rockville, MD, USA
- Tuan M Tran
- Division of Infectious Diseases, Indiana University School of Medicine, Indianapolis, IN, USA
- Shanping Li
- Laboratory of Immunogenetics, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Rockville, MD, USA
- Moussa Niangaly
- Mali International Center of Excellence in Research, University of Sciences, Techniques and Technologies of Bamako, Bamako, Mali
- Hamidou Cisse
- Mali International Center of Excellence in Research, University of Sciences, Techniques and Technologies of Bamako, Bamako, Mali
- Didier Doumtabe
- Mali International Center of Excellence in Research, University of Sciences, Techniques and Technologies of Bamako, Bamako, Mali
- Jeff Skinner
- Laboratory of Immunogenetics, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Rockville, MD, USA
- Dan Sturdevant
- Rocky Mountain Laboratory Research Technologies Section, Genomics Unit, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Hamilton, MT, USA
- Stacy Ricklefs
- Rocky Mountain Laboratory Research Technologies Section, Genomics Unit, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Hamilton, MT, USA
- Kimmo Virtaneva
- Rocky Mountain Laboratory Research Technologies Section, Genomics Unit, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Hamilton, MT, USA
- Muhammad Asghar
- Department of Medicine Solna, Division of Infectious Diseases, Karolinska Institutet, Stockholm, Sweden
- Department of Infectious Diseases, Karolinska University Hospital, Stockholm, Sweden
- Manijeh Vafa Homann
- Department of Medicine Solna, Division of Infectious Diseases, Karolinska Institutet, Stockholm, Sweden
- Department of Infectious Diseases, Karolinska University Hospital, Stockholm, Sweden
- Louise Turner
- Department of Immunology and Microbiology, Centre for Medical Parasitology, Faculty of Health and Medical Sciences, University of Copenhagen, København N, Denmark
- Department of Infectious Diseases, Copenhagen University Hospital (Rigshospitalet), Copenhagen, Denmark
- Joana Martins
- Life and Health Sciences Research Institute (ICVS), School of Medicine, University of Minho, Portugal and ICVS/3B's - PT Government Associate Laboratory, Braga, Portugal
- Erik L Allman
- Department of Biochemistry and Molecular Biology, Huck Center for Malaria Research, The Pennsylvania State University, State College, PA, USA
- Volker Winkler
- Institute of Global Health, Heidelberg University Hospital, Heidelberg, Germany
- Manuel Llinás
- Department of Biochemistry and Molecular Biology, Huck Center for Malaria Research, The Pennsylvania State University, State College, PA, USA
- Department of Chemistry, The Pennsylvania State University, State College, PA, USA
- Craig Martens
- Rocky Mountain Laboratory Research Technologies Section, Genomics Unit, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Hamilton, MT, USA
- Anna Färnert
- Department of Medicine Solna, Division of Infectious Diseases, Karolinska Institutet, Stockholm, Sweden
- Department of Infectious Diseases, Karolinska University Hospital, Stockholm, Sweden
- Kassoum Kayentao
- Mali International Center of Excellence in Research, University of Sciences, Techniques and Technologies of Bamako, Bamako, Mali
- Aissata Ongoiba
- Mali International Center of Excellence in Research, University of Sciences, Techniques and Technologies of Bamako, Bamako, Mali
- Thomas Lavstsen
- Department of Immunology and Microbiology, Centre for Medical Parasitology, Faculty of Health and Medical Sciences, University of Copenhagen, København N, Denmark
- Department of Infectious Diseases, Copenhagen University Hospital (Rigshospitalet), Copenhagen, Denmark
- Nuno S Osório
- Life and Health Sciences Research Institute (ICVS), School of Medicine, University of Minho, Portugal and ICVS/3B's - PT Government Associate Laboratory, Braga, Portugal
- Thomas D Otto
- Institute of Infection, Immunity & Inflammation, MVLS, University of Glasgow, Glasgow, UK
- Mario Recker
- Centre for Mathematics & the Environment, University of Exeter, Penryn Campus, Penryn, UK
- Boubacar Traore
- Mali International Center of Excellence in Research, University of Sciences, Techniques and Technologies of Bamako, Bamako, Mali
- Peter D Crompton
- Laboratory of Immunogenetics, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Rockville, MD, USA
- Silvia Portugal
- Center for Infectious Diseases, Parasitology, Heidelberg University Hospital, Heidelberg, Germany
- German Center for Infection Research (DZIF), Heidelberg, Heidelberg, Germany
- Max Planck Institute for Infection Biology, Berlin, Germany
18
Yurchenko AA, Recknagel H, Elmer KR. Chromosome-Level Assembly of the Common Lizard (Zootoca vivipara) Genome. Genome Biol Evol 2020; 12:1953-1960. [PMID: 32835354 PMCID: PMC7643610 DOI: 10.1093/gbe/evaa161] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Accepted: 07/23/2020] [Indexed: 01/01/2023]
Abstract
Squamate reptiles exhibit high variation in their phenotypic traits and geographical distributions and are therefore fascinating taxa for evolutionary and ecological research. However, genomic resources are very limited for this group of species, consequently inhibiting research efforts. To address this gap, we assembled a high-quality genome of the common lizard, Zootoca vivipara (Lacertidae), using a combination of high-coverage Illumina (shotgun and mate-pair) and PacBio sequencing data, coupled with RNA-seq data and genetic linkage map generation. The 1.46-Gb genome assembly has a scaffold N50 of 11.52 Mb, a contig N50 of 220.4 kb, and only 2.96% gaps. A BUSCO analysis indicates that 97.7% of the single-copy Tetrapoda orthologs were recovered in the assembly. In total, 19,829 gene models were annotated to the genome using a combination of ab initio and homology-based methods. To improve the chromosome-level assembly, we generated a high-density linkage map from wild-caught families and developed a novel analytical pipeline to accommodate multiple paternity and unknown father genotypes. We successfully anchored and oriented almost 90% of the genome on 19 linkage groups. This annotated and oriented chromosome-level reference genome represents a valuable resource to facilitate evolutionary studies in squamate reptiles.
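Scaffold and contig statistics differ because scaffolds contain runs of N marking gaps of estimated size; contigs are the gap-free pieces obtained by breaking scaffolds at those runs, and the gap percentage is the share of bases lying in such runs. A minimal sketch, assuming gaps are encoded as N/n characters and that any run of at least `min_gap` Ns counts as a gap (the minimum gap length is a convention that varies between pipelines); the function names are hypothetical.

```python
import re


def split_scaffold(seq, min_gap=1):
    """Break a scaffold into contigs at runs of at least min_gap N characters."""
    pattern = "[Nn]{%d,}" % min_gap
    return [contig for contig in re.split(pattern, seq) if contig]


def gap_percent(scaffolds, min_gap=1):
    """Percentage of all scaffold bases that fall in N runs of at least min_gap."""
    pattern = re.compile("[Nn]{%d,}" % min_gap)
    total = sum(len(s) for s in scaffolds)
    gaps = sum(len(m.group()) for s in scaffolds for m in pattern.finditer(s))
    return 100.0 * gaps / total


if __name__ == "__main__":
    scaffold = "ACGTNNNNACGTACNNGG"  # hypothetical 18-bp scaffold with two gaps
    print(split_scaffold(scaffold))  # ['ACGT', 'ACGTAC', 'GG']
    print(round(gap_percent([scaffold]), 2))  # 33.33
```

Using the same `min_gap` in both helpers keeps the contig set and the gap percentage consistent with each other; contig N50 is then computed over the output of `split_scaffold` rather than over the scaffolds.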
Affiliation(s)
- Andrey A Yurchenko
- Institute of Biodiversity, Animal Health and Comparative Medicine, College of Medical, Veterinary and Life Sciences, University of Glasgow, United Kingdom
- Hans Recknagel
- Institute of Biodiversity, Animal Health and Comparative Medicine, College of Medical, Veterinary and Life Sciences, University of Glasgow, United Kingdom
- Kathryn R Elmer
- Institute of Biodiversity, Animal Health and Comparative Medicine, College of Medical, Veterinary and Life Sciences, University of Glasgow, United Kingdom
19
Bradshaw WJ, Valenzano DR. Extreme genomic volatility characterizes the evolution of the immunoglobulin heavy chain locus in cyprinodontiform fishes. Proc Biol Sci 2020; 287:20200489. [PMID: 32396805 PMCID: PMC7287348 DOI: 10.1098/rspb.2020.0489] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 03/02/2020] [Accepted: 04/14/2020] [Indexed: 12/30/2022]
Abstract
The evolution of the adaptive immune system has provided vertebrates with a uniquely sophisticated immune toolkit, enabling them to mount precise immune responses against a staggeringly diverse range of antigens. Like other vertebrates, teleost fishes possess a complex and functional adaptive immune system; however, our knowledge of the complex antigen-receptor genes underlying its functionality has been restricted to a small number of experimental and agricultural species, preventing systematic investigation into how these crucial gene loci evolve. Here, we analyse the genomic structure of the immunoglobulin heavy chain (IGH) gene loci in the cyprinodontiforms, a diverse and important group of teleosts present in many different habitats across the world. We reconstruct the complete IGH loci of the turquoise killifish (Nothobranchius furzeri) and the southern platyfish (Xiphophorus maculatus) and analyse their in vivo gene expression, revealing the presence of species-specific splice isoforms of transmembrane IGHM. We further characterize the IGH constant regions of 10 additional cyprinodontiform species, including guppy, Amazon molly, mummichog and mangrove killifish. Phylogenetic analysis of these constant regions suggests multiple independent rounds of duplication and deletion of the teleost-specific antibody class IGHZ in the cyprinodontiform lineage, demonstrating the extreme volatility of IGH evolution. Focusing on the cyprinodontiforms as a model taxon for comparative evolutionary immunology, this work provides novel genomic resources for studying adaptive immunity and sheds light on the evolutionary history of the adaptive immune system.
Affiliation(s)
- William J. Bradshaw
- Max Planck Institute for Biology of Ageing, Joseph-Stelzmann-Str. 296, 50937 Cologne, Germany
- CECAD Research Center, University of Cologne, Joseph-Stelzmann-Str. 26, 50937 Cologne, Germany
- Dario Riccardo Valenzano
- Max Planck Institute for Biology of Ageing, Joseph-Stelzmann-Str. 296, 50937 Cologne, Germany
- CECAD Research Center, University of Cologne, Joseph-Stelzmann-Str. 26, 50937 Cologne, Germany
20
Whole Genome Sequences of 23 Species from the Drosophila montium Species Group (Diptera: Drosophilidae): A Resource for Testing Evolutionary Hypotheses. G3 (Bethesda) 2020; 10:1443-1455. [PMID: 32220952 PMCID: PMC7202002 DOI: 10.1534/g3.119.400959] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 01/24/2023]
Abstract
Large groups of species with well-defined phylogenies are excellent systems for testing evolutionary hypotheses. In this paper, we describe the creation of a comparative genomic resource consisting of 23 genomes from the species-rich Drosophila montium species group, 22 of which are presented here for the first time. The montium group is well-positioned for clade genomics. Within the montium clade, evolutionary distances are such that large numbers of sequences can be accurately aligned while also recovering strong signals of divergence, and the distance between the montium group and D. melanogaster is short enough that orthologous sequences can be readily identified. All genomes were assembled from a single, small-insert library using MaSuRCA, before going through an extensive post-assembly pipeline. Estimated genome sizes within the montium group range from 155 Mb to 223 Mb (mean = 196 Mb). The absence of long-distance information during the assembly process resulted in fragmented assemblies, with scaffold NG50s varying widely based on repeat content and sample heterozygosity (min = 18 kb, max = 390 kb, mean = 74 kb). The total scaffold length for most assemblies is also shorter than the estimated genome size, typically by 5-15%. However, subsequent analysis showed that our assemblies are highly complete. Despite large differences in contiguity, all assemblies contain at least 96% of known single-copy Dipteran genes (BUSCOs, n = 2,799). Similarly, by aligning our assemblies to the D. melanogaster genome and remapping coordinates for a large set of transcriptional enhancers (n = 3,457), we showed that each montium assembly contains orthologs for at least 91% of D. melanogaster enhancers. Importantly, the genic and enhancer contents of our assemblies are comparable to those of far more contiguous Drosophila assemblies. The alignment of our own D. serrata assembly to a previously published PacBio D. serrata assembly also showed that our longest scaffolds (up to 1 Mb) are free of large-scale misassemblies. Our genome assemblies are a valuable resource that can be used to further resolve the montium group phylogeny, study the evolution of protein-coding genes and cis-regulatory sequences, and determine the genetic basis of ecological and behavioral adaptations.
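NG50 differs from the more familiar N50 only in its denominator: N50 is the length L such that sequences of length at least L cover half of the assembly's total length, whereas NG50 uses half of the estimated genome size instead. Because most of these assemblies are 5-15% shorter than their estimated genome size, NG50 can fall below N50, and it is undefined when the assembly covers less than half the genome. A minimal sketch with hypothetical scaffold lengths; the function name `nx` is an assumption for illustration.

```python
from typing import Optional, Sequence


def nx(lengths: Sequence[int], x: float = 50, genome_size: Optional[int] = None) -> Optional[int]:
    """Return Nx, or NGx when genome_size is given.

    Sort sequences longest first and return the length at which the
    cumulative sum first reaches x% of the denominator.
    """
    denominator = genome_size if genome_size is not None else sum(lengths)
    target = denominator * x / 100
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length
    return None  # assembly too short to reach x% of the genome size


if __name__ == "__main__":
    scaffolds = [100, 80, 50, 30, 20]  # hypothetical lengths, total 280
    print(nx(scaffolds))                    # N50  -> 80
    print(nx(scaffolds, genome_size=400))   # NG50 -> 50
    print(nx(scaffolds, genome_size=1000))  # None
```

Reporting NG50 rather than N50 makes assemblies of different completeness comparable, since a short, highly fragmented assembly cannot inflate its statistic simply by omitting sequence.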
21
Emberts Z, St Mary CM, Howard CC, Forthman M, Bateman PW, Somjee U, Hwang WS, Li D, Kimball RT, Miller CW. The evolution of autotomy in leaf-footed bugs. Evolution 2020; 74:897-910. [PMID: 32267543 PMCID: PMC7317576 DOI: 10.1111/evo.13948] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 08/13/2019] [Accepted: 02/24/2020] [Indexed: 01/04/2023]
Abstract
Sacrificing body parts is one of many behaviors that animals use to escape predation. This trait, termed autotomy, is classically associated with lizards. However, several other taxa also autotomize, and this trait has independently evolved multiple times throughout Animalia. Although autotomy has multiple origins and is an iconic antipredatory trait, much remains unknown about its evolution. Here, we combine morphological, behavioral, and genomic data to investigate the evolution of autotomy within leaf-footed bugs and allies (Insecta: Hemiptera: Coreidae + Alydidae). We found that the ancestor of leaf-footed bugs autotomized and did so slowly; rapid autotomy (<2 min) then arose multiple times. The ancestor likely used slow autotomy to reduce the cost of injury or to escape nonpredatory entrapment but could not use autotomy to escape predation. This result suggests that autotomy to escape predation is a co-opted benefit (i.e., exaptation), revealing one way that sacrificing a limb to escape predation may arise. In addition to identifying the origins of rapid autotomy, we also show that across-species variation in the rates of autotomy can be explained by body size, distance from the equator, and enlargement of the autotomizable appendage.
Affiliation(s)
- Zachary Emberts
- Department of Biology, University of Florida, Gainesville, Florida, 32611
- Colette M St Mary
- Department of Biology, University of Florida, Gainesville, Florida, 32611
- Cody Coyotee Howard
- Department of Biology, University of Florida, Gainesville, Florida, 32611
- Florida Museum of Natural History, University of Florida, Gainesville, Florida, 32611
- Michael Forthman
- Entomology and Nematology Department, University of Florida, Gainesville, Florida, 32611
- Philip W Bateman
- Behavioural Ecology Lab, School of Molecular and Life Sciences, Curtin University, Perth, WA, 6845, Australia
- Ummat Somjee
- Smithsonian Tropical Research Institute, Balboa, Panama
- Wei Song Hwang
- Lee Kong Chian Natural History Museum, National University of Singapore, Singapore, 117377, Singapore
- Daiqin Li
- Department of Biological Science, National University of Singapore, Singapore, 117543, Singapore
- Rebecca T Kimball
- Department of Biology, University of Florida, Gainesville, Florida, 32611
- Christine W Miller
- Entomology and Nematology Department, University of Florida, Gainesville, Florida, 32611
22
Kovaka S, Zimin AV, Pertea GM, Razaghi R, Salzberg SL, Pertea M. Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biol 2019; 20:278. [PMID: 31842956 PMCID: PMC6912988 DOI: 10.1186/s13059-019-1910-1] [Citation(s) in RCA: 1015] [Impact Index Per Article: 169.2] [Received: 09/10/2019] [Accepted: 12/02/2019] [Indexed: 11/13/2022]
Abstract
RNA sequencing using the latest single-molecule sequencing instruments produces reads that are thousands of nucleotides long. The ability to assemble these long reads can greatly improve the sensitivity of long-read analyses. Here we present StringTie2, a reference-guided transcriptome assembler that works with both short and long reads. StringTie2 includes new methods to handle the high error rate of long reads and offers the ability to work with full-length super-reads assembled from short reads, which further improves the quality of short-read assemblies. StringTie2 is more accurate and faster and uses less memory than all comparable short-read and long-read analysis tools.
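Reference-guided assemblers such as StringTie2 start from spliced alignments, in which an intron appears as an `N` operation in the SAM CIGAR string. The sketch below only illustrates that first step, extracting and counting intron coordinates from (position, CIGAR) pairs; it is not StringTie2's code and does not model how it handles noisy long-read junctions. It assumes 1-based SAM POS values and reports introns as 1-based inclusive intervals; the function names are hypothetical.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # SAM operations that advance along the reference


def introns_from_cigar(pos, cigar):
    """Return (start, end) 1-based inclusive intron coordinates for one alignment."""
    introns = []
    ref = pos
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "N":
            introns.append((ref, ref + n - 1))
        if op in REF_CONSUMING:
            ref += n
    return introns


def count_junctions(alignments):
    """Tally how many alignments support each intron."""
    counts = Counter()
    for pos, cigar in alignments:
        counts.update(introns_from_cigar(pos, cigar))
    return counts


if __name__ == "__main__":
    reads = [(100, "50M200N30M"), (120, "30M200N40M"), (1000, "10S20M5D10M100N5I20M")]
    for intron, support in sorted(count_junctions(reads).items()):
        print(intron, support)
    # (150, 349) 2
    # (1035, 1134) 1
```

Soft clips (`S`) and insertions (`I`) do not move the reference coordinate, while deletions (`D`) do; getting that distinction wrong shifts every downstream junction. With error-prone long reads, nearby junction coordinates often disagree by a few bases, which is one reason assemblers aggregate junction support rather than trusting single reads.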
Affiliation(s)
- Sam Kovaka
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218 USA
- Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD 21205 USA
- Aleksey V. Zimin
- Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD 21205 USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218 USA
- Geo M. Pertea
- Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD 21205 USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218 USA
- Roham Razaghi
- Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD 21205 USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218 USA
- Steven L. Salzberg
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218 USA
- Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD 21205 USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218 USA
- Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD 21205 USA
- Mihaela Pertea
- Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD 21205 USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218 USA
23
Morisse P, Lecroq T, Lefebvre A. Hybrid correction of highly noisy long reads using a variable-order de Bruijn graph. Bioinformatics 2019; 34:4213-4222. [PMID: 29955770 DOI: 10.1093/bioinformatics/bty521] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 12/22/2017] [Accepted: 06/27/2018] [Indexed: 12/31/2022]
Abstract
Motivation The recent rise of long-read sequencing technologies such as Pacific Biosciences and Oxford Nanopore makes it possible to solve assembly problems for larger and more complex genomes than short-read technologies allowed. However, these long reads are very noisy, with error rates of around 10-15% for Pacific Biosciences and up to 30% for Oxford Nanopore. The error correction problem has been tackled either by self-correcting the long reads or by using complementary short reads in a hybrid approach. However, even though sequencing technologies promise to lower the error rate of long reads below 10%, it is still higher in practice, and correcting such noisy long reads remains an issue. Results We present HG-CoLoR, a hybrid error correction method that focuses on a seed-and-extend approach based on the alignment of the short reads to the long reads, followed by the traversal of a variable-order de Bruijn graph built from the short reads. Our experiments show that HG-CoLoR efficiently corrects highly noisy long reads with error rates as high as 44%. When compared to other state-of-the-art long-read error correction methods, our experiments also show that HG-CoLoR provides the best trade-off between runtime and quality of the results, and is the only method able to scale efficiently to eukaryotic genomes. Availability and implementation HG-CoLoR is implemented in C++, supported on Linux platforms and freely available at https://github.com/morispi/HG-CoLoR. Supplementary information Supplementary data are available at Bioinformatics online.
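The abstract above refers to traversing a de Bruijn graph built from short reads. As a rough illustration only, the Python sketch below builds a plain fixed-order (single k) de Bruijn graph and greedily extends a seed while the path is unambiguous. It is not HG-CoLoR's method: the variable-order graph and the short-read-to-long-read alignment step are omitted, and the function names `build_de_bruijn` and `extend_seed` are hypothetical.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Fixed-order de Bruijn graph: each (k-1)-mer maps to the set of
    (k-1)-mers that follow it inside some k-mer of some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def extend_seed(graph, seed, max_steps=1000):
    """Greedily extend a (k-1)-mer seed while its successor is unique.

    Stops at a branch, a dead end, or after max_steps (guards against cycles).
    """
    path = seed
    node = seed
    for _ in range(max_steps):
        successors = graph.get(node)  # .get does not insert missing keys
        if not successors or len(successors) != 1:
            break
        node = next(iter(successors))
        path += node[-1]  # each step adds one new base
    return path


if __name__ == "__main__":
    g = build_de_bruijn(["ACGTC", "CGTCA"], k=3)
    for node in sorted(g):
        print(node, "->", sorted(g[node]))
    print(extend_seed(g, "AC"))
```

A fixed k keeps the sketch short; the point of a variable-order graph, as the abstract describes it, is to avoid committing to a single k for the whole traversal.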
24
Kadobianskyi M, Schulze L, Schuelke M, Judkewitz B. Hybrid genome assembly and annotation of Danionella translucida. Sci Data 2019; 6:156. [PMID: 31451709 PMCID: PMC6710283 DOI: 10.1038/s41597-019-0161-z] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 04/23/2019] [Accepted: 07/26/2019] [Indexed: 11/09/2022]
Abstract
Studying neuronal circuits at cellular resolution is very challenging in vertebrates due to the size and optical turbidity of their brains. Danionella translucida, a close relative of zebrafish, was recently introduced as a model organism for investigating neural network interactions in adult individuals. Danionella remains transparent throughout its life, has the smallest known vertebrate brain and possesses a rich repertoire of complex behaviours. Here we sequenced, assembled and annotated the Danionella translucida genome employing a hybrid Illumina/Nanopore read library as well as RNA-seq of embryonic, larval and adult mRNA. We achieved high assembly continuity using low-coverage long-read data and annotated a large fraction of the transcriptome. This dataset will pave the way for molecular research and targeted genetic manipulation of this novel model organism.
Affiliation(s)
- Mykola Kadobianskyi
- Einstein Center for Neurosciences, NeuroCure Cluster of Excellence, Charité - Universitätsmedizin Berlin, Charitéplatz 1, 10117, Berlin, Germany
- Lisanne Schulze
- Einstein Center for Neurosciences, NeuroCure Cluster of Excellence, Charité - Universitätsmedizin Berlin, Charitéplatz 1, 10117, Berlin, Germany
- Markus Schuelke
- Einstein Center for Neurosciences, NeuroCure Cluster of Excellence, Charité - Universitätsmedizin Berlin, Charitéplatz 1, 10117, Berlin, Germany
- Benjamin Judkewitz
- Einstein Center for Neurosciences, NeuroCure Cluster of Excellence, Charité - Universitätsmedizin Berlin, Charitéplatz 1, 10117, Berlin, Germany
25
Whole Genome Sequencing and Re-sequencing of the Sable Antelope (Hippotragus niger): A Resource for Monitoring Diversity in ex Situ and in Situ Populations. G3 (Bethesda) 2019; 9:1785-1793. [PMID: 31000506 PMCID: PMC6553546 DOI: 10.1534/g3.119.400084] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Indexed: 01/13/2023]
Abstract
Genome-wide assessment of genetic diversity has the potential to increase the ability to understand admixture, inbreeding, kinship and erosion of genetic diversity affecting both captive (ex situ) and wild (in situ) populations of threatened species. The sable antelope (Hippotragus niger), native to the savannah woodlands of sub-Saharan Africa, is a species that is being managed ex situ in both public (zoo) and private (ranch) collections in the United States. Our objective was to develop whole genome sequence resources that will serve as a foundation for characterizing the genetic status of ex situ populations of sable antelope relative to populations in the wild. Here we report the draft genome assembly of a male sable antelope, a member of the subfamily Hippotraginae (Bovidae, Cetartiodactyla, Mammalia). The 2.596 Gb draft genome consists of 136,528 contigs with an N50 of 45.5 Kbp and 16,927 scaffolds with an N50 of 4.59 Mbp. De novo annotation identified 18,828 protein-coding genes and repetitive sequences encompassing 46.97% of the genome. The discovery of single nucleotide variants (SNVs) was assisted by the re-sequencing of seven additional captive and wild individuals, representing two different subspecies, leading to the identification of 1,987,710 bi-allelic SNVs. Assembly of the mitochondrial genomes revealed that each individual was defined by a unique haplotype and these data were used to infer the mitochondrial gene tree relative to other hippotragine species. The sable antelope genome constitutes a valuable resource for assessing genome-wide diversity and evolutionary potential, thereby facilitating long-term conservation of this charismatic species.
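Several abstracts in this list report contig and scaffold N50 values (for example, the scaffold N50 of 4.59 Mbp above). A minimal sketch of the usual definition follows, assuming N50 means the length of the sequence at which the cumulative length of sequences, sorted from longest to shortest, first reaches at least half of the total assembly length. The function name `n50` is illustrative only.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    if total == 0:
        raise ValueError("N50 is undefined for an empty or zero-length assembly")
    running = 0
    for length in ordered:
        running += length
        # First point at which the longest sequences cover >= 50% of the total.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: the running sum always reaches the total")


if __name__ == "__main__":
    # Total = 38; cumulative sums are 10, 18, 24, and 24 is the first >= 19.
    print(n50([2, 3, 4, 5, 6, 8, 10]))  # 6
```

Comparing `2 * running` with `total` avoids floating-point halves; the same function applies to contigs or scaffolds, which is why assemblies typically quote both.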
26
Heydari M, Miclotte G, Van de Peer Y, Fostier J. Illumina error correction near highly repetitive DNA regions improves de novo genome assembly. BMC Bioinformatics 2019; 20:298. [PMID: 31159722 PMCID: PMC6545690 DOI: 10.1186/s12859-019-2906-2] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 01/15/2019] [Accepted: 05/17/2019] [Indexed: 11/10/2022]
Abstract
Background Several standalone error correction tools have been proposed to correct sequencing errors in Illumina data in order to facilitate de novo genome assembly. However, in a recent survey, we showed that state-of-the-art assemblers often did not benefit from this pre-correction step. We found that many error correction tools introduce new errors in reads that overlap highly repetitive DNA regions such as low-complexity patterns or short homopolymers, ultimately leading to a more fragmented assembly. Results We propose BrownieCorrector, an error correction tool for Illumina sequencing data that focuses on the correction of only those reads that overlap short DNA patterns that are highly repetitive in the genome. BrownieCorrector extracts all reads that contain such a pattern and clusters them into different groups using a community detection algorithm that takes into account both the sequence similarity between overlapping reads and their respective paired-end reads. Each cluster holds reads that originate from the same genomic region and hence each cluster can be corrected individually, thus providing a consistent correction for all reads within that cluster. Conclusions BrownieCorrector is benchmarked using six real Illumina datasets for different eukaryotic genomes. The prior use of BrownieCorrector improves assembly results over the use of uncorrected reads in all cases. In comparison with other error correction tools, BrownieCorrector leads to the best assembly results in most cases even though less than 2% of the reads within a dataset are corrected. Additionally, we investigate the impact of error correction on hybrid assembly where the corrected Illumina reads are supplemented with PacBio data. Our results confirm that BrownieCorrector improves the quality of hybrid genome assembly as well. BrownieCorrector is written in standard C++11 and released under GPL license. 
BrownieCorrector relies on multithreading to take advantage of multi-core/multi-CPU systems. The source code is available at https://github.com/biointec/browniecorrector. Electronic supplementary material The online version of this article (10.1186/s12859-019-2906-2) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Mahdi Heydari
- Department of Information Technology, Ghent University-imec, IDLab, Ghent, B-9052, Belgium
- Bioinformatics Institute Ghent, Ghent, B-9052, Belgium
- Giles Miclotte
- Department of Information Technology, Ghent University-imec, IDLab, Ghent, B-9052, Belgium
- Bioinformatics Institute Ghent, Ghent, B-9052, Belgium
- Yves Van de Peer
- Bioinformatics Institute Ghent, Ghent, B-9052, Belgium
- Center for Plant Systems Biology, VIB, Ghent, B-9052, Belgium
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, B-9052, Belgium
- Department of Genetics, Genome Research Institute, University of Pretoria, Pretoria, South Africa
- Jan Fostier
- Department of Information Technology, Ghent University-imec, IDLab, Ghent, B-9052, Belgium
- Bioinformatics Institute Ghent, Ghent, B-9052, Belgium
27
Forthman M, Miller CW, Kimball RT. Phylogenomic analysis suggests Coreidae and Alydidae (Hemiptera: Heteroptera) are not monophyletic. Zool Scr 2019. [DOI: 10.1111/zsc.12353] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Indexed: 01/07/2023]
Affiliation(s)
- Michael Forthman
- Entomology & Nematology Department, University of Florida, Gainesville, Florida
- Christine W. Miller
- Entomology & Nematology Department, University of Florida, Gainesville, Florida
28
Molecular characterization of Bathymodiolus mussels and gill symbionts associated with chemosynthetic habitats from the U.S. Atlantic margin. PLoS One 2019; 14:e0211616. [PMID: 30870419 PMCID: PMC6417655 DOI: 10.1371/journal.pone.0211616] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 03/21/2018] [Accepted: 01/17/2019] [Indexed: 01/22/2023]
Abstract
Mussels of the genus Bathymodiolus are among the most widespread colonizers of hydrothermal vent and cold seep environments, sustained by endosymbiosis with chemosynthetic bacteria. Presumed species of Bathymodiolus are abundant at newly discovered cold seeps on the Mid-Atlantic continental slope; however, morphological taxonomy is challenging, and their phylogenetic affinities remain unestablished. Here we used mitochondrial sequences to classify species found at three seep sites (Baltimore Canyon seep (BCS; ~400 m); Norfolk Canyon seep (NCS; ~1520 m); and Chincoteague Island seep (CTS; ~1000 m)). Mitochondrial COI (N = 162) and ND4 (N = 39) data suggest that Bathymodiolus childressi predominates at these sites, although single B. mauritanicus and B. heckerae individuals were detected. As previous work had suggested that methanotrophic and thiotrophic interactions can both occur at a site, and within an individual mussel, we investigated the symbiont communities in gill tissues of a subset of mussels from BCS and NCS. We constructed metabarcode libraries with four different primer sets spanning the 16S gene. A methanotrophic phylotype dominated all gill microbial samples from BCS, but sulfur-oxidizing Campylobacterota were represented by a notable minority of sequences from NCS. The methanotroph phylotype shared a clade with globally distributed Bathymodiolus spp. symbionts from methane seeps and hydrothermal vents. Two distinct Campylobacterota phylotypes were prevalent in NCS samples, one of which shares a clade with Campylobacterota associated with B. childressi from the Gulf of Mexico and the other with Campylobacterota associated with other deep-sea fauna. Variation in chemosynthetic symbiont communities among sites and individuals has important ecological and geochemical implications and suggests shifting reliance on methanotrophy.
Continued characterization of symbionts from cold seeps will provide a greater understanding of the ecology of these unique environments as well as their geochemical footprint in elemental cycling and energy flux.
29
Masonbrink RE, Purcell CM, Boles SE, Whitehead A, Hyde JR, Seetharam AS, Severin AJ. An Annotated Genome for Haliotis rufescens (Red Abalone) and Resequenced Green, Pink, Pinto, Black, and White Abalone Species. Genome Biol Evol 2019; 11:431-438. [PMID: 30657886 PMCID: PMC6373831 DOI: 10.1093/gbe/evz006] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Accepted: 01/08/2019] [Indexed: 11/13/2022]
Abstract
Abalone are one of the few marine taxa where aquaculture production dominates the global market as a result of increasing demand and declining natural stocks from overexploitation and disease. To better understand abalone biology, aid in conservation efforts for endangered abalone species, and gain insight into sustainable aquaculture, we created a draft genome of the red abalone (Haliotis rufescens). The approach to this genome draft included initial assembly using raw Illumina and PacBio sequencing data with MaSuRCA, before scaffolding using sequencing data generated from Chicago library preparations with HiRise2. This assembly approach resulted in 8,371 scaffolds and total length of 1.498 Gb; the N50 was 1.895 Mb, and the longest scaffold was 13.2 Mb. Gene models were predicted, using MAKER2, from RNA-Seq data and all related expressed sequence tags and proteins from NCBI; this resulted in 57,785 genes with an average length of 8,255 bp. In addition, single nucleotide polymorphisms were called on Illumina short-sequencing reads from five other eastern Pacific abalone species: the green (H. fulgens), pink (H. corrugata), pinto (H. kamtschatkana), black (H. cracherodii), and white (H. sorenseni) abalone. Phylogenetic relationships largely follow patterns detected by previous studies based on 1,784,991 high-quality single nucleotide polymorphisms. Among the six abalone species examined, the endangered white abalone appears to harbor the lowest levels of heterozygosity. This draft genome assembly and the sequencing data provide a foundation for genome-enabled aquaculture improvement for red abalone, and for genome-guided conservation efforts for the other five species and, in particular, for the endangered white and black abalone.
Affiliation(s)
- Catherine M Purcell
- Ocean Associates, Inc. Under Contract to NOAA Fisheries, Southwest Fisheries Science Center, La Jolla, California
- Sara E Boles
- Department of Environmental Toxicology, University of California, Davis
- Andrew Whitehead
- Department of Environmental Toxicology, University of California, Davis
- John R Hyde
- NOAA Fisheries, Southwest Fisheries Science Center, La Jolla, California
30
Rane RV, Pearce SL, Li F, Coppin C, Schiffer M, Shirriffs J, Sgrò CM, Griffin PC, Zhang G, Lee SF, Hoffmann AA, Oakeshott JG. Genomic changes associated with adaptation to arid environments in cactophilic Drosophila species. BMC Genomics 2019; 20:52. [PMID: 30651071 PMCID: PMC6335815 DOI: 10.1186/s12864-018-5413-3] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 06/26/2018] [Accepted: 12/26/2018] [Indexed: 12/14/2022]
Abstract
BACKGROUND Insights into the genetic capacities of species to adapt to future climate change can be gained by using comparative genomic and transcriptomic data to reconstruct the genetic changes associated with such adaptations in the past. Here we investigate the genetic changes associated with adaptation to arid environments, specifically climatic extremes and new cactus hosts, through such an analysis of five repleta group Drosophila species. RESULTS We find disproportionately high rates of gene gains in internal branches in the species' phylogeny where cactus use and subsequently cactus specialisation and high heat and desiccation tolerance evolved. The terminal branch leading to the most heat and desiccation resistant species, Drosophila aldrichi, also shows disproportionately high rates of both gene gains and positive selection. Several Gene Ontology terms related to metabolism were enriched in gene gain events in lineages where cactus use was evolving, while some regulatory and developmental genes were strongly selected in the Drosophila aldrichi branch. Transcriptomic analysis of flies subjected to sublethal heat shocks showed many more downregulation responses to the stress in a heat sensitive versus heat resistant species, confirming the existence of widespread regulatory as well as structural changes in the species' differing adaptations. Gene Ontology terms related to metabolism were enriched in the differentially expressed genes in the resistant species while terms related to stress response were over-represented in the sensitive one. CONCLUSION Adaptations to new cactus hosts and hot desiccating environments were associated with periods of accelerated evolutionary change in diverse biochemistries. The hundreds of genes involved suggest adaptations of this sort would be difficult to achieve in the timeframes projected for anthropogenic climate change.
Affiliation(s)
- Rahul V. Rane
- CSIRO, Clunies Ross St, GPO Box 1700, Acton, ACT 2601 Australia
- Bio21 Institute, School of BioSciences, University of Melbourne, 30 Flemington Road, Parkville, 3010 Australia
- Fang Li
- China National GeneBank, BGI-Shenzhen, Shenzhen, China
- Chris Coppin
- CSIRO, Clunies Ross St, GPO Box 1700, Acton, ACT 2601 Australia
- Michele Schiffer
- Bio21 Institute, School of BioSciences, University of Melbourne, 30 Flemington Road, Parkville, 3010 Australia
- Jennifer Shirriffs
- Bio21 Institute, School of BioSciences, University of Melbourne, 30 Flemington Road, Parkville, 3010 Australia
- Carla M. Sgrò
- School of Biological Sciences, Monash University, Melbourne, 3800 Australia
- Philippa C. Griffin
- Bio21 Institute, School of BioSciences, University of Melbourne, 30 Flemington Road, Parkville, 3010 Australia
- Guojie Zhang
- China National GeneBank, BGI-Shenzhen, Shenzhen, China
- Centre for Social Evolution, Department of Biology, University of Copenhagen, Universitetsparken 15, København, Denmark
- Siu F. Lee
- CSIRO, Clunies Ross St, GPO Box 1700, Acton, ACT 2601 Australia
- Bio21 Institute, School of BioSciences, University of Melbourne, 30 Flemington Road, Parkville, 3010 Australia
- Ary A. Hoffmann
- Bio21 Institute, School of BioSciences, University of Melbourne, 30 Flemington Road, Parkville, 3010 Australia
31
Yoon S, Kim D, Kang K, Park WJ. TraRECo: a greedy approach based de novo transcriptome assembler with read error correction using consensus matrix. BMC Genomics 2018; 19:653. [PMID: 30180798 PMCID: PMC6123912 DOI: 10.1186/s12864-018-5034-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 11/10/2017] [Accepted: 08/23/2018] [Indexed: 01/15/2023]
Abstract
BACKGROUND The challenges when developing a good de novo transcriptome assembler include how to deal with read errors and sequence repeats. Almost all de novo assemblers utilize a de Bruijn graph, with which complexity grows linearly with data size while suffering from errors and repeats. Although one can correct the errors by inspecting the topological structure of the graph, this is not an easy task when there are too many branches. Two research directions are to improve either the graph reliability or the path search precision, and in this study, we focused on the former. RESULTS We present TraRECo, a greedy approach to de novo assembly employing error-aware graph construction. In the proposed approach, we built contigs by direct read alignment within a distance margin and performed a junction search to construct splicing graphs. While doing so, a contig of length l was represented by a 4 × l matrix (called a consensus matrix), in which each element was the base count of the aligned reads so far. A representative sequence was obtained by taking the majority in each column of the consensus matrix to be used for further read alignment. Once the splicing graphs had been obtained, we used IsoLasso to find paths with a noticeable read depth. The experiments using real and simulated reads show that the method provided considerable improvement in sensitivity and moderately better performance when comparing sensitivity and precision. This was achieved by the error-aware graph construction using the consensus matrix, with which the reads having errors were made usable for the graph construction (otherwise, they might have been eventually discarded). This improved the quality of the coverage depth information used in the subsequent path search step and finally the reliability of the graph. CONCLUSIONS De novo assembly is mainly used to explore undiscovered isoforms and must be able to represent as many reads as possible in an efficient way. 
In this sense, TraRECo provides us with a potential alternative for improving graph reliability even though the computational burden is much higher than the single k-mer in the de Bruijn graph approach.
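The consensus-matrix idea described above (a 4 × l matrix of per-column base counts, with the representative sequence taken as the majority in each column) can be sketched as follows. This is a simplified illustration, not TraRECo's implementation: it assumes reads are already aligned without gaps at a known offset, keeps the contig length fixed, breaks ties in A, C, G, T order, and emits N for columns no read covers. The class name `ConsensusMatrix` is hypothetical.

```python
BASES = "ACGT"


class ConsensusMatrix:
    """4 x l matrix of per-column base counts for a contig of length l."""

    def __init__(self, length: int) -> None:
        self.length = length
        # One row per base (A, C, G, T), one column per contig position.
        self.counts = [[0] * length for _ in BASES]

    def add_read(self, read: str, offset: int) -> None:
        """Add a read assumed to be aligned without gaps starting at `offset`.

        Non-ACGT characters and positions outside the contig are ignored.
        """
        for i, base in enumerate(read.upper()):
            col = offset + i
            if base in BASES and 0 <= col < self.length:
                self.counts[BASES.index(base)][col] += 1

    def consensus(self) -> str:
        """Majority base per column; 'N' where no read covers the column."""
        seq = []
        for col in range(self.length):
            column = [row[col] for row in self.counts]
            best = max(range(len(BASES)), key=lambda b: column[b])
            seq.append(BASES[best] if column[best] > 0 else "N")
        return "".join(seq)


if __name__ == "__main__":
    m = ConsensusMatrix(5)
    # The third read carries a C/G error at position 2; the majority outvotes it.
    for read in ["ACGTA", "ACGTA", "ACCTA"]:
        m.add_read(read, 0)
    print(m.consensus())  # ACGTA
```

Because the erroneous read still contributes its correct bases to every other column, it remains usable rather than being discarded, which is the benefit the abstract attributes to error-aware graph construction.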
Affiliation(s)
- Seokhyun Yoon
- Department of Electronics Eng., College of Engineering, Dankook University, Yongin-si, Korea
- Daeseung Kim
- Department of Microbiology, College of Natural Sciences, Dankook University, Cheonan-si, Korea
- Keunsoo Kang
- Department of Microbiology, College of Natural Sciences, Dankook University, Cheonan-si, Korea
- Woong June Park
- Department of Molecular Biology, College of Natural Sciences, Dankook University, Cheonan-si, Korea
32
Williams JL, Iamartino D, Pruitt KD, Sonstegard T, Smith TPL, Low WY, Biagini T, Bomba L, Capomaccio S, Castiglioni B, Coletta A, Corrado F, Ferré F, Iannuzzi L, Lawley C, Macciotta N, McClure M, Mancini G, Matassino D, Mazza R, Milanesi M, Moioli B, Morandi N, Ramunno L, Peretti V, Pilla F, Ramelli P, Schroeder S, Strozzi F, Thibaud-Nissen F, Zicarelli L, Ajmone-Marsan P, Valentini A, Chillemi G, Zimin A. Genome assembly and transcriptome resource for river buffalo, Bubalus bubalis (2n = 50). Gigascience 2018; 6:1-6. [PMID: 29048578 PMCID: PMC5737279 DOI: 10.1093/gigascience/gix088] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Received: 05/17/2017] [Accepted: 08/28/2017] [Indexed: 11/12/2022]
Abstract
Water buffalo is a globally important species for agriculture and local economies. A de novo assembled, well-annotated reference sequence for the water buffalo is an important prerequisite for studying the biology of this species, and is necessary to manage genetic diversity and to use modern breeding and genomic selection techniques. However, no such genome assembly has been previously reported. There are 2 species of domestic water buffalo, the river (2n = 50) and the swamp (2n = 48) buffalo. Here we describe a draft quality reference sequence for the river buffalo created from Illumina GA and Roche 454 short read sequences using the MaSuRCA assembler. The assembled sequence is 2.83 Gb, consisting of 366,983 scaffolds with a scaffold N50 of 1.41 Mb and contig N50 of 21,398 bp. Annotation of the genome was supported by transcriptome data from 30 tissues and identified 21,711 predicted protein coding genes. Searches for complete mammalian BUSCO gene groups found 98.6% of curated single copy orthologs present among predicted genes, which suggests a high level of completeness of the genome. The annotated sequence is available from NCBI at accession GCA_000471725.1.
Affiliation(s)
- John L Williams
- The Davies Research Centre, School of Animal and Veterinary Sciences, University of Adelaide, Roseworthy, SA 5371, Australia
- Parco Tecnologico Padano, Via Einstein, 26500, Lodi, Italy
- Daniela Iamartino
- AIA-LGS, Associazione Italiana Allevatori, Laboratorio Genetica e Servizi, Via Bergamo 292, 26100 Cremona (CR), Italy
- Parco Tecnologico Padano, Via Einstein, 26500, Lodi, Italy
- Kim D Pruitt
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Tad Sonstegard
- Recombinetics, 1246 University Ave W, St Paul, MN 55104, USA
- Timothy P L Smith
- USDA-ARS U.S. Meat Animal Research Center, 844 Road 313, Clay Center, NE 68933, USA
- Wai Yee Low
- The Davies Research Centre, School of Animal and Veterinary Sciences, University of Adelaide, Roseworthy, SA 5371, Australia
- Tommaso Biagini
- IRCCS Casa Sollievo della Sofferenza, Bioinformatics Unit, S. Giovanni Rotondo, Italy
- Lorenzo Bomba
- Università Cattolica del Sacro Cuore, Via Emilia Parmense 84, 29122 Piacenza PC, Italy
- Human Genetics, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, CB10 1HH, UK
- Stefano Capomaccio
- Università Cattolica del Sacro Cuore, Via Emilia Parmense 84, 29122 Piacenza PC, Italy
- Bianca Castiglioni
- CNR, Istituto di Biologia e Biotecnologia Agraria Via Einstein, 26900 Lodi, Italy
- Angelo Coletta
- ANASB Associazione Nazionale Allevatori Specie Bufalina, Centuran, Caserta, Italy
- Federica Corrado
- IZSM, Istituto Zooprofilattico Sperimentale del Mezzogiorno, Via Salute, 2-80055, Portici (NA), Italy
- Fabrizio Ferré
- Department of Pharmacy and Biotechnology (FaBiT), University of Bologna Alma Mater, Via Belmeloro 8/2, 40126 Bologna, Italy
- Leopoldo Iannuzzi
- CNR, Istituto Per Il Sistema Produzione Animale In Ambiente Mediterraneo, Via Argine, 1085, 80147 Napoli, Italy
- Cynthia Lawley
- Illumina, Inc. 499 Illinois St. Suite 210, San Francisco, CA 94158, USA
- Nicolò Macciotta
- Università degli Studi di Sassari, Piazza Università 21, 07100 Sassari, Italy
- Matthew McClure
- USDA, ARS, Animal Genomics and Improvement Laboratory, Building 306 BARC-East, Beltsville, MD 20705-2350, USA
- Irish Cattle Breeding Federation, Highfield House, Shinagh, Bandon, Co. Cork, P72 X050, Ireland
- Giordano Mancini
- Scuola Normale Superiore, Piazza dei Cavalieri 7, 56125 Pisa, Italy
- Donato Matassino
- ConSDABI, Consorzio per la Sperimentazione, Divulgazione e Applicazione di Biotecniche Innovative, Contrada Piano Cappelle, Benevento (BN), Italy
- Raffaele Mazza
- AIA-LGS, Associazione Italiana Allevatori, Laboratorio Genetica e Servizi, Via Bergamo 292, 26100 Cremona (CR), Italy
- Marco Milanesi
- Università Cattolica del Sacro Cuore, Via Emilia Parmense 84, 29122 Piacenza PC, Italy
- Bianca Moioli
- CRA Centro di Ricerca per la Produzione delle Carni ed il Miglioramento Genetico, Via Salaria 31, 00015, Monterotondo, Italy
- Luigi Ramunno
- Dipartimento di Agraria, Università degli Studi di Napoli "Federico II", via Università 100, 80055 Portici (NA), Italy
- Vincenzo Peretti
- Department of Veterinary Medicine and Animal Production, University of Naples Federico II, via Delpino 1, 80137 Napoli, Italy
- Fabio Pilla
- Department of Agriculture, Environment and Food, University of Molise
- Paola Ramelli
- Parco Tecnologico Padano, Via Einstein, 26500, Lodi, Italy
- Steven Schroeder
- USDA, ARS, Animal Genomics and Improvement Laboratory, Building 306 BARC-East, Beltsville, MD 20705-2350, USA
- Francesco Strozzi
- Parco Tecnologico Padano, Via Einstein, 26500, Lodi, Italy
- Enterome, 94-96 Avenue Ledru-Rollin, 75011 Paris, France
- Francoise Thibaud-Nissen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Luigi Zicarelli
- Department of Agriculture, Environment and Food, University of Molise
- Paolo Ajmone-Marsan
- Università Cattolica del Sacro Cuore, Via Emilia Parmense 84, 29122 Piacenza PC, Italy
- Alessio Valentini
- Università della Tuscia, Via S. Camillo de Lellis, 01100 Viterbo, Italy
- Giovanni Chillemi
- SCAI Super Computing Applications and Innovation Department, Cineca, Via dei Tizii 6, 00185, Rome
33
Grigorev K, Kliver S, Dobrynin P, Komissarov A, Wolfsberger W, Krasheninnikova K, Afanador-Hernández YM, Brandt AL, Paulino LA, Carreras R, Rodríguez LE, Núñez A, Brandt JR, Silva F, Hernández-Martich JD, Majeske AJ, Antunes A, Roca AL, O'Brien SJ, Martínez-Cruzado JC, Oleksyk TK. Innovative assembly strategy contributes to understanding the evolution and conservation genetics of the endangered Solenodon paradoxus from the island of Hispaniola. Gigascience 2018; 7:4931057. [PMID: 29718205 PMCID: PMC6009670 DOI: 10.1093/gigascience/giy025] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 07/20/2017] [Revised: 01/26/2018] [Accepted: 03/07/2018] [Indexed: 11/25/2022]
Abstract
Solenodons are insectivores that live in Hispaniola and Cuba. They form an isolated branch in the tree of placental mammals that is highly divergent from other eulipotyphlan insectivores. The availability of genome data could illuminate the history, unique biology, and adaptations of these enigmatic venomous species. However, a whole genome assembly for solenodons had not previously been performed, partly because of the difficulty of obtaining samples from the field. Island isolation and reduced population numbers have likely resulted in high homozygosity within the Hispaniolan solenodon (Solenodon paradoxus). We therefore tested the performance of several assembly strategies on the genome of this genetically impoverished species. Because of the high levels of homozygosity, which are often a hallmark of endemic or endangered species, the string graph-based assembly strategy seemed a better choice than the conventional de Bruijn graph approach. A consensus reference genome was assembled from sequences of 5 individuals from the southern subspecies (S. p. woodi). In addition, we sequenced 1 sample of the northern subspecies (S. p. paradoxus). The resulting genome assemblies were compared to each other and annotated for genes, with an emphasis on venom genes, repeats, variable microsatellite loci, and other genomic variants. Phylogenetic positioning and selection signatures were inferred based on 4,416 single-copy orthologs from 10 other mammals. We estimated that solenodons diverged from other extant mammals 73.6 million years ago. Patterns of single-nucleotide polymorphism variation allowed us to infer population demography, which supported a subspecies split within the Hispaniolan solenodon at least 300 thousand years ago.
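The abstract contrasts string graph assembly with the de Bruijn graph approach. As a minimal illustration of the latter only (not the authors' pipeline, and not a string graph), the sketch below builds a de Bruijn graph by turning every k-mer into an edge from its (k-1)-mer prefix to its (k-1)-mer suffix. The function name `build_de_bruijn` and the toy reads are hypothetical.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges come from k-mers.

    Each k-mer in each read adds an edge from its (k-1)-mer prefix to its
    (k-1)-mer suffix. Successors are stored in a set, so repeated k-mers
    collapse into a single edge (real assemblers also track multiplicity).
    """
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


# Two toy reads that overlap on "GTAC"; k = 4 gives 3-mer nodes.
graph = build_de_bruijn(["ACGTAC", "GTACGG"], k=4)
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

In this toy example, node `ACG` has two successors (`CGG` and `CGT`). An assembler has to resolve branches of this kind. Reads are broken into k-mers rather than kept whole, which is the main difference from overlap-based string graphs.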
Affiliation(s)
- Kirill Grigorev
- Department of Biology, University of Puerto Rico at Mayagüez, Mayagüez, Puerto Rico
- Sergey Kliver
- Theodosius Dobzhansky Center for Genome Bioinformatics, St. Petersburg State University, St. Petersburg, Russia
- Pavel Dobrynin
- Theodosius Dobzhansky Center for Genome Bioinformatics, St. Petersburg State University, St. Petersburg, Russia
- Aleksey Komissarov
- Theodosius Dobzhansky Center for Genome Bioinformatics, St. Petersburg State University, St. Petersburg, Russia
- Walter Wolfsberger
- Department of Biology, University of Puerto Rico at Mayagüez, Mayagüez, Puerto Rico
- Biology Department, Uzhhorod National University, Uzhhorod, Ukraine
- Ksenia Krasheninnikova
- Theodosius Dobzhansky Center for Genome Bioinformatics, St. Petersburg State University, St. Petersburg, Russia
- Adam L Brandt
- Department of Animal Sciences, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
- Division of Natural Sciences, St. Norbert College, De Pere, Wisconsin, USA
- Liz A Paulino
- Instituto Tecnológico de Santo Domingo (INTEC), Santo Domingo, Dominican Republic
- Rosanna Carreras
- Instituto Tecnológico de Santo Domingo (INTEC), Santo Domingo, Dominican Republic
- Luis E Rodríguez
- Instituto Tecnológico de Santo Domingo (INTEC), Santo Domingo, Dominican Republic
- Adrell Núñez
- Department of Conservation and Science, Parque Zoologico Nacional (ZOODOM), Santo Domingo, Dominican Republic
- Jessica R Brandt
- Department of Animal Sciences, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
- Department of Biology, Marian University, Fond du Lac, Wisconsin, USA
- Filipe Silva
- CIIMAR/CIMAR, Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos, s/n, 4450–208 Porto, Portugal
- Department of Biology, Faculty of Sciences, University of Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
- J David Hernández-Martich
- Instituto de Investigaciones Botánicas y Zoológicas, Universidad Autónoma de Santo Domingo, Santo Domingo, Dominican Republic
- Audrey J Majeske
- Department of Biology, University of Puerto Rico at Mayagüez, Mayagüez, Puerto Rico
- Agostinho Antunes
- CIIMAR/CIMAR, Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos, s/n, 4450–208 Porto, Portugal
- Department of Biology, Faculty of Sciences, University of Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
- Alfred L Roca
- Department of Animal Sciences, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Stephen J O'Brien
- Theodosius Dobzhansky Center for Genome Bioinformatics, St. Petersburg State University, St. Petersburg, Russia
- Oceanographic Center, Nova Southeastern University, Fort Lauderdale, Florida, USA
- Taras K Oleksyk
- Department of Biology, University of Puerto Rico at Mayagüez, Mayagüez, Puerto Rico
- Biology Department, Uzhhorod National University, Uzhhorod, Ukraine
34
Abstract
A high-quality, annotated genome assembly is the foundation for many downstream studies. However, obtaining such an assembly is a complex, iterative process. It requires integrating high-quality data and combining different approaches and data types. Some commercially available software packages incorporate multiple steps of genome assembly, but they may not be flexible enough to apply routinely to all organisms, particularly nonmodel species such as pathogenic oomycetes and fungi. If researchers understand and apply the most appropriate currently available tools for each step, they can customize parameters and optimize results for their organism of study. This chapter draws on our experience with de novo assembly and annotation of several oomycete species. It provides a modular workflow that runs from processing of raw reads and initial assembly generation through optimization, chromosome-scale scaffolding, and annotation. For each step, it outlines the input and output data and gives examples of the software used, along with alternatives. The accompanying Notes provide background information and alternative options for each step. The final result of this workflow can be an annotated, high-quality, validated, chromosome-scale assembly, or a draft assembly of sufficient quality to meet the specific needs of a project.
Affiliation(s)
- Kyle Fletcher
- The Genome Center, Genome and Biomedical Sciences Facility, University of California, Davis, CA, USA
- Richard Michelmore
- The Genome Center, Genome and Biomedical Sciences Facility, University of California, Davis, CA, USA
35
Huang YT, Huang YW. An efficient error correction algorithm using FM-index. BMC Bioinformatics 2017; 18:524. [PMID: 29179672 PMCID: PMC5704532 DOI: 10.1186/s12859-017-1940-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 08/09/2017] [Accepted: 11/14/2017] [Indexed: 11/10/2022]
Abstract
BACKGROUND High-throughput sequencing offers higher throughput and lower cost for sequencing a genome. However, sequencing errors, including mismatches and indels, may be introduced during sequencing. Errors may reduce the accuracy of subsequent de novo assembly, so error correction is necessary prior to assembly. However, existing correction methods still face trade-offs among correction power, accuracy, and speed. RESULTS We developed a novel overlap-based error correction algorithm using the FM-index (called FMOE). FMOE first identifies overlapping reads by aligning a query read simultaneously against multiple reads compressed by the FM-index. Sequencing errors are then corrected by k-mer voting from the overlapping reads only. The experimental results indicate that FMOE has the highest correction power with comparable accuracy and speed. Compared with other methods, our algorithm performs better on long-read datasets than on short-read datasets. The assembly results indicated that each algorithm has its own strengths and weaknesses, and that FMOE is well suited to long or high-quality reads. CONCLUSIONS FMOE is freely available at https://github.com/ythuang0522/FMOC.
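The abstract says that FMOE corrects errors by k-mer voting restricted to reads that overlap the query. The sketch below illustrates only that voting idea, under simplifying assumptions. The overlapping reads are taken as already known; FMOE finds them with an FM-index, which is not reproduced here. Only substitutions are handled, and positions are scanned greedily from left to right. The function names and the `min_count` threshold are hypothetical, not FMOE's API.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer in the given reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def correct_read(query, overlapping_reads, k, min_count=2):
    """Correct substitution errors in `query` by k-mer voting.

    Only k-mers from `overlapping_reads` vote. A position is changed when
    some other base makes every k-mer covering it reach `min_count` votes.
    Indels are not handled in this sketch.
    """
    counts = count_kmers(overlapping_reads, k)
    read = list(query)
    for pos in range(len(read)):
        # start offsets of all k-mers that cover position `pos`
        starts = range(max(0, pos - k + 1), min(pos, len(read) - k) + 1)

        def support(seq):
            # weakest vote among the covering k-mers (0 if none fit)
            return min((counts["".join(seq[s:s + k])] for s in starts), default=0)

        if support(read) >= min_count:
            continue  # position already well supported
        best_base, best_support = read[pos], support(read)
        for base in "ACGT":
            candidate = read[:pos] + [base] + read[pos + 1:]
            score = support(candidate)
            if score > best_support:
                best_base, best_support = base, score
        if best_support >= min_count:
            read[pos] = best_base
    return "".join(read)


truth = "ACGTTGCA"
print(correct_read("ACGATGCA", [truth] * 3, k=3))  # -> ACGTTGCA
```

A greedy left-to-right scan keeps the sketch short, but it can misplace corrections when errors cluster. A real corrector would weigh alternatives more carefully.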
Affiliation(s)
- Yao-Ting Huang
- Department of Computer Science and Information Engineering, National Chung Cheng University, Chiayi, Taiwan
- Yu-Wen Huang
- Department of Computer Science and Information Engineering, National Chung Cheng University, Chiayi, Taiwan
36
Hoff JL, Decker JE, Schnabel RD, Taylor JF. Candidate lethal haplotypes and causal mutations in Angus cattle. BMC Genomics 2017; 18:799. [PMID: 29047335 PMCID: PMC5648474 DOI: 10.1186/s12864-017-4196-2] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Received: 04/13/2017] [Accepted: 10/08/2017] [Indexed: 01/01/2023]
Abstract
BACKGROUND If unmanaged, high rates of inbreeding in livestock populations adversely impact reproductive fitness. In beef cattle, historical selection strategies have increased the frequency of several segregating fatal autosomal recessive polymorphisms. Selective breeding has also decreased haplotypic diversity genome-wide. Candidate developmentally lethal recessive loci can be localized by identifying haplotypes whose frequency predicts homozygotes that are never observed. This analysis does not require observing the loss-associated phenotype (e.g., failure to implant, first-trimester abortion, deformity at birth). In this study, haplotypes were estimated for 3961 registered Angus individuals genotyped at 52,545 SNP loci. Estimation used findhap v2, which exploited the complex pedigree among the individuals in this population. RESULTS Seven loci were found to carry haplotypes that were never observed in homozygous form, despite a sufficiently high frequency and a pedigree-based expectation of homozygote occurrence. These haplotypes were identified as candidates for harboring autosomal recessive lethal alleles. Of the genotyped individuals, 109 were resequenced to an average depth of coverage of 27X to identify putative loss-of-function alleles genome-wide, and their variants were called using a custom in-house pipeline. For the candidate lethal-harboring haplotypes present in these bulls, sequence-called genotypes were used to identify concordant variants. In addition, whole-genome sequence variants from the 109 resequenced animals were imputed into the set of 3961 genotyped animals to identify candidate lethal recessive variants at the seven loci. After imputation, no variants were identified that were fully concordant with the marker-based diplotypes. CONCLUSIONS Selective breeding programs could use the predicted lethal haplotypes associated with SNP genotypes. Sequencing and other methods for identifying the causal variants underlying these haplotypes can enable more efficient management, such as gene editing. Together, these two approaches will reduce the negative impacts of inbreeding on fertility and maximize overall genetic gains.
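The screening logic in the abstract compares an expected number of homozygotes with an observed count of zero. The study derived that expectation from the pedigree with findhap v2. The sketch below substitutes a simpler random-mating (Hardy–Weinberg) approximation, as an assumption. With frequency p in n animals, it expects n·p² homozygotes and gives a probability of (1 − p²)ⁿ of seeing none. The haplotype names, frequencies, and the `alpha` cutoff are hypothetical. Only the 3961 animals come from the abstract.

```python
def expected_homozygotes(freq, n):
    """Expected number of homozygous carriers among n animals: n * p^2."""
    return n * freq ** 2


def prob_no_homozygotes(freq, n):
    """Probability of observing zero homozygotes among n animals: (1 - p^2)^n."""
    return (1.0 - freq ** 2) ** n


def flag_candidates(haplotypes, n, alpha=0.001):
    """Return haplotypes never seen as homozygotes whose absence is improbable.

    `haplotypes` maps a name to (frequency, observed_homozygote_count).
    Random mating is assumed; the study itself used pedigree information.
    """
    return [
        name
        for name, (freq, observed) in haplotypes.items()
        if observed == 0 and prob_no_homozygotes(freq, n) < alpha
    ]


n_animals = 3961  # genotyped animals reported in the abstract
toy = {"hap_A": (0.05, 0), "hap_B": (0.01, 0), "hap_C": (0.05, 12)}  # hypothetical
print(flag_candidates(toy, n_animals))  # -> ['hap_A']
```

At a frequency of 0.05, about 9.9 homozygotes would be expected among 3961 animals. Seeing none is therefore very unlikely under random mating. At a frequency of 0.01, the absence of homozygotes is unremarkable.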
Affiliation(s)
- Jesse L Hoff
- Division of Animal Sciences, University of Missouri, Columbia, MO, 65211, USA
- Jared E Decker
- Division of Animal Sciences, University of Missouri, Columbia, MO, 65211, USA; Informatics Institute, University of Missouri, Columbia, MO, 65211, USA
- Robert D Schnabel
- Division of Animal Sciences, University of Missouri, Columbia, MO, 65211, USA; Informatics Institute, University of Missouri, Columbia, MO, 65211, USA
- Jeremy F Taylor
- Division of Animal Sciences, University of Missouri, Columbia, MO, 65211, USA
37
Sun H, Wu S, Zhang G, Jiao C, Guo S, Ren Y, Zhang J, Zhang H, Gong G, Jia Z, Zhang F, Tian J, Lucas WJ, Doyle JJ, Li H, Fei Z, Xu Y. Karyotype Stability and Unbiased Fractionation in the Paleo-Allotetraploid Cucurbita Genomes. MOLECULAR PLANT 2017; 10:1293-1306. [PMID: 28917590 DOI: 10.1016/j.molp.2017.09.003] [Citation(s) in RCA: 179] [Impact Index Per Article: 22.4] [Received: 08/22/2017] [Revised: 09/06/2017] [Accepted: 09/06/2017] [Indexed: 05/18/2023]
Abstract
The Cucurbita genus contains several economically important species in the Cucurbitaceae family. Here, we report high-quality genome sequences of C. maxima and C. moschata and provide evidence supporting an allotetraploidization event in Cucurbita. We are able to partition the genome into two homoeologous subgenomes based on different genetic distances to melon, cucumber, and watermelon in the Benincaseae tribe. We estimate that the two diploid progenitors successively diverged from Benincaseae around 31 and 26 million years ago (Mya), respectively, and the allotetraploidization happened at some point between 26 Mya and 3 Mya, the estimated date when C. maxima and C. moschata diverged. The subgenomes have largely maintained the chromosome structures of their diploid progenitors. Such long-term karyotype stability after polyploidization has not been commonly observed in plant polyploids. The two subgenomes have retained similar numbers of genes, and neither subgenome is globally dominant in gene expression. Allele-specific expression analysis in the C. maxima × C. moschata interspecific F1 hybrid and their two parents indicates the predominance of trans-regulatory effects underlying expression divergence of the parents, and detects transgressive gene expression changes in the hybrid correlated with heterosis in important agronomic traits. Our study provides insights into polyploid genome evolution and valuable resources for genetic improvement of cucurbit crops.
Affiliation(s)
- Honghe Sun
- National Engineering Research Center for Vegetables, Beijing Academy of Agriculture and Forestry Sciences, Key Laboratory of Biology and Genetic Improvement of Horticultural Crops (North China), Beijing Key Laboratory of Vegetable Germplasm Improvement, Beijing 100097, China; Boyce Thompson Institute, Cornell University, Ithaca, NY 14853, USA
- Shan Wu
- Boyce Thompson Institute, Cornell University, Ithaca, NY 14853, USA
- Guoyu Zhang
- National Engineering Research Center for Vegetables, Beijing Academy of Agriculture and Forestry Sciences, Key Laboratory of Biology and Genetic Improvement of Horticultural Crops (North China), Beijing Key Laboratory of Vegetable Germplasm Improvement, Beijing 100097, China
- Chen Jiao
- Boyce Thompson Institute, Cornell University, Ithaca, NY 14853, USA
- Shaogui Guo
- National Engineering Research Center for Vegetables, Beijing Academy of Agriculture and Forestry Sciences, Key Laboratory of Biology and Genetic Improvement of Horticultural Crops (North China), Beijing Key Laboratory of Vegetable Germplasm Improvement, Beijing 100097, China
- Yi Ren
- National Engineering Research Center for Vegetables, Beijing Academy of Agriculture and Forestry Sciences, Key Laboratory of Biology and Genetic Improvement of Horticultural Crops (North China), Beijing Key Laboratory of Vegetable Germplasm Improvement, Beijing 100097, China
- Jie Zhang
- National Engineering Research Center for Vegetables, Beijing Academy of Agriculture and Forestry Sciences, Key Laboratory of Biology and Genetic Improvement of Horticultural Crops (North China), Beijing Key Laboratory of Vegetable Germplasm Improvement, Beijing 100097, China
- Haiying Zhang
- National Engineering Research Center for Vegetables, Beijing Academy of Agriculture and Forestry Sciences, Key Laboratory of Biology and Genetic Improvement of Horticultural Crops (North China), Beijing Key Laboratory of Vegetable Germplasm Improvement, Beijing 100097, China
- Guoyi Gong
- National Engineering Research Center for Vegetables, Beijing Academy of Agriculture and Forestry Sciences, Key Laboratory of Biology and Genetic Improvement of Horticultural Crops (North China), Beijing Key Laboratory of Vegetable Germplasm Improvement, Beijing 100097, China
- Zhangcai Jia
- National Engineering Research Center for Vegetables, Beijing Academy of Agriculture and Forestry Sciences, Key Laboratory of Biology and Genetic Improvement of Horticultural Crops (North China), Beijing Key Laboratory of Vegetable Germplasm Improvement, Beijing 100097, China
- Fan Zhang
- National Engineering Research Center for Vegetables, Beijing Academy of Agriculture and Forestry Sciences, Key Laboratory of Biology and Genetic Improvement of Horticultural Crops (North China), Beijing Key Laboratory of Vegetable Germplasm Improvement, Beijing 100097, China
- Jiaxing Tian
- National Engineering Research Center for Vegetables, Beijing Academy of Agriculture and Forestry Sciences, Key Laboratory of Biology and Genetic Improvement of Horticultural Crops (North China), Beijing Key Laboratory of Vegetable Germplasm Improvement, Beijing 100097, China
- William J Lucas
- Department of Plant Biology, College of Biological Sciences, University of California, Davis, CA 95616, USA
- Jeff J Doyle
- Section of Plant Breeding & Genetics, School of Integrated Plant Sciences, Cornell University, Ithaca, NY 14853, USA
- Haizhen Li
- National Engineering Research Center for Vegetables, Beijing Academy of Agriculture and Forestry Sciences, Key Laboratory of Biology and Genetic Improvement of Horticultural Crops (North China), Beijing Key Laboratory of Vegetable Germplasm Improvement, Beijing 100097, China
- Zhangjun Fei
- Boyce Thompson Institute, Cornell University, Ithaca, NY 14853, USA; USDA-ARS Robert W. Holley Center for Agriculture and Health, Ithaca, NY 14853, USA
- Yong Xu
- National Engineering Research Center for Vegetables, Beijing Academy of Agriculture and Forestry Sciences, Key Laboratory of Biology and Genetic Improvement of Horticultural Crops (North China), Beijing Key Laboratory of Vegetable Germplasm Improvement, Beijing 100097, China
38
Zielezinski A, Vinga S, Almeida J, Karlowski WM. Alignment-free sequence comparison: benefits, applications, and tools. Genome Biol 2017; 18:186. [PMID: 28974235 PMCID: PMC5627421 DOI: 10.1186/s13059-017-1319-7] [Citation(s) in RCA: 285] [Impact Index Per Article: 35.6] [Indexed: 01/30/2023]
Abstract
Alignment-free sequence analyses have been applied to problems ranging from whole-genome phylogeny to the classification of protein families, the identification of horizontally transferred genes, and the detection of recombined sequences. The strengths of these methods make them particularly useful for processing and analyzing next-generation sequencing data. However, many researchers are unclear about how these methods work, how they compare with alignment-based methods, and what potential they hold for their own research. We address these questions and provide a guide to the currently available alignment-free sequence analysis tools.
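Many alignment-free methods compare sequences through their k-mer content instead of an alignment. As one simple example of that family, chosen here for illustration and not taken from the review, the sketch computes the Jaccard index of two sequences' k-mer sets. The function names and toy sequences are hypothetical, and reverse complements are ignored.

```python
def kmer_set(seq, k):
    """Return the set of distinct k-mers (substrings of length k) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_similarity(a, b, k):
    """Jaccard index |A & B| / |A | B| of the k-mer sets of two sequences.

    No alignment is performed; only shared k-mer content is compared.
    """
    set_a, set_b = kmer_set(a, k), kmer_set(b, k)
    union = set_a | set_b
    if not union:
        return 0.0  # both sequences shorter than k: treat as nothing shared
    return len(set_a & set_b) / len(union)


print(jaccard_similarity("ACGTA", "ACGTT", k=2))  # 3 shared of 5 distinct 2-mers
```

Because only set membership is compared, the cost grows roughly linearly with sequence length. This is one reason k-mer methods scale well to next-generation sequencing data.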
Affiliation(s)
- Andrzej Zielezinski
- Department of Computational Biology, Faculty of Biology, Adam Mickiewicz University in Poznan, Umultowska 89, 61-614, Poznan, Poland
- Susana Vinga
- IDMEC, Instituto Superior Técnico, Universidade de Lisboa, Av. Rovisco Pais 1, 1049-001, Lisbon, Portugal
- Jonas Almeida
- Stony Brook University (SUNY), 101 Nicolls Road, Stony Brook, NY, 11794, USA
- Wojciech M Karlowski
- Department of Computational Biology, Faculty of Biology, Adam Mickiewicz University in Poznan, Umultowska 89, 61-614, Poznan, Poland
39
Annotated Draft Genome Assemblies for the Northern Bobwhite (Colinus virginianus) and the Scaled Quail (Callipepla squamata) Reveal Disparate Estimates of Modern Genome Diversity and Historic Effective Population Size. G3-GENES GENOMES GENETICS 2017; 7:3047-3058. [PMID: 28717047 PMCID: PMC5592930 DOI: 10.1534/g3.117.043083] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Indexed: 11/18/2022]
Abstract
Northern bobwhite (Colinus virginianus; hereafter bobwhite) and scaled quail (Callipepla squamata) populations have suffered precipitous declines across most of their US ranges. Illumina-based first-generation (v1.0) and second-generation (v2.0) draft genome assemblies for the scaled quail and the bobwhite produced N50 scaffold sizes of 1.035 and 2.042 Mb, respectively, a 45-fold improvement in contiguity over the existing bobwhite assembly. At least 90% of the assembled genomes were captured within 1313 and 8990 scaffolds, respectively. The scaled quail assembly (v1.0 = 1.045 Gb) was ∼20% smaller than the bobwhite assembly (v2.0 = 1.254 Gb), a difference supported by k-mer-based estimates of genome size. Nevertheless, estimates of GC content (41.72%; 42.66%), genome-wide repetitive content (10.40%; 10.43%), and MAKER-predicted protein-coding genes (17,131; 17,165) were similar for the scaled quail (v1.0) and bobwhite (v2.0) assemblies, respectively. BUSCO analyses using 3023 single-copy orthologs revealed a high level of assembly completeness for the scaled quail (v1.0; 84.8%) and the bobwhite (v2.0; 82.5%), as verified by comparison with well-established avian genomes. We also detected 273 putative segmental duplications in the scaled quail genome (v1.0) and 711 in the bobwhite genome (v2.0), including some shared by both species. Autosomal variant prediction revealed ∼2.48 and 4.17 heterozygous variants per kilobase within the scaled quail (v1.0) and bobwhite (v2.0) genomes, respectively. In a coalescent model, estimates of historic effective population size were uniformly higher for the bobwhite across all time points. However, large-scale declines were predicted for both species beginning ∼15-20 KYA.
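Two summary statistics in this abstract are simple to compute: scaffold N50 and heterozygous variants per kilobase. The sketch below implements the standard N50 definition and a per-kilobase rate. The input lengths and counts are hypothetical, and choosing the "callable" denominator is an assumption that real studies define more carefully.

```python
def n50(lengths):
    """Return N50: the largest length L such that sequences of length >= L
    together cover at least half of the total assembly length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def het_per_kb(n_heterozygous, callable_bp):
    """Heterozygous variants per kilobase of callable sequence."""
    return n_heterozygous / (callable_bp / 1000)


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # total 54; 10 + 9 + 8 = 27 reaches half -> 8
print(het_per_kb(4170, 1_000_000))  # hypothetical counts
```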
40
Evaluation of the impact of Illumina error correction tools on de novo genome assembly. BMC Bioinformatics 2017; 18:374. [PMID: 28821237 PMCID: PMC5563063 DOI: 10.1186/s12859-017-1784-8] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.8] [Received: 03/21/2017] [Accepted: 08/11/2017] [Indexed: 01/20/2023]
Abstract
BACKGROUND Recently, many standalone applications have been proposed to correct sequencing errors in Illumina data. The key idea is that downstream analysis tools, such as de novo genome assemblers, benefit from a reduced error rate in the input data. Surprisingly, this assumption has not been systematically validated with state-of-the-art assembly methods, even for recently published tools. RESULTS For twelve recent Illumina error correction tools (EC tools), we evaluated both their ability to correct sequencing errors and their ability to improve de novo genome assembly in terms of contig size and accuracy. CONCLUSIONS We confirm that most EC tools reduce the number of errors in sequencing data without introducing many new errors. However, we found that many EC tools perform poorly in certain sequence contexts, such as regions with low coverage or regions that contain short repeated or low-complexity sequences. Reads overlapping such regions are often corrected inconsistently. This leads to breakpoints in the resulting assemblies that are absent from assemblies built from uncorrected data. Resolving this systematic flaw in future EC tools could greatly improve their applicability.
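Evaluating an EC tool's ability to correct errors means comparing each read before and after correction against the true sequence. The sketch below shows one common per-base summary, sometimes called "gain": (TP − FP) / (TP + FN). It is offered as an illustration and is not necessarily the metric used in this study. It assumes equal-length, substitution-only reads whose true sequence is known. The function names and example reads are hypothetical.

```python
def correction_counts(original, corrected, truth):
    """Classify each base of an equal-length read against the true sequence.

    TP: an erroneous base was fixed.
    FP: a correct base was changed into an error.
    FN: an erroneous base was left wrong (including a change to another wrong base).
    """
    tp = fp = fn = 0
    for o, c, t in zip(original, corrected, truth):
        if o != t and c == t:
            tp += 1
        elif o == t and c != t:
            fp += 1
        elif o != t and c != t:
            fn += 1
    return tp, fp, fn


def gain(tp, fp, fn):
    """(TP - FP) / (TP + FN); 1.0 means every error fixed, none introduced."""
    errors = tp + fn
    return (tp - fp) / errors if errors else 0.0


truth = "ACGTACGT"
counts = correction_counts("ACGAACGT", "ACGTACGA", truth)
print(counts, gain(*counts))  # one error fixed, one introduced -> (1, 1, 0) 0.0
```

A negative gain means the tool introduced more errors than it fixed. Per-base metrics like this one do not capture the inconsistent corrections that the study links to assembly breakpoints.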
41
FastGT: an alignment-free method for calling common SNVs directly from raw sequencing reads. Sci Rep 2017; 7:2537. [PMID: 28566690 PMCID: PMC5451431 DOI: 10.1038/s41598-017-02487-5] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Received: 07/21/2016] [Accepted: 04/12/2017] [Indexed: 11/21/2022]
Abstract
We have developed a computational method that counts the frequencies of unique k-mers in FASTQ-formatted genome data and uses this information to infer the genotypes of known variants. FastGT can detect the variants in a 30x genome in less than 1 hour using ordinary low-cost server hardware. The overall concordance with the genotypes of two Illumina “Platinum” genomes is 99.96%, and the concordance with the genotypes from the Illumina HumanOmniExpress array is 99.82%. Our method provides a k-mer database that can be used for the simultaneous genotyping of approximately 30 million single nucleotide variants (SNVs), including >23,000 SNVs from the Y chromosome. The source code of the FastGT software is available on GitHub (https://github.com/bioinfo-ut/GenomeTester4/).
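The core idea in the abstract is to genotype a known variant by counting k-mers that are specific to each allele. The minimal sketch below illustrates that idea. It builds one k-mer per allele from the flanking sequence, counts those k-mers in the reads, and applies a simple count threshold. This is a simplified stand-in, not FastGT's actual calling model or database format. The helper names, the threshold, and the toy reads are hypothetical. Only the forward strand is counted, and genome-wide k-mer uniqueness is not checked.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in the reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def variant_kmers(left_flank, right_flank, ref, alt):
    """Hypothetical helper: k-mers spanning a SNV for the ref and alt alleles."""
    return left_flank + ref + right_flank, left_flank + alt + right_flank


def call_genotype(ref_count, alt_count, min_count=2):
    """Threshold rule: an allele counts as present if its k-mer is seen >= min_count times."""
    ref_seen = ref_count >= min_count
    alt_seen = alt_count >= min_count
    if ref_seen and alt_seen:
        return "0/1"
    if ref_seen:
        return "0/0"
    if alt_seen:
        return "1/1"
    return "./."  # not enough evidence for either allele


ref_kmer, alt_kmer = variant_kmers("ACG", "TTA", ref="C", alt="G")
reads = ["ACGCTTA", "TACGCTTAG", "ACGGTTA", "GACGGTTAC"]  # toy reads
counts = count_kmers(reads, k=len(ref_kmer))
print(call_genotype(counts[ref_kmer], counts[alt_kmer]))  # -> 0/1
```

No reads are aligned at any point; the genotype comes purely from k-mer counts. This is what makes the approach fast on raw FASTQ data.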
42
Xu C, Jiao C, Sun H, Cai X, Wang X, Ge C, Zheng Y, Liu W, Sun X, Xu Y, Deng J, Zhang Z, Huang S, Dai S, Mou B, Wang Q, Fei Z, Wang Q. Draft genome of spinach and transcriptome diversity of 120 Spinacia accessions. Nat Commun 2017; 8:15275. [PMID: 28537264 PMCID: PMC5458060 DOI: 10.1038/ncomms15275] [Citation(s) in RCA: 110] [Impact Index Per Article: 13.8] [Received: 06/20/2016] [Accepted: 03/07/2017] [Indexed: 01/21/2023]
Abstract
Spinach is an important leafy vegetable rich in multiple essential nutrients. Here we report the draft genome sequence of spinach (Spinacia oleracea, 2n=12), which contains 25,495 protein-coding genes. The spinach genome is highly repetitive, with 74.4% of its content in the form of transposable elements. No recent whole-genome duplication events are observed in spinach. Syntenic analysis between the spinach and sugar beet genomes suggests substantial inter- and intra-chromosomal rearrangements during Caryophyllales genome evolution. Transcriptome sequencing of 120 cultivated and wild spinach accessions reveals more than 420,000 variants. Our data suggest that S. turkestanica is likely the direct progenitor of cultivated spinach and that spinach domestication involved a weak bottleneck. We identify 93 domestication sweeps in the spinach genome, some of which are associated with important agronomic traits including bolting, flowering and leaf number. This study offers insights into spinach evolution and domestication and provides resources for spinach research and improvement. Spinach is an economically important vegetable crop, but previous genomic resources were of limited use for comparative and functional analyses. Here, Xu et al. present a high-quality draft spinach genome and transcriptome data for multiple Spinacia accessions, providing insight into Caryophyllales genome evolution.
Affiliation(s)
- Chenxi Xu
- Development and Collaborative Innovation Center of Plant Germplasm Resources, College of Life and Environmental Sciences, Shanghai Normal University, Shanghai 200234, China
- Chen Jiao
- Boyce Thompson Institute, Cornell University, Ithaca, New York 14853, USA
- Honghe Sun
- Boyce Thompson Institute, Cornell University, Ithaca, New York 14853, USA
- Xiaofeng Cai
- Development and Collaborative Innovation Center of Plant Germplasm Resources, College of Life and Environmental Sciences, Shanghai Normal University, Shanghai 200234, China
- Xiaoli Wang
- Development and Collaborative Innovation Center of Plant Germplasm Resources, College of Life and Environmental Sciences, Shanghai Normal University, Shanghai 200234, China
- Chenhui Ge
- Development and Collaborative Innovation Center of Plant Germplasm Resources, College of Life and Environmental Sciences, Shanghai Normal University, Shanghai 200234, China
- Yi Zheng
- Boyce Thompson Institute, Cornell University, Ithaca, New York 14853, USA
- Wenli Liu
- Boyce Thompson Institute, Cornell University, Ithaca, New York 14853, USA
- Xuepeng Sun
- Boyce Thompson Institute, Cornell University, Ithaca, New York 14853, USA
- Yimin Xu
- Boyce Thompson Institute, Cornell University, Ithaca, New York 14853, USA
- Jie Deng
- Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Zhonghua Zhang
- Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Sanwen Huang
- Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Shaojun Dai
- Development and Collaborative Innovation Center of Plant Germplasm Resources, College of Life and Environmental Sciences, Shanghai Normal University, Shanghai 200234, China
- Beiquan Mou
- USDA-Agricultural Research Service, Crop Improvement and Protection Research Unit, Salinas, California 93905, USA
- Quanxi Wang
- Development and Collaborative Innovation Center of Plant Germplasm Resources, College of Life and Environmental Sciences, Shanghai Normal University, Shanghai 200234, China
- Zhangjun Fei
- Development and Collaborative Innovation Center of Plant Germplasm Resources, College of Life and Environmental Sciences, Shanghai Normal University, Shanghai 200234, China; Boyce Thompson Institute, Cornell University, Ithaca, New York 14853, USA; USDA-Agricultural Research Service, Robert W. Holley Center for Agriculture and Health, Ithaca, New York 14853, USA
- Quanhua Wang
- Development and Collaborative Innovation Center of Plant Germplasm Resources, College of Life and Environmental Sciences, Shanghai Normal University, Shanghai 200234, China
43
Hosner PA, Tobias JA, Braun EL, Kimball RT. How do seemingly non-vagile clades accomplish trans-marine dispersal? Trait and dispersal evolution in the landfowl (Aves: Galliformes). Proc Biol Sci 2017; 284:20170210. [PMID: 28469029 PMCID: PMC5443944 DOI: 10.1098/rspb.2017.0210] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Received: 01/31/2017] [Accepted: 04/03/2017] [Indexed: 11/12/2022]
Abstract
Dispersal ability is a key factor in determining insular distributions and island community composition, yet non-vagile terrestrial organisms widely occur on oceanic islands. The landfowl (pheasants, partridges, grouse, turkeys, quails and relatives) are generally poor dispersers, but the Old World quail (Coturnix) are a notable exception. These birds evolved small body sizes and high-aspect-ratio wing shapes, and hence are capable of trans-continental migrations and trans-oceanic colonization. Two monotypic partridge genera, Margaroperdix of Madagascar and Anurophasis of alpine New Guinea, may represent additional examples of trans-marine dispersal in landfowl, but their body size and wing shape are typical of poorly dispersive continental species. Here, we estimate historical relationships of quail and their relatives using phylogenomics, and infer body size and wing shape evolution in relation to trans-marine dispersal events. Our results show that Margaroperdix and Anurophasis are nested within the Coturnix quail, and are each 'island giants' that independently evolved from dispersive, Coturnix-like ancestral populations that colonized and were subsequently isolated on Madagascar and New Guinea. This evolutionary cycle of gain and loss of dispersal ability, coupled with extinction of dispersive taxa, can result in the false appearance that non-vagile taxa somehow underwent rare oceanic dispersal.
Affiliation(s)
- Peter A Hosner
- Department of Biology, University of Florida, Gainesville, FL, USA
- Florida Museum of Natural History, University of Florida, Gainesville, FL, USA
- Joseph A Tobias
- Department of Life Sciences, Imperial College London, Silwood Park Campus, Ascot, UK
- Edward L Braun
- Department of Biology, University of Florida, Gainesville, FL, USA
44
De Novo Genome and Transcriptome Assembly of the Canadian Beaver (Castor canadensis). G3 (Bethesda) 2017; 7:755-773. [PMID: 28087693 PMCID: PMC5295618 DOI: 10.1534/g3.116.038208] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Indexed: 01/04/2023]
Abstract
The Canadian beaver (Castor canadensis) is the largest indigenous rodent in North America. We report a draft annotated assembly of the beaver genome, the first for a large rodent and the first mammalian genome assembled directly from uncorrected, moderate-coverage (<30×) long reads generated by single-molecule sequencing. The genome size is estimated at 2.7 Gb by k-mer analysis. We assembled the beaver genome using the new Canu assembler, which is optimized for noisy reads. The resulting assembly was refined with Pilon using short reads (80×) and checked for accuracy by congruence with an independent short-read assembly. We scaffolded the assembly using the exon–gene models derived from 9805 full-length open reading frames (FL-ORFs) constructed from the beaver leukocyte and muscle transcriptomes. The final assembly comprised 22,515 contigs with a contig N50 of 278,680 bp and a scaffold N50 of 317,558 bp. Maximum contig and scaffold lengths were 3.3 and 4.2 Mb, respectively, with a combined scaffold length representing 92% of the estimated genome size. The completeness and accuracy of the scaffold assembly were demonstrated by the precise exon placement for 91.1% of the 9805 assembled FL-ORFs and for 83.1% of the BUSCO (Benchmarking Universal Single-Copy Orthologs) gene set used to assess the quality of genome assemblies. Genes involved in dentition and enamel deposition, defining characteristics of rodents with which the beaver is well endowed, were well represented. The study provides insights for genome assembly and an important genomics resource for Castoridae and rodent evolutionary biology.
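The contig and scaffold N50 values quoted in this abstract follow the usual assembly convention: N50 is the length L such that contigs (or scaffolds) of length L or longer together contain at least half of all assembled bases. The minimal Python sketch below illustrates that definition only; the function name `n50` and the toy lengths are illustrative assumptions, not taken from the study or from any particular assembly toolkit.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases. The total here
    is the assembly size; using an estimated genome size instead would
    give the related NG50 statistic.
    """
    if not lengths:
        raise ValueError("at least one length is required")
    total = sum(lengths)
    running = 0
    # Walk sequences from longest to shortest, accumulating bases
    # until at least half of the total has been covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    # Not reached for non-negative lengths: the final step always
    # brings the running sum up to the total.
    raise ValueError("lengths must be non-negative")
```

For contig lengths 2, 3, 4, 5, 6, 8 and 10 bp (38 bp in total), the three longest contigs sum to 24 bp, the first running total that reaches half of 38, so `n50([2, 3, 4, 5, 6, 8, 10])` returns 6.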
45
Guthrie JL, Gardy JL. A brief primer on genomic epidemiology: lessons learned from Mycobacterium tuberculosis. Ann N Y Acad Sci 2016; 1388:59-77. [PMID: 28009051 DOI: 10.1111/nyas.13273] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 07/05/2016] [Revised: 09/02/2016] [Accepted: 09/13/2016] [Indexed: 12/13/2022]
Abstract
Genomics is now firmly established as a technique for the investigation and reconstruction of communicable disease outbreaks, with many genomic epidemiology studies focusing on revealing transmission routes of Mycobacterium tuberculosis. In this primer, we introduce the basic techniques underlying transmission inference from genomic data, using illustrative examples from M. tuberculosis and other pathogens routinely sequenced by public health agencies. We describe the laboratory and epidemiological scenarios under which genomics may or may not be used, provide an introduction to sequencing technologies and bioinformatics approaches to identifying transmission-informative variation and resistance-associated mutations, and discuss how variation must be considered in the light of available clinical and epidemiological information to infer transmission.
Affiliation(s)
- Jennifer L Guthrie
- School of Population and Public Health, University of British Columbia, Vancouver, British Columbia, Canada
- Jennifer L Gardy
- School of Population and Public Health, University of British Columbia, Vancouver, British Columbia, Canada; Communicable Disease Prevention and Control Services, British Columbia Centre for Disease Control, Vancouver, British Columbia, Canada
46
Abstract
Until very recently, complete characterization of the megagenomes of conifers has remained elusive. The diploid sugar pine (Pinus lambertiana Dougl.) has a highly repetitive, 31 billion bp genome. It is the largest genome sequenced and assembled to date, and the first from the subgenus Strobus, or white pines, a group notable for having the largest genomes among the pines. The genome represents a unique opportunity to investigate genome "obesity" in conifers and white pines. Comparative analysis of P. lambertiana and P. taeda L. reveals new insights into the conservation, age, and diversity of the highly abundant transposable elements, the primary factor determining genome size. As for most North American white pines, the principal pathogen of P. lambertiana is white pine blister rust (Cronartium ribicola J.C. Fischer ex Raben.). Identification of candidate genes for resistance to this pathogen is of great ecological importance. The genome sequence afforded us the opportunity to make substantial progress on locating the major dominant gene for simple resistance hypersensitive response, Cr1. We describe new markers and gene annotation that are both tightly linked to Cr1 in a mapping population and associated with Cr1 in unrelated sugar pine individuals sampled throughout the species' range, creating a solid foundation for future mapping. The genomic variation and annotated candidate genes characterized in our study of the Cr1 region are resources for future marker-assisted breeding efforts as well as for investigations of fundamental mechanisms of invasive disease and evolutionary response.
47
Taylor JF, Whitacre LK, Hoff JL, Tizioto PC, Kim J, Decker JE, Schnabel RD. Lessons for livestock genomics from genome and transcriptome sequencing in cattle and other mammals. Genet Sel Evol 2016; 48:59. [PMID: 27534529 PMCID: PMC4989351 DOI: 10.1186/s12711-016-0237-6] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 12/16/2015] [Accepted: 08/02/2016] [Indexed: 12/31/2022]
Abstract
Background Decreasing sequencing costs and the development of new protocols for characterizing global methylation, gene expression patterns and regulatory regions have stimulated the generation of large livestock datasets. Here, we discuss experiences in the analysis of whole-genome and transcriptome sequence data. Methods We analyzed whole-genome sequence (WGS) data from 132 individuals from five canid species (Canis familiaris, C. latrans, C. dingo, C. aureus and C. lupus) and 61 breeds, three bison (Bison bison), 64 water buffalo (Bubalus bubalis) and 297 bovines from 17 breeds. Per-individual depth of coverage of the reference genome ranges from 4.9X to 64.0X. We have also analyzed RNA-seq data for 580 samples representing 159 Bos taurus and Rattus norvegicus animals and 98 tissues. By aligning reads to a reference assembly and calling variants, we assessed the effects of average depth of coverage on the actual coverage and on the number of called variants. We examined the identity of unmapped reads by assembling them and querying the resulting contigs against the non-redundant nucleic acids database. By imputing high-density single nucleotide polymorphism data on 4010 US registered Angus animals to WGS using Run4 of the 1000 Bull Genomes Project and assessing the accuracy of imputation, we identified misassembled reference sequence regions. Results We estimate that a 24X depth of coverage is required to achieve 99.5% coverage of the reference assembly and identify 95% of the variants within an individual’s genome. Genomes sequenced to low average coverage (e.g., <10X) may fail to cover 10% of the reference genome and identify <75% of variants. About 10% of genomic DNA or transcriptome sequence reads fail to align to the reference assembly. These reads include loci missing from the reference assembly, misassembled genes, and interesting symbionts, commensal and pathogenic organisms.
Conclusions Assembly errors and a lack of annotation of functional elements significantly limit the utility of the current draft livestock reference assemblies. The Functional Annotation of Animal Genomes initiative seeks to annotate functional elements, while a 70X PacBio assembly for cow is underway and may result in a significantly improved reference assembly.
Affiliation(s)
- Jeremy F Taylor
- Division of Animal Sciences, University of Missouri, Columbia, MO, USA
- Lynsey K Whitacre
- Division of Animal Sciences, University of Missouri, Columbia, MO, USA; Informatics Institute, University of Missouri, Columbia, MO, USA
- Jesse L Hoff
- Division of Animal Sciences, University of Missouri, Columbia, MO, USA
- Polyana C Tizioto
- Division of Animal Sciences, University of Missouri, Columbia, MO, USA; Embrapa Southeast Livestock, São Carlos, SP, Brazil
- JaeWoo Kim
- Division of Animal Sciences, University of Missouri, Columbia, MO, USA
- Jared E Decker
- Division of Animal Sciences, University of Missouri, Columbia, MO, USA; Informatics Institute, University of Missouri, Columbia, MO, USA
- Robert D Schnabel
- Division of Animal Sciences, University of Missouri, Columbia, MO, USA; Informatics Institute, University of Missouri, Columbia, MO, USA
48
Butts CT, Bierma JC, Martin RW. Novel proteases from the genome of the carnivorous plant Drosera capensis: Structural prediction and comparative analysis. Proteins 2016; 84:1517-33. [PMID: 27353064 DOI: 10.1002/prot.25095] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 12/30/2015] [Revised: 05/16/2016] [Accepted: 06/13/2016] [Indexed: 12/21/2022]
Abstract
In his 1875 monograph on insectivorous plants, Darwin described the feeding reactions of Drosera flypaper traps and predicted that their secretions contained a "ferment" similar to mammalian pepsin, an aspartic protease. Here we report a high-quality draft genome sequence for the cape sundew, Drosera capensis, the first genome of a carnivorous plant from order Caryophyllales, which also includes the Venus flytrap (Dionaea) and the tropical pitcher plants (Nepenthes). This species was selected in part for its hardiness and ease of cultivation, making it an excellent model organism for further investigations of plant carnivory. Analysis of predicted protein sequences yields genes encoding proteases homologous to those found in other plants, some of which display sequence and structural features that suggest novel functionalities. Because the sequence similarity to proteins of known structure is in most cases too low for traditional homology modeling, 3D structures of representative proteases are predicted using comparative modeling with all-atom refinement. Although the overall folds and active residues for these proteins are conserved, we find structural and sequence differences consistent with a diversity of substrate recognition patterns. Finally, we predict differences in substrate specificities using in silico experiments, providing targets for structure/function studies of novel enzymes with biological and technological significance.
Affiliation(s)
- Carter T Butts
- Department of Electrical Engineering and Computer Science, UC Irvine, Irvine, California, 92697; Department of Statistics, UC Irvine, Irvine, California, 92697; Department of Sociology, UC Irvine, Irvine, California, 92697
- Jan C Bierma
- Department of Molecular Biology and Biochemistry, UC Irvine, Irvine, California, 92697
- Rachel W Martin
- Department of Molecular Biology and Biochemistry, UC Irvine, Irvine, California, 92697; Department of Chemistry, UC Irvine, Irvine, California, 92697
49
Preston JL, Royall AE, Randel MA, Sikkink KL, Phillips PC, Johnson EA. High-specificity detection of rare alleles with Paired-End Low Error Sequencing (PELE-Seq). BMC Genomics 2016; 17:464. [PMID: 27301885 PMCID: PMC4908710 DOI: 10.1186/s12864-016-2669-3] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 10/31/2015] [Accepted: 04/25/2016] [Indexed: 12/30/2022]
Abstract
BACKGROUND Polymorphic loci exist throughout the genomes of a population and provide the raw genetic material needed for a species to adapt to changes in the environment. The minor allele frequencies of rare Single Nucleotide Polymorphisms (SNPs) within a population have been difficult to track with Next-Generation Sequencing (NGS), due to the high error rate of standard methods such as Illumina sequencing. RESULTS We have developed a wet-lab protocol and variant-calling method that identifies both sequencing and PCR errors, called Paired-End Low Error Sequencing (PELE-Seq). To test the specificity and sensitivity of the PELE-Seq method, we sequenced control E. coli DNA libraries containing known rare alleles present at frequencies ranging from 0.2-0.4% of the total reads. PELE-Seq had higher specificity and sensitivity than standard libraries. We then used PELE-Seq to characterize rare alleles in a Caenorhabditis remanei nematode worm population before and after laboratory adaptation, and found that minor and rare alleles can undergo large changes in frequency during lab-adaptation. CONCLUSION We have developed a method of rare allele detection that mitigates both sequencing and PCR errors, called PELE-Seq. PELE-Seq was evaluated using control E. coli populations and was then used to compare a wild C. remanei population to a lab-adapted population. The PELE-Seq method is ideal for investigating the dynamics of rare alleles in a broad range of reduced-representation sequencing methods, including targeted amplicon sequencing, RAD-Seq, ddRAD, and GBS. PELE-Seq is also well-suited for whole genome sequencing of mitochondria and viruses, and for high-throughput rare mutation screens.
Affiliation(s)
- Jessica L Preston
- Institute of Molecular Biology, University of Oregon, Eugene, Oregon, USA
- Ariel E Royall
- Institute of Molecular Biology, University of Oregon, Eugene, Oregon, USA
- Melissa A Randel
- Institute of Molecular Biology, University of Oregon, Eugene, Oregon, USA
- Kristin L Sikkink
- Institute of Ecology and Evolution, University of Oregon, Eugene, Oregon, USA
- Patrick C Phillips
- Institute of Ecology and Evolution, University of Oregon, Eugene, Oregon, USA
- Eric A Johnson
- Institute of Molecular Biology, University of Oregon, Eugene, Oregon, USA
50
Heo Y, Ramachandran A, Hwu WM, Ma J, Chen D. BLESS 2: accurate, memory-efficient and fast error correction method. Bioinformatics 2016; 32:2369-71. [PMID: 27153708 DOI: 10.1093/bioinformatics/btw146] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 07/25/2015] [Accepted: 03/12/2016] [Indexed: 11/14/2022]
Abstract
The most important features of error correction tools for sequencing data are accuracy, memory efficiency and fast runtime. The previous version of BLESS was highly memory-efficient and accurate, but it was too slow to handle reads from large genomes. We have developed a new version of BLESS to improve runtime and accuracy while maintaining a small memory usage. The new version, called BLESS 2, has an error correction algorithm that is more accurate than BLESS, and the algorithm has been parallelized using hybrid MPI and OpenMP programming. BLESS 2 was compared with five top-performing tools, and it was found to be the fastest when it was executed on two computing nodes using MPI, with each node containing twelve cores. Also, BLESS 2 showed at least 11% higher gain while retaining the memory efficiency of the previous version for large genomes. AVAILABILITY AND IMPLEMENTATION Freely available at https://sourceforge.net/projects/bless-ec CONTACT dchen@illinois.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yun Heo
- Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Anand Ramachandran
- Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Wen-Mei Hwu
- Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Jian Ma
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Deming Chen
- Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA