1. Bossert S, Pauly A, Danforth BN, Orr MC, Murray EA. Lessons from assembling UCEs: A comparison of common methods and the case of Clavinomia (Halictidae). Mol Ecol Resour 2024; 24:e13925. [PMID: 38183389 DOI: 10.1111/1755-0998.13925] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2023] [Revised: 12/08/2023] [Accepted: 12/21/2023] [Indexed: 01/08/2024]
Abstract
Sequence data assembly is a foundational step in high-throughput sequencing, with untold consequences for downstream analyses. Despite this, few studies have interrogated the many methods for assembling phylogenomic UCE data for their comparative efficacy, or for how outputs may be impacted. We study this by comparing the most commonly used assembly methods for UCEs in the under-studied bee lineage Nomiinae and a representative sampling of relatives. Data for 63 UCE-only and 75 mixed taxa were assembled with five methods, including ABySS, HybPiper, SPAdes, Trinity and Velvet, and then benchmarked for their relative performance in terms of locus capture parameters and phylogenetic reconstruction. Unexpectedly, Trinity and Velvet trailed the other methods in terms of locus capture and DNA matrix density, whereas SPAdes performed favourably in most assessed metrics. In comparison with SPAdes, the guided-assembly approach HybPiper generally recovered the highest quality loci but in lower numbers. Based on our results, we formally move Clavinomia to Dieunomiini and render Epinomia once more a subgenus of Dieunomia. We strongly advise that future studies more closely examine the influence of assembly approach on their results, or, minimally, use better-performing assembly methods such as SPAdes or HybPiper. In this way, we can move forward with phylogenomic studies in a more standardized, comparable manner.
Affiliation(s)
- Silas Bossert
  - Department of Entomology, Washington State University, Pullman, Washington, USA
  - Department of Entomology, National Museum of Natural History, Smithsonian Institution, Washington, DC, USA
- Alain Pauly
  - Royal Belgian Institute of Natural Sciences, O.D. Taxonomy and Phylogeny, Brussels, Belgium
- Bryan N Danforth
  - Department of Entomology, Cornell University, Ithaca, New York, USA
- Michael C Orr
  - Entomologie, Staatliches Museum für Naturkunde Stuttgart, Stuttgart, Germany
- Elizabeth A Murray
  - Department of Entomology, Washington State University, Pullman, Washington, USA
2. Jackman SD, Coombe L, Warren RL, Kirk H, Trinh E, MacLeod T, Pleasance S, Pandoh P, Zhao Y, Coope RJ, Bousquet J, Bohlmann J, Jones SJM, Birol I. Complete Mitochondrial Genome of a Gymnosperm, Sitka Spruce (Picea sitchensis), Indicates a Complex Physical Structure. Genome Biol Evol 2021; 12:1174-1179. [PMID: 32449750 PMCID: PMC7486957 DOI: 10.1093/gbe/evaa108] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Accepted: 05/20/2020] [Indexed: 12/12/2022] Open
Abstract
Plant mitochondrial genomes vary widely in size. Although many plant mitochondrial genomes have been sequenced and assembled, the vast majority are of angiosperms, and few are of gymnosperms. Most plant mitochondrial genomes are smaller than a megabase, with a few notable exceptions. We have sequenced and assembled the complete 5.5-Mb mitochondrial genome of Sitka spruce (Picea sitchensis), to date, one of the largest mitochondrial genomes of a gymnosperm. We sequenced the whole genome using Oxford Nanopore MinION, and then identified contigs of mitochondrial origin assembled from these long reads based on sequence homology to the white spruce mitochondrial genome. The assembly graph shows a multipartite genome structure, composed of one smaller 168-kb circular segment of DNA, and a larger 5.4-Mb single component with a branching structure. The assembly graph gives insight into a putative complex physical genome structure, and its branching points may represent active sites of recombination.
Affiliation(s)
- Shaun D Jackman
  - Genome Sciences Centre, BC Cancer, Vancouver, British Columbia, Canada
- Lauren Coombe
  - Genome Sciences Centre, BC Cancer, Vancouver, British Columbia, Canada
- René L Warren
  - Genome Sciences Centre, BC Cancer, Vancouver, British Columbia, Canada
- Heather Kirk
  - Genome Sciences Centre, BC Cancer, Vancouver, British Columbia, Canada
- Eva Trinh
  - Genome Sciences Centre, BC Cancer, Vancouver, British Columbia, Canada
- Tina MacLeod
  - Genome Sciences Centre, BC Cancer, Vancouver, British Columbia, Canada
- Stephen Pleasance
  - Genome Sciences Centre, BC Cancer, Vancouver, British Columbia, Canada
- Pawan Pandoh
  - Genome Sciences Centre, BC Cancer, Vancouver, British Columbia, Canada
- Yongjun Zhao
  - Genome Sciences Centre, BC Cancer, Vancouver, British Columbia, Canada
- Robin J Coope
  - Genome Sciences Centre, BC Cancer, Vancouver, British Columbia, Canada
- Jean Bousquet
  - Forest Genomics, Institute for Systems and Integrative Biology, Université Laval, Quebec, Quebec, Canada
- Joerg Bohlmann
  - Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
- Steven J M Jones
  - Genome Sciences Centre, BC Cancer, Vancouver, British Columbia, Canada
- Inanc Birol
  - Genome Sciences Centre, BC Cancer, Vancouver, British Columbia, Canada
3. Jackman SD, Warren RL, Gibb EA, Vandervalk BP, Mohamadi H, Chu J, Raymond A, Pleasance S, Coope R, Wildung MR, Ritland CE, Bousquet J, Jones SJM, Bohlmann J, Birol I. Organellar Genomes of White Spruce (Picea glauca): Assembly and Annotation. Genome Biol Evol 2015; 8:29-41. [PMID: 26645680 PMCID: PMC4758241 DOI: 10.1093/gbe/evv244] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Indexed: 11/12/2022] Open
Abstract
The genome sequences of the plastid and mitochondrion of white spruce (Picea glauca) were assembled from whole-genome shotgun sequencing data using ABySS. The sequencing data contained reads from both the nuclear and organellar genomes, and reads of the organellar genomes were abundant in the data as each cell harbors hundreds of mitochondria and plastids. Hence, assembly of the 123-kb plastid and 5.9-Mb mitochondrial genomes was accomplished by analyzing data sets primarily representing low coverage of the nuclear genome. The assembled organellar genomes were annotated for their coding genes, ribosomal RNA, and transfer RNA. Transcript abundances of the mitochondrial genes were quantified in three developmental tissues and five mature tissues using data from RNA-seq experiments. C-to-U RNA editing was observed in the majority of mitochondrial genes, and in four genes, editing events were noted to modify ACG codons to create cryptic AUG start codons. The informatics methodology presented in this study should prove useful to assemble organellar genomes of other plant species using whole-genome shotgun sequencing data.
Affiliation(s)
- Shaun D Jackman
  - Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, Canada
- René L Warren
  - Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, Canada
- Ewan A Gibb
  - Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, Canada
- Benjamin P Vandervalk
  - Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, Canada
- Hamid Mohamadi
  - Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, Canada
- Justin Chu
  - Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, Canada
- Anthony Raymond
  - Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, Canada
- Stephen Pleasance
  - Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, Canada
- Robin Coope
  - Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, Canada
- Mark R Wildung
  - School of Molecular Biosciences, Washington State University
- Carol E Ritland
  - Department of Forest and Conservation Sciences, University of British Columbia, Vancouver, BC, Canada
- Jean Bousquet
  - Department of Forest and Environmental Genomics, Université Laval, Québec, QC, Canada
- Steven J M Jones
  - Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, Canada
  - Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
  - School of Computing Science, Simon Fraser University, Burnaby, BC, Canada
- Joerg Bohlmann
  - Department of Forest and Conservation Sciences, University of British Columbia, Vancouver, BC, Canada
  - Michael Smith Laboratories, University of British Columbia, Vancouver, BC, Canada
  - Department of Botany, University of British Columbia, Vancouver, BC, Canada
- Inanç Birol
  - Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, Canada
  - Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
  - School of Computing Science, Simon Fraser University, Burnaby, BC, Canada
  - Department of Computer Science, University of British Columbia, Vancouver, BC, Canada
4. Warren RL, Keeling CI, Yuen MMS, Raymond A, Taylor GA, Vandervalk BP, Mohamadi H, Paulino D, Chiu R, Jackman SD, Robertson G, Yang C, Boyle B, Hoffmann M, Weigel D, Nelson DR, Ritland C, Isabel N, Jaquish B, Yanchuk A, Bousquet J, Jones SJM, MacKay J, Birol I, Bohlmann J. Improved white spruce (Picea glauca) genome assemblies and annotation of large gene families of conifer terpenoid and phenolic defense metabolism. Plant J 2015; 83:189-212. [PMID: 26017574 DOI: 10.1111/tpj.12886] [Citation(s) in RCA: 120] [Impact Index Per Article: 13.3] [Received: 02/24/2015] [Accepted: 05/15/2015] [Indexed: 05/21/2023]
Abstract
White spruce (Picea glauca), a gymnosperm tree, has been established as one of the models for conifer genomics. We describe the draft genome assemblies of two white spruce genotypes, PG29 and WS77111, innovative tools for the assembly of very large genomes, and the conifer genomics resources developed in this process. The two white spruce genotypes originate from distant geographic regions of western (PG29) and eastern (WS77111) North America, and represent elite trees in two Canadian tree-breeding programs. We present an update (V3 and V4) for a previously reported PG29 V2 draft genome assembly and introduce a second white spruce genome assembly for genotype WS77111. Assemblies of the PG29 and WS77111 genomes confirm the reconstructed white spruce genome size in the 20 Gbp range, and show broad synteny. Using the PG29 V3 assembly and additional white spruce genomics and transcriptomics resources, we performed MAKER-P annotation and meticulous expert annotation of very large gene families of conifer defense metabolism, the terpene synthases and cytochrome P450s. We also comprehensively annotated the white spruce mevalonate, methylerythritol phosphate and phenylpropanoid pathways. These analyses highlighted the large extent of gene and pseudogene duplications in a conifer genome, in particular for genes of secondary (i.e. specialized) metabolism, and the potential for gain and loss of function for defense and adaptation.
Affiliation(s)
- René L Warren
  - Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, V5Z 4S6, Canada
- Christopher I Keeling
  - Michael Smith Laboratories, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
- Macaire Man Saint Yuen
  - Michael Smith Laboratories, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
- Anthony Raymond
  - Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, V5Z 4S6, Canada
- Greg A Taylor
  - Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, V5Z 4S6, Canada
- Benjamin P Vandervalk
  - Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, V5Z 4S6, Canada
- Hamid Mohamadi
  - Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, V5Z 4S6, Canada
- Daniel Paulino
  - Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, V5Z 4S6, Canada
- Readman Chiu
  - Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, V5Z 4S6, Canada
- Shaun D Jackman
  - Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, V5Z 4S6, Canada
- Gordon Robertson
  - Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, V5Z 4S6, Canada
- Chen Yang
  - Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, V5Z 4S6, Canada
- Brian Boyle
  - Department of Wood and Forest Sciences, Université Laval, Québec, QC, G1V 0A6, Canada
- Margarete Hoffmann
  - Max Planck Institute for Developmental Biology, Spemannstrasse 35, 72076, Tübingen, Germany
- Detlef Weigel
  - Max Planck Institute for Developmental Biology, Spemannstrasse 35, 72076, Tübingen, Germany
- David R Nelson
  - Department of Microbiology, Immunology and Biochemistry, University of Tennessee Health Science Center, Memphis, TN, 38163, USA
- Carol Ritland
  - Department of Forest and Conservation Sciences, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
- Nathalie Isabel
  - Natural Resources Canada, Laurentian Forestry Centre, Québec, QC, G1V 4C7, Canada
- Barry Jaquish
  - British Columbia Ministry of Forests, Lands, and Natural Resource Operations, Victoria, BC, V8W 9C2, Canada
- Alvin Yanchuk
  - British Columbia Ministry of Forests, Lands, and Natural Resource Operations, Victoria, BC, V8W 9C2, Canada
- Jean Bousquet
  - Department of Wood and Forest Sciences, Université Laval, Québec, QC, G1V 0A6, Canada
- Steven J M Jones
  - Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, V5Z 4S6, Canada
  - Department of Medical Genetics, University of British Columbia, Vancouver, BC, V6H 3N1, Canada
  - School of Computing Science, Simon Fraser University, Burnaby, BC, V5A 1S6, Canada
- John MacKay
  - Department of Wood and Forest Sciences, Université Laval, Québec, QC, G1V 0A6, Canada
  - Department of Plant Sciences, University of Oxford, South Parks Road, Oxford, OX1 3RB, UK
- Inanc Birol
  - Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, V5Z 4S6, Canada
  - Department of Medical Genetics, University of British Columbia, Vancouver, BC, V6H 3N1, Canada
  - School of Computing Science, Simon Fraser University, Burnaby, BC, V5A 1S6, Canada
- Joerg Bohlmann
  - Michael Smith Laboratories, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
  - Department of Forest and Conservation Sciences, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
  - Department of Botany, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada