1
Manuel JG, Heins HB, Crocker S, Neidich JA, Sadzewicz L, Tallon L, Turner TN. High Coverage Highly Accurate Long-Read Sequencing of a Mouse Neuronal Cell Line Using the PacBio Revio Sequencer. bioRxiv 2023:2023.06.06.543940. [PMID: 37333171 PMCID: PMC10274723 DOI: 10.1101/2023.06.06.543940] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/20/2023]
Abstract
Recently, Pacific Biosciences released a new highly accurate long-read sequencer called the Revio System that is projected to generate 30× HiFi whole-genome sequencing for the human genome within one sequencing SMRT Cell. Mouse and human genomes are similar in size. In this study, we sought to test this new sequencer by characterizing the genome and epigenome of the mouse neuronal cell line Neuro-2a. We generated long-read HiFi whole-genome sequencing on three Revio SMRT Cells, achieving a total coverage of 98×, with 30×, 32×, and 36× coverage respectively for each of the three Revio SMRT Cells. We performed several tests on these data including single-nucleotide variant and small insertion detection using GPU-accelerated DeepVariant, structural variant detection with pbsv, methylation detection with pb-CpG-tools, and generating de novo assemblies with the HiCanu and hifiasm assemblers. Overall, we find consistency across SMRT Cells in coverage, detection of variation, methylation, and de novo assemblies for each of the three SMRT Cells.
Affiliation(s)
- Juana G. Manuel
  - Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA
- Hillary B. Heins
  - Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA
- Sandra Crocker
  - Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO 63110, USA
- Julie A. Neidich
  - Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO 63110, USA
  - Department of Pediatrics, Washington University School of Medicine, St. Louis, MO 63110, USA
- Lisa Sadzewicz
  - Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Luke Tallon
  - Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Tychele N. Turner
  - Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA
2
Black AN, Bondo KJ, Mularo A, Hernandez A, Yu Y, Stein CM, Gregory A, Fricke KA, Prendergast J, Sullins D, Haukos D, Whitson M, Grisham B, Lowe Z, DeWoody JA. A highly-contiguous and annotated genome assembly of the Lesser Prairie-Chicken (Tympanuchus pallidicinctus). Genome Biol Evol 2023; 15:7077021. [PMID: 36916502 PMCID: PMC10118296 DOI: 10.1093/gbe/evad043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2023] [Revised: 03/01/2023] [Accepted: 03/04/2023] [Indexed: 03/14/2023] Open
Abstract
The Lesser Prairie-Chicken (Tympanuchus pallidicinctus; LEPC) is an iconic North American prairie grouse, renowned for ornate and spectacular breeding season displays. Unfortunately, the species has disappeared across much of its historical range, with corresponding precipitous declines in contemporary population abundance, largely due to climatic and anthropogenic factors. These declines led to a 2022 U.S. Fish and Wildlife decision to identify and list two Distinct Population Segments (i.e., Northern and Southern DPSs) as threatened or endangered under the 1973 Endangered Species Act. Herein, we describe an annotated reference genome that was generated from a LEPC sample collected from the Southern DPS. We chose a representative from the Southern DPS because of the potential for introgression in the Northern DPS, where some populations hybridize with the Greater Prairie-Chicken (Tympanuchus cupido). This new LEPC reference assembly consists of 206 scaffolds, an N50 of 45 Mb, and 15,563 predicted protein-coding genes. We demonstrate the utility of this new genome assembly by estimating genome-wide heterozygosity in a representative LEPC and in related species. Heterozygosity in a LEPC sample was 0.0024, near the middle of the range (0.0003-0.0050) of related species. Overall, this new assembly provides a valuable resource that will enhance evolutionary and conservation genetic research in prairie grouse.
Affiliation(s)
- Andrew N Black
  - Department of Forestry and Natural Resources, Purdue University, West Lafayette, Indiana, USA
- Kristin J Bondo
  - Department of Natural Resources and Management, Texas Tech University, Lubbock, Texas, USA
- Andrew Mularo
  - Department of Biological Sciences, Purdue University, West Lafayette, Indiana, USA
- Alvaro Hernandez
  - Roy J. Carver Biotechnology Center, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
- Yachi Yu
  - Roy J. Carver Biotechnology Center, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
- Carleigh M Stein
  - Department of Biological Sciences, University of North Texas, Denton, Texas, USA
- Andy Gregory
  - Department of Biological Sciences, University of North Texas, Denton, Texas, USA
- Kent A Fricke
  - Kansas Department of Wildlife and Parks, Emporia, Kansas, USA
- Dan Sullins
  - Horticulture and Natural Resources, Kansas State University, Manhattan, Kansas, USA
- David Haukos
  - U.S. Geological Survey, Kansas Cooperative Fish and Wildlife Research Unit, Kansas State University, Manhattan, Kansas, USA
- Michael Whitson
  - Department of Natural Resources and Management, Texas Tech University, Lubbock, Texas, USA
- Blake Grisham
  - Department of Natural Resources and Management, Texas Tech University, Lubbock, Texas, USA
- Zach Lowe
  - Western Association of Fish and Wildlife Agencies, Boise, Idaho, USA
- J Andrew DeWoody
  - Department of Forestry and Natural Resources, Purdue University, West Lafayette, Indiana, USA
  - Department of Biological Sciences, Purdue University, West Lafayette, Indiana, USA
3
London EW, Roca AL, Novakofski JE, Mateus-Pinilla NE. A De Novo Chromosome-level Genome Assembly of the White-tailed Deer, Odocoileus virginianus. J Hered 2022; 113:479-489. [PMID: 35511871 PMCID: PMC9308042 DOI: 10.1093/jhered/esac022] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/13/2021] [Accepted: 05/05/2022] [Indexed: 11/12/2022] Open
Abstract
Cervids are distinguished by the shedding and regrowth of antlers. Furthermore, they provide insights into prion and other diseases. Genomic resources can facilitate studies of the genetic underpinnings of deer phenotypes, behavior, and disease resistance. Widely distributed in North America, the white-tailed deer (Odocoileus virginianus) has recreational, commercial, and food source value for many households. We present a genome generated using DNA from a single Illinois white-tailed deer sequenced on the PacBio Sequel II platform and assembled using Wtdbg2. Omni-C chromatin conformation capture sequencing was used to scaffold the genome contigs. The final assembly was 2.42 Gb, consisting of 508 scaffolds with a contig N50 of 21.7 Mb, a scaffold N50 of 52.4 Mb, and a BUSCO complete score of 93.1%. Thirty-six chromosome pseudomolecules comprised 93% of the entire sequenced genome length. A total of 20 651 genes predicted using the BRAKER pipeline were validated using InterProScan. Chromosome length assembly sequences were aligned to the genomes of related species to reveal corresponding chromosomes.
Affiliation(s)
- Evan W London
  - Illinois Natural History Survey-Prairie Research Institute, University of Illinois at Urbana-Champaign, Champaign, IL, USA
  - Department of Animal Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Alfred L Roca
  - Illinois Natural History Survey-Prairie Research Institute, University of Illinois at Urbana-Champaign, Champaign, IL, USA
  - Department of Animal Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, USA
  - Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Jan E Novakofski
  - Illinois Natural History Survey-Prairie Research Institute, University of Illinois at Urbana-Champaign, Champaign, IL, USA
  - Department of Animal Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Nohra E Mateus-Pinilla
  - Illinois Natural History Survey-Prairie Research Institute, University of Illinois at Urbana-Champaign, Champaign, IL, USA
  - Department of Animal Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, USA
4
Seol D, Lim JS, Sung S, Lee YH, Jeong M, Cho S, Kwak W, Kim H. Microbial Identification Using rRNA Operon Region: Database and Tool for Metataxonomics with Long-Read Sequence. Microbiol Spectr 2022; 10:e0201721. [PMID: 35352997 PMCID: PMC9045266 DOI: 10.1128/spectrum.02017-21] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 10/23/2021] [Accepted: 03/02/2022] [Indexed: 12/24/2022] Open
Abstract
Recent development of long-read sequencing platforms has enabled researchers to explore bacterial community structure through analysis of full-length 16S rRNA gene (∼1,500 bp) or 16S-ITS-23S rRNA operon region (∼4,300 bp), resulting in higher taxonomic resolution than short-read sequencing platforms. Despite the potential of long-read sequencing in metagenomics, resources and protocols for this technology are scarce. Here, we describe MIrROR, the database and analysis tool for metataxonomics using the bacterial 16S-ITS-23S rRNA operon region. We collected 16S-ITS-23S rRNA operon sequences extracted from bacterial genomes from NCBI GenBank and performed curation. A total of 97,781 16S-ITS-23S rRNA operon sequences covering 9,485 species from 43,653 genomes were obtained. For user convenience, we provide an analysis tool based on a mapping strategy that can be used for taxonomic profiling with the MIrROR database. To benchmark MIrROR, we compared performance against publicly available databases and tool with mock communities and simulated data sets. Our platform showed promising results in terms of the number of species covered and the accuracy of classification. To encourage active 16S-ITS-23S rRNA operon analysis in the field, BLAST function and taxonomic profiling results with 16S-ITS-23S rRNA operon studies, which have been reported as BioProject on NCBI, are provided. MIrROR (http://mirror.egnome.co.kr/) will be a useful platform for researchers who want to perform high-resolution metagenome analysis with a cost-effective sequencer such as MinION from Oxford Nanopore Technologies.
IMPORTANCE Metabarcoding is a powerful tool to investigate community diversity in an economic and efficient way by amplifying a specific gene marker region. With the advancement of long-read sequencing technologies, the field of metabarcoding has entered a new phase. The technologies have brought a need for development in several areas, including new markers that long reads can cover, database for the markers, tools that reflect long-read characteristics, and compatibility with downstream analysis tools. By constructing MIrROR, we met the need for a database and tools for the 16S-ITS-23S rRNA operon region, which has recently been shown to have sufficient resolution at the species level. Bacterial community analysis using the 16S-ITS-23S rRNA operon region with MIrROR will provide new insights from various research fields.
Affiliation(s)
- Donghyeok Seol
  - eGnome, Inc, Seoul, Republic of Korea
  - Department of Agricultural Biotechnology and Research Institute of Agriculture and Life Sciences, Seoul National University, Seoul, Republic of Korea
- Jin Soo Lim
  - Department of Agricultural Biotechnology and Research Institute of Agriculture and Life Sciences, Seoul National University, Seoul, Republic of Korea
- Young Ho Lee
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
- Seoae Cho
  - eGnome, Inc, Seoul, Republic of Korea
- Woori Kwak
  - eGnome, Inc, Seoul, Republic of Korea
  - Hoonygen, Seoul, Republic of Korea
  - Gencube Plus, Seoul, Republic of Korea
- Heebal Kim
  - eGnome, Inc, Seoul, Republic of Korea
  - Department of Agricultural Biotechnology and Research Institute of Agriculture and Life Sciences, Seoul National University, Seoul, Republic of Korea
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
5
Harder AM, Walden KKO, Marra NJ, Willoughby JR. High-quality reference genome for an arid-adapted mammal, the banner-tailed kangaroo rat (Dipodomys spectabilis). Genome Biol Evol 2022; 14:6506520. [PMID: 35026029 PMCID: PMC8800484 DOI: 10.1093/gbe/evac005] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Accepted: 01/10/2022] [Indexed: 11/29/2022] Open
Abstract
Kangaroo rats in the genus Dipodomys are found in a variety of habitat types in western North America, including deserts, arid and semiarid grasslands, and scrublands. Many Dipodomys species are experiencing strong population declines due to increasing habitat fragmentation, with two species listed as federally endangered in the United States. The precarious state of many Dipodomys populations, including those occupying extreme environments, makes species of this genus valuable subjects for studying the impacts of habitat degradation and fragmentation on population genomic patterns and for characterizing the genomic bases of adaptation to harsh conditions. To facilitate exploration of such questions, we assembled and annotated a reference genome for the banner-tailed kangaroo rat (Dipodomys spectabilis) using PacBio HiFi sequencing reads, providing a more contiguous genomic resource than two previously assembled Dipodomys genomes. Using the HiFi data for D. spectabilis and publicly available sequencing data for two other Dipodomys species (Dipodomys ordii and Dipodomys stephensi), we demonstrate the utility of this new assembly for studies of congeners by conducting inference of historic effective population sizes (Ne) and linking these patterns to the species’ current extinction risk statuses. The genome assembly presented here will serve as a valuable resource for population and conservation genomic studies of Dipodomys species, comparative genomic research within mammals and rodents, and investigations into genomic adaptation to extreme environments and changing landscapes.
Affiliation(s)
- Avril M Harder
  - School of Forestry and Wildlife Sciences, Auburn University, Auburn, Alabama, USA
- Kimberly K O Walden
  - Roy J. Carver Biotechnology Center, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
- Nicholas J Marra
  - Division of Science, Mathematics, and Technology, Governors State University, University Park, Illinois, USA
- Janna R Willoughby
  - School of Forestry and Wildlife Sciences, Auburn University, Auburn, Alabama, USA
6
Mueller RC, Ellström P, Howe K, Uliano-Silva M, Kuo RI, Miedzinska K, Warr A, Fedrigo O, Haase B, Mountcastle J, Chow W, Torrance J, Wood JMD, Järhult JD, Naguib MM, Olsen B, Jarvis ED, Smith J, Eöry L, Kraus RHS. A high-quality genome and comparison of short- versus long-read transcriptome of the palaearctic duck Aythya fuligula (tufted duck). Gigascience 2021; 10:giab081. [PMID: 34927191 PMCID: PMC8685854 DOI: 10.1093/gigascience/giab081] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/03/2021] [Revised: 07/15/2021] [Accepted: 11/22/2021] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND The tufted duck is a non-model organism that experiences high mortality in highly pathogenic avian influenza outbreaks. It belongs to the same bird family (Anatidae) as the mallard, one of the best-studied natural hosts of low-pathogenic avian influenza viruses. Studies in non-model bird species are crucial to disentangle the role of the host response in avian influenza virus infection in the natural reservoir. Such endeavour requires a high-quality genome assembly and transcriptome. FINDINGS This study presents the first high-quality, chromosome-level reference genome assembly of the tufted duck using the Vertebrate Genomes Project pipeline. We sequenced RNA (complementary DNA) from brain, ileum, lung, ovary, spleen, and testis using Illumina short-read and Pacific Biosciences long-read sequencing platforms, which were used for annotation. We found 34 autosomes plus Z and W sex chromosomes in the curated genome assembly, with 99.6% of the sequence assigned to chromosomes. Functional annotation revealed 14,099 protein-coding genes that generate 111,934 transcripts, which implies a mean of 7.9 isoforms per gene. We also identified 246 small RNA families. CONCLUSIONS This annotated genome contributes to continuing research into the host response in avian influenza virus infections in a natural reservoir. Our findings from a comparison between short-read and long-read reference transcriptomics contribute to a deeper understanding of these competing options. In this study, both technologies complemented each other. We expect this annotation to be a foundation for further comparative and evolutionary genomic studies, including many waterfowl relatives with differing susceptibilities to avian influenza viruses.
Affiliation(s)
- Ralf C Mueller
  - Department of Migration, Max Planck Institute of Animal Behavior, Radolfzell, 78315, Germany
  - Department of Biology, University of Konstanz, Konstanz, 78457, Germany
- Patrik Ellström
  - Department of Medical Sciences, Zoonosis Science Center, Uppsala University, Uppsala, SE-75185, Sweden
- Kerstin Howe
  - Tree of Life, Wellcome Sanger Institute, Cambridge CB10 1SA, UK
- Richard I Kuo
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush, Midlothian EH25 9RG, UK
- Katarzyna Miedzinska
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush, Midlothian EH25 9RG, UK
- Amanda Warr
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush, Midlothian EH25 9RG, UK
- Olivier Fedrigo
  - Vertebrate Genome Laboratory, The Rockefeller University, New York, 10065, NY
- Bettina Haase
  - Vertebrate Genome Laboratory, The Rockefeller University, New York, 10065, NY
- William Chow
  - Tree of Life, Wellcome Sanger Institute, Cambridge CB10 1SA, UK
- James Torrance
  - Tree of Life, Wellcome Sanger Institute, Cambridge CB10 1SA, UK
- Josef D Järhult
  - Department of Medical Sciences, Zoonosis Science Center, Uppsala University, Uppsala, SE-75185, Sweden
- Mahmoud M Naguib
  - Department of Medical Biochemistry and Microbiology, Zoonosis Science Center, Uppsala University, Uppsala, 75237, Sweden
- Björn Olsen
  - Department of Medical Sciences, Zoonosis Science Center, Uppsala University, Uppsala, SE-75185, Sweden
- Erich D Jarvis
  - Vertebrate Genome Laboratory and HHMI, The Rockefeller University, New York, 10065, NY
- Jacqueline Smith
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush, Midlothian EH25 9RG, UK
- Lél Eöry
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush, Midlothian EH25 9RG, UK
- Robert H S Kraus
  - Department of Migration, Max Planck Institute of Animal Behavior, Radolfzell, 78315, Germany
  - Department of Biology, University of Konstanz, Konstanz, 78457, Germany
7
Wang L, Zhu T, Rodriguez JC, Deal KR, Dubcovsky J, McGuire PE, Lux T, Spannagl M, Mayer KFX, Baldrich P, Meyers BC, Huo N, Gu YQ, Zhou H, Devos KM, Bennetzen JL, Unver T, Budak H, Gulick PJ, Galiba G, Kalapos B, Nelson DR, Li P, You FM, Luo MC, Dvorak J. Aegilops tauschii genome assembly Aet v5.0 features greater sequence contiguity and improved annotation. G3 (Bethesda) 2021; 11:6369516. [PMID: 34515796 PMCID: PMC8664484 DOI: 10.1093/g3journal/jkab325] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/12/2021] [Accepted: 08/31/2021] [Indexed: 01/01/2023]
Abstract
Aegilops tauschii is the donor of the D subgenome of hexaploid wheat and an important genetic resource. The reference-quality genome sequence Aet v4.0 for Ae. tauschii acc. AL8/78 was therefore an important milestone for wheat biology and breeding. Further advances in sequencing acc. AL8/78 and release of the Aet v5.0 sequence assembly are reported here. Two new optical maps were constructed and used in the revision of pseudomolecules. Gaps were closed with Pacific Biosciences long-read contigs, decreasing the gap number by 38,899. Transposable elements and protein-coding genes were reannotated. The number of annotated high-confidence genes was reduced from 39,635 in Aet v4.0 to 32,885 in Aet v5.0. A total of 2245 biologically important genes, including those affecting plant phenology, grain quality, and tolerance of abiotic stresses in wheat, was manually annotated and disease-resistance genes were annotated by a dedicated pipeline. Disease-resistance genes encoding nucleotide-binding site domains, receptor-like protein kinases, and receptor-like proteins were preferentially located in distal chromosome regions, whereas those encoding transmembrane coiled-coil proteins were dispersed more evenly along the chromosomes. Discovery, annotation, and expression analyses of microRNA (miRNA) precursors, mature miRNAs, and phasiRNAs are reported, including miRNA target genes. Other small RNAs, such as hc-siRNAs and tRFs, were characterized. These advances enhance the utility of the Ae. tauschii genome sequence for wheat genetics, biotechnology, and breeding.
Affiliation(s)
- Le Wang
  - Department of Plant Sciences, University of California, Davis, Davis, California 95616, USA
- Tingting Zhu
  - Department of Plant Sciences, University of California, Davis, Davis, California 95616, USA
- Juan C Rodriguez
  - Department of Plant Sciences, University of California, Davis, Davis, California 95616, USA
- Karin R Deal
  - Department of Plant Sciences, University of California, Davis, Davis, California 95616, USA
- Jorge Dubcovsky
  - Department of Plant Sciences, University of California, Davis, Davis, California 95616, USA
- Patrick E McGuire
  - Department of Plant Sciences, University of California, Davis, Davis, California 95616, USA
- Thomas Lux
  - Plant Genome and Systems Biology, Helmholtz Zentrum München, Munich 85764, Germany
- Manuel Spannagl
  - Plant Genome and Systems Biology, Helmholtz Zentrum München, Munich 85764, Germany
- Klaus F X Mayer
  - Plant Genome and Systems Biology, Helmholtz Zentrum München, Munich 85764, Germany
- Patricia Baldrich
  - Donald Danforth Plant Science Center, St. Louis, Missouri 63132, USA
- Blake C Meyers
  - Donald Danforth Plant Science Center, St. Louis, Missouri 63132, USA
  - University of Missouri, Columbia, Division of Plant Sciences, Columbia, Missouri 65211, USA
- Naxin Huo
  - Crop Improvement and Genetics Research Unit, USDA-ARS, Albany, California 94710, USA
- Yong Q Gu
  - Crop Improvement and Genetics Research Unit, USDA-ARS, Albany, California 94710, USA
- Hongye Zhou
  - Institute of Bioinformatics, University of Georgia, Athens, Georgia 30602, USA
- Katrien M Devos
  - Institute of Plant Breeding, Genetics and Genomics (Dept. of Crop & Soil Sciences) and Dept. of Plant Biology, University of Georgia, Athens, Georgia 30602, USA
- Turgay Unver
  - Ficus Biotechnology, Ostim Teknopark, Ankara 06374, Turkey
- Hikmet Budak
  - Montana BioAg Inc., Missoula, Montana 59801, USA
- Patrick J Gulick
  - Department of Biology, Concordia University, Montreal, Quebec H3G 1M8, Canada
- Gabor Galiba
  - Department of Biological Resources, Centre for Agricultural Research, Eötvös Loránd Research Network, H-2462 Martonvásár, Hungary
  - Department of Environmental Sustainability, IES, Hungarian University of Agriculture and Life Sciences, H-8360 Keszthely, Hungary
- Balázs Kalapos
  - Department of Biological Resources, Centre for Agricultural Research, Eötvös Loránd Research Network, H-2462 Martonvásár, Hungary
- David R Nelson
  - University of Tennessee Health Science Center, Memphis, Tennessee 38163, USA
- Pingchuan Li
  - Ottawa Research and Development Centre, Agriculture and Agri-Food Canada, Ottawa, Ontario K1A 0C5, Canada
- Frank M You
  - Ottawa Research and Development Centre, Agriculture and Agri-Food Canada, Ottawa, Ontario K1A 0C5, Canada
- Ming-Cheng Luo
  - Department of Plant Sciences, University of California, Davis, Davis, California 95616, USA
- Jan Dvorak
  - Department of Plant Sciences, University of California, Davis, Davis, California 95616, USA
8
Hotaling S, Sproul JS, Heckenhauer J, Powell A, Larracuente AM, Pauls SU, Kelley JL, Frandsen PB. Long Reads Are Revolutionizing 20 Years of Insect Genome Sequencing. Genome Biol Evol 2021; 13:evab138. [PMID: 34152413 PMCID: PMC8358217 DOI: 10.1093/gbe/evab138] [Citation(s) in RCA: 41] [Impact Index Per Article: 13.7] [Accepted: 06/10/2021] [Indexed: 12/15/2022] Open
Abstract
The first insect genome assembly (Drosophila melanogaster) was published two decades ago. Today, nuclear genome assemblies are available for a staggering 601 insect species representing 20 orders. In this study, we analyzed the most-contiguous assembly for each species and provide a "state-of-the-field" perspective, emphasizing taxonomic representation, assembly quality, gene completeness, and sequencing technologies. Relative to species richness, genomic efforts have been biased toward four orders (Diptera, Hymenoptera, Collembola, and Phasmatodea), Coleoptera are underrepresented, and 11 orders still lack a publicly available genome assembly. The average insect genome assembly is 439.2 Mb in length with 87.5% of single-copy benchmarking genes intact. Most notable has been the impact of long-read sequencing; assemblies that incorporate long reads are ∼48× more contiguous than those that do not. We offer four recommendations as we collectively continue building insect genome resources: 1) seek better integration between independent research groups and consortia, 2) balance future sampling between filling taxonomic gaps and generating data for targeted questions, 3) take advantage of long-read sequencing technologies, and 4) expand and improve gene annotations.
Affiliation(s)
- Scott Hotaling
  - School of Biological Sciences, Washington State University, Pullman, Washington, USA
- John S Sproul
  - Department of Biology, University of Rochester, New York, USA
- Jacqueline Heckenhauer
  - LOEWE Centre for Translational Biodiversity Genomics (LOEWE‐TBG), Frankfurt, Germany
  - Department of Terrestrial Zoology, Entomology III, Senckenberg Research Institute and Natural History Museum Frankfurt, Frankfurt, Germany
- Ashlyn Powell
  - Department of Plant and Wildlife Sciences, Brigham Young University, Provo, Utah, USA
- Steffen U Pauls
  - LOEWE Centre for Translational Biodiversity Genomics (LOEWE‐TBG), Frankfurt, Germany
  - Department of Terrestrial Zoology, Entomology III, Senckenberg Research Institute and Natural History Museum Frankfurt, Frankfurt, Germany
  - Institute for Insect Biotechnology, Justus-Liebig-University, Giessen, Germany
- Joanna L Kelley
  - School of Biological Sciences, Washington State University, Pullman, Washington, USA
- Paul B Frandsen
  - LOEWE Centre for Translational Biodiversity Genomics (LOEWE‐TBG), Frankfurt, Germany
  - Department of Plant and Wildlife Sciences, Brigham Young University, Provo, Utah, USA
  - Data Science Lab, Smithsonian Institution, Washington, District of Columbia, USA
9
Lee SJ, Kim JH, Jo E, Choi E, Kim J, Choi SG, Chung S, Kim HW, Park H. Chromosomal assembly of the Antarctic toothfish (Dissostichus mawsoni) genome using third-generation DNA sequencing and Hi-C technology. Zool Res 2021; 42:124-129. [PMID: 33258338 PMCID: PMC7840457 DOI: 10.24272/j.issn.2095-8137.2020.264] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 12/04/2022] Open
Abstract
The Antarctic toothfish, Dissostichus mawsoni, belongs to the Nototheniidae family and is distributed in sub-zero temperatures below S60° latitude in the Southern Ocean. Therefore, it is an attractive model species to study the stenothermal cold-adapted character state. In this study, we successfully generated highly contiguous genome sequences of D. mawsoni, which contained 1 062 scaffolds with an N50 length of 36.98 Mb and longest scaffold length of 46.82 Mb. Repetitive elements accounted for 40.87% of the genome. We also inferred 32 914 protein-coding genes using in silico gene prediction and transcriptome sequencing and detected splicing variants using Isoform-Sequencing (Iso-Seq), which will be an invaluable resource for further exploration of the adaptation mechanisms of Antarctic toothfish. This new high-quality reference genome of D. mawsoni provides a fundamental resource for a deeper understanding of cold adaptation and conservation of species.
Affiliation(s)
- Seung Jae Lee
- College of Life Sciences and Biotechnology, Korea University, Seoul 02841, Republic of Korea
| | - Jeong-Hoon Kim
- Division of Polar Life Science, Korea Polar Research Institute, Incheon 21990, Republic of Korea
| | - Euna Jo
- College of Life Sciences and Biotechnology, Korea University, Seoul 02841, Republic of Korea.,Unit of Research for Practical Application, Korea Polar Research Institute, Incheon 21990, Republic of Korea
| | - Eunkyung Choi
- College of Life Sciences and Biotechnology, Korea University, Seoul 02841, Republic of Korea
| | - Jinmu Kim
- College of Life Sciences and Biotechnology, Korea University, Seoul 02841, Republic of Korea
| | - Seok-Gwan Choi
- National Institute of Fisheries Science (NIFS), Busan 46083, Republic of Korea
| | - Sangdeok Chung
- National Institute of Fisheries Science (NIFS), Busan 46083, Republic of Korea
| | - Hyun-Woo Kim
- Department of Marine Biology, Pukyong National University, Busan 48513, Republic of Korea
| | - Hyun Park
- College of Life Sciences and Biotechnology, Korea University, Seoul 02841, Republic of Korea
| |
|
10
|
Abstract
From a genomics perspective, bivalves (Mollusca: Bivalvia) have been poorly explored, with the exception of those of high economic value. The bivalve order Unionida, or freshwater mussels, has been of interest in recent genomic studies due to their unique mitochondrial biology and peculiar life cycle. However, genomic studies have been hindered by the lack of a high-quality reference genome. Here, I present a genome assembly of Potamilus streckersoni using Pacific Biosciences single-molecule real-time long reads and 10X Genomics linked-read sequencing. Further, I use RNA sequencing from multiple tissue types and life stages to annotate the reference genome. The final assembly was far superior to any previously published freshwater mussel genome and was represented by 2,368 scaffolds (2,472 contigs) and 1,776,755,624 bp, with a scaffold N50 of 2,051,244 bp. A high proportion of the assembly was comprised of repetitive elements (51.03%), aligning with genomic characteristics of other bivalves. The functional annotation returned 52,407 gene models (41,065 protein, 11,342 tRNAs), which was concordant with the estimated number of genes in other freshwater mussel species. This genetic resource, along with future studies developing high-quality genome assemblies and annotations, will be integral toward unraveling the genomic bases of ecologically and evolutionarily important traits in this hyper-diverse group.
Affiliation(s)
- Chase H Smith
- Department of Integrative Biology, University of Texas, Austin, Texas, USA
- Biology Department, Baylor University, Waco, Texas, USA
| |
|
11
|
Torma G, Tombácz D, Csabai Z, Göbhardter D, Deim Z, Snyder M, Boldogkői Z. An Integrated Sequencing Approach for Updating the Pseudorabies Virus Transcriptome. Pathogens 2021; 10:242. [PMID: 33672563 PMCID: PMC7924054 DOI: 10.3390/pathogens10020242] [Received: 12/22/2020] [Revised: 02/17/2021] [Accepted: 02/18/2021]
Abstract
In the last couple of years, the implementation of long-read sequencing (LRS) technologies for transcriptome profiling has uncovered an extreme complexity of viral gene expression. In this study, we carried out a systematic analysis of the pseudorabies virus (PRV) transcriptome by combining our current data obtained by using Pacific Biosciences Sequel and Oxford Nanopore Technologies MinION sequencing with our earlier data generated by other LRS and short-read sequencing techniques. As a result, we identified a number of novel genes, transcripts, and transcript isoforms, including splice and length variants, and also confirmed earlier annotated RNA molecules. One of the major findings of this study is the discovery of a large number of 5′-truncations of larger putative mRNAs being 3′-co-terminal with canonical mRNAs of PRV. A large fraction of these putative RNAs contain in-frame ATGs, which might initiate translation of N-terminally truncated polypeptides. Our analyses indicate that CTO-S, a replication origin-associated RNA molecule, is expressed at an extremely high level. This study demonstrates that the PRV transcriptome is much more complex than previously appreciated.
Affiliation(s)
- Gábor Torma
- Department of Medical Biology, Faculty of Medicine, University of Szeged, 6720 Szeged, Hungary; (G.T.); (D.T.); (Z.C.); (D.G.)
| | - Dóra Tombácz
- Department of Medical Biology, Faculty of Medicine, University of Szeged, 6720 Szeged, Hungary; (G.T.); (D.T.); (Z.C.); (D.G.)
- Department of Genetics, School of Medicine, Stanford University, Stanford, CA 94304, USA;
| | - Zsolt Csabai
- Department of Medical Biology, Faculty of Medicine, University of Szeged, 6720 Szeged, Hungary; (G.T.); (D.T.); (Z.C.); (D.G.)
| | - Dániel Göbhardter
- Department of Medical Biology, Faculty of Medicine, University of Szeged, 6720 Szeged, Hungary; (G.T.); (D.T.); (Z.C.); (D.G.)
| | - Zoltán Deim
- Department of Biotechnology, Faculty of Science and Informatics, University of Szeged, 6726 Szeged, Hungary;
| | - Michael Snyder
- Department of Genetics, School of Medicine, Stanford University, Stanford, CA 94304, USA;
| | - Zsolt Boldogkői
- Department of Medical Biology, Faculty of Medicine, University of Szeged, 6720 Szeged, Hungary; (G.T.); (D.T.); (Z.C.); (D.G.)
| |
|
12
|
Murigneux V, Rai SK, Furtado A, Bruxner TJC, Tian W, Harliwong I, Wei H, Yang B, Ye Q, Anderson E, Mao Q, Drmanac R, Wang O, Peters BA, Xu M, Wu P, Topp B, Coin LJM, Henry RJ. Comparison of long-read methods for sequencing and assembly of a plant genome. Gigascience 2020; 9:giaa146. [PMID: 33347571 PMCID: PMC7751402 DOI: 10.1093/gigascience/giaa146] [Received: 03/13/2020] [Revised: 07/07/2020] [Accepted: 11/22/2020]
Abstract
BACKGROUND: Sequencing technologies have advanced to the point where it is possible to generate high-accuracy, haplotype-resolved, chromosome-scale assemblies. Several long-read sequencing technologies are available, and a growing number of algorithms have been developed to assemble the reads generated by those technologies. When starting a new genome project, it is therefore challenging to select the most cost-effective sequencing technology, as well as the most appropriate software for assembly and polishing. It is thus important to benchmark different approaches applied to the same sample. RESULTS: Here, we report a comparison of 3 long-read sequencing technologies applied to the de novo assembly of a plant genome, Macadamia jansenii. We have generated sequencing data using Pacific Biosciences (Sequel I), Oxford Nanopore Technologies (PromethION), and BGI (single-tube Long Fragment Read) technologies for the same sample. Several assemblers were benchmarked in the assembly of Pacific Biosciences and Nanopore reads. Results obtained from combining long-read technologies or short-read and long-read technologies are also presented. The assemblies were compared for contiguity, base accuracy, and completeness, as well as sequencing costs and DNA material requirements. CONCLUSIONS: The 3 long-read technologies produced highly contiguous and complete genome assemblies of M. jansenii. At the time of sequencing, the cost associated with each method was significantly different, but continuous improvements in technologies have resulted in greater accuracy, increased throughput, and reduced costs. We propose updating this comparison regularly with reports on significant iterations of the sequencing technologies.
Affiliation(s)
- Valentine Murigneux
- Genome Innovation Hub, The University of Queensland, 306 Carmody Road, Brisbane, QLD 4072, Australia
- Institute for Molecular Bioscience, The University of Queensland, 306 Carmody Road, Brisbane, QLD 4072, Australia
| | - Subash Kumar Rai
- Genome Innovation Hub, The University of Queensland, 306 Carmody Road, Brisbane, QLD 4072, Australia
- Institute for Molecular Bioscience, The University of Queensland, 306 Carmody Road, Brisbane, QLD 4072, Australia
| | - Agnelo Furtado
- Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Brisbane, QLD 4072, Australia
| | - Timothy J C Bruxner
- Institute for Molecular Bioscience, The University of Queensland, 306 Carmody Road, Brisbane, QLD 4072, Australia
| | - Wei Tian
- BGI-Shenzhen, No.21 Hongan 3rd Street, Yantian District, Shenzhen 518083, China
- BGI-Australia, 300 Herston Road, Herston, QLD 4006, Australia
| | - Ivon Harliwong
- BGI-Shenzhen, No.21 Hongan 3rd Street, Yantian District, Shenzhen 518083, China
- BGI-Australia, 300 Herston Road, Herston, QLD 4006, Australia
| | - Hanmin Wei
- BGI-Shenzhen, No.21 Hongan 3rd Street, Yantian District, Shenzhen 518083, China
- MGI, BGI-Shenzhen, Building 11, Beishan Industrial Zone, Yantian District, Shenzhen 518083, China
| | - Bicheng Yang
- BGI-Shenzhen, No.21 Hongan 3rd Street, Yantian District, Shenzhen 518083, China
- BGI-Australia, 300 Herston Road, Herston, QLD 4006, Australia
| | - Qianyu Ye
- BGI-Shenzhen, No.21 Hongan 3rd Street, Yantian District, Shenzhen 518083, China
- BGI-Australia, 300 Herston Road, Herston, QLD 4006, Australia
| | - Ellis Anderson
- MGI, BGI-Shenzhen, Building 11, Beishan Industrial Zone, Yantian District, Shenzhen 518083, China
- Advanced Genomics Technology Lab, Complete Genomics Inc., 2904 Orchard Parkway, San Jose, CA 95134, USA
| | - Qing Mao
- MGI, BGI-Shenzhen, Building 11, Beishan Industrial Zone, Yantian District, Shenzhen 518083, China
- Advanced Genomics Technology Lab, Complete Genomics Inc., 2904 Orchard Parkway, San Jose, CA 95134, USA
| | - Radoje Drmanac
- BGI-Shenzhen, No.21 Hongan 3rd Street, Yantian District, Shenzhen 518083, China
- MGI, BGI-Shenzhen, Building 11, Beishan Industrial Zone, Yantian District, Shenzhen 518083, China
- Advanced Genomics Technology Lab, Complete Genomics Inc., 2904 Orchard Parkway, San Jose, CA 95134, USA
| | - Ou Wang
- BGI-Shenzhen, No.21 Hongan 3rd Street, Yantian District, Shenzhen 518083, China
| | - Brock A Peters
- BGI-Shenzhen, No.21 Hongan 3rd Street, Yantian District, Shenzhen 518083, China
- MGI, BGI-Shenzhen, Building 11, Beishan Industrial Zone, Yantian District, Shenzhen 518083, China
- Advanced Genomics Technology Lab, Complete Genomics Inc., 2904 Orchard Parkway, San Jose, CA 95134, USA
| | - Mengyang Xu
- BGI-Shenzhen, No.21 Hongan 3rd Street, Yantian District, Shenzhen 518083, China
- BGI-Qingdao, Building 2, No. 2 Hengyunshan Road, Qingdao 266555, China
| | - Pei Wu
- BGI-Shenzhen, No.21 Hongan 3rd Street, Yantian District, Shenzhen 518083, China
- BGI-Tianjin, Airport Business Park, Building E3, Airport Economics Area, Tianjin 300308, China
| | - Bruce Topp
- Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Brisbane, QLD 4072, Australia
| | - Lachlan J M Coin
- Genome Innovation Hub, The University of Queensland, 306 Carmody Road, Brisbane, QLD 4072, Australia
- Institute for Molecular Bioscience, The University of Queensland, 306 Carmody Road, Brisbane, QLD 4072, Australia
- Department of Microbiology and Immunology, University of Melbourne at The Peter Doherty Institute for Infection and Immunity, 792 Elizabeth Street, Melbourne, VIC 3004, Australia
| | - Robert J Henry
- Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Brisbane, QLD 4072, Australia
| |
|
13
|
Murigneux V, Rai SK, Furtado A, Bruxner TJC, Tian W, Harliwong I, Wei H, Yang B, Ye Q, Anderson E, Mao Q, Drmanac R, Wang O, Peters BA, Xu M, Wu P, Topp B, Coin LJM, Henry RJ. Comparison of long-read methods for sequencing and assembly of a plant genome. Gigascience 2020; 9:6042729. [PMID: 33347571 DOI: 10.1101/2020.03.16.992933] [Received: 03/13/2020] [Revised: 07/07/2020] [Accepted: 11/22/2020]
Abstract
BACKGROUND: Sequencing technologies have advanced to the point where it is possible to generate high-accuracy, haplotype-resolved, chromosome-scale assemblies. Several long-read sequencing technologies are available, and a growing number of algorithms have been developed to assemble the reads generated by those technologies. When starting a new genome project, it is therefore challenging to select the most cost-effective sequencing technology, as well as the most appropriate software for assembly and polishing. It is thus important to benchmark different approaches applied to the same sample. RESULTS: Here, we report a comparison of 3 long-read sequencing technologies applied to the de novo assembly of a plant genome, Macadamia jansenii. We have generated sequencing data using Pacific Biosciences (Sequel I), Oxford Nanopore Technologies (PromethION), and BGI (single-tube Long Fragment Read) technologies for the same sample. Several assemblers were benchmarked in the assembly of Pacific Biosciences and Nanopore reads. Results obtained from combining long-read technologies or short-read and long-read technologies are also presented. The assemblies were compared for contiguity, base accuracy, and completeness, as well as sequencing costs and DNA material requirements. CONCLUSIONS: The 3 long-read technologies produced highly contiguous and complete genome assemblies of M. jansenii. At the time of sequencing, the cost associated with each method was significantly different, but continuous improvements in technologies have resulted in greater accuracy, increased throughput, and reduced costs. We propose updating this comparison regularly with reports on significant iterations of the sequencing technologies.
Affiliation(s)
- Valentine Murigneux
- Genome Innovation Hub, The University of Queensland, 306 Carmody Road, Brisbane, QLD 4072, Australia
- Institute for Molecular Bioscience, The University of Queensland, 306 Carmody Road, Brisbane, QLD 4072, Australia
| | - Subash Kumar Rai
- Genome Innovation Hub, The University of Queensland, 306 Carmody Road, Brisbane, QLD 4072, Australia
- Institute for Molecular Bioscience, The University of Queensland, 306 Carmody Road, Brisbane, QLD 4072, Australia
| | - Agnelo Furtado
- Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Brisbane, QLD 4072, Australia
| | - Timothy J C Bruxner
- Institute for Molecular Bioscience, The University of Queensland, 306 Carmody Road, Brisbane, QLD 4072, Australia
| | - Wei Tian
- BGI-Shenzhen, No.21 Hongan 3rd Street, Yantian District, Shenzhen 518083, China
- BGI-Australia, 300 Herston Road, Herston, QLD 4006, Australia
| | - Ivon Harliwong
- BGI-Shenzhen, No.21 Hongan 3rd Street, Yantian District, Shenzhen 518083, China
- BGI-Australia, 300 Herston Road, Herston, QLD 4006, Australia
| | - Hanmin Wei
- BGI-Shenzhen, No.21 Hongan 3rd Street, Yantian District, Shenzhen 518083, China
- MGI, BGI-Shenzhen, Building 11, Beishan Industrial Zone, Yantian District, Shenzhen 518083, China
| | - Bicheng Yang
- BGI-Shenzhen, No.21 Hongan 3rd Street, Yantian District, Shenzhen 518083, China
- BGI-Australia, 300 Herston Road, Herston, QLD 4006, Australia
| | - Qianyu Ye
- BGI-Shenzhen, No.21 Hongan 3rd Street, Yantian District, Shenzhen 518083, China
- BGI-Australia, 300 Herston Road, Herston, QLD 4006, Australia
| | - Ellis Anderson
- MGI, BGI-Shenzhen, Building 11, Beishan Industrial Zone, Yantian District, Shenzhen 518083, China
- Advanced Genomics Technology Lab, Complete Genomics Inc., 2904 Orchard Parkway, San Jose, CA 95134, USA
| | - Qing Mao
- MGI, BGI-Shenzhen, Building 11, Beishan Industrial Zone, Yantian District, Shenzhen 518083, China
- Advanced Genomics Technology Lab, Complete Genomics Inc., 2904 Orchard Parkway, San Jose, CA 95134, USA
| | - Radoje Drmanac
- BGI-Shenzhen, No.21 Hongan 3rd Street, Yantian District, Shenzhen 518083, China
- MGI, BGI-Shenzhen, Building 11, Beishan Industrial Zone, Yantian District, Shenzhen 518083, China
- Advanced Genomics Technology Lab, Complete Genomics Inc., 2904 Orchard Parkway, San Jose, CA 95134, USA
| | - Ou Wang
- BGI-Shenzhen, No.21 Hongan 3rd Street, Yantian District, Shenzhen 518083, China
| | - Brock A Peters
- BGI-Shenzhen, No.21 Hongan 3rd Street, Yantian District, Shenzhen 518083, China
- MGI, BGI-Shenzhen, Building 11, Beishan Industrial Zone, Yantian District, Shenzhen 518083, China
- Advanced Genomics Technology Lab, Complete Genomics Inc., 2904 Orchard Parkway, San Jose, CA 95134, USA
| | - Mengyang Xu
- BGI-Shenzhen, No.21 Hongan 3rd Street, Yantian District, Shenzhen 518083, China
- BGI-Qingdao, Building 2, No. 2 Hengyunshan Road, Qingdao 266555, China
| | - Pei Wu
- BGI-Shenzhen, No.21 Hongan 3rd Street, Yantian District, Shenzhen 518083, China
- BGI-Tianjin, Airport Business Park, Building E3, Airport Economics Area, Tianjin 300308, China
| | - Bruce Topp
- Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Brisbane, QLD 4072, Australia
| | - Lachlan J M Coin
- Genome Innovation Hub, The University of Queensland, 306 Carmody Road, Brisbane, QLD 4072, Australia
- Institute for Molecular Bioscience, The University of Queensland, 306 Carmody Road, Brisbane, QLD 4072, Australia
- Department of Microbiology and Immunology, University of Melbourne at The Peter Doherty Institute for Infection and Immunity, 792 Elizabeth Street, Melbourne, VIC 3004, Australia
| | - Robert J Henry
- Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Brisbane, QLD 4072, Australia
| |
|
14
|
Jung H, Jeon MS, Hodgett M, Waterhouse P, Eyun SI. Comparative Evaluation of Genome Assemblers from Long-Read Sequencing for Plants and Crops. J Agric Food Chem 2020; 68:7670-7677. [PMID: 32530283 DOI: 10.1021/acs.jafc.0c01647]
Abstract
The availability of recent state-of-the-art long-read sequencing technologies has significantly increased the ease and speed of producing high-quality plant genome assemblies. A wide variety of genome-related software tools are now available and they are typically benchmarked using microbial or model eukaryotic genomes such as Arabidopsis and rice. However, many plant species have much larger and more complex genomes than these, and the choice of tools, parameters, and/or strategies that can be used is not always obvious. Thus, we have compared the metrics of assemblies generated by various pipelines to discuss how assembly quality can be affected by two different assembly strategies. First, we focused on optimizing read preprocessing and assembler variables using eight different de novo assemblers on five different Pacific Biosciences long-read datasets of diploid and tetraploid species. Then, we examined a single scaffolding tool (quickmerge) that has been employed for the postprocessing step. We then merged the outputs from multiple assemblies to produce a higher quality consensus assembly. Finally, we benchmarked the assemblies for completeness and accuracy (assembly metrics and BUSCO), computer memory, and CPU times. Two lightweight assemblers, Miniasm/Minimap/Racon and WTDBG, were deemed good for novice users because they involved shallower learning curves and lighter computational resources. However, two heavyweight tools, CANU and Flye, should be the first choice when the goal is to achieve accurate and complete assemblies. Our results will provide valuable guidance in future plant genome projects and beyond.
Affiliation(s)
- Hyungtaek Jung
- Centre for Agriculture and Biocommodities, Queensland University of Technology, Brisbane, Queensland 4001, Australia
| | - Min-Seung Jeon
- Department of Life Science, Chung-Ang University, Seoul 06974, Korea
| | - Matthew Hodgett
- Information Technology Services, Queensland University of Technology, Brisbane, Queensland 4001, Australia
| | - Peter Waterhouse
- Centre for Agriculture and Biocommodities, Queensland University of Technology, Brisbane, Queensland 4001, Australia
| | - Seong-Il Eyun
- Department of Life Science, Chung-Ang University, Seoul 06974, Korea
| |
|
15
|
Abstract
Background: Data sets from long-read sequencing platforms (Oxford Nanopore Technologies and Pacific Biosciences) allow for most prokaryote genomes to be completely assembled - one contig per chromosome or plasmid. However, the high per-read error rate of long-read sequencing necessitates different approaches to assembly than those used for short-read sequencing. Multiple assembly tools (assemblers) exist, which use a variety of algorithms for long-read assembly. Methods: We used 500 simulated read sets and 120 real read sets to assess the performance of six long-read assemblers (Canu, Flye, Miniasm/Minipolish, Raven, Redbean and Shasta) across a wide variety of genomes and read parameters. Assemblies were assessed on their structural accuracy/completeness, sequence identity, contig circularisation and computational resources used. Results: Canu v1.9 produced moderately reliable assemblies but had the longest runtimes of all assemblers tested. Flye v2.6 was more reliable and did particularly well with plasmid assembly. Miniasm/Minipolish v0.3 was the only assembler which consistently produced clean contig circularisation. Raven v0.0.5 was the most reliable for chromosome assembly, though it did not perform well on small plasmids and had circularisation issues. Redbean v2.5 and Shasta v0.3.0 were computationally efficient but more likely to produce incomplete assemblies. Conclusions: Of the assemblers tested, Flye, Miniasm/Minipolish and Raven performed best overall. However, no single tool performed well on all metrics, highlighting the need for continued development on long-read assembly algorithms.
Affiliation(s)
- Ryan R. Wick
- Department of Infectious Diseases, Central Clinical School, Monash University, Melbourne, VIC, 3004, Australia
| | - Kathryn E. Holt
- Department of Infectious Diseases, Central Clinical School, Monash University, Melbourne, VIC, 3004, Australia
- Department of Infection Biology, London School of Hygiene & Tropical Medicine, London, WC1E 7HT, UK
| |
|
16
|
Abstract
Background: Data sets from long-read sequencing platforms (Oxford Nanopore Technologies and Pacific Biosciences) allow for most prokaryote genomes to be completely assembled - one contig per chromosome or plasmid. However, the high per-read error rate of long-read sequencing necessitates different approaches to assembly than those used for short-read sequencing. Multiple assembly tools (assemblers) exist, which use a variety of algorithms for long-read assembly. Methods: We used 500 simulated read sets and 120 real read sets to assess the performance of seven long-read assemblers (Canu, Flye, Miniasm/Minipolish, NECAT, Raven, Redbean and Shasta) across a wide variety of genomes and read parameters. Assemblies were assessed on their structural accuracy/completeness, sequence identity, contig circularisation and computational resources used. Results: Canu v1.9 produced moderately reliable assemblies but had the longest runtimes of all assemblers tested. Flye v2.7 was more reliable and did particularly well with plasmid assembly. Miniasm/Minipolish v0.3 and NECAT v20200119 were the most likely to produce clean contig circularisation. Raven v0.0.8 was the most reliable for chromosome assembly, though it did not perform well on small plasmids and had circularisation issues. Redbean v2.5 and Shasta v0.4.0 were computationally efficient but more likely to produce incomplete assemblies. Conclusions: Of the assemblers tested, Flye, Miniasm/Minipolish and Raven performed best overall. However, no single tool performed well on all metrics, highlighting the need for continued development on long-read assembly algorithms.
Affiliation(s)
- Ryan R. Wick
- Department of Infectious Diseases, Central Clinical School, Monash University, Melbourne, VIC, 3004, Australia
| | - Kathryn E. Holt
- Department of Infectious Diseases, Central Clinical School, Monash University, Melbourne, VIC, 3004, Australia
- Department of Infection Biology, London School of Hygiene & Tropical Medicine, London, WC1E 7HT, UK
| |
|
17
|
Abstract
Background: Data sets from long-read sequencing platforms (Oxford Nanopore Technologies and Pacific Biosciences) allow for most prokaryote genomes to be completely assembled - one contig per chromosome or plasmid. However, the high per-read error rate of long-read sequencing necessitates different approaches to assembly than those used for short-read sequencing. Multiple assembly tools (assemblers) exist, which use a variety of algorithms for long-read assembly. Methods: We used 500 simulated read sets and 120 real read sets to assess the performance of eight long-read assemblers (Canu, Flye, Miniasm/Minipolish, NECAT, NextDenovo/NextPolish, Raven, Redbean and Shasta) across a wide variety of genomes and read parameters. Assemblies were assessed on their structural accuracy/completeness, sequence identity, contig circularisation and computational resources used. Results: Canu v2.0 produced reliable assemblies and was good with plasmids, but it performed poorly with circularisation and had the longest runtimes of all assemblers tested. Flye v2.8 was also reliable and made the smallest sequence errors, though it used the most RAM. Miniasm/Minipolish v0.3/v0.1.3 was the most likely to produce clean contig circularisation. NECAT v20200119 was reliable and good at circularisation but tended to make larger sequence errors. NextDenovo/NextPolish v2.3.0/v1.2.4 was reliable with chromosome assembly but bad with plasmid assembly. Raven v1.1.10 was the most reliable for chromosome assembly, though it did not perform well on small plasmids and had circularisation issues. Redbean v2.5 and Shasta v0.5.1 were computationally efficient but more likely to produce incomplete assemblies. Conclusions: Of the assemblers tested, Flye, Miniasm/Minipolish and Raven performed best overall. However, no single tool performed well on all metrics, highlighting the need for continued development on long-read assembly algorithms.
Affiliation(s)
- Ryan R. Wick
- Department of Infectious Diseases, Central Clinical School, Monash University, Melbourne, VIC, 3004, Australia
| | - Kathryn E. Holt
- Department of Infectious Diseases, Central Clinical School, Monash University, Melbourne, VIC, 3004, Australia
- Department of Infection Biology, London School of Hygiene & Tropical Medicine, London, WC1E 7HT, UK
| |
|
18
|
Abstract
Background: Data sets from long-read sequencing platforms (Oxford Nanopore Technologies and Pacific Biosciences) allow for most prokaryote genomes to be completely assembled - one contig per chromosome or plasmid. However, the high per-read error rate of long-read sequencing necessitates different approaches to assembly than those used for short-read sequencing. Multiple assembly tools (assemblers) exist, which use a variety of algorithms for long-read assembly. Methods: We used 500 simulated read sets and 120 real read sets to assess the performance of eight long-read assemblers (Canu, Flye, Miniasm/Minipolish, NECAT, NextDenovo/NextPolish, Raven, Redbean and Shasta) across a wide variety of genomes and read parameters. Assemblies were assessed on their structural accuracy/completeness, sequence identity, contig circularisation and computational resources used. Results: Canu v2.1 produced reliable assemblies and was good with plasmids, but it performed poorly with circularisation and had the longest runtimes of all assemblers tested. Flye v2.8 was also reliable and made the smallest sequence errors, though it used the most RAM. Miniasm/Minipolish v0.3/v0.1.3 was the most likely to produce clean contig circularisation. NECAT v20200803 was reliable and good at circularisation but tended to make larger sequence errors. NextDenovo/NextPolish v2.3.1/v1.3.1 was reliable with chromosome assembly but bad with plasmid assembly. Raven v1.3.0 was reliable for chromosome assembly, though it did not perform well on small plasmids and had circularisation issues. Redbean v2.5 and Shasta v0.7.0 were computationally efficient but more likely to produce incomplete assemblies. Conclusions: Of the assemblers tested, Flye, Miniasm/Minipolish, NextDenovo/NextPolish and Raven performed best overall. However, no single tool performed well on all metrics, highlighting the need for continued development on long-read assembly algorithms.
Affiliation(s)
- Ryan R. Wick
- Department of Infectious Diseases, Central Clinical School, Monash University, Melbourne, VIC, 3004, Australia
| | - Kathryn E. Holt
- Department of Infectious Diseases, Central Clinical School, Monash University, Melbourne, VIC, 3004, Australia
- Department of Infection Biology, London School of Hygiene & Tropical Medicine, London, WC1E 7HT, UK
| |
|
19
|
Abstract
Long-read sequencing holds great potential for transcriptome analysis because it offers researchers an affordable method to annotate the transcriptomes of non-model organisms. This, in turn, will greatly benefit future work on less-researched organisms like unicellular eukaryotes that cannot rely on large consortia to generate these transcriptome annotations. However, to realize this potential, several remaining molecular and computational challenges will have to be overcome. In this review, we have outlined the limitations of short-read sequencing technology and how long-read sequencing technology overcomes these limitations. We have also highlighted the unique challenges still present for long-read sequencing technology and provided some suggestions on how to overcome these challenges going forward. This article is part of a discussion meeting issue 'Single cell ecology'.
Affiliation(s)
- Ashley Byrne
- Department of Molecular, Cellular, and Developmental Biology, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Charles Cole
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Roger Volden
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Christopher Vollmers
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| |
|
20
|
Guo Y, Zhang Y, Liu Q, Huang Y, Mao G, Yue Z, Abe EM, Li J, Wu Z, Li S, Zhou X, Hu W, Xiao N. A chromosomal-level genome assembly for the giant African snail Achatina fulica. Gigascience 2019; 8:giz124. [PMID: 31634388 PMCID: PMC6802634 DOI: 10.1093/gigascience/giz124] [Received: 01/08/2019] [Revised: 05/09/2019] [Accepted: 09/27/2019]
Abstract
BACKGROUND: Achatina fulica, the giant African snail, is the largest terrestrial mollusk species. Owing to its voracious appetite, wide environmental adaptability, high growth rate, and reproductive capacity, it has become an invasive species across the world, mainly in Southeast Asia, Japan, the western Pacific islands, and China. This pest can damage agricultural crops and is an intermediate host of many parasites that can threaten human health. However, genomic information of A. fulica remains limited, hindering genetic and genomic studies for invasion control and management of the species. FINDINGS: Using a k-mer-based method, we estimated the A. fulica genome size to be 2.12 Gb, with a high repeat content up to 71%. Roughly 101.6 Gb genomic long-read data of A. fulica were generated from the Pacific Biosciences sequencing platform and assembled to produce a first A. fulica genome of 1.85 Gb with a contig N50 length of 726 kb. Using contact information from the Hi-C sequencing data, we successfully anchored 99.32% contig sequences into 31 chromosomes, leading to the final contig and scaffold N50 length of 721 kb and 59.6 Mb, respectively. The continuity, completeness, and accuracy were evaluated by genome comparison with other mollusk genomes, BUSCO assessment, and genomic read mapping. A total of 23,726 protein-coding genes were predicted from the assembled genome, among which 96.34% of the genes were functionally annotated. The phylogenetic analysis using whole-genome protein-coding genes revealed that A. fulica separated from a common ancestor with Biomphalaria glabrata ∼182 million years ago. CONCLUSION: To our knowledge, the A. fulica genome is the first terrestrial mollusk genome published to date. The chromosome sequence of A. fulica will provide the research community with a valuable resource for population genetics and environmental adaptation studies for the species, as well as investigations of the chromosome-level of evolution within mollusks.
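The abstract above gives a sequencing yield (about 101.6 Gb of long reads) and two genome sizes (2.12 Gb estimated from k-mers, 1.85 Gb assembled) but does not state a fold coverage. The sketch below shows how an approximate coverage could be derived from those figures as total bases divided by genome size. This is illustrative arithmetic rather than a value reported by the authors, and the function name `fold_coverage` is hypothetical.

```python
def fold_coverage(total_bases_gb, genome_size_gb):
    """Approximate fold coverage as sequenced bases / genome size.

    Both arguments are in gigabases; the result is unitless (e.g. 30 for 30x).
    """
    if genome_size_gb <= 0:
        raise ValueError("genome_size_gb must be positive")
    return total_bases_gb / genome_size_gb


# Figures quoted in the abstract; the derived coverages are estimates only.
yield_gb = 101.6
print(round(fold_coverage(yield_gb, 2.12), 1))  # 47.9, against the k-mer estimate
print(round(fold_coverage(yield_gb, 1.85), 1))  # 54.9, against the assembled size
```

The two results differ because the denominator differs, so a coverage figure is only comparable across studies when the genome size used is stated alongside it.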
Affiliation(s)
- Yunhai Guo
- National Institute of Parasitic Diseases, Chinese Center for Disease Control and Prevention; Key Laboratory of Parasite and Vector Biology, Ministry of Health; WHO Collaborating Centre for Tropical Diseases; Chinese Centre for Tropical Diseases Research, Shanghai 200025, P. R. China
- Yi Zhang
- National Institute of Parasitic Diseases, Chinese Center for Disease Control and Prevention; Key Laboratory of Parasite and Vector Biology, Ministry of Health; WHO Collaborating Centre for Tropical Diseases; Chinese Centre for Tropical Diseases Research, Shanghai 200025, P. R. China
- Qin Liu
- National Institute of Parasitic Diseases, Chinese Center for Disease Control and Prevention; Key Laboratory of Parasite and Vector Biology, Ministry of Health; WHO Collaborating Centre for Tropical Diseases; Chinese Centre for Tropical Diseases Research, Shanghai 200025, P. R. China
- Yun Huang
- National Institute of Parasitic Diseases, Chinese Center for Disease Control and Prevention; Key Laboratory of Parasite and Vector Biology, Ministry of Health; WHO Collaborating Centre for Tropical Diseases; Chinese Centre for Tropical Diseases Research, Shanghai 200025, P. R. China
- Guangyao Mao
- National Institute of Parasitic Diseases, Chinese Center for Disease Control and Prevention; Key Laboratory of Parasite and Vector Biology, Ministry of Health; WHO Collaborating Centre for Tropical Diseases; Chinese Centre for Tropical Diseases Research, Shanghai 200025, P. R. China
- Zhiyuan Yue
- National Institute of Parasitic Diseases, Chinese Center for Disease Control and Prevention; Key Laboratory of Parasite and Vector Biology, Ministry of Health; WHO Collaborating Centre for Tropical Diseases; Chinese Centre for Tropical Diseases Research, Shanghai 200025, P. R. China
- Eniola M Abe
- National Institute of Parasitic Diseases, Chinese Center for Disease Control and Prevention; Key Laboratory of Parasite and Vector Biology, Ministry of Health; WHO Collaborating Centre for Tropical Diseases; Chinese Centre for Tropical Diseases Research, Shanghai 200025, P. R. China
- Jian Li
- State Key Laboratory of Genetic Engineering, Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, Ministry of Education Key Laboratory of Contemporary Anthropology, School of Life Science, Fudan University, Shanghai 200438, China
- Zhongdao Wu
- Department of Parasitology, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China
- Shizhu Li
- National Institute of Parasitic Diseases, Chinese Center for Disease Control and Prevention; Key Laboratory of Parasite and Vector Biology, Ministry of Health; WHO Collaborating Centre for Tropical Diseases; Chinese Centre for Tropical Diseases Research, Shanghai 200025, P. R. China
- Xiaonong Zhou
- National Institute of Parasitic Diseases, Chinese Center for Disease Control and Prevention; Key Laboratory of Parasite and Vector Biology, Ministry of Health; WHO Collaborating Centre for Tropical Diseases; Chinese Centre for Tropical Diseases Research, Shanghai 200025, P. R. China
- Wei Hu
- National Institute of Parasitic Diseases, Chinese Center for Disease Control and Prevention; Key Laboratory of Parasite and Vector Biology, Ministry of Health; WHO Collaborating Centre for Tropical Diseases; Chinese Centre for Tropical Diseases Research, Shanghai 200025, P. R. China
- State Key Laboratory of Genetic Engineering, Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, Ministry of Education Key Laboratory of Contemporary Anthropology, School of Life Science, Fudan University, Shanghai 200438, China
- Ning Xiao
- National Institute of Parasitic Diseases, Chinese Center for Disease Control and Prevention; Key Laboratory of Parasite and Vector Biology, Ministry of Health; WHO Collaborating Centre for Tropical Diseases; Chinese Centre for Tropical Diseases Research, Shanghai 200025, P. R. China
|
21
|
Tombácz D, Moldován N, Balázs Z, Gulyás G, Csabai Z, Boldogkői M, Snyder M, Boldogkői Z. Multiple Long-Read Sequencing Survey of Herpes Simplex Virus Dynamic Transcriptome. Front Genet 2019; 10:834. [PMID: 31608102 PMCID: PMC6769088 DOI: 10.3389/fgene.2019.00834] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 01/31/2019] [Accepted: 08/13/2019] [Indexed: 12/12/2022] Open
Abstract
Long-read sequencing (LRS) has become increasingly important in RNA research due to its strength in resolving complex transcriptomic architectures. In this regard, currently two LRS platforms have demonstrated adequate performance: the Single Molecule Real-Time Sequencing by Pacific Biosciences (PacBio) and the nanopore sequencing by Oxford Nanopore Technologies (ONT). Even though these techniques produce lower coverage and are more error prone than short-read sequencing, they continue to be more successful in identifying polycistronic RNAs, transcript isoforms including splice and transcript end variants, as well as transcript overlaps. Recent reports have successfully applied LRS for the investigation of the transcriptome of viruses belonging to various families. These studies have substantially increased the number of previously known viral RNA molecules. In this work, we used the Sequel and MinION techniques from PacBio and ONT, respectively, to characterize the lytic transcriptome of the herpes simplex virus type 1 (HSV-1). In most samples, we analyzed the poly(A) fraction of the transcriptome, but we also performed random oligonucleotide-based sequencing. Besides cDNA sequencing, we also carried out native RNA sequencing. Our investigations identified more than 2,300 previously undetected transcripts, including coding and non-coding RNAs, multi-splice transcripts, as well as polycistronic and complex transcripts. Furthermore, we found previously unsubstantiated transcriptional start sites, polyadenylation sites, and splice sites. A large number of novel transcriptional overlaps were also detected. Random-primed sequencing revealed that each convergent gene pair produces non-polyadenylated read-through RNAs overlapping the partner genes. Furthermore, we identified novel replication-associated transcripts overlapping the HSV-1 replication origins, and novel LAT variants with very long 5' regions, which are co-terminal with the LAT-0.7kb transcript.
Overall, our results demonstrated that the HSV-1 transcripts form an extremely complex pattern of overlaps, and that the entire viral genome is transcriptionally active. In most viral genes, if not in all, both DNA strands are expressed.
Affiliation(s)
- Dóra Tombácz
- Department of Medical Biology, Faculty of Medicine, University of Szeged, Szeged, Hungary
- Norbert Moldován
- Department of Medical Biology, Faculty of Medicine, University of Szeged, Szeged, Hungary
- Zsolt Balázs
- Department of Medical Biology, Faculty of Medicine, University of Szeged, Szeged, Hungary
- Gábor Gulyás
- Department of Medical Biology, Faculty of Medicine, University of Szeged, Szeged, Hungary
- Zsolt Csabai
- Department of Medical Biology, Faculty of Medicine, University of Szeged, Szeged, Hungary
- Miklós Boldogkői
- Department of Medical Biology, Faculty of Medicine, University of Szeged, Szeged, Hungary
- Michael Snyder
- Department of Genetics, School of Medicine, Stanford University, Stanford, CA, United States
- Zsolt Boldogkői
- Department of Medical Biology, Faculty of Medicine, University of Szeged, Szeged, Hungary
|
22
|
Moldován N, Szucs A, Tombácz D, Balázs Z, Csabai Z, Snyder M, Boldogkoi Z. Multiplatform next-generation sequencing identifies novel RNA molecules and transcript isoforms of the endogenous retrovirus isolated from cultured cells. FEMS Microbiol Lett 2019; 365:4816730. [PMID: 29361122 DOI: 10.1093/femsle/fny013] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 08/17/2017] [Accepted: 01/17/2018] [Indexed: 12/17/2022] Open
Abstract
In this study, we applied short- and long-read RNA sequencing techniques, as well as PCR analysis, to investigate the transcriptome of the porcine endogenous retrovirus (PERV) expressed from the cultured porcine kidney cell line PK-15. This analysis revealed six novel transcripts and eight transcript isoforms, including five length and three splice variants. We were able to establish whether a deletion in a transcript is the result of the splicing of mRNAs or of genomic deletion in one of the PERV clones. Additionally, we re-annotated the formerly identified RNA molecules. Our analysis revealed a higher complexity of the PERV transcriptome than was previously believed.
Affiliation(s)
- Norbert Moldován
- Department of Medical Biology, Faculty of Medicine, University of Szeged, Szeged H-6720, Hungary
- Attila Szucs
- Department of Medical Biology, Faculty of Medicine, University of Szeged, Szeged H-6720, Hungary
- Dóra Tombácz
- Department of Medical Biology, Faculty of Medicine, University of Szeged, Szeged H-6720, Hungary; Department of Genetics, School of Medicine, Stanford University, Stanford, CA 94305, USA
- Zsolt Balázs
- Department of Medical Biology, Faculty of Medicine, University of Szeged, Szeged H-6720, Hungary
- Zsolt Csabai
- Department of Medical Biology, Faculty of Medicine, University of Szeged, Szeged H-6720, Hungary
- Michael Snyder
- Department of Genetics, School of Medicine, Stanford University, Stanford, CA 94305, USA
- Zsolt Boldogkoi
- Department of Medical Biology, Faculty of Medicine, University of Szeged, Szeged H-6720, Hungary
|
23
|
Somerville V, Lutz S, Schmid M, Frei D, Moser A, Irmler S, Frey JE, Ahrens CH. Long-read based de novo assembly of low-complexity metagenome samples results in finished genomes and reveals insights into strain diversity and an active phage system. BMC Microbiol 2019; 19:143. [PMID: 31238873 PMCID: PMC6593500 DOI: 10.1186/s12866-019-1500-0] [Citation(s) in RCA: 76] [Impact Index Per Article: 15.2] [Received: 12/20/2018] [Accepted: 05/31/2019] [Indexed: 01/18/2023] Open
Abstract
BACKGROUND Complete and contiguous genome assemblies greatly improve the quality of subsequent systems-wide functional profiling studies and the ability to gain novel biological insights. While a de novo genome assembly of an isolated bacterial strain is in most cases straightforward, more informative data about co-existing bacteria as well as synergistic and antagonistic effects can be obtained from a direct analysis of microbial communities. However, the complexity of metagenomic samples represents a major challenge. While third generation sequencing technologies have been suggested to enable finished metagenome-assembled genomes, to our knowledge, the complete genome assembly of all dominant strains in a microbiome sample has not been demonstrated. Natural whey starter cultures (NWCs) are used in cheese production and represent low-complexity microbiomes. Previous studies of Swiss Gruyère and selected Italian hard cheeses, mostly based on amplicon metagenomics, concurred that three species generally predominate: Streptococcus thermophilus, Lactobacillus helveticus and Lactobacillus delbrueckii. RESULTS Two NWCs from Swiss Gruyère producers were subjected to whole metagenome shotgun sequencing using the Pacific Biosciences Sequel and Illumina MiSeq platforms. In addition, longer Oxford Nanopore Technologies MinION reads had to be generated for one sample to resolve repeat regions. Thereby, we achieved the complete assembly of all dominant bacterial genomes from these low-complexity NWCs, which was corroborated by a 16S rRNA amplicon survey. Moreover, two distinct L. helveticus strains were successfully co-assembled from the same sample. Besides bacterial chromosomes, we could also assemble several bacterial plasmids and phages and a corresponding prophage.
Biologically relevant insights were uncovered by linking the plasmids and phages to their respective host genomes using DNA methylation motifs on the plasmids and by matching prokaryotic CRISPR spacers with the corresponding protospacers on the phages. These results could only be achieved by employing long-read sequencing data able to span intragenomic as well as intergenomic repeats. CONCLUSIONS Here, we demonstrate the feasibility of complete de novo genome assembly of all dominant strains from low-complexity NWCs based on whole metagenome shotgun sequencing data. This allowed us to gain novel biological insights and is a fundamental basis for subsequent systems-wide omics analyses, functional profiling and phenotype to genotype analysis of specific microbial communities.
Affiliation(s)
- Vincent Somerville
- Agroscope, Research Group Molecular Diagnostics, Genomics & Bioinformatics, Schloss 1, CH-8820 Wädenswil, Switzerland
- SIB Swiss Institute of Bioinformatics, CH-8820 Wädenswil, Switzerland
- Stefanie Lutz
- Agroscope, Research Group Molecular Diagnostics, Genomics & Bioinformatics, Schloss 1, CH-8820 Wädenswil, Switzerland
- SIB Swiss Institute of Bioinformatics, CH-8820 Wädenswil, Switzerland
- Michael Schmid
- Agroscope, Research Group Molecular Diagnostics, Genomics & Bioinformatics, Schloss 1, CH-8820 Wädenswil, Switzerland
- SIB Swiss Institute of Bioinformatics, CH-8820 Wädenswil, Switzerland
- Daniel Frei
- Agroscope, Research Group Molecular Diagnostics, Genomics & Bioinformatics, Schloss 1, CH-8820 Wädenswil, Switzerland
- Aline Moser
- Agroscope, Research Group Biochemistry of Milk and Microorganisms, CH-3003 Bern, Switzerland
- Stefan Irmler
- Agroscope, Research Group Biochemistry of Milk and Microorganisms, CH-3003 Bern, Switzerland
- Jürg E. Frey
- Agroscope, Research Group Molecular Diagnostics, Genomics & Bioinformatics, Schloss 1, CH-8820 Wädenswil, Switzerland
- Christian H. Ahrens
- Agroscope, Research Group Molecular Diagnostics, Genomics & Bioinformatics, Schloss 1, CH-8820 Wädenswil, Switzerland
- SIB Swiss Institute of Bioinformatics, CH-8820 Wädenswil, Switzerland
|
24
|
Paajanen P, Kettleborough G, López-Girona E, Giolai M, Heavens D, Baker D, Lister A, Cugliandolo F, Wilde G, Hein I, Macaulay I, Bryan GJ, Clark MD. A critical comparison of technologies for a plant genome sequencing project. Gigascience 2019; 8:giy163. [PMID: 30624602 PMCID: PMC6423373 DOI: 10.1093/gigascience/giy163] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Received: 05/05/2018] [Revised: 09/26/2018] [Indexed: 01/23/2023] Open
Abstract
BACKGROUND A high-quality genome sequence of any model organism is an essential starting point for genetic and other studies. Older clone-based methods are slow and expensive, whereas faster, cheaper short-read-only assemblies can be incomplete and highly fragmented, which minimizes their usefulness. The last few years have seen the introduction of many new technologies for genome assembly. These new technologies and associated new algorithms are typically benchmarked on microbial genomes or, if they scale appropriately, on larger (e.g., human) genomes. However, plant genomes can be much more repetitive and larger than the human genome, and plant biochemistry often makes obtaining high-quality DNA that is free from contaminants difficult. Reflecting their challenging nature, we observe that plant genome assembly statistics are typically poorer than for vertebrates. RESULTS Here, we compare Illumina short read, Pacific Biosciences long read, 10x Genomics linked reads, Dovetail Hi-C, and BioNano Genomics optical maps, singly and combined, in producing high-quality long-range genome assemblies of the potato species Solanum verrucosum. We benchmark the assemblies for completeness and accuracy, as well as DNA, compute requirements, and sequencing costs. CONCLUSIONS The field of genome sequencing and assembly is reaching maturity, and the differences we observe between assemblies are surprisingly small. We expect that our results will be helpful to other genome projects, and that these datasets will be used in benchmarking by assembly algorithm developers.
Affiliation(s)
- Pirita Paajanen
- Technology Development, Earlham Institute, Norwich Research Park, Norwich NR4 7UZ, UK
- Department of Cell and Developmental Biology, John Innes Centre, Norwich Research Park, Norwich NR4 7UH, UK
- George Kettleborough
- Technology Development, Earlham Institute, Norwich Research Park, Norwich NR4 7UZ, UK
- Elena López-Girona
- Cell and Molecular Sciences, The James Hutton Institute, Invergowrie, Dundee DD2 5DA, UK
- The New Zealand Institute for Plant & Food Research Limited, Palmerston North 4442, New Zealand
- Michael Giolai
- Technology Development, Earlham Institute, Norwich Research Park, Norwich NR4 7UZ, UK
- Darren Heavens
- Technology Development, Earlham Institute, Norwich Research Park, Norwich NR4 7UZ, UK
- David Baker
- Technology Development, Earlham Institute, Norwich Research Park, Norwich NR4 7UZ, UK
- Ashleigh Lister
- Technology Development, Earlham Institute, Norwich Research Park, Norwich NR4 7UZ, UK
- Fiorella Cugliandolo
- Technology Development, Earlham Institute, Norwich Research Park, Norwich NR4 7UZ, UK
- Gail Wilde
- Cell and Molecular Sciences, The James Hutton Institute, Invergowrie, Dundee DD2 5DA, UK
- Ingo Hein
- Cell and Molecular Sciences, The James Hutton Institute, Invergowrie, Dundee DD2 5DA, UK
- Iain Macaulay
- Technology Development, Earlham Institute, Norwich Research Park, Norwich NR4 7UZ, UK
- Glenn J Bryan
- Cell and Molecular Sciences, The James Hutton Institute, Invergowrie, Dundee DD2 5DA, UK
- Matthew D Clark
- Technology Development, Earlham Institute, Norwich Research Park, Norwich NR4 7UZ, UK
- Department of Life Sciences, Natural History Museum, Cromwell Road, London WC2 5BD, UK
|
25
|
Abstract
Strategies for sequencing fungal genomes on next-generation sequencing (NGS) platforms depend on the characteristics of the genome of the targeted species, the quantity and quality of the genomic DNA, and cost considerations. Massively parallel sequencing with the sequencing by synthesis (SBS) approach by Illumina produces terabases of short read sequences (i.e., ~300 bp) in a time- and cost-effective manner, though the read length can limit the assembly, particularly in repetitive regions. The single molecule, real-time (SMRT) sequencing approach by Pacific Biosciences (PacBio) produces longer reads (i.e., ~12,500 bp), which can facilitate de novo assembly of genomes that contain long repetitive sequences, though because of the lower throughput of this platform, achieving the coverage needed for assembly is more expensive than by SBS. Additionally, the Illumina SBS platforms can handle low quantity/quality genomic DNA, while the SMRT system requires undamaged long DNA fragments as input to ensure that high-quality data are produced. Both platforms are discussed in this chapter, including key decision-making points.
Affiliation(s)
- Yuko Yoshinaga
- United States Department of Energy Joint Genome Institute, Walnut Creek, CA, USA
- Christopher Daum
- United States Department of Energy Joint Genome Institute, Walnut Creek, CA, USA
- Guifen He
- United States Department of Energy Joint Genome Institute, Walnut Creek, CA, USA
- Ronan O'Malley
- United States Department of Energy Joint Genome Institute, Walnut Creek, CA, USA
|
26
|
Mostafa AA, Kostur C, Stamm L, Khan F, Berka N. Characterization of a novel allele, HLA-C*02:135N, by full-length gene sequencing in a bone marrow donor. HLA 2018; 91:538-539. [PMID: 29575749 DOI: 10.1111/tan.13260] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 03/12/2018] [Revised: 03/19/2018] [Accepted: 03/20/2018] [Indexed: 11/28/2022]
Abstract
A frameshift because of a two-nucleotide deletion results in an HLA-C null allele, HLA-C*02:135N.
Affiliation(s)
- A A Mostafa
- Histocompatibility and Immunogenetics Laboratory, Department of Pathology and Laboratory Medicine, Calgary Laboratory Services, University of Calgary, Calgary, Canada
- C Kostur
- Histocompatibility and Immunogenetics Laboratory, Department of Pathology and Laboratory Medicine, Calgary Laboratory Services, University of Calgary, Calgary, Canada
- L Stamm
- Histocompatibility and Immunogenetics Laboratory, Department of Pathology and Laboratory Medicine, Calgary Laboratory Services, University of Calgary, Calgary, Canada
- F Khan
- Histocompatibility and Immunogenetics Laboratory, Department of Pathology and Laboratory Medicine, Calgary Laboratory Services, University of Calgary, Calgary, Canada
- N Berka
- Histocompatibility and Immunogenetics Laboratory, Department of Pathology and Laboratory Medicine, Calgary Laboratory Services, University of Calgary, Calgary, Canada
|
27
|
Myer PR, Kim M, Freetly HC, Smith TPL. Metagenomic and near full-length 16S rRNA sequence data in support of the phylogenetic analysis of the rumen bacterial community in steers. Data Brief 2016; 8:1048-53. [PMID: 27508263 PMCID: PMC4969246 DOI: 10.1016/j.dib.2016.07.027] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 06/06/2016] [Revised: 07/05/2016] [Accepted: 07/13/2016] [Indexed: 12/01/2022] Open
Abstract
Amplicon sequencing utilizing next-generation platforms has significantly transformed how research is conducted, specifically in microbial ecology. However, primer and sequencing platform biases can confound or change the way scientists interpret these data. The Pacific Biosciences RSII instrument may also preferentially load smaller fragments, which may also be a function of PCR product exhaustion during sequencing. To further examine these biases, data are provided from 16S rRNA rumen community analyses. Specifically, data from the relative phylum-level abundances for the ruminal bacterial community are provided to determine between-sample variability. Direct sequencing of metagenomic DNA was conducted to circumvent primer-associated biases in 16S rRNA reads, and rarefaction curves were generated to demonstrate adequate coverage of each amplicon. PCR products were also subjected to reduced amplification and pooling to reduce the likelihood of PCR product exhaustion during sequencing on the Pacific Biosciences platform. The taxonomic profiles for the relative phylum-level and genus-level abundance of rumen microbiota as a function of PCR pooling for sequencing on the Pacific Biosciences RSII platform are provided. For more information, see “Evaluation of 16S rRNA amplicon sequencing using two next-generation sequencing technologies for phylogenetic analysis of the rumen bacterial community in steers” P.R. Myer, M. Kim, H.C. Freetly, T.P.L. Smith (2016) [1].
Affiliation(s)
- Phillip R Myer
- Department of Animal Science, University of Tennessee Institute of Agriculture, University of Tennessee, Knoxville, TN 37996, USA
- MinSeok Kim
- USDA-ARS, U.S. Meat Animal Research Center, Clay Center, NE 68933, USA
- Harvey C Freetly
- USDA-ARS, U.S. Meat Animal Research Center, Clay Center, NE 68933, USA
- Timothy P L Smith
- USDA-ARS, U.S. Meat Animal Research Center, Clay Center, NE 68933, USA
|
28
|
Vázquez-Nion D, Rodríguez-Castro J, López-Rodríguez MC, Fernández-Silva I, Prieto B. Subaerial biofilms on granitic historic buildings: microbial diversity and development of phototrophic multi-species cultures. Biofouling 2016; 32:657-669. [PMID: 27192622 DOI: 10.1080/08927014.2016.1183121] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 02/16/2016] [Accepted: 04/20/2016] [Indexed: 06/05/2023]
Abstract
Microbial communities of natural subaerial biofilms developed on granitic historic buildings of a World Heritage Site (Santiago de Compostela, NW Spain) were characterized and cultured in liquid BG11 medium. Environmental barcoding through next-generation sequencing (Pacific Biosciences) revealed that the biofilms were mainly composed of species of Chlorophyta (green algae) and Ascomycota (fungi) commonly associated with rock substrata. Richness and diversity were higher for the fungal than for the algal assemblages and fungi showed higher heterogeneity among samples. Cultures derived from natural biofilms showed the establishment of stable microbial communities mainly composed of Chlorophyta and Cyanobacteria. Although most taxa found in these cultures were not common in the original biofilms, they are likely common pioneer colonizers of building stone surfaces, including granite. Stable phototrophic multi-species cultures of known microbial diversity were thus obtained and their reliability to emulate natural colonization on granite should be confirmed in further experiments.
Affiliation(s)
- D Vázquez-Nion
- Facultade de Farmacia, Departamento de Edafoloxía e Química Agrícola, Universidade de Santiago de Compostela, Santiago de Compostela, Spain
- J Rodríguez-Castro
- Departamento de Bioquímica e Bioloxía Molecular, Centro de Investigacións Biolóxicas (CIBUS), Universidade de Santiago de Compostela, Santiago de Compostela, Spain
- M C López-Rodríguez
- Facultade de Bioloxía, Departamento de Botánica, Universidade de Santiago de Compostela, Santiago de Compostela, Spain
- I Fernández-Silva
- Section of Ichthyology, California Academy of Sciences, San Francisco, CA, USA
- B Prieto
- Facultade de Farmacia, Departamento de Edafoloxía e Química Agrícola, Universidade de Santiago de Compostela, Santiago de Compostela, Spain
|
29
|
Mahesh HB, Shirke MD, Singh S, Rajamani A, Hittalmani S, Wang GL, Gowda M. Indica rice genome assembly, annotation and mining of blast disease resistance genes. BMC Genomics 2016; 17:242. [PMID: 26984283 PMCID: PMC4793524 DOI: 10.1186/s12864-016-2523-7] [Citation(s) in RCA: 42] [Impact Index Per Article: 5.3] [Received: 11/03/2015] [Accepted: 02/24/2016] [Indexed: 12/19/2022] Open
Abstract
Background Rice is a major staple food crop in the world. Over 80% of the rice cultivation area is under indica rice. Currently, genomic resources are lacking for indica as compared to japonica rice. In this study, we generated deep-sequencing data (Illumina and Pacific Biosciences sequencing) for one of the indica rice cultivars, HR-12 from India. Results We assembled over 86% (389 Mb) of the rice genome and annotated 56,284 protein-coding genes from the HR-12 genome using Illumina and PacBio sequencing. Comprehensive comparative analyses between indica and japonica subspecies genomes revealed a large number of indica-specific variants including SSRs, SNPs and InDels. To mine disease resistance genes, we sequenced a few indica rice cultivars that are reported to be highly resistant (Tetep and Tadukan) and susceptible (HR-12 and Co-39) against blast fungal isolates in many countries including India. Whole genome sequencing of rice genotypes revealed a high rate of mutations in defense-related genes (NB-ARC, LRR and PK domains) in resistant cultivars as compared to susceptible ones. This study has identified the R-genes Pi-ta and Pi54 from the durable indica resistant cultivars Tetep and Tadukan, which can be used in marker-assisted selection in rice breeding programs. Conclusions This is the first report of a whole genome sequencing approach to characterize Indian rice germplasm. The genomic resources from our work will have a greater impact in understanding global rice diversity, genetics and molecular breeding.
Affiliation(s)
- H B Mahesh
- Genomics Laboratory, Centre for Cellular and Molecular Platforms (C-CAMP), National Centre for Biological Sciences (NCBS), Bengaluru, 560065, India; Marker Assisted Selection Laboratory, Department of Genetics and Plant Breeding, University of Agricultural Sciences, Bengaluru, 560065, India; Department of Plant Pathology, College of Food, Agricultural and Environmental Sciences, Ohio State University, Columbus, 43210, USA
- Meghana Deepak Shirke
- Genomics Laboratory, Centre for Cellular and Molecular Platforms (C-CAMP), National Centre for Biological Sciences (NCBS), Bengaluru, 560065, India; Manipal University, Manipal, 576104, India
- Siddarth Singh
- Pacific Biosciences, Boon Lay Way, Singapore, 609964, Singapore
- Anantharamanan Rajamani
- Genomics Laboratory, Centre for Cellular and Molecular Platforms (C-CAMP), National Centre for Biological Sciences (NCBS), Bengaluru, 560065, India
- Shailaja Hittalmani
- Marker Assisted Selection Laboratory, Department of Genetics and Plant Breeding, University of Agricultural Sciences, Bengaluru, 560065, India
- Guo-Liang Wang
- Department of Plant Pathology, College of Food, Agricultural and Environmental Sciences, Ohio State University, Columbus, 43210, USA
- Malali Gowda
- Genomics Laboratory, Centre for Cellular and Molecular Platforms (C-CAMP), National Centre for Biological Sciences (NCBS), Bengaluru, 560065, India; Genomics Discovery Program, School of Conservation, Life Science and Health Sciences, TransDisciplinary University, Foundation of Revitalization of Local Health Traditions, Bengaluru, 560064, India
|
30
|
Frank J, Dingemanse C, Schmitz AM, Vossen RHAM, van Ommen GJB, den Dunnen JT, Robanus-Maandag EC, Anvar SY. The Complete Genome Sequence of the Murine Pathobiont Helicobacter typhlonius. Front Microbiol 2016; 6:1549. [PMID: 26779178 PMCID: PMC4705304 DOI: 10.3389/fmicb.2015.01549] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 09/22/2015] [Accepted: 12/21/2015] [Indexed: 01/27/2023] Open
Abstract
Background: Immuno-compromised mice infected with Helicobacter typhlonius are used to model microbially induced inflammatory bowel disease (IBD). The specific mechanism through which H. typhlonius induces and promotes IBD is not fully understood. Access to the genome sequence is essential to examine emergent properties of this organism, such as its pathogenicity. To this end, we present the complete genome sequence of H. typhlonius MIT 97-6810, obtained through single-molecule real-time sequencing. Results: The genome was assembled into a single circularized contig measuring 1.92 Mbp with an average GC content of 38.8%. In total, 2,117 protein-encoding genes and 43 RNA genes were identified. Numerous pathogenic features were found, including a putative pathogenicity island (PAI) containing components of a type IV secretion system, virulence-associated proteins and a cag PAI protein. We compared the genome of H. typhlonius to those of the murine pathobiont H. hepaticus and the human pathobiont H. pylori. H. typhlonius most closely resembles H. hepaticus, with 1,594 (75.3%) of its genes being orthologous to genes in H. hepaticus. Determination of the global methylation state revealed eight distinct recognition motifs for adenine and cytosine methylation. H. typhlonius shares four of its recognition motifs with H. pylori. Conclusion: The complete genome sequence of H. typhlonius MIT 97-6810 enabled us to identify many pathogenic features suggesting that H. typhlonius can act as a pathogen. Follow-up studies are necessary to evaluate the true nature of its pathogenic capabilities. We found many methylated sites and a plethora of restriction-modification systems. The genome, together with the methylome, will provide an essential resource for future studies investigating gene regulation, host interaction and pathogenicity of H. typhlonius. In turn, this work can contribute to unraveling the role of Helicobacter in enteric disease.
Affiliation(s)
- Jeroen Frank
- Leiden Genome Technology Center, Leiden University Medical Center Leiden, Netherlands
| | - Celia Dingemanse
- Department of Human Genetics, Leiden University Medical Center Leiden, Netherlands
| | - Arnoud M Schmitz
- Leiden Genome Technology Center, Leiden University Medical Center Leiden, Netherlands
| | - Rolf H A M Vossen
- Leiden Genome Technology Center, Leiden University Medical Center Leiden, Netherlands
| | - Gert-Jan B van Ommen
- Department of Human Genetics, Leiden University Medical Center Leiden, Netherlands
| | - Johan T den Dunnen
- Leiden Genome Technology Center, Leiden University Medical Center Leiden, Netherlands; Department of Human Genetics, Leiden University Medical Center Leiden, Netherlands; Department of Clinical Genetics, Leiden University Medical Center Leiden, Netherlands
| | | | - Seyed Yahya Anvar
- Leiden Genome Technology Center, Leiden University Medical Center Leiden, Netherlands; Department of Human Genetics, Leiden University Medical Center Leiden, Netherlands
| |
|
31
|
Wöhrmann T, Huettel B, Wagner N, Weising K. Microsatellites from Fosterella christophii (Bromeliaceae) by de novo transcriptome sequencing on the Pacific Biosciences RS platform. Appl Plant Sci 2016; 4:apps1500084. [PMID: 26819858 PMCID: PMC4716777 DOI: 10.3732/apps.1500084] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2015] [Accepted: 09/24/2015] [Indexed: 06/05/2023]
Abstract
Premise of the study: Microsatellite markers were developed in Fosterella christophii (Bromeliaceae) to investigate the genetic diversity and population structure within the F. micrantha group, comprising F. christophii, F. micrantha, and F. villosula. Methods and results: Full-length cDNAs were isolated from F. christophii and sequenced on a Pacific Biosciences RS platform. A total of 1590 high-quality consensus isoforms were assembled into 971 unigenes containing 421 perfect microsatellites. Thirty primer sets were designed, of which 13 revealed a high level of polymorphism in three populations of F. christophii, with four to nine alleles per locus. Each of these 13 loci cross-amplified in the closely related species F. micrantha and F. villosula, with one to six and one to 11 alleles per locus, respectively. Conclusions: The new markers are promising tools to study the population genetics of F. christophii and to discover species boundaries within the F. micrantha group.
Affiliation(s)
- Tina Wöhrmann
- Systematics and Morphology of Plants, Institute of Biology, University of Kassel, Heinrich-Plett-Str. 40, 34132 Kassel, Germany
| | - Bruno Huettel
- Max Planck-Genome-centre Cologne, Max Planck Institute for Plant Breeding Research, Carl-von-Linné-Weg 10, 50829 Cologne, Germany
| | - Natascha Wagner
- Systematics and Morphology of Plants, Institute of Biology, University of Kassel, Heinrich-Plett-Str. 40, 34132 Kassel, Germany
| | - Kurt Weising
- Systematics and Morphology of Plants, Institute of Biology, University of Kassel, Heinrich-Plett-Str. 40, 34132 Kassel, Germany
| |
|
32
|
Qiao W, Yang Y, Sebra R, Mendiratta G, Gaedigk A, Desnick RJ, Scott SA. Long-Read Single Molecule Real-Time Full Gene Sequencing of Cytochrome P450-2D6. Hum Mutat 2015; 37:315-23. [PMID: 26602992 DOI: 10.1002/humu.22936] [Citation(s) in RCA: 69] [Impact Index Per Article: 7.7] [Received: 07/28/2015] [Accepted: 11/13/2015] [Indexed: 12/30/2022]
Abstract
The cytochrome P450-2D6 (CYP2D6) enzyme metabolizes ∼25% of common medications, yet homologous pseudogenes and copy number variants (CNVs) make interrogating the polymorphic CYP2D6 gene with short-read sequencing challenging. Therefore, we developed a novel long-read, full gene CYP2D6 single molecule real-time (SMRT) sequencing method using the Pacific Biosciences platform. Long-range PCR and CYP2D6 SMRT sequencing of 10 previously genotyped controls identified expected star (*) alleles, but also enabled suballele resolution, diplotype refinement, and discovery of novel alleles. Coupled with an optimized variant-calling pipeline, CYP2D6 SMRT sequencing was highly reproducible as triplicate intra- and inter-run nonreference genotype results were completely concordant. Importantly, targeted SMRT sequencing of upstream and downstream CYP2D6 gene copies characterized the duplicated allele in 15 control samples with CYP2D6 CNVs. The utility of CYP2D6 SMRT sequencing was further underscored by identifying the diplotypes of 14 samples with discordant or unclear CYP2D6 configurations from previous targeted genotyping, which again included suballele resolution, duplicated allele characterization, and discovery of a novel allele and tandem arrangement. Taken together, long-read CYP2D6 SMRT sequencing is an innovative, reproducible, and validated method for full-gene characterization, duplication allele-specific analysis, and novel allele discovery, which will likely improve CYP2D6 metabolizer phenotype prediction for both research and clinical testing applications.
Affiliation(s)
- Wanqiong Qiao
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, 10029
| | - Yao Yang
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, 10029
| | - Robert Sebra
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, 10029; Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, New York, 10029
| | - Geetu Mendiratta
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, 10029
| | - Andrea Gaedigk
- Division of Clinical Pharmacology, Toxicology & Therapeutic Innovation, Children's Mercy Kansas City, Kansas City, Missouri, 64108; School of Medicine, University of Missouri-Kansas City, Kansas City, Missouri, 64108
| | - Robert J Desnick
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, 10029
| | - Stuart A Scott
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, 10029
| |
|