51
Maestri S, Gambino G, Lopatriello G, Minio A, Perrone I, Cosentino E, Giovannone B, Marcolungo L, Alfano M, Rombauts S, Cantu D, Rossato M, Delledonne M, Calderón L. 'Nebbiolo' genome assembly allows surveying the occurrence and functional implications of genomic structural variations in grapevines (Vitis vinifera L.). BMC Genomics 2022; 23:159. [PMID: 35209840 PMCID: PMC8867635 DOI: 10.1186/s12864-022-08389-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 09/08/2021] [Accepted: 02/15/2022] [Indexed: 12/18/2022] Open
Abstract
BACKGROUND ‘Nebbiolo’ is a grapevine cultivar typical of north-western Italy, appreciated for producing high-quality red wines. Grapevine cultivars are characterized by highly heterozygous genomes, including a high incidence of genomic rearrangements larger than 50 bp, so-called structural variations (SVs). Although abundant, SVs are an under-explored source of genetic variation, mainly due to methodological limitations in their detection. RESULTS We employed a multiple-platform approach to produce long-range genomic data for two different ‘Nebbiolo’ clones, namely optical mapping, long reads and linked reads. We performed a haplotype-resolved de novo assembly for cultivar ‘Nebbiolo’ (clone CVT 71) and used an ab initio strategy to annotate it. The annotated assembly enhanced our ability to detect SVs, enabling the study of genomic regions not present in the grapevine reference genome and accounting for their functional implications. We performed variant calling analyses at three different organizational levels: i) between haplotypes of clone CVT 71 (primary assembly vs haplotigs), ii) between ‘Nebbiolo’ and ‘Cabernet Sauvignon’ assemblies and iii) between clones CVT 71 and CVT 185, representing different ‘Nebbiolo’ biotypes. The cumulative size of non-redundant merged SVs totalled 79.6 Mbp for the first comparison and 136.1 Mbp for the second one, while no SVs were detected for the third comparison. Interestingly, SVs differentiating cultivars and haplotypes affected similar numbers of coding genes. CONCLUSIONS Our results suggest that the SV accumulation rate and its functional implications in the ‘Nebbiolo’ genome are highly dependent on the organizational level under study. SVs are abundant when comparing ‘Nebbiolo’ to a different cultivar or the two haplotypes of the same individual, while they were absent between the two analysed clones.
Supplementary Information The online version contains supplementary material available at 10.1186/s12864-022-08389-9.
Affiliation(s)
- Simone Maestri
  - Department of Biotechnology, University of Verona, Strada Le Grazie 15, 37134, Verona, Italy
- Giorgio Gambino
  - Institute for Sustainable Plant Protection, National Research Council (IPSP-CNR), Strada delle Cacce 73, 10135, Torino, Italy
- Giulia Lopatriello
  - Department of Biotechnology, University of Verona, Strada Le Grazie 15, 37134, Verona, Italy
- Andrea Minio
  - Department of Viticulture & Enology, University of California Davis, 595 Hilgard Lane, Davis, CA, 95616, USA
- Irene Perrone
  - Institute for Sustainable Plant Protection, National Research Council (IPSP-CNR), Strada delle Cacce 73, 10135, Torino, Italy
- Emanuela Cosentino
  - Department of Biotechnology, University of Verona, Strada Le Grazie 15, 37134, Verona, Italy
- Barbara Giovannone
  - Department of Biotechnology, University of Verona, Strada Le Grazie 15, 37134, Verona, Italy
- Luca Marcolungo
  - Department of Biotechnology, University of Verona, Strada Le Grazie 15, 37134, Verona, Italy
- Massimiliano Alfano
  - Department of Biotechnology, University of Verona, Strada Le Grazie 15, 37134, Verona, Italy
- Stephane Rombauts
  - Department of Bioinformatics and Systems Biology, Ghent University, Technologiepark 927, B-9052, Gent, Belgium
  - VIB Center for Plant Systems Biology, 9052, Gent, Belgium
- Dario Cantu
  - Department of Viticulture & Enology, University of California Davis, 595 Hilgard Lane, Davis, CA, 95616, USA
- Marzia Rossato
  - Department of Biotechnology, University of Verona, Strada Le Grazie 15, 37134, Verona, Italy
- Massimo Delledonne
  - Department of Biotechnology, University of Verona, Strada Le Grazie 15, 37134, Verona, Italy
- Luciano Calderón
  - Instituto de Biología Agrícola de Mendoza (IBAM, CONICET-UNCuyo), Almirante Brown 500, M5528AHB. Chacras de Coria, Mendoza, Argentina
52
Graph-based pan-genome reveals structural and sequence variations related to agronomic traits and domestication in cucumber. Nat Commun 2022; 13:682. [PMID: 35115520 PMCID: PMC8813957 DOI: 10.1038/s41467-022-28362-0] [Citation(s) in RCA: 78] [Impact Index Per Article: 26.0] [Received: 09/14/2021] [Accepted: 01/19/2022] [Indexed: 12/21/2022] Open
Abstract
Structural variants (SVs) represent a major source of genetic diversity and are related to numerous agronomic traits and evolutionary events; however, their comprehensive identification and characterization in cucumber (Cucumis sativus L.) have been hindered by the lack of a high-quality pan-genome. Here, we report a graph-based cucumber pan-genome by analyzing twelve chromosome-scale genome assemblies. Genotyping of seven large chromosomal rearrangements based on the pan-genome provides useful information for the use of wild accessions in breeding and genetic studies. A total of ~4.3 million genetic variants, including 56,214 SVs, are identified by leveraging the chromosome-level assemblies. The pan-genome graph, which integrates both variant information and reference genome sequences, aids the identification of SVs associated with agronomic traits, including warty fruits, flowering times and root growth, and enhances the understanding of cucumber trait evolution. The graph-based cucumber pan-genome and the identified genetic variants provide rich resources for future biological research and genomics-assisted breeding. A growing number of studies suggest that a single reference genome is insufficient to capture all variations in the genome. Here, the authors report a graph-based cucumber pan-genome by analyzing 12 chromosome-scale assemblies and reveal variations associated with agronomic traits and domestication.
53
Yang Y, Yoo JY, Baek SH, Song HY, Jo S, Jung SH, Choi JH. Chromosome-level genome assembly of the shuttles hoppfish, Periophthalmus modestus. Gigascience 2022; 11:giab089. [PMID: 35022698 PMCID: PMC8756193 DOI: 10.1093/gigascience/giab089] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/04/2021] [Revised: 11/08/2021] [Accepted: 12/05/2021] [Indexed: 11/28/2022] Open
Abstract
BACKGROUND The shuttles hoppfish (mudskipper), Periophthalmus modestus, is one of the mudskippers, the largest group of amphibious teleost fishes, which are uniquely adapted to live on mudflats. Because mudskippers can survive on land for extended periods by breathing through their skin and through the lining of the mouth and throat, they were evaluated as a model for the evolutionary sea-land transition of Devonian protoamphibians, ancestors of all present tetrapods. RESULTS A total of 39.6, 80.2, 52.9, and 33.3 Gb of Illumina, Pacific Biosciences, 10X linked, and Hi-C data, respectively, was assembled into 1,419 scaffolds with an N50 length of 33 Mb and BUSCO score of 96.6%. The assembly covered 117% of the estimated genome size (729 Mb) and included 23 pseudo-chromosomes anchored by a Hi-C contact map, which corresponded to the top 23 longest scaffolds above 20 Mb and close to the estimated one. Of the genome, 43.8% consisted of various repetitive elements such as DNAs, tandem repeats, long interspersed nuclear elements, and simple repeats. Ab initio and homology-based gene prediction identified 30,505 genes, of which 94% had homology to the 14 Actinopterygii transcriptomes and 89% and 85% to Pfam families and InterPro domains, respectively. Comparative genomics with 15 Actinopterygii species identified 59,448 gene families, of which 12% were found only in P. modestus. CONCLUSIONS We present the first high-quality genome assembly and gene annotation of the shuttles hoppfish. It will provide a valuable resource for further studies on sea-land transition, bimodal respiration, nitrogen excretion, osmoregulation, thermoregulation, vision, and mechanoreception.
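The N50 figure quoted in this abstract is a standard contiguity metric: the length L such that scaffolds of length L or more contain at least half of the assembled bases. A minimal sketch of that calculation follows; the function name `n50` and the toy scaffold lengths are illustrative assumptions and are not taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths in Mb (not the real assembly).
toy_scaffolds = [40, 33, 30, 20, 10, 5, 2]
print(n50(toy_scaffolds))
```

Because N50 depends only on the length distribution, it says nothing about correctness, which is why abstracts such as this one pair it with a completeness score like BUSCO.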
Affiliation(s)
- Youngik Yang
  - Department of Applied Research, National Marine Biodiversity Institute of Korea, Seocheon 33662, South Korea
- Ji Yong Yoo
  - Marine Bio-Resources and Information Center, National Marine Biodiversity Institute of Korea, Seocheon 33662, South Korea
- Sang Ho Baek
  - Marine Bio-Resources and Information Center, National Marine Biodiversity Institute of Korea, Seocheon 33662, South Korea
- Ha Yeun Song
  - Division of Bioresources Bank, Honam National Institute of Biological Resources, Mokpo 58762, South Korea
- Seonmi Jo
  - Department of Applied Research, National Marine Biodiversity Institute of Korea, Seocheon 33662, South Korea
- Seung-Hyun Jung
  - Department of Applied Research, National Marine Biodiversity Institute of Korea, Seocheon 33662, South Korea
- Jeong-Hyeon Choi
  - Department of Applied Research, National Marine Biodiversity Institute of Korea, Seocheon 33662, South Korea
54
Abstract
The recent emergence of "third-generation" sequencing platforms which address shortcomings of standard short reads has allowed the resolution of complex genomic regions during genome assembly. However, sequencing costs for third-generation platforms continue to be high. Novel approaches that leverage the low cost of short-read sequencing while capturing long-range information have been developed. In this chapter, we focus on one such approach, the 10x Genomics' Chromium system. We demonstrate the assembly of the B73 maize reference genome using the Supernova assembler. We also offer suggestions on how one might improve the resulting assembly through analysis of assembly metrics.
Affiliation(s)
- Paul Visendi
  - Centre for Agriculture and the Bioeconomy, Queensland University of Technology, Brisbane, QLD, Australia
55
Stahlke AR, Bitume EV, Özsoy ZA, Bean DW, Veillet A, Clark MI, Clark EI, Moran P, Hufbauer RA, Hohenlohe PA. Hybridization and range expansion in tamarisk beetles (Diorhabda spp.) introduced to North America for classical biological control. Evol Appl 2022; 15:60-77. [PMID: 35126648 PMCID: PMC8792477 DOI: 10.1111/eva.13325] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/18/2021] [Revised: 11/05/2021] [Accepted: 11/08/2021] [Indexed: 01/31/2023] Open
Abstract
With the global rise of human-mediated translocations and invasions, it is critical to understand the genomic consequences of hybridization and mechanisms of range expansion. Conventional wisdom is that high genetic drift and loss of genetic diversity due to repeated founder effects will constrain introduced species. However, reduced genetic variation can be countered by behavioral aspects and admixture with other distinct populations. As planned invasions, classical biological control (biocontrol) agents present important opportunities to understand the mechanisms of establishment and spread in a novel environment. The ability of biocontrol agents to spread and adapt, and their effects on local ecosystems, depends on genomic variation and the consequences of admixture in novel environments. Here, we use a biocontrol system to examine the genome-wide outcomes of introduction, spread, and hybridization in four cryptic species of a biocontrol agent, the tamarisk beetle (Diorhabda carinata, D. carinulata, D. elongata, and D. sublineata), introduced from six localities across Eurasia to control the invasive shrub tamarisk (Tamarix spp.) in western North America. We assembled a de novo draft reference genome and applied RADseq to over 500 individuals across laboratory cultures, the native ranges, and the introduced range. Despite evidence of a substantial genetic bottleneck among D. carinulata in N. America, populations continue to establish and spread, possibly due to aggregation behavior. We found that D. carinata, D. elongata, and D. sublineata hybridize in the field to varying extents, with D. carinata × D. sublineata hybrids being the most abundant. Genetic diversity was greater at sites with hybrids, highlighting potential for increased ability to adapt and expand. Our results demonstrate the complex patterns of genomic variation that can result from introduction of multiple ecotypes or species for biocontrol, and the importance of understanding them to predict and manage the effects of biocontrol agents in novel ecosystems.
Affiliation(s)
- Amanda R. Stahlke
  - Initiative for Bioinformatics and Evolutionary Studies, Department of Biological Sciences, University of Idaho, Moscow, Idaho, USA
  - U.S. Department of Agriculture, Agricultural Research Service (USDA-ARS), Beltsville Agricultural Research Center, Bee Research Laboratory, Beltsville, Maryland, USA
- Ellyn V. Bitume
  - U.S. Department of Agriculture, Agricultural Research Service (USDA-ARS), Invasive Species and Pollinator Health Research Unit, Albany, California, USA
  - U.S. Department of Agriculture, Forest Service (USDA-FS), Pacific Southwest, Institute of Pacific Islands Forestry, Hilo, Hawaii, USA
- Zeynep A. Özsoy
  - Department of Biological Sciences, Colorado Mesa University, Grand Junction, Colorado, USA
- Dan W. Bean
  - Colorado Department of Agriculture, Palisade, Colorado, USA
- Anne Veillet
  - Initiative for Bioinformatics and Evolutionary Studies, Department of Biological Sciences, University of Idaho, Moscow, Idaho, USA
- Meaghan I. Clark
  - Department of Integrative Biology, Michigan State University, East Lansing, Michigan, USA
- Eliza I. Clark
  - Agricultural Biology, Colorado State University, Fort Collins, Colorado, USA
  - Graduate Degree Program in Ecology, Colorado State University, Fort Collins, Colorado, USA
- Patrick Moran
  - U.S. Department of Agriculture, Agricultural Research Service (USDA-ARS), Invasive Species and Pollinator Health Research Unit, Albany, California, USA
- Ruth A. Hufbauer
  - Agricultural Biology, Colorado State University, Fort Collins, Colorado, USA
  - Graduate Degree Program in Ecology, Colorado State University, Fort Collins, Colorado, USA
- Paul A. Hohenlohe
  - Initiative for Bioinformatics and Evolutionary Studies, Department of Biological Sciences, University of Idaho, Moscow, Idaho, USA
56
Liu Z, Roesti M, Marques D, Hiltbrunner M, Saladin V, Peichel CL. Chromosomal fusions facilitate adaptation to divergent environments in threespine stickleback. Mol Biol Evol 2021; 39:6462204. [PMID: 34908155 PMCID: PMC8826639 DOI: 10.1093/molbev/msab358] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 11/21/2022] Open
Abstract
Chromosomal fusions are hypothesized to facilitate adaptation to divergent environments, both by bringing together previously unlinked adaptive alleles and by creating regions of low recombination that facilitate the linkage of adaptive alleles, but there is little empirical evidence to support this hypothesis. Here, we address this knowledge gap by studying threespine stickleback (Gasterosteus aculeatus), in which ancestral marine fish have repeatedly adapted to freshwater across the northern hemisphere. By comparing the threespine and ninespine stickleback (Pungitius pungitius) genomes to a de novo assembly of the fourspine stickleback (Apeltes quadracus) and an outgroup species, we find that two chromosomal fusion events involving the same chromosomes have occurred independently in the threespine and ninespine stickleback lineages. On the fused chromosomes in threespine stickleback, we find an enrichment of quantitative trait loci underlying traits that contribute to marine versus freshwater adaptation. By comparing whole-genome sequences of freshwater and marine threespine stickleback populations, we also find an enrichment of regions under divergent selection on these two fused chromosomes. There is elevated genetic diversity within regions under selection in the freshwater population, consistent with a simulation study showing that gene flow can increase diversity in genomic regions associated with local adaptation and with our demographic models showing gene flow between the marine and freshwater populations. Integrating our results with previous studies, we propose that these fusions created regions of low recombination that enabled the formation of adaptive clusters, thereby facilitating freshwater adaptation in the face of recurrent gene flow between marine and freshwater threespine sticklebacks.
Affiliation(s)
- Zuyao Liu
  - Division of Evolutionary Ecology, Institute of Ecology and Evolution, University of Bern, Bern, Switzerland
- Marius Roesti
  - Division of Evolutionary Ecology, Institute of Ecology and Evolution, University of Bern, Bern, Switzerland
- David Marques
  - Division of Aquatic Ecology and Evolution, Institute of Ecology and Evolution, University of Bern, Bern, Switzerland
  - Department of Fish Ecology and Evolution, Centre for Ecology, Evolution, and Biogeochemistry, Swiss Federal Institute of Aquatic Science and Technology (EAWAG), Kastanienbaum, Switzerland
  - Natural History Museum Basel, Basel, Switzerland
- Melanie Hiltbrunner
  - Division of Evolutionary Ecology, Institute of Ecology and Evolution, University of Bern, Bern, Switzerland
- Verena Saladin
  - Division of Evolutionary Ecology, Institute of Ecology and Evolution, University of Bern, Bern, Switzerland
- Catherine L Peichel
  - Division of Evolutionary Ecology, Institute of Ecology and Evolution, University of Bern, Bern, Switzerland
57
Guan D, McCarthy SA, Ning Z, Wang G, Wang Y, Durbin R. Efficient iterative Hi-C scaffolder based on N-best neighbors. BMC Bioinformatics 2021; 22:569. [PMID: 34837944 PMCID: PMC8627104 DOI: 10.1186/s12859-021-04453-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 05/02/2021] [Accepted: 09/07/2021] [Indexed: 01/27/2023] Open
Abstract
BACKGROUND Efficient and effective genome scaffolding tools are still in high demand for generating reference-quality assemblies. While long-read data alone is unlikely to create a chromosome-scale assembly for most eukaryotic species, the inexpensive Hi-C sequencing technology, capable of capturing the chromosomal profile of a genome, is now widely used to complete the task. However, the existing Hi-C based scaffolding tools either require an a priori chromosome number as input, or lack the ability to build highly continuous scaffolds. RESULTS We design and develop a novel Hi-C based scaffolding tool, pin_hic, which takes advantage of contact information from Hi-C reads to construct a scaffolding graph iteratively based on N-best neighbors of contigs. Subsequent to scaffolding, it identifies potential misjoins and breaks them to maintain scaffolding accuracy. Through our tests on three long-read based de novo assemblies from three different species, we demonstrate that pin_hic is more efficient than current state-of-the-art tools, and it can generate much more continuous scaffolds, while achieving a higher or comparable accuracy. CONCLUSIONS Pin_hic is an efficient Hi-C based scaffolding tool, which can be useful for building chromosome-scale assemblies. As many sequencing projects have been launched in recent years, we believe pin_hic has the potential to be applied in these projects and make a meaningful contribution.
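The abstract states only that the scaffolding graph is built from the N-best neighbors of contigs. One way to picture that idea is sketched below. This is an illustration under stated assumptions, not pin_hic's implementation: the function `n_best_neighbor_edges`, the contact-count dictionary, and the rule that an edge is kept only when each contig is among the other's N best are all hypothetical choices made for the example.

```python
from collections import defaultdict


def n_best_neighbor_edges(contacts, n_best=2):
    """Keep contig pairs that rank in each other's top-N contact lists.

    contacts maps a pair of contig names (a, b) to a contact weight,
    e.g. a (hypothetical) normalised Hi-C link count.
    """
    neighbors = defaultdict(list)
    for (a, b), weight in contacts.items():
        neighbors[a].append((weight, b))
        neighbors[b].append((weight, a))

    # For every contig, remember its n_best highest-weight partners.
    best = {}
    for contig, candidates in neighbors.items():
        candidates.sort(key=lambda wb: (-wb[0], wb[1]))
        best[contig] = {name for _, name in candidates[:n_best]}

    # Assumed filter: an edge survives only if the choice is mutual.
    edges = set()
    for a, b in contacts:
        if b in best[a] and a in best[b]:
            edges.add(tuple(sorted((a, b))))
    return sorted(edges)


# Toy contact weights between four hypothetical contigs.
toy_contacts = {
    ("ctg1", "ctg2"): 120,
    ("ctg2", "ctg3"): 95,
    ("ctg1", "ctg3"): 10,
    ("ctg3", "ctg4"): 80,
    ("ctg1", "ctg4"): 3,
}
print(n_best_neighbor_edges(toy_contacts, n_best=2))
```

On this toy input the surviving edges form a single path through the four contigs, which is the shape a scaffolder would then orient and join; a real tool would also need to handle contig ends, orientation, and the misjoin detection mentioned in the abstract.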
Affiliation(s)
- Dengfeng Guan
  - Center for Bioinformatics, Harbin Institute of Technology, Harbin, 150001, China
  - Department of Genetics, University of Cambridge, Cambridge, CB2 3EH, UK
  - Institute of Zoology, Chinese Academy of Sciences, Beijing, 100101, China
- Shane A McCarthy
  - Department of Genetics, University of Cambridge, Cambridge, CB2 3EH, UK
  - Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, CB10 1SA, UK
- Zemin Ning
  - Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, CB10 1SA, UK
- Guohua Wang
  - Center for Bioinformatics, Harbin Institute of Technology, Harbin, 150001, China
- Yadong Wang
  - Center for Bioinformatics, Harbin Institute of Technology, Harbin, 150001, China
- Richard Durbin
  - Department of Genetics, University of Cambridge, Cambridge, CB2 3EH, UK
  - Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, CB10 1SA, UK
58
Rahman A, Pachter L. SWALO: scaffolding with assembly likelihood optimization. Nucleic Acids Res 2021; 49:e117. [PMID: 34417615 PMCID: PMC8599790 DOI: 10.1093/nar/gkab717] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 12/22/2020] [Revised: 06/16/2021] [Accepted: 08/16/2021] [Indexed: 01/01/2023] Open
Abstract
Scaffolding, i.e. ordering and orienting contigs, is an important step in genome assembly. We present a method for scaffolding using second-generation sequencing reads based on likelihoods of genome assemblies. A generative model for sequencing is used to obtain maximum likelihood estimates of gaps between contigs and to estimate whether linking contigs into scaffolds would lead to an increase in the likelihood of the assembly. We then link contigs if they can be unambiguously joined or if the corresponding increase in likelihood is substantially greater than that of other possible joins of those contigs. The method is implemented in a tool called Swalo with approximations to make it efficient and applicable to large datasets. Analysis on real and simulated datasets reveals that it consistently makes more, or a similar number of, correct joins compared with other scaffolders while linking very few contigs incorrectly, thus outperforming other scaffolders and demonstrating that substantial improvement in genome assembly may be achieved through the use of statistical models. Swalo is freely available for download at https://atifrahman.github.io/SWALO/.
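To make the phrase "maximum likelihood estimates of gaps between contigs" concrete, here is a deliberately simplified sketch. It assumes paired-end insert sizes follow a normal distribution with known mean `mu` and standard deviation `sigma`, and it ignores the fact that only pairs long enough to span the gap are observed; Swalo's actual generative model is not described in the abstract and is presumably more elaborate. The names `log_likelihood`, `ml_gap` and the toy numbers are hypothetical.

```python
import math


def log_likelihood(gap, spans, mu, sigma):
    """Log-likelihood of a gap size under a normal insert-size model.

    Each value in spans is the part of a read pair's insert that lies
    on the two contigs, so the full insert size is span + gap.
    """
    total = 0.0
    for span in spans:
        z = (span + gap - mu) / sigma
        total += -0.5 * z * z - math.log(sigma * math.sqrt(2 * math.pi))
    return total


def ml_gap(spans, mu):
    """Closed-form maximiser of log_likelihood: mu minus the mean span."""
    return mu - sum(spans) / len(spans)


# Hypothetical read pairs bridging two contigs (values in bp).
toy_spans = [310, 290, 305, 295]
gap_hat = ml_gap(toy_spans, mu=500)
print(gap_hat)
print(log_likelihood(gap_hat, toy_spans, mu=500, sigma=30))
```

Comparing `log_likelihood` for competing joins of the same contigs is the kind of test the abstract alludes to: a join would be accepted only when its likelihood gain clearly exceeds that of the alternatives.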
Affiliation(s)
- Atif Rahman
  - Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720, USA
  - Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka 1205, Bangladesh
- Lior Pachter
  - Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720, USA
  - Departments of Mathematics and Molecular & Cell Biology, University of California, Berkeley, CA 94720, USA
  - Departments of Biology and Computing & Mathematical Sciences, California Institute of Technology, Pasadena, CA 91103, USA
59
Lu CW, Yao CT, Hung CM. Domestication obscures genomic estimates of population history. Mol Ecol 2021; 31:752-766. [PMID: 34779057 DOI: 10.1111/mec.16277] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/19/2021] [Revised: 11/05/2021] [Accepted: 11/08/2021] [Indexed: 11/28/2022]
Abstract
Domesticated species are valuable models to examine phenotypic evolution, and knowledge of domestication history is critical for understanding the trajectories of evolutionary changes. Sequentially Markov Coalescent models are often used to infer domestication history. However, domestication practices may obscure the signal left by population history, affecting demographic inference. Here we assembled the genomes of a recently domesticated species (the society finch) and its parent species (the white-rumped munia) to examine its domestication history. We applied genomic analyses to two society finch breeds and white-rumped munias to test whether domestication of the former resulted from inbreeding or hybridization. The society finch showed longer and more runs of homozygosity and lower genomic heterozygosity than the white-rumped munia, supporting an inbreeding origin in the former. Blocks of white-rumped munia and other ancestry in society finch genomes showed similar genetic distance between the two taxa, inconsistent with the hybridization origin hypothesis. We then applied two Sequentially Markov Coalescent models, psmc and smc++, to infer the demographic histories of both. Surprisingly, the two models did not reveal a recent population bottleneck, but instead the psmc model showed a specious, dramatic population increase in the society finch. Subsequently, we used simulated genomes based on an array of demographic scenarios to demonstrate that recent inbreeding, not hybridization, caused the distorted psmc population trajectory. Such analyses could have misled our understanding of the domestication process. Our findings stress caution when interpreting the histories of recently domesticated species inferred by psmc, arguing that these histories require multiple analyses to validate.
Affiliation(s)
- Chia-Wei Lu
  - Biodiversity Research Center, Academia Sinica, Taipei, Taiwan
- Cheng-Te Yao
  - Division of Zoology, Endemic Species Research Institute, Nantou, Taiwan
- Chih-Ming Hung
  - Biodiversity Research Center, Academia Sinica, Taipei, Taiwan
60
Coombe L, Li JX, Lo T, Wong J, Nikolic V, Warren RL, Birol I. LongStitch: high-quality genome assembly correction and scaffolding using long reads. BMC Bioinformatics 2021; 22:534. [PMID: 34717540 PMCID: PMC8557608 DOI: 10.1186/s12859-021-04451-7] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 06/17/2021] [Accepted: 10/19/2021] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND Generating high-quality de novo genome assemblies is foundational to the genomics study of model and non-model organisms. In recent years, long-read sequencing has greatly benefited genome assembly and scaffolding, a process by which assembled sequences are ordered and oriented through the use of long-range information. Long reads are better able to span repetitive genomic regions compared to short reads, and thus have tremendous utility for resolving problematic regions and helping generate more complete draft assemblies. Here, we present LongStitch, a scalable pipeline that corrects and scaffolds draft genome assemblies exclusively using long reads. RESULTS LongStitch incorporates multiple tools developed by our group and runs in up to three stages, which include initial assembly correction (Tigmint-long), followed by two incremental scaffolding stages (ntLink and ARKS-long). Tigmint-long and ARKS-long are misassembly correction and scaffolding utilities, respectively, previously developed for linked reads, that we adapted for long reads. Here, we describe the LongStitch pipeline and introduce our new long-read scaffolder, ntLink, which utilizes lightweight minimizer mappings to join contigs. LongStitch was tested on short- and long-read assemblies of Caenorhabditis elegans, Oryza sativa, and three different human individuals using corresponding nanopore long-read data, and improves the contiguity of each assembly from 1.2-fold up to 304.6-fold (as measured by NGA50 length). Furthermore, LongStitch generates more contiguous and correct assemblies compared to the state-of-the-art long-read scaffolder LRScaf in most tests, and consistently improves upon human assemblies in under five hours using less than 23 GB of RAM. CONCLUSIONS Due to its effectiveness and efficiency in improving draft assemblies using long reads, we expect LongStitch to benefit a wide variety of de novo genome assembly projects. The LongStitch pipeline is freely available at https://github.com/bcgsc/longstitch.
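The "lightweight minimizer mappings" mentioned for ntLink rest on the general minimizer technique: from every window of w consecutive k-mers, keep only the smallest one, so that two sequences sharing a long enough stretch also share sampled k-mers. The sketch below illustrates that general technique only. It orders k-mers lexicographically and ignores reverse complements, whereas real tools typically use hash values and canonical k-mers; the function names and toy sequences are assumptions and do not reflect ntLink's code.

```python
def minimizers(seq, k=3, w=4):
    """Return sorted (position, k-mer) minimizers of seq.

    For each window of w consecutive k-mers, the lexicographically
    smallest k-mer (leftmost on ties) is selected.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked = set()
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        offset = min(range(w), key=lambda j: (window[j], j))
        picked.add((start + offset, window[offset]))
    return sorted(picked)


def shared_minimizers(seq_a, seq_b, k=3, w=4):
    """K-mers selected as minimizers in both sequences."""
    kmers_a = {kmer for _, kmer in minimizers(seq_a, k, w)}
    kmers_b = {kmer for _, kmer in minimizers(seq_b, k, w)}
    return kmers_a & kmers_b


# Toy "long read" and "contig" that overlap (hypothetical sequences).
print(minimizers("ACGTTGCA"))
print(shared_minimizers("ACGTTGCA", "TTACGTTG"))
```

A long read whose minimizers hit two different contigs provides evidence for their relative order and orientation, which is how a minimizer-based scaffolder can avoid full base-level alignment.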
Affiliation(s)
- Lauren Coombe
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer Research, 100-570 West 7th Avenue, Vancouver, BC, V5Z 4S6, Canada
- Janet X Li
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer Research, 100-570 West 7th Avenue, Vancouver, BC, V5Z 4S6, Canada
- Theodora Lo
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer Research, 100-570 West 7th Avenue, Vancouver, BC, V5Z 4S6, Canada
- Johnathan Wong
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer Research, 100-570 West 7th Avenue, Vancouver, BC, V5Z 4S6, Canada
- Vladimir Nikolic
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer Research, 100-570 West 7th Avenue, Vancouver, BC, V5Z 4S6, Canada
- René L Warren
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer Research, 100-570 West 7th Avenue, Vancouver, BC, V5Z 4S6, Canada
- Inanc Birol
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer Research, 100-570 West 7th Avenue, Vancouver, BC, V5Z 4S6, Canada
61
Chakraborty A, Mahajan S, Jaiswal SK, Sharma VK. Genome sequencing of turmeric provides evolutionary insights into its medicinal properties. Commun Biol 2021; 4:1193. [PMID: 34654884 PMCID: PMC8521574 DOI: 10.1038/s42003-021-02720-y] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 09/29/2020] [Accepted: 08/13/2021] [Indexed: 12/28/2022] Open
Abstract
Curcuma longa, or turmeric, is traditionally known for its immense medicinal properties and has diverse therapeutic applications. However, the absence of a reference genome sequence is a limiting factor in understanding the genomic basis of the origin of its medicinal properties. In this study, we present the draft genome sequence of C. longa, belonging to the Zingiberaceae plant family, constructed using 10x Genomics linked reads and Oxford Nanopore long reads. For comprehensive gene set prediction and for insights into its gene expression, transcriptome sequencing of leaf tissue was also performed. The draft genome assembly had a size of 1.02 Gbp with ~70% repetitive sequences, and contained 50,401 coding gene sequences. The phylogenetic position of C. longa was resolved through a comprehensive genome-wide analysis including 16 other plant species. Using 5,388 orthogroups, the comparative evolutionary analysis performed across 17 species including C. longa revealed evolution in genes associated with secondary metabolism, phytohormone signaling, and various biotic and abiotic stress tolerance responses. These mechanisms are crucial for perennial and rhizomatous plants such as C. longa for defense and environmental stress tolerance via production of secondary metabolites, which are associated with the wide range of medicinal properties in C. longa.
Affiliation(s)
- Abhisek Chakraborty
  - MetaBioSys Group, Department of Biological Sciences, Indian Institute of Science Education and Research Bhopal, Bhopal, India
- Shruti Mahajan
  - MetaBioSys Group, Department of Biological Sciences, Indian Institute of Science Education and Research Bhopal, Bhopal, India
- Shubham K Jaiswal
  - MetaBioSys Group, Department of Biological Sciences, Indian Institute of Science Education and Research Bhopal, Bhopal, India
- Vineet K Sharma
  - MetaBioSys Group, Department of Biological Sciences, Indian Institute of Science Education and Research Bhopal, Bhopal, India
62
|
Li K, Jiang W, Hui Y, Kong M, Feng LY, Gao LZ, Li P, Lu S. Gapless indica rice genome reveals synergistic contributions of active transposable elements and segmental duplications to rice genome evolution. MOLECULAR PLANT 2021; 14:1745-1756. [PMID: 34171481 DOI: 10.1016/j.molp.2021.06.017] [Citation(s) in RCA: 48] [Impact Index Per Article: 12.0] [Received: 05/04/2021] [Revised: 06/18/2021] [Accepted: 06/22/2021] [Indexed: 05/04/2023]
Abstract
The ultimate goal of genome assembly is a high-accuracy gapless genome. Here, we report a new assembly pipeline that is used to produce a gapless genome for the indica rice cultivar Minghui 63. The resulting 397.71-Mb final assembly is composed of 12 contigs with a contig N50 size of 31.93 Mb. Each chromosome is represented by a single contig and the genomic sequences of all chromosomes are gapless. Quality evaluation of this gapless genome assembly showed that gene regions in our assembly have the highest completeness compared with the other 15 reported high-quality rice genomes. Further comparison with the japonica rice genome revealed that the gapless indica genome assembly contains more transposable elements (TEs) and segmental duplications (SDs), the latter of which produce many duplicated genes that can affect agronomic traits through dose effect or sub-/neo-functionalization. The insertion of TEs can also affect the expression of duplicated genes, which may drive the evolution of these genes. Furthermore, we found the expansion of nucleotide-binding site with leucine-rich repeat disease-resistance genes and cis-zeatin-O-glucosyltransferase growth-related genes in SDs in the gapless indica genome assembly, suggesting that SDs contribute to the adaptive evolution of rice disease resistance and developmental processes. Collectively, our findings suggest that active TEs and SDs synergistically contribute to rice genome evolution.
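The contig N50 quoted in this abstract is a standard contiguity statistic: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. A minimal Python sketch of that definition follows. The function name `n50` and the toy contig lengths are illustrative assumptions and are not taken from the Minghui 63 assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half of the summed
    assembly length.
    """
    lengths = list(lengths)
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare doubled running sum to avoid fractional halves.
        if running * 2 >= total:
            return length


# Hypothetical contig lengths (arbitrary units), for illustration only.
toy_contigs = [80, 70, 50, 40, 30, 20]
toy_n50 = n50(toy_contigs)
```

With these toy lengths the total is 290, and the two longest contigs (80 + 70 = 150) already cover half of it, so the N50 is 70.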
Affiliation(s)
- Kui Li
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing 210023, China
- Wenkai Jiang
- Novogene Bioinformatics Institute, Building 301, Zone A10 Jiuxianqiao North Road, Chaoyang District, Beijing 100083, China
- Yuanyuan Hui
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing 210023, China
- Mengjuan Kong
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing 210023, China
- Li-Ying Feng
- Institution of Genomics and Bioinformatics, South China Agricultural University, Guangzhou 510642, China
- Li-Zhi Gao
- Institution of Genomics and Bioinformatics, South China Agricultural University, Guangzhou 510642, China
- Pengfu Li
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing 210023, China
- Shan Lu
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing 210023, China; Shenzhen Research Institute of Nanjing University, Shenzhen 518000, China
|
63
|
Morisse P, Lemaitre C, Legeai F. LRez: a C++ API and toolkit for analyzing and managing Linked-Reads data. BIOINFORMATICS ADVANCES 2021; 1:vbab022. [PMID: 36700107 PMCID: PMC9710615 DOI: 10.1093/bioadv/vbab022] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 07/13/2021] [Revised: 09/09/2021] [Accepted: 09/20/2021] [Indexed: 01/28/2023]
Abstract
Motivation Linked-Reads technologies combine both the high quality and low cost of short-reads sequencing and long-range information, through the use of barcodes tagging reads which originate from a common long DNA molecule. This technology has been employed in a broad range of applications including genome assembly, phasing and scaffolding, as well as structural variant calling. However, to date, no tool or API dedicated to the manipulation of Linked-Reads data exists. Results We introduce LRez, a C++ API and toolkit that allows easy management of Linked-Reads data. LRez includes various functionalities, for computing numbers of common barcodes between genomic regions, extracting barcodes from BAM files, as well as indexing and querying BAM, FASTQ and gzipped FASTQ files to quickly fetch all reads or alignments containing a given barcode. LRez is compatible with a wide range of Linked-Reads sequencing technologies, and can thus be used in any tool or pipeline requiring barcode processing or indexing, in order to improve its performance. Availability and implementation LRez is implemented in C++, supported on Unix-based platforms and available under AGPL-3.0 License at https://github.com/morispi/LRez, and as a bioconda module. Supplementary information Supplementary data are available at Bioinformatics Advances online.
Affiliation(s)
- Pierre Morisse
- Univ Rennes, Inria, CNRS, IRISA, Rennes 35000, France. To whom correspondence should be addressed.
- Fabrice Legeai
- Univ Rennes, Inria, CNRS, IRISA, Rennes 35000, France; IGEPP, INRAE, Institut Agro, Univ Rennes, Rennes 35000, France
|
64
|
Freire R, Weisweiler M, Guerreiro R, Baig N, Hüttel B, Obeng-Hinneh E, Renner J, Hartje S, Muders K, Truberg B, Rosen A, Prigge V, Bruckmüller J, Lübeck J, Stich B. Chromosome-scale reference genome assembly of a diploid potato clone derived from an elite variety. G3-GENES GENOMES GENETICS 2021; 11:6371871. [PMID: 34534288 PMCID: PMC8664475 DOI: 10.1093/g3journal/jkab330] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 05/31/2021] [Accepted: 09/08/2021] [Indexed: 01/27/2023]
Abstract
Potato (Solanum tuberosum L.) is one of the most important crops with a worldwide production of 370 million metric tons. The objectives of this study were (1) to create a high-quality consensus sequence across the two haplotypes of a diploid clone derived from a tetraploid elite variety and assess the sequence divergence from the available potato genome assemblies, as well as among the two haplotypes; (2) to evaluate the new assembly’s usefulness for various genomic methods; and (3) to assess the performance of phasing in diploid and tetraploid clones, using linked-read sequencing technology. We used PacBio long reads coupled with 10x Genomics reads and proximity ligation scaffolding to create the dAg1_v1.0 reference genome sequence. With a final assembly size of 812 Mb, where 750 Mb are anchored to 12 chromosomes, our assembly is larger than other available potato reference sequences and high proportions of properly paired reads were observed for clones unrelated by pedigree to dAg1. Comparisons of the new dAg1_v1.0 sequence to other potato genome sequences point out the high divergence between the different potato varieties and illustrate the potential of using dAg1_v1.0 sequence in breeding applications.
Affiliation(s)
- Ruth Freire
- Institute for Quantitative Genetics and Genomics of Plants, Universitätsstraße 1, 40225 Düsseldorf, Germany
- Marius Weisweiler
- Institute for Quantitative Genetics and Genomics of Plants, Universitätsstraße 1, 40225 Düsseldorf, Germany
- Ricardo Guerreiro
- Institute for Quantitative Genetics and Genomics of Plants, Universitätsstraße 1, 40225 Düsseldorf, Germany
- Nadia Baig
- Institute for Quantitative Genetics and Genomics of Plants, Universitätsstraße 1, 40225 Düsseldorf, Germany
- Bruno Hüttel
- Max Planck-Genome-centre Cologne, Max Planck Institute for Plant Breeding, Carl-von-Linne-Weg 10, 50829 Köln, Germany
- Evelyn Obeng-Hinneh
- Böhm-Nordkartoffel Agrarproduktion GmbH & Co. OHG, Strehlow 19, 17111 Hohenmocker, Germany
- Juliane Renner
- Böhm-Nordkartoffel Agrarproduktion GmbH & Co. OHG, Strehlow 19, 17111 Hohenmocker, Germany
- Stefanie Hartje
- Böhm-Nordkartoffel Agrarproduktion GmbH & Co. OHG, Strehlow 19, 17111 Hohenmocker, Germany
- Katja Muders
- Nordring- Kartoffelzucht- und Vermehrungs- GmbH, Parkweg 4, 18190 Sanitz, Germany
- Bernd Truberg
- Nordring- Kartoffelzucht- und Vermehrungs- GmbH, Parkweg 4, 18190 Sanitz, Germany
- Arne Rosen
- Nordring- Kartoffelzucht- und Vermehrungs- GmbH, Parkweg 4, 18190 Sanitz, Germany
- Vanessa Prigge
- SaKa Pflanzenzucht GmbH & Co. KG, Zuchtstation Windeby, Eichenallee 9, 24340 Windeby, Germany
- Jens Lübeck
- Solana Research GmbH, Eichenallee 9, 24340 Windeby, Germany
- Benjamin Stich
- Institute for Quantitative Genetics and Genomics of Plants, Universitätsstraße 1, 40225 Düsseldorf, Germany; Cluster of Excellence on Plant Sciences, From Complex Traits towards Synthetic Modules, Universitätsstraße 1, 40225 Düsseldorf, Germany
|
65
|
Sène MA, Kiesslich S, Djambazian H, Ragoussis J, Xia Y, Kamen AA. Haplotype-resolved de novo assembly of the Vero cell line genome. NPJ Vaccines 2021; 6:106. [PMID: 34417462 PMCID: PMC8379168 DOI: 10.1038/s41541-021-00358-9] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 09/11/2020] [Accepted: 07/12/2021] [Indexed: 01/13/2023] Open
Abstract
The Vero cell line is the most used continuous cell line for viral vaccine manufacturing with more than 40 years of accumulated experience in the vaccine industry. Additionally, the Vero cell line has shown a high affinity for infection by MERS-CoV, SARS-CoV, and recently SARS-CoV-2, emerging as an important discovery and screening tool to support the global research and development efforts in this COVID-19 pandemic. However, the lack of a reference genome for the Vero cell line has limited our understanding of host–virus interactions underlying such affinity of the Vero cell towards key emerging pathogens, and more importantly our ability to redesign high-yield vaccine production processes using Vero genome editing. In this paper, we present an annotated highly contiguous 2.9 Gb assembly of the Vero cell genome. In addition, several viral genome insertions, including Adeno-associated virus serotypes 3, 4, 7, and 8, have been identified, giving valuable insights into quality control considerations for cell-based vaccine production systems. Variant calling revealed that, in addition to interferon genes, chemokine- and caspase-related genes have lost their functions. Surprisingly, the ACE2 gene, which was previously identified as the host cell entry receptor for SARS-CoV and SARS-CoV-2, also lost function in the Vero genome due to structural variations.
Affiliation(s)
- Sascha Kiesslich
- Department of Bioengineering, McGill University, Montreal, QC, Canada
- Yu Xia
- Department of Bioengineering, McGill University, Montreal, QC, Canada
- Amine A Kamen
- Department of Bioengineering, McGill University, Montreal, QC, Canada
|
66
|
Hiltunen M, Ryberg M, Johannesson H. ARBitR: an overlap-aware genome assembly scaffolder for linked reads. Bioinformatics 2021; 37:2203-2205. [PMID: 33216122 PMCID: PMC8352505 DOI: 10.1093/bioinformatics/btaa975] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 06/23/2020] [Revised: 10/22/2020] [Accepted: 11/10/2020] [Indexed: 12/02/2022] Open
Abstract
Summary Linked genomic sequencing reads contain information that can be used to join sequences together into scaffolds in draft genome assemblies. Existing software for this purpose performs the scaffolding by joining sequences with a gap between them, not considering potential overlaps of contigs. We developed ARBitR to create scaffolds where overlaps are taken into account and show that it can accurately recreate regions where draft assemblies are broken. Availability and implementation ARBitR is written and implemented in Python3 for Unix-based operating systems. All source code is available at https://github.com/markhilt/ARBitR under the GNU General Public License v3. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Markus Hiltunen
- Department of Organismal Biology, Uppsala University, 75236 Uppsala, Sweden
- Martin Ryberg
- Department of Organismal Biology, Uppsala University, 75236 Uppsala, Sweden
- Hanna Johannesson
- Department of Organismal Biology, Uppsala University, 75236 Uppsala, Sweden
|
67
|
Yang X, Slotte T, Dainat J, Hambäck PA. Genome assemblies of three closely related leaf beetle species (Galerucella spp.). G3 (BETHESDA, MD.) 2021; 11:6307723. [PMID: 34849825 PMCID: PMC8496278 DOI: 10.1093/g3journal/jkab214] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/28/2021] [Accepted: 06/11/2021] [Indexed: 06/13/2023]
Abstract
Galerucella (Coleoptera: Chrysomelidae) is a leaf beetle genus that has been extensively used for ecological and evolutionary studies. It has also been used as a biological control agent against invading purple loosestrife in North America, with large effects on biodiversity. Here, we report genome assembly and annotation of three closely related Galerucella species: G. calmariensis, G. pusilla, and G. tenella. The three assemblies have a genome size ranging from 460 to 588 Mbp, with N50 from 31,588 to 79,674 kbp, containing 29,202 to 40,929 scaffolds. Using an ab initio evidence-driven approach, 30,302 to 33,794 protein-coding genes were identified and functionally annotated. These draft genomes will contribute to the understanding of host-parasitoid interactions, evolutionary comparisons of leaf beetle species and future population genomics studies.
Affiliation(s)
- Xuyue Yang
- Department of Ecology, Environment and Plant Sciences, Stockholm University, Stockholm 10691, Sweden
- Tanja Slotte
- Department of Ecology, Environment and Plant Sciences, Stockholm University, Stockholm 10691, Sweden
- Jacques Dainat
- Department of Medical Biochemistry Microbiology and Genomics, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, Uppsala University, Uppsala 75237, Sweden
- Peter A Hambäck
- Department of Ecology, Environment and Plant Sciences, Stockholm University, Stockholm 10691, Sweden
|
68
|
Genome of the world's smallest flowering plant, Wolffia australiana, helps explain its specialized physiology and unique morphology. Commun Biol 2021; 4:900. [PMID: 34294872 PMCID: PMC8298427 DOI: 10.1038/s42003-021-02422-5] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 01/18/2021] [Accepted: 06/17/2021] [Indexed: 11/17/2022] Open
Abstract
Watermeal, Wolffia australiana, is the smallest known flowering monocot and is rich in protein. Despite its great potential as a biotech crop, basic research on Wolffia is in its infancy. Here, we generated the reference genome of a species of watermeal, W. australiana, and identified the genome-wide features that may contribute to its atypical anatomy and physiology, including the absence of roots, adaxial stomata development, and anaerobic life as a turion. In addition, we found evidence of extensive genome rearrangements that may underpin the specialized aquatic lifestyle of watermeal. Analysis of the gene inventory of this intriguing species helps explain the distinct characteristics of W. australiana and its unique evolutionary trajectory.
Halim Park and Jin Hwa Park et al. report the nuclear genome sequence of the duckweed Wolffia australiana, the smallest known flowering plant. The genome assembly represents an improvement over a recently published genome and highlights genome rearrangements that may be linked to its specialized aquatic adaptations.
|
69
|
Kim SH, Lee SJ, Jo E, Kim J, Kim JU, Kim JH, Park H, Chi YM. Genome of the Southern Giant Petrel Assembled Using Third-Generation DNA Sequencing and Linked Reads Reveals Evolutionary Traits of Southern Avian. Animals (Basel) 2021; 11:ani11072046. [PMID: 34359174 PMCID: PMC8300169 DOI: 10.3390/ani11072046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2021] [Revised: 07/06/2021] [Accepted: 07/08/2021] [Indexed: 11/16/2022] Open
Abstract
The southern giant petrel Macronectes giganteus, a large seabird of the southern oceans, is one of only two members of the genus Macronectes and is the largest species in the order Procellariiformes. Although these two families account for the vast majority of the avian fauna inhabiting the Antarctic and sub-Antarctic regions, studies on the status of some populations and the associated genetic data are currently extremely limited. In this study, we assembled the genome of M. giganteus by integrating Pacific Biosciences single-molecule real-time sequencing and the Chromium system developed by 10x Genomics. The final M. giganteus genome assembly was 1.248 Gb in size with a scaffold N50 length of 27.4 Mb and a longest scaffold length of 120.4 Mb. The M. giganteus genome contains 14,993 predicted protein-coding genes and has 11.06% repeat sequences. Estimated historical effective population size analysis indicated that the southern giant petrel underwent a severe reduction in effective population size during a period coinciding with the early Pleistocene. The availability of this newly sequenced genome will facilitate more effective genetic monitoring of threatened species. Furthermore, the genome will provide a valuable resource for gene functional studies and further comparative genomic studies on the life history and ecological traits of specific avian species.
Affiliation(s)
- Sun-Hee Kim
- Department of Biotechnology, College of Life Sciences and Biotechnology, Korea University, Seoul 02841, Korea
- Greenwitch Co., 20, Jeungpyeong 2 Sandan-ro, Doan-myeon, Jeungpyeong-gun 27902, Korea
- Seung-Jae Lee
- Department of Biotechnology, College of Life Sciences and Biotechnology, Korea University, Seoul 02841, Korea
- Euna Jo
- Department of Biotechnology, College of Life Sciences and Biotechnology, Korea University, Seoul 02841, Korea
- Division of Life Sciences, Korea Polar Research Institute (KOPRI), Yeonsu-gu, Incheon 21990, Korea
- Jangyeon Kim
- Department of Biotechnology, College of Life Sciences and Biotechnology, Korea University, Seoul 02841, Korea
- Jong-U Kim
- Division of Life Sciences, Korea Polar Research Institute (KOPRI), Yeonsu-gu, Incheon 21990, Korea
- Jeong-Hoon Kim
- Division of Life Sciences, Korea Polar Research Institute (KOPRI), Yeonsu-gu, Incheon 21990, Korea
- Hyun Park
- Department of Biotechnology, College of Life Sciences and Biotechnology, Korea University, Seoul 02841, Korea
- Young-Min Chi
- Department of Biotechnology, College of Life Sciences and Biotechnology, Korea University, Seoul 02841, Korea
- Correspondence (J.-H.K., H.P., Y.-M.C.): Tel.: +82-32-760-5513 (J.-H.K.); +82-2-3290-3051 (H.P.); +82-2-3290-3025 (Y.-M.C.)
|
70
|
Duckett DJ, Sullivan J, Pirro S, Carstens BC. Genomic Resources for the North American Water Vole (Microtus richardsoni) and the Montane Vole (Microtus montanus). GIGABYTE 2021; 2021:gigabyte19. [PMID: 36824326 PMCID: PMC9631978 DOI: 10.46471/gigabyte.19] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/22/2021] [Accepted: 05/04/2021] [Indexed: 11/09/2022] Open
Abstract
Voles of the genus Microtus are important research organisms, yet genomic resources are lacking. Such resources would benefit future studies of immunology, phylogeography, cryptic diversity, and more. We sequenced and assembled nuclear genomes from two subspecies of water vole (Microtus richardsoni) and from the montane vole (Microtus montanus). The water vole genomes were sequenced with Illumina and 10× Chromium plus Illumina sequencing, resulting in assemblies with ∼1,600,000 and ∼30,000 scaffolds, respectively. The montane vole was also assembled into ∼13,000 scaffolds using Illumina sequencing. Mitochondrial genome assemblies were also performed for both species. Structural and functional annotation for the best water vole nuclear genome resulted in ∼24,500 annotated genes, with 83% of these having functional annotations. Assembly quality statistics for our nuclear assemblies fall within the range of genomes previously published in the genus Microtus, making the water vole and montane vole genomes useful additions to currently available genomic resources.
Affiliation(s)
- Drew J. Duckett
- Department of Evolution, Ecology, and Organismal Biology, The Ohio State University, 1315 Kinnear Rd., Columbus, OH 43212, USA
- Jack Sullivan
- Department of Biological Sciences, University of Idaho, Box 443051, Moscow, ID 83844-3051, USA
- Stacy Pirro
- Iridian Genomes, Inc., 6213 Swords Way, Bethesda, MD 20817, USA
- Bryan C. Carstens
- Department of Evolution, Ecology, and Organismal Biology, The Ohio State University, 1315 Kinnear Rd., Columbus, OH 43212, USA
|
71
|
Lopes M, Louzada S, Gama-Carvalho M, Chaves R. Genomic Tackling of Human Satellite DNA: Breaking Barriers through Time. Int J Mol Sci 2021; 22:4707. [PMID: 33946766 PMCID: PMC8125562 DOI: 10.3390/ijms22094707] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/31/2021] [Revised: 04/24/2021] [Accepted: 04/27/2021] [Indexed: 12/12/2022] Open
Abstract
(Peri)centromeric repetitive sequences and, more specifically, satellite DNA (satDNA) sequences, constitute a major human genomic component. SatDNA sequences can vary on a large number of features, including nucleotide composition, complexity, and abundance. Several satDNA families have been identified and characterized in the human genome through time, albeit at different speeds. Human satDNA families present a high degree of sub-variability, leading to the definition of various subfamilies with different organization and clustered localization. Evolution of satDNA analysis has enabled the progressive characterization of satDNA features. Despite recent advances in the sequencing of centromeric arrays, comprehensive genomic studies to assess their variability are still required to provide accurate and proportional representation of satDNA (peri)centromeric/acrocentric short arm sequences. Approaches combining multiple techniques have been successfully applied and seem to be the path to follow for generating integrated knowledge in the promising field of human satDNA biology.
Affiliation(s)
- Mariana Lopes
- Laboratory of Cytogenomics and Animal Genomics (CAG), Department of Genetics and Biotechnology (DGB), University of Trás-os-Montes and Alto Douro (UTAD), 5000-801 Vila Real, Portugal
- Biosystems and Integrative Sciences Institute (BioISI), Faculty of Sciences, University of Lisbon, 1749-016 Lisbon, Portugal
- Sandra Louzada
- Laboratory of Cytogenomics and Animal Genomics (CAG), Department of Genetics and Biotechnology (DGB), University of Trás-os-Montes and Alto Douro (UTAD), 5000-801 Vila Real, Portugal
- Biosystems and Integrative Sciences Institute (BioISI), Faculty of Sciences, University of Lisbon, 1749-016 Lisbon, Portugal
- Margarida Gama-Carvalho
- Biosystems and Integrative Sciences Institute (BioISI), Faculty of Sciences, University of Lisbon, 1749-016 Lisbon, Portugal
- Raquel Chaves
- Laboratory of Cytogenomics and Animal Genomics (CAG), Department of Genetics and Biotechnology (DGB), University of Trás-os-Montes and Alto Douro (UTAD), 5000-801 Vila Real, Portugal
- Biosystems and Integrative Sciences Institute (BioISI), Faculty of Sciences, University of Lisbon, 1749-016 Lisbon, Portugal
|
72
|
Gavrielatos M, Kyriakidis K, Spandidos DA, Michalopoulos I. Benchmarking of next and third generation sequencing technologies and their associated algorithms for de novo genome assembly. Mol Med Rep 2021; 23:251. [PMID: 33537807 PMCID: PMC7893683 DOI: 10.3892/mmr.2021.11890] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 11/04/2020] [Accepted: 01/21/2021] [Indexed: 12/30/2022] Open
Abstract
Genome assemblers are computational tools for de novo genome assembly, based on a plenitude of primary sequencing data. The quality of genome assemblies is estimated by their contiguity and the occurrences of misassemblies (duplications, deletions, translocations or inversions). The rapid development of sequencing technologies has enabled the rise of novel de novo genome assembly strategies. The ultimate goal of such strategies is to utilise the features of each sequencing platform in order to address the existing weaknesses of each sequencing type and compose a complete and correct genome map. In the present study, the hybrid strategy, which is based on Illumina short paired-end reads and Nanopore long reads, was benchmarked using the MaSuRCA and Wengan assemblers. Moreover, the long-read assembly strategy was benchmarked using Canu for Nanopore reads, and using Hifiasm and HiCanu for PacBio HiFi reads. The assemblies were performed on a computational cluster with limited computational resources. Their outputs were evaluated in terms of accuracy and computational performance. The PacBio HiFi assembly strategy outperforms the other ones, while Hi-C scaffolding, which is based on chromatin 3D structure, is required in order to increase continuity, accuracy and completeness when large and complex genomes, such as the human one, are assembled. The use of Hi-C data is also necessary while using the hybrid assembly strategy. The results revealed that HiFi sequencing enabled the rise of novel algorithms which require less genome coverage than that of the other strategies, making the assembly a less computationally demanding task. Taken together, these developments may lead to the democratisation of genome assembly projects which are now approachable by smaller labs with limited technical and financial resources.
Affiliation(s)
- Marios Gavrielatos
- Centre of Systems Biology, Biomedical Research Foundation, Academy of Athens, 11527 Athens, Greece
- Department of Cell Biology and Biophysics, Faculty of Biology, University of Athens, 15701 Athens, Greece
- Konstantinos Kyriakidis
- School of Pharmacy, Aristotle University of Thessaloniki (AUTh), 54124 Thessaloniki, Greece
- Genomics and Epigenomics Translational Research (GENeTres), Centre for Interdisciplinary Research and Innovation, 57001 Thessaloniki, Greece
- Demetrios A. Spandidos
- Laboratory of Clinical Virology, Medical School, University of Crete, 71003 Heraklion, Greece
- Ioannis Michalopoulos
- Centre of Systems Biology, Biomedical Research Foundation, Academy of Athens, 11527 Athens, Greece
|
73
|
Guo L, Xu M, Wang W, Gu S, Zhao X, Chen F, Wang O, Xu X, Seim I, Fan G, Deng L, Liu X. SLR-superscaffolder: a de novo scaffolding tool for synthetic long reads using a top-to-bottom scheme. BMC Bioinformatics 2021; 22:158. [PMID: 33765921 PMCID: PMC7993450 DOI: 10.1186/s12859-021-04081-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 07/28/2020] [Accepted: 03/16/2021] [Indexed: 12/30/2022] Open
Abstract
Background Synthetic long reads (SLR) with long-range co-barcoding information are now widely applied in genomics research. Although several tools have been developed for each specific SLR technique, a robust standalone scaffolder with high efficiency is warranted for hybrid genome assembly. Results In this work, we developed a standalone scaffolding tool, SLR-superscaffolder, to link together contigs in draft assemblies using co-barcoding and paired-end read information. Our top-to-bottom scheme first builds a global scaffold graph based on Jaccard Similarity to determine the order and orientation of contigs, and then locally improves the scaffolds with the aid of paired-end information. We also exploited a screening algorithm to reduce the negative effect of misassembled contigs in the input assembly. We applied SLR-superscaffolder to a human single tube long fragment read sequencing dataset and increased the scaffold NG50 of its corresponding draft assembly 1349 fold. Moreover, benchmarking on different input contigs showed that this approach overall outperformed existing SLR scaffolders, providing longer contiguity and fewer misassemblies, especially for short contigs assembled by next-generation sequencing data. The open-source code of SLR-superscaffolder is available at https://github.com/BGI-Qingdao/SLR-superscaffolder. Conclusions SLR-superscaffolder can dramatically improve the contiguity of a draft assembly by integrating a hybrid assembly strategy.
Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04081-z.
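The abstract above describes building a scaffold graph from the Jaccard similarity of co-barcoding information. As a rough illustration of that idea only (not SLR-superscaffolder's actual implementation), the Python sketch below scores contig pairs by the Jaccard similarity of their barcode sets and keeps pairs above a cutoff as candidate graph edges. The function names, barcode labels, and threshold are all hypothetical.

```python
from itertools import combinations


def jaccard(barcodes_a, barcodes_b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two barcode collections."""
    a, b = set(barcodes_a), set(barcodes_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_links(contig_barcodes, min_similarity):
    """Return (contig, contig, score) for pairs sharing enough barcodes.

    contig_barcodes maps a contig name to the set of barcodes whose reads
    align to it; min_similarity is an illustrative cutoff, not a value
    taken from the paper.
    """
    links = []
    for left, right in combinations(sorted(contig_barcodes), 2):
        score = jaccard(contig_barcodes[left], contig_barcodes[right])
        if score >= min_similarity:
            links.append((left, right, score))
    return links


# Toy data: contig_1 and contig_2 share two of four distinct barcodes.
toy = {
    "contig_1": {"BX01", "BX02", "BX03"},
    "contig_2": {"BX02", "BX03", "BX04"},
    "contig_3": {"BX09"},
}
links = candidate_links(toy, min_similarity=0.3)
```

Here only the pair (contig_1, contig_2) passes the cutoff, with a similarity of 0.5. A real scaffolder would additionally need to resolve contig order and orientation, which this sketch does not attempt.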
Collapse
Affiliation(s)
- Lidong Guo
- BGI Education Center, University of Chinese Academy of Sciences, Shenzhen, 518083, China
- BGI-Qingdao, BGI-Shenzhen, Qingdao, 266555, China
- State Key Laboratory of Agricultural Genomics, BGI-Shenzhen, Shenzhen, 518083, China
| | - Mengyang Xu
- BGI-Qingdao, BGI-Shenzhen, Qingdao, 266555, China
- State Key Laboratory of Agricultural Genomics, BGI-Shenzhen, Shenzhen, 518083, China
- BGI-Shenzhen, Shenzhen, 518083, China
- China National GeneBank, BGI-Shenzhen, Shenzhen, 518120, China
| | - Wenchao Wang
- BGI-Qingdao, BGI-Shenzhen, Qingdao, 266555, China
| | - Shengqiang Gu
- BGI Education Center, University of Chinese Academy of Sciences, Shenzhen, 518083, China
| | - Xia Zhao
- MGI, BGI-Shenzhen, Shenzhen, 518083, China
| | - Fang Chen
- MGI, BGI-Shenzhen, Shenzhen, 518083, China
| | - Ou Wang
- BGI-Shenzhen, Shenzhen, 518083, China
- China National GeneBank, BGI-Shenzhen, Shenzhen, 518120, China
| | - Xun Xu
- BGI-Shenzhen, Shenzhen, 518083, China
- China National GeneBank, BGI-Shenzhen, Shenzhen, 518120, China
| | - Inge Seim
- Integrative Biology Laboratory, College of Life Sciences, Nanjing Normal University, Nanjing, 210046, China
- School of Biology and Environmental Science, Queensland University of Technology, Brisbane, 4000, Australia
| | - Guangyi Fan
- BGI-Qingdao, BGI-Shenzhen, Qingdao, 266555, China
- State Key Laboratory of Agricultural Genomics, BGI-Shenzhen, Shenzhen, 518083, China
- BGI-Shenzhen, Shenzhen, 518083, China
- China National GeneBank, BGI-Shenzhen, Shenzhen, 518120, China
| | - Li Deng
- BGI-Qingdao, BGI-Shenzhen, Qingdao, 266555, China
- State Key Laboratory of Agricultural Genomics, BGI-Shenzhen, Shenzhen, 518083, China
- BGI-Shenzhen, Shenzhen, 518083, China
- China National GeneBank, BGI-Shenzhen, Shenzhen, 518120, China
| | - Xin Liu
- BGI-Qingdao, BGI-Shenzhen, Qingdao, 266555, China
- State Key Laboratory of Agricultural Genomics, BGI-Shenzhen, Shenzhen, 518083, China
- BGI-Shenzhen, Shenzhen, 518083, China
- China National GeneBank, BGI-Shenzhen, Shenzhen, 518120, China
| |
|
74
|
Mathers TC, Wouters RHM, Mugford ST, Swarbreck D, van Oosterhout C, Hogenhout SA. Chromosome-Scale Genome Assemblies of Aphids Reveal Extensively Rearranged Autosomes and Long-Term Conservation of the X Chromosome. Mol Biol Evol 2021; 38:856-875. [PMID: 32966576 PMCID: PMC7947777 DOI: 10.1093/molbev/msaa246] [Citation(s) in RCA: 55] [Impact Index Per Article: 13.8] [Indexed: 12/16/2022] Open
Abstract
Chromosome rearrangements are arguably the most dramatic type of mutations, often leading to rapid evolution and speciation. However, chromosome dynamics have only been studied at the sequence level in a small number of model systems. In insects, Diptera and Lepidoptera have conserved genome structure at the scale of whole chromosomes or chromosome arms. Whether this reflects the diversity of insect genome evolution is questionable given that many species exhibit rapid karyotype evolution. Here, we investigate chromosome evolution in aphids, an important group of hemipteran plant pests, using newly generated chromosome-scale genome assemblies of the green peach aphid (Myzus persicae) and the pea aphid (Acyrthosiphon pisum), and a previously published assembly of the corn-leaf aphid (Rhopalosiphum maidis). We find that aphid autosomes have undergone dramatic reorganization over the last 30 My, to the extent that chromosome homology cannot be determined between aphids from the tribes Macrosiphini (Myzus persicae and Acyrthosiphon pisum) and Aphidini (Rhopalosiphum maidis). In contrast, gene content of the aphid sex (X) chromosome remained unchanged despite rapid sequence evolution, low gene expression, and high transposable element load. To test whether rapid evolution of genome structure is a hallmark of Hemiptera, we compared our aphid assemblies with chromosome-scale assemblies of two blood-feeding Hemiptera (Rhodnius prolixus and Triatoma rubrofasciata). Despite being more diverged, the blood-feeding hemipterans have conserved synteny. The exceptional rate of structural evolution of aphid autosomes renders them an important emerging model system for studying the role of large-scale genome rearrangements in evolution.
Affiliation(s)
- Thomas C Mathers
- Department of Crop Genetics, John Innes Centre, Norwich Research Park, Norwich, United Kingdom
| | - Roland H M Wouters
- Department of Crop Genetics, John Innes Centre, Norwich Research Park, Norwich, United Kingdom
| | - Sam T Mugford
- Department of Crop Genetics, John Innes Centre, Norwich Research Park, Norwich, United Kingdom
| | - David Swarbreck
- Earlham Institute, Norwich Research Park, Norwich, United Kingdom
| | - Cock van Oosterhout
- School of Environmental Sciences, University of East Anglia, Norwich, United Kingdom
| | - Saskia A Hogenhout
- Department of Crop Genetics, John Innes Centre, Norwich Research Park, Norwich, United Kingdom
| |
|
75
|
Smith CH. A High-Quality Reference Genome for a Parasitic Bivalve with Doubly Uniparental Inheritance (Bivalvia: Unionida). Genome Biol Evol 2021; 13:evab029. [PMID: 33570560 PMCID: PMC7937423 DOI: 10.1093/gbe/evab029] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Accepted: 02/08/2021] [Indexed: 12/16/2022] Open
Abstract
From a genomics perspective, bivalves (Mollusca: Bivalvia) have been poorly explored, with the exception of those of high economic value. The bivalve order Unionida, or freshwater mussels, has been of interest in recent genomic studies due to their unique mitochondrial biology and peculiar life cycle. However, genomic studies have been hindered by the lack of a high-quality reference genome. Here, I present a genome assembly of Potamilus streckersoni using Pacific Biosciences single-molecule real-time long reads and 10X Genomics linked-read sequencing. Further, I use RNA sequencing from multiple tissue types and life stages to annotate the reference genome. The final assembly was far superior to any previously published freshwater mussel genome and was represented by 2,368 scaffolds (2,472 contigs) and 1,776,755,624 bp, with a scaffold N50 of 2,051,244 bp. A high proportion of the assembly consisted of repetitive elements (51.03%), consistent with genomic characteristics of other bivalves. The functional annotation returned 52,407 gene models (41,065 protein, 11,342 tRNAs), which was concordant with the estimated number of genes in other freshwater mussel species. This genetic resource, along with future studies developing high-quality genome assemblies and annotations, will be integral toward unraveling the genomic bases of ecologically and evolutionarily important traits in this hyper-diverse group.
Affiliation(s)
- Chase H Smith
- Department of Integrative Biology, University of Texas, Austin, Texas, USA
- Biology Department, Baylor University, Waco, Texas, USA
| |
|
76
|
Luo J, Wei Y, Lyu M, Wu Z, Liu X, Luo H, Yan C. A comprehensive review of scaffolding methods in genome assembly. Brief Bioinform 2021; 22:6149347. [PMID: 33634311 DOI: 10.1093/bib/bbab033] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 09/14/2020] [Revised: 01/21/2021] [Accepted: 01/22/2021] [Indexed: 12/20/2022] Open
Abstract
In the field of genome assembly, scaffolding methods make it possible to obtain a more complete and contiguous reference genome, which is the cornerstone of genomic research. Scaffolding methods typically utilize the alignments between contigs and sequencing data (reads) to determine the orientation and order among contigs and to produce longer scaffolds, which are helpful for genomic downstream analysis. With the rapid development of high-throughput sequencing technologies, diverse types of reads have emerged over the past decade, especially in long-range sequencing, which have greatly enhanced the assembly quality of scaffolding methods. As the number of scaffolding methods increases, biology and bioinformatics researchers need to perform in-depth analyses of state-of-the-art scaffolding methods. In this article, we focus on the difficulties in scaffolding, the differences in characteristics among various kinds of reads, the methods by which current scaffolding methods address these difficulties, and future research opportunities. We hope this work will benefit the design of new scaffolding methods and the selection of appropriate scaffolding methods for specific biological studies.
Affiliation(s)
- Junwei Luo
- College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, China
| | - Yawei Wei
- College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, China
| | - Mengna Lyu
- College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, China
| | - Zhengjiang Wu
- College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, China
| | - Xiaoyan Liu
- College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, China
| | - Huimin Luo
- School of Computer and Information Engineering, Henan University, Kaifeng, China
| | - Chaokun Yan
- School of Computer and Information Engineering, Henan University, Kaifeng, China
| |
|
77
|
Wang C, Wallerman O, Arendt ML, Sundström E, Karlsson Å, Nordin J, Mäkeläinen S, Pielberg GR, Hanson J, Ohlsson Å, Saellström S, Rönnberg H, Ljungvall I, Häggström J, Bergström TF, Hedhammar Å, Meadows JRS, Lindblad-Toh K. A novel canine reference genome resolves genomic architecture and uncovers transcript complexity. Commun Biol 2021; 4:185. [PMID: 33568770 PMCID: PMC7875987 DOI: 10.1038/s42003-021-01698-x] [Citation(s) in RCA: 73] [Impact Index Per Article: 18.3] [Received: 08/12/2020] [Accepted: 12/17/2020] [Indexed: 12/13/2022] Open
Abstract
We present GSD_1.0, a high-quality domestic dog reference genome with chromosome-length scaffolds and contiguity increased 55-fold over CanFam3.1. Annotation with generated and existing long- and short-read RNA-seq, miRNA-seq and ATAC-seq data revealed that 32.1% of lifted-over CanFam3.1 gaps harboured previously hidden functional elements, including promoters, genes and miRNAs in GSD_1.0. A catalogue of canine "dark" regions was made to facilitate mapping rescue. Alignment in these regions is difficult, but we demonstrate that they harbour trait-associated variation. Key genomic regions were completed, including the Dog Leucocyte Antigen (DLA), T Cell Receptor (TCR) and 366 COSMIC cancer genes. 10x linked-read sequencing of 27 dogs (19 breeds) uncovered 22.1 million SNPs, indels and larger structural variants. Subsequent intersection with protein coding genes showed that 1.4% of these could directly influence gene products, and so provide a source of normal or aberrant phenotypic modifications.
Affiliation(s)
- Chao Wang
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden.
| | - Ola Wallerman
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
| | - Maja-Louise Arendt
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Department of Veterinary Clinical Sciences, University of Copenhagen, Frederiksberg D, Denmark
| | - Elisabeth Sundström
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
| | - Åsa Karlsson
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
| | - Jessika Nordin
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
| | - Suvi Mäkeläinen
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, Uppsala, Sweden
| | - Gerli Rosengren Pielberg
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
| | - Jeanette Hanson
- Department of Clinical Sciences, Swedish University of Agricultural Sciences, Uppsala, Sweden
| | - Åsa Ohlsson
- Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, Uppsala, Sweden
| | - Sara Saellström
- Department of Clinical Sciences, Swedish University of Agricultural Sciences, Uppsala, Sweden
| | - Henrik Rönnberg
- Department of Clinical Sciences, Swedish University of Agricultural Sciences, Uppsala, Sweden
| | - Ingrid Ljungvall
- Department of Clinical Sciences, Swedish University of Agricultural Sciences, Uppsala, Sweden
| | - Jens Häggström
- Department of Clinical Sciences, Swedish University of Agricultural Sciences, Uppsala, Sweden
| | - Tomas F Bergström
- Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, Uppsala, Sweden
| | - Åke Hedhammar
- Department of Clinical Sciences, Swedish University of Agricultural Sciences, Uppsala, Sweden
| | - Jennifer R S Meadows
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
| | - Kerstin Lindblad-Toh
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden.
- Broad Institute of MIT and Harvard, Cambridge, MA, USA.
| |
|
78
|
|
79
|
Gene Sequence Assembly Algorithm Model Based on the DBG Strategy and Its Application. Journal of Healthcare Engineering 2021. [DOI: 10.1155/2021/6676194] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/17/2022]
Abstract
With the continuous development of sequencing technology, the amount of bioinformatics data has increased geometrically, placing more stringent requirements on sequence assembly. The sequence assembly algorithm based on the DBG (De Bruijn graph) strategy is a key algorithm in bioinformatics and is widely used in gene sequence assembly. Current research on sequence assembly tends to focus on optimizing specific steps of a specific algorithm and lacks research on domain-level, highly abstract algorithm frameworks. To some extent, this leads to redundancy among sequence assembly algorithms, and manual selection of an algorithm may itself cause problems. This paper analyzes the domain of DBGSA and establishes a feature model of this domain. Based on the production programming method, the DBGSA algorithm components are interactively designed. With the support of the PAR platform, the DBGSA algorithm component library is formally implemented, and the component library is then used to assemble specific algorithms. This research adds domain-level research to the field of sequence assembly and implements the DBGSA component library, which can assemble specific sequence assembly algorithms, ensuring the efficiency of algorithm development and the reliability of the generated assembly algorithms. It also provides a valuable reference for solving problems in the domain of sequence assembly.
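For readers unfamiliar with the De Bruijn graph strategy mentioned in this abstract, the following minimal sketch shows the generic construction step: each k-mer in a read becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix. It is a textbook illustration under assumed toy reads, not the DBGSA component library or the PAR platform described in the paper; the function name `de_bruijn` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def de_bruijn(reads: List[str], k: int) -> Dict[str, List[str]]:
    """Build a De Bruijn graph as an adjacency list of (k-1)-mers.

    Every k-mer contributes one edge prefix -> suffix; repeated k-mers
    are kept as repeated edges so that multiplicity is preserved.
    """
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


# Toy reads (assumed for illustration) that overlap by three bases.
graph = de_bruijn(["ACGT", "CGTA"], k=3)
print(graph)  # {'AC': ['CG'], 'CG': ['GT', 'GT'], 'GT': ['TA']}
```

Walking the edges AC -> CG -> GT -> TA spells out ACGTA, the sequence implied by the two overlapping reads; real assemblers add error correction and graph simplification on top of this step.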
|
80
|
Hruska JP, Manthey JD. De novo assembly of a chromosome-scale reference genome for the northern flicker Colaptes auratus. G3 (Bethesda, Md.) 2021; 11:jkaa026. [PMID: 33561233 PMCID: PMC8022726 DOI: 10.1093/g3journal/jkaa026] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/21/2020] [Accepted: 11/12/2020] [Indexed: 11/15/2022]
Abstract
The northern flicker, Colaptes auratus, is a widely distributed North American woodpecker and a long-standing focal species for the study of ecology, behavior, phenotypic differentiation, and hybridization. We present here a highly contiguous de novo genome assembly of C. auratus, the first such assembly for the species and the first published chromosome-level assembly for woodpeckers (Picidae). The assembly was generated using a combination of short-read Chromium 10× and long-read PacBio sequencing, and further scaffolded with chromatin conformation capture (Hi-C) reads. The resulting genome assembly is 1.378 Gb in size, with a scaffold N50 of 43.948 Mb and a scaffold L50 of 11. This assembly contains 87.4-91.7% of genes present across four sets of universal single-copy orthologs found in tetrapods and birds. We annotated the assembly both for genes and repetitive content, identifying 18,745 genes and a prevalence of ∼28.0% repetitive elements. Lastly, we used fourfold degenerate sites from neutrally evolving genes to estimate a mutation rate for C. auratus, which we estimated to be 4.007 × 10⁻⁹ substitutions/site/year, about 1.5 times faster than an earlier mutation rate estimate for the family. The highly contiguous assembly and annotations we report will serve as a resource for future studies on the genomics of C. auratus and comparative evolution of woodpeckers.
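Several abstracts in this list report scaffold N50 and L50 values. As a reminder of the standard definitions, N50 is the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly, and L50 is the number of scaffolds in that set. The sketch below computes both; the scaffold lengths are made-up toy values, not data from any cited assembly.

```python
from typing import Iterable, Tuple


def n50_l50(lengths: Iterable[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a collection of scaffold or contig lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            # `length` is the N50 (a size); `count` is the L50 (a number of scaffolds).
            return length, count
    raise ValueError("lengths must be non-empty")


# Toy scaffold lengths (arbitrary units); total = 300, half = 150.
print(n50_l50([80, 70, 50, 40, 30, 20, 10]))  # (70, 2)
```

N50 is therefore always a length (for example in Mb) and L50 always a count, which is a quick sanity check when reading reported assembly statistics.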
Affiliation(s)
- Jack P Hruska
- Department of Biological Sciences, Texas Tech University, Lubbock, TX 79409-43131, USA
| | - Joseph D Manthey
- Department of Biological Sciences, Texas Tech University, Lubbock, TX 79409-43131, USA
| |
|
81
|
Peona V, Blom MPK, Xu L, Burri R, Sullivan S, Bunikis I, Liachko I, Haryoko T, Jønsson KA, Zhou Q, Irestedt M, Suh A. Identifying the causes and consequences of assembly gaps using a multiplatform genome assembly of a bird-of-paradise. Mol Ecol Resour 2021; 21:263-286. [PMID: 32937018 PMCID: PMC7757076 DOI: 10.1111/1755-0998.13252] [Citation(s) in RCA: 87] [Impact Index Per Article: 21.8] [Received: 04/14/2020] [Revised: 08/21/2020] [Accepted: 08/26/2020] [Indexed: 01/09/2023]
Abstract
Genome assemblies are currently being produced at an impressive rate by consortia and individual laboratories. The low costs and increasing efficiency of sequencing technologies now enable assembling genomes at unprecedented quality and contiguity. However, the difficulty in assembling repeat-rich and GC-rich regions (genomic "dark matter") limits insights into the evolution of genome structure and regulatory networks. Here, we compare the efficiency of currently available sequencing technologies (short/linked/long reads and proximity ligation maps) and combinations thereof in assembling genomic dark matter. By adopting different de novo assembly strategies, we compare individual draft assemblies to a curated multiplatform reference assembly and identify the genomic features that cause gaps within each assembly. We show that a multiplatform assembly implementing long-read, linked-read and proximity sequencing technologies performs best at recovering transposable elements, multicopy MHC genes, GC-rich microchromosomes and the repeat-rich W chromosome. Telomere-to-telomere assemblies are not a reality yet for most organisms, but by leveraging technology choice it is now possible to minimize genome assembly gaps for downstream analysis. We provide a roadmap to tailor sequencing projects for optimized completeness of both the coding and noncoding parts of nonmodel genomes.
Affiliation(s)
- Valentina Peona
- Department of Ecology and Genetics - Evolutionary Biology, Science for Life Laboratories, Uppsala University, Uppsala, Sweden
- Department of Organismal Biology - Systematic Biology, Science for Life Laboratories, Uppsala University, Uppsala, Sweden
| | - Mozes P. K. Blom
- Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Stockholm, Sweden
- Museum für Naturkunde, Leibniz Institut für Evolutions- und Biodiversitätsforschung, Berlin, Germany
| | - Luohao Xu
- Department of Neurosciences and Developmental Biology, University of Vienna, Vienna, Austria
| | - Reto Burri
- Department of Population Ecology, Institute of Ecology and Evolution, Friedrich-Schiller-University Jena, Jena, Germany
| | | | - Ignas Bunikis
- Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala Genome Center, Uppsala University, Uppsala, Sweden
| | | | - Tri Haryoko
- Research Centre for Biology, Museum Zoologicum Bogoriense, Indonesian Institute of Sciences (LIPI), Cibinong, Indonesia
| | - Knud A. Jønsson
- Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark
| | - Qi Zhou
- Department of Neurosciences and Developmental Biology, University of Vienna, Vienna, Austria
- MOE Laboratory of Biosystems Homeostasis & Protection, Life Sciences Institute, Zhejiang University, Hangzhou, China
- Center for Reproductive Medicine, The 2nd Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
| | - Martin Irestedt
- Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Stockholm, Sweden
| | - Alexander Suh
- Department of Ecology and Genetics - Evolutionary Biology, Science for Life Laboratories, Uppsala University, Uppsala, Sweden
- Department of Organismal Biology - Systematic Biology, Science for Life Laboratories, Uppsala University, Uppsala, Sweden
- School of Biological Sciences - Organisms and the Environment, University of East Anglia, Norwich, UK
| |
|
82
|
Doyle SR, Tracey A, Laing R, Holroyd N, Bartley D, Bazant W, Beasley H, Beech R, Britton C, Brooks K, Chaudhry U, Maitland K, Martinelli A, Noonan JD, Paulini M, Quail MA, Redman E, Rodgers FH, Sallé G, Shabbir MZ, Sankaranarayanan G, Wit J, Howe KL, Sargison N, Devaney E, Berriman M, Gilleard JS, Cotton JA. Genomic and transcriptomic variation defines the chromosome-scale assembly of Haemonchus contortus, a model gastrointestinal worm. Commun Biol 2020; 3:656. [PMID: 33168940 PMCID: PMC7652881 DOI: 10.1038/s42003-020-01377-3] [Citation(s) in RCA: 98] [Impact Index Per Article: 19.6] [Received: 03/05/2020] [Accepted: 10/14/2020] [Indexed: 12/31/2022] Open
Abstract
Haemonchus contortus is a globally distributed and economically important gastrointestinal pathogen of small ruminants and has become a key nematode model for studying anthelmintic resistance and other parasite-specific traits among a wider group of parasites including major human pathogens. Here, we report using PacBio long-read and OpGen and 10X Genomics long-molecule methods to generate a highly contiguous 283.4 Mbp chromosome-scale genome assembly including a resolved sex chromosome for the MHco3(ISE).N1 isolate. We show a remarkable pattern of conservation of chromosome content with Caenorhabditis elegans, but almost no conservation of gene order. Short and long-read transcriptome sequencing allowed us to define coordinated transcriptional regulation throughout the parasite's life cycle and refine our understanding of cis- and trans-splicing. Finally, we provide a comprehensive picture of chromosome-wide genetic diversity both within a single isolate and globally. These data provide a high-quality comparison for understanding the evolution and genomics of Caenorhabditis and other nematodes and extend the experimental tractability of this model parasitic nematode in understanding helminth biology, drug discovery and vaccine development, as well as important adaptive traits such as drug resistance.
Affiliation(s)
- Stephen R Doyle
- Wellcome Sanger Institute, Hinxton, Cambridgeshire, CB10 1SA, UK.
| | - Alan Tracey
- Wellcome Sanger Institute, Hinxton, Cambridgeshire, CB10 1SA, UK
| | - Roz Laing
- Institute of Biodiversity Animal Health and Comparative Medicine, College of Medical, Veterinary and Life Sciences, University of Glasgow, Garscube Campus, Glasgow, G61 1QH, UK
| | - Nancy Holroyd
- Wellcome Sanger Institute, Hinxton, Cambridgeshire, CB10 1SA, UK
| | - David Bartley
- Moredun Research Institute, Pentlands Science Park, Bush Loan, Penicuik, EH26 0PZ, UK
| | - Wojtek Bazant
- Wellcome Sanger Institute, Hinxton, Cambridgeshire, CB10 1SA, UK
| | - Helen Beasley
- Wellcome Sanger Institute, Hinxton, Cambridgeshire, CB10 1SA, UK
| | - Robin Beech
- Institute of Parasitology, McGill University, 21111 Lakeshore Road, Sainte Anne-de-Bellevue, QC, H9X3V9, Canada
| | - Collette Britton
- Institute of Biodiversity Animal Health and Comparative Medicine, College of Medical, Veterinary and Life Sciences, University of Glasgow, Garscube Campus, Glasgow, G61 1QH, UK
| | - Karen Brooks
- Wellcome Sanger Institute, Hinxton, Cambridgeshire, CB10 1SA, UK
| | - Umer Chaudhry
- Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, EH25 9RG, UK
| | - Kirsty Maitland
- Institute of Biodiversity Animal Health and Comparative Medicine, College of Medical, Veterinary and Life Sciences, University of Glasgow, Garscube Campus, Glasgow, G61 1QH, UK
| | - Axel Martinelli
- Wellcome Sanger Institute, Hinxton, Cambridgeshire, CB10 1SA, UK
| | - Jennifer D Noonan
- Institute of Parasitology, McGill University, 21111 Lakeshore Road, Sainte Anne-de-Bellevue, QC, H9X3V9, Canada
| | - Michael Paulini
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridgeshire, CB10 1SA, UK
| | - Michael A Quail
- Wellcome Sanger Institute, Hinxton, Cambridgeshire, CB10 1SA, UK
| | - Elizabeth Redman
- Department of Comparative Biology and Experimental Medicine, Faculty of Veterinary Medicine, University of Calgary, Calgary, AB, Canada
| | - Faye H Rodgers
- Wellcome Sanger Institute, Hinxton, Cambridgeshire, CB10 1SA, UK
| | - Guillaume Sallé
- INRAE - U. Tours, UMR 1282 ISP Infectiologie et Santé Publique, Centre de recherche Val de Loire, Nouzilly, France
| | | | | | - Janneke Wit
- Department of Comparative Biology and Experimental Medicine, Faculty of Veterinary Medicine, University of Calgary, Calgary, AB, Canada
| | - Kevin L Howe
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridgeshire, CB10 1SA, UK
| | - Neil Sargison
- Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, EH25 9RG, UK
| | - Eileen Devaney
- Institute of Biodiversity Animal Health and Comparative Medicine, College of Medical, Veterinary and Life Sciences, University of Glasgow, Garscube Campus, Glasgow, G61 1QH, UK
| | - Matthew Berriman
- Wellcome Sanger Institute, Hinxton, Cambridgeshire, CB10 1SA, UK
| | - John S Gilleard
- Department of Comparative Biology and Experimental Medicine, Faculty of Veterinary Medicine, University of Calgary, Calgary, AB, Canada
| | - James A Cotton
- Wellcome Sanger Institute, Hinxton, Cambridgeshire, CB10 1SA, UK.
| |
|
83
|
The sockeye salmon genome, transcriptome, and analyses identifying population defining regions of the genome. PLoS One 2020; 15:e0240935. [PMID: 33119641 PMCID: PMC7595290 DOI: 10.1371/journal.pone.0240935] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 05/01/2020] [Accepted: 10/06/2020] [Indexed: 12/12/2022] Open
Abstract
Sockeye salmon (Oncorhynchus nerka) is a commercially and culturally important species to the people that live along the northern Pacific Ocean coast. There are two main sockeye salmon ecotypes—the ocean-going (anadromous) ecotype and the fresh-water ecotype known as kokanee. The goal of this study was to better understand the population structure of sockeye salmon and identify possible genomic differences among populations and between the two ecotypes. In pursuit of this goal, we generated the first reference sockeye salmon genome assembly and an RNA-seq transcriptome data set to better annotate features of the assembly. Resequenced whole-genomes of 140 sockeye salmon and kokanee were analyzed to understand population structure and identify genomic differences between ecotypes. Three distinct geographic and genetic groups were identified from analyses of the resequencing data. Nucleotide variants in an immunoglobulin heavy chain variable gene cluster on chromosome 26 were found to differentiate the northwestern group from the southern and upper Columbia River groups. Several candidate genes were found to be associated with the kokanee ecotype. Many of these genes were related to ammonia tolerance or vision. Finally, the sex chromosomes of this species were better characterized, and an alternative sex-determination mechanism was identified in a subset of upper Columbia River kokanee.
|
84
|
Biello R, Singh A, Godfrey CJ, Fernández FF, Mugford ST, Powell G, Hogenhout SA, Mathers TC. A chromosome-level genome assembly of the woolly apple aphid, Eriosoma lanigerum Hausmann (Hemiptera: Aphididae). Mol Ecol Resour 2020; 21:316-326. [PMID: 32985768 DOI: 10.1111/1755-0998.13258] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 06/11/2020] [Revised: 08/21/2020] [Accepted: 09/07/2020] [Indexed: 01/18/2023]
Abstract
Woolly apple aphid (WAA, Eriosoma lanigerum Hausmann) (Hemiptera: Aphididae) is a major pest of apple trees (Malus domestica, order Rosales) and is critical to the economics of the apple industry in most parts of the world. Here, we generated a chromosome-level genome assembly of WAA, representing the first genome sequence from the aphid subfamily Eriosomatinae, using a combination of 10X Genomics linked-reads and in vivo Hi-C data. The final genome assembly is 327 Mb, with 91% of the assembled sequences anchored into six chromosomes. The contig and scaffold N50 values are 158 kb and 71 Mb, respectively, and we predicted a total of 28,186 protein-coding genes. The assembly is highly complete, including 97% of conserved arthropod single-copy orthologues based on Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis. Phylogenomic analysis of WAA and nine previously published aphid genomes, spanning four aphid tribes and three subfamilies, reveals that the tribe Eriosomatini (represented by WAA) is recovered as a sister group to Aphidini + Macrosiphini (subfamily Aphidinae). We identified syntenic blocks of genes between our WAA assembly and the genomes of other aphid species and find that two WAA chromosomes (El5 and El6) map to the conserved Macrosiphini and Aphidini X chromosome. Our high-quality WAA genome assembly and annotation provide a valuable resource for research in a broad range of areas such as comparative and population genomics, insect-plant interactions and pest resistance management.
Affiliation(s)
- Roberto Biello
- Department of Crop Genetics, John Innes Centre, Norwich Research Park, Norwich, UK
| | - Archana Singh
- Earlham Institute, John Innes Centre, Norwich Research Park, Norwich, UK
| | | | | | - Sam T Mugford
- Department of Crop Genetics, John Innes Centre, Norwich Research Park, Norwich, UK
| | | | - Saskia A Hogenhout
- Department of Crop Genetics, John Innes Centre, Norwich Research Park, Norwich, UK
| | - Thomas C Mathers
- Department of Crop Genetics, John Innes Centre, Norwich Research Park, Norwich, UK
| |
|
85
|
Zhou Q, Tang D, Huang W, Yang Z, Zhang Y, Hamilton JP, Visser RGF, Bachem CWB, Robin Buell C, Zhang Z, Zhang C, Huang S. Haplotype-resolved genome analyses of a heterozygous diploid potato. Nat Genet 2020; 52:1018-1023. [PMID: 32989320 PMCID: PMC7527274 DOI: 10.1038/s41588-020-0699-x] [Citation(s) in RCA: 135] [Impact Index Per Article: 27.0] [Received: 06/10/2019] [Accepted: 08/24/2020] [Indexed: 02/07/2023]
Abstract
Potato (Solanum tuberosum L.) is the most important tuber crop worldwide. Efforts are underway to transform the crop from a clonally propagated tetraploid into a seed-propagated, inbred-line-based hybrid, but this process requires a better understanding of the potato genome. Here, we report the 1.67-Gb haplotype-resolved assembly of a diploid potato, RH89-039-16, using a combination of multiple sequencing strategies, including circular consensus sequencing. Comparison of the two haplotypes revealed ~2.1% intragenomic diversity, including 22,134 predicted deleterious mutations in 10,642 annotated genes. In 20,583 pairs of allelic genes, 16.6% and 30.8% exhibited differential expression and methylation between alleles, respectively. Deleterious mutations and differentially expressed alleles were dispersed throughout both haplotypes, complicating strategies to eradicate deleterious alleles or stack beneficial alleles via meiotic recombination. This study offers a holistic view of the genome organization of a clonally propagated diploid species and provides insights into technological evolution in resolving complex genomes.
Collapse
Affiliation(s)
- Qian Zhou
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Area, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Peng Cheng Laboratory, Shenzhen, China
| | - Dié Tang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Area, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
| | - Wu Huang
- Key Laboratory of Biology and Genetic Improvement of Horticultural Crops of the Ministry of Agriculture, Sino-Dutch Joint Laboratory of Horticultural Genomics, Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Zhongmin Yang
- College of Horticulture, Northwest Agriculture and Forest University, Yangling, China
| | - Yu Zhang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Area, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
| | - John P Hamilton
- Department of Plant Biology, Michigan State University, East Lansing, MI, USA
| | - Richard G F Visser
- Plant Breeding, Wageningen University and Research, Wageningen, the Netherlands
| | | | - C Robin Buell
- Department of Plant Biology, Michigan State University, East Lansing, MI, USA
| | - Zhonghua Zhang
- Key Laboratory of Biology and Genetic Improvement of Horticultural Crops of the Ministry of Agriculture, Sino-Dutch Joint Laboratory of Horticultural Genomics, Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing, China
- College of Horticulture, Qingdao Agricultural University, Qingdao, China
| | - Chunzhi Zhang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Area, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
| | - Sanwen Huang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Area, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China.
- Key Laboratory of Biology and Genetic Improvement of Horticultural Crops of the Ministry of Agriculture, Sino-Dutch Joint Laboratory of Horticultural Genomics, Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing, China.

86
Hu J, Fan J, Sun Z, Liu S. NextPolish: a fast and efficient genome polishing tool for long-read assembly. Bioinformatics 2020; 36:2253-2255. [PMID: 31778144 DOI: 10.1093/bioinformatics/btz891] [Citation(s) in RCA: 696] [Impact Index Per Article: 139.2] [Received: 05/13/2019] [Revised: 10/07/2019] [Accepted: 11/26/2019] [Indexed: 01/20/2023] Open
Abstract
MOTIVATION Although long-read sequencing technologies can produce genomes with long contiguity, they suffer from high error rates. Thus, we developed NextPolish, a tool that efficiently corrects sequence errors in genomes assembled with long reads. This new tool consists of two interlinked modules that are designed to score and count K-mers from high quality short reads, and to polish genome assemblies containing large numbers of base errors. RESULTS When evaluated for the speed and efficiency using human and a plant (Arabidopsis thaliana) genomes, NextPolish outperformed Pilon by correcting sequence errors faster, and with a higher correction accuracy. AVAILABILITY AND IMPLEMENTATION NextPolish is implemented in C and Python. The source code is available from https://github.com/Nextomics/NextPolish. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jiang Hu
- GrandOmics Biosciences, Beijing, 102200, China
| | - Junpeng Fan
- GrandOmics Biosciences, Beijing, 102200, China
| | - Zongyi Sun
- GrandOmics Biosciences, Beijing, 102200, China
| | - Shanlin Liu
- GrandOmics Biosciences, Beijing, 102200, China

87
Fuller ZL, Mocellin VJL, Morris LA, Cantin N, Shepherd J, Sarre L, Peng J, Liao Y, Pickrell J, Andolfatto P, Matz M, Bay LK, Przeworski M. Population genetics of the coral Acropora millepora: Toward genomic prediction of bleaching. Science 2020; 369:eaba4674. [PMID: 32675347 DOI: 10.1126/science.aba4674] [Citation(s) in RCA: 138] [Impact Index Per Article: 27.6] [Received: 12/05/2019] [Accepted: 06/01/2020] [Indexed: 12/11/2022]
Abstract
Although reef-building corals are declining worldwide, responses to bleaching vary within and across species and are partly heritable. Toward predicting bleaching response from genomic data, we generated a chromosome-scale genome assembly for the coral Acropora millepora. We obtained whole-genome sequences for 237 phenotyped samples collected at 12 reefs along the Great Barrier Reef, among which we inferred little population structure. Scanning the genome for evidence of local adaptation, we detected signatures of long-term balancing selection in the heat-shock co-chaperone sacsin. We conducted a genome-wide association study of visual bleaching score for 213 samples, incorporating the polygenic score derived from it into a predictive model for bleaching in the wild. These results set the stage for genomics-based approaches in conservation strategies.
Affiliation(s)
- Zachary L Fuller
- Department of Biological Sciences, Columbia University, New York, NY, USA.
| | | | - Luke A Morris
- Australian Institute of Marine Science, Townsville, QLD, Australia; AIMS@JCU, Australian Institute of Marine Science, College of Science and Engineering, James Cook University, Townsville, QLD, Australia; College of Science and Engineering, James Cook University, Townsville, QLD, Australia
| | - Neal Cantin
- Australian Institute of Marine Science, Townsville, QLD, Australia
| | - Jihanne Shepherd
- Department of Biological Sciences, Columbia University, New York, NY, USA
| | - Luke Sarre
- Department of Biological Sciences, Columbia University, New York, NY, USA
| | - Julie Peng
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
| | - Yi Liao
- Department of Integrative Biology, University of Texas at Austin, Austin, TX, USA; Department of Ecology and Evolutionary Biology, University of California, Irvine, Irvine, CA, USA
| | | | - Peter Andolfatto
- Department of Biological Sciences, Columbia University, New York, NY, USA
| | - Mikhail Matz
- Department of Integrative Biology, University of Texas at Austin, Austin, TX, USA
| | - Line K Bay
- Australian Institute of Marine Science, Townsville, QLD, Australia.
| | - Molly Przeworski
- Department of Biological Sciences, Columbia University, New York, NY, USA; Department of Systems Biology, Columbia University, New York, NY, USA; Program for Mathematical Genomics, Columbia University, New York, NY, USA

88
Colella JP, Tigano A, MacManes MD. A linked-read approach to museomics: Higher quality de novo genome assemblies from degraded tissues. Mol Ecol Resour 2020; 20:856-870. [PMID: 32153100 PMCID: PMC7496956 DOI: 10.1111/1755-0998.13155] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 08/07/2019] [Revised: 03/03/2020] [Accepted: 03/06/2020] [Indexed: 12/20/2022]
Abstract
High-throughput sequencing technologies are a proposed solution for accessing the molecular data in historical specimens. However, degraded DNA combined with the computational demands of short-read assemblies has posed significant laboratory and bioinformatics challenges for de novo genome assembly. Linked-read or "synthetic long-read" sequencing technologies, such as 10× Genomics, may provide a cost-effective alternative solution to assemble higher quality de novo genomes from degraded tissue samples. Here, we compare assembly quality (e.g., genome contiguity and completeness, presence of orthogroups) between four new deer mouse (Peromyscus spp.) genomes assembled using linked-read technology and four published genomes assembled from a single shotgun library. At a similar price-point, these approaches produce vastly different assemblies, with linked-read assemblies having overall higher contiguity and completeness, measured by larger N50 values and greater number of genes assembled, respectively. As a proof-of-concept, we used annotated genes from the four Peromyscus linked-read assemblies and eight additional rodent taxa to generate a phylogeny, which reconstructed the expected relationships among species with 100% support. Although not without caveats, our results suggest that linked-read sequencing approaches are a viable option to build de novo genomes from degraded tissues, which may prove particularly valuable for taxa that are extinct, rare or difficult to collect.
Affiliation(s)
- Jocelyn P. Colella
- Molecular, Cellular, and Biomedical Sciences Department, University of New Hampshire, Durham, NH, USA
- Hubbard Center for Genome Studies, University of New Hampshire, Durham, NH, USA
| | - Anna Tigano
- Molecular, Cellular, and Biomedical Sciences Department, University of New Hampshire, Durham, NH, USA
- Hubbard Center for Genome Studies, University of New Hampshire, Durham, NH, USA
| | - Matthew D. MacManes
- Molecular, Cellular, and Biomedical Sciences Department, University of New Hampshire, Durham, NH, USA
- Hubbard Center for Genome Studies, University of New Hampshire, Durham, NH, USA

89
Tolstoganov I, Bankevich A, Chen Z, Pevzner PA. cloudSPAdes: assembly of synthetic long reads using de Bruijn graphs. Bioinformatics 2020; 35:i61-i70. [PMID: 31510642 PMCID: PMC6612831 DOI: 10.1093/bioinformatics/btz349] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Indexed: 11/14/2022] Open
Abstract
Motivation The recently developed barcoding-based synthetic long read (SLR) technologies have already found many applications in genome assembly and analysis. However, although some new barcoding protocols are emerging and the range of SLR applications is being expanded, the existing SLR assemblers are optimized for a narrow range of parameters and are not easily extendable to new barcoding technologies and new applications such as metagenomics or hybrid assembly. Results We describe the algorithmic challenge of the SLR assembly and present a cloudSPAdes algorithm for SLR assembly that is based on analyzing the de Bruijn graph of SLRs. We benchmarked cloudSPAdes across various barcoding technologies/applications and demonstrated that it improves on the state-of-the-art SLR assemblers in accuracy and speed. Availability and implementation Source code and installation manual for cloudSPAdes are available at https://github.com/ablab/spades/releases/tag/cloudspades-paper. Supplementary Information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ivan Tolstoganov
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia
| | - Anton Bankevich
- Department of Computer Science and Engineering, University of California at San Diego, La Jolla, CA, USA
| | - Zhoutao Chen
- Universal Sequencing Technology Corporation, Carlsbad, CA, USA
| | - Pavel A Pevzner
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia; Department of Computer Science and Engineering, University of California at San Diego, La Jolla, CA, USA

90
Peng J, Li Q, Xu L, Wei P, He P, Zhang X, Zhang L, Guan J, Zhang X, Lin Y, Gui J, Chen X. Chromosome-level analysis of the Crassostrea hongkongensis genome reveals extensive duplication of immune-related genes in bivalves. Mol Ecol Resour 2020; 20:980-994. [PMID: 32198971 DOI: 10.1111/1755-0998.13157] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 08/10/2019] [Revised: 03/09/2020] [Accepted: 03/16/2020] [Indexed: 12/30/2022]
Abstract
Crassostrea hongkongensis is a popular and important native oyster species that is cultured mainly along the coast of the South China Sea. However, the absence of a reference genome has restricted genetic studies and the development of molecular breeding schemes for this species. Here, we combined PacBio and 10 × Genomics technologies to create a C. hongkongensis genome assembly, which has a size of 610 Mb, and is close to that estimated by flow cytometry (~650 Mb). Contig and scaffold N50 are 2.57 and 4.99 Mb, respectively, and BUSCO analysis indicates that 95.8% of metazoan conserved genes are completely represented. Using a high-density linkage map of its closest related species, C. gigas, a total of 521 Mb (85.4%) was anchored to 10 haploid chromosomes. Comparative genomic analyses with other molluscs reveal that several immune- or stress response-related genes extensively expanded in bivalves by tandem duplication, including C1q, Toll-like receptors and Hsp70, which are associated with their adaptation to filter-feeding and sessile lifestyles in shallow sea and/or deep-sea ecosystems. Through transcriptome sequencing, potential genes and pathways related to sex determination and gonad development were identified. The genome and transcriptome of C. hongkongensis provide valuable resources for future molecular studies, genetic improvement and genome-assisted breeding of oysters.
Affiliation(s)
- Jinxia Peng
- Guangxi Key Laboratory of Aquatic Genetic Breeding and Healthy Aquaculture, Guangxi Academy of Fisheries Sciences, Nanning, China
| | - Qiongzhen Li
- Guangxi Key Laboratory of Aquatic Genetic Breeding and Healthy Aquaculture, Guangxi Academy of Fisheries Sciences, Nanning, China
| | - Lian Xu
- Key Laboratory of Neuroregeneration of Jiangsu and Ministry of Education, Co-innovation Center of Neuroregeneration, Nantong University, Nantong, China
| | - Pinyuan Wei
- Guangxi Key Laboratory of Aquatic Genetic Breeding and Healthy Aquaculture, Guangxi Academy of Fisheries Sciences, Nanning, China
| | - Pingping He
- Guangxi Key Laboratory of Aquatic Genetic Breeding and Healthy Aquaculture, Guangxi Academy of Fisheries Sciences, Nanning, China
| | - Xingzhi Zhang
- Guangxi Key Laboratory of Aquatic Genetic Breeding and Healthy Aquaculture, Guangxi Academy of Fisheries Sciences, Nanning, China
| | - Li Zhang
- Guangxi Key Laboratory of Aquatic Genetic Breeding and Healthy Aquaculture, Guangxi Academy of Fisheries Sciences, Nanning, China
| | - Junliang Guan
- Guangxi Key Laboratory of Aquatic Genetic Breeding and Healthy Aquaculture, Guangxi Academy of Fisheries Sciences, Nanning, China
| | - Xiaojuan Zhang
- State Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology and Innovation Academy for Seed Design, CAS, Wuhan, China
| | - Yong Lin
- Guangxi Key Laboratory of Aquatic Genetic Breeding and Healthy Aquaculture, Guangxi Academy of Fisheries Sciences, Nanning, China
| | - Jianfang Gui
- State Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology and Innovation Academy for Seed Design, CAS, Wuhan, China
| | - Xiaohan Chen
- Guangxi Key Laboratory of Aquatic Genetic Breeding and Healthy Aquaculture, Guangxi Academy of Fisheries Sciences, Nanning, China

91
Field MA, Rosen BD, Dudchenko O, Chan EKF, Minoche AE, Edwards RJ, Barton K, Lyons RJ, Tuipulotu DE, Hayes VM, Omer AD, Colaric Z, Keilwagen J, Skvortsova K, Bogdanovic O, Smith MA, Aiden EL, Smith TPL, Zammit RA, Ballard JWO. Canfam_GSD: De novo chromosome-length genome assembly of the German Shepherd Dog (Canis lupus familiaris) using a combination of long reads, optical mapping, and Hi-C. Gigascience 2020; 9:giaa027. [PMID: 32236524 PMCID: PMC7111595 DOI: 10.1093/gigascience/giaa027] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Received: 10/21/2019] [Revised: 01/29/2020] [Accepted: 02/20/2020] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND The German Shepherd Dog (GSD) is one of the most common breeds on earth and has been bred for its utility and intelligence. It is often first choice for police and military work, as well as protection, disability assistance, and search-and-rescue. Yet, GSDs are well known to be susceptible to a range of genetic diseases that can interfere with their training. Such diseases are of particular concern when they occur later in life, and fully trained animals are not able to continue their duties. FINDINGS Here, we provide the draft genome sequence of a healthy German Shepherd female as a reference for future disease and evolutionary studies. We generated this improved canid reference genome (CanFam_GSD) utilizing a combination of Pacific Bioscience, Oxford Nanopore, 10X Genomics, Bionano, and Hi-C technologies. The GSD assembly is ∼80 times as contiguous as the current canid reference genome (20.9 vs 0.267 Mb contig N50), containing far fewer gaps (306 vs 23,876) and fewer scaffolds (429 vs 3,310) than the current canid reference genome CanFamv3.1. Two chromosomes (4 and 35) are assembled into single scaffolds with no gaps. BUSCO analyses of the genome assembly results show that 93.0% of the conserved single-copy genes are complete in the GSD assembly compared with 92.2% for CanFam v3.1. Homology-based gene annotation increases this value to ∼99%. Detailed examination of the evolutionarily important pancreatic amylase region reveals that there are most likely 7 copies of the gene, indicative of a duplication of 4 ancestral copies and the disruption of 1 copy. CONCLUSIONS GSD genome assembly and annotation were produced with major improvement in completeness, continuity, and quality over the existing canid reference. This resource will enable further research related to canine diseases, the evolutionary relationships of canids, and other aspects of canid biology.
Affiliation(s)
- Matt A Field
- Centre for Tropical Bioinformatics and Molecular Biology, Australian Institute of Tropical Health and Medicine, James Cook University, Smithfield Road, Cairns, QLD 4878, Australia
- John Curtin School of Medical Research, Australian National University, Garran Rd, Canberra, ACT 2600, Australia
| | - Benjamin D Rosen
- Animal Genomics and Improvement Laboratory, Agricultural Research Service USDA, Baltimore Ave, Beltsville, MD 20705, USA
| | - Olga Dudchenko
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Baylor Plaza, Houston, TX 77030, USA
- Department of Computer Science, Rice University, Main St, Houston, TX 77005, USA
- Center for Theoretical and Biological Physics, Rice University, Main St, Houston, TX 77005, USA
| | - Eva K F Chan
- Garvan Institute of Medical Research, Victoria Street, Darlinghurst, NSW 2010, Australia
- Faculty of Medicine, UNSW Sydney, High St, Kensington, NSW 2052, Australia
| | - Andre E Minoche
- Garvan Institute of Medical Research, Victoria Street, Darlinghurst, NSW 2010, Australia
- St Vincent’s Clinical School, University of New South Wales Sydney, Victoria Street, Darlinghurst NSW 2010, Australia
| | - Richard J Edwards
- School of Biotechnology and Biomolecular Sciences, UNSW Sydney, High St, Kensington, NSW 2052, Australia
| | - Kirston Barton
- Garvan Institute of Medical Research, Victoria Street, Darlinghurst, NSW 2010, Australia
- Faculty of Medicine, UNSW Sydney, High St, Kensington, NSW 2052, Australia
| | - Ruth J Lyons
- Garvan Institute of Medical Research, Victoria Street, Darlinghurst, NSW 2010, Australia
| | - Daniel Enosi Tuipulotu
- School of Biotechnology and Biomolecular Sciences, UNSW Sydney, High St, Kensington, NSW 2052, Australia
| | - Vanessa M Hayes
- Garvan Institute of Medical Research, Victoria Street, Darlinghurst, NSW 2010, Australia
- Faculty of Medicine, UNSW Sydney, High St, Kensington, NSW 2052, Australia
- Central Clinical School, University of Sydney, Parramatta Road, Camperdown, NSW 2050, Australia
| | - Arina D. Omer
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Baylor Plaza, Houston, TX 77030, USA
- Department of Computer Science, Rice University, Main St, Houston, TX 77005, USA
| | - Zane Colaric
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Baylor Plaza, Houston, TX 77030, USA
- Department of Computer Science, Rice University, Main St, Houston, TX 77005, USA
| | - Jens Keilwagen
- Julius Kühn-Institut, Erwin-Baur-Str. 27, 06484 Quedlinburg, Germany
| | - Ksenia Skvortsova
- Garvan Institute of Medical Research, Victoria Street, Darlinghurst, NSW 2010, Australia
| | - Ozren Bogdanovic
- Garvan Institute of Medical Research, Victoria Street, Darlinghurst, NSW 2010, Australia
- School of Biotechnology and Biomolecular Sciences, UNSW Sydney, High St, Kensington, NSW 2052, Australia
| | - Martin A Smith
- Garvan Institute of Medical Research, Victoria Street, Darlinghurst, NSW 2010, Australia
- Faculty of Medicine, UNSW Sydney, High St, Kensington, NSW 2052, Australia
| | - Erez Lieberman Aiden
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Baylor Plaza, Houston, TX 77030, USA
- Department of Computer Science, Rice University, Main St, Houston, TX 77005, USA
- Center for Theoretical and Biological Physics, Rice University, Main St, Houston, TX 77005, USA
- Broad Institute of MIT and Harvard, Main St, Cambridge, MA 02142, USA
- Shanghai Institute for Advanced Immunochemical Studies, ShanghaiTech University, Huaxia Middle Rd, Pudong 201210, China
| | - Timothy P L Smith
- US Meat Animal Research Center, Agricultural Research Service USDA, Rd 313, Clay Center, NE 68933, USA
| | - Robert A Zammit
- Vineyard Veterinary Hospital, Windsor Rd, Vineyard, NSW 2765, Australia
| | - J William O Ballard
- School of Biotechnology and Biomolecular Sciences, UNSW Sydney, High St, Kensington, NSW 2052, Australia

92
Karaoğlanoğlu F, Ricketts C, Ebren E, Rasekh ME, Hajirasouliha I, Alkan C. VALOR2: characterization of large-scale structural variants using linked-reads. Genome Biol 2020; 21:72. [PMID: 32192518 PMCID: PMC7083023 DOI: 10.1186/s13059-020-01975-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 12/17/2019] [Accepted: 02/24/2020] [Indexed: 12/31/2022] Open
Abstract
Most existing methods for structural variant detection focus on discovery and genotyping of deletions, insertions, and mobile elements. Detection of balanced structural variants with no gain or loss of genomic segments, for example, inversions and translocations, is a particularly challenging task. Furthermore, there are very few algorithms to predict the insertion locus of large interspersed segmental duplications and characterize translocations. Here, we propose novel algorithms to characterize large interspersed segmental duplications, inversions, deletions, and translocations using linked-read sequencing data. We redesign our earlier algorithm, VALOR, and implement our new algorithms in a new software package, called VALOR2.
Affiliation(s)
- Fatih Karaoğlanoğlu
- Department of Computer Engineering, Bilkent University, Ankara, 06800 Turkey
| | - Camir Ricketts
- Tri-Institutional Computational Biology & Medicine Program, Cornell University, 1300 York Ave, New York, 10065 NY USA
- Department of Physiology and Biophysics, Institute for Computational Biomedicine, Weill Cornell Medicine, 1300 York Ave, New York, 10065 NY USA
| | - Ezgi Ebren
- Department of Computer Engineering, Bilkent University, Ankara, 06800 Turkey
| | - Marzieh Eslami Rasekh
- Graduate Program in Bioinformatics, Boston University, 24 Cummington Mall, Boston, 02215 MA USA
| | - Iman Hajirasouliha
- Department of Physiology and Biophysics, Institute for Computational Biomedicine, Weill Cornell Medicine, 1300 York Ave, New York, 10065 NY USA
- Englander Institute for Precision Medicine, The Meyer Cancer Center, Weill Cornell Medicine, 1300 York Ave, New York, 10065 NY USA
| | - Can Alkan
- Department of Computer Engineering, Bilkent University, Ankara, 06800 Turkey
- Bilkent-Hacettepe Health Sciences and Technologies Program, Bilkent University, Ankara, 06800 Turkey

93
Kyriakidou M, Anglin NL, Ellis D, Tai HH, Strömvik MV. Genome assembly of six polyploid potato genomes. Sci Data 2020; 7:88. [PMID: 32161269 PMCID: PMC7066127 DOI: 10.1038/s41597-020-0428-4] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 10/14/2019] [Accepted: 02/14/2020] [Indexed: 02/06/2023] Open
Abstract
Genome assembly of polyploid plant genomes is a laborious task, as they contain more than two copies of the genome and are often highly heterozygous, with a high level of repetitive DNA. Next Generation genome sequencing data representing one Chilean and five Peruvian polyploid potato (Solanum spp.) landrace genomes was used to construct genome assemblies comprising five taxa. Third Generation sequencing data (Linked and Long-read data) was used to improve the assembly for one of the genomes. Native landraces are valuable genetic resources for traits such as disease and pest resistance, environmental tolerance and other qualities of interest such as nutrition and fiber for breeding programs. The need for conservation and enhanced understanding of genetic diversity of cultivated potato from South America is also crucial to North American and European cultivars. Here, we report draft genomes from six polyploid potato landraces representing five taxa, illustrating how Third Generation Sequencing can aid in assembling polyploid genomes.
Affiliation(s)
- Maria Kyriakidou
- Department of Plant Science, McGill University, 21111 Lakeshore Rd., Sainte-Anne-de-Bellevue, QC, H9X3V9, Canada
| | - Noelle L Anglin
- CIP-International Potato Center, Avenida La Molina 1895, Lima, 12, Peru
| | - David Ellis
- CIP-International Potato Center, Avenida La Molina 1895, Lima, 12, Peru
| | - Helen H Tai
- Fredericton Research and Development Centre, Agriculture and Agri-Food Canada, PO Box 20280, 850 Lincoln Rd., Fredericton, NB, E3B 4Z7, Canada
| | - Martina V Strömvik
- Department of Plant Science, McGill University, 21111 Lakeshore Rd., Sainte-Anne-de-Bellevue, QC, H9X3V9, Canada.

94
Genome Sequence of the Euryhaline Javafish Medaka, Oryzias javanicus: A Small Aquarium Fish Model for Studies on Adaptation to Salinity. G3-GENES GENOMES GENETICS 2020; 10:907-915. [PMID: 31988161 PMCID: PMC7056978 DOI: 10.1534/g3.119.400725] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Indexed: 12/21/2022]
Abstract
The genus Oryzias consists of 35 medaka-fish species each exhibiting various ecological, morphological and physiological peculiarities and adaptations. Beyond being a comprehensive phylogenetic group for studying intra-genus evolution of several traits like sex determination, behavior, morphology or adaptation through comparative genomic approaches, all medaka species share many advantages of experimental model organisms including small size and short generation time, transparent embryos and genome editing tools for reverse and forward genetic studies. The Java medaka, Oryzias javanicus, is one of the two species of medaka perfectly adapted for living in brackish/sea-waters. Being an important component of the mangrove ecosystem, O. javanicus is also used as a valuable marine test-fish for ecotoxicology studies. Here, we sequenced and assembled the whole genome of O. javanicus, and anticipate this resource will be catalytic for a wide range of comparative genomic, phylogenetic and functional studies. Complementary sequencing approaches including long-read technology and data integration with a genetic map allowed the final assembly of 908 Mbp of the O. javanicus genome. Further analyses estimate that the O. javanicus genome contains 33% of repeat sequences and has a heterozygosity of 0.96%. The achieved draft assembly contains 525 scaffolds with a total length of 809.7 Mbp, an N50 of 6.3 Mbp and an L50 of 37 scaffolds. We identified 21,454 predicted transcripts for a total transcriptome size of 57,146,583 bp. We provide here a high-quality chromosome-scale draft genome assembly of the euryhaline Javafish medaka (321 scaffolds anchored on 24 chromosomes, representing 97.7% of the total bases), and emphasize the evolutionary adaptation to salinity.
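The N50 and L50 values quoted in the abstract above are standard contiguity statistics: with sequences sorted from longest to shortest, N50 is the length of the sequence at which the running total first reaches half of the assembly size, and L50 is the number of sequences needed to get there. A minimal sketch of that calculation follows; the function name `n50_l50` and the toy lengths are illustrative assumptions and are not taken from the paper or its data.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a collection of scaffold or contig lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)   # longest first
    half = sum(ordered) / 2                   # half of the total assembly size
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            # 'length' is the N50; 'count' sequences were needed, so it is the L50
            return length, count


# Toy example (hypothetical lengths, arbitrary units): total = 150, half = 75
toy = [50, 40, 30, 20, 10]
print(n50_l50(toy))  # (40, 2)
```

A larger N50 together with a smaller L50 indicates a less fragmented assembly, which is how these figures are used when comparing assemblies.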

95
Í Kongsstovu S, Mikalsen SO, Homrum EÍ, Jacobsen JA, Flicek P, Dahl HA. Using long and linked reads to improve an Atlantic herring (Clupea harengus) genome assembly. Sci Rep 2019; 9:17716. [PMID: 31776409 PMCID: PMC6881392 DOI: 10.1038/s41598-019-54151-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 08/27/2019] [Accepted: 11/08/2019] [Indexed: 01/01/2023] Open
Abstract
Atlantic herring (Clupea harengus) is one of the most abundant fish species in the world. It is an important economical and nutritional resource, as well as a crucial part of the North Atlantic ecosystem. In 2016, a draft herring genome assembly was published. Being a species of such importance, we sought to independently verify and potentially improve the herring genome assembly. We sequenced the herring genome generating paired-end, mate-pair, linked and long reads. Three assembly versions of the herring genome were generated based on a de novo assembly (A1), which was scaffolded using linked and long reads (A2) and then merged with the previously published assembly (A3). The resulting assemblies were compared using parameters describing the size, fragmentation, correctness, and completeness of the assemblies. Results showed that the A2 assembly was less fragmented, more complete and more correct than A1. A3 showed improvement in fragmentation and correctness compared with A2 and the published assembly but was slightly less complete than the published assembly. Thus, we here confirmed the previously published herring assembly, and made improvements by further scaffolding the assembly and removing low-quality sequences using linked and long reads and merging of assemblies.
Affiliation(s)
- Sunnvør Í Kongsstovu
- Amplexa Genetics A/S, Hoyvíksvegur 51, FO-100, Tórshavn, Faroe Islands; University of the Faroe Islands, Department of Science and Technology, Vestara Bryggja 15, FO-100, Tórshavn, Faroe Islands; European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Svein-Ole Mikalsen
- University of the Faroe Islands, Department of Science and Technology, Vestara Bryggja 15, FO-100, Tórshavn, Faroe Islands
| | - Eydna Í Homrum
- Faroe Marine Research Institute, Nóatún 1, FO-100, Tórshavn, Faroe Islands
| | - Jan Arge Jacobsen
- Faroe Marine Research Institute, Nóatún 1, FO-100, Tórshavn, Faroe Islands
| | - Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Hans Atli Dahl
- Amplexa Genetics A/S, Hoyvíksvegur 51, FO-100, Tórshavn, Faroe Islands
| |
|
96
|
Luo J, Lyu M, Chen R, Zhang X, Luo H, Yan C. SLR: a scaffolding algorithm based on long reads and contig classification. BMC Bioinformatics 2019; 20:539. [PMID: 31666010 PMCID: PMC6820941 DOI: 10.1186/s12859-019-3114-9] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 12/19/2018] [Accepted: 09/23/2019] [Indexed: 11/10/2022] Open
Abstract
Background Scaffolding is an important step in genome assembly that orders and orients the contigs produced by assemblers. However, repetitive regions in contigs usually prevent scaffolding from producing accurate results. How to solve the problem of repetitive regions has received a great deal of attention. In the past few years, long reads sequenced by third-generation sequencing technologies (Pacific Biosciences and Oxford Nanopore) have been demonstrated to be useful for sequencing repetitive regions in genomes. Although some stand-alone scaffolding algorithms based on long reads have been presented, scaffolding still requires a new strategy to take full advantage of the characteristics of long reads. Results Here, we present a new scaffolding algorithm based on long reads and contig classification (SLR). Using the alignment information of long reads and contigs, SLR classifies the contigs into unique contigs and ambiguous contigs to address the problem of repetitive regions. Next, SLR uses only unique contigs to produce draft scaffolds. Then, SLR inserts the ambiguous contigs into the draft scaffolds and produces the final scaffolds. We compare SLR to three popular scaffolding tools by using long read datasets sequenced with Pacific Biosciences and Oxford Nanopore technologies. The experimental results show that SLR can produce better results in terms of accuracy and completeness. The open-source code of SLR is available at https://github.com/luojunwei/SLR. Conclusion In this paper, we describe SLR, which is designed to scaffold contigs using long reads. We conclude that SLR can improve the completeness of genome assembly.
Affiliation(s)
- Junwei Luo
- College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, 454000, China.
| | - Mengna Lyu
- College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, 454000, China
| | - Ranran Chen
- College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, 454000, China
| | - Xiaohong Zhang
- College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, 454000, China
| | - Huimin Luo
- School of Computer and Information Engineering, Henan University, Kaifeng, 475001, China
| | - Chaokun Yan
- School of Computer and Information Engineering, Henan University, Kaifeng, 475001, China
| |
|
97
|
Ghurye J, Rhie A, Walenz BP, Schmitt A, Selvaraj S, Pop M, Phillippy AM, Koren S. Integrating Hi-C links with assembly graphs for chromosome-scale assembly. PLoS Comput Biol 2019; 15:e1007273. [PMID: 31433799 PMCID: PMC6719893 DOI: 10.1371/journal.pcbi.1007273] [Citation(s) in RCA: 484] [Impact Index Per Article: 80.7] [Received: 01/11/2019] [Revised: 09/03/2019] [Accepted: 07/18/2019] [Indexed: 12/16/2022] Open
Abstract
Long-read sequencing and novel long-range assays have revolutionized de novo genome assembly by automating the reconstruction of reference-quality genomes. In particular, Hi-C sequencing is becoming an economical method for generating chromosome-scale scaffolds. Despite its increasing popularity, there are limited open-source tools available. Errors, particularly inversions and fusions across chromosomes, remain higher than alternate scaffolding technologies. We present a novel open-source Hi-C scaffolder that does not require an a priori estimate of chromosome number and minimizes errors by scaffolding with the assistance of an assembly graph. We demonstrate higher accuracy than the state-of-the-art methods across a variety of Hi-C library preparations and input assembly sizes. The Python and C++ code for our method is openly available at https://github.com/machinegun/SALSA.
Affiliation(s)
- Jay Ghurye
- Department of Computer Science, University of Maryland, College Park, Maryland, United States of America
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institute of Health, Bethesda, Maryland, United States of America
| | - Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institute of Health, Bethesda, Maryland, United States of America
| | - Brian P. Walenz
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institute of Health, Bethesda, Maryland, United States of America
| | - Anthony Schmitt
- Arima Genomics, San Diego, California, United States of America
| | | | - Mihai Pop
- Department of Computer Science, University of Maryland, College Park, Maryland, United States of America
| | - Adam M. Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institute of Health, Bethesda, Maryland, United States of America
| | - Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institute of Health, Bethesda, Maryland, United States of America
| |
|
98
|
Sedlazeck FJ, Lee H, Darby CA, Schatz MC. Piercing the dark matter: bioinformatics of long-range sequencing and mapping. Nat Rev Genet 2018; 19:329-346. [PMID: 29599501 DOI: 10.1038/s41576-018-0003-4] [Citation(s) in RCA: 320] [Impact Index Per Article: 53.3] [Indexed: 01/11/2023]
Abstract
Several new genomics technologies have become available that offer long-read sequencing or long-range mapping with higher throughput and higher resolution analysis than ever before. These long-range technologies are rapidly advancing the field with improved reference genomes, more comprehensive variant identification and more complete views of transcriptomes and epigenomes. However, they also require new bioinformatics approaches to take full advantage of their unique characteristics while overcoming their complex errors and modalities. Here, we discuss several of the most important applications of the new technologies, focusing on both the currently available bioinformatics tools and opportunities for future research.
Affiliation(s)
- Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | - Hayan Lee
- Department of Genetics, Stanford University, Stanford, CA, USA
| | - Charlotte A Darby
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Michael C Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| |
|
99
|
Kwan HH, Culibrk L, Taylor GA, Leelakumari S, Tan R, Jackman SD, Tse K, MacLeod T, Cheng D, Chuah E, Kirk H, Pandoh P, Carlsen R, Zhao Y, Mungall AJ, Moore R, Birol I, Marra MA, Rosen DAS, Haulena M, Jones SJM. The Genome of the Steller Sea Lion (Eumetopias jubatus). Genes (Basel) 2019; 10:genes10070486. [PMID: 31248052 PMCID: PMC6678222 DOI: 10.3390/genes10070486] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 04/15/2019] [Revised: 06/20/2019] [Accepted: 06/21/2019] [Indexed: 11/16/2022] Open
Abstract
The Steller sea lion is the largest member of the Otariidae family and is found in the coastal waters of the northern Pacific Rim. Here, we present the Steller sea lion genome, determined through DNA sequencing approaches that utilized microfluidic partitioning library construction, as well as nanopore technologies. These methods constructed a highly contiguous assembly with a scaffold N50 length of over 14 megabases, a contig N50 length of over 242 kilobases and a total length of 2.404 gigabases. As a measure of completeness, 95.1% of 4104 highly conserved mammalian genes were found to be complete within the assembly. Further annotation identified 19,668 protein coding genes. The assembled genome sequence and underlying sequence data can be found at the National Center for Biotechnology Information (NCBI) under the BioProject accession number PRJNA475770.
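The scaffold and contig N50 values quoted in this abstract follow the usual definition: the length L such that sequences of length L or longer together cover at least half of the total assembly. The sketch below illustrates that definition only; the function name `n50` and the toy lengths are hypothetical and are not taken from the paper or its pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of the total.
    """
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    # Only reached when no lengths were supplied.
    raise ValueError("lengths must be non-empty")


# Hypothetical toy contig lengths (not data from the paper); total is 300.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # 80 + 70 = 150 reaches half of 300, so this prints 70
```

Applied to real data, the same function would take the list of contig or scaffold lengths read from the assembly FASTA file.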
Affiliation(s)
- Harwood H Kwan
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer, Vancouver, BC V5Z-4S6, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, BC V6T-1Z4, Canada
| | - Luka Culibrk
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer, Vancouver, BC V5Z-4S6, Canada
- Department of Graduate Studies, Bioinformatics, University of British Columbia, Vancouver, BC V6T-1Z4, Canada
| | - Gregory A Taylor
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer, Vancouver, BC V5Z-4S6, Canada
| | - Sreeja Leelakumari
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer, Vancouver, BC V5Z-4S6, Canada
| | - Ryan Tan
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer, Vancouver, BC V5Z-4S6, Canada
| | - Shaun D Jackman
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer, Vancouver, BC V5Z-4S6, Canada
| | - Kane Tse
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer, Vancouver, BC V5Z-4S6, Canada
| | - Tina MacLeod
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer, Vancouver, BC V5Z-4S6, Canada
| | - Dean Cheng
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer, Vancouver, BC V5Z-4S6, Canada
| | - Eric Chuah
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer, Vancouver, BC V5Z-4S6, Canada
| | - Heather Kirk
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer, Vancouver, BC V5Z-4S6, Canada
| | - Pawan Pandoh
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer, Vancouver, BC V5Z-4S6, Canada
| | - Rebecca Carlsen
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer, Vancouver, BC V5Z-4S6, Canada
| | - Yongjun Zhao
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer, Vancouver, BC V5Z-4S6, Canada
| | - Andrew J Mungall
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer, Vancouver, BC V5Z-4S6, Canada
| | - Richard Moore
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer, Vancouver, BC V5Z-4S6, Canada
| | - Inanc Birol
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer, Vancouver, BC V5Z-4S6, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, BC V6T-1Z4, Canada
| | - Marco A Marra
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer, Vancouver, BC V5Z-4S6, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, BC V6T-1Z4, Canada
| | - David A S Rosen
- Institute for the Oceans and Fisheries, University of British Columbia, Vancouver, BC V6T-1Z4, Canada
- Vancouver Aquarium, Vancouver, BC V6G 3E2, Canada
| | | | - Steven J M Jones
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer, Vancouver, BC V5Z-4S6, Canada.
- Department of Medical Genetics, University of British Columbia, Vancouver, BC V6T-1Z4, Canada.
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC V5A-1S6, Canada.
| |
|
100
|
De Novo Sequencing, Assembly, and Annotation of Four Threespine Stickleback Genomes Based on Microfluidic Partitioned DNA Libraries. Genes (Basel) 2019; 10:genes10060426. [PMID: 31163709 PMCID: PMC6627416 DOI: 10.3390/genes10060426] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 03/13/2019] [Revised: 05/24/2019] [Accepted: 05/27/2019] [Indexed: 11/16/2022] Open
Abstract
The threespine stickleback is a geographically widespread and ecologically highly diverse fish that has emerged as a powerful model system for evolutionary genomics and developmental biology. Investigations in this species currently rely on a single high-quality reference genome, but would benefit from the availability of additional, independently sequenced and assembled genomes. We present here the assembly of four new stickleback genomes, based on the sequencing of microfluidic partitioned DNA libraries. The base pair lengths of the four genomes reach 92-101% of the standard reference genome length. Together with their de novo gene annotation, these assemblies offer a resource enhancing genomic investigations in stickleback. The genomes and their annotations are available from the Dryad Digital Repository (https://doi.org/10.5061/dryad.113j3h7).
|