201
Whibley A, Kelley JL, Narum SR. The changing face of genome assemblies: Guidance on achieving high-quality reference genomes. Mol Ecol Resour 2021; 21:641-652. [PMID: 33326691 DOI: 10.1111/1755-0998.13312] [Citation(s) in RCA: 36] [Impact Index Per Article: 9.0] [Received: 09/17/2020] [Revised: 12/08/2020] [Accepted: 12/11/2020] [Indexed: 12/20/2022]
Abstract
The quality of genome assemblies has improved rapidly in recent years due to continual advances in sequencing technology, assembly approaches, and quality control. In the field of molecular ecology, this has led to the development of exceptional quality genome assemblies that will be important long-term resources for broader studies into ecological, conservation, evolutionary, and population genomics of naturally occurring species. Moreover, the extent to which a single reference genome represents the diversity within a species varies: pan-genomes will become increasingly important ecological genomics resources, particularly in systems found to have considerable presence-absence variation in their functional content. Here, we highlight advances in technology that have raised the bar for genome assembly and provide guidance on standards to achieve exceptional quality reference genomes. Key recommendations include the following: (a) Genome assemblies should include long-read sequencing except in rare cases where it is effectively impossible to acquire adequately preserved samples needed for high molecular weight DNA standards. (b) At least one scaffolding approach, such as Hi-C or optical mapping, should be included with genome assembly. (c) Genome assemblies should be carefully evaluated; this may involve using short-read data for genome polishing, error correction, k-mer analyses, and estimating the percentage of reads that map back to an assembly. Finally, a genome assembly is most valuable if all data and methods are made publicly available and the utility of a genome for further studies is verified through examples. While these recommendations are based on current technology, we anticipate that future advances will push the field further and the molecular ecology community should continue to adopt new approaches that attain the highest quality genome assemblies.
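As a rough illustration of the k-mer analyses mentioned in recommendation (c), the following minimal Python sketch counts k-mers in short reads and reports what fraction of the frequently seen ("solid") read k-mers also occur in an assembly. The function names, the `min_count` threshold, and the toy sequences are hypothetical and do not come from the paper. The sketch ignores reverse complements, which a real evaluation would need to handle, typically with a dedicated k-mer tool.

```python
from collections import Counter


def count_kmers(sequences, k):
    """Count all k-mers of length k in a list of DNA strings (forward strand only)."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" not in kmer:  # skip k-mers containing ambiguous bases
                counts[kmer] += 1
    return counts


def kmer_completeness(read_counts, assembly_counts, min_count=2):
    """Fraction of solid read k-mers (seen >= min_count times) found in the assembly."""
    solid = [kmer for kmer, n in read_counts.items() if n >= min_count]
    if not solid:
        return 0.0
    found = sum(1 for kmer in solid if kmer in assembly_counts)
    return found / len(solid)


if __name__ == "__main__":
    reads = ["ACGTAC", "ACGTAC"]   # toy short reads
    assembly = ["ACGTA"]           # toy assembly missing the last base
    score = kmer_completeness(count_kmers(reads, 3), count_kmers(assembly, 3))
    print(f"k-mer completeness: {score:.2f}")
```

A low completeness value under these assumptions would suggest sequence present in the reads but absent from the assembly.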
Affiliation(s)
- Shawn R Narum
- University of Idaho, Moscow, ID, USA
- Columbia River Inter-Tribal Fish Commission, Hagerman, ID, USA
202
Eschenbrenner CJ, Feurtey A, Stukenbrock EH. Population Genomics of Fungal Plant Pathogens and the Analyses of Rapidly Evolving Genome Compartments. Methods Mol Biol 2021; 2090:337-355. [PMID: 31975174 DOI: 10.1007/978-1-0716-0199-0_14] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 12/30/2022]
Abstract
Genome sequencing of fungal pathogens has documented extensive variation in genome structure and composition between species and, in many cases, between individuals of the same species. This type of genomic variation can be adaptive, allowing pathogens to rapidly evolve new virulence phenotypes. Analyses of genome-wide variation in fungal pathogen genomes rely on high-quality assemblies and methods to detect and quantify structural variation. Population genomic studies in fungi have addressed the underlying mechanisms whereby structural variation can be rapidly generated. Transposable elements, high mutation and recombination rates, as well as incorrect chromosome segregation during mitosis and meiosis, contribute to the extensive variation observed in many species. Here we summarize key findings in the field of fungal pathogen genomics and discuss methods to detect and characterize structural variants, including an alignment-based pipeline to study variation in population genomic data.
Affiliation(s)
- Christoph J Eschenbrenner
- Environmental Genomics, Christian-Albrechts University of Kiel, Kiel, Germany
- Max Planck Institute for Evolutionary Biology, Plön, Germany
- Alice Feurtey
- Environmental Genomics, Christian-Albrechts University of Kiel, Kiel, Germany
- Max Planck Institute for Evolutionary Biology, Plön, Germany
- Eva H Stukenbrock
- Environmental Genomics, Christian-Albrechts University of Kiel, Kiel, Germany
- Max Planck Institute for Evolutionary Biology, Plön, Germany
203
Du H, Diao C, Zhao P, Zhou L, Liu JF. Integrated hybrid de novo assembly technologies to obtain high-quality pig genome using short and long reads. Brief Bioinform 2021; 22:6082823. [PMID: 33429431 DOI: 10.1093/bib/bbaa399] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 07/22/2020] [Revised: 11/20/2020] [Accepted: 12/08/2020] [Indexed: 11/12/2022] Open
Abstract
With the rapid progress of sequencing technologies, various types of sequencing reads and assembly algorithms have been designed to construct genome assemblies. Although recent studies have attempted to evaluate the appropriate type of sequencing reads and algorithms for assembling high-quality genomes, it is still a challenge to choose the correct combination for constructing animal genomes. Here, we present a comparative performance assessment of 14 assembly combinations (9 software programs with different short and long reads) for the Duroc pig. Based on the results of the optimization process for genome construction, we designed an integrated hybrid de novo assembly pipeline, HSCG, and constructed a draft genome for the Duroc pig. Comparison between the new genome and Sus scrofa 11.1 revealed important breakpoints in two S. scrofa 11.1 genes. Our findings may provide new insights into pan-genome analysis studies of agricultural animals, and the integrated assembly pipeline may serve as a guide for the assembly of other animal genomes.
Affiliation(s)
- Heng Du
- National Engineering Laboratory for Animal Breeding; Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture; College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Chenguang Diao
- National Engineering Laboratory for Animal Breeding; Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture; College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Pengju Zhao
- National Engineering Laboratory for Animal Breeding; Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture; College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Lei Zhou
- National Engineering Laboratory for Animal Breeding; Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture; College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Jian-Feng Liu
- National Engineering Laboratory for Animal Breeding; Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture; College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
204
Morisse P, Marchet C, Limasset A, Lecroq T, Lefebvre A. Scalable long read self-correction and assembly polishing with multiple sequence alignment. Sci Rep 2021; 11:761. [PMID: 33436980 PMCID: PMC7804095 DOI: 10.1038/s41598-020-80757-5] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 08/26/2020] [Accepted: 12/22/2020] [Indexed: 11/09/2022] Open
Abstract
Third-generation sequencing technologies can sequence long reads of tens of kbp, which are expected to solve various problems. However, they display high error rates, currently capped around 10%. Self-correction is thus regularly used in long-read analysis projects. We introduce CONSENT, a new self-correction method that relies both on multiple sequence alignment and local de Bruijn graphs. To ensure scalability, multiple sequence alignment computation benefits from a new and efficient segmentation strategy, allowing a massive speedup. CONSENT compares well to the state-of-the-art, and performs better on real Oxford Nanopore data. Specifically, CONSENT is the only method that efficiently scales to ultra-long reads, and it can process a full human dataset, containing reads reaching up to 1.5 Mbp, in 10 days. Moreover, our experiments show that error correction with CONSENT improves the quality of Flye assemblies. Additionally, CONSENT implements a polishing feature for correcting raw assemblies. Our experiments show that CONSENT is 2-38x faster than other polishing tools, while providing comparable results. Furthermore, we show that, on a human dataset, assembling the raw data and polishing the assembly is less resource consuming than correcting and then assembling the reads, while providing better results. CONSENT is available at https://github.com/morispi/CONSENT .
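To make the term "de Bruijn graph" concrete, here is a minimal generic sketch in Python. It is not CONSENT's implementation, which the abstract does not describe; the function name and the toy input are hypothetical. Each k-mer contributes one edge from its (k-1)-base prefix to its (k-1)-base suffix.

```python
from collections import defaultdict


def de_bruijn_graph(sequences, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges come from k-mers."""
    graph = defaultdict(set)
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Edge from the k-mer's prefix to its suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return graph


if __name__ == "__main__":
    # Toy window of overlapping read fragments (hypothetical data).
    for node, successors in sorted(de_bruijn_graph(["ACGT", "CGTA"], 3).items()):
        print(node, "->", sorted(successors))
```

In a correction setting, a path through such a graph built from a small window of overlapping reads can serve as a consensus for that window; how CONSENT selects windows and paths is outside the scope of this sketch.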
205
Holley G, Beyter D, Ingimundardottir H, Møller PL, Kristmundsdottir S, Eggertsson HP, Halldorsson BV. Ratatosk: hybrid error correction of long reads enables accurate variant calling and assembly. Genome Biol 2021; 22:28. [PMID: 33419473 PMCID: PMC7792008 DOI: 10.1186/s13059-020-02244-4] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Received: 07/17/2020] [Accepted: 12/15/2020] [Indexed: 12/20/2022] Open
Abstract
A major challenge with long-read sequencing data is its high error rate of up to 15%. We present Ratatosk, a method to correct long reads with short-read data. We demonstrate on 5 human genome trios that Ratatosk reduces the error rate of long reads 6-fold on average, with a median error rate as low as 0.22%. SNP calls in Ratatosk-corrected reads are nearly 99% accurate and indel call accuracy is increased by up to 37%. An assembly of Ratatosk-corrected reads from an Ashkenazi individual yields a contig N50 of 45 Mbp and fewer misassemblies than a PacBio HiFi read assembly.
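For readers unfamiliar with the contig N50 statistic quoted above, the following short Python sketch shows the standard definition: the length of the contig at which the cumulative length, counted from the longest contig down, first reaches half of the total assembly length. The function name and the example lengths are illustrative only.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Stop at the first contig where the running sum covers half the assembly.
        if running * 2 >= total:
            return length
    return 0


if __name__ == "__main__":
    lengths = [10, 20, 30, 40]  # toy contig lengths, total = 100
    print("N50:", n50(lengths))
```

With the toy lengths, the two longest contigs (40 and 30) together cover 70 of 100 bases, so the N50 is 30.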
Affiliation(s)
- Peter L Møller
- Department of Biomedicine, Aarhus University, Aarhus, Denmark
- Snædis Kristmundsdottir
- deCODE genetics/Amgen Inc., Reykjavík, Iceland
- School of Technology, Reykjavik University, Reykjavík, Iceland
- Bjarni V Halldorsson
- deCODE genetics/Amgen Inc., Reykjavík, Iceland
- School of Technology, Reykjavik University, Reykjavík, Iceland
206
Peona V, Blom MPK, Xu L, Burri R, Sullivan S, Bunikis I, Liachko I, Haryoko T, Jønsson KA, Zhou Q, Irestedt M, Suh A. Identifying the causes and consequences of assembly gaps using a multiplatform genome assembly of a bird-of-paradise. Mol Ecol Resour 2021; 21:263-286. [PMID: 32937018 PMCID: PMC7757076 DOI: 10.1111/1755-0998.13252] [Citation(s) in RCA: 87] [Impact Index Per Article: 21.8] [Received: 04/14/2020] [Revised: 08/21/2020] [Accepted: 08/26/2020] [Indexed: 01/09/2023]
Abstract
Genome assemblies are currently being produced at an impressive rate by consortia and individual laboratories. The low costs and increasing efficiency of sequencing technologies now enable assembling genomes at unprecedented quality and contiguity. However, the difficulty in assembling repeat-rich and GC-rich regions (genomic "dark matter") limits insights into the evolution of genome structure and regulatory networks. Here, we compare the efficiency of currently available sequencing technologies (short/linked/long reads and proximity ligation maps) and combinations thereof in assembling genomic dark matter. By adopting different de novo assembly strategies, we compare individual draft assemblies to a curated multiplatform reference assembly and identify the genomic features that cause gaps within each assembly. We show that a multiplatform assembly implementing long-read, linked-read and proximity sequencing technologies performs best at recovering transposable elements, multicopy MHC genes, GC-rich microchromosomes and the repeat-rich W chromosome. Telomere-to-telomere assemblies are not a reality yet for most organisms, but by leveraging technology choice it is now possible to minimize genome assembly gaps for downstream analysis. We provide a roadmap to tailor sequencing projects for optimized completeness of both the coding and noncoding parts of nonmodel genomes.
Affiliation(s)
- Valentina Peona
- Department of Ecology and Genetics - Evolutionary Biology, Science for Life Laboratories, Uppsala University, Uppsala, Sweden
- Department of Organismal Biology - Systematic Biology, Science for Life Laboratories, Uppsala University, Uppsala, Sweden
- Mozes P. K. Blom
- Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Stockholm, Sweden
- Museum für Naturkunde, Leibniz Institut für Evolutions- und Biodiversitätsforschung, Berlin, Germany
- Luohao Xu
- Department of Neurosciences and Developmental Biology, University of Vienna, Vienna, Austria
- Reto Burri
- Department of Population Ecology, Institute of Ecology and Evolution, Friedrich-Schiller-University Jena, Jena, Germany
- Ignas Bunikis
- Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala Genome Center, Uppsala University, Uppsala, Sweden
- Tri Haryoko
- Research Centre for Biology, Museum Zoologicum Bogoriense, Indonesian Institute of Sciences (LIPI), Cibinong, Indonesia
- Knud A. Jønsson
- Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark
- Qi Zhou
- Department of Neurosciences and Developmental Biology, University of Vienna, Vienna, Austria
- MOE Laboratory of Biosystems Homeostasis & Protection, Life Sciences Institute, Zhejiang University, Hangzhou, China
- Center for Reproductive Medicine, The 2nd Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- Martin Irestedt
- Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Stockholm, Sweden
- Alexander Suh
- Department of Ecology and Genetics - Evolutionary Biology, Science for Life Laboratories, Uppsala University, Uppsala, Sweden
- Department of Organismal Biology - Systematic Biology, Science for Life Laboratories, Uppsala University, Uppsala, Sweden
- School of Biological Sciences - Organisms and the Environment, University of East Anglia, Norwich, UK
207
Zhang H, Jain C, Aluru S. A comprehensive evaluation of long read error correction methods. BMC Genomics 2020; 21:889. [PMID: 33349243 PMCID: PMC7751105 DOI: 10.1186/s12864-020-07227-0] [Citation(s) in RCA: 70] [Impact Index Per Article: 14.0] [Received: 11/08/2020] [Accepted: 11/12/2020] [Indexed: 01/07/2023] Open
Abstract
BACKGROUND: Third-generation single molecule sequencing technologies can sequence long reads, which is advancing the frontiers of genomics research. However, their high error rates prohibit accurate and efficient downstream analysis. This difficulty has motivated the development of many long read error correction tools, which tackle this problem through sampling redundancy and/or leveraging accurate short reads of the same biological samples. Existing studies to assess these tools use simulated data sets, and are not sufficiently comprehensive in the range of software covered or diversity of evaluation measures used. RESULTS: In this paper, we present a categorization and review of long read error correction methods, and provide a comprehensive evaluation of the corresponding long read error correction tools. Leveraging recent real sequencing data, we establish benchmark data sets and set up evaluation criteria for a comparative assessment which includes quality of error correction as well as run-time and memory usage. We study how trimming and long read sequencing depth affect error correction in terms of length distribution and genome coverage post-correction, and the impact of error correction performance on an important application of long reads, genome assembly. We provide guidelines for practitioners for choosing among the available error correction tools and identify directions for future research. CONCLUSIONS: Despite the high error rate of long reads, the state-of-the-art correction tools can achieve high correction quality. When short reads are available, the best hybrid methods outperform non-hybrid methods in terms of correction quality and computing resource usage. When choosing tools, practitioners are advised to be careful with the few correction tools that discard reads, and to check the effect of error correction tools on downstream analysis. Our evaluation code is available as open-source at https://github.com/haowenz/LRECE .
Affiliation(s)
- Haowen Zhang
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, 30332, GA, USA
- Chirag Jain
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, 30332, GA, USA
- Srinivas Aluru
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, 30332, GA, USA
- Institute for Data Engineering and Science, Georgia Institute of Technology, Atlanta, 30332, GA, USA
208
Heller D, Vingron M. SVIM-asm: Structural variant detection from haploid and diploid genome assemblies. Bioinformatics 2020; 36:5519-5521. [PMID: 33346817 PMCID: PMC8016491 DOI: 10.1093/bioinformatics/btaa1034] [Citation(s) in RCA: 81] [Impact Index Per Article: 16.2] [Received: 10/09/2020] [Revised: 11/16/2020] [Accepted: 12/12/2020] [Indexed: 12/21/2022] Open
Abstract
MOTIVATION With the availability of new sequencing technologies, the generation of haplotype-resolved genome assemblies up to chromosome scale has become feasible. These assemblies capture the complete genetic information of both parental haplotypes, increase structural variant (SV) calling sensitivity and enable direct genotyping and phasing of SVs. Yet, existing SV callers are designed for haploid genome assemblies only, do not support genotyping or detect only a limited set of SV classes. RESULTS We introduce our method SVIM-asm for the detection and genotyping of six common classes of SVs from haploid and diploid genome assemblies. Compared against the only other existing SV caller for diploid assemblies, DipCall, SVIM-asm detects more SV classes and reached higher F1 scores for the detection of insertions and deletions on two recently published assemblies of the HG002 individual. AVAILABILITY AND IMPLEMENTATION SVIM-asm has been implemented in Python and can be easily installed via bioconda. Its source code is available at github.com/eldariont/svim-asm. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
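The F1 score used to compare SV callers is the harmonic mean of precision and recall. The sketch below computes it from counts of true positives, false positives, and false negatives. The function name and the example counts are hypothetical and are not taken from the SVIM-asm benchmark.

```python
def f1_score(true_pos, false_pos, false_neg):
    """Harmonic mean of precision and recall; returns 0.0 when undefined."""
    called = true_pos + false_pos     # all calls made by the tool
    expected = true_pos + false_neg   # all variants in the truth set
    precision = true_pos / called if called else 0.0
    recall = true_pos / expected if expected else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy counts: 8 correct calls, 2 spurious calls, 2 missed variants.
    print(f"F1 = {f1_score(8, 2, 2):.3f}")
```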
Affiliation(s)
- David Heller
- Computational Molecular Biology Department, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Martin Vingron
- Computational Molecular Biology Department, Max Planck Institute for Molecular Genetics, Berlin, Germany
209
Bennett EP, Petersen BL, Johansen IE, Niu Y, Yang Z, Chamberlain CA, Met Ö, Wandall HH, Frödin M. INDEL detection, the 'Achilles heel' of precise genome editing: a survey of methods for accurate profiling of gene editing induced indels. Nucleic Acids Res 2020; 48:11958-11981. [PMID: 33170255 PMCID: PMC7708060 DOI: 10.1093/nar/gkaa975] [Citation(s) in RCA: 55] [Impact Index Per Article: 11.0] [Received: 12/20/2019] [Revised: 10/05/2020] [Accepted: 10/15/2020] [Indexed: 12/11/2022] Open
Abstract
Advances in genome editing technologies have enabled manipulation of genomes at the single base level. These technologies are based on programmable nucleases (PNs) that include meganucleases, zinc-finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs) and Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)/CRISPR-associated 9 (Cas9) nucleases and have given researchers the ability to delete, insert or replace genomic DNA in cells, tissues and whole organisms. The great flexibility in re-designing the genomic target specificity of PNs has vastly expanded the scope of gene editing applications in life science, and shows great promise for development of the next generation gene therapies. PN technologies share the principle of inducing a DNA double-strand break (DSB) at a user-specified site in the genome, followed by cellular repair of the induced DSB. PN-elicited DSBs are mainly repaired by the non-homologous end joining (NHEJ) and the microhomology-mediated end joining (MMEJ) pathways, which can elicit a variety of small insertion or deletion (indel) mutations. If indels are elicited in a protein coding sequence and shift the reading frame, targeted gene knock out (KO) can readily be achieved using either of the available PNs. Despite the ease by which gene inactivation in principle can be achieved, in practice, successful KO is not only determined by the efficiency of NHEJ and MMEJ repair; it also depends on the design and properties of the PN utilized, delivery format chosen, the preferred indel repair outcomes at the targeted site, the chromatin state of the target site and the relative activities of the repair pathways in the edited cells. These variables preclude accurate prediction of the nature and frequency of PN induced indels. 
A key step of any gene KO experiment therefore becomes the detection, characterization and quantification of the indel(s) induced at the targeted genomic site in cells, tissues or whole organisms. In this survey, we briefly review naturally occurring indels and their detection. Next, we review the methods that have been developed for detection of PN-induced indels. We briefly outline the experimental steps and describe the pros and cons of the various methods to help users decide on a suitable method for their editing application. We highlight recent advances that enable accurate and sensitive quantification of indel events in cells regardless of their genome complexity, turning a complex pool of different indel events into informative indel profiles. Finally, we review what has been learned about PN-elicited indel formation through the use of the new methods and how this insight is helping to further advance the genome editing field.
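The abstract notes that a knock-out is achieved when an indel in a protein coding sequence shifts the reading frame. Because codons are three bases long, this happens exactly when the net length change is not a multiple of three. A minimal Python sketch of that check follows; the function name and the example alleles are hypothetical.

```python
def is_frameshift(ref_allele, alt_allele):
    """True if replacing ref_allele with alt_allele changes length by a non-multiple of 3."""
    net_change = len(alt_allele) - len(ref_allele)
    return net_change % 3 != 0


if __name__ == "__main__":
    examples = [
        ("A", "AC"),    # +1 bp insertion
        ("ACGT", "A"),  # -3 bp deletion, reading frame preserved
        ("ACG", "A"),   # -2 bp deletion
    ]
    for ref, alt in examples:
        print(ref, "->", alt, "frameshift:", is_frameshift(ref, alt))
```

An in-frame indel can still disrupt protein function, so this check identifies frameshifts only and is not a full prediction of knock-out success.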
Affiliation(s)
- Eric Paul Bennett
- Copenhagen Center for Glycomics, Department of Odontology and Molecular and Cellular Medicine, Faculty of Health Sciences, University of Copenhagen, DK-2200 Copenhagen N, Denmark
- Bent Larsen Petersen
- Department of Plant and Environmental Sciences, University of Copenhagen, DK-1871 Frederiksberg C, Denmark
- Ida Elisabeth Johansen
- Department of Plant and Environmental Sciences, University of Copenhagen, DK-1871 Frederiksberg C, Denmark
- Yiyuan Niu
- Biotech Research and Innovation Centre (BRIC), Faculty of Health Sciences, University of Copenhagen, Copenhagen, Denmark
- College of Animal Science and Technology, Northwest A&F University, Yangling Shaanxi, China
- Zhang Yang
- Copenhagen Center for Glycomics, Department of Odontology and Molecular and Cellular Medicine, Faculty of Health Sciences, University of Copenhagen, DK-2200 Copenhagen N, Denmark
- Özcan Met
- Center for Cancer Immune Therapy, Department of Oncology, Copenhagen University Hospital, Herlev, Denmark
- Department of Immunology and Microbiology, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Hans H Wandall
- Copenhagen Center for Glycomics, Department of Odontology and Molecular and Cellular Medicine, Faculty of Health Sciences, University of Copenhagen, DK-2200 Copenhagen N, Denmark
- Morten Frödin
- Biotech Research and Innovation Centre (BRIC), Faculty of Health Sciences, University of Copenhagen, Copenhagen, Denmark
210
Fatima N, Petri A, Gyllensten U, Feuk L, Ameur A. Evaluation of Single-Molecule Sequencing Technologies for Structural Variant Detection in Two Swedish Human Genomes. Genes (Basel) 2020; 11:E1444. [PMID: 33266238 PMCID: PMC7760597 DOI: 10.3390/genes11121444] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 10/12/2020] [Revised: 11/24/2020] [Accepted: 11/26/2020] [Indexed: 01/23/2023] Open
Abstract
Long-read single molecule sequencing is increasingly used in human genomics research, as it allows accurate detection of large-scale DNA rearrangements such as structural variations (SVs) at high resolution. However, few studies have evaluated the performance of different single molecule sequencing platforms for SV detection in human samples. Here we performed Oxford Nanopore Technologies (ONT) whole-genome sequencing of two Swedish human samples (average 32× coverage) and compared the results to previously generated Pacific Biosciences (PacBio) data for the same individuals (average 66× coverage). Our analysis inferred an average of 17k and 23k SVs from the ONT and PacBio data, respectively, with a majority of them overlapping with an available multi-platform SV dataset. When comparing the SV calls in the two Swedish individuals, we find a higher concordance between ONT and PacBio SVs detected in the same individual than between SVs detected by the same technology in different individuals. Downsampling of PacBio reads, performed to obtain similar coverage levels for all datasets, resulted in 17k SVs per individual and improved overlap with the ONT SVs. Our results suggest that ONT and PacBio have similar performance for SV detection in human whole genome sequencing data, and that both technologies are feasible for population-scale studies.
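The abstract does not state how overlap between SV call sets was computed. One common convention, assumed here purely for illustration, is to treat two calls as concordant when they share at least 50% reciprocal overlap on the same chromosome. The Python sketch below implements that assumption. The function names, the `min_ro` threshold, and the toy calls are hypothetical.

```python
def reciprocal_overlap(call_a, call_b):
    """Reciprocal overlap of two (chrom, start, end) half-open intervals, in [0, 1]."""
    if call_a[0] != call_b[0]:
        return 0.0
    shared = min(call_a[2], call_b[2]) - max(call_a[1], call_b[1])
    if shared <= 0:
        return 0.0
    # The smaller of the two coverage fractions must pass the threshold.
    return min(shared / (call_a[2] - call_a[1]), shared / (call_b[2] - call_b[1]))


def concordant_fraction(calls_a, calls_b, min_ro=0.5):
    """Fraction of calls in calls_a matched by at least one call in calls_b."""
    if not calls_a:
        return 0.0
    matched = sum(
        1 for a in calls_a
        if any(reciprocal_overlap(a, b) >= min_ro for b in calls_b)
    )
    return matched / len(calls_a)


if __name__ == "__main__":
    ont_calls = [("chr1", 100, 200), ("chr2", 0, 10)]   # toy call set
    pacbio_calls = [("chr1", 150, 250)]                 # toy call set
    print("concordance:", concordant_fraction(ont_calls, pacbio_calls))
```

This all-pairs comparison is quadratic in the number of calls. For call sets of tens of thousands of SVs, an interval index or a sorted sweep would be the more practical choice.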
Affiliation(s)
- Nazeefa Fatima
- Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, 752 36 Uppsala, Sweden
- Anna Petri
- Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, 752 36 Uppsala, Sweden
- Ulf Gyllensten
- Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, 752 36 Uppsala, Sweden
- Lars Feuk
- Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, 752 36 Uppsala, Sweden
- Adam Ameur
- Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, 752 36 Uppsala, Sweden
- Department of Epidemiology and Preventive Medicine, Monash University, Melbourne, Clayton, VIC 3800, Australia
211
Koebley SR, Mikheikin A, Leslie K, Guest D, McConnell-Wells W, Lehman JH, Al Juhaishi T, Zhang X, Roberts CH, Picco L, Toor A, Chesney A, Reed J. Digital Polymerase Chain Reaction Paired with High-Speed Atomic Force Microscopy for Quantitation and Length Analysis of DNA Length Polymorphisms. ACS NANO 2020; 14:15385-15393. [PMID: 33169971 DOI: 10.1021/acsnano.0c05897] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 06/11/2023]
Abstract
DNA length polymorphisms are found in many serious diseases, and assessment of their length and abundance is often critical for accurate diagnosis. However, measuring their length and frequency in a mostly wild-type background, as occurs in many situations, remains challenging due to their variable and repetitive nature. To overcome these hurdles, we combined two powerful techniques, digital polymerase chain reaction (dPCR) and high-speed atomic force microscopy (HSAFM), to create a simple, rapid, and flexible method for quantifying both the size and proportion of DNA length polymorphisms. In our approach, individual amplicons from each dPCR partition are imaged and sized directly. We focused on internal tandem duplications (ITDs) located within the FLT3 gene, which are associated with acute myeloid leukemia and often indicative of a poor prognosis. In an analysis of over 1.5 million HSAFM-imaged amplicons from cell line and clinical samples containing FLT3-ITDs, dPCR-HSAFM returned the expected variant length and variant allele frequency, down to 5% variant samples. As a high-throughput method with single-molecule resolution, dPCR-HSAFM thus represents an advance in HSAFM analysis and a powerful tool for the diagnosis of length polymorphisms.
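As a simplified illustration of how a variant allele frequency could be derived from individually sized amplicons, the sketch below labels each measured amplicon as wild-type or variant by length and reports the variant proportion. This is not the authors' analysis pipeline. The function name, the wild-type length, the tolerance, and the measurements are all hypothetical values chosen for the example.

```python
def variant_allele_frequency(lengths_bp, wild_type_bp, tolerance_bp):
    """Fraction of amplicons longer than the wild-type length plus a tolerance."""
    if not lengths_bp:
        return 0.0
    # An internal tandem duplication lengthens the amplicon, so only
    # amplicons clearly longer than wild-type are counted as variant.
    variant = sum(1 for n in lengths_bp if n > wild_type_bp + tolerance_bp)
    return variant / len(lengths_bp)


if __name__ == "__main__":
    # Toy data: 19 wild-type-sized amplicons and one carrying a 60 bp duplication.
    measured = [300] * 19 + [360]
    vaf = variant_allele_frequency(measured, wild_type_bp=300, tolerance_bp=20)
    print(f"VAF = {vaf:.2%}")
```

The tolerance stands in for measurement noise in the sizing step. In practice it would have to be calibrated against the instrument's length resolution.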
Affiliation(s)
- Sean R Koebley
- Physics Department, Virginia Commonwealth University, Richmond, Virginia 23284, United States
- Andrey Mikheikin
- Physics Department, Virginia Commonwealth University, Richmond, Virginia 23284, United States
- Kevin Leslie
- Physics Department, Virginia Commonwealth University, Richmond, Virginia 23284, United States
- Daniel Guest
- Physics Department, Virginia Commonwealth University, Richmond, Virginia 23284, United States
- Wendy McConnell-Wells
- Physics Department, Virginia Commonwealth University, Richmond, Virginia 23284, United States
- Joshua H Lehman
- Physics Department, Virginia Commonwealth University, Richmond, Virginia 23284, United States
- Taha Al Juhaishi
- Department of Internal Medicine, Virginia Commonwealth University, Richmond, Virginia 23298, United States
- Xiaojie Zhang
- Department of Internal Medicine, Virginia Commonwealth University, Richmond, Virginia 23298, United States
- Catherine H Roberts
- Massey Cancer Center, Virginia Commonwealth University, Richmond, Virginia 23298, United States
- Loren Picco
- Physics Department, Virginia Commonwealth University, Richmond, Virginia 23284, United States
- Amir Toor
- Department of Internal Medicine, Virginia Commonwealth University, Richmond, Virginia 23298, United States
- Alden Chesney
- Department of Pathology, Virginia Commonwealth University, Richmond, Virginia 23298, United States
- Jason Reed
- Physics Department, Virginia Commonwealth University, Richmond, Virginia 23284, United States
- Massey Cancer Center, Virginia Commonwealth University, Richmond, Virginia 23298, United States
212
Short and long-read ultra-deep sequencing profiles emerging heterogeneity across five platform Escherichia coli strains. Metab Eng 2020; 65:197-206. [PMID: 33242648 DOI: 10.1016/j.ymben.2020.11.006] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 08/05/2020] [Revised: 10/26/2020] [Accepted: 11/12/2020] [Indexed: 11/24/2022]
Abstract
Reprogramming organisms for large-scale bioproduction counters their evolutionary objectives of fast growth and often leads to mutational collapse of the engineered production pathways during cultivation. Yet, the mutational susceptibility of academic and industrial Escherichia coli bioproduction host strains is poorly understood. In this study, we apply 2nd and 3rd generation deep sequencing to profile simultaneous modes of genetic heterogeneity that decimate engineered biosynthetic production in five popular E. coli hosts, BL21(DE3), TOP10, MG1655, W, and W3110, producing 2,3-butanediol and mevalonic acid. Combining short-read and long-read sequencing, we detect strain-specific and sequence-specific mutational modes including single nucleotide polymorphism, inversion, and mobile element transposition, as well as complex structural variations that disrupt the integrity of the engineered biosynthetic pathway. Our analysis suggests that organism engineers should avoid chassis strains hosting active insertion sequence (IS) subfamilies such as IS1 and IS10 present in popular E. coli TOP10. We also recommend monitoring for increased mutagenicity in the pathway transcription initiation regions and recombinogenic repeats. Together, short and long sequencing reads identified latent low-frequency mutation events such as a short detrimental inversion within a pathway gene, driven by 8-bp short inverted repeats. This demonstrates the power of combining ultra-deep DNA sequencing technologies to profile genetic heterogeneities of engineered constructs and explore the markedly different mutational landscapes of common E. coli host strains. The observed multitude of evolving variants underlines the usefulness of early mutational profiling for new synthetic pathways designed to be sustained in organisms over long cultivation scales.
|
213
|
Murphy WJ, Foley NM, Bredemeyer KR, Gatesy J, Springer MS. Phylogenomics and the Genetic Architecture of the Placental Mammal Radiation. Annu Rev Anim Biosci 2020; 9:29-53. [PMID: 33228377 DOI: 10.1146/annurev-animal-061220-023149] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Indexed: 11/09/2022]
Abstract
The genomes of placental mammals are being sequenced at an unprecedented rate. Alignments of hundreds, and one day thousands, of genomes spanning the rich living and extinct diversity of species offer unparalleled power to resolve phylogenetic controversies, identify genomic innovations of adaptation, and dissect the genetic architecture of reproductive isolation. We highlight outstanding questions about the earliest phases of placental mammal diversification and the promise of newer methods, as well as remaining challenges, toward using whole genome data to resolve placental mammal phylogeny. The next phase of mammalian comparative genomics will see the completion and application of finished-quality, gapless genome assemblies from many ordinal lineages and closely related species. Interspecific comparisons between the most hypervariable genomic loci will likely reveal large, but heretofore mostly underappreciated, effects on population divergence, morphological innovation, and the origin of new species.
Affiliation(s)
- William J Murphy
  - Veterinary Integrative Biosciences, Texas A&M University, College Station, Texas 77843, USA
- Nicole M Foley
  - Veterinary Integrative Biosciences, Texas A&M University, College Station, Texas 77843, USA
- Kevin R Bredemeyer
  - Veterinary Integrative Biosciences, Texas A&M University, College Station, Texas 77843, USA
- John Gatesy
  - Division of Vertebrate Zoology, American Museum of Natural History, New York, NY 10024, USA
- Mark S Springer
  - Department of Evolution, Ecology and Organismal Biology, University of California, Riverside, California 92521, USA
|
214
|
Integrative analysis of structural variations using short-reads and linked-reads yields highly specific and sensitive predictions. PLoS Comput Biol 2020; 16:e1008397. [PMID: 33226985 PMCID: PMC7721175 DOI: 10.1371/journal.pcbi.1008397] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 01/10/2020] [Revised: 12/07/2020] [Accepted: 09/24/2020] [Indexed: 11/19/2022] Open
Abstract
Genetic diseases are driven by aberrations of the human genome. Identification of such aberrations, including structural variations (SVs), is key to our understanding. Conventional short-reads whole genome sequencing (cWGS) can identify SVs to base-pair resolution, but utilizes only short-range information and suffers from a high false discovery rate (FDR). Linked-reads sequencing (10XWGS) utilizes long-range information by linkage of short-reads originating from the same large DNA molecule. This can mitigate alignment-based artefacts, especially in repetitive regions, and should enable better prediction of SVs. However, an unbiased evaluation of this technology is not available. In this study, we performed a comprehensive analysis of different types and sizes of SVs predicted by both technologies and validated with an independent PCR based approach. The SVs commonly identified by both technologies were highly specific, while the validation rate dropped for uncommon events. A particularly high FDR was observed for SVs only found by 10XWGS. To improve FDR and sensitivity, statistical models for both technologies were trained. Using our approach, we characterized SVs from the MCF7 cell line and a primary breast cancer tumor with high precision. This approach improves SV prediction and can therefore help in understanding the underlying genetics in various diseases.

Cancer and many other diseases are often driven by structural rearrangements in the patients. Their precise identification is necessary to understand the evolution of the disease and to find a cure. In this study, we have compared two sequencing technologies for the identification of structural variations, i.e. Illumina’s short-reads and 10X Genomics linked-reads sequencing. Short-reads sequencing is already known to have a high false discovery rate for structural variations, while an unbiased performance evaluation of linked-reads sequencing is missing. Hence, we evaluate the performance of these two technologies using computational and PCR based methodologies. Moreover, we also present a statistical approach to increase their performance, supporting better detection of structural variations and thus further research into disease biology.
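For readers unfamiliar with the metric, the false discovery rate discussed in this abstract can be illustrated with a minimal Python sketch. It assumes the usual definition, FDR = FP / (TP + FP), applied to calls that were tested by an independent validation such as PCR. The function name and the counts are hypothetical and are not taken from the study.

```python
def false_discovery_rate(true_positives, false_positives):
    """Fraction of tested calls that failed validation: FP / (TP + FP)."""
    called = true_positives + false_positives
    if called == 0:
        raise ValueError("no validated calls to evaluate")
    return false_positives / called


# Hypothetical validation outcome: 45 SV calls confirmed, 5 not confirmed.
fdr = false_discovery_rate(true_positives=45, false_positives=5)
print(f"FDR = {fdr:.2f}")  # prints "FDR = 0.10"
```

A lower value means that more of the reported SVs survive validation; precision is simply 1 minus this value.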
|
215
|
Lee N, Park MJ, Song W, Jeon K, Jeong S. Currently Applied Molecular Assays for Identifying ESR1 Mutations in Patients with Advanced Breast Cancer. Int J Mol Sci 2020; 21:ijms21228807. [PMID: 33233830 PMCID: PMC7699999 DOI: 10.3390/ijms21228807] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 10/27/2020] [Revised: 11/17/2020] [Accepted: 11/19/2020] [Indexed: 12/11/2022] Open
Abstract
Approximately 70% of breast cancers, the leading cause of cancer-related mortality worldwide, are positive for the estrogen receptor (ER). Treatment of patients with luminal subtypes is mainly based on endocrine therapy. However, ER positivity is reduced and ESR1 mutations play an important role in resistance to endocrine therapy, leading to advanced breast cancer. Various methodologies for the detection of ESR1 mutations have been developed, and the most commonly used method is next-generation sequencing (NGS)-based assays (50.0%) followed by droplet digital PCR (ddPCR) (45.5%). Regarding the sample type, tissue (50.0%) was more frequently used than plasma (27.3%). However, plasma (46.2%) became the most used sample type in 2016-2019, in contrast to 2012-2015 (22.2%). In 2016-2019, ddPCR (61.5%), rather than NGS (30.8%), became a more popular method than it was in 2012-2015. The easy accessibility, non-invasiveness, and demonstrated usefulness with high sensitivity of ddPCR using plasma have changed the trends. When using these assays, there should be a comprehensive understanding of the principles, advantages, vulnerability, and precautions for interpretation. In the future, advanced NGS platforms and modified ddPCR will benefit patients by facilitating treatment decisions efficiently based on information regarding ESR1 mutations.
Affiliation(s)
- Nuri Lee
  - Department of Laboratory Medicine, Kangnam Sacred Heart Hospital, Hallym University College of Medicine, Seoul 07440, Korea; (N.L.); (M.-J.P.); (W.S.)
- Min-Jeong Park
  - Department of Laboratory Medicine, Kangnam Sacred Heart Hospital, Hallym University College of Medicine, Seoul 07440, Korea; (N.L.); (M.-J.P.); (W.S.)
- Wonkeun Song
  - Department of Laboratory Medicine, Kangnam Sacred Heart Hospital, Hallym University College of Medicine, Seoul 07440, Korea; (N.L.); (M.-J.P.); (W.S.)
- Kibum Jeon
  - Department of Laboratory Medicine, Hangang Sacred Heart Hospital, Hallym University College of Medicine, Seoul 07440, Korea
- Seri Jeong
  - Department of Laboratory Medicine, Kangnam Sacred Heart Hospital, Hallym University College of Medicine, Seoul 07440, Korea; (N.L.); (M.-J.P.); (W.S.)
  - Correspondence: ; Tel.: +82-845-5305
|
216
|
Rubin MA, Bristow RG, Thienger PD, Dive C, Imielinski M. Impact of Lineage Plasticity to and from a Neuroendocrine Phenotype on Progression and Response in Prostate and Lung Cancers. Mol Cell 2020; 80:562-577. [PMID: 33217316 PMCID: PMC8399907 DOI: 10.1016/j.molcel.2020.10.033] [Citation(s) in RCA: 89] [Impact Index Per Article: 17.8] [Received: 06/25/2020] [Revised: 09/06/2020] [Accepted: 10/22/2020] [Indexed: 02/07/2023]
Abstract
Intratumoral heterogeneity can occur via phenotype transitions, often after chronic exposure to targeted anticancer agents. This process, termed lineage plasticity, is associated with acquired independence from an initial oncogenic driver, resulting in treatment failure. In non-small cell lung cancer (NSCLC) and prostate cancers, lineage plasticity manifests when the adenocarcinoma phenotype transforms into neuroendocrine (NE) disease. The exact molecular mechanisms involved in this NE transdifferentiation remain elusive. In small cell lung cancer (SCLC), plasticity from NE to nonNE phenotypes is driven by NOTCH signaling. Herein we review current understanding of NE lineage plasticity dynamics, exemplified by prostate cancer, NSCLC, and SCLC.
Affiliation(s)
- Mark A Rubin
  - Department for BioMedical Research, University of Bern and Inselspital, 3010 Bern, Switzerland; Bern Center for Precision Medicine, University of Bern and Inselspital, 3010 Bern, Switzerland
- Robert G Bristow
  - Manchester Cancer Research Centre and Cancer Research UK Manchester Institute, University of Manchester, Macclesfield SK10 4TG, UK
- Phillip D Thienger
  - Department for BioMedical Research, University of Bern and Inselspital, 3010 Bern, Switzerland
- Caroline Dive
  - Cancer Research UK Manchester Institute Cancer Biomarker Centre, University of Manchester, Macclesfield SK10 4TG, UK
- Marcin Imielinski
  - Pathology and Laboratory Medicine and Physiology and Biophysics, Weill Cornell Medicine, New York, NY 10065, USA
|
217
|
Benaud N, Edwards RJ, Amos TG, D'Agostino PM, Gutiérrez-Chávez C, Montgomery K, Nicetic I, Ferrari BC. Antarctic desert soil bacteria exhibit high novel natural product potential, evaluated through long-read genome sequencing and comparative genomics. Environ Microbiol 2020; 23:3646-3664. [PMID: 33140504 DOI: 10.1111/1462-2920.15300] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 10/22/2020] [Accepted: 10/29/2020] [Indexed: 11/30/2022]
Abstract
Actinobacteria and Proteobacteria are important producers of bioactive natural products (NP), and these phyla dominate in the arid soils of Antarctica, where metabolic adaptations influence survival under harsh conditions. Biosynthetic gene clusters (BGCs), which encode NPs, are typically long and repetitious high G + C regions that are difficult to sequence with short-read technologies. We sequenced 17 Antarctic soil bacteria from multi-genome libraries, employing the long-read PacBio platform, to optimize capture of BGCs and to facilitate a comprehensive analysis of their NP capacity. We report 13 complete bacterial genomes of high quality and contiguity, representing 10 different cold-adapted genera including novel species. Antarctic BGCs exhibited low similarity to known compound BGCs (av. 31%), with an abundance of terpene, non-ribosomal peptide and polyketide-encoding clusters. Comparative genome analysis was used to map BGC variation between closely related strains from geographically distant environments. Results showed the greatest biosynthetic differences to be in a psychrotolerant Streptomyces strain, as well as a rare Actinobacteria genus, Kribbella, while two other Streptomyces spp. were surprisingly similar to known genomes. Streptomyces and Kribbella BGCs were predicted to encode antitumour, antifungal, antibacterial and biosurfactant-like compounds, and the synthesis of NPs with antibacterial, antifungal and surfactant properties was confirmed through bioactivity assays.
Affiliation(s)
- Nicole Benaud
  - School of Biotechnology and Biomolecular Sciences, UNSW Sydney, 2052, Australia
- Richard J Edwards
  - School of Biotechnology and Biomolecular Sciences, UNSW Sydney, 2052, Australia
- Timothy G Amos
  - School of Biotechnology and Biomolecular Sciences, UNSW Sydney, 2052, Australia
- Paul M D'Agostino
  - Technische Universität Dresden, Chair of Technical Biochemistry, Bergstraße 66, 01602 Dresden, Germany
- Kate Montgomery
  - School of Biotechnology and Biomolecular Sciences, UNSW Sydney, 2052, Australia
- Iskra Nicetic
  - School of Biotechnology and Biomolecular Sciences, UNSW Sydney, 2052, Australia
- Belinda C Ferrari
  - School of Biotechnology and Biomolecular Sciences, UNSW Sydney, 2052, Australia
|
218
|
Kadota M, Nishimura O, Miura H, Tanaka K, Hiratani I, Kuraku S. Multifaceted Hi-C benchmarking: what makes a difference in chromosome-scale genome scaffolding? Gigascience 2020; 9:5695848. [PMID: 31919520 PMCID: PMC6952475 DOI: 10.1093/gigascience/giz158] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 06/06/2019] [Revised: 10/23/2019] [Accepted: 12/02/2019] [Indexed: 12/28/2022] Open
Abstract
Background: Hi-C is derived from chromosome conformation capture (3C) and targets chromatin contacts on a genomic scale. This method has also been used frequently in scaffolding nucleotide sequences obtained by de novo genome sequencing and assembly, in which the number of resultant sequences rarely converges to the chromosome number. Despite its prevalent use, the sample preparation methods for Hi-C have not been intensively discussed, especially from the standpoint of genome scaffolding. Results: To gain insight into the best practice of Hi-C scaffolding, we performed a multifaceted methodological comparison using vertebrate samples and optimized various factors during sample preparation, sequencing, and computation. As a result, we identified several key factors that helped improve Hi-C scaffolding, including the choice and preparation of tissues, library preparation conditions, the choice of restriction enzyme(s), and the choice of scaffolding program and its usage. Conclusions: This study provides the first comparison of multiple sample preparation kits/protocols and computational programs for Hi-C scaffolding by an academic third party. We introduce a customized protocol designated “inexpensive and controllable Hi-C (iconHi-C) protocol,” which incorporates the optimal conditions identified in this study, and demonstrate this technique on chromosome-scale genome sequences of the Chinese softshell turtle Pelodiscus sinensis.
Affiliation(s)
- Mitsutaka Kadota
  - Laboratory for Phyloinformatics, RIKEN Center for Biosystems Dynamics Research (BDR), Kobe 650-0047, Japan
- Osamu Nishimura
  - Laboratory for Phyloinformatics, RIKEN Center for Biosystems Dynamics Research (BDR), Kobe 650-0047, Japan
- Hisashi Miura
  - Laboratory for Developmental Epigenetics, RIKEN BDR, Kobe 650-0047, Japan
- Kaori Tanaka
  - Laboratory for Phyloinformatics, RIKEN Center for Biosystems Dynamics Research (BDR), Kobe 650-0047, Japan
- Ichiro Hiratani
  - Laboratory for Developmental Epigenetics, RIKEN BDR, Kobe 650-0047, Japan
- Shigehiro Kuraku
  - Laboratory for Phyloinformatics, RIKEN Center for Biosystems Dynamics Research (BDR), Kobe 650-0047, Japan
|
219
|
Logsdon GA, Vollger MR, Eichler EE. Long-read human genome sequencing and its applications. Nat Rev Genet 2020; 21:597-614. [PMID: 32504078 PMCID: PMC7877196 DOI: 10.1038/s41576-020-0236-x] [Citation(s) in RCA: 582] [Impact Index Per Article: 116.4] [Accepted: 03/31/2020] [Indexed: 12/27/2022]
Abstract
Over the past decade, long-read, single-molecule DNA sequencing technologies have emerged as powerful players in genomics. With the ability to generate reads tens to thousands of kilobases in length with an accuracy approaching that of short-read sequencing technologies, these platforms have proven their ability to resolve some of the most challenging regions of the human genome, detect previously inaccessible structural variants and generate some of the first telomere-to-telomere assemblies of whole chromosomes. Long-read sequencing technologies will soon permit the routine assembly of diploid genomes, which will revolutionize genomics by revealing the full spectrum of human genetic variation, resolving some of the missing heritability and leading to the discovery of novel mechanisms of disease.
Affiliation(s)
- Glennis A Logsdon
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Mitchell R Vollger
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Evan E Eichler
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
|
220
|
Implications of germline copy-number variations in psychiatric disorders: review of large-scale genetic studies. J Hum Genet 2020; 66:25-37. [PMID: 32958875 DOI: 10.1038/s10038-020-00838-1] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 06/18/2020] [Revised: 08/28/2020] [Accepted: 09/01/2020] [Indexed: 02/07/2023]
Abstract
Copy number variants (CNVs), defined as genome sequences of ≥50 bp that differ in copy number from that in a reference genome, are a common form of structural variation. Germline CNVs account for some of the missing heritability that single nucleotide polymorphisms could not account for. Recent technological advances have had a huge impact on CNV research. Microarray technology enables relatively low-cost, high-throughput, genome-wide measurements, and short-read sequencing technology enables the detection of short CNVs that cannot be detected by microarrays. As a result, large-scale genetic studies have been able to identify a variety of common and rare germline CNVs and their associations with diseases. Rare germline CNVs have been reported to be associated with neuropsychiatric disorders. In this review, we focused on germline CNVs and briefly described their functional characteristics, formation mechanisms, detection methods, related databases, and the latest findings. Finally, we introduced recent large-scale genetic studies to assess associations of CNVs with diseases, especially psychiatric disorders, and discussed the use of CNV-based animal models to investigate the molecular and cellular mechanisms underlying these disorders. The development and implementation of improved detection methods, such as long-read single-molecule sequencing, are expected to provide additional insight into the molecular basis of psychiatric disorders and other complex diseases, thus facilitating basic and clinical research on CNVs.
|
221
|
Penouilh-Suzette C, Fourré S, Besnard G, Godiard L, Pecrix Y. A simple method for high molecular-weight genomic DNA extraction suitable for long-read sequencing from spores of an obligate biotroph oomycete. J Microbiol Methods 2020; 178:106054. [PMID: 32926900 DOI: 10.1016/j.mimet.2020.106054] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 05/03/2020] [Revised: 08/09/2020] [Accepted: 09/07/2020] [Indexed: 10/23/2022]
Abstract
Long-read sequencing technologies are having a major impact on our approaches to studying non-model organisms and microbial communities. Because these technologies significantly reduce costs and simplify genome assembly pipelines, any laboratory can now develop its own genomics program regardless of the complexity of the genome studied. The most crucial current challenge is to develop efficient protocols for extracting genomic DNA (gDNA) with high quality and integrity adapted to the organism of interest. This can be particularly complex for obligate pathogens that must maintain intimate interactions inside infected host tissues. Here we propose a simple and cost-effective method for high molecular weight gDNA extraction from spores of Plasmopara halstedii, an obligate biotroph oomycete pathogen responsible for downy mildew in sunflower. We optimized the yield, the quality and the integrity of the extracted gDNA by fine-tuning three critical parameters: the grinding, the lysis temperature and the lysis duration. We obtained gDNA with a fragment size distribution reaching a peak ranging from 79 to 145 kb. More than half of the extracted gDNA consisted of DNA fragments larger than 42 kb, with 23% of fragments larger than 100 kb. We then demonstrated the relevance of this protocol for long-read sequencing using PacBio RSII technology. With this protocol, we were able to obtain a mean read length of 9.3 kb, a max read length of 71 kb and an N50 of 13.3 kb. The development of such DNA extraction protocols is an essential prerequisite for fully exploiting technologies requiring high molecular weight gDNA (e.g. long-read sequencing or optical mapping). These technological advances will help generate data to answer questions such as the role of newly duplicated gene clusters, repeated regions and genomic structural variations, or to define the number of chromosomes, which remains undefined in many species of pathogenic fungi and oomycetes.
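The read-length statistics quoted in this abstract (mean, maximum and N50) can be illustrated with a minimal Python sketch. It assumes the common definition of read N50: the length L such that reads of length L or longer contain at least half of all sequenced bases. The function name and the example read lengths are hypothetical and are not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of read (or contig) lengths."""
    if not lengths:
        raise ValueError("lengths must be non-empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the cumulative sum reaches half the total
        if running >= half_total:
            return length


# Hypothetical read lengths in kb (not from the paper).
reads = [2, 3, 4, 5, 6, 10]
print(sum(reads) / len(reads))  # mean read length: 5.0
print(max(reads))               # max read length: 10
print(n50(reads))               # N50: 6
```

Because N50 weights reads by the bases they contribute, it is typically larger than the mean read length, which matches the pattern in the abstract (N50 of 13.3 kb against a mean of 9.3 kb).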
Affiliation(s)
- Charlotte Penouilh-Suzette
  - LIPM (Laboratoire des Interactions Plantes Microorganismes), INRAE, CNRS, Université de Toulouse, 24 Chemin de Borde-Rouge, BP 52627, F-31326 Castanet-Tolosan, France
- Sandra Fourré
  - GeT-PlaGe, INRAE Auzeville, US 1426, 24 Chemin de Borde-Rouge, BP 52627, F-31326 Castanet-Tolosan, France
- Guillaume Besnard
  - CNRS, Université Paul Sabatier, IRD, UMR 5174 EDB (Laboratoire Évolution et Diversité Biologique), 118 route de Narbonne, F-31062 Toulouse, France
- Laurence Godiard
  - LIPM (Laboratoire des Interactions Plantes Microorganismes), INRAE, CNRS, Université de Toulouse, 24 Chemin de Borde-Rouge, BP 52627, F-31326 Castanet-Tolosan, France
- Yann Pecrix
  - LIPM (Laboratoire des Interactions Plantes Microorganismes), INRAE, CNRS, Université de Toulouse, 24 Chemin de Borde-Rouge, BP 52627, F-31326 Castanet-Tolosan, France; CIRAD, UMR 53 Peuplements Végétaux et Bioagresseurs en Milieu Tropical (PVBMT), Pole de Protection des Plantes, 7 chemin de l'IRAT, F-97410 Saint Pierre, Réunion, France
|
222
|
Aganezov S, Goodwin S, Sherman RM, Sedlazeck FJ, Arun G, Bhatia S, Lee I, Kirsche M, Wappel R, Kramer M, Kostroff K, Spector DL, Timp W, McCombie WR, Schatz MC. Comprehensive analysis of structural variants in breast cancer genomes using single-molecule sequencing. Genome Res 2020; 30:1258-1273. [PMID: 32887686 PMCID: PMC7545150 DOI: 10.1101/gr.260497.119] [Citation(s) in RCA: 67] [Impact Index Per Article: 13.4] [Received: 12/22/2019] [Accepted: 08/07/2020] [Indexed: 12/14/2022]
Abstract
Improved identification of structural variants (SVs) in cancer can lead to more targeted and effective treatment options as well as advance our basic understanding of the disease and its progression. We performed whole-genome sequencing of the SKBR3 breast cancer cell line and patient-derived tumor and normal organoids from two breast cancer patients using Illumina/10x Genomics, Pacific Biosciences (PacBio), and Oxford Nanopore Technologies (ONT) sequencing. We then inferred SVs and large-scale allele-specific copy number variants (CNVs) using an ensemble of methods. Our findings show that long-read sequencing allows for substantially more accurate and sensitive SV detection, with between 90% and 95% of variants supported by each long-read technology also supported by the other. We also report high accuracy for long reads even at relatively low coverage (25×–30×). Furthermore, we integrated SV and CNV data into a unifying karyotype-graph structure to present a more accurate representation of the mutated cancer genomes. We find hundreds of variants within known cancer-related genes detectable only through long-read sequencing. These findings highlight the need for long-read sequencing of cancer genomes for the precise analysis of their genetic instability.
Affiliation(s)
- Sergey Aganezov
  - Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21211, USA
- Sara Goodwin
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
- Rachel M Sherman
  - Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21211, USA
- Fritz J Sedlazeck
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Gayatri Arun
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
- Sonam Bhatia
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
- Isac Lee
  - Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland 21211, USA
- Melanie Kirsche
  - Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21211, USA
- Robert Wappel
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
- Melissa Kramer
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
- David L Spector
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
- Winston Timp
  - Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland 21211, USA
- Michael C Schatz
  - Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21211, USA; Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA; Department of Biology, Johns Hopkins University, Baltimore, Maryland 21211, USA
|
223
|
López-Girona E, Davy MW, Albert NW, Hilario E, Smart MEM, Kirk C, Thomson SJ, Chagné D. CRISPR-Cas9 enrichment and long read sequencing for fine mapping in plants. PLANT METHODS 2020; 16:121. [PMID: 32884578 PMCID: PMC7465313 DOI: 10.1186/s13007-020-00661-x] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 06/06/2020] [Accepted: 08/18/2020] [Indexed: 05/03/2023]
Abstract
BACKGROUND: Genomic methods for identifying causative variants for trait loci applicable to a wide range of germplasm are required for plant biologists and breeders to understand the genetic control of trait variation. RESULTS: We implemented Cas9-targeted sequencing for fine-mapping in apple, a method combining CRISPR-Cas9 targeted cleavage of a region of interest, followed by enrichment and long-read sequencing using Oxford Nanopore Technologies (ONT). We demonstrated the capability of this methodology to specifically cleave and enrich a plant genomic locus spanning 8 kb. The repeated mini-satellite motif located upstream of the Malus × domestica (apple) MYB10 transcription factor gene, causing red fruit colouration when present in a heterozygous state, was our exemplar to demonstrate the efficiency of this method: it contains a genomic region with a long structural variant normally ignored by short-read sequencing technologies. Cleavage specificity of the guide RNAs was demonstrated using polymerase chain reaction products, before using them to specify cleavage of high molecular weight apple DNA. An enriched library was subsequently prepared and sequenced using an ONT MinION flow cell (R.9.4.1). Of the 7,056 ONT reads base-called using both Albacore2 (v2.3.4) and Guppy (v3.2.4), with median lengths of 9.78 and 9.89 kb, respectively, 85.35% and 91.38% aligned to the reference apple genome. Of the aligned reads, 2.98% and 3.04% were on-target, with read depths of 180× and 196× for Albacore2 and Guppy, respectively, and only five genomic loci were off-target with read depth greater than 25×, which demonstrated the efficiency of the enrichment method and specificity of the CRISPR-Cas9 cleavage. CONCLUSIONS: We demonstrated that this method can isolate and resolve single-nucleotide and structural variants at the haplotype level in plant genomic regions. The combination of CRISPR-Cas9 target enrichment and ONT sequencing provides a more efficient technology for fine-mapping loci than genome-walking approaches.
Affiliation(s)
- Elena López-Girona
  - The New Zealand Institute for Plant and Food Research Limited (Plant & Food Research), Private Bag 11600, Palmerston North, 4442 New Zealand
- Nick W. Albert
  - The New Zealand Institute for Plant and Food Research Limited (Plant & Food Research), Private Bag 11600, Palmerston North, 4442 New Zealand
- Maia E. M. Smart
  - The New Zealand Institute for Plant and Food Research Limited (Plant & Food Research), Private Bag 11600, Palmerston North, 4442 New Zealand
- Chris Kirk
  - The New Zealand Institute for Plant and Food Research Limited (Plant & Food Research), Private Bag 11600, Palmerston North, 4442 New Zealand
- David Chagné
  - The New Zealand Institute for Plant and Food Research Limited (Plant & Food Research), Private Bag 11600, Palmerston North, 4442 New Zealand
|
224
|
Closing Human Reference Genome Gaps: Identifying and Characterizing Gap-Closing Sequences. G3-GENES GENOMES GENETICS 2020; 10:2801-2809. [PMID: 32532800 PMCID: PMC7407462 DOI: 10.1534/g3.120.401280] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 01/16/2023]
Abstract
Despite continuous updates of the human reference genome, there are still hundreds of unresolved gaps, which account for about 5% of the total sequence length. Given the availability of whole genome de novo assemblies, especially those derived from long-read sequencing data, gap-closing sequences can be determined. By comparing 17 de novo long-read sequencing assemblies with the human reference genome, we identified a total of 1,125 gap-closing sequences for 132 (16.9% of 783) gaps and added up to 2.2 Mb of novel sequence to the human reference genome. More than 90% of the non-redundant sequences could be verified by unmapped reads from the Simons Genome Diversity Project dataset. In addition, 15.6% of the non-reference sequences were found in at least one of four non-human primate genomes. We further demonstrated that the non-redundant sequences had a high content of simple repeats and satellite sequences. Moreover, 43 (32.6%) of the 132 closed gaps were shown to be polymorphic; such sequences may play an important biological role and can be useful in the investigation of human genetic diversity.
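The two gap percentages reported in this abstract follow directly from the stated counts. As a quick arithmetic check, a minimal Python sketch (the helper name `percent` is hypothetical; the counts are those given in the abstract):

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


total_gaps = 783    # gaps considered in the reference genome
closed_gaps = 132   # gaps with at least one gap-closing sequence
polymorphic = 43    # closed gaps shown to be polymorphic

print(percent(closed_gaps, total_gaps))   # 16.9
print(percent(polymorphic, closed_gaps))  # 32.6
```

Note that the second percentage uses the 132 closed gaps as its denominator, not the full set of 783.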
|
225
|
Jiang T, Liu Y, Jiang Y, Li J, Gao Y, Cui Z, Liu Y, Liu B, Wang Y. Long-read-based human genomic structural variation detection with cuteSV. Genome Biol 2020; 21:189. [PMID: 32746918 PMCID: PMC7477834 DOI: 10.1186/s13059-020-02107-y] [Citation(s) in RCA: 203] [Impact Index Per Article: 40.6] [Received: 12/11/2019] [Accepted: 07/14/2020] [Indexed: 01/01/2023] Open
Abstract
Long-read sequencing is promising for the comprehensive discovery of structural variations (SVs). However, it is still non-trivial to achieve high yields and performance simultaneously due to the complex SV signatures implied by noisy long reads. We propose cuteSV, a sensitive, fast, and scalable long-read-based SV detection approach. cuteSV uses tailored methods to collect the signatures of various types of SVs and employs a clustering-and-refinement method to implement sensitive SV detection. Benchmarks on simulated and real long-read sequencing datasets demonstrate that cuteSV has higher yields and scaling performance than state-of-the-art tools. cuteSV is available at https://github.com/tjiangHIT/cuteSV.
Affiliation(s)
- Tao Jiang
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, Heilongjiang, China
| | - Yongzhuang Liu
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, Heilongjiang, China
| | - Yue Jiang
- Nebula Genomics, Harbin, 150030, Heilongjiang, China
| | - Junyi Li
- School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, 518055, Guangdong, China
| | - Yan Gao
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, Heilongjiang, China
| | - Zhe Cui
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, Heilongjiang, China
| | - Yadong Liu
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, Heilongjiang, China
| | - Bo Liu
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, Heilongjiang, China.
| | - Yadong Wang
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, Heilongjiang, China.
|
226
|
Yuan Y, Chung CYL, Chan TF. Advances in optical mapping for genomic research. Comput Struct Biotechnol J 2020; 18:2051-2062. [PMID: 32802277 PMCID: PMC7419273 DOI: 10.1016/j.csbj.2020.07.018] [Citation(s) in RCA: 70] [Impact Index Per Article: 14.0] [Received: 05/10/2020] [Revised: 07/08/2020] [Accepted: 07/24/2020] [Indexed: 12/28/2022] Open
Abstract
Recent advances in optical mapping have allowed the construction of improved genome assemblies with greater contiguity. Optical mapping also enables genome comparison and identification of large-scale structural variations. Association of these large-scale genomic features with biological functions is an important goal in plant and animal breeding and in medical research. Optical mapping has also been used in microbiology and still plays an important role in strain typing and epidemiological studies. Here, we review the development of optical mapping in recent decades to illustrate its importance in genomic research. We detail its applications and algorithms to show its specific advantages. Finally, we discuss the challenges required to facilitate the optimization of optical mapping and improve its future development and application.
Key Words
- 3D, three-dimensional
- DBG, de Bruijn graph
- DLS, direct label and stain
- DNA, deoxyribonucleic acid
- Genome assembly
- Hi-C, high-throughput chromosome conformation capture
- Mb, million base pair
- Next generation sequencing
- OLC, overlap-layout-consensus
- Optical mapping
- PCR, polymerase chain reaction
- PacBio, Pacific Biosciences
- SRS, short-read sequencing
- SV, structural variation
- Structural variation
- bp, base pair
- kb, kilobase pair
Affiliation(s)
- Yuxuan Yuan
- School of Life Sciences, The Chinese University of Hong Kong, Hong Kong SAR, China
- State Key Laboratory for Agrobiotechnology, The Chinese University of Hong Kong, Hong Kong SAR, China
- AoE Centre for Genomic Studies on Plant-Environment Interaction for Sustainable Agriculture and Food Security, The Chinese University of Hong Kong, Hong Kong SAR, China
| | - Claire Yik-Lok Chung
- School of Life Sciences, The Chinese University of Hong Kong, Hong Kong SAR, China
- State Key Laboratory for Agrobiotechnology, The Chinese University of Hong Kong, Hong Kong SAR, China
| | - Ting-Fung Chan
- School of Life Sciences, The Chinese University of Hong Kong, Hong Kong SAR, China
- State Key Laboratory for Agrobiotechnology, The Chinese University of Hong Kong, Hong Kong SAR, China
- AoE Centre for Genomic Studies on Plant-Environment Interaction for Sustainable Agriculture and Food Security, The Chinese University of Hong Kong, Hong Kong SAR, China
|
227
|
Perumal S, Koh CS, Jin L, Buchwaldt M, Higgins EE, Zheng C, Sankoff D, Robinson SJ, Kagale S, Navabi ZK, Tang L, Horner KN, He Z, Bancroft I, Chalhoub B, Sharpe AG, Parkin IAP. A high-contiguity Brassica nigra genome localizes active centromeres and defines the ancestral Brassica genome. Nat Plants 2020; 6:929-941. [PMID: 32782408 PMCID: PMC7419231 DOI: 10.1038/s41477-020-0735-y] [Citation(s) in RCA: 78] [Impact Index Per Article: 15.6] [Received: 02/03/2020] [Accepted: 06/28/2020] [Indexed: 05/19/2023]
Abstract
It is only recently, with the advent of long-read sequencing technologies, that we are beginning to uncover previously uncharted regions of complex and inherently recursive plant genomes. To comprehensively study and exploit the genome of the neglected oilseed Brassica nigra, we generated two high-quality nanopore de novo genome assemblies. The N50 contig lengths for the two assemblies were 17.1 Mb (12 contigs), one of the best among 324 sequenced plant genomes, and 0.29 Mb (424 contigs), respectively, reflecting recent improvements in the technology. Comparison with a de novo short-read assembly corroborated genome integrity and quantified sequence-related error rates (0.2%). The contiguity and coverage allowed unprecedented access to low-complexity regions of the genome. Pericentromeric regions and coincidence of hypomethylation enabled localization of active centromeres and identified centromere-associated ALE family retro-elements that appear to have proliferated through relatively recent nested transposition events (<1 Ma). Genomic distances calculated based on synteny relationships were used to define a post-triplication Brassica-specific ancestral genome, and to calculate the extensive rearrangements that define the evolutionary distance separating B. nigra from its diploid relatives.
Affiliation(s)
- Sampath Perumal
- Agriculture and Agri-Food Canada, Saskatoon, Saskatchewan, Canada
| | - Chu Shin Koh
- Global Institute for Food Security, University of Saskatchewan, Saskatoon, Saskatchewan, Canada
| | - Lingling Jin
- Department of Computing Science, Thompson Rivers University, Kamloops, British Columbia, Canada
| | - Miles Buchwaldt
- Agriculture and Agri-Food Canada, Saskatoon, Saskatchewan, Canada
| | - Erin E Higgins
- Agriculture and Agri-Food Canada, Saskatoon, Saskatchewan, Canada
| | - Chunfang Zheng
- Department of Mathematics and Statistics, University of Ottawa, Ottawa, Ontario, Canada
| | - David Sankoff
- Department of Mathematics and Statistics, University of Ottawa, Ottawa, Ontario, Canada
| | | | - Sateesh Kagale
- National Research Council Canada, Saskatoon, Saskatchewan, Canada
| | - Zahra-Katy Navabi
- Agriculture and Agri-Food Canada, Saskatoon, Saskatchewan, Canada
- Global Institute for Food Security, University of Saskatchewan, Saskatoon, Saskatchewan, Canada
| | - Lily Tang
- Agriculture and Agri-Food Canada, Saskatoon, Saskatchewan, Canada
| | - Kyla N Horner
- Agriculture and Agri-Food Canada, Saskatoon, Saskatchewan, Canada
| | - Zhesi He
- Department of Biology, University of York, York, UK
| | - Ian Bancroft
- Department of Biology, University of York, York, UK
| | - Boulos Chalhoub
- Institute of Crop Science, Zhejiang University, Hangzhou, China
| | - Andrew G Sharpe
- Global Institute for Food Security, University of Saskatchewan, Saskatoon, Saskatchewan, Canada.
|
228
|
Weissensteiner MH, Bunikis I, Catalán A, Francoijs KJ, Knief U, Heim W, Peona V, Pophaly SD, Sedlazeck FJ, Suh A, Warmuth VM, Wolf JBW. Discovery and population genomics of structural variation in a songbird genus. Nat Commun 2020; 11:3403. [PMID: 32636372 PMCID: PMC7341801 DOI: 10.1038/s41467-020-17195-4] [Citation(s) in RCA: 76] [Impact Index Per Article: 15.2] [Received: 12/06/2019] [Accepted: 06/16/2020] [Indexed: 02/07/2023] Open
Abstract
Structural variation (SV) constitutes an important type of genetic mutation providing the raw material for evolution. Here, we uncover the genome-wide spectrum of intra- and interspecific SV segregating in natural populations of seven songbird species in the genus Corvus. Combining short-read (N = 127) and long-read re-sequencing (N = 31), as well as optical mapping (N = 16), we apply both assembly- and read-mapping approaches to detect SV and characterize a total of 220,452 insertions, deletions and inversions. We exploit sampling across wide phylogenetic timescales to validate SV genotypes and assess the contribution of SV to evolutionary processes in an avian model of incipient speciation. We reveal an evolutionarily young (~530,000 years) cis-acting 2.25-kb LTR retrotransposon insertion reducing expression of the NDP gene with consequences for premating isolation. Our results attest to the wealth and evolutionary significance of SV segregating in natural populations and highlight the need for reliable SV genotyping.
Affiliation(s)
- Matthias H Weissensteiner
- Department of Evolutionary Biology and Science for Life Laboratory, Uppsala University, 752 36, Uppsala, Sweden.
- Division of Evolutionary Biology, Faculty of Biology, LMU Munich, Grosshaderner Str. 2, 82152, Planegg-Martinsried, Germany.
- Department of Biology, Pennsylvania State University, 310 Wartik Lab, University Park, PA, 16802, USA.
| | - Ignas Bunikis
- Uppsala Genome Center, Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, BMC, Box 815, 752 37, Uppsala, Sweden
| | - Ana Catalán
- Division of Evolutionary Biology, Faculty of Biology, LMU Munich, Grosshaderner Str. 2, 82152, Planegg-Martinsried, Germany
| | | | - Ulrich Knief
- Division of Evolutionary Biology, Faculty of Biology, LMU Munich, Grosshaderner Str. 2, 82152, Planegg-Martinsried, Germany
| | - Wieland Heim
- Institute of Landscape Ecology, University of Münster, Heisenbergstrasse 2, 48149, Münster, Germany
| | - Valentina Peona
- Department of Evolutionary Biology and Science for Life Laboratory, Uppsala University, 752 36, Uppsala, Sweden
- Department of Organismal Biology - Systematic Biology, Uppsala University, 752 36, Uppsala, Sweden
| | - Saurabh D Pophaly
- Division of Evolutionary Biology, Faculty of Biology, LMU Munich, Grosshaderner Str. 2, 82152, Planegg-Martinsried, Germany
- Max Planck Institute for Plant Breeding Research, Carl-von-Linné-Weg 10, 50829, Cologne, Germany
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center at Baylor College of Medicine, 1 Baylor Plaza, Houston, TX, 77030, USA
| | - Alexander Suh
- Department of Evolutionary Biology and Science for Life Laboratory, Uppsala University, 752 36, Uppsala, Sweden
- Department of Organismal Biology - Systematic Biology, Uppsala University, 752 36, Uppsala, Sweden
- School of Biological Sciences, University of East Anglia, Norwich Research Park, Norwich, NR4 7TU, UK
| | - Vera M Warmuth
- Division of Evolutionary Biology, Faculty of Biology, LMU Munich, Grosshaderner Str. 2, 82152, Planegg-Martinsried, Germany
| | - Jochen B W Wolf
- Department of Evolutionary Biology and Science for Life Laboratory, Uppsala University, 752 36, Uppsala, Sweden.
- Division of Evolutionary Biology, Faculty of Biology, LMU Munich, Grosshaderner Str. 2, 82152, Planegg-Martinsried, Germany.
|
229
|
Abstract
BACKGROUND The long reads produced by third generation sequencing technologies have significantly boosted the results of genome assembly but still, genome-wide assemblies solely based on read data cannot be produced. Thus, for example, optical mapping data has been used to further improve genome assemblies but it has mostly been applied in a post-processing stage after contig assembly. RESULTS We propose OPTICALKERMIT which directly integrates genome wide optical maps into contig assembly. We show how genome wide optical maps can be used to localize reads on the genome and then we adapt the Kermit method, which originally incorporated genetic linkage maps to the miniasm assembler, to use this information in contig assembly. Our experimental results show that incorporating genome wide optical maps to the contig assembly of miniasm increases NGA50 while the number of misassemblies decreases or stays the same. Furthermore, when compared to the Canu assembler, OPTICALKERMIT produces an assembly with almost three times higher NGA50 with a lower number of misassemblies on real A. thaliana reads. CONCLUSIONS OPTICALKERMIT successfully incorporates optical mapping data directly to contig assembly of eukaryotic genomes. Our results show that this is a promising approach to improve the contiguity of genome assemblies.
Affiliation(s)
- Miika Leinonen
- Department of Computer Science, Helsinki Institute for Information Technology, University of Helsinki, Pietari Kalmin katu 5, Helsinki, Finland
| | - Leena Salmela
- Department of Computer Science, Helsinki Institute for Information Technology, University of Helsinki, Pietari Kalmin katu 5, Helsinki, Finland.
|
230
|
Kraft F, Kurth I. Long-read sequencing to understand genome biology and cell function. Int J Biochem Cell Biol 2020; 126:105799. [PMID: 32629027 DOI: 10.1016/j.biocel.2020.105799] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 03/02/2020] [Revised: 06/29/2020] [Accepted: 07/02/2020] [Indexed: 02/08/2023]
Abstract
Determining the sequence of DNA and RNA molecules has a huge impact on the understanding of cell biology and function. Recent advancements in next-generation short-read sequencing (NGS) technologies, drops in cost and a resolution down to the single-cell level shaped our current view on genome structure and function. Third-generation sequencing (TGS) methods further complete the knowledge about these processes based on long reads and the ability to analyze DNA or RNA at single molecule level. Long-read sequencing provides additional possibilities to study genome architecture and the composition of highly complex regions and to determine epigenetic modifications of nucleotide bases at a genome-wide level. We discuss the principles and advancements of long-read sequencing and its applications in genome biology.
Affiliation(s)
- Florian Kraft
- Institute of Human Genetics, Medical Faculty, RWTH Aachen University, Aachen, Germany.
| | - Ingo Kurth
- Institute of Human Genetics, Medical Faculty, RWTH Aachen University, Aachen, Germany.
|
231
|
Fietz K, Trofimenko E, Guerin PE, Arnal V, Torres-Oliva M, Lobréaux S, Pérez-Ruzafa A, Manel S, Puebla O. New genomic resources for three exploited Mediterranean fishes. Genomics 2020; 112:4297-4303. [PMID: 32629099 DOI: 10.1016/j.ygeno.2020.06.041] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 02/23/2020] [Revised: 06/22/2020] [Accepted: 06/24/2020] [Indexed: 10/23/2022]
Abstract
Extensive fishing has led to fish stock declines throughout the last decades. While clear stock identification is required for designing management schemes, stock delineation is problematic due to generally low levels of genetic structure in marine species. The development of genomic resources can help to solve this issue. Here, we present the first mitochondrial and nuclear draft genome assemblies of three economically important Mediterranean fishes, the white seabream, the striped red mullet, and the comber. The assemblies are between 613 and 785 Mbp long and contain between 27,222 and 32,375 predicted genes. They were used as references to map Restriction-site Associated DNA markers, which were developed with a single-digest approach. This approach provided between 15,710 and 21,101 Single Nucleotide Polymorphism markers per species. These genomic resources will allow uncovering subtle genetic structure, identifying stocks, assigning catches to populations and assessing connectivity. Furthermore, the annotated genomes will help to characterize adaptive divergence.
Affiliation(s)
- Katharina Fietz
- GEOMAR Helmholtz Centre for Ocean Research Kiel, Evolutionary Ecology of Marine Fishes, Düsternbrooker Weg 20, 24105 Kiel, Germany
| | - Elena Trofimenko
- GEOMAR Helmholtz Centre for Ocean Research Kiel, Evolutionary Ecology of Marine Fishes, Düsternbrooker Weg 20, 24105 Kiel, Germany
| | - Pierre-Edouard Guerin
- CEFE, Univ Montpellier, CNRS, EPHE-PSL University, IRD, Univ Paul Valéry Montpellier 3, Montpellier, France
| | - Véronique Arnal
- CEFE, Univ Montpellier, CNRS, EPHE-PSL University, IRD, Univ Paul Valéry Montpellier 3, Montpellier, France
| | - Montserrat Torres-Oliva
- Institute of Clinical Molecular Biology, Christian-Albrechts-University of Kiel, University Hospital Schleswig-Holstein, Kiel, Germany
| | - Stéphane Lobréaux
- Laboratoire d'Ecologie Alpine, CNRS, Université Grenoble-Alpes, Grenoble, France
| | - Angel Pérez-Ruzafa
- Departamento de Ecología e Hidrología, Facultad de Biología, Campus de Espinardo, Regional Campus of International Excellence "Campus Mare Nostrum", University of Murcia, 30100 Murcia, Spain
| | - Stéphanie Manel
- CEFE, Univ Montpellier, CNRS, EPHE-PSL University, IRD, Univ Paul Valéry Montpellier 3, Montpellier, France.
| | - Oscar Puebla
- GEOMAR Helmholtz Centre for Ocean Research Kiel, Evolutionary Ecology of Marine Fishes, Düsternbrooker Weg 20, 24105 Kiel, Germany; Leibniz Centre for Tropical Marine Research, Fahrenheitstrasse 6, 28359 Bremen, Germany
|
232
|
Majidian S, Sedlazeck FJ. PhaseME: Automatic rapid assessment of phasing quality and phasing improvement. Gigascience 2020; 9:giaa078. [PMID: 32706368 PMCID: PMC7379178 DOI: 10.1093/gigascience/giaa078] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 04/03/2020] [Revised: 05/28/2020] [Accepted: 07/01/2020] [Indexed: 11/25/2022] Open
Abstract
BACKGROUND The detection of which mutations are occurring on the same DNA molecule is essential to predict their consequences. This can be achieved by phasing the genomic variations. Nevertheless, state-of-the-art haplotype phasing is currently a black box in which the accuracy and quality of the reconstructed haplotypes are hard to assess. FINDINGS Here we present PhaseME, a versatile method to provide insights into and improvement of sample phasing results based on linkage data. We showcase the performance and the importance of PhaseME by comparing phasing information obtained from Pacific Biosciences including both continuous long reads and high-quality consensus reads, Oxford Nanopore Technologies, 10x Genomics, and Illumina sequencing technologies. We found that 10x Genomics and Oxford Nanopore phasing can be significantly improved while retaining a high N50 and completeness of phase blocks. PhaseME generates reports and summary plots to provide insights into phasing performance and correctness. We observed unique phasing issues for each of the sequencing technologies, highlighting the necessity of quality assessments. PhaseME is able to decrease the Hamming error rate significantly by 22.4% on average across all 5 technologies. Additionally, a significant improvement is obtained in the reduction of long switch errors. Especially for high-quality consensus reads, the improvement is 54.6% in return for only a 5% decrease in phase block N50 length. CONCLUSIONS PhaseME is a universal method to assess the phasing quality and accuracy and improves the quality of phasing using linkage information. The package is freely available at https://github.com/smajidian/phaseme.
Affiliation(s)
- Sina Majidian
- School of Electrical Engineering, Iran University of Science & Technology, Narmak, Tehran 1684613114, Iran
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, 1 Baylor Plaza, Houston, TX 77030, USA
|
233
|
Tunjić Cvitanić M, Vojvoda Zeljko T, Pasantes JJ, García-Souto D, Gržan T, Despot-Slade E, Plohl M, Šatović E. Sequence Composition Underlying Centromeric and Heterochromatic Genome Compartments of the Pacific Oyster Crassostrea gigas. Genes (Basel) 2020; 11:genes11060695. [PMID: 32599860 PMCID: PMC7348941 DOI: 10.3390/genes11060695] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 05/21/2020] [Revised: 06/10/2020] [Accepted: 06/22/2020] [Indexed: 02/07/2023] Open
Abstract
Segments of the genome enriched in repetitive sequences still present a challenge and are omitted in genome assemblies. For that reason, the exact composition of DNA sequences underlying the heterochromatic regions and the active centromeres is still unexplored for many organisms. The centromere is a crucial region of eukaryotic chromosomes responsible for the accurate segregation of genetic material. The typical landmark of centromere chromatin is the rapidly evolving variant of the histone H3, CenH3, while DNA sequences packed in constitutive heterochromatin are associated with H3K9me3-modified histones. In the Pacific oyster Crassostrea gigas we identified its centromere histone variant, Cg-CenH3, that shows stage-specific distribution in gonadal cells. In order to investigate the DNA composition of genomic regions associated with the two specific chromatin types, we employed chromatin immunoprecipitation followed by high-throughput next-generation sequencing of the Cg-CenH3- and H3K9me3-associated sequences. CenH3-associated sequences were assigned to six groups of repetitive elements, while H3K9me3-associated ones were assigned only to three. Those associated with CenH3 indicate the lack of uniformity in the chromosomal distribution of sequences building the centromeres, while at the same time being dispersed throughout the genome. The heterochromatin of C. gigas exhibited general paucity and limited chromosomal localization as predicted, with H3K9me3-associated sequences being predominantly constituted of DNA transposons.
Affiliation(s)
- Monika Tunjić Cvitanić
- Division of Molecular Biology, Ruđer Bošković Institute, Bijenička cesta 54, 10000 Zagreb, Croatia; (M.T.C.); (T.V.Z.); (T.G.); (E.D.-S.)
| | - Tanja Vojvoda Zeljko
- Division of Molecular Biology, Ruđer Bošković Institute, Bijenička cesta 54, 10000 Zagreb, Croatia; (M.T.C.); (T.V.Z.); (T.G.); (E.D.-S.)
| | - Juan J. Pasantes
- Departamento de Bioquímica, Xenética e Inmunoloxía, Centro de Investigación Mariña (CIM), Universidade de Vigo, 36310 Vigo, Spain; (J.J.P.); (D.G.-S.)
| | - Daniel García-Souto
- Departamento de Bioquímica, Xenética e Inmunoloxía, Centro de Investigación Mariña (CIM), Universidade de Vigo, 36310 Vigo, Spain; (J.J.P.); (D.G.-S.)
- Department of Zoology, Genetics and Physical Anthropology, Universidade de Santiago de Compostela, Praza do Obradoiro, 0, 15705 Santiago de Compostela, Spain
- Cancer, Ageing and Somatic Mutation, Wellcome Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, UK
| | - Tena Gržan
- Division of Molecular Biology, Ruđer Bošković Institute, Bijenička cesta 54, 10000 Zagreb, Croatia; (M.T.C.); (T.V.Z.); (T.G.); (E.D.-S.)
| | - Evelin Despot-Slade
- Division of Molecular Biology, Ruđer Bošković Institute, Bijenička cesta 54, 10000 Zagreb, Croatia; (M.T.C.); (T.V.Z.); (T.G.); (E.D.-S.)
| | - Miroslav Plohl
- Division of Molecular Biology, Ruđer Bošković Institute, Bijenička cesta 54, 10000 Zagreb, Croatia; (M.T.C.); (T.V.Z.); (T.G.); (E.D.-S.)
- Correspondence: (M.P.); (E.Š.)
| | - Eva Šatović
- Division of Molecular Biology, Ruđer Bošković Institute, Bijenička cesta 54, 10000 Zagreb, Croatia; (M.T.C.); (T.V.Z.); (T.G.); (E.D.-S.)
- Correspondence: (M.P.); (E.Š.)
|
234
|
instaGRAAL: chromosome-level quality scaffolding of genomes using a proximity ligation-based scaffolder. Genome Biol 2020; 21:148. [PMID: 32552806 PMCID: PMC7386250 DOI: 10.1186/s13059-020-02041-z] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 07/31/2019] [Accepted: 05/11/2020] [Indexed: 02/06/2023] Open
Abstract
Hi-C exploits contact frequencies between pairs of loci to bridge and order contigs during genome assembly, resulting in chromosome-level assemblies. Because few robust programs are available for this type of data, we developed instaGRAAL, a complete overhaul of the GRAAL program, which has adapted the latter to allow efficient assembly of large genomes. instaGRAAL features a number of improvements over GRAAL, including a modular correction approach that optionally integrates independent data. We validate the program using data for two brown algae, and human, to generate near-complete assemblies with minimal human intervention.
|
235
|
Jiang T, Liu B, Li J, Wang Y. rMETL: sensitive mobile element insertion detection with long read realignment. Bioinformatics 2020; 35:3484-3486. [PMID: 30759188 DOI: 10.1093/bioinformatics/btz106] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 09/27/2018] [Revised: 01/24/2019] [Accepted: 02/12/2019] [Indexed: 01/22/2023] Open
Abstract
SUMMARY Mobile element insertion (MEI) is a major category of structural variations (SVs). The rapid development of long read sequencing technologies provides the opportunity to detect MEIs sensitively. However, the signals of MEI implied by noisy long reads are highly complex due to the repetitiveness of mobile elements as well as the high sequencing error rates. Herein, we propose the Realignment-based Mobile Element insertion detection Tool for Long read (rMETL). Benchmarking results on simulated and real datasets demonstrate that rMETL can handle these complex signals to discover MEIs sensitively. It is suited to produce high-quality MEI callsets in many genomics studies. AVAILABILITY AND IMPLEMENTATION rMETL is available from https://github.com/hitbc/rMETL. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Tao Jiang
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
| | - Bo Liu
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
| | - Junyi Li
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
| | - Yadong Wang
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
|
236
|
Heller D, Vingron M. SVIM: structural variant identification using mapped long reads. Bioinformatics 2020; 35:2907-2915. [PMID: 30668829 PMCID: PMC6735718 DOI: 10.1093/bioinformatics/btz041] [Citation(s) in RCA: 202] [Impact Index Per Article: 40.4] [Received: 07/26/2018] [Revised: 01/04/2019] [Accepted: 01/22/2019] [Indexed: 02/07/2023] Open
Abstract
Motivation Structural variants are defined as genomic variants larger than 50 bp. They have been shown to affect more bases in any given genome than single-nucleotide polymorphisms or small insertions and deletions. Additionally, they have great impact on human phenotype and diversity and have been linked to numerous diseases. Due to their size and association with repeats, they are difficult to detect by shotgun sequencing, especially when based on short reads. Long read, single-molecule sequencing technologies like those offered by Pacific Biosciences or Oxford Nanopore Technologies produce reads with a length of several thousand base pairs. Despite the higher error rate and sequencing cost, long-read sequencing offers many advantages for the detection of structural variants. Yet, available software tools still do not fully exploit the possibilities. Results We present SVIM, a tool for the sensitive detection and precise characterization of structural variants from long-read data. SVIM consists of three components for the collection, clustering and combination of structural variant signatures from read alignments. It discriminates five different variant classes including similar types, such as tandem and interspersed duplications and novel element insertions. SVIM is unique in its capability of extracting both the genomic origin and destination of duplications. It compares favorably with existing tools in evaluations on simulated data and real datasets from Pacific Biosciences and Nanopore sequencing machines. Availability and implementation The source code and executables of SVIM are available on Github: github.com/eldariont/svim. SVIM has been implemented in Python 3 and published on bioconda and the Python Package Index. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- David Heller
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany
| | - Martin Vingron
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany
|
237
|
Lee CY. The fractal dimension as a measure for characterizing genetic variation of the human genome. Comput Biol Chem 2020; 87:107278. [PMID: 32563074 DOI: 10.1016/j.compbiolchem.2020.107278] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 02/16/2019] [Revised: 11/18/2019] [Accepted: 05/04/2020] [Indexed: 11/18/2022]
Abstract
Motivated by the characteristics of highly clustered single nucleotide polymorphism (SNP) across the human genome, we propose a set of chromosome-wise fractal dimensions as a measure for identifying an individual for human polymorphism. The fractal dimension quantifies the degree of clustered distribution of SNPs and represents parsimoniously the genetic variation in a chromosome. In this sense, the proposed scheme projects the SNP genotype data into a new space which is simpler and lower in dimension. As an illustrative example, we estimate the chromosome-wise fractal dimensions of SNPs that are extracted from the HapMap of Phase III data set. To determine the validity of the proposed measure, we apply principal component analysis (PCA) to the set of estimated fractal dimensions and demonstrate that the set more or less described the population structure of 11 global populations. We also use multidimensional scaling to relate the genetic distances based on PCA to the geographical distances between global populations. This shows that, similar to the SNP genotype data, the fractal dimensions also has a role in genetic distance in the population structure. In addition, we apply the proposed measure to a signature for the classification of global populations by developing a support vector machine model. The selected feature model predicts the global population with a balanced accuracy of about 77%. These results support that the fractal dimension is an efficient way to describe the genetic variation of global populations.
Affiliation(s)
- Chang-Yong Lee
- The Department of Industrial and Systems Engineering, Kongju National University, Cheonan, 31080, South Korea.
|
238
|
DNA methylation at the crossroads of gene and environment interactions. Essays Biochem 2020; 63:717-726. [PMID: 31782496 PMCID: PMC6923319 DOI: 10.1042/ebc20190031] [Citation(s) in RCA: 86] [Impact Index Per Article: 17.2] [Received: 08/26/2019] [Revised: 10/18/2019] [Accepted: 10/22/2019] [Indexed: 12/15/2022]
Abstract
DNA methylation is an epigenetic mark involved in regulating genome function and is critical for normal development in mammals. It has been observed that the developmental environment can lead to permanent changes in gene expression and DNA methylation, at least at 'metastable epialleles'. These are defined as regions of the genome that show a variable epigenetic state that is established early in development and maintained through subsequent cell divisions. However, the majority of the known genome does not behave in this manner. Here, we use the developmental origins of adult disease hypothesis to understand environmental epigenomics. Some challenges to studying how DNA methylation is influenced by the environment include identifying DNA methylation changes associated with an environmental exposure in tissues with a complex cellular composition and at genomic regions for which DNA methylation is dynamically regulated in a cell-type specific manner. We also offer a perspective of how emerging technologies may be useful for dissecting the functional contribution of exposure-associated epigenetic changes and highlight recent evidence that suggests that genomic regions that are absent from genome assemblies may be unappreciated hotspots for environmental modulation of the epigenetic state.
|
239
|
Zascavage RR, Hall CL, Thorson K, Mahmoud M, Sedlazeck FJ, Planz JV. Approaches to Whole Mitochondrial Genome Sequencing on the Oxford Nanopore MinION. Curr Protoc Hum Genet 2020; 104:e94. [PMID: 31743587 DOI: 10.1002/cphg.94] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 12/30/2022]
Abstract
Traditional approaches for interrogating the mitochondrial genome often involve laborious extraction and enrichment protocols followed by Sanger sequencing. Although preparation techniques are still demanding, the advent of next-generation or massively parallel sequencing has made it possible to routinely obtain nucleotide-level data with relative ease. These short-read sequencing platforms offer deep coverage with unparalleled read accuracy in high-complexity genomic regions but encounter numerous difficulties in the low-complexity homopolymeric sequences characteristic of the mitochondrial genome. The inability to discern identical units within monomeric repeats and resolve copy-number variations for heteroplasmy detection results in suboptimal genome assemblies that ultimately complicate downstream data analysis and interpretation of biological significance. Oxford Nanopore Technologies offers the ability to generate long-read sequencing data on a pocket-sized device known as the MinION. Nanopore-based sequencing is scalable, portable, and theoretically capable of sequencing the entire mitochondrial genome in a single contig. Furthermore, the recent development of a nanopore protein with dual reader heads allows for clear identification of nucleotides within homopolymeric stretches, significantly increasing resolution throughout these regions. The unrestricted read lengths, superior homopolymeric resolution, and affordability of the MinION device make it an attractive alternative to the labor-intensive, time-consuming, and costly mainstay deep-sequencing platforms. This article describes three approaches to extract, prepare, and sequence mitochondrial DNA on the Oxford Nanopore MinION device. Two of the workflows include enrichment of mitochondrial DNA prior to sequencing, whereas the other relies on direct sequencing of native genomic DNA to allow for simultaneous assessment of the nuclear and mitochondrial genomes. © 2019 by John Wiley & Sons, Inc. 
- Basic Protocol: Enrichment-free mitochondrial DNA sequencing
- Alternate Protocol 1: Mitochondrial DNA sequencing following enrichment with polymerase chain reaction (PCR)
- Alternate Protocol 2: Mitochondrial DNA sequencing following enrichment with PCR-free hybridization capture
- Support Protocol 1: DNA quantification and quality assessment using the Agilent 4200 TapeStation System
- Support Protocol 2: AMPure XP bead clean-up
- Support Protocol 3: Suggested data analysis pipeline
Affiliation(s)
- Roxanne R Zascavage
- Department of Criminology and Criminal Justice, University of Texas at Arlington, Arlington, Texas; Department of Microbiology, Immunology and Genetics, University of North Texas Health Science Center, Fort Worth, Texas
- Courtney L Hall
- Department of Microbiology, Immunology and Genetics, University of North Texas Health Science Center, Fort Worth, Texas
- Medhat Mahmoud
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas
- Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas
- John V Planz
- Department of Microbiology, Immunology and Genetics, University of North Texas Health Science Center, Fort Worth, Texas
|
240
|
Russell LE, Schwarz UI. Variant discovery using next-generation sequencing and its future role in pharmacogenetics. Pharmacogenomics 2020; 21:471-486. [DOI: 10.2217/pgs-2019-0190] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 02/06/2023] Open
Abstract
Next-generation sequencing (NGS) has enabled the discovery of a multitude of novel and mostly rare variants in pharmacogenes that may alter a patient’s therapeutic response to drugs. In addition to single nucleotide variants, structural variation affecting the number of copies of whole genes or parts of genes can be detected. While current guidelines concerning clinical implementation mostly act upon well-documented, common single nucleotide variants to guide dosing or drug selection, in silico and large-scale functional assessment of rare variant effects on protein function are at the forefront of pharmacogenetic research to facilitate their clinical integration. Here, we discuss the role of NGS in variant discovery, paving the way for more comprehensive genotype-guided pharmacotherapy that can translate to improved clinical care.
Affiliation(s)
- Laura E Russell
- Department of Physiology & Pharmacology, Western University, Medical Sciences Building, London, ON, N6A 5C1, Canada
- Ute I Schwarz
- Department of Physiology & Pharmacology, Western University, Medical Sciences Building, London, ON, N6A 5C1, Canada
- Division of Clinical Pharmacology, Department of Medicine, Western University, London Health Sciences Centre – University Hospital, 339 Windermere Road, London, ON, N6A 5A5, Canada
|
241
|
Heyer EE, Blackburn J. Sequencing Strategies for Fusion Gene Detection. Bioessays 2020; 42:e2000016. [DOI: 10.1002/bies.202000016] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 01/30/2020] [Revised: 03/11/2020] [Indexed: 02/06/2023]
Affiliation(s)
- Erin E. Heyer
- The Kinghorn Cancer Centre, Garvan Institute of Medical Research, 384 Victoria Street, Darlinghurst, NSW 2010, Australia
- James Blackburn
- The Kinghorn Cancer Centre, Garvan Institute of Medical Research, 384 Victoria Street, Darlinghurst, NSW 2010, Australia
- Faculty of Medicine, St. Vincent's Clinical School, UNSW, St Vincent's Hospital, Victoria Street, Darlinghurst, NSW 2010, Australia
|
242
|
Luo R, Wong CL, Wong YS, Tang CI, Liu CM, Leung CM, Lam TW. Exploring the limit of using a deep neural network on pileup data for germline variant calling. Nat Mach Intell 2020. [DOI: 10.1038/s42256-020-0167-4] [Citation(s) in RCA: 65] [Impact Index Per Article: 13.0] [Indexed: 12/16/2022]
|
243
|
Abstract
Since the early days of the genome era, the scientific community has relied on a single 'reference' genome for each species, which is used as the basis for a wide range of genetic analyses, including studies of variation within and across species. As sequencing costs have dropped, thousands of new genomes have been sequenced, and scientists have come to realize that a single reference genome is inadequate for many purposes. By sampling a diverse set of individuals, one can begin to assemble a pan-genome: a collection of all the DNA sequences that occur in a species. Here we review efforts to create pan-genomes for a range of species, from bacteria to humans, and we further consider the computational methods that have been proposed in order to capture, interpret and compare pan-genome data. As scientists continue to survey and catalogue the genomic variation across human populations and begin to assemble a human pan-genome, these efforts will increase our power to connect variation to human diversity, disease and beyond.
Affiliation(s)
- Rachel M Sherman
- Department of Computer Science, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD, USA.
- Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD, USA.
- Steven L Salzberg
- Department of Computer Science, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD, USA
- Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, USA
|
244
|
Michael TP, VanBuren R. Building near-complete plant genomes. Curr Opin Plant Biol 2020; 54:26-33. [PMID: 31981929 DOI: 10.1016/j.pbi.2019.12.009] [Citation(s) in RCA: 118] [Impact Index Per Article: 23.6] [Received: 07/31/2019] [Revised: 12/05/2019] [Accepted: 12/10/2019] [Indexed: 05/23/2023]
Abstract
Plant genomes span several orders of magnitude in size, vary in levels of ploidy and heterozygosity, and contain old and recent bursts of transposable elements, which render them challenging but interesting to assemble. Recent advances in single molecule sequencing and physical mapping technologies have enabled high-quality, chromosome scale assemblies of plant species with increasing complexity and size. Single molecule reads can now exceed megabases in length, providing unprecedented opportunities to untangle genomic regions missed by short read technologies. However, polyploid and heterozygous plant genomes are still difficult to assemble but provide opportunities for new tools and approaches. Haplotype phasing, structural variant analysis and de novo pan-genomics are the emerging frontiers in plant genome assembly.
Affiliation(s)
- Todd P Michael
- Informatics Department, J. Craig Venter Institute, La Jolla, CA, USA.
- Robert VanBuren
- Department of Horticulture, Michigan State University, East Lansing, MI 48824, USA; Plant Resilience Institute, Michigan State University, East Lansing, MI 48824, USA
|
245
|
Xiao T, Zhou W. The third generation sequencing: the advanced approach to genetic diseases. Transl Pediatr 2020; 9:163-173. [PMID: 32477917 PMCID: PMC7237973 DOI: 10.21037/tp.2020.03.06] [Citation(s) in RCA: 71] [Impact Index Per Article: 14.2] [Received: 12/29/2019] [Accepted: 02/05/2020] [Indexed: 01/05/2023] Open
Abstract
Genomic sequencing technologies have revolutionized mutation detection of the genetic diseases in the past few years. In recent years, the third generation sequencing (TGS) has been gaining insight into more genetic diseases owing to the single molecular and real time sequencing technology. This paper reviews the genomic sequencing revolutionary history first and then focuses on the genetic diseases discovered through the TGS and the clinical effects of the TGS, which is followed by the discussion of the improvement in the bioinformatic analysis for the TGS and its limitations. In summary, the TGS has been enhancing the diagnostic accuracy of genetic diseases in molecular level as well as paving a new way for basic researches and therapies.
Affiliation(s)
- Tiantian Xiao
- Clinic of Neonatology, Children’s Hospital of Fudan University, Shanghai 201102, China
- Department of Neonatology, Chengdu Women’s and Children’s Central Hospital, School of Medicine, University of Electronic Science and Technology of China, Chengdu 611731, China
- Wenhao Zhou
- Clinic of Neonatology, Children’s Hospital of Fudan University, Shanghai 201102, China
- Key Laboratory of Birth Defects, Children’s Hospital of Fudan University, Shanghai 201102, China
- Key Laboratory of Neonatal Diseases, Children’s Hospital of Fudan University, Shanghai 201102, China
|
246
|
Shahid S, Slotkin RK. The current revolution in transposable element biology enabled by long reads. Curr Opin Plant Biol 2020; 54:49-56. [PMID: 32007731 DOI: 10.1016/j.pbi.2019.12.012] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 08/21/2019] [Revised: 12/20/2019] [Accepted: 12/23/2019] [Indexed: 06/10/2023]
Abstract
Technological advancement in DNA sequencing read-length has drastically changed the quality and completeness of decoded genomes. The aim of this article is not to describe the different technologies of long-read sequencing, or the widely appreciated power of this technology in genome sequencing, assembly, and gene annotation. Instead, in this article, we provide our opinion that with the exception of genome production, transposable element biology is the most radically altered field as a consequence of the advent of long-read sequencing technology. We review how long-reads have been used to answer key questions in transposable element biology, and how in the future long-reads will help elucidate the function of the repetitive fraction of genomes.
Affiliation(s)
- Saima Shahid
- Donald Danforth Plant Science Center, St. Louis, MO, USA
- R Keith Slotkin
- Donald Danforth Plant Science Center, St. Louis, MO, USA; Division of Biological Sciences, University of Missouri, Columbia, MO, USA.
|
247
|
Leidenfrost RM, Pöther DC, Jäckel U, Wünschiers R. Benchmarking the MinION: Evaluating long reads for microbial profiling. Sci Rep 2020; 10:5125. [PMID: 32198413 PMCID: PMC7083898 DOI: 10.1038/s41598-020-61989-x] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 11/13/2019] [Accepted: 03/04/2020] [Indexed: 12/22/2022] Open
Abstract
Nanopore based DNA-sequencing delivers long reads, thereby simplifying the decipherment of bacterial communities. Since its commercial appearance, this technology has been assigned several attributes, such as its error proneness, comparatively low cost, ease-of-use, and, most notably, aforementioned long reads. The technology as a whole is under continued development. As such, benchmarks are required to conceive, test and improve analysis protocols, including those related to the understanding of the composition of microbial communities. Here we present a dataset composed of twelve different prokaryotic species split into four samples differing by nucleic acid quantification technique to assess the specificity and sensitivity of the MinION nanopore sequencer in a blind study design. Taxonomic classification was performed by standard taxonomic sequence classification tools, namely Kraken, Kraken2 and Centrifuge directly on reads. This allowed taxonomic assignments of up to 99.27% on genus level and 92.78% on species level, enabling true-positive classification of strains down to 25,000 genomes per sample. Full genomic coverage is achieved for strains abundant as low as 250,000 genomes per sample under our experimental settings. In summary, we present an evaluation of nanopore sequence processing analysis with respect to microbial community composition. It provides an open protocol and the data may serve as basis for the development and benchmarking of future data processing pipelines.
Affiliation(s)
- Robert Maximilian Leidenfrost
- Department of Biotechnology and Chemistry, Mittweida University of Applied Sciences, Technikumplatz 17, 09648, Mittweida, Germany.
- Dierk-Christoph Pöther
- Unit for Biological Agents, Federal Institute for Occupational Safety and Health, Nöldnerstr. 40-42, 10317, Berlin, Germany
- Udo Jäckel
- Unit for Biological Agents, Federal Institute for Occupational Safety and Health, Nöldnerstr. 40-42, 10317, Berlin, Germany
- Röbbe Wünschiers
- Department of Biotechnology and Chemistry, Mittweida University of Applied Sciences, Technikumplatz 17, 09648, Mittweida, Germany
|
248
|
Mathers TC. Improved Genome Assembly and Annotation of the Soybean Aphid (Aphis glycines Matsumura). G3 (Bethesda) 2020; 10:899-906. [PMID: 31969427 PMCID: PMC7056979 DOI: 10.1534/g3.119.400954] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Indexed: 02/06/2023]
Abstract
Aphids are an economically important insect group due to their role as plant disease vectors. Despite this economic impact, genomic resources have only been generated for a small number of aphid species. The soybean aphid (Aphis glycines Matsumura) was the third aphid species to have its genome sequenced and the first to use long-read sequence data. However, version 1 of the soybean aphid genome assembly has low contiguity (contig N50 = 57 Kb, scaffold N50 = 174 Kb), poor representation of conserved genes and the presence of genomic scaffolds likely derived from parasitoid wasp contamination. Here, I use recently developed methods to reassemble the soybean aphid genome. The version 2 genome assembly is highly contiguous, containing half of the genome in only 40 scaffolds (contig N50 = 2.00 Mb, scaffold N50 = 2.51 Mb) and contains 11% more conserved single-copy arthropod genes than version 1. To demonstrate the utility of this improved assembly, I identify a region of conserved synteny between aphids and Drosophila containing members of the Osiris gene family that was split over multiple scaffolds in the original assembly. The improved genome assembly and annotation of A. glycines demonstrates the benefit of applying new methods to old data sets and will provide a useful resource for future comparative genome analysis of aphids.
Affiliation(s)
- Thomas C Mathers
- Department of Crop Genetics, John Innes Centre, Norwich Research Park, Norwich, Norfolk, NR4 7UH, UK
|
249
|
Balachandran P, Beck CR. Structural variant identification and characterization. Chromosome Res 2020; 28:31-47. [PMID: 31907725 PMCID: PMC7131885 DOI: 10.1007/s10577-019-09623-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 10/15/2019] [Revised: 10/15/2019] [Accepted: 11/24/2019] [Indexed: 01/06/2023]
Abstract
Structural variant (SV) differences between human genomes can cause germline and mosaic disease as well as inter-individual variation. De-regulation of accurate DNA repair and genomic surveillance mechanisms results in a large number of SVs in cancer. Analysis of the DNA sequences at SV breakpoints can help identify pathways of mutagenesis and regions of the genome that are more susceptible to rearrangement. Large-scale SV analyses have been enabled by high-throughput genome-level sequencing on humans in the past decade. These studies have shed light on the mechanisms and prevalence of complex genomic rearrangements. Recent advancements in both sequencing and other mapping technologies as well as calling algorithms for detection of genomic rearrangements have helped propel SV detection into population-scale studies, and have begun to elucidate previously inaccessible regions of the genome. Here, we discuss the genomic organization of simple and complex SVs, the molecular mechanisms of their formation, and various ways to detect them. We also introduce methods for characterizing SVs and their consequences on human genomes.
Affiliation(s)
- Christine R Beck
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA.
- Department of Genetics and Genome Sciences, Institute for Systems Genomics, University of Connecticut Health Center, Farmington, CT, 06030, USA.
|
250
|
Lee K, Kim MS, Lee JS, Bae DN, Jeong N, Yang K, Lee JD, Park JH, Moon JK, Jeong SC. Chromosomal features revealed by comparison of genetic maps of Glycine max and Glycine soja. Genomics 2020; 112:1481-1489. [PMID: 31461668 DOI: 10.1016/j.ygeno.2019.08.019] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 06/30/2019] [Revised: 08/08/2019] [Accepted: 08/24/2019] [Indexed: 11/18/2022]
Abstract
Recombination is a crucial component of evolution and breeding. New combinations of variation on chromosomes are shaped by recombination. Recombination is also involved in chromosomal rearrangements. However, recombination rates vary tremendously among chromosome segments. Genome-wide genetic maps are one of the best tools to study variation of recombination. Here, we describe high density genetic maps of Glycine max and Glycine soja constructed from four segregating populations. The maps were used to identify chromosomal rearrangements and find the highly predictable pattern of cross-overs on the broad scale in soybean. Markers on these genetic maps were used to evaluate assembly quality of the current soybean reference genome sequence. We find a strong inversion candidate larger than 3 Mb based on patterns of cross-overs. We also identify quantitative trait loci (QTL) that control number of cross-overs. This study provides fundamental insights relevant to practical strategy for breeding programs and for pan-genome researches.
Affiliation(s)
- Kwanghee Lee
- Bio-Evaluation Center, Korea Research Institute of Bioscience and Biotechnology, Cheongju, Chungbuk 28116, Republic of Korea
- Myung-Shin Kim
- Bio-Evaluation Center, Korea Research Institute of Bioscience and Biotechnology, Cheongju, Chungbuk 28116, Republic of Korea
- Ju Seok Lee
- Bio-Evaluation Center, Korea Research Institute of Bioscience and Biotechnology, Cheongju, Chungbuk 28116, Republic of Korea
- Dong Nyuk Bae
- Bio-Evaluation Center, Korea Research Institute of Bioscience and Biotechnology, Cheongju, Chungbuk 28116, Republic of Korea
- Namhee Jeong
- National Institute of Crop Science, Rural Development Administration, Wanju, Jeonbuk 55365, Republic of Korea
- Kiwoung Yang
- Bio-Evaluation Center, Korea Research Institute of Bioscience and Biotechnology, Cheongju, Chungbuk 28116, Republic of Korea; Present address: Geolim Pharmaceutical Co., Ltd, QB e centum, 2307, Centumjunggang-ro 90, Heaundae-gu, Busan, Republic of Korea
- Jeong-Dong Lee
- School of Applied Biosciences, Kyungpook National University, Daegu 41566, Republic of Korea
- Jung-Ho Park
- Bio-Evaluation Center, Korea Research Institute of Bioscience and Biotechnology, Cheongju, Chungbuk 28116, Republic of Korea
- Jung-Kyung Moon
- Agricultural Genome Center, National Academy of Agricultural Sciences, Rural Development Administration, Jeonju, Jeonbuk 55365, Republic of Korea
- Soon-Chun Jeong
- Bio-Evaluation Center, Korea Research Institute of Bioscience and Biotechnology, Cheongju, Chungbuk 28116, Republic of Korea.
|