1
Marcolungo L, Vincenzi L, Ballottari M, Cecchin M, Cosentino E, Mignani T, Limongi A, Ferraris I, Orlandi M, Rossato M, Delledonne M. Structural Refinement by Direct Mapping Reveals Assembly Inconsistencies near Hi-C Junctions. Plants (Basel, Switzerland) 2023; 12:320. [PMID: 36679033 PMCID: PMC9861903 DOI: 10.3390/plants12020320] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2022] [Revised: 12/22/2022] [Accepted: 01/06/2023] [Indexed: 06/17/2023]
Abstract
High-throughput chromosome conformation capture (Hi-C) is widely used for scaffolding in de novo assembly because it produces highly contiguous genomes, but its indirect statistical approach can introduce connection errors. We employed optical mapping (Bionano Genomics) as an orthogonal scaffolding technology to assess the structural solidity of Hi-C reconstructed scaffolds. Optical maps were used to assess the correctness of five de novo genome assemblies based on long-read sequencing for contig generation and Hi-C for scaffolding. Hundreds of inconsistencies were found between the reconstructions generated using the Hi-C and optical mapping approaches. Manual inspection, exploiting raw long-read sequencing data and optical maps, confirmed that several of these conflicts were derived from Hi-C joining errors. Such misjoins were widespread, involved the connection of both small and large contigs, and even overlapped annotated genes. We conclude that the integration of optical mapping data after, not before, Hi-C-based scaffolding, improves the quality of the assembly and limits reconstruction errors by highlighting misjoins that can then be subjected to further investigation.
Affiliation(s)
- Luca Marcolungo
  - Department of Biotechnology, University of Verona, Strada Le Grazie 15, 37134 Verona, Italy
- Leonardo Vincenzi
  - Department of Biotechnology, University of Verona, Strada Le Grazie 15, 37134 Verona, Italy
- Matteo Ballottari
  - Department of Biotechnology, University of Verona, Strada Le Grazie 15, 37134 Verona, Italy
- Michela Cecchin
  - Department of Biotechnology, University of Verona, Strada Le Grazie 15, 37134 Verona, Italy
- Thomas Mignani
  - Department of Biotechnology, University of Verona, Strada Le Grazie 15, 37134 Verona, Italy
- Antonina Limongi
  - Department of Biotechnology, University of Verona, Strada Le Grazie 15, 37134 Verona, Italy
- Irene Ferraris
  - Department of Biotechnology, University of Verona, Strada Le Grazie 15, 37134 Verona, Italy
- Matteo Orlandi
  - Department of Biotechnology, University of Verona, Strada Le Grazie 15, 37134 Verona, Italy
- Marzia Rossato
  - Department of Biotechnology, University of Verona, Strada Le Grazie 15, 37134 Verona, Italy
  - Genartis srl, Via IV Novembre 24, 37126 Verona, Italy
- Massimo Delledonne
  - Department of Biotechnology, University of Verona, Strada Le Grazie 15, 37134 Verona, Italy
  - Genartis srl, Via IV Novembre 24, 37126 Verona, Italy
2
Speck A, Trouvé JP, Enjalbert J, Geffroy V, Joets J, Moreau L. Genetic Architecture of Powdery Mildew Resistance Revealed by a Genome-Wide Association Study of a Worldwide Collection of Flax (Linum usitatissimum L.). Frontiers in Plant Science 2022; 13:871633. [PMID: 35812909 PMCID: PMC9263915 DOI: 10.3389/fpls.2022.871633] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2022] [Accepted: 04/22/2022] [Indexed: 06/15/2023]
Abstract
Powdery mildew is one of the most important diseases of flax and is particularly detrimental to its yield and oil or fiber quality. This disease, caused by the obligate biotrophic ascomycete Oïdium lini, is progressing in France. Genetic resistance of varieties is critical for the control of this disease, but very few resistance genes have been identified so far. It is therefore necessary to identify new powdery mildew resistance genes suited to the local context of pathogenicity. For this purpose, we studied a worldwide diversity panel composed of 311 flax genotypes that were both phenotyped for resistance to powdery mildew over 2 years of field trials in France and resequenced. Sequence reads were mapped on the CDC Bethune reference genome, revealing 1,693,910 high-quality SNPs that were then used for both population structure analysis and genome-wide association studies (GWASs). Four major genetic groups were identified, separating oil flax accessions from America or Europe from those from Asia or the Middle East, and fiber flax accessions originating from Eastern Europe from those from Western Europe. Eight QTLs were detected at the false discovery rate threshold of 5%, located on chromosomes 1, 2, 4, 13, and 14. Taking advantage of the moderate linkage disequilibrium present in the flax panel, and using the available genome annotation, we identified potential candidate genes. Our study shows the existence of new resistance alleles against powdery mildew in our diversity panel, which are of high interest for flax breeding programs.
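The abstract reports QTL detection at a 5% false discovery rate but does not name the procedure used. As an illustration only, the following is a minimal sketch assuming the widely used Benjamini-Hochberg step-up procedure; the function name `benjamini_hochberg` and the p-values in the usage lines are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the test is declared
    significant under the Benjamini-Hochberg procedure at level alpha.

    Assumption: this is a generic FDR procedure, not necessarily the one
    used by the authors.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank

    # Every test ranked at or below max_rank is declared significant.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            significant[idx] = True
    return significant


# Hypothetical marker-trait association p-values (illustrative only).
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
calls = benjamini_hochberg(example_p, alpha=0.05)
print(sum(calls), "of", len(calls), "tests pass the 5% FDR threshold")
```

In a GWAS setting the input would be one p-value per tested SNP, so a step-up rule of this kind adapts the per-test threshold to the number of markers rather than using a fixed cutoff.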
Affiliation(s)
- Jérôme Enjalbert
  - Université Paris-Saclay, INRAE, CNRS, AgroParisTech, Génétique Quantitative et Evolution - Le Moulon, Gif-sur-Yvette, France
- Valérie Geffroy
  - Université Paris-Saclay, CNRS, INRAE, Université Evry, Institute of Plant Sciences Paris-Saclay (IPS2), Gif-sur-Yvette, France
  - Université de Paris, Institute of Plant Sciences Paris-Saclay (IPS2), Gif-sur-Yvette, France
- Johann Joets
  - Université Paris-Saclay, INRAE, CNRS, AgroParisTech, Génétique Quantitative et Evolution - Le Moulon, Gif-sur-Yvette, France
- Laurence Moreau
  - Université Paris-Saclay, INRAE, CNRS, AgroParisTech, Génétique Quantitative et Evolution - Le Moulon, Gif-sur-Yvette, France
3
Wong JS, Jadhav T, Young E, Wang Y, Xiao M. Characterization of full-length LINE-1 insertions in 154 genomes. Genomics 2021; 113:3804-3810. [PMID: 34534648 DOI: 10.1016/j.ygeno.2021.09.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2021] [Revised: 08/18/2021] [Accepted: 09/11/2021] [Indexed: 10/20/2022]
Abstract
Long interspersed nuclear elements (LINEs) are retrotransposons that contribute to genetic variation in the human genome. In larger-scale studies, LINE-1 elements are challenging to identify using sequencing technologies because of cost and limited scalability. We developed an approach using optical mapping for detection of full-length LINE-1 insertions and 10× sequencing for confirmation. We found 51 true positive full-length LINE-1 insertions, of which 4 are novel insertions, in NA12878. Repeating our analysis on a larger sample set representing 26 populations, we identified 329 full-length LINE-1 elements, of which 123 are novel. Of these 329 LINE-1 insertions, 24.8% were shared amongst all 5 superpopulations (AFR, AMR, EUR, EAS, SAS). The African superpopulation has a higher percentage of population-specific LINE-1 insertions than any other superpopulation. These data indicate that our approach can provide fast, cost-effective, and more accurate LINE-1 detection. These data also provide insight into variation of LINE-1 elements between different populations.
Affiliation(s)
- Jessica S Wong
  - School of Biomedical Engineering, Drexel University, Philadelphia, PA, United States of America
- Tanaya Jadhav
  - School of Biomedical Engineering, Drexel University, Philadelphia, PA, United States of America
- Eleanor Young
  - School of Biomedical Engineering, Drexel University, Philadelphia, PA, United States of America
- Yilin Wang
  - School of Biomedical Engineering, Drexel University, Philadelphia, PA, United States of America
- Ming Xiao
  - School of Biomedical Engineering, Drexel University, Philadelphia, PA, United States of America; Center for Genomic Sciences, Institute of Molecular Medicine and Infectious Disease, Drexel University, Philadelphia, PA, United States of America
4
Yuan Y, Chung CYL, Chan TF. Advances in optical mapping for genomic research. Comput Struct Biotechnol J 2020; 18:2051-2062. [PMID: 32802277 PMCID: PMC7419273 DOI: 10.1016/j.csbj.2020.07.018] [Citation(s) in RCA: 70] [Impact Index Per Article: 14.0] [Received: 05/10/2020] [Revised: 07/08/2020] [Accepted: 07/24/2020] [Indexed: 12/28/2022]
Abstract
Recent advances in optical mapping have allowed the construction of improved genome assemblies with greater contiguity. Optical mapping also enables genome comparison and identification of large-scale structural variations. Association of these large-scale genomic features with biological functions is an important goal in plant and animal breeding and in medical research. Optical mapping has also been used in microbiology and still plays an important role in strain typing and epidemiological studies. Here, we review the development of optical mapping in recent decades to illustrate its importance in genomic research. We detail its applications and algorithms to show its specific advantages. Finally, we discuss the challenges required to facilitate the optimization of optical mapping and improve its future development and application.
Key Words
- 3D, three-dimensional
- DBG, de Bruijn graph
- DLS, direct label and stain
- DNA, deoxyribonucleic acid
- Genome assembly
- Hi-C, high-throughput chromosome conformation capture
- Mb, million base pair
- Next generation sequencing
- OLC, overlap-layout-consensus
- Optical mapping
- PCR, polymerase chain reaction
- PacBio, Pacific Biosciences
- SRS, short-read sequencing
- SV, structural variation
- Structural variation
- bp, base pair
- kb, kilobase pair
Affiliation(s)
- Yuxuan Yuan
  - School of Life Sciences, The Chinese University of Hong Kong, Hong Kong SAR, China
  - State Key Laboratory for Agrobiotechnology, The Chinese University of Hong Kong, Hong Kong SAR, China
  - AoE Centre for Genomic Studies on Plant-Environment Interaction for Sustainable Agriculture and Food Security, The Chinese University of Hong Kong, Hong Kong SAR, China
- Claire Yik-Lok Chung
  - School of Life Sciences, The Chinese University of Hong Kong, Hong Kong SAR, China
  - State Key Laboratory for Agrobiotechnology, The Chinese University of Hong Kong, Hong Kong SAR, China
- Ting-Fung Chan
  - School of Life Sciences, The Chinese University of Hong Kong, Hong Kong SAR, China
  - State Key Laboratory for Agrobiotechnology, The Chinese University of Hong Kong, Hong Kong SAR, China
  - AoE Centre for Genomic Studies on Plant-Environment Interaction for Sustainable Agriculture and Food Security, The Chinese University of Hong Kong, Hong Kong SAR, China
5
Akhmetshina AO, Strygina KV, Khlestkina EK, Porokhovinova EA, Brutch NB. High-throughput sequencing techniques to flax genetics and breeding. Ecological Genetics 2020. [DOI: 10.17816/ecogen16126] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 04/29/2023]
Abstract
Flax (Linum usitatissimum L.) is an important oil and fiber crop. Modern breeding methods accelerate the introduction of desired genes into the genotypes of future varieties. Today, an important condition for creating such varieties is the development of research based on next-generation sequencing (NGS). This review summarizes the results obtained using NGS in flax research. To date, a linkage map with a high marker density has been obtained for L. usitatissimum, which is already being used for a more efficient search for quantitative trait loci. Comparative studies of transcriptomes and miRNomes of flax under stress and in control conditions elucidated molecular-genetic mechanisms of abiotic and biotic stress responses. A highly accurate model for genomic selection of flax resistant to pasmo was constructed. NGS-based studies also clarified some details of the evolution of the genus Linum. The knowledge systematized in the review can be useful for researchers working in flax breeding and is of fundamental interest for understanding the phylogenetic relationships within the genus Linum, the ontogenesis, and the mechanisms of the response of flax plants to various stress factors.
6
You FM, Xiao J, Li P, Yao Z, Jia G, He L, Zhu T, Luo MC, Wang X, Deyholos MK, Cloutier S. Chromosome-scale pseudomolecules refined by optical, physical and genetic maps in flax. The Plant Journal: for Cell and Molecular Biology 2018; 95:371-384. [PMID: 29681136 DOI: 10.1111/tpj.13944] [Citation(s) in RCA: 57] [Impact Index Per Article: 8.1] [Received: 01/23/2018] [Revised: 03/19/2018] [Accepted: 03/22/2018] [Indexed: 05/19/2023]
Abstract
Genomes of varying sizes have been sequenced with next-generation sequencing platforms. However, most reference sequences include draft unordered scaffolds containing chimeras caused by mis-scaffolding. A BioNano genome (BNG) optical map was constructed to improve the previously sequenced flax genome (Linum usitatissimum L., 2n = 30, about 373 Mb), which consisted of 3852 scaffolds larger than 1 kb and totalling 300.6 Mb. The high-resolution BNG map of cv. CDC Bethune totalled 317 Mb and consisted of 251 BNG contigs with an N50 of 2.15 Mb. A total of 622 scaffolds (286.6 Mb, 94.9%) aligned to 211 BNG contigs (298.6 Mb, 94.2%). Of those, 99 scaffolds, diagnosed to contain assembly errors, were refined into 225 new scaffolds. Using the newly refined scaffold sequences and the validated bacterial artificial chromosome-based physical map of CDC Bethune, the 211 BNG contigs were scaffolded into 94 super-BNG contigs (N50 of 6.64 Mb) that were further assigned to the 15 flax chromosomes using the genetic map. The pseudomolecules total about 316 Mb, with individual chromosomes of 15.6 to 29.4 Mb, and cover 97% of the annotated genes. Evidence from the chromosome-scale pseudomolecules suggests that flax has undergone palaeopolyploidization and mesopolyploidization events, followed by rearrangements and deletions or fusion of chromosome arms from an ancient progenitor with a haploid chromosome number of eight.
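The N50 values quoted in this abstract (2.15 Mb for the BNG contigs, 6.64 Mb for the super-BNG contigs) follow the usual definition: the length L such that pieces of length at least L together cover at least half of the total assembled length. A minimal sketch of that calculation follows; the function name `n50` and the example lengths are illustrative assumptions, not data from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length of the piece at which the running total, taken over
    pieces sorted from longest to shortest, first reaches at least half of
    the total length.
    """
    if not lengths:
        raise ValueError("n50() needs at least one length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Hypothetical contig lengths in Mb (illustrative only).
example_lengths = [10, 9, 8, 7, 6, 5, 4, 3, 2]
print("N50 =", n50(example_lengths), "Mb")
```

Because N50 depends only on the length distribution, it measures contiguity, not correctness: a misjoined scaffold raises N50 even though it is wrong, which is why the abstract pairs the statistic with map-based error diagnosis.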
Affiliation(s)
- Frank M You
  - Morden Research and Development Centre, Agriculture and Agri-Food Canada, Morden, MB, R6M 1Y5, Canada
- Jin Xiao
  - Morden Research and Development Centre, Agriculture and Agri-Food Canada, Morden, MB, R6M 1Y5, Canada
  - State Key Lab of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, 210095, China
- Pingchuan Li
  - Morden Research and Development Centre, Agriculture and Agri-Food Canada, Morden, MB, R6M 1Y5, Canada
- Zhen Yao
  - Morden Research and Development Centre, Agriculture and Agri-Food Canada, Morden, MB, R6M 1Y5, Canada
- Gaofeng Jia
  - Morden Research and Development Centre, Agriculture and Agri-Food Canada, Morden, MB, R6M 1Y5, Canada
  - Crop Development Centre, University of Saskatchewan, 51 Campus Drive, Saskatoon, SK, S7N 5A8, Canada
- Liqiang He
  - Morden Research and Development Centre, Agriculture and Agri-Food Canada, Morden, MB, R6M 1Y5, Canada
- Tingting Zhu
  - Department of Plant Sciences, University of California, Davis, CA, 95616, USA
- Ming-Cheng Luo
  - Department of Plant Sciences, University of California, Davis, CA, 95616, USA
- Xiue Wang
  - State Key Lab of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, 210095, China
- Sylvie Cloutier
  - Ottawa Research and Development Centre, Agriculture and Agri-Food Canada, Ottawa, ON, K1A 0C6, Canada
7
Abstract
The output from whole genome sequencing is a set of contigs, i.e. short non-overlapping DNA sequences (sizes 1-100 kilobasepairs). Piecing the contigs together is an especially difficult task for previously unsequenced DNA, and may not be feasible due to factors such as the lack of sufficient coverage or larger repetitive regions which generate gaps in the final sequence. Here we propose a new method for scaffolding such contigs. The proposed method uses densely labeled optical DNA barcodes from competitive binding experiments as scaffolds. On these scaffolds we position theoretical barcodes which are calculated from the contig sequences. This allows us to construct longer DNA sequences from the contig sequences. This proof-of-principle study extends previous studies which use sparsely labeled DNA barcodes for scaffolding purposes. Our method applies a probabilistic approach that allows us to discard “foreign” contigs from mixed samples with contigs from different types of DNA. We satisfy the contig non-overlap constraint by formulating the contig placement challenge as a combinatorial auction problem. Our exact algorithm for solving this problem reduces computational costs compared to previous methods in the combinatorial auction field. We demonstrate the usefulness of the proposed scaffolding method both for synthetic contigs and for contigs obtained using Illumina sequencing for a mixed sample with plasmid and chromosomal DNA.
8
Barley Genome Sequencing and Assembly—A First Version Reference Sequence. Compendium of Plant Genomes 2018. [DOI: 10.1007/978-3-319-92528-8_5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 12/20/2022]
9
Łopacińska-Jørgensen JM, Pedersen JN, Bak M, Mehrjouy MM, Sørensen KT, Østergaard PF, Bilenberg B, Kristensen A, Taboryski RJ, Flyvbjerg H, Marie R, Tommerup N, Silahtaroglu A. Enrichment of megabase-sized DNA molecules for single-molecule optical mapping and next-generation sequencing. Sci Rep 2017; 7:17893. [PMID: 29263336 PMCID: PMC5738345 DOI: 10.1038/s41598-017-18091-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 07/03/2017] [Accepted: 12/06/2017] [Indexed: 11/23/2022]
Abstract
Next-generation sequencing (NGS) has caused a revolution, yet left a gap: long-range genetic information from native, non-amplified DNA fragments is unavailable. It might be obtained by optical mapping of megabase-sized DNA molecules. Frequently only a specific genomic region is of interest, so here we introduce a method for selection and enrichment of megabase-sized DNA molecules intended for single-molecule optical mapping: DNA from a human cell line is digested by the NotI rare-cutting enzyme and size-selected by pulsed-field gel electrophoresis. For demonstration, more than 600 sub-megabase- to megabase-sized DNA molecules were recovered from the gel and analysed by denaturation-renaturation optical mapping. Size-selected molecules from the same gel were sequenced by NGS. The optically mapped molecules and the NGS reads showed enrichment from regions defined by NotI restriction sites. We demonstrate that the unannotated genome can be characterized in a locus-specific manner via molecules partially overlapping with the annotated genome. The method is a promising tool for investigation of structural variants in enriched human genomic regions for both research and diagnostic purposes. Our enrichment method could potentially work with other genomes or target specified regions by applying other genomic editing tools, such as the CRISPR/Cas9 system.
Affiliation(s)
- Joanna M Łopacińska-Jørgensen
  - Department of Cellular and Molecular Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Nørre Alle 14, Copenhagen, 2200, Denmark
- Jonas N Pedersen
  - Department of Micro- and Nanotechnology, Technical University of Denmark, Ørsteds Plads 345a, Kongens Lyngby, 2800, Denmark
- Mads Bak
  - Department of Cellular and Molecular Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Nørre Alle 14, Copenhagen, 2200, Denmark
- Mana M Mehrjouy
  - Department of Cellular and Molecular Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Nørre Alle 14, Copenhagen, 2200, Denmark
- Kristian T Sørensen
  - Department of Micro- and Nanotechnology, Technical University of Denmark, Ørsteds Plads 345a, Kongens Lyngby, 2800, Denmark
- Peter F Østergaard
  - Department of Micro- and Nanotechnology, Technical University of Denmark, Ørsteds Plads 345a, Kongens Lyngby, 2800, Denmark
- Brian Bilenberg
  - NIL Technology ApS, Diplomvej 381, Kongens Lyngby, 2800, Denmark
- Anders Kristensen
  - Department of Micro- and Nanotechnology, Technical University of Denmark, Ørsteds Plads 345a, Kongens Lyngby, 2800, Denmark
- Rafael J Taboryski
  - Department of Micro- and Nanotechnology, Technical University of Denmark, Ørsteds Plads 345a, Kongens Lyngby, 2800, Denmark
- Henrik Flyvbjerg
  - Department of Micro- and Nanotechnology, Technical University of Denmark, Ørsteds Plads 345a, Kongens Lyngby, 2800, Denmark
- Rodolphe Marie
  - Department of Micro- and Nanotechnology, Technical University of Denmark, Ørsteds Plads 345a, Kongens Lyngby, 2800, Denmark
- Niels Tommerup
  - Department of Cellular and Molecular Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Nørre Alle 14, Copenhagen, 2200, Denmark
- Asli Silahtaroglu
  - Department of Cellular and Molecular Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Nørre Alle 14, Copenhagen, 2200, Denmark
10
Larsen PA, Harris RA, Liu Y, Murali SC, Campbell CR, Brown AD, Sullivan BA, Shelton J, Brown SJ, Raveendran M, Dudchenko O, Machol I, Durand NC, Shamim MS, Aiden EL, Muzny DM, Gibbs RA, Yoder AD, Rogers J, Worley KC. Hybrid de novo genome assembly and centromere characterization of the gray mouse lemur (Microcebus murinus). BMC Biol 2017; 15:110. [PMID: 29145861 PMCID: PMC5689209 DOI: 10.1186/s12915-017-0439-6] [Citation(s) in RCA: 50] [Impact Index Per Article: 6.3] [Received: 06/20/2017] [Accepted: 10/10/2017] [Indexed: 12/31/2022]
Abstract
BACKGROUND: The de novo assembly of repeat-rich mammalian genomes using only high-throughput short read sequencing data typically results in highly fragmented genome assemblies that limit downstream applications. Here, we present an iterative approach to hybrid de novo genome assembly that incorporates datasets stemming from multiple genomic technologies and methods. We used this approach to improve the gray mouse lemur (Microcebus murinus) genome from early draft status to a near chromosome-scale assembly. METHODS: We used a combination of advanced genomic technologies to iteratively resolve conflicts and super-scaffold the M. murinus genome. RESULTS: We improved the M. murinus genome assembly to a scaffold N50 of 93.32 Mb. Whole genome alignments between our primary super-scaffolds and 23 human chromosomes revealed patterns that are congruent with historical comparative cytogenetic data, thus demonstrating the accuracy of our de novo scaffolding approach and allowing assignment of scaffolds to M. murinus chromosomes. Moreover, we utilized our independent datasets to discover and characterize sequences associated with centromeres across the mouse lemur genome. Quality assessment of the final assembly found 96% of mouse lemur canonical transcripts nearly complete, comparable to other published high-quality reference genome assemblies. CONCLUSIONS: We describe a new assembly of the gray mouse lemur (Microcebus murinus) genome with chromosome-scale scaffolds produced using a hybrid bioinformatic and sequencing approach. The approach is cost effective and produces superior results based on metrics of contiguity and completeness. Our results show that emerging genomic technologies can be used in combination to characterize centromeres of non-model species and to produce accurate de novo chromosome-scale genome assemblies of complex mammalian genomes.
Affiliation(s)
- Peter A. Larsen
  - Department of Biology, Duke University, Durham, NC 27708 USA
- R. Alan Harris
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030 USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030 USA
- Yue Liu
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030 USA
- Shwetha C. Murali
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030 USA
  - Present address: Department of Genome Sciences, University of Washington, Seattle, WA 98195 USA
- Adam D. Brown
  - Department of Pharmacology and Cancer Biology, Duke University, Durham, NC 27710 USA
  - Present address: Bristol Myers-Squibb, 420 W Round Grove Rd, Lewisville, TX 75067 USA
- Beth A. Sullivan
  - Department of Molecular Genetics and Microbiology, Duke University, Durham, NC 27710 USA
- Jennifer Shelton
  - Kansas State University Bioinformatics Center, Division of Biology, Kansas State University, Manhattan, KS 66506 USA
  - Present address: New York Genome Center, 101 Avenue of the Americas, New York, NY 10013 USA
- Susan J. Brown
  - Kansas State University Bioinformatics Center, Division of Biology, Kansas State University, Manhattan, KS 66506 USA
- Olga Dudchenko
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030 USA
  - The Center for Theoretical Biological Physics, Rice University, Houston, TX 77005 USA
  - Department of Computer Science, Rice University, Houston, TX 77005 USA
- Ido Machol
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030 USA
  - The Center for Theoretical Biological Physics, Rice University, Houston, TX 77005 USA
  - Department of Computer Science, Rice University, Houston, TX 77005 USA
- Neva C. Durand
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030 USA
  - The Center for Theoretical Biological Physics, Rice University, Houston, TX 77005 USA
  - Department of Computer Science, Rice University, Houston, TX 77005 USA
- Muhammad S. Shamim
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030 USA
  - The Center for Theoretical Biological Physics, Rice University, Houston, TX 77005 USA
  - Department of Computer Science, Rice University, Houston, TX 77005 USA
- Erez Lieberman Aiden
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030 USA
  - The Center for Theoretical Biological Physics, Rice University, Houston, TX 77005 USA
  - Department of Computer Science, Rice University, Houston, TX 77005 USA
- Donna M. Muzny
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030 USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030 USA
- Richard A. Gibbs
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030 USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030 USA
- Anne D. Yoder
  - Department of Biology, Duke University, Durham, NC 27708 USA
- Jeffrey Rogers
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030 USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030 USA
- Kim C. Worley
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030 USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030 USA
11
Recanati A, Brüls T, d’Aspremont A. A spectral algorithm for fast de novo layout of uncorrected long nanopore reads. Bioinformatics 2017; 33:3188-3194. [DOI: 10.1093/bioinformatics/btx370] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 10/05/2016] [Accepted: 06/06/2017] [Indexed: 11/13/2022]
Affiliation(s)
- Thomas Brüls
  - Commissariat à l’Energie Atomique et aux Energies Alternatives, Direction de la Recherche Fondamentale, Genoscope
  - UMR 8030, Centre National de la Recherche Scientifique, Université Paris-Saclay, Evry, France
  - Université Paris-Saclay, Evry, France
12
Ferreira AC, Dias R, de Sá MIC, Tenreiro R. Whole-genome mapping reveals a large chromosomal inversion on Iberian Brucella suis biovar 2 strains. Vet Microbiol 2016; 192:220-225. [DOI: 10.1016/j.vetmic.2016.07.024] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 04/04/2016] [Revised: 07/28/2016] [Accepted: 07/30/2016] [Indexed: 11/27/2022]
13
Characterization of amylomaltase from Thermus filiformis and the increase in alkaline and thermo-stability by E27R substitution. Process Biochem 2015. [DOI: 10.1016/j.procbio.2015.08.006] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Indexed: 11/18/2022]
14
Xiao S, Li J, Ma F, Fang L, Xu S, Chen W, Wang ZY. Rapid construction of genome map for large yellow croaker (Larimichthys crocea) by the whole-genome mapping in BioNano Genomics Irys system. BMC Genomics 2015; 16:670. [PMID: 26336087 PMCID: PMC4559010 DOI: 10.1186/s12864-015-1871-z] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 04/09/2015] [Accepted: 08/21/2015] [Indexed: 12/21/2022]
Abstract
Background Large yellow croaker (Larimichthys crocea) is an important commercial fish in China and East-Asia. The annual product of the species from the aqua-farming industry is about 90 thousand tons. In spite of its economic importance, genetic studies of economic traits and genomic selections of the species are hindered by the lack of genomic resources. Specifically, a whole-genome physical map of large yellow croaker is still missing. The traditional BAC-based fingerprint method is extremely time- and labour-consuming. Here we report the first genome map construction using the high-throughput whole-genome mapping technique by nanochannel arrays in BioNano Genomics Irys system. Results For an optimal marker density of ~10 per 100 kb, the nicking endonuclease Nt.BspQ1 was chosen for the genome map generation. 645,305 DNA molecules with a total length of ~112 Gb were labelled and detected, covering more than 160X of the large yellow croaker genome. Employing IrysView package and signature patterns in raw DNA molecules, a whole-genome map of large yellow croaker was assembled into 686 maps with a total length of 727 Mb, which was consistent with the estimated genome size. The N50 length of the whole-genome map, including 126 maps, was up to 1.7 Mb. The excellent hybrid alignment with large yellow croaker draft genome validated the consensus genome map assembly and highlighted a promising application of whole-genome mapping on draft genome sequence super-scaffolding. The genome map data of large yellow croaker are accessible on lycgenomics.jmu.edu.cn/pm. Conclusion Using the state-of-the-art whole-genome mapping technique in Irys system, the first whole-genome map for large yellow croaker has been constructed and thus highly facilitates the ongoing genomic and evolutionary studies for the species. To our knowledge, this is the first public report on genome map construction by the whole-genome mapping for aquatic-organisms. 
Our study demonstrates a promising application of the whole-genome mapping on genome maps construction for other non-model organisms in a fast and reliable manner. Electronic supplementary material The online version of this article (doi:10.1186/s12864-015-1871-z) contains supplementary material, which is available to authorized users.
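The coverage and N50 figures quoted in this abstract can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, the map lengths in the toy example are invented, and the coverage is computed against the 727 Mb assembled map length because the abstract does not state the genome-size estimate behind its "more than 160X" figure.

```python
def n50(lengths):
    """Return (N50 length, number of maps needed to reach half the total length)."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise ValueError("lengths must be non-empty")


def fold_coverage(total_molecule_bp, genome_bp):
    """Fold coverage: total length of the molecules divided by the genome length."""
    return total_molecule_bp / genome_bp


# ~112 Gb of labelled molecules against the 727 Mb assembled map (values from the abstract).
coverage_vs_map = fold_coverage(112e9, 727e6)

# Toy map lengths in Mb (invented), only to show how N50 and its map count are derived.
toy_n50, toy_count = n50([5.0, 3.0, 2.0, 1.0, 1.0])
```

Against the assembled map length this gives roughly 154X, so the higher value in the abstract presumably rests on a smaller genome-size estimate. The second return value of `n50` corresponds to the "including 126 maps" style of reporting used in the abstract.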
Affiliation(s)
- Shijun Xiao
- Key Laboratory of Healthy Mariculture in the East China Sea, Ministry of Agriculture; Fisheries College, Jimei University, Yindou Road, Xiamen, P.R. China
| | - Jiongtang Li
- Chinese Academy of Fishery Sciences, Yongding Road, Beijing, P.R. China
| | | | - Lujing Fang
- Key Laboratory of Healthy Mariculture in the East China Sea, Ministry of Agriculture; Fisheries College, Jimei University, Yindou Road, Xiamen, P.R. China
| | - Shuangbin Xu
- Key Laboratory of Healthy Mariculture in the East China Sea, Ministry of Agriculture; Fisheries College, Jimei University, Yindou Road, Xiamen, P.R. China
| | - Wei Chen
- Key Laboratory of Healthy Mariculture in the East China Sea, Ministry of Agriculture; Fisheries College, Jimei University, Yindou Road, Xiamen, P.R. China
| | - Zhi Yong Wang
- Key Laboratory of Healthy Mariculture in the East China Sea, Ministry of Agriculture; Fisheries College, Jimei University, Yindou Road, Xiamen, P.R. China.
|
15
|
Twyford AD, Streisfeld MA, Lowry DB, Friedman J. Genomic studies on the nature of species: adaptation and speciation in Mimulus. Mol Ecol 2015; 24:2601-9. [DOI: 10.1111/mec.13190] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Received: 01/31/2015] [Revised: 03/25/2015] [Accepted: 03/27/2015] [Indexed: 12/27/2022]
Affiliation(s)
- Alex D. Twyford
- Ashworth Laboratories; Institute of Evolutionary Biology; The University of Edinburgh; Charlotte Auerbach Road Edinburgh EH9 3FL UK
- Department of Biology; Syracuse University; 107 College Place Syracuse NY 13244 USA
| | | | - David B. Lowry
- Plant Biology Laboratories; Department of Plant Biology; Michigan State University; 612 Wilson Road Room 166 East Lansing MI 48824 USA
| | - Jannice Friedman
- Department of Biology; Syracuse University; 107 College Place Syracuse NY 13244 USA
|
16
|
A fast and scalable kymograph alignment algorithm for nanochannel-based optical DNA mappings. PLoS One 2015; 10:e0121905. [PMID: 25875920 PMCID: PMC4395267 DOI: 10.1371/journal.pone.0121905] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 10/01/2014] [Accepted: 02/05/2015] [Indexed: 11/26/2022]
Abstract
Optical mapping by direct visualization of individual DNA molecules, stretched in nanochannels with sequence-specific fluorescent labeling, represents a promising tool for disease diagnostics and genomics. An important challenge for this technique is thermal motion of the DNA as it undergoes imaging; this blurs fluorescent patterns along the DNA and results in information loss. Correcting for this effect (a process referred to as kymograph alignment) is a common preprocessing step in nanochannel-based optical mapping workflows, and we present here a highly efficient algorithm to accomplish this via pattern recognition. We compare our method with the one previous approach, and we find that our method is orders of magnitude faster while producing data of similar quality. We demonstrate proof of principle of our approach on experimental data consisting of melt mapped bacteriophage DNA.
|
17
|
Levy-Sakin M, Grunwald A, Kim S, Gassman NR, Gottfried A, Antelman J, Kim Y, Ho S, Samuel R, Michalet X, Lin RR, Dertinger T, Kim AS, Chung S, Colyer RA, Weinhold E, Weiss S, Ebenstein Y. Toward single-molecule optical mapping of the epigenome. ACS Nano 2014; 8:14-26. [PMID: 24328256 PMCID: PMC4022788 DOI: 10.1021/nn4050694] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Indexed: 05/12/2023]
Abstract
The past decade has seen an explosive growth in the utilization of single-molecule techniques for the study of complex systems. The ability to resolve phenomena otherwise masked by ensemble averaging has made these approaches especially attractive for the study of biological systems, where stochastic events lead to inherent inhomogeneity at the population level. The complex composition of the genome has made it an ideal system to study at the single-molecule level, and methods aimed at resolving genetic information from long, individual, genomic DNA molecules have been in use for the last 30 years. These methods, and particularly optical-based mapping of DNA, have been instrumental in highlighting genomic variation and contributed significantly to the assembly of many genomes including the human genome. Nanotechnology and nanoscopy have been a strong driving force for advancing genomic mapping approaches, allowing both better manipulation of DNA on the nanoscale and enhanced optical resolving power for analysis of genomic information. During the past few years, these developments have been adopted also for epigenetic studies. The common principle for these studies is the use of advanced optical microscopy for the detection of fluorescently labeled epigenetic marks on long, extended DNA molecules. Here we will discuss recent single-molecule studies for the mapping of chromatin composition and epigenetic DNA modifications, such as DNA methylation.
Affiliation(s)
- Michal Levy-Sakin
- Raymond and Beverly Sackler Faculty of Exact Sciences, School of Chemistry, Tel Aviv University, Tel Aviv, Israel
| | - Assaf Grunwald
- Raymond and Beverly Sackler Faculty of Exact Sciences, School of Chemistry, Tel Aviv University, Tel Aviv, Israel
| | - Soohong Kim
- Department of Chemistry and Biochemistry, University of California, Los Angeles, USA
| | - Natalie R. Gassman
- Department of Chemistry and Biochemistry, University of California, Los Angeles, USA
| | - Anna Gottfried
- Institute of Organic Chemistry, RWTH Aachen University, Aachen, Germany
| | - Josh Antelman
- Department of Chemistry and Biochemistry, University of California, Los Angeles, USA
| | - Younggyu Kim
- Department of Chemistry and Biochemistry, University of California, Los Angeles, USA
| | - Sam Ho
- Department of Chemistry and Biochemistry, University of California, Los Angeles, USA
| | - Robin Samuel
- Department of Chemistry and Biochemistry, University of California, Los Angeles, USA
| | - Xavier Michalet
- Department of Chemistry and Biochemistry, University of California, Los Angeles, USA
| | - Ron R. Lin
- Department of Chemistry and Biochemistry, University of California, Los Angeles, USA
| | - Thomas Dertinger
- Department of Chemistry and Biochemistry, University of California, Los Angeles, USA
| | - Andrew S. Kim
- Department of Chemistry and Biochemistry, University of California, Los Angeles, USA
| | - Sangyoon Chung
- Department of Chemistry and Biochemistry, University of California, Los Angeles, USA
| | - Ryan A. Colyer
- Department of Chemistry and Biochemistry, University of California, Los Angeles, USA
| | - Elmar Weinhold
- Institute of Organic Chemistry, RWTH Aachen University, Aachen, Germany
| | - Shimon Weiss
- Department of Chemistry and Biochemistry, University of California, Los Angeles, USA
- Corresponding authors: (Y. Ebenstein), (S. Weiss)
| | - Yuval Ebenstein
- Raymond and Beverly Sackler Faculty of Exact Sciences, School of Chemistry, Tel Aviv University, Tel Aviv, Israel
- Corresponding authors: (Y. Ebenstein), (S. Weiss)
|
18
|
Chamala S, Chanderbali AS, Der JP, Lan T, Walts B, Albert VA, dePamphilis CW, Leebens-Mack J, Rounsley S, Schuster SC, Wing RA, Xiao N, Moore R, Soltis PS, Soltis DE, Barbazuk WB. Assembly and Validation of the Genome of the Nonmodel Basal Angiosperm Amborella. Science 2013; 342:1516-7. [DOI: 10.1126/science.1241130] [Citation(s) in RCA: 77] [Impact Index Per Article: 6.4] [Indexed: 12/22/2022]
|
19
|
OMACC: an Optical-Map-Assisted Contig Connector for improving de novo genome assembly. BMC Systems Biology 2013; 7 Suppl 6:S7. [PMID: 24564959 PMCID: PMC4029551 DOI: 10.1186/1752-0509-7-s6-s7] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/19/2022]
Abstract
Background Genome sequencing and assembly are essential for revealing the secrets of life hidden in genomes. Because of repeats in most genomes, current programs collate sequencing data into a set of assembled sequences, called contigs, instead of a complete genome. Toward completing a genome, optical mapping is powerful in rendering the relative order of contigs on the genome, which is called scaffolding. However, connecting the neighboring contigs with nucleotide sequences requires further efforts. Nagarajan et al. have recently proposed a software module, FINISH, to close the gaps between contigs with other contig sequences after scaffolding contigs using an optical map. The results, however, are not yet satisfying. Results To increase the accuracy of contig connections, we develop OMACC, which carefully takes into account length information in optical maps. Specifically, it rescales the optical map and applies a length constraint for selecting the correct contig sequences for gap closure. In addition, it uses an advanced graph search algorithm to facilitate estimating the number of repeat copies within gaps between contigs. On both simulated and real datasets, OMACC achieves a <10% false gap-closing rate, three times lower than the ~27% false rate by FINISH, while maintaining a similar sensitivity. Conclusion As optical mapping is becoming popular and repeats are the bottleneck of assembly, OMACC should benefit various downstream biological studies via accurately connecting contigs into a more complete genome. Availability http://140.116.235.124/~tliu/omacc
|
20
|
Genome Sequence of Halomonas sp. Strain A3H3, Isolated from Arsenic-Rich Marine Sediments. Genome Announcements 2013; 1:1/5/e00819-13. [PMID: 24115546 PMCID: PMC3795216 DOI: 10.1128/genomea.00819-13] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 11/20/2022]
Abstract
We report the genome sequence of Halomonas sp. strain A3H3, a bacterium with a high tolerance to arsenite, isolated from multicontaminated sediments of the l’Estaque harbor in Marseille, France. The genome is composed of a 5,489,893-bp chromosome and a 157,085-bp plasmid.
|
21
|
Kirkland B, Wang Z, Zhang P, Takebayashi SI, Lenhert S, Gilbert DM, Guan J. Low-cost fabrication of centimetre-scale periodic arrays of single plasmid DNA molecules. Lab on a Chip 2013; 13:3367-72. [PMID: 23824041 PMCID: PMC3753405 DOI: 10.1039/c3lc50562f] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/02/2023]
Abstract
We report the development of a low-cost method to generate a centimetre-scale periodic array of single plasmid DNA molecules of 11 kilobase pairs. The arrayed DNA molecules are amenable to enzymatic and physical manipulations.
Affiliation(s)
- Brett Kirkland
- Department of Chemical and Biomedical Engineering, FAMU-FSU College of Engineering, Florida State University, 2525 Pottsdamer Street, Tallahassee, Florida 32310-2870, USA
| | - Zhibin Wang
- Department of Chemical and Biomedical Engineering, FAMU-FSU College of Engineering, Florida State University, 2525 Pottsdamer Street, Tallahassee, Florida 32310-2870, USA
| | - Peipei Zhang
- Department of Chemical and Biomedical Engineering, FAMU-FSU College of Engineering, Florida State University, 2525 Pottsdamer Street, Tallahassee, Florida 32310-2870, USA
| | - Shin-ichiro Takebayashi
- Department of Biological Science, Florida State University, Tallahassee, Florida 32306-4295, USA
| | - Steven Lenhert
- Department of Biological Science, Florida State University, Tallahassee, Florida 32306-4295, USA
- Integrative NanoScience Institute, Florida State University, Tallahassee, Florida 32306-4370, USA
| | - David M. Gilbert
- Department of Biological Science, Florida State University, Tallahassee, Florida 32306-4295, USA
| | - Jingjiao Guan
- Department of Chemical and Biomedical Engineering, FAMU-FSU College of Engineering, Florida State University, 2525 Pottsdamer Street, Tallahassee, Florida 32310-2870, USA
- Integrative NanoScience Institute, Florida State University, Tallahassee, Florida 32306-4370, USA
|
22
|
Mazurie AJ, Alves JM, Ozaki LS, Zhou S, Schwartz DC, Buck GA. Comparative genomics of Cryptosporidium. Int J Genomics 2013; 2013:832756. [PMID: 23738321 PMCID: PMC3659464 DOI: 10.1155/2013/832756] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Received: 01/24/2013] [Accepted: 04/10/2013] [Indexed: 11/18/2022]
Abstract
Until recently, the apicomplexan parasites, Cryptosporidium hominis and C. parvum, were considered the same species. However, the two parasites, now considered distinct species, exhibit significant differences in host range, infectivity, and pathogenicity, and their sequenced genomes exhibit only 95-97% identity. The availability of the complete genome sequences of these organisms provides the potential to identify the genetic variations that are responsible for the phenotypic differences between the two parasites. We compared the genome organization and structure, gene composition, the metabolic and other pathways, and the local sequence identity between the genes of these two Cryptosporidium species. Our observations show that the phenotypic differences between C. hominis and C. parvum are not due to gross genome rearrangements, structural alterations, gene deletions or insertions, metabolic capabilities, or other obvious genomic alterations. Rather, the results indicate that these genomes exhibit a remarkable structural and compositional conservation and suggest that the phenotypic differences observed are due to subtle variations in the sequences of proteins that act at the interface between the parasite and its host.
Affiliation(s)
- Aurélien J. Mazurie
- Department of Microbiology, Montana State University, Bozeman, MT 59717, USA
- Department of Microbiology and Immunology, Virginia Commonwealth University, Richmond, VA 23284-2030, USA
| | - João M. Alves
- Department of Microbiology and Immunology, Virginia Commonwealth University, Richmond, VA 23284-2030, USA
| | - Luiz S. Ozaki
- Department of Microbiology and Immunology, Virginia Commonwealth University, Richmond, VA 23284-2030, USA
| | - Shiguo Zhou
- Laboratory for Molecular and Computational Genomics, Department of Chemistry, Laboratory of Genetics, University of Wisconsin-Madison, Madison, WI 53706, USA
| | - David C. Schwartz
- Laboratory for Molecular and Computational Genomics, Department of Chemistry, Laboratory of Genetics, University of Wisconsin-Madison, Madison, WI 53706, USA
| | - Gregory A. Buck
- Department of Microbiology and Immunology, Virginia Commonwealth University, Richmond, VA 23284-2030, USA
|
23
|
McGinn S, Gut IG. DNA sequencing – spanning the generations. N Biotechnol 2013; 30:366-72. [DOI: 10.1016/j.nbt.2012.11.012] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.0] [Received: 08/08/2012] [Accepted: 11/05/2012] [Indexed: 02/02/2023]
|
24
|
Dorfman KD, King SB, Olson DW, Thomas JDP, Tree DR. Beyond gel electrophoresis: microfluidic separations, fluorescence burst analysis, and DNA stretching. Chem Rev 2013; 113:2584-667. [PMID: 23140825 PMCID: PMC3595390 DOI: 10.1021/cr3002142] [Citation(s) in RCA: 141] [Impact Index Per Article: 11.8] [Indexed: 12/18/2022]
Affiliation(s)
- Kevin D. Dorfman
- Department of Chemical Engineering and Materials Science, University of Minnesota — Twin Cities, 421 Washington Ave. SE, Minneapolis, MN 55455, Phone: 1-612-624-5560. Fax: 1-612-626-7246
| | - Scott B. King
- Department of Chemical Engineering and Materials Science, University of Minnesota — Twin Cities, 421 Washington Ave. SE, Minneapolis, MN 55455, Phone: 1-612-624-5560. Fax: 1-612-626-7246
| | - Daniel W. Olson
- Department of Chemical Engineering and Materials Science, University of Minnesota — Twin Cities, 421 Washington Ave. SE, Minneapolis, MN 55455, Phone: 1-612-624-5560. Fax: 1-612-626-7246
| | - Joel D. P. Thomas
- Department of Chemical Engineering and Materials Science, University of Minnesota — Twin Cities, 421 Washington Ave. SE, Minneapolis, MN 55455, Phone: 1-612-624-5560. Fax: 1-612-626-7246
| | - Douglas R. Tree
- Department of Chemical Engineering and Materials Science, University of Minnesota — Twin Cities, 421 Washington Ave. SE, Minneapolis, MN 55455, Phone: 1-612-624-5560. Fax: 1-612-626-7246
|
25
|
AGORA: Assembly Guided by Optical Restriction Alignment. BMC Bioinformatics 2012; 13:189. [PMID: 22856673 PMCID: PMC3431216 DOI: 10.1186/1471-2105-13-189] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.0] [Received: 03/28/2012] [Accepted: 06/28/2012] [Indexed: 11/10/2022]
Abstract
Background Genome assembly is difficult due to repeated sequences within the genome, which create ambiguities and cause the final assembly to be broken up into many separate sequences (contigs). Long range linking information, such as mate-pairs or mapping data, is necessary to help assembly software resolve repeats, thereby leading to a more complete reconstruction of genomes. Prior work has used optical maps for validating assemblies and scaffolding contigs, after an initial assembly has been produced. However, optical maps have not previously been used within the genome assembly process. Here, we use optical map information within the popular de Bruijn graph assembly paradigm to eliminate paths in the de Bruijn graph which are not consistent with the optical map and help determine the correct reconstruction of the genome. Results We developed a new algorithm called AGORA: Assembly Guided by Optical Restriction Alignment. AGORA is the first algorithm to use optical map information directly within the de Bruijn graph framework to help produce an accurate assembly of a genome that is consistent with the optical map information provided. Our simulations on bacterial genomes show that AGORA is effective at producing assemblies closely matching the reference sequences. Additionally, we show that noise in the optical map can have a strong impact on the final assembly quality for some complex genomes, and we also measure how various characteristics of the starting de Bruijn graph may impact the quality of the final assembly. Lastly, we show that a proper choice of restriction enzyme for the optical map may substantially improve the quality of the final assembly. Conclusions Our work shows that optical maps can be used effectively to assemble genomes within the de Bruijn graph assembly framework. 
Our experiments also provide insights into the characteristics of the mapping data that most affect the performance of our algorithm, indicating the potential benefit of more accurate optical mapping technologies, such as nano-coding.
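The core idea in this abstract, discarding candidate reconstructions whose restriction pattern disagrees with the optical map, can be illustrated with a small sketch. This is not the AGORA algorithm itself, which operates on de Bruijn graph paths and handles noisy maps; it is a simplified stand-in under stated assumptions: candidate paths are given as plain sequences, cuts are placed at the start of each recognition site, and fragments are matched one to one within a relative tolerance. All function names and the toy sequences are hypothetical.

```python
import re


def digest(sequence, site):
    """Fragment lengths from cutting `sequence` at the start of every occurrence of `site`."""
    cuts = [m.start() for m in re.finditer(f"(?={re.escape(site)})", sequence)]
    bounds = [0] + cuts + [len(sequence)]
    # Skip zero-length fragments, e.g. when a site sits at the very start.
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


def consistent(fragments, optical, tolerance=0.1):
    """True if both fragment lists have equal length and agree within `tolerance` (relative)."""
    if len(fragments) != len(optical):
        return False
    return all(abs(f - o) <= tolerance * o for f, o in zip(fragments, optical))


def filter_paths(candidates, site, optical, tolerance=0.1):
    """Keep only candidate sequences whose in-silico digest matches the optical map."""
    return [s for s in candidates if consistent(digest(s, site), optical, tolerance)]


site = "GAATTC"  # EcoRI recognition sequence, used here purely as an example enzyme
path_a = "A" * 10 + site + "A" * 14 + site + "A" * 4   # fragments 10, 20, 10
path_b = "A" * 10 + site + "A" * 4 + site + "A" * 14   # fragments 10, 10, 20
optical_map = [10, 21, 10]                             # invented, slightly noisy sizes

kept = filter_paths([path_a, path_b], site, optical_map)
```

Here only `path_a` survives, because the second fragment of `path_b` (10) falls far outside the tolerance around the mapped size (21). A real implementation would also need to tolerate missing or spurious cut sites, which the strict one-to-one comparison above does not attempt.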
|
26
|
Kim S, Gottfried A, Lin RR, Dertinger T, Kim AS, Chung S, Colyer RA, Weinhold E, Weiss S, Ebenstein Y. Enzymatically incorporated genomic tags for optical mapping of DNA-binding proteins. Angew Chem Int Ed Engl 2012; 51:3578-81. [PMID: 22344826 DOI: 10.1002/anie.201107714] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.8] [Received: 11/02/2011] [Revised: 12/19/2011] [Indexed: 11/08/2022]
Affiliation(s)
- Soohong Kim
- Department of Chemistry and Biochemistry, University of California, Los Angeles, USA
|
27
|
Kim S, Gottfried A, Lin RR, Dertinger T, Kim AS, Chung S, Colyer RA, Weinhold E, Weiss S, Ebenstein Y. Enzymatically Incorporated Genomic Tags for Optical Mapping of DNA-Binding Proteins. Angew Chem Int Ed Engl 2012. [DOI: 10.1002/ange.201107714] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 02/02/2023]
|
28
|
Comparing de novo genome assembly: the long and short of it. PLoS One 2011; 6:e19175. [PMID: 21559467 PMCID: PMC3084767 DOI: 10.1371/journal.pone.0019175] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.9] [Received: 11/22/2010] [Accepted: 03/29/2011] [Indexed: 01/30/2023]
Abstract
Recent advances in DNA sequencing technology and their focal role in Genome Wide Association Studies (GWAS) have rekindled a growing interest in the whole-genome sequence assembly (WGSA) problem, thereby, inundating the field with a plethora of new formalizations, algorithms, heuristics and implementations. And yet, scant attention has been paid to comparative assessments of these assemblers' quality and accuracy. No commonly accepted and standardized method for comparison exists yet. Even worse, widely used metrics to compare the assembled sequences emphasize only size, poorly capturing the contig quality and accuracy. This paper addresses these concerns: it highlights common anomalies in assembly accuracy through a rigorous study of several assemblers, compared under both standard metrics (N50, coverage, contig sizes, etc.) as well as a more comprehensive metric (Feature-Response Curves, FRC) that is introduced here; FRC transparently captures the trade-offs between contigs' quality against their sizes. For this purpose, most of the publicly available major sequence assemblers--both for low-coverage long (Sanger) and high-coverage short (Illumina) reads technologies--are compared. These assemblers are applied to microbial (Escherichia coli, Brucella, Wolbachia, Staphylococcus, Helicobacter) and partial human genome sequences (Chr. Y), using sequence reads of various read-lengths, coverages, accuracies, and with and without mate-pairs. It is hoped that, based on these evaluations, computational biologists will identify innovative sequence assembly paradigms, bioinformaticists will determine promising approaches for developing "next-generation" assemblers, and biotechnologists will formulate more meaningful design desiderata for sequencing technology platforms. A new software tool for computing the FRC metric has been developed and is available through the AMOS open-source consortium.
|
29
|
Neely RK, Deen J, Hofkens J. Optical mapping of DNA: Single-molecule-based methods for mapping genomes. Biopolymers 2011; 95:298-311. [DOI: 10.1002/bip.21579] [Citation(s) in RCA: 87] [Impact Index Per Article: 6.2] [Received: 10/25/2010] [Revised: 12/15/2010] [Accepted: 12/15/2010] [Indexed: 11/09/2022]
|
30
|
Zohar H, Hetherington CL, Bustamante C, Muller SJ. Peptide nucleic acids as tools for single-molecule sequence detection and manipulation. Nano Letters 2010; 10:4697-701. [PMID: 20923183 PMCID: PMC3322611 DOI: 10.1021/nl102986v] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 05/26/2023]
Abstract
The ability to strongly and sequence-specifically attach modifications such as fluorophores and haptens to individual double-stranded (ds) DNA molecules is critical to a variety of single-molecule experiments. We propose using modified peptide nucleic acids (PNAs) for this purpose and implement them in two model single-molecule experiments where individual DNA molecules are manipulated via microfluidic flow and optical tweezers, respectively. We demonstrate that PNAs are versatile and robust sequence-specific tethers.
Affiliation(s)
- Hagar Zohar
- Department of Chemical Engineering, California Institute for Quantitative Biosciences, and Howard Hughes Medical Institute, University of California, Berkeley, California 94720, United States
|
31
|
Giongo A, Tyler HL, Zipperer UN, Triplett EW. Two genome sequences of the same bacterial strain, Gluconacetobacter diazotrophicus PAl 5, suggest a new standard in genome sequence submission. Stand Genomic Sci 2010; 2:309-17. [PMID: 21304715 PMCID: PMC3035290 DOI: 10.4056/sigs.972221] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Indexed: 11/20/2022]
Abstract
Gluconacetobacter diazotrophicus PAl 5 is of agricultural significance due to its ability to provide fixed nitrogen to plants. Consequently, its genome sequence has been eagerly anticipated to enhance understanding of endophytic nitrogen fixation. Two groups have sequenced the PAl 5 genome from the same source (ATCC 49037), though the resulting sequences contain a surprisingly high number of differences. Therefore, an optical map of PAl 5 was constructed in order to determine which genome assembly more closely resembles the chromosomal DNA by aligning each sequence against a physical map of the genome. While one sequence aligned very well, over 98% of the second sequence contained numerous rearrangements. The many differences observed between these two genome sequences could be owing to either assembly errors or rapid evolutionary divergence. The extent of the differences derived from sequence assembly errors could be assessed if the raw sequencing reads were provided by both genome centers at the time of genome sequence submission. Hence, a new genome sequence standard is proposed whereby the investigator supplies the raw reads along with the closed sequence so that the community can make more accurate judgments on whether differences observed in a single strain may be of biological origin or are simply caused by differences in genome assembly procedures.
Affiliation(s)
- Adriana Giongo
- Department of Microbiology and Cell Science, Institute of Food and Agricultural Sciences, University of Florida, PO Box 110700, Gainesville, FL 32611-0700 USA
|
32
|
Andorf CM, Lawrence CJ, Harper LC, Schaeffer ML, Campbell DA, Sen TZ. The Locus Lookup tool at MaizeGDB: identification of genomic regions in maize by integrating sequence information with physical and genetic maps. Bioinformatics 2010; 26:434-6. [PMID: 20124413 DOI: 10.1093/bioinformatics/btp556] [Citation(s) in RCA: 62] [Impact Index Per Article: 4.1] [Indexed: 11/14/2022]
Abstract
SUMMARY Methods to automatically integrate sequence information with physical and genetic maps are scarce. The Locus Lookup tool enables researchers to define windows of genomic sequence likely to contain loci of interest where only genetic or physical mapping associations are reported. Using the Locus Lookup tool, researchers will be able to locate specific genes more efficiently that will ultimately help them develop a better maize plant. With the availability of the well-documented source code, the tool can be easily adapted to other biological systems. AVAILABILITY The Locus Lookup tool is available on the web at http://maizegdb.org/cgi-bin/locus_lookup.cgi. It is implemented in PHP, Oracle and Apache, with all major browsers supported. Source code is freely available for download at http://ftp.maizegdb.org/open_source/locus_lookup/.
Affiliation(s)
- Carson M Andorf
- US Department of Agriculture-Agricultural Research Service, Corn Insects and Crop Genetics Research Unit, Iowa State University, Ames, IA 50011, USA
|
33
|
Mir KU. Sequencing genomes: from individuals to populations. Briefings in Functional Genomics and Proteomics 2010; 8:367-78. [PMID: 19808932 DOI: 10.1093/bfgp/elp040] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 11/13/2022]
Abstract
The whole genome sequences of Jim Watson and Craig Venter are early examples of personalized genomics, which promises to change how we approach healthcare in the future. Before personal sequencing can have practical medical benefits, however, and before it should be advocated for implementation at the population-scale, there needs to be a better understanding of which genetic variants influence which traits and how their effects are modified by epigenetic factors. Nonetheless, for forging links between DNA sequence and phenotype, efforts to sequence the genomes of individuals need to continue; this includes sequencing sub-populations for association studies which analyse the difference in sequence between disease affected and unaffected individuals. Such studies can only be applied on a large enough scale to be effective if the massive strides in sequencing technology that have recently occurred also continue.
Affiliation(s)
- Kalim U Mir
- The Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK.
|
34
|
Neely RK, Dedecker P, Hotta JI, Urbanavičiūtė G, Klimašauskas S, Hofkens J. DNA fluorocode: A single molecule, optical map of DNA with nanometre resolution. Chem Sci 2010. [DOI: 10.1039/c0sc00277a] [Citation(s) in RCA: 75] [Impact Index Per Article: 5.0] [Indexed: 11/21/2022]
|
35
|
Zhou S, Wei F, Nguyen J, Bechner M, Potamousis K, Goldstein S, Pape L, Mehan MR, Churas C, Pasternak S, Forrest DK, Wise R, Ware D, Wing RA, Waterman MS, Livny M, Schwartz DC. A single molecule scaffold for the maize genome. PLoS Genet 2009; 5:e1000711. [PMID: 19936062 PMCID: PMC2774507 DOI: 10.1371/journal.pgen.1000711] [Citation(s) in RCA: 98] [Impact Index Per Article: 6.1] [Received: 07/01/2009] [Accepted: 10/05/2009] [Indexed: 11/18/2022]
Abstract
About 85% of the maize genome consists of highly repetitive sequences that are interspersed by low-copy, gene-coding sequences. The maize community has dealt with this genomic complexity by the construction of an integrated genetic and physical map (iMap), but this resource alone was not sufficient for ensuring the quality of the current sequence build. For this purpose, we constructed a genome-wide, high-resolution optical map of the maize inbred line B73 genome containing >91,000 restriction sites (averaging 1 site/∼23 kb) accrued from mapping genomic DNA molecules. Our optical map comprises 66 contigs, averaging 31.88 Mb in size and spanning 91.5% (2,103.93 Mb/∼2,300 Mb) of the maize genome. A new algorithm was created that considered both optical map and unfinished BAC sequence data for placing 60/66 (2,032.42 Mb) optical map contigs onto the maize iMap. The alignment of optical maps against numerous data sources yielded comprehensive results that proved revealing and productive. For example, gaps were uncovered and characterized within the iMap, the FPC (fingerprinted contigs) map, and the chromosome-wide pseudomolecules. Such alignments also suggested amended placements of FPC contigs on the maize genetic map and proactively guided the assembly of chromosome-wide pseudomolecules, especially within complex genomic regions. Lastly, we think that the full integration of B73 optical maps with the maize iMap would greatly facilitate maize sequence finishing efforts that would make it a valuable reference for comparative studies among cereals, or other maize inbred lines and cultivars. The maize genome contains abundant repeats interspersed by low-copy, gene-coding sequences that make it a challenge to sequence; consequently, current BAC sequence assemblies average 11 contigs per clone. 
The iMap deals with such complexity by the judicious integration of IBM genetic and B73 physical maps, but the B73 genome structure could differ from the IBM population because of genetic recombination and subsequent rearrangements. Accordingly, we report a genome-wide, high-resolution optical map of maize B73 genome that was constructed from the direct analysis of genomic DNA molecules without using genetic markers. The integration of optical and iMap resources with comparisons to FPC maps enabled a uniquely comprehensive and scalable assessment of a given BAC's sequence assembly, its placement within a FPC contig, and the location of this FPC contig within a chromosome-wide pseudomolecule. As such, the overall utility of the maize optical map for the validation of sequence assemblies has been significant and demonstrates the inherent advantages of single molecule platforms. Construction of the maize optical map represents the first physical map of a eukaryotic genome larger than 400 Mb that was created de novo from individual genomic DNA molecules.
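The figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check on the published numbers; the variable and function names are illustrative, and the genome size and site spacing are the approximate values given in the abstract.

```python
# Figures quoted in the abstract; names are illustrative, not from the paper.
N_CONTIGS = 66
MEAN_CONTIG_MB = 31.88
MAP_SPAN_MB = 2103.93
GENOME_SIZE_MB = 2300.0   # approximate ("~2,300 Mb")
SITE_SPACING_KB = 23.0    # approximate ("1 site/~23 kb")


def summarize_optical_map():
    """Return (span implied by mean contig size, coverage %, expected site count)."""
    span_from_mean_mb = N_CONTIGS * MEAN_CONTIG_MB
    coverage_pct = 100.0 * MAP_SPAN_MB / GENOME_SIZE_MB
    expected_sites = MAP_SPAN_MB * 1000.0 / SITE_SPACING_KB
    return span_from_mean_mb, coverage_pct, expected_sites


if __name__ == "__main__":
    span, coverage, sites = summarize_optical_map()
    print(f"span from mean contig size: {span:.2f} Mb")
    print(f"coverage: {coverage:.1f}%")
    print(f"expected restriction sites: {sites:.0f}")
```

The three values agree with the abstract to within rounding: the span implied by the mean contig size is close to 2,103.93 Mb, the coverage rounds to 91.5%, and the expected site count is just above 91,000.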
Affiliation(s)
- Shiguo Zhou
- Laboratory for Molecular and Computational Genomics, Department of Chemistry, Laboratory of Genetics, UW Biotechnology Center, University of Wisconsin–Madison, Madison, Wisconsin, United States of America
- Fusheng Wei
- Department of Plant Sciences, Arizona Genomics Institute, University of Arizona, Tucson, Arizona, United States of America
- John Nguyen
- Departments of Mathematics, Biology, and Computer Science, University of Southern California, Los Angeles, California, United States of America
- Mike Bechner
- Laboratory for Molecular and Computational Genomics, Department of Chemistry, Laboratory of Genetics, UW Biotechnology Center, University of Wisconsin–Madison, Madison, Wisconsin, United States of America
- Konstantinos Potamousis
- Laboratory for Molecular and Computational Genomics, Department of Chemistry, Laboratory of Genetics, UW Biotechnology Center, University of Wisconsin–Madison, Madison, Wisconsin, United States of America
- Steve Goldstein
- Laboratory for Molecular and Computational Genomics, Department of Chemistry, Laboratory of Genetics, UW Biotechnology Center, University of Wisconsin–Madison, Madison, Wisconsin, United States of America
- Louise Pape
- Laboratory for Molecular and Computational Genomics, Department of Chemistry, Laboratory of Genetics, UW Biotechnology Center, University of Wisconsin–Madison, Madison, Wisconsin, United States of America
- Michael R. Mehan
- Departments of Mathematics, Biology, and Computer Science, University of Southern California, Los Angeles, California, United States of America
- Chris Churas
- Laboratory for Molecular and Computational Genomics, Department of Chemistry, Laboratory of Genetics, UW Biotechnology Center, University of Wisconsin–Madison, Madison, Wisconsin, United States of America
- Shiran Pasternak
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
- Dan K. Forrest
- Laboratory for Molecular and Computational Genomics, Department of Chemistry, Laboratory of Genetics, UW Biotechnology Center, University of Wisconsin–Madison, Madison, Wisconsin, United States of America
- Roger Wise
- Corn Insects and Crop Genetics Research, United States Department of Agriculture–Agricultural Research Service and Department of Plant Pathology, Iowa State University, Ames, Iowa, United States of America
- Doreen Ware
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
- Plant, Soil, and Nutrition Research, United States Department of Agriculture–Agricultural Research Service, Ithaca, New York, United States of America
- Rod A. Wing
- Department of Plant Sciences, Arizona Genomics Institute, University of Arizona, Tucson, Arizona, United States of America
- Michael S. Waterman
- Departments of Mathematics, Biology, and Computer Science, University of Southern California, Los Angeles, California, United States of America
- Miron Livny
- Computer Sciences Department, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- David C. Schwartz
- Laboratory for Molecular and Computational Genomics, Department of Chemistry, Laboratory of Genetics, UW Biotechnology Center, University of Wisconsin–Madison, Madison, Wisconsin, United States of America
|
36
|
Zhou S, Bechner MC, Place M, Churas CP, Pape L, Leong SA, Runnheim R, Forrest DK, Goldstein S, Livny M, Schwartz DC. Validation of rice genome sequence by optical mapping. BMC Genomics 2007; 8:278. [PMID: 17697381 PMCID: PMC2048515 DOI: 10.1186/1471-2164-8-278] [Citation(s) in RCA: 87] [Impact Index Per Article: 4.8] [Received: 05/02/2007] [Accepted: 08/15/2007] [Indexed: 11/30/2022]
Abstract
Background: Rice feeds much of the world, and possesses the simplest genome analyzed to date within the grass family, making it an economically relevant model system for other cereal crops. Although the rice genome is sequenced, validation and gap closing efforts require purely independent means for accurate finishing of sequence build data. Results: To facilitate ongoing sequencing finishing and validation efforts, we have constructed a whole-genome SwaI optical restriction map of the rice genome. The physical map consists of 14 contigs, covering 12 chromosomes, with a total genome size of 382.17 Mb; this value is about 11% smaller than original estimates. 9 of the 14 optical map contigs are without gaps, covering chromosomes 1, 2, 3, 4, 5, 7, 8, 10, and 12 in their entirety, including centromeres and telomeres. Alignments between optical and in silico restriction maps constructed from IRGSP (International Rice Genome Sequencing Project) and TIGR (The Institute for Genomic Research) genome sequence sources are comprehensive and informative, evidenced by map coverage across virtually all published gaps, discovery of new ones, and characterization of sequence misassemblies; all totalling ~14 Mb. Furthermore, since optical maps are ordered restriction maps, identified discordances are pinpointed on a reliable physical scaffold providing an independent resource for closure of gaps and rectification of misassemblies. Conclusion: Analysis of sequence and optical mapping data effectively validates genome sequence assemblies constructed from large, repeat-rich genomes. Given this conclusion we envision new applications of such single molecule analysis that will merge advantages offered by high-resolution optical maps with inexpensive, but short sequence reads generated by emerging sequencing platforms.
Lastly, the map construction techniques presented here point the way to new types of comparative genome analysis that would focus on discernment of structural differences revealed by optical maps constructed from a broad range of rice subspecies and varieties.
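The abstract compares optical maps with in silico restriction maps derived from sequence. As a minimal sketch of what an in silico map is, the code below digests a linear sequence at SwaI sites (recognition sequence ATTTAAAT, blunt cut between the fourth and fifth base) and returns the ordered fragment lengths. The function name and the toy sequence are illustrative assumptions, not part of the cited work, and no alignment to optical data or sizing-error model is attempted.

```python
import re

SWAI_SITE = "ATTTAAAT"  # SwaI recognition sequence (palindromic)
SWAI_CUT_OFFSET = 4     # blunt cut after the fourth base: ATTT^AAAT


def in_silico_fragments(sequence, site=SWAI_SITE, cut_offset=SWAI_CUT_OFFSET):
    """Return ordered fragment lengths (bp) from digesting a linear sequence."""
    seq = sequence.upper()
    # A lookahead pattern also finds overlapping occurrences of the site.
    pattern = re.compile(f"(?={re.escape(site)})")
    cuts = [match.start() + cut_offset for match in pattern.finditer(seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    toy = "GG" + SWAI_SITE + "CCCCC" + SWAI_SITE + "G"
    print(in_silico_fragments(toy))
```

Because the site is its own reverse complement, searching one strand is sufficient. An ordered list of fragment sizes like this is the sequence-derived counterpart that an optical map contig would be aligned against.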
Affiliation(s)
- Shiguo Zhou
- Laboratory for Molecular and Computational Genomics, University of Wisconsin-Madison, UW Biotechnology Centre, 425 Henry Mall, Madison, Wisconsin 53706, USA
- Department of Chemistry, Laboratory of Genetics; University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- Laboratory of Genetics; University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- Michael C Bechner
- Laboratory for Molecular and Computational Genomics, University of Wisconsin-Madison, UW Biotechnology Centre, 425 Henry Mall, Madison, Wisconsin 53706, USA
- Department of Chemistry, Laboratory of Genetics; University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- Laboratory of Genetics; University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- Michael Place
- Laboratory of Genetics; University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- Chris P Churas
- Laboratory for Molecular and Computational Genomics, University of Wisconsin-Madison, UW Biotechnology Centre, 425 Henry Mall, Madison, Wisconsin 53706, USA
- Department of Chemistry, Laboratory of Genetics; University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- Laboratory of Genetics; University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- Louise Pape
- Laboratory for Molecular and Computational Genomics, University of Wisconsin-Madison, UW Biotechnology Centre, 425 Henry Mall, Madison, Wisconsin 53706, USA
- Department of Chemistry, Laboratory of Genetics; University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- Laboratory of Genetics; University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- Sally A Leong
- USDA-ARS, CCRU, Department of Plant Pathology, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- Rod Runnheim
- Laboratory for Molecular and Computational Genomics, University of Wisconsin-Madison, UW Biotechnology Centre, 425 Henry Mall, Madison, Wisconsin 53706, USA
- Department of Chemistry, Laboratory of Genetics; University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- Laboratory of Genetics; University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- Dan K Forrest
- Laboratory for Molecular and Computational Genomics, University of Wisconsin-Madison, UW Biotechnology Centre, 425 Henry Mall, Madison, Wisconsin 53706, USA
- Department of Chemistry, Laboratory of Genetics; University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- Laboratory of Genetics; University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- Steve Goldstein
- Laboratory for Molecular and Computational Genomics, University of Wisconsin-Madison, UW Biotechnology Centre, 425 Henry Mall, Madison, Wisconsin 53706, USA
- Department of Chemistry, Laboratory of Genetics; University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- Laboratory of Genetics; University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- Miron Livny
- Department of Computer Sciences, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- David C Schwartz
- Laboratory for Molecular and Computational Genomics, University of Wisconsin-Madison, UW Biotechnology Centre, 425 Henry Mall, Madison, Wisconsin 53706, USA
- Department of Chemistry, Laboratory of Genetics; University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- Laboratory of Genetics; University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
|
37
|
Abstract
The cereals are of enormous importance to mankind. Many of the major cereal species - specifically, wheat, barley, oat, rye, and maize - have large genomes. Early cytogenetics, genome analysis and genetic mapping in the cereals benefited greatly from their large chromosomes, and the allopolyploidy of wheat and oats that has allowed for the development of many precise cytogenetic stocks. In the genomics era, however, large genomes are disadvantageous. Sequencing large and complex genomes is expensive, and the assembly of genome sequence is hampered by a significant content of repetitive DNA and, in allopolyploids, by the presence of homoeologous genomes. Dissection of the genome into its component chromosomes and chromosome arms provides an elegant solution to these problems. In this review we illustrate how this can be achieved by flow cytometric sorting. We describe the development of methods for the preparation of intact chromosome suspensions from the major cereals, and their analysis and sorting using flow cytometry. We explain how difficulties in the discrimination of specific chromosomes and their arms can be overcome by exploiting extant cytogenetic stocks of polyploid wheat and oats, in particular chromosome deletion and alien addition lines. Finally, we discuss some of the applications of flow-sorted chromosomes, and present some examples demonstrating that a chromosome-based approach is advantageous for the analysis of the complex genomes of cereals, and that it can offer significant potential for the delivery of genome sequencing and gene cloning in these crops.
Affiliation(s)
- Jaroslav Dolezel
- Laboratory of Molecular Cytogenetics and Cytometry, Institute of Experimental Botany, Sokolovská 6, CZ-77200, Olomouc, Czech Republic.
|
38
|
Abstract
Understanding the behavior of DNA at the molecular level is of considerable fundamental and engineering importance. While adequate representations of DNA exist at the atomic and continuum level, there is a relative lack of models capable of describing the behavior of DNA at mesoscopic length scales. We present a mesoscale model of DNA that reduces the complexity of a nucleotide to three interaction sites, one each for the phosphate, sugar, and base, thereby rendering the investigation of DNA up to a few microns in length computationally tractable. The charges on these sites are considered explicitly. The model is parametrized using thermal denaturation experimental data at a fixed salt concentration. The validity of the model is established by its ability to predict several aspects of DNA behavior, including salt-dependent melting, bubble formation and rehybridization, and the mechanical properties of the molecule as a function of salt concentration.
Affiliation(s)
- Thomas A Knotts
- Department of Chemical Engineering, Brigham Young University, Provo, Utah 84602, USA.
|
39
|
Wu T, Schwartz DC. Transchip: single-molecule detection of transcriptional elongation complexes. Anal Biochem 2006; 361:31-46. [PMID: 17187751 PMCID: PMC1945215 DOI: 10.1016/j.ab.2006.10.042] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Received: 06/30/2006] [Revised: 10/30/2006] [Accepted: 10/30/2006] [Indexed: 11/24/2022]
Abstract
A new single-molecule system, Transchip, was developed for analysis of transcription products at their genomic origins. The bacteriophage T7 RNA polymerase and its promoters were used in a model system, and resultant RNAs were imaged and detected at their positions along single template DNA molecules. The Transchip system has drawn from critical aspects of Optical Mapping, a single-molecule system that enables the construction of high-resolution ordered restriction maps of whole genomes from single DNA molecules. Through statistical analysis of hundreds of single-molecule template/transcript complexes, Transchip enables analysis of the locations and strength of promoters, the direction and processivity of transcription reactions, and the termination of transcription. These novel results suggest that the new system may serve as a high-throughput platform to investigate transcriptional events on a large genome-wide scale.
Affiliation(s)
- Tian Wu
- Laboratory for Molecular and Computational Genomics, Department of Chemistry, Laboratory of Genetics, University of Wisconsin-Madison, Madison, WI 53706, USA
|
40
|
Abstract
Recent developments in highly parallel genome-wide assays are transforming the study of human health and disease. High-resolution whole-genome association studies of complex diseases are finally being undertaken after much hypothesizing about their merit for finding disease loci. The availability of inexpensive high-density SNP-genotyping arrays has made this feasible. Cancer biology will also be transformed by high-resolution genomic and epigenomic analysis. In the future, most cancers might be staged by high-resolution molecular profiling rather than by gross cytological analysis. Here, we describe the key developments that enable highly parallel genomic assays.
Affiliation(s)
- Jian-Bing Fan
- Illumina Inc., 9885 Towne Centre Drive, San Diego, California 92121, USA
|
41
|
Suchánková P, Kubaláková M, Kovárová P, Bartos J, Cíhalíková J, Molnár-Láng M, Endo TR, Dolezel J. Dissection of the nuclear genome of barley by chromosome flow sorting. Theor Appl Genet 2006; 113:651-9. [PMID: 16810504 DOI: 10.1007/s00122-006-0329-8] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.3] [Received: 03/06/2006] [Accepted: 05/27/2006] [Indexed: 05/10/2023]
Abstract
Isolation of mitotic chromosomes using flow cytometry is an attractive way to dissect nuclear genomes into their individual chromosomal components or portions of them. This approach is especially useful in plants with complex genomes, where it offers a targeted and hence economical approach to genome analysis and gene cloning. In several plant species, DNA of flow-sorted chromosomes has been used for isolation of molecular markers from specific genome regions, for physical mapping using polymerase chain reaction (PCR) and fluorescence in situ hybridization (FISH), for integration of genetic and physical maps and for construction of chromosome-specific DNA libraries, including those cloned in bacterial artificial chromosome vectors. Until now, chromosome analysis and sorting using flow cytometry (flow cytogenetics) has found little application in barley (2n = 14, 1C ≈ 5,100 Mbp) because of the impossibility of discriminating and sorting individual chromosomes, except for the smallest chromosome 1H and some translocation chromosomes with DNA content significantly different from the remaining chromosomes. In this work, we demonstrate that wheat-barley ditelosomic addition lines can be used to sort any arm of barley chromosomes 2H-7H. Thus, the barley genome can be dissected into fractions representing only about 6-12% of the total genome. This advance makes flow cytogenetics an attractive tool, which may greatly facilitate genome analysis and gene cloning in barley.
Affiliation(s)
- Pavla Suchánková
- Laboratory of Molecular Cytogenetics and Cytometry, Institute of Experimental Botany, Sokolovská 6, 77200 Olomouc, Czech Republic
|
42
|
Abstract
The first wave of information from the analysis of the human genome revealed SNPs to be the main source of genetic and phenotypic human variation. However, the advent of genome-scanning technologies has now uncovered an unexpectedly large extent of what we term 'structural variation' in the human genome. This comprises microscopic and, more commonly, submicroscopic variants, which include deletions, duplications and large-scale copy-number variants - collectively termed copy-number variants or copy-number polymorphisms - as well as insertions, inversions and translocations. Rapidly accumulating evidence indicates that structural variants can comprise millions of nucleotides of heterogeneity within every genome, and are likely to make an important contribution to human diversity and disease susceptibility.
Affiliation(s)
- Lars Feuk
- The Centre for Applied Genomics and Program in Genetics and Genomic Biology, The Hospital for Sick Children, Department of Molecular and Medical Genetics, University of Toronto, Ontario, Canada
|
43
|
Paterson AH. Leafing through the genomes of our major crop plants: strategies for capturing unique information. Nat Rev Genet 2006; 7:174-84. [PMID: 16485017 DOI: 10.1038/nrg1806] [Citation(s) in RCA: 73] [Impact Index Per Article: 3.8] [Indexed: 01/31/2023]
Abstract
Crop plants not only have economic significance, but also comprise important botanical models for evolution and development. This is reflected by the recent increase in the percentage of publicly available sequence data that are derived from angiosperms. Further genome sequencing of the major crop plants will offer new learning opportunities, but their large, repetitive, and often polyploid genomes present challenges. Reduced-representation approaches - such as EST sequencing, methyl filtration and Cot-based cloning and sequencing - provide increased efficiency in extracting key information from crop genomes without full-genome sequencing. Combining these methods with phylogenetically stratified sampling to allow comparative genomic approaches has the potential to further accelerate progress in angiosperm genomics.
Affiliation(s)
- Andrew H Paterson
- Plant Genome Mapping Laboratory, University of Georgia, Athens, Georgia 30602, USA.
|
44
|
Phillips KM, Larson JW, Yantz GR, D'Antoni CM, Gallo MV, Gillis KA, Goncalves NM, Neely LA, Gullans SR, Gilmanshin R. Application of single molecule technology to rapidly map long DNA and study the conformation of stretched DNA. Nucleic Acids Res 2005; 33:5829-37. [PMID: 16243782 PMCID: PMC1266062 DOI: 10.1093/nar/gki895] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.7] [Indexed: 11/15/2022]
Abstract
Herein we describe the first application of direct linear analysis (DLA) to the mapping of a bacterial artificial chromosome (BAC), specifically the 185.1 kb-long BAC 12M9. DLA is a single molecule mapping technology, based on microfluidic elongation and interrogation of individual DNA molecules, sequence-specifically tagged with bisPNAs. A DNA map with S/N ratio sufficiently high to detect all major binding sites was obtained using only 200 molecule traces. A new method was developed to extract an oriented map from an averaged map that included a mixture of head-first and tail-first DNA traces. In addition, we applied DLA to study the conformation and tagging of highly stretched DNA. Optimal conditions for promoting sequence-specific binding of bisPNA to an 8 bp target site were elucidated using DLA, which proved superior to electromobility shift assays. DLA was highly reproducible with a hybridized tag position localized with an accuracy of ±0.7 μm or ±2.1 kb, demonstrating its utility for rapid mapping of large DNA at the single molecule level. Within this accuracy, DNA molecules, stretched to at least 85% of their contour length, were stretched uniformly, so that the map, expressed in relative coordinates, was the same regardless of the molecule extension.
Affiliation(s)
- Kevin M Phillips
- U. S. Genomics, Inc., 12 Gill Street, Suite 4700, Woburn, MA 01801, USA
|
45
|
Ferris MM, Yoshida TM, Marrone BL, Keller RA. Fingerprinting of single viral genomes. Anal Biochem 2005; 337:278-88. [PMID: 15691508 DOI: 10.1016/j.ab.2004.10.050] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Received: 08/31/2004] [Indexed: 11/30/2022]
Abstract
We demonstrate the use of technology developed for optical mapping to acquire DNA fingerprints from single genomes for the purpose of discrimination and identification of bacteria and viruses. Single genome fingerprinting (SGF) provides not only the size but also the order of the restriction fragments, which adds another dimension to the information that can be used for discrimination. Analysis of single organisms may eliminate the need to culture cells and thereby significantly reduce analysis time. In addition, samples containing mixtures of several organisms can be analyzed. For analysis, cells are embedded in an agarose matrix, lysed, and processed to yield intact DNA. The DNA is then deposited on a derivatized glass substrate. The elongated genome is digested with a restriction enzyme and stained with the intercalating dye YOYO-1. DNA is then quantitatively imaged with a fluorescence microscope and the fragments are sized to an accuracy ≥90% by their fluorescence intensity and contour length. Single genome fingerprints were obtained from pure samples of adenovirus, from bacteriophages lambda and T4 GT7, and from a mixture of the three viral genomes. SGF will enable the fingerprinting of uncultured and unamplified samples and allow rapid identification of microorganisms with applications in forensics, medicine, public health, and environmental microbiology.
Affiliation(s)
- Matthew M Ferris
- Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
|
46
|
Ferris MM, Habbersett RC, Wolinsky M, Jett JH, Yoshida TM, Keller RA. Statistics of single-molecule measurements: applications in flow-cytometry sizing of DNA fragments. Cytometry A 2005; 60:41-52. [PMID: 15229856 DOI: 10.1002/cyto.a.20000] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.8] [Indexed: 11/06/2022]
Abstract
BACKGROUND: The measurement of physical properties from single molecules has been demonstrated. However, the majority of single-molecule studies report values based on relatively large data sets (e.g., N > 50). While there are studies that report physical quantities based on small sample sets, there has not been a detailed statistical analysis relating sample size to the reliability of derived parameters. METHODS: Monte Carlo simulations and multinomial analysis, dependent on quantifiable experimental parameters, were used to determine the minimum number of single-molecule measurements required to produce an accurate estimate of a population mean. Simulation results were applied to the fluorescence-based sizing of DNA fragments by ultrasensitive flow cytometry (FCM). RESULTS: Our simulations show, for an analytical technique with a 10% CV, that the average of as few as five single-molecule measurements would provide a mean value within one SD of the population mean. Additional simulations determined the number of measurements required to obtain the desired number of replicates for each subpopulation within a mixture. Application of these results to flow cytometry data for lambda/HindIII and S. aureus Mu50/SmaI DNA digests produced accurate DNA fingerprints from as few as 98 single-molecule measurements. CONCLUSIONS: A surprisingly small number of single-molecule measurements are required to obtain a mean measurement descriptive of a normally-distributed parent population.
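The kind of Monte Carlo estimate described in this abstract can be illustrated with a short simulation. The sketch below is not the authors' code and omits their multinomial analysis; it assumes normally distributed measurements with a 10% CV and an arbitrary true mean, and the function and parameter names are illustrative. It estimates how often the mean of five measurements falls within one population SD of the true mean.

```python
import random


def fraction_within_one_sd(n_measurements=5, cv=0.10, true_mean=100.0,
                           trials=20000, seed=0):
    """Estimate P(|sample mean - true mean| <= one population SD)."""
    rng = random.Random(seed)      # fixed seed so the run is repeatable
    sd = cv * true_mean            # population SD implied by the CV
    hits = 0
    for _ in range(trials):
        draws = [rng.gauss(true_mean, sd) for _ in range(n_measurements)]
        sample_mean = sum(draws) / n_measurements
        if abs(sample_mean - true_mean) <= sd:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for n in (1, 5, 10):
        print(n, round(fraction_within_one_sd(n_measurements=n), 3))
```

Under these assumptions the standard error of a five-measurement mean is SD/√5, so the analytic probability is that of a standard normal variate lying within ±√5, roughly 97%; a single measurement gives roughly 68%.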
Affiliation(s)
- Matthew M Ferris
- Bioscience Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
|
47
|
de Pablo JJ. Molecular and multiscale modeling in chemical engineering - current view and future perspectives. AIChE J 2005. [DOI: 10.1002/aic.10623] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.2] [Indexed: 11/07/2022]
|
48
|
Zhou S, Kile A, Bechner M, Place M, Kvikstad E, Deng W, Wei J, Severin J, Runnheim R, Churas C, Forrest D, Dimalanta ET, Lamers C, Burland V, Blattner FR, Schwartz DC. Single-molecule approach to bacterial genomic comparisons via optical mapping. J Bacteriol 2004; 186:7773-82. [PMID: 15516592 PMCID: PMC524920 DOI: 10.1128/jb.186.22.7773-7782.2004] [Citation(s) in RCA: 54] [Impact Index Per Article: 2.6] [Indexed: 11/20/2022]
Abstract
Modern comparative genomics has been established, in part, by the sequencing and annotation of a broad range of microbial species. To gain further insights, new sequencing efforts are now dealing with the variety of strains or isolates that gives a species definition and range; however, this number vastly outstrips our ability to sequence them. Given the availability of a large number of microbial species, new whole genome approaches must be developed to fully leverage this information at a level of strain diversity that maximizes discovery. Here, we describe how optical mapping, a single-molecule system, was used to identify and annotate chromosomal alterations between bacterial strains represented by several species. Since whole-genome optical maps are ordered restriction maps, sequenced strains of Shigella flexneri serotype 2a (2457T and 301), Yersinia pestis (CO 92 and KIM), and Escherichia coli were aligned as maps to identify regions of homology and to further characterize them as possible insertions, deletions, inversions, or translocations. Importantly, an unsequenced Shigella flexneri strain (serotype Y strain AMC[328Y]) was optically mapped and aligned with two sequenced ones to reveal one novel locus implicated in serotype conversion and several other loci containing insertion sequence elements or phage-related gene insertions. Our results suggest that genomic rearrangements and chromosomal breakpoints are readily identified and annotated against a prototypic sequenced strain by using the tools of optical mapping.
Affiliation(s)
- Shiguo Zhou
- Laboratory for Molecular and Computational Genomics, University of Wisconsin-Madison, Madison, WI 53706, USA
|
49
|
Zhou S, Kile A, Kvikstad E, Bechner M, Severin J, Forrest D, Runnheim R, Churas C, Anantharaman TS, Myler P, Vogt C, Ivens A, Stuart K, Schwartz DC. Shotgun optical mapping of the entire Leishmania major Friedlin genome. Mol Biochem Parasitol 2004; 138:97-106. [PMID: 15500921 DOI: 10.1016/j.molbiopara.2004.08.002] [Citation(s) in RCA: 40] [Impact Index Per Article: 1.9] [Received: 05/05/2004] [Accepted: 08/02/2004] [Indexed: 11/21/2022]
Abstract
Leishmania is a group of protozoan parasites that cause a broad spectrum of diseases resulting in widespread human suffering and death, as well as economic loss from the infection of some domestic animals and wildlife. To further understand the fundamental genomic architecture of this parasite, and to accelerate the on-going sequencing project, a whole-genome XbaI restriction map was constructed using the optical mapping system. This map supplemented traditional physical maps that were generated by fingerprinting and hybridization of cosmid and P1 clone libraries. Thirty-six optical map contigs were constructed for the corresponding known 36 chromosomes of the Leishmania major Friedlin genome. The chromosome sizes ranged from 326.9 to 2821.3 kb, with a total genome size of 34.7 Mb; the average XbaI restriction fragment was 25.3 kb, and ranged from 15.7 to 77.8 kb on a per-chromosome basis. Comparison between the optical maps and the in silico maps of sequence drawn from completed, nearly finished, or large sequence contigs showed that optical maps served several useful functions within the path to create finished sequence by: guiding aspects of the sequence assembly, identifying misassemblies, detecting misplacements of cosmid or PAC clones to chromosomes, and validating sequence stemming from varying degrees of finishing. Our results also showed the potential use of optical maps as a means to detect and characterize segmental duplication within genomes.
Affiliation(s)
- Shiguo Zhou
- Laboratory for Molecular and Computational Genomics, UW Biotechnology Center, University of Wisconsin-Madison, 425 Henry Mall, Madison, WI 53706, USA
|
50
|
Chan EY, Goncalves NM, Haeusler RA, Hatch AJ, Larson JW, Maletta AM, Yantz GR, Carstea ED, Fuchs M, Wong GG, Gullans SR, Gilmanshin R. DNA mapping using microfluidic stretching and single-molecule detection of fluorescent site-specific tags. Genome Res 2004; 14:1137-46. [PMID: 15173119 PMCID: PMC419792 DOI: 10.1101/gr.1635204] [Citation(s) in RCA: 112] [Impact Index Per Article: 5.3] [Indexed: 11/24/2022]
Abstract
We have developed a rapid molecular mapping technology, Direct Linear Analysis (DLA), on the basis of the analysis of individual DNA molecules bound with sequence-specific fluorescent tags. The apparatus includes a microfluidic device for stretching DNA molecules in elongational flow that is coupled to a multicolor detection system capable of single-fluorophore sensitivity. Double-stranded DNA molecules were tagged at sequence-specific motif sites with fluorescent bisPNA (Peptide Nucleic Acid) tags. The DNA molecules were then stretched in the microfluidic device and driven in a flow stream past confocal fluorescence detectors. DLA provided the spatial locations of multiple specific sequence motifs along individual DNA molecules, and thousands of individual molecules could be analyzed per minute. We validated this technology using the 48.5 kb lambda phage genome with different 8-base and 7-base sequence motif tags. The distance between the sequence motifs was determined with an accuracy of ±0.8 kb, and these tags could be localized on the DNA with an accuracy of ±2 kb. Thus, DLA is a rapid mapping technology, suitable for analysis of long DNA molecules.
Affiliation(s)
- Eugene Y Chan
- U.S. Genomics, Inc., Woburn, Massachusetts 01801, USA.
|