Reference Citation Analysis

For: Narzisi G, Mishra B. Scoring-and-unfolding trimmed tree assembler: concepts, constructs and comparisons. Bioinformatics 2010;27:153-60. [PMID: 21088026 DOI: 10.1093/bioinformatics/btq646] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Indexed: 11/14/2022]

Cited by Other Article(s)

Facilitated sequence counting and assembly by template mutagenesis. Proc Natl Acad Sci U S A 2014;111:E4632-7. [PMID: 25313059 DOI: 10.1073/pnas.1416204111] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 11/18/2022]

Narzisi G, O'Rawe JA, Iossifov I, Fang H, Lee YH, Wang Z, Wu Y, Lyon GJ, Wigler M, Schatz MC. Accurate de novo and transmitted indel detection in exome-capture data using microassembly. Nat Methods 2014;11:1033-6. [PMID: 25128977 PMCID: PMC4180789 DOI: 10.1038/nmeth.3069] [Citation(s) in RCA: 153] [Impact Index Per Article: 13.9] [Received: 04/15/2014] [Accepted: 07/11/2014] [Indexed: 12/30/2022]

Levy-Sakin M, Grunwald A, Kim S, Gassman NR, Gottfried A, Antelman J, Kim Y, Ho S, Samuel R, Michalet X, Lin RR, Dertinger T, Kim AS, Chung S, Colyer RA, Weinhold E, Weiss S, Ebenstein Y. Toward single-molecule optical mapping of the epigenome. ACS Nano 2014;8:14-26. [PMID: 24328256 PMCID: PMC4022788 DOI: 10.1021/nn4050694] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Indexed: 05/12/2023]

Affiliation(s)

Michal Levy-Sakin Raymond and Beverly Sackler Faculty of Exact Sciences, School of Chemistry, Tel Aviv University, Tel Aviv, Israel
Assaf Grunwald Raymond and Beverly Sackler Faculty of Exact Sciences, School of Chemistry, Tel Aviv University, Tel Aviv, Israel
Soohong Kim Department of Chemistry and Biochemistry, University of California, Los Angeles, USA
Natalie R. Gassman Department of Chemistry and Biochemistry, University of California, Los Angeles, USA
Anna Gottfried Institute of Organic Chemistry, RWTH Aachen University, Aachen, Germany
Josh Antelman Department of Chemistry and Biochemistry, University of California, Los Angeles, USA
Younggyu Kim Department of Chemistry and Biochemistry, University of California, Los Angeles, USA
Sam Ho Department of Chemistry and Biochemistry, University of California, Los Angeles, USA
Robin Samuel Department of Chemistry and Biochemistry, University of California, Los Angeles, USA
Xavier Michalet Department of Chemistry and Biochemistry, University of California, Los Angeles, USA
Ron R. Lin Department of Chemistry and Biochemistry, University of California, Los Angeles, USA
Thomas Dertinger Department of Chemistry and Biochemistry, University of California, Los Angeles, USA
Andrew S. Kim Department of Chemistry and Biochemistry, University of California, Los Angeles, USA
Sangyoon Chung Department of Chemistry and Biochemistry, University of California, Los Angeles, USA
Ryan A. Colyer Department of Chemistry and Biochemistry, University of California, Los Angeles, USA
Elmar Weinhold Institute of Organic Chemistry, RWTH Aachen University, Aachen, Germany
Shimon Weiss Department of Chemistry and Biochemistry, University of California, Los Angeles, USA (corresponding author)
Yuval Ebenstein Raymond and Beverly Sackler Faculty of Exact Sciences, School of Chemistry, Tel Aviv University, Tel Aviv, Israel (corresponding author)

Vezzi F, Narzisi G, Mishra B. Reevaluating assembly evaluations with feature response curves: GAGE and assemblathons. PLoS One 2012;7:e52210. [PMID: 23284938 PMCID: PMC3532452 DOI: 10.1371/journal.pone.0052210] [Citation(s) in RCA: 76] [Impact Index Per Article: 5.8] [Received: 10/01/2012] [Accepted: 11/16/2012] [Indexed: 11/19/2022]

Abstract

In just the last decade, a multitude of biotechnologies and software pipelines have emerged to revolutionize genomics. Their central goal is to accelerate de novo whole-genome assembly from short DNA sequences (reads) and to improve its quality. However, the performance of each of these tools is contingent on the length and quality of the sequencing data, the structure and complexity of the genome sequence, and the resolution and quality of long-range information. Furthermore, no metric captures the most fundamental "features" of a high-quality assembly, so users have no obvious recipe for selecting the most desirable assembler or assembly. This situation has prompted the scientific community to rely on crowd-sourcing through international competitions, such as the Assemblathons and GAGE, with the intention of identifying the best assemblers and their features. Somewhat circuitously, the only available approach to gauging de novo assemblies and assemblers relies solely on the availability of a high-quality, fully assembled reference genome sequence. Still worse, reference-guided evaluations are often difficult to analyze and lead to conclusions that are hard to interpret. In this paper, we circumvent many of these issues by relying upon a tool, dubbed [Formula: see text], which is capable of evaluating de novo assemblies from the read layouts even when no reference exists. We extend the FRCurve approach to cases where layout information may have been obscured, as is true in many de Bruijn graph-based algorithms. As a by-product, FRCurve now applies to a much wider class of assemblers. This makes it possible to identify the higher-quality members of this class, their inter-relations and their sensitivity to carefully selected features, with or without the support of a reference sequence or a layout for the reads.
The paper concludes by reevaluating several recently conducted assembly competitions and the datasets that have resulted from them.
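The following minimal sketch makes "evaluating assemblies from the read layouts" concrete with one reference-free feature. It flags stretches of a contig where read depth drops below a cutoff, using only the positions where reads were placed in the layout. The function names, the (start, end) read representation and the `min_depth`/`min_run` parameters are illustrative assumptions. This is not the feature set or the code of the tool described above.

```python
from typing import List, Optional, Tuple


def coverage_profile(contig_len: int, reads: List[Tuple[int, int]]) -> List[int]:
    """Per-base read depth along a contig.

    Each read is a hypothetical (start, end) placement from the layout,
    0-based with an exclusive end. A difference array keeps the cost at
    O(contig_len + number of reads).
    """
    diff = [0] * (contig_len + 1)
    for start, end in reads:
        start, end = max(0, start), min(contig_len, end)  # clip to the contig
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    depth, running = [], 0
    for i in range(contig_len):
        running += diff[i]
        depth.append(running)
    return depth


def low_coverage_features(depth: List[int], min_depth: int, min_run: int) -> List[Tuple[int, int]]:
    """Return (start, end) windows of at least min_run bases with depth below min_depth."""
    features: List[Tuple[int, int]] = []
    run_start: Optional[int] = None
    # Appending min_depth as a sentinel closes a low-depth run that reaches the contig end.
    for i, d in enumerate(depth + [min_depth]):
        if d < min_depth:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start >= min_run:
                features.append((run_start, i))
            run_start = None
    return features


# Usage: three reads placed on a 10 bp contig leave positions 6 and 7 uncovered.
depth = coverage_profile(10, [(0, 4), (2, 6), (8, 10)])
print(depth)                                                # [1, 1, 2, 2, 1, 1, 0, 0, 1, 1]
print(low_coverage_features(depth, min_depth=1, min_run=2))  # [(6, 8)]
```

Counts of such windows per contig are the kind of per-contig "feature" totals that a feature-response analysis can consume without any reference sequence.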

AGORA: Assembly Guided by Optical Restriction Alignment. BMC Bioinformatics 2012;13:189. [PMID: 22856673 PMCID: PMC3431216 DOI: 10.1186/1471-2105-13-189] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.0] [Received: 03/28/2012] [Accepted: 06/28/2012] [Indexed: 11/10/2022]

Abstract

Background

Genome assembly is difficult due to repeated sequences within the genome, which create ambiguities and cause the final assembly to be broken up into many separate sequences (contigs). Long range linking information, such as mate-pairs or mapping data, is necessary to help assembly software resolve repeats, thereby leading to a more complete reconstruction of genomes. Prior work has used optical maps for validating assemblies and scaffolding contigs, after an initial assembly has been produced. However, optical maps have not previously been used within the genome assembly process. Here, we use optical map information within the popular de Bruijn graph assembly paradigm to eliminate paths in the de Bruijn graph which are not consistent with the optical map and help determine the correct reconstruction of the genome.

Results

We developed a new algorithm called AGORA: Assembly Guided by Optical Restriction Alignment. AGORA is the first algorithm to use optical map information directly within the de Bruijn graph framework to help produce an accurate assembly of a genome that is consistent with the optical map information provided. Our simulations on bacterial genomes show that AGORA is effective at producing assemblies closely matching the reference sequences.

Additionally, we show that noise in the optical map can have a strong impact on the final assembly quality for some complex genomes, and we also measure how various characteristics of the starting de Bruijn graph may impact the quality of the final assembly. Lastly, we show that a proper choice of restriction enzyme for the optical map may substantially improve the quality of the final assembly.

Conclusions

Our work shows that optical maps can be used effectively to assemble genomes within the de Bruijn graph assembly framework. Our experiments also provide insights into the characteristics of the mapping data that most affect the performance of our algorithm, indicating the potential benefit of more accurate optical mapping technologies, such as nano-coding.
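The Background paragraph describes eliminating de Bruijn graph paths that are inconsistent with the optical map. The sketch below shows only the consistency test at the core of that idea, under simplifying assumptions. A candidate path's sequence is digested in silico at an exact recognition site, and its fragment lengths must match the optical map fragment for fragment, within a relative tolerance. The helper names and the default tolerance are hypothetical. AGORA's actual path search and its handling of sizing errors and missing or extra cuts are not reproduced here.

```python
from typing import List


def in_silico_digest(seq: str, site: str) -> List[int]:
    """Fragment lengths from cutting seq at the start of every occurrence of site.

    Simplification: real enzymes cut at a fixed offset inside or near the site
    (BamHI cuts G^GATCC). That shifts each boundary by a constant, which is ignored here.
    """
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    # Drop zero-length pieces, e.g. when the site sits at position 0.
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


def consistent_with_map(fragments: List[int], optical: List[int], rel_tol: float = 0.1) -> bool:
    """Hypothetical check: same fragment count, each length within rel_tol of the optical size."""
    if len(fragments) != len(optical):
        return False
    return all(abs(f - o) <= rel_tol * o for f, o in zip(fragments, optical))


# Usage: a toy 20 bp path with two BamHI sites (GGATCC).
frags = in_silico_digest("AAGGATCCAAAAGGATCCTT", "GGATCC")
print(frags)                                   # [2, 10, 8]
print(consistent_with_map(frags, [2, 11, 8]))  # True
print(consistent_with_map(frags, [12, 8]))     # False: different number of fragments
```

Real optical-map fragments are far longer than this toy example. A practical matcher would use an alignment that tolerates missing and spurious cuts rather than a strict one-to-one comparison.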

Kim S, Gottfried A, Lin RR, Dertinger T, Kim AS, Chung S, Colyer RA, Weinhold E, Weiss S, Ebenstein Y. Enzymatically incorporated genomic tags for optical mapping of DNA-binding proteins. Angew Chem Int Ed Engl 2012;51:3578-81. [PMID: 22344826 DOI: 10.1002/anie.201107714] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.8] [Received: 11/02/2011] [Revised: 12/19/2011] [Indexed: 11/08/2022]

Kim S, Gottfried A, Lin RR, Dertinger T, Kim AS, Chung S, Colyer RA, Weinhold E, Weiss S, Ebenstein Y. Enzymatically Incorporated Genomic Tags for Optical Mapping of DNA-Binding Proteins. Angew Chem Int Ed Engl 2012. [DOI: 10.1002/ange.201107714] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 02/02/2023]

Feature-by-feature--evaluating de novo sequence assembly. PLoS One 2012;7:e31002. [PMID: 22319599 PMCID: PMC3272011 DOI: 10.1371/journal.pone.0031002] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.4] [Received: 10/06/2011] [Accepted: 12/29/2011] [Indexed: 01/31/2023]

Abstract

The whole-genome sequence assembly (WGSA) problem is one of the most studied problems in computational biology. A plethora of tools (i.e., assemblers) all claim to have solved the WGSA problem, yet little has been done to systematically compare their accuracy and power. Traditional methods rely on standard metrics and read simulation. On the one hand, metrics like N50 and number of contigs focus only on size and do not proportionately emphasize the correctness of the assembly. On the other hand, comparisons performed on simulated datasets can be highly biased by the unrealistic assumptions of the underlying read generator. The Feature Response Curve (FRC) method was recently proposed to assess overall assembly quality and correctness, because FRC transparently captures the trade-off between contig quality and contig size. Nevertheless, the relationships among the different features and their relative importance remain unknown. In particular, FRC cannot account for correlation among the different features. We analyzed the correlation among different features in order to better describe their relationships and their importance in gauging assembly quality and correctness. Using multivariate techniques such as principal and independent component analysis, we were able to estimate the "excess-dimensionality" of the feature space. Moreover, principal component analysis allowed us to show how poorly the acclaimed N50 metric describes assembly quality. Applying independent component analysis, we identified a subset of features that better describes the assemblers' performance. We demonstrated that by focusing on a reduced set of highly informative features we can use the FRC to better describe and compare the performance of different assemblers.
As a by-product of our analysis, we also discovered how often evaluation based on simulated data, obtained with state-of-the-art simulators, leads to unrealistic results.
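N50 is the size statistic that this abstract criticizes. The minimal sketch below uses its usual definition: the contig length L such that contigs of length at least L account for at least half of the total assembled length. It shows why the metric says nothing about correctness, because it is computed from contig lengths alone. The function name is illustrative.

```python
from typing import List


def n50(contig_lengths: List[int]) -> int:
    """Length L such that contigs of length >= L hold at least half of the assembled bases."""
    if not contig_lengths:
        raise ValueError("n50 requires at least one contig")
    lengths = sorted(contig_lengths, reverse=True)  # longest contigs first
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    # Not reached for non-negative lengths: the full sum always meets half_total.
    return lengths[-1]


# Usage: 30 bp assembled; 10 + 6 = 16 >= 15, so N50 is 6.
print(n50([2, 3, 4, 5, 6, 10]))  # 6
```

Two assemblies with identical contig lengths receive the same N50 even if one of them is riddled with misassemblies. This blind spot is what feature-based measures are meant to expose.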

Schatz MC, Phillippy AM, Sommer DD, Delcher AL, Puiu D, Narzisi G, Salzberg SL, Pop M. Hawkeye and AMOS: visualizing and assessing the quality of genome assemblies. Brief Bioinform 2011;14:213-24. [PMID: 22199379 DOI: 10.1093/bib/bbr074] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.9] [Indexed: 11/13/2022]

Scholz MB, Lo CC, Chain PSG. Next generation sequencing and bioinformatic bottlenecks: the current state of metagenomic data analysis. Curr Opin Biotechnol 2011;23:9-15. [PMID: 22154470 DOI: 10.1016/j.copbio.2011.11.013] [Citation(s) in RCA: 194] [Impact Index Per Article: 13.9] [Received: 09/26/2011] [Revised: 11/09/2011] [Accepted: 11/10/2011] [Indexed: 12/24/2022]

Comparing de novo genome assembly: the long and short of it. PLoS One 2011;6:e19175. [PMID: 21559467 PMCID: PMC3084767 DOI: 10.1371/journal.pone.0019175] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.9] [Received: 11/22/2010] [Accepted: 03/29/2011] [Indexed: 01/30/2023]

Abstract

Recent advances in DNA sequencing technology and their focal role in Genome Wide Association Studies (GWAS) have rekindled interest in the whole-genome sequence assembly (WGSA) problem, thereby inundating the field with a plethora of new formalizations, algorithms, heuristics and implementations. And yet, scant attention has been paid to comparative assessments of these assemblers' quality and accuracy. No commonly accepted and standardized method for comparison exists yet. Even worse, widely used metrics for comparing assembled sequences emphasize only size and poorly capture contig quality and accuracy. This paper addresses these concerns by highlighting common anomalies in assembly accuracy through a rigorous study of several assemblers. The assemblers are compared under both standard metrics (N50, coverage, contig sizes, etc.) and a more comprehensive metric introduced here, the Feature-Response Curve (FRC), which transparently captures the trade-off between contig quality and contig size. For this purpose, most of the major publicly available sequence assemblers are compared, covering both low-coverage long-read (Sanger) and high-coverage short-read (Illumina) technologies. These assemblers are applied to microbial (Escherichia coli, Brucella, Wolbachia, Staphylococcus, Helicobacter) and partial human (Chr. Y) genome sequences, using reads of various lengths, coverages and accuracies, with and without mate-pairs. It is hoped that, based on these evaluations, computational biologists will identify innovative sequence assembly paradigms, bioinformaticists will determine promising approaches for developing "next-generation" assemblers, and biotechnologists will formulate more meaningful design desiderata for sequencing technology platforms. A new software tool for computing the FRC metric has been developed and is available through the AMOS open-source consortium.
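The FRC idea can be sketched as follows under two stated assumptions. First, each contig already carries a precomputed count of features (suspicious regions). Second, genome coverage is approximated as cumulative contig length divided by an estimated genome size. Contigs are taken from longest to shortest. For each feature threshold, the curve reports how much of the genome those longest contigs cover before their summed feature count would exceed the threshold. The function name and input format are illustrative. This is not the AMOS implementation.

```python
from typing import List, Tuple


def feature_response_curve(
    contigs: List[Tuple[int, int]], genome_size: int, thresholds: List[int]
) -> List[Tuple[int, float]]:
    """Approximate FRC points.

    contigs: (length, feature_count) pairs, with feature counts assumed precomputed.
    Returns (threshold, covered_fraction) pairs. covered_fraction is the summed
    length of the longest contigs admitted before their cumulative feature
    count would exceed the threshold, divided by genome_size.
    """
    ordered = sorted(contigs, key=lambda c: c[0], reverse=True)  # longest first
    curve = []
    for phi in thresholds:
        covered, features = 0, 0
        for length, n_features in ordered:
            if features + n_features > phi:
                break  # the next-longest contig would push the feature total past phi
            features += n_features
            covered += length
        curve.append((phi, covered / genome_size))
    return curve


# Usage: three contigs on a 1,000 bp genome; the longest carries 3 features.
points = feature_response_curve([(500, 3), (300, 0), (200, 1)], genome_size=1000, thresholds=[0, 2, 3, 4])
print(points)  # [(0, 0.0), (2, 0.0), (3, 0.8), (4, 1.0)]
```

An assembler whose curve rises more steeply covers more of the genome while admitting fewer features. This is the quality-versus-size trade-off that a single size statistic such as N50 cannot show.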
