Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For:	Koren S, Treangen TJ, Pop M. Bambus 2: scaffolding metagenomes. Bioinformatics 2011;27:2964-71. [PMID: 21926123 DOI: 10.1093/bioinformatics/btr520] [Citation(s) in RCA: 91] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/18/2022]

Cited by Other Article(s)

Au EH, Weaver S, Katikaneni A, Wucherpfennig JI, Luo Y, Mangan RJ, Wund MA, Bell MA, Lowe CB. Genome Sequence of a Marine Threespine Stickleback (Gasterosteus aculeatus) from Rabbit Slough in the Cook Inlet. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2025:2025.02.06.636934. [PMID: 39975098 PMCID: PMC11839064 DOI: 10.1101/2025.02.06.636934] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 02/21/2025]

Abstract

The Threespine Stickleback, Gasterosteus aculeatus, is an emerging model system for understanding the genomic basis of vertebrate adaptation. A strength of the system is that marine populations have repeatedly colonized freshwater environments, serving as natural biological replicates. These replicates have enabled researchers to efficiently identify phenotypes and genotypes under selection during this transition. While this repeated adaptation to freshwater has occurred throughout the northern hemisphere, the Cook Inlet in south-central Alaska has been an area of focus. The freshwater lakes in this area are being studied extensively and there is a high-quality freshwater reference assembly from a population in the region, Bear Paw Lake. Using a freshwater reference assembly is a potential limitation because genomic segments are repeatedly lost during freshwater adaptation. This scenario results in some of the key regions associated with marine-freshwater divergence being absent from freshwater genomes, and therefore absent from the reference assemblies. It may also be that isolated freshwater populations are more genetically diverged, potentially increasing reference biases. Here we present a highly-continuous marine assembly from Rabbit Slough in the Cook Inlet. All contigs are from long-read sequencing and have been ordered and oriented with Hi-C. The contigs are anchored to chromosomes and form a 454 Mbp assembly with an N50 of 1.3 Mbp, an L50 of 95, and a BUSCO score over 97%. The organization of the chromosomes in this marine individual is similar to existing freshwater assemblies, but with important structural differences, including the 3 previously known inversions that repeatedly separate marine and freshwater ecotypes. We anticipate that this high-quality marine assembly will more accurately reflect the ancestral population that founded the freshwater lakes in the area and will more closely match most other populations from around the world. 
This marine assembly, which includes the repeatedly deleted segments and offers a closer reference sequence for most populations, will enable more comprehensive and accurate computational and functional genomic investigations of Threespine Stickleback evolution.

Azizpour A, Balaji A, Treangen TJ, Segarra S. Graph-based self-supervised learning for repeat detection in metagenomic assembly. Genome Res 2024;34:1468-1476. [PMID: 39029947 PMCID: PMC11529840 DOI: 10.1101/gr.279136.124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/15/2024] [Accepted: 07/15/2024] [Indexed: 07/21/2024]

Rocha U, Coelho Kasmanas J, Kallies R, Saraiva JP, Toscan RB, Štefanič P, Bicalho MF, Borim Correa F, Baştürk MN, Fousekis E, Viana Barbosa LM, Plewka J, Probst AJ, Baldrian P, Stadler PF. MuDoGeR: Multi-Domain Genome recovery from metagenomes made easy. Mol Ecol Resour 2024;24:e13904. [PMID: 37994269 DOI: 10.1111/1755-0998.13904] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/12/2022] [Revised: 10/18/2023] [Accepted: 11/13/2023] [Indexed: 11/24/2023]

Affiliation(s)

Ulisses Rocha Department of Environmental Microbiology, Helmholtz Centre for Environmental Research - UFZ, Leipzig, Germany
Jonas Coelho Kasmanas Department of Environmental Microbiology, Helmholtz Centre for Environmental Research - UFZ, Leipzig, Germany Institute of Mathematics and Computer Sciences, University of São Paulo, São Carlos, Brazil
René Kallies Department of Environmental Microbiology, Helmholtz Centre for Environmental Research - UFZ, Leipzig, Germany
Joao Pedro Saraiva Department of Environmental Microbiology, Helmholtz Centre for Environmental Research - UFZ, Leipzig, Germany
Rodolfo Brizola Toscan Department of Environmental Microbiology, Helmholtz Centre for Environmental Research - UFZ, Leipzig, Germany
Polonca Štefanič Biotechnical Faculty, University of Ljubljana, Ljubljana, Slovenia
Marcos Fleming Bicalho Department of Environmental Microbiology, Helmholtz Centre for Environmental Research - UFZ, Leipzig, Germany
Felipe Borim Correa Department of Environmental Microbiology, Helmholtz Centre for Environmental Research - UFZ, Leipzig, Germany
Merve Nida Baştürk Department of Environmental Microbiology, Helmholtz Centre for Environmental Research - UFZ, Leipzig, Germany
Efthymios Fousekis Department of Environmental Microbiology, Helmholtz Centre for Environmental Research - UFZ, Leipzig, Germany
Luiz Miguel Viana Barbosa Department of Environmental Microbiology, Helmholtz Centre for Environmental Research - UFZ, Leipzig, Germany
Julia Plewka Environmental Microbiology and Biotechnology, Department of Chemistry, University of Duisburg-Essen, Essen, Germany
Alexander J Probst Environmental Microbiology and Biotechnology, Department of Chemistry, University of Duisburg-Essen, Essen, Germany
Petr Baldrian Laboratory of Environmental Microbiology, Institute of Microbiology of the Czech Academy of Sciences, Praha 4, Czech Republic
Peter F Stadler Department of Computer Science and Interdisciplinary Center of Bioinformatics, University of Leipzig, Leipzig, Germany Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany Institute for Theoretical Chemistry, University of Vienna, Vienna, Austria The Santa Fe Institute, Santa Fe, New Mexico, USA

Sapoval N, Tanevski M, Treangen TJ. KombOver: Efficient k-core and K-truss based characterization of perturbations within the human gut microbiome. PACIFIC SYMPOSIUM ON BIOCOMPUTING. PACIFIC SYMPOSIUM ON BIOCOMPUTING 2024;29:506-520. [PMID: 38160303 PMCID: PMC10764071] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Subscribe] [Scholar Register] [Indexed: 01/03/2024]

Samantray D, Tanwar AS, Murali TS, Brand A, Satyamoorthy K, Paul B. A Comprehensive Bioinformatics Resource Guide for Genome-Based Antimicrobial Resistance Studies. OMICS : A JOURNAL OF INTEGRATIVE BIOLOGY 2023;27:445-460. [PMID: 37861712 DOI: 10.1089/omi.2023.0140] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/21/2023]

Guo R, Zhang Z, He T, Li M, Zhuo Y, Yang X, Fan H, Chen X. Isolation and Identification of a New Isolate of Anguillid Herpesvirus 1 from Farmed American Eels (Anguilla rostrata) in China. Viruses 2022;14:2722. [PMID: 36560731 PMCID: PMC9784739 DOI: 10.3390/v14122722] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/25/2022] [Revised: 12/02/2022] [Accepted: 12/03/2022] [Indexed: 12/12/2022] Open

Kukkar D, Sharma PK, Kim KH. Recent advances in metagenomic analysis of different ecological niches for enhanced biodegradation of recalcitrant lignocellulosic biomass. ENVIRONMENTAL RESEARCH 2022;215:114369. [PMID: 36165858 DOI: 10.1016/j.envres.2022.114369] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/12/2022] [Revised: 09/06/2022] [Accepted: 09/15/2022] [Indexed: 06/16/2023]

That LFLN, Xu B, Pandohee J. Could foodomics hold the key to unlocking the role of prebiotics in gut microbiota and immunity? Curr Opin Food Sci 2022. [DOI: 10.1016/j.cofs.2022.100920] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]

Balaji A, Sapoval N, Seto C, Leo Elworth R, Fu Y, Nute MG, Savidge T, Segarra S, Treangen TJ. KOMB: K-core based de novo characterization of copy number variation in microbiomes. Comput Struct Biotechnol J 2022;20:3208-3222. [PMID: 35832621 PMCID: PMC9249589 DOI: 10.1016/j.csbj.2022.06.019] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/31/2022] [Revised: 06/08/2022] [Accepted: 06/09/2022] [Indexed: 11/29/2022] Open

MacDonald ML, Lee KH. EvalDNA: a machine learning-based tool for the comprehensive evaluation of mammalian genome assembly quality. BMC Bioinformatics 2021;22:570. [PMID: 34837948 PMCID: PMC8627028 DOI: 10.1186/s12859-021-04480-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/05/2020] [Accepted: 11/15/2021] [Indexed: 11/16/2022] Open

Abstract

Background

To select the most complete, continuous, and accurate assembly for an organism of interest, comprehensive quality assessment of assemblies is necessary. We present a novel tool, called Evaluation of De Novo Assemblies (EvalDNA), which uses supervised machine learning for the quality scoring of genome assemblies and does not require an existing reference genome for accuracy assessment.

Results

EvalDNA calculates a list of quality metrics from an assembled sequence and applies a model created from supervised machine learning methods to integrate various metrics into a comprehensive quality score. A well-tested, accurate model for scoring mammalian genome sequences is provided as part of EvalDNA. This random forest regression model evaluates an assembled sequence based on continuity, completeness, and accuracy, and was able to explain 86% of the variation in reference-based quality scores within the testing data. EvalDNA was applied to human chromosome 14 assemblies from the GAGE study to rank genome assemblers and to compare EvalDNA to two other quality evaluation tools. In addition, EvalDNA was used to evaluate several genome assemblies of the Chinese hamster genome to help establish a better reference genome for the biopharmaceutical manufacturing community. EvalDNA was also used to assess more recent human assemblies from the QUAST-LG study completed in 2018, and its ability to score bacterial genomes was examined through application on bacterial assemblies from the GAGE-B study.

Conclusions

EvalDNA enables scientists to easily identify the best available genome assembly for their organism of interest without requiring a reference assembly. EvalDNA sets itself apart from other quality assessment tools by producing a quality score that enables direct comparison among assemblies from different species.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12859-021-04480-2.

Rahman A, Pachter L. SWALO: scaffolding with assembly likelihood optimization. Nucleic Acids Res 2021;49:e117. [PMID: 34417615 PMCID: PMC8599790 DOI: 10.1093/nar/gkab717] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/22/2020] [Revised: 06/16/2021] [Accepted: 08/16/2021] [Indexed: 01/01/2023] Open

Balvert M, Luo X, Hauptfeld E, Schönhuth A, Dutilh BE. OGRE: Overlap Graph-based metagenomic Read clustEring. Bioinformatics 2021;37:905-912. [PMID: 32871010 PMCID: PMC8128468 DOI: 10.1093/bioinformatics/btaa760] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/13/2019] [Revised: 08/19/2020] [Accepted: 08/25/2020] [Indexed: 11/13/2022] Open

Alipanahi B, Muggli MD, Jundi M, Noyes NR, Boucher C. Metagenome SNP calling via read-colored de Bruijn graphs. Bioinformatics 2021;36:5275-5281. [PMID: 32049324 DOI: 10.1093/bioinformatics/btaa081] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/16/2018] [Revised: 01/08/2020] [Accepted: 02/03/2020] [Indexed: 11/13/2022] Open

Abstract

MOTIVATION

Metagenomics refers to the study of complex samples containing the genetic content of multiple individual organisms and, thus, has been used to elucidate the microbiome and resistome of a complex sample. The microbiome refers to all microbial organisms in a sample, and the resistome refers to all of the antimicrobial resistance (AMR) genes in pathogenic and non-pathogenic bacteria. Single-nucleotide polymorphisms (SNPs) can be effectively used to 'fingerprint' specific organisms and genes within the microbiome and resistome and trace their movement across various samples. However, to effectively use these SNPs for this traceability, a scalable and accurate metagenomics SNP caller is needed. Moreover, such an SNP caller should not be reliant on reference genomes since 95% of microbial species are unculturable, making the determination of a reference genome extremely challenging. In this article, we address this need.

RESULTS

We present LueVari, a reference-free SNP caller based on the read-colored de Bruijn graph, an extension of the traditional de Bruijn graph that allows repeated regions longer than the k-mer length and shorter than the read length to be identified unambiguously. LueVari is able to identify SNPs in both AMR genes and chromosomal DNA from shotgun metagenomics data with reliable sensitivity (between 91% and 99%) and precision (between 71% and 99%) as the performance of competing methods varies widely. Furthermore, we show that LueVari constructs sequences containing the variation, which span up to 97.8% of genes in datasets, which can be helpful in detecting distinct AMR genes in large metagenomic datasets.

AVAILABILITY AND IMPLEMENTATION

Code and datasets are publicly available at https://github.com/baharpan/cosmo/tree/LueVari.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Muralidharan HS, Shah N, Meisel JS, Pop M. Binnacle: Using Scaffolds to Improve the Contiguity and Quality of Metagenomic Bins. Front Microbiol 2021;12:638561. [PMID: 33717033 PMCID: PMC7945042 DOI: 10.3389/fmicb.2021.638561] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/07/2020] [Accepted: 02/04/2021] [Indexed: 01/03/2023] Open

Luo J, Wei Y, Lyu M, Wu Z, Liu X, Luo H, Yan C. A comprehensive review of scaffolding methods in genome assembly. Brief Bioinform 2021;22:6149347. [PMID: 33634311 DOI: 10.1093/bib/bbab033] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2020] [Revised: 01/21/2021] [Accepted: 01/22/2021] [Indexed: 12/20/2022] Open

Biological computation and computational biology: survey, challenges, and discussion. Artif Intell Rev 2021. [DOI: 10.1007/s10462-020-09951-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022]

Hsieh MF, Lu CL, Tang CY. Clover: a clustering-oriented de novo assembler for Illumina sequences. BMC Bioinformatics 2020;21:528. [PMID: 33203354 PMCID: PMC7672897 DOI: 10.1186/s12859-020-03788-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/10/2020] [Accepted: 09/29/2020] [Indexed: 11/26/2022] Open

Kolmogorov M, Bickhart DM, Behsaz B, Gurevich A, Rayko M, Shin SB, Kuhn K, Yuan J, Polevikov E, Smith TPL, Pevzner PA. metaFlye: scalable long-read metagenome assembly using repeat graphs. Nat Methods 2020;17:1103-1110. [PMID: 33020656 PMCID: PMC10699202 DOI: 10.1038/s41592-020-00971-x] [Citation(s) in RCA: 466] [Impact Index Per Article: 93.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/16/2020] [Revised: 08/22/2020] [Accepted: 09/07/2020] [Indexed: 02/06/2023]

Nurk S, Walenz BP, Rhie A, Vollger MR, Logsdon GA, Grothe R, Miga KH, Eichler EE, Phillippy AM, Koren S. HiCanu: accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads. Genome Res 2020;30:1291-1305. [PMID: 32801147 PMCID: PMC7545148 DOI: 10.1101/gr.263566.120] [Citation(s) in RCA: 420] [Impact Index Per Article: 84.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/14/2020] [Accepted: 08/04/2020] [Indexed: 12/14/2022]

Olson ND, Treangen TJ, Hill CM, Cepeda-Espinoza V, Ghurye J, Koren S, Pop M. Metagenomic assembly through the lens of validation: recent advances in assessing and improving the quality of genomes assembled from metagenomes. Brief Bioinform 2020;20:1140-1150. [PMID: 28968737 DOI: 10.1093/bib/bbx098] [Citation(s) in RCA: 86] [Impact Index Per Article: 17.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/02/2017] [Revised: 07/13/2017] [Indexed: 01/09/2023] Open

Dovrolis N, Kolios G, Spyrou GM, Maroulakou I. Computational profiling of the gut-brain axis: microflora dysbiosis insights to neurological disorders. Brief Bioinform 2020;20:825-841. [PMID: 29186317 DOI: 10.1093/bib/bbx154] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/17/2017] [Revised: 10/17/2017] [Indexed: 12/14/2022] Open

Abstract

Almost 2500 years after Hippocrates' observations on health and its direct association to the gastrointestinal tract, a paradigm shift has recently occurred, making the gut and its symbionts (bacteria, fungi, archaea and viruses) a point of convergence for studies. It is nowadays well established that the gut microflora's compositional diversity regulates via its genes (the microbiome) the host's health and provides preliminary insights into disease progression and regulation. The microbiome's involvement is evident in immunological and physiological studies that link changes in its biodiversity to its contributions to the host's phenotype but also in neurological investigations, substantiating the aptly named gut-brain axis. The definitive mechanisms of this last bidirectional interaction will be our main focus because it presents researchers with a new conundrum. In this review, we prospect current literature for computational analysis methodologies that accommodate the need for better understanding of the microbiome-gut-brain interactions and neurological disorder onset and progression, through cross-disciplinary systems biology applications. We will present bioinformatics tools used in exploring these synergies that help build and interpret microbial 16S ribosomal RNA data sets, produced by shotgun and high-throughput sequencing of healthy and neurological disorder samples stored in biological databases. These approaches provide alternative means for researchers to form hypotheses to their inquests faster, cheaper and with precision. The goal of these studies relies on the integration of combined metagenomics and metabolomics assessments. An accurate characterization of the microbiome and its functionality can support new diagnostic, prognostic and therapeutic strategies for neurological disorders, customized for each individual host.

Pan W, Jiang T, Lonardi S. OMGS: Optical Map-Based Genome Scaffolding. J Comput Biol 2020;27:519-533. [DOI: 10.1089/cmb.2019.0310] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/20/2023] Open

Ghurye J, Treangen T, Fedarko M, Hervey WJ, Pop M. MetaCarvel: linking assembly graph motifs to biological variants. Genome Biol 2019;20:174. [PMID: 31451112 PMCID: PMC6710874 DOI: 10.1186/s13059-019-1791-3] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/05/2019] [Accepted: 08/13/2019] [Indexed: 01/01/2023] Open

Kwon D, Lee J, Kim J. GMASS: a novel measure for genome assembly structural similarity. BMC Bioinformatics 2019;20:147. [PMID: 30885117 PMCID: PMC6423833 DOI: 10.1186/s12859-019-2710-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/26/2018] [Accepted: 03/03/2019] [Indexed: 01/10/2023] Open

Wu B, Li M, Liao X, Luo J, Wu F, Pan Y, Wang J. MEC: Misassembly Error Correction in contigs based on distribution of paired-end reads and statistics of GC-contents. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018;17:847-857. [PMID: 30334805 DOI: 10.1109/tcbb.2018.2876855] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]

Progress of analytical tools and techniques for human gut microbiome research. J Microbiol 2018;56:693-705. [DOI: 10.1007/s12275-018-8238-5] [Citation(s) in RCA: 43] [Impact Index Per Article: 6.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/30/2018] [Revised: 06/07/2018] [Accepted: 06/08/2018] [Indexed: 12/15/2022]

SCOP: a novel scaffolding algorithm based on contig classification and optimization. Bioinformatics 2018;35:1142-1150. [DOI: 10.1093/bioinformatics/bty773] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/20/2017] [Revised: 08/10/2018] [Accepted: 09/01/2018] [Indexed: 12/20/2022] Open

Li M, Tang L, Liao Z, Luo J, Wu F, Pan Y, Wang J. A novel scaffolding algorithm based on contig error correction and path extension. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018;16:764-773. [PMID: 30040649 DOI: 10.1109/tcbb.2018.2858267] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]

Chen Q, Lan C, Zhao L, Wang J, Chen B, Chen YPP. Recent advances in sequence assembly: principles and applications. Brief Funct Genomics 2018;16:361-378. [PMID: 28453648 DOI: 10.1093/bfgp/elx006] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/20/2022] Open

Xu Y, Zhao F. Single-cell metagenomics: challenges and applications. Protein Cell 2018;9:501-510. [PMID: 29696589 PMCID: PMC5960468 DOI: 10.1007/s13238-018-0544-5] [Citation(s) in RCA: 53] [Impact Index Per Article: 7.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/12/2018] [Accepted: 04/18/2018] [Indexed: 02/01/2023] Open

Obscura Acosta N, Mäkinen V, Tomescu AI. A safe and complete algorithm for metagenomic assembly. Algorithms Mol Biol 2018;13:3. [PMID: 29445416 PMCID: PMC5802251 DOI: 10.1186/s13015-018-0122-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/10/2017] [Accepted: 01/20/2018] [Indexed: 11/10/2022] Open

Abstract

Background

Reconstructing the genome of a species from short fragments is one of the oldest bioinformatics problems. Metagenomic assembly is a variant of the problem asking to reconstruct the circular genomes of all bacterial species present in a sequencing sample. This problem can be naturally formulated as finding a collection of circular walks of a directed graph G that together cover all nodes, or edges, of G.

Approach

We address this problem with the “safe and complete” framework of Tomescu and Medvedev (Research in computational Molecular biology—20th annual conference, RECOMB 9649:152–163, 2016). An algorithm is called safe if it returns only those walks (also called safe) that appear as subwalk in all metagenomic assembly solutions for G. A safe algorithm is called complete if it returns all safe walks of G.

Results

We give graph-theoretic characterizations of the safe walks of G, and a safe and complete algorithm finding all safe walks of G. In the node-covering case, our algorithm runs in time $O(m^2 + n^3)$, and in the edge-covering case it runs in time $O(m^2 n)$; n and m denote the number of nodes and edges, respectively, of G. This algorithm constitutes the first theoretical tight upper bound on what can be safely assembled from metagenomic reads using this problem formulation.

Human Microbiome Acquisition and Bioinformatic Challenges in Metagenomic Studies. Int J Mol Sci 2018;19:ijms19020383. [PMID: 29382070 PMCID: PMC5855605 DOI: 10.3390/ijms19020383] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/14/2017] [Revised: 01/21/2018] [Accepted: 01/24/2018] [Indexed: 12/21/2022] Open

Aganezov SS, Alekseyev MA. CAMSA: a tool for comparative analysis and merging of scaffold assemblies. BMC Bioinformatics 2017;18:496. [PMID: 29244014 PMCID: PMC5731503 DOI: 10.1186/s12859-017-1919-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/20/2023] Open

Abstract

BACKGROUND

Despite the recent progress in genome sequencing and assembly, many of the currently available assembled genomes come in a draft form. Such draft genomes consist of a large number of genomic fragments (scaffolds), whose positions and orientations along the genome are unknown. While there exists a number of methods for reconstruction of the genome from its scaffolds, utilizing various computational and wet-lab techniques, they often can produce only partial error-prone scaffold assemblies. It therefore becomes important to compare and merge scaffold assemblies produced by different methods, thus combining their advantages and highlighting present conflicts for further investigation. These tasks may be labor intensive if performed manually.

RESULTS

We present CAMSA, a tool for comparative analysis and merging of two or more given scaffold assemblies. The tool (i) creates an extensive report with several comparative quality metrics; (ii) constructs the most confident merged scaffold assembly; and (iii) provides an interactive framework for a visual comparative analysis of the given assemblies. Among the CAMSA features, only scaffold merging can be evaluated in comparison to existing methods. Namely, it resembles the functionality of assembly reconciliation tools, although their primary targets are somewhat different. Our evaluations show that CAMSA produces merged assemblies of comparable or better quality than existing assembly reconciliation tools while being the fastest in terms of the total running time.

CONCLUSIONS

CAMSA addresses the current deficiency of tools for automated comparison and analysis of multiple assemblies of the same set of scaffolds. Since there exist numerous methods and techniques for scaffold assembly, identifying similarities and dissimilarities across assemblies produced by different methods is beneficial both for the developers of scaffold assembly algorithms and for the researchers focused on improving draft assemblies of specific organisms.

Abante J, Ghaffari N, Johnson CD, Datta A. HiMMe: using genetic patterns as a proxy for genome assembly reliability assessment. BMC Genomics 2017;18:694. [PMID: 28874136 PMCID: PMC5584555 DOI: 10.1186/s12864-017-3965-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/30/2017] [Accepted: 07/27/2017] [Indexed: 11/30/2022] Open

Abstract

Background

The information content of genomes plays a crucial role in the existence and proper development of living organisms. Thus, tremendous effort has been dedicated to developing DNA sequencing technologies that provide a better understanding of the underlying mechanisms of cellular processes. Advances in the development of sequencing technology have made it possible to sequence genomes in a relatively fast and inexpensive way. However, as with any measurement technology, there is noise involved and this needs to be addressed to reach conclusions based on the resulting data. In addition, there are multiple intermediate steps and degrees of freedom when constructing genome assemblies that lead to ambiguous and inconsistent results among assemblers.

Methods

Here we introduce HiMMe, an HMM-based tool that relies on genetic patterns to score genome assemblies. Through a Markov chain, the model is able to detect characteristic genetic patterns, while, by introducing emission probabilities, the noise involved in the process is taken into account. Prior knowledge can be used by training the model to fit a given organism or sequencing technology.

Results

Our results show that the method presented is able to recognize patterns even with relatively small k-mer size choices and limited computational resources.

Conclusions

Our methodology provides an individual quality metric per contig in addition to an overall genome assembly score, with a time complexity well below that of an aligner. Ultimately, HiMMe provides meaningful statistical insights that can be leveraged by researchers to better select contigs and genome assemblies for downstream analysis.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-017-3965-2) contains supplementary material, which is available to authorized users.
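
As a rough illustration of the kind of scoring the HiMMe abstract describes (a Markov chain over hidden states plus emission probabilities that absorb noise), the forward algorithm can assign a per-base log-likelihood to a contig. This is a minimal sketch, not HiMMe's actual model: the two states, all probabilities in `TOY_MODEL`, and the function names are assumptions made up for the example, and it works on single bases rather than k-mers.

```python
import math

# Hypothetical two-state model; every number here is an illustrative assumption.
TOY_MODEL = {
    "start": {"AT_rich": 0.5, "GC_rich": 0.5},
    "trans": {
        "AT_rich": {"AT_rich": 0.9, "GC_rich": 0.1},
        "GC_rich": {"AT_rich": 0.1, "GC_rich": 0.9},
    },
    "emit": {
        "AT_rich": {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35},
        "GC_rich": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},
    },
}


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def forward_log_likelihood(seq, model):
    """Log-probability of seq under the HMM, summed over all state paths."""
    if not seq:
        raise ValueError("sequence must not be empty")
    states = list(model["start"])
    # Initialisation: start probability times emission of the first base.
    alpha = {
        s: math.log(model["start"][s]) + math.log(model["emit"][s][seq[0]])
        for s in states
    }
    # Recursion: combine every predecessor state for each new base.
    for base in seq[1:]:
        alpha = {
            t: math.log(model["emit"][t][base])
            + log_sum_exp(
                [alpha[s] + math.log(model["trans"][s][t]) for s in states]
            )
            for t in states
        }
    return log_sum_exp(list(alpha.values()))


def per_base_score(seq, model=TOY_MODEL):
    """Length-normalised score so contigs of different sizes are comparable."""
    return forward_log_likelihood(seq, model) / len(seq)
```

Normalising by length is one simple way to obtain a per-contig metric; an overall assembly score could then be aggregated from the per-contig values, although how HiMMe itself aggregates is not stated in the abstract.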

Shi W, Ji P, Zhao F. The combination of direct and paired link graphs can boost repetitive genome assembly. Nucleic Acids Res 2017;45:e43. [PMID: 27924003 PMCID: PMC5399794 DOI: 10.1093/nar/gkw1191] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/08/2016] [Accepted: 11/17/2016] [Indexed: 11/14/2022] Open

Kremer FS, McBride AJA, Pinto LDS. Approaches for in silico finishing of microbial genome sequences. Genet Mol Biol 2017;40:553-576. [PMID: 28898352 PMCID: PMC5596377 DOI: 10.1590/1678-4685-gmb-2016-0230] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/25/2016] [Accepted: 03/13/2017] [Indexed: 12/15/2022] Open

Halderman AA, Lane AP. Organism and Microbiome Analysis. Otolaryngol Clin North Am 2017;50:521-532. [DOI: 10.1016/j.otc.2017.01.004] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/30/2022]

Nurk S, Meleshko D, Korobeynikov A, Pevzner PA. metaSPAdes: a new versatile metagenomic assembler. Genome Res 2017;27:824-834. [PMID: 28298430 PMCID: PMC5411777 DOI: 10.1101/gr.213959.116] [Citation(s) in RCA: 2482] [Impact Index Per Article: 310.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/01/2016] [Accepted: 03/13/2017] [Indexed: 01/25/2023]

Koren S, Walenz BP, Berlin K, Miller JR, Bergman NH, Phillippy AM. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res 2017;27:722-736. [PMID: 28298431 PMCID: PMC5411767 DOI: 10.1101/gr.215087.116] [Citation(s) in RCA: 4775] [Impact Index Per Article: 596.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/23/2016] [Accepted: 03/03/2017] [Indexed: 12/11/2022]

Roumpeka DD, Wallace RJ, Escalettes F, Fotheringham I, Watson M. A Review of Bioinformatics Tools for Bio-Prospecting from Metagenomic Sequence Data. Front Genet 2017;8:23. [PMID: 28321234 PMCID: PMC5337752 DOI: 10.3389/fgene.2017.00023] [Citation(s) in RCA: 85] [Impact Index Per Article: 10.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/26/2016] [Accepted: 02/16/2017] [Indexed: 12/21/2022] Open

Ghurye JS, Cepeda-Espinoza V, Pop M. Metagenomic Assembly: Overview, Challenges and Applications. THE YALE JOURNAL OF BIOLOGY AND MEDICINE 2016;89:353-362. [PMID: 27698619 PMCID: PMC5045144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]

Luo J, Wang J, Zhang Z, Li M, Wu FX. BOSS: a novel scaffolding algorithm based on an optimized scaffold graph. Bioinformatics 2016;33:169-176. [DOI: 10.1093/bioinformatics/btw597] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/18/2016] [Revised: 06/22/2016] [Accepted: 09/08/2016] [Indexed: 11/12/2022] Open

Kang DD, Rubin EM, Wang Z. Reconstructing single genomes from complex microbial communities. it - Information Technology 2016. [DOI: 10.1515/itit-2016-0011] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/07/2023]

Shaik S, Kumar N, Lankapalli AK, Tiwari SK, Baddam R, Ahmed N. Contig-Layout-Authenticator (CLA): A Combinatorial Approach to Ordering and Scaffolding of Bacterial Contigs for Comparative Genomics and Molecular Epidemiology. PLoS One 2016;11:e0155459. [PMID: 27248146 PMCID: PMC4889084 DOI: 10.1371/journal.pone.0155459] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/25/2015] [Accepted: 04/29/2016] [Indexed: 11/18/2022] Open

Abstract

A wide variety of genome sequencing platforms have emerged in the recent past. High-throughput platforms like Illumina and 454 are essentially adaptations of the shotgun approach generating millions of fragmented single or paired sequencing reads. To reconstruct whole genomes, the reads have to be assembled into contigs, which often require further downstream processing. The contigs can be directly ordered according to a reference, scaffolded based on paired read information, or assembled using a combination of the two approaches. While the reference-based approach appears to mask strain-specific information, scaffolding based on paired-end information suffers when repetitive elements longer than the size of the sequencing reads are present in the genome. Sequencing technologies that produce long reads can solve the problems associated with repetitive elements but are not necessarily easily available to researchers. The most common high-throughput technology currently used is the Illumina short read platform. To improve upon the shortcomings associated with the construction of draft genomes with Illumina paired-end sequencing, we developed Contig-Layout-Authenticator (CLA). The CLA pipeline can scaffold reference-sorted contigs based on paired reads, resulting in better assembled genomes. Moreover, CLA also hints at probable misassemblies and contaminations, for the users to cross-check before constructing the consensus draft. The CLA pipeline was designed and trained extensively on various bacterial genome datasets for the ordering and scaffolding of large repetitive contigs. The tool has been validated and compared favorably with other widely-used scaffolding and ordering tools using both simulated and real sequence datasets. CLA is a user friendly tool that requires a single command line input to generate ordered scaffolds.

Gupta A, Kumar S, Prasoodanan VPK, Harish K, Sharma AK, Sharma VK. Reconstruction of Bacterial and Viral Genomes from Multiple Metagenomes. Front Microbiol 2016;7:469. [PMID: 27148174 PMCID: PMC4828583 DOI: 10.3389/fmicb.2016.00469] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/25/2015] [Accepted: 03/21/2016] [Indexed: 11/13/2022] Open

Metagenomics: Retrospect and Prospects in High Throughput Age. BIOTECHNOLOGY RESEARCH INTERNATIONAL 2015;2015:121735. [PMID: 26664751 PMCID: PMC4664791 DOI: 10.1155/2015/121735] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Subscribe] [Scholar Register] [Received: 07/13/2015] [Accepted: 10/26/2015] [Indexed: 01/30/2023]

Weller M, Chateau A, Giroudeau R. Exact approaches for scaffolding. BMC Bioinformatics 2015;16 Suppl 14:S2. [PMID: 26451725 PMCID: PMC4603742 DOI: 10.1186/1471-2105-16-s14-s2] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022] Open

Anselmetti Y, Berry V, Chauve C, Chateau A, Tannier E, Bérard S. Ancestral gene synteny reconstruction improves extant species scaffolding. BMC Genomics 2015;16 Suppl 10:S11. [PMID: 26450761 PMCID: PMC4603332 DOI: 10.1186/1471-2164-16-s10-s11] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022] Open

Farrant GK, Hoebeke M, Partensky F, Andres G, Corre E, Garczarek L. WiseScaffolder: an algorithm for the semi-automatic scaffolding of Next Generation Sequencing data. BMC Bioinformatics 2015;16:281. [PMID: 26335184 PMCID: PMC4559175 DOI: 10.1186/s12859-015-0705-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/06/2015] [Accepted: 08/17/2015] [Indexed: 01/12/2023] Open

Abstract

Background

The sequencing depth provided by high-throughput sequencing technologies has allowed a rise in the number of de novo sequenced genomes that could potentially be closed without further sequencing. However, genome scaffolding and closure require costly human supervision that often results in genomes being published as drafts. A number of automatic scaffolders were recently released, which improved the global quality of genomes published in the last few years. Yet, none of them reach the efficiency of manual scaffolding.

Results

Here, we present an innovative semi-automatic scaffolder that additionally helps with chimerae resolution and generates valuable contig maps and outputs for manual improvement of the automatic scaffolding. This software was tested on the newly sequenced marine cyanobacterium Synechococcus sp. WH8103 as well as two reference datasets used in previous studies, Rhodobacter sphaeroides and Homo sapiens chromosome 14 (http://gage.cbcb.umd.edu/). The quality of resulting scaffolds was compared to that of three other stand-alone scaffolders: SSPACE, SOPRA and SCARPA. For all three model organisms, WiseScaffolder produced better results than other scaffolders in terms of contiguity statistics (number of genome fragments, N50, LG50, etc.) and, in the case of WH8103, the reliability of the scaffolds was confirmed by whole genome alignment against a closely related reference genome. We also propose an efficient computer-assisted strategy for manual improvement of the scaffolding, using outputs generated by WiseScaffolder, as well as for genome finishing that in our hands led to the circularization of the WH8103 genome.

Conclusion

Altogether, WiseScaffolder proved more efficient than three other scaffolders for both prokaryotic and eukaryotic genomes and is thus likely applicable to most genome projects. The scaffolding pipeline described here should be of particular interest to biologists wishing to take advantage of the high added value of complete genomes.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-015-0705-y) contains supplementary material, which is available to authorized users.
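
The contiguity statistics named in the WiseScaffolder abstract (N50, LG50) can be computed from a list of scaffold lengths alone. The sketch below is a generic illustration, not code from WiseScaffolder; the function name `nx_stats` and its parameters are assumptions. When `genome_size` is given, the threshold is taken relative to the expected genome size instead of the assembly size, which is how NG50 and LG50 are usually defined.

```python
def nx_stats(lengths, fraction=0.5, genome_size=None):
    """Return (Nx, Lx) for a collection of contig or scaffold lengths.

    Nx is the length of the shortest sequence in the smallest set of
    longest sequences that together reach `fraction` of the total.
    Lx is the number of sequences in that set.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    # Use the expected genome size if supplied (NG50/LG50), else assembly size.
    total = genome_size if genome_size is not None else sum(ordered)
    threshold = fraction * total
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    # The assembly is too small to reach the requested fraction.
    return None, None
```

For example, `nx_stats([80, 70, 50, 40, 30, 20, 10])` gives an N50 of 70 reached with 2 scaffolds, while passing `genome_size=400` moves the threshold to 200 and gives 50 with 3 scaffolds.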

Lai B, Wang F, Wang X, Duan L, Zhu H. InteMAP: Integrated metagenomic assembly pipeline for NGS short reads. BMC Bioinformatics 2015;16:244. [PMID: 26250558 PMCID: PMC4545859 DOI: 10.1186/s12859-015-0686-x] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/21/2014] [Accepted: 07/24/2015] [Indexed: 12/03/2022] Open

Abstract

Background

Next-generation sequencing (NGS) has greatly facilitated metagenomic analysis but also raised new challenges for metagenomic DNA sequence assembly, owing to its high-throughput nature and extremely short reads generated by sequencers such as Illumina. To date, how to generate a high-quality draft assembly for metagenomic sequencing projects has not been fully addressed.

Results

We conducted a comprehensive assessment of state-of-the-art de novo assemblers and found that the performance of each assembler depends critically on the sequencing depth. To address this problem, we developed a pipeline named InteMAP that integrates three assemblers, ABySS, IDBA-UD and CABOG, which were found to complement each other in assembling metagenomic sequences. The pipeline decides which assembly approach to use for each short read according to a sequencing coverage estimation algorithm, providing an automatic platform suited to assembling real metagenomic NGS data with an uneven distribution of sequencing depth. By comparing the performance of InteMAP with current assemblers on both synthetic and real NGS metagenomic data, we demonstrated that InteMAP achieves better performance, with a longer total contig length and higher contiguity, and that its assemblies contain more genes than those of other assemblers.

Conclusions

We developed a de novo pipeline, named InteMAP, that integrates existing tools for metagenomic assembly. The pipeline outperforms previous methods on metagenomic assembly by providing a longer total contig length, higher contiguity and coverage of more genes. InteMAP could therefore be a useful tool for the metagenomics research community.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-015-0686-x) contains supplementary material, which is available to authorized users.
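
The InteMAP abstract describes routing each short read to an assembly approach according to an estimate of its sequencing coverage. One common way to approximate per-read coverage is the median count of the read's k-mers across the whole dataset. The sketch below illustrates that idea only; it is not InteMAP's estimator, and the function names, the value of `k`, the thresholds `low` and `high`, and the generic bin labels are all assumptions. The abstract does not say which assembler handles which depth range, so the bins are deliberately not mapped to ABySS, IDBA-UD or CABOG.

```python
from collections import Counter
from statistics import median


def count_kmers(reads, k):
    """Count every k-mer occurrence across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def read_coverage(read, counts, k):
    """Estimate a read's coverage as the median count of its k-mers."""
    kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
    if not kmers:
        return 0  # read shorter than k: no estimate possible
    return median(counts[kmer] for kmer in kmers)


def bin_reads(reads, k=4, low=2, high=10):
    """Split reads into coverage bins; thresholds are hypothetical."""
    counts = count_kmers(reads, k)
    bins = {"low": [], "medium": [], "high": []}
    for read in reads:
        coverage = read_coverage(read, counts, k)
        if coverage < low:
            bins["low"].append(read)
        elif coverage < high:
            bins["medium"].append(read)
        else:
            bins["high"].append(read)
    return bins
```

Each bin could then be handed to whichever assembler performs best at that depth. Real data would need a much larger `k` and thresholds tuned to the dataset.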
