Reference Citation Analysis

For: Zhang Y, Sun Y, Cole JR. A scalable and accurate targeted gene assembly tool (SAT-Assembler) for next-generation sequencing data. PLoS Comput Biol 2014;10:e1003737. [PMID: 25122209 PMCID: PMC4133164 DOI: 10.1371/journal.pcbi.1003737] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Received: 10/28/2013] [Accepted: 06/05/2014] [Indexed: 11/21/2022]

Cited by Other Article(s)

Ramanauskas K, Igić B. kakapo: easy extraction and annotation of genes from raw RNA-seq reads. PeerJ 2023;11:e16456. [PMID: 38034874 PMCID: PMC10688300 DOI: 10.7717/peerj.16456] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 01/20/2023] [Accepted: 10/23/2023] [Indexed: 12/02/2023]

Rather MA, Agarwal D, Bhat TA, Khan IA, Zafar I, Kumar S, Amin A, Sundaray JK, Qadri T. Bioinformatics approaches and big data analytics opportunities in improving fisheries and aquaculture. Int J Biol Macromol 2023;233:123549. [PMID: 36740117 DOI: 10.1016/j.ijbiomac.2023.123549] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/15/2022] [Revised: 01/30/2023] [Accepted: 01/31/2023] [Indexed: 02/05/2023]

Wafula EK, Zhang H, Von Kuster G, Leebens-Mack JH, Honaas LA, dePamphilis CW. PlantTribes2: Tools for comparative gene family analysis in plant genomics. Front Plant Sci 2023;13:1011199. [PMID: 36798801 PMCID: PMC9928214 DOI: 10.3389/fpls.2022.1011199] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 08/03/2022] [Accepted: 12/02/2022] [Indexed: 05/12/2023]

Abstract

Plant genome-scale resources are being generated at an increasing rate as sequencing technologies continue to improve and raw data costs continue to fall; however, the cost of downstream analyses remains large. This has resulted in a considerable range of genome assembly and annotation qualities across plant genomes due to their varying sizes, complexity, and the technology used for the assembly and annotation. To effectively work across genomes, researchers increasingly rely on comparative genomic approaches that integrate across plant community resources and data types. Such efforts have aided the genome annotation process and yielded novel insights into the evolutionary history of genomes and gene families, including complex non-model organisms. The essential tools to achieve these insights rely on gene family analysis at a genome-scale, but they are not well integrated for rapid analysis of new data, and the learning curve can be steep. Here we present PlantTribes2, a scalable, easily accessible, highly customizable, and broadly applicable gene family analysis framework with multiple entry points including user provided data. It uses objective classifications of annotated protein sequences from existing, high-quality plant genomes for comparative and evolutionary studies. PlantTribes2 can improve transcript models and then sort them, either genome-scale annotations or individual gene coding sequences, into pre-computed orthologous gene family clusters with rich functional annotation information. Then, for gene families of interest, PlantTribes2 performs downstream analyses and customizable visualizations including, (1) multiple sequence alignment, (2) gene family phylogeny, (3) estimation of synonymous and non-synonymous substitution rates among homologous sequences, and (4) inference of large-scale duplication events. 
We give examples of PlantTribes2 applications in functional genomic studies of economically important plant families, namely transcriptomics in the weedy Orobanchaceae and a core orthogroup analysis (CROG) in Rosaceae. PlantTribes2 is freely available for use within the main public Galaxy instance and can be downloaded from GitHub or Bioconda. Importantly, PlantTribes2 can be readily adapted for use with genomic and transcriptomic data from any kind of organism.

Tu M, Zeng J, Zhang J, Fan G, Song G. Unleashing the power within short-read RNA-seq for plant research: Beyond differential expression analysis and toward regulomics. Front Plant Sci 2022;13:1038109. [PMID: 36570898 PMCID: PMC9773216 DOI: 10.3389/fpls.2022.1038109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2022] [Accepted: 11/21/2022] [Indexed: 06/17/2023]

Dufault‐Thompson K, Jiang X. Applications of de Bruijn graphs in microbiome research. iMeta 2022;1:e4. [PMID: 38867733 PMCID: PMC10989854 DOI: 10.1002/imt2.4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2021] [Revised: 01/24/2022] [Accepted: 01/24/2022] [Indexed: 06/14/2024]

Tadmor AD, Phillips R. MCRL: using a reference library to compress a metagenome into a non-redundant list of sequences, considering viruses as a case study. Bioinformatics 2022;38:631-647. [PMID: 34636854 PMCID: PMC10060711 DOI: 10.1093/bioinformatics/btab703] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/29/2019] [Revised: 10/03/2021] [Accepted: 10/07/2021] [Indexed: 02/03/2023]

Music of metagenomics-a review of its applications, analysis pipeline, and associated tools. Funct Integr Genomics 2021;22:3-26. [PMID: 34657989 DOI: 10.1007/s10142-021-00810-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/29/2021] [Revised: 09/25/2021] [Accepted: 10/03/2021] [Indexed: 10/20/2022]

Lactation Associated Genes Revealed in Holstein Dairy Cows by Weighted Gene Co-Expression Network Analysis (WGCNA). Animals (Basel) 2021;11:ani11020314. [PMID: 33513831 PMCID: PMC7911360 DOI: 10.3390/ani11020314] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 01/04/2021] [Accepted: 01/23/2021] [Indexed: 02/07/2023]

Abstract

Simple Summary

Weighted gene coexpression network analysis (WGCNA) is a novel approach that can quickly analyze the relationships between genes and traits. In the past few years, studies on the gene expression changes of dairy cow mammary glands were only based on transcriptome comparisons between two lactation stages. Few studies focused on the relationships between gene expression of the dairy cow mammary gland and lactation stage or milk composition in a lactation cycle. In this study, we measured milk yield and composition over a lactation cycle. For the first time, we constructed a gene coexpression network using WGCNA on the basis of 18 gene expression profiles during six stages of a lactation cycle by transcriptome sequencing, generating 10 specific modules. Genes in each module were subjected to gene ontology (GO) annotation and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis. Module–trait relationship analysis showed a series of potential candidates related to milk yield and composition. The current study provides an important theoretical basis for the further molecular breeding of dairy cows.

Abstract

Weighted gene coexpression network analysis (WGCNA) is a novel approach that can quickly analyze the relationships between genes and traits. In this study, the milk yield, lactose, fat, and protein of Holstein dairy cows were detected in a lactation cycle. Meanwhile, a total of 18 gene expression profiles were detected using mammary glands from six lactation stages (day 7 to calving, −7 d; day 30 post-calving, 30 d; day 90 post-calving, 90 d; day 180 post-calving, 180 d; day 270 post-calving, 270 d; day 315 post-calving, 315 d). On the basis of the 18 profiles, WGCNA identified for the first time 10 significant modules that may be related to lactation stage, milk yield, and the main milk composition content. Genes in the 10 significant modules were examined with gene ontology (GO) annotation and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis. The results revealed that the galactose metabolism pathway was a potential candidate for milk yield and milk lactose synthesis. In −7 d, ion transportation was more frequent and cell proliferation related terms became active. In late lactation, the suppressor of cytokine signaling 3 (SOCS3) might play a role in apoptosis. The sphingolipid signaling pathway was a potential candidate for milk fat synthesis. Dairy cows at 315 d were in a period of cell proliferation. Another notable phenomenon was that nonlactating dairy cows had a more regular circadian rhythm after a cycle of lactation. The results provide an important theoretical basis for the further molecular breeding of dairy cows.

Schneijderberg M, Cheng X, Franken C, de Hollander M, van Velzen R, Schmitz L, Heinen R, Geurts R, van der Putten WH, Bezemer TM, Bisseling T. Quantitative comparison between the rhizosphere effect of Arabidopsis thaliana and co-occurring plant species with a longer life history. ISME J 2020;14:2433-2448. [PMID: 32641729 DOI: 10.1038/s41396-020-0695-2] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 10/17/2019] [Revised: 05/14/2020] [Accepted: 05/28/2020] [Indexed: 12/26/2022]

Affiliation(s)

Martinus Schneijderberg: Department of Plant Sciences, Laboratory of Molecular Biology, Wageningen University, Droevendaalsesteeg 1, 6708 PB, Wageningen, The Netherlands
Xu Cheng: Department of Plant Sciences, Laboratory of Molecular Biology, Wageningen University, Droevendaalsesteeg 1, 6708 PB, Wageningen, The Netherlands
Carolien Franken: Department of Plant Sciences, Laboratory of Molecular Biology, Wageningen University, Droevendaalsesteeg 1, 6708 PB, Wageningen, The Netherlands
Mattias de Hollander: Department of Terrestrial Ecology, Netherlands Institute of Ecology (NIOO-KNAW), Droevendaalsesteeg 10, 6708 PB, Wageningen, The Netherlands
Robin van Velzen: Department of Plant Sciences, Laboratory of Molecular Biology, Wageningen University, Droevendaalsesteeg 1, 6708 PB, Wageningen, The Netherlands
Lucas Schmitz: Department of Plant Sciences, Laboratory of Molecular Biology, Wageningen University, Droevendaalsesteeg 1, 6708 PB, Wageningen, The Netherlands
Robin Heinen: Department of Terrestrial Ecology, Netherlands Institute of Ecology (NIOO-KNAW), Droevendaalsesteeg 10, 6708 PB, Wageningen, The Netherlands
Rene Geurts: Department of Plant Sciences, Laboratory of Molecular Biology, Wageningen University, Droevendaalsesteeg 1, 6708 PB, Wageningen, The Netherlands
Wim H van der Putten: Department of Terrestrial Ecology, Netherlands Institute of Ecology (NIOO-KNAW), Droevendaalsesteeg 10, 6708 PB, Wageningen, The Netherlands; Laboratory of Nematology, Wageningen University, Droevendaalsesteeg 1, 6708 PB, Wageningen, The Netherlands
T Martijn Bezemer: Department of Terrestrial Ecology, Netherlands Institute of Ecology (NIOO-KNAW), Droevendaalsesteeg 10, 6708 PB, Wageningen, The Netherlands; Institute of Biology, Section Plant Ecology and Phytochemistry, Leiden University, P.O. Box 9505, 2300 RA, Leiden, The Netherlands
Ton Bisseling: Department of Plant Sciences, Laboratory of Molecular Biology, Wageningen University, Droevendaalsesteeg 1, 6708 PB, Wageningen, The Netherlands

David L, Vicedomini R, Richard H, Carbone A. Targeted domain assembly for fast functional profiling of metagenomic datasets with S3A. Bioinformatics 2020;36:3975-3981. [PMID: 32330240 PMCID: PMC7332565 DOI: 10.1093/bioinformatics/btaa272] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 10/16/2019] [Revised: 04/11/2020] [Accepted: 04/17/2020] [Indexed: 11/13/2022]

Hofreiter M, Hartmann S. Reconstructing protein-coding sequences from ancient DNA. Methods Enzymol 2020;642:21-33. [DOI: 10.1016/bs.mie.2020.05.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]

Guo J, Quensen JF, Sun Y, Wang Q, Brown CT, Cole JR, Tiedje JM. Review, Evaluation, and Directions for Gene-Targeted Assembly for Ecological Analyses of Metagenomes. Front Genet 2019;10:957. [PMID: 31749830 PMCID: PMC6843070 DOI: 10.3389/fgene.2019.00957] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 04/23/2019] [Accepted: 09/09/2019] [Indexed: 12/28/2022]

Abstract

Shotgun metagenomics has greatly advanced our understanding of microbial communities over the last decade. Metagenomic analyses often include assembly and genome binning, computationally daunting tasks especially for big data from complex environments such as soil and sediments. In many studies, however, only a subset of genes and pathways involved in specific functions are of interest; thus, it is not necessary to attempt global assembly. In addition, methods that target genes can be computationally more efficient and produce more accurate assembly by leveraging rich databases, especially for those genes that are of broad interest such as those involved in biogeochemical cycles, biodegradation, and antibiotic resistance or used as phylogenetic markers. Here, we review six gene-targeted assemblers with unique algorithms for extracting and/or assembling targeted genes: Xander, MegaGTA, SAT-Assembler, HMM-GRASPx, GenSeed-HMM, and MEGAN. We tested these tools using two datasets with known genomes, a synthetic community of artificial reads derived from the genomes of 17 bacteria, shotgun sequence data from a mock community with 48 bacteria and 16 archaea genomes, and a large soil shotgun metagenomic dataset. We compared assemblies of a universal single copy gene (rplB) and two N cycle genes (nifH and nirK). We measured their computational efficiency, sensitivity, specificity, and chimera rate and found Xander and MegaGTA, which both use a probabilistic graph structure to model the genes, have the best overall performance with all three datasets, although MEGAN, a reference matching assembler, had better sensitivity with synthetic and mock community members chosen from its reference collection. Also, Xander and MegaGTA are the only tools that include post-assembly scripts tuned for common molecular ecology and diversity analyses. 
Additionally, we provide a mathematical model for estimating the probability of assembling targeted genes in a metagenome for estimating required sequencing depth.

Gardner PP, Watson RJ, Morgan XC, Draper JL, Finn RD, Morales SE, Stott MB. Identifying accurate metagenome and amplicon software via a meta-analysis of sequence to taxonomy benchmarking studies. PeerJ 2019;7:e6160. [PMID: 30631651 PMCID: PMC6322486 DOI: 10.7717/peerj.6160] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 11/08/2017] [Accepted: 11/14/2018] [Indexed: 01/26/2023]

Mitra S. Multiple Data Analyses and Statistical Approaches for Analyzing Data from Metagenomic Studies and Clinical Trials. Methods Mol Biol 2019;1910:605-634. [PMID: 31278679 DOI: 10.1007/978-1-4939-9074-0_20] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 06/09/2023]

Bengtsson-Palme J, Larsson DGJ, Kristiansson E. Using metagenomics to investigate human and environmental resistomes. J Antimicrob Chemother 2018;72:2690-2703. [PMID: 28673041 DOI: 10.1093/jac/dkx199] [Citation(s) in RCA: 67] [Impact Index Per Article: 9.6] [Indexed: 12/12/2022]

A sensitive short read homology search tool for paired-end read sequencing data. BMC Bioinformatics 2017;18:414. [PMID: 29072140 PMCID: PMC5657049 DOI: 10.1186/s12859-017-1826-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/10/2022]

Abstract

Background

Homology search is still a significant step in functional analysis for genomic data. Profile Hidden Markov Model-based homology search has been widely used in protein domain analysis in many different species. In particular, with the fast accumulation of transcriptomic data of non-model species and metagenomic data, profile homology search is widely adopted in integrated pipelines for functional analysis. While the state-of-the-art tool HMMER has achieved high sensitivity and accuracy in domain annotation, the sensitivity of HMMER on short reads declines rapidly. The low sensitivity on short read homology search can lead to inaccurate domain composition and abundance computation. Our experimental results showed that half of the reads were missed by HMMER for an RNA-Seq dataset. Thus, there is a need for better methods to improve the homology search performance for short reads.

Results

We introduce a profile homology search tool named Short-Pair that is designed for short paired-end reads. By using an approximate Bayesian approach employing distribution of fragment lengths and alignment scores, Short-Pair can retrieve the missing end and determine true domains. In particular, Short-Pair increases the accuracy in aligning short reads that are part of remote homologs. We applied Short-Pair to an RNA-Seq dataset and a metagenomic dataset and quantified its sensitivity and accuracy on homology search. The experimental results show that Short-Pair can achieve better overall performance than the state-of-the-art methodology of profile homology search.

Conclusions

Short-Pair is best used for next-generation sequencing (NGS) data that lack reference genomes. It provides a complementary paired-end read homology search tool to HMMER. The source code is freely available at https://sourceforge.net/projects/short-pair/.

Li D, Huang Y, Leung CM, Luo R, Ting HF, Lam TW. MegaGTA: a sensitive and accurate metagenomic gene-targeted assembler using iterative de Bruijn graphs. BMC Bioinformatics 2017;18:408. [PMID: 29072142 PMCID: PMC5657035 DOI: 10.1186/s12859-017-1825-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Indexed: 11/25/2022]

Abstract

Background

The recent release of the gene-targeted metagenomics assembler Xander has demonstrated that using a trained Hidden Markov Model (HMM) to guide the traversal of the de Bruijn graph gives an obvious advantage over other assembly methods. Xander, as a pilot study, has a lot of room for improvement. Apart from its slow speed, Xander uses only one k-mer size for graph construction, and any choice of k compromises either sensitivity or accuracy. Xander uses a Bloom-filter representation of the de Bruijn graph to achieve a lower memory footprint. Bloom filters bring in false positives, and it is not clear how this would impact the quality of assembly. Xander does not keep track of the multiplicity of k-mers, which would have been an effective way to differentiate between erroneous k-mers and correct k-mers.

Results

In this paper, we present a new gene-targeted assembler, MegaGTA, which attempts to improve on Xander in several respects. Quality-wise, it utilizes iterative de Bruijn graphs to take full advantage of multiple k-mer sizes to make the best of both sensitivity and accuracy. Computation-wise, it employs succinct de Bruijn graphs (SdBG) to achieve a low memory footprint and high speed (the latter benefits from a highly efficient parallel algorithm for constructing SdBG). Unlike Bloom filters, an SdBG is an exact representation of a de Bruijn graph. It enables MegaGTA to avoid false-positive contigs and to easily incorporate the multiplicity of k-mers for building a better HMM.

We have compared MegaGTA and Xander on an HMP-defined mock metagenomic dataset, and showed that MegaGTA excelled in both sensitivity and accuracy. On a large rhizosphere soil metagenomic sample (327Gbp), MegaGTA produced 9.7–19.3% more contigs than Xander, and these contigs were assigned to 10–25% more gene references. In our experiments, MegaGTA, depending on the number of k-mers used, is two to ten times faster than Xander.

Conclusion

MegaGTA improves on the algorithm of Xander and achieves higher sensitivity, accuracy and speed. Moreover, it is capable of assembling gene sequences from ultra-large metagenomic datasets. Its source code is freely available at https://github.com/HKU-BAL/megagta.
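
The abstract above argues that an exact de Bruijn graph can carry k-mer multiplicities, which help separate erroneous k-mers from correct ones. The following minimal Python sketch illustrates that idea only. It is not MegaGTA's implementation: it uses ordinary dictionaries rather than a succinct de Bruijn graph, and the function names `count_kmers` and `build_de_bruijn`, the `min_multiplicity` threshold, and the toy reads are all hypothetical.

```python
from collections import Counter, defaultdict


def count_kmers(reads, k):
    """Count how many times each k-mer occurs across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def build_de_bruijn(counts, min_multiplicity=2):
    """Build an exact de Bruijn graph from k-mer counts.

    Nodes are (k-1)-mers. Each k-mer seen at least `min_multiplicity`
    times adds one edge from its prefix to its suffix; rarer k-mers are
    treated as likely sequencing errors and dropped (an assumed heuristic).
    """
    graph = defaultdict(set)
    for kmer, n in counts.items():
        if n >= min_multiplicity:
            graph[kmer[:-1]].add(kmer[1:])
    return graph


# Toy data: two identical reads plus one read with a single-base difference.
reads = ["ACGTAC", "ACGTAC", "ACGTTC"]
counts = count_kmers(reads, k=4)
graph = build_de_bruijn(counts, min_multiplicity=2)

for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

In this toy input the k-mers unique to the third read occur only once, so they fall below the threshold and never enter the graph. A Bloom filter answers only membership queries, so a count-based filter of this kind would need an additional counting structure.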

Gregor I, Schönhuth A, McHardy AC. Snowball: strain aware gene assembly of metagenomes. Bioinformatics 2017;32:i649-i657. [PMID: 27587685 DOI: 10.1093/bioinformatics/btw426] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Indexed: 11/13/2022]

Alves JMP, de Oliveira AL, Sandberg TOM, Moreno-Gallego JL, de Toledo MAF, de Moura EMM, Oliveira LS, Durham AM, Mehnert DU, Zanotto PMDA, Reyes A, Gruber A. GenSeed-HMM: A Tool for Progressive Assembly Using Profile HMMs as Seeds and its Application in Alpavirinae Viral Discovery from Metagenomic Data. Front Microbiol 2016;7:269. [PMID: 26973638 PMCID: PMC4777721 DOI: 10.3389/fmicb.2016.00269] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 10/29/2015] [Accepted: 02/19/2016] [Indexed: 01/01/2023]

Abstract

This work reports the development of GenSeed-HMM, a program that implements seed-driven progressive assembly, an approach to reconstruct specific sequences from unassembled data, starting from short nucleotide or protein seed sequences or profile Hidden Markov Models (HMM). The program can use any one of a number of sequence assemblers. Assembly is performed in multiple steps and relatively few reads are used in each cycle, consequently the program demands low computational resources. As a proof-of-concept and to demonstrate the power of HMM-driven progressive assemblies, GenSeed-HMM was applied to metagenomic datasets in the search for diverse ssDNA bacteriophages from the recently described Alpavirinae subfamily. Profile HMMs were built using Alpavirinae-specific regions from multiple sequence alignments (MSA) using either the viral protein 1 (VP1; major capsid protein) or VP4 (genome replication initiation protein). These profile HMMs were used by GenSeed-HMM (running Newbler assembler) as seeds to reconstruct viral genomes from sequencing datasets of human fecal samples. All contigs obtained were annotated and taxonomically classified using similarity searches and phylogenetic analyses. The most specific profile HMM seed enabled the reconstruction of 45 partial or complete Alpavirinae genomic sequences. A comparison with conventional (global) assembly of the same original dataset, using Newbler in a standalone execution, revealed that GenSeed-HMM outperformed global genomic assembly in several metrics employed. This approach is capable of detecting organisms that have not been used in the construction of the profile HMM, which opens up the possibility of diagnosing novel viruses, without previous specific information, constituting a de novo diagnosis. Additional applications include, but are not limited to, the specific assembly of extrachromosomal elements such as plastid and mitochondrial genomes from metagenomic data. 
Profile HMM seeds can also be used to reconstruct specific protein coding genes for gene diversity studies, and to determine all possible gene variants present in a metagenomic sample. Such surveys could be useful to detect the emergence of drug-resistance variants in sensitive environments such as hospitals and animal production facilities, where antibiotics are regularly used. Finally, GenSeed-HMM can be used as an adjunct for gap closure on assembly finishing projects, by using multiple contig ends as anchored seeds.
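
The seed-driven progressive assembly described in this abstract can be pictured with a toy example. The Python sketch below is a deliberately simplified illustration and not the GenSeed-HMM algorithm: it assumes a plain nucleotide seed instead of a profile HMM, uses exact suffix-prefix matching in place of a similarity search and an external assembler, and extends the contig in one direction only. The names `extend_right` and `progressive_assembly`, the `min_overlap` and `max_cycles` parameters, and the reads are hypothetical.

```python
def extend_right(contig, reads, min_overlap):
    """Extend the contig by the read with the longest suffix-prefix overlap.

    Returns the extended contig, or None if no read overlaps the contig
    end by at least `min_overlap` bases while adding at least one base.
    """
    best_read, best_len = None, 0
    for read in reads:
        # The overlap must leave at least one new base to append.
        max_len = min(len(contig), len(read) - 1)
        for olen in range(max_len, min_overlap - 1, -1):
            if contig.endswith(read[:olen]):
                if olen > best_len:
                    best_read, best_len = read, olen
                break
    if best_read is None:
        return None
    return contig + best_read[best_len:]


def progressive_assembly(seed, reads, min_overlap=3, max_cycles=10):
    """Grow a contig from a seed, one recruitment-and-extension cycle at a time."""
    contig = seed
    for _ in range(max_cycles):
        extended = extend_right(contig, reads, min_overlap)
        if extended is None:
            break  # no read overlaps the current contig end
        contig = extended
    return contig


reads = ["GGCTA", "CTAGT", "TTTTT"]
print(progressive_assembly("ATGGC", reads, min_overlap=3))
```

Each cycle inspects only reads that overlap the current contig end, which mirrors, in a very reduced form, why a progressive approach can work with few reads per cycle. The `max_cycles` bound keeps the loop finite when repeats would otherwise allow endless extension.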

Yuan C, Lei J, Cole J, Sun Y. Reconstructing 16S rRNA genes in metagenomic data. Bioinformatics 2015;31:i35-43. [PMID: 26072503 PMCID: PMC4765874 DOI: 10.1093/bioinformatics/btv231] [Citation(s) in RCA: 93] [Impact Index Per Article: 9.3] [Indexed: 12/29/2022]

Achawanantakun R, Chen J, Sun Y, Zhang Y. LncRNA-ID: Long non-coding RNA IDentification using balanced random forests. Bioinformatics 2015;31:3897-905. [PMID: 26315901 DOI: 10.1093/bioinformatics/btv480] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.6] [Received: 11/17/2014] [Accepted: 08/07/2015] [Indexed: 02/06/2023]

Sim M, Kim J. Metagenome assembly through clustering of next-generation sequencing data using protein sequences. J Microbiol Methods 2015;109:180-7. [PMID: 25572018 DOI: 10.1016/j.mimet.2015.01.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2014] [Revised: 01/03/2015] [Accepted: 01/03/2015] [Indexed: 11/16/2022]

Wang Q, Fish JA, Gilman M, Sun Y, Brown CT, Tiedje JM, Cole JR. Xander: employing a novel method for efficient gene-targeted metagenomic assembly. Microbiome 2015;3:32. [PMID: 26246894 PMCID: PMC4526283 DOI: 10.1186/s40168-015-0093-6] [Citation(s) in RCA: 59] [Impact Index Per Article: 5.9] [Received: 03/13/2015] [Accepted: 07/03/2015] [Indexed: 05/18/2023]