Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Wei ZG, Zhang SW, Zhang YZ. DMclust, a Density-based Modularity Method for Accurate OTU Picking of 16S rRNA Sequences. Mol Inform 2017;36. [PMID: 28586119 DOI: 10.1002/minf.201600059] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 04/19/2017] [Accepted: 04/25/2017] [Indexed: 11/08/2022]

Cited by Other Article(s)

Wei ZG, Bu PY, Zhang XD, Liu F, Qian Y, Wu FX. invMap: a sensitive mapping tool for long noisy reads with inversion structural variants. Bioinformatics 2023;39:btad726. [PMID: 38058196 PMCID: PMC11320709 DOI: 10.1093/bioinformatics/btad726] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2023] [Revised: 11/02/2023] [Accepted: 12/05/2023] [Indexed: 12/08/2023] Open

Cao M, Peng Q, Wei ZG, Liu F, Hou YF. EdClust: A heuristic sequence clustering method with higher sensitivity. J Bioinform Comput Biol 2021;20:2150036. [PMID: 34939905 DOI: 10.1142/s0219720021500360] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022]

Wei ZG, Zhang XD, Cao M, Liu F, Qian Y, Zhang SW. Comparison of Methods for Picking the Operational Taxonomic Units From Amplicon Sequences. Front Microbiol 2021;12:644012. [PMID: 33841367 PMCID: PMC8024490 DOI: 10.3389/fmicb.2021.644012] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 12/19/2020] [Accepted: 02/17/2021] [Indexed: 12/31/2022] Open

Wei ZG, Zhang SW, Liu F. smsMap: mapping single molecule sequencing reads by locating the alignment starting positions. BMC Bioinformatics 2020;21:341. [PMID: 32753028 PMCID: PMC7430848 DOI: 10.1186/s12859-020-03698-w] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 04/26/2020] [Accepted: 07/23/2020] [Indexed: 01/09/2023] Open

Xia Y. Correlation and association analyses in microbiome study integrating multiomics in health and disease. PROGRESS IN MOLECULAR BIOLOGY AND TRANSLATIONAL SCIENCE 2020;171:309-491. [PMID: 32475527 DOI: 10.1016/bs.pmbts.2020.04.003] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Indexed: 02/07/2023]

Abstract

Correlation and association analyses are among the most widely used statistical methods in research fields, including microbiome and integrative multiomics studies. Correlation and association have two implications: dependence and co-occurrence. Microbiome data are structured as a phylogenetic tree and have several unique characteristics, including high dimensionality, compositionality, sparsity with excess zeros, and heterogeneity. These unique characteristics cause several statistical issues when analyzing microbiome data and integrating multiomics data, such as large p and small n, dependency, overdispersion, and zero-inflation. In microbiome research, on the one hand, classic correlation and association methods are still applied in real studies and used for the development of new methods; on the other hand, new methods have been developed to target statistical issues arising from the unique characteristics of microbiome data. Here, we first provide a comprehensive view of classic and newly developed univariate correlation and association-based methods. We discuss the appropriateness and limitations of using classic methods and demonstrate how the newly developed methods mitigate the issues of microbiome data. Second, we emphasize that the concepts of correlation and association analyses have been shifted by the introduction of network analysis, microbe-metabolite interactions, functional analysis, etc. Third, we introduce multivariate correlation and association-based methods, which are organized by the categories of exploratory, interpretive, and discriminatory analyses and classification methods. Fourth, we focus on the hypothesis testing of univariate and multivariate regression-based association methods, including alpha and beta diversity-based, count-based, and relative abundance (or compositional)-based association analyses. We demonstrate the characteristics and limitations of each approach. Fifth, we introduce two specific microbiome-based methods: phylogenetic tree-based association analysis and testing for survival outcomes. Sixth, we provide an overall view of longitudinal methods in the analysis of microbiome and omics data, which cover standard, static, regression-based time series methods, principal trend analysis, and newly developed univariate overdispersed and zero-inflated as well as multivariate distance/kernel-based longitudinal models. Finally, we comment on current association analysis and its future direction in microbiome and multiomics studies.

Wei ZG, Zhang SW. DMSC: A Dynamic Multi-Seeds Method for Clustering 16S rRNA Sequences Into OTUs. Front Microbiol 2019;10:428. [PMID: 30915052 PMCID: PMC6422886 DOI: 10.3389/fmicb.2019.00428] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 01/05/2019] [Accepted: 02/19/2019] [Indexed: 12/30/2022] Open

Abstract

Next-generation sequencing (NGS)-based 16S rRNA sequencing, which combines PCR amplification with NGS technology, is a cost-effective technique that has been successfully used to study the phylogeny and taxonomy of samples from complex microbiomes or environments. Clustering 16S rRNA sequences into operational taxonomic units (OTUs) is often the first step for many downstream analyses. Heuristic clustering is one of the most widely employed approaches for generating OTUs. However, most heuristic OTU clustering methods select just one seed sequence to represent each cluster, so their outcomes suffer from either overestimation of the number of OTUs or sensitivity to sequencing errors. In this paper, we present a novel dynamic multi-seeds clustering method (namely DMSC) to pick OTUs. DMSC first heuristically generates clusters according to the distance threshold. When the size of a cluster reaches the pre-defined minimum size, DMSC selects the multi-core sequences (MCS) as the seeds, defined as the n-core sequences (n ≥ 3) in which the distance between any two sequences is less than the distance threshold. A new sequence is assigned to the corresponding cluster depending on its average distance to the MCS and the standard deviation of distances within the MCS. If a new sequence is added to the cluster, DMSC dynamically updates the MCS until no more sequences are merged into the cluster. DMSC was tested on several simulated and real-life sequence datasets and compared with traditional heuristic methods such as CD-HIT, UCLUST, and DBH. Experimental results in terms of the inferred number of OTUs, normalized mutual information (NMI), and Matthews correlation coefficient (MCC) demonstrate that DMSC can produce higher-quality clusters with low memory usage and reduce OTU overestimation. Additionally, DMSC is robust to sequencing errors. The DMSC software can be freely downloaded from https://github.com/NWPU-903PR/DMSC.
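The multi-seed idea in this abstract can be illustrated with a small Python sketch. It is not the DMSC implementation: the mismatch-fraction `distance`, the greedy `pick_core` selection, and the acceptance rule in `accepts` (mean distance to the core below the threshold plus one standard deviation of the within-core distances) are all assumptions chosen for illustration, since the abstract does not give the exact formulas.

```python
from itertools import combinations
from statistics import mean, pstdev


def distance(a: str, b: str) -> float:
    """Fraction of mismatched positions (assumes equal-length, pre-aligned sequences)."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-length sequences")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def pick_core(members, threshold, n_core=3):
    """Greedily pick n_core members that are pairwise closer than threshold."""
    core = []
    for seq in members:
        if all(distance(seq, c) < threshold for c in core):
            core.append(seq)
        if len(core) == n_core:
            return core
    return None  # no multi-core set can be formed yet


def accepts(cluster, seq, threshold):
    """Decide whether seq joins the cluster (hypothetical rule, not from the paper)."""
    core = cluster["core"]
    if core is None:
        # Small cluster: compare against the single seed, as in classic heuristics.
        return distance(seq, cluster["members"][0]) < threshold
    avg = mean([distance(seq, c) for c in core])
    spread = pstdev([distance(a, b) for a, b in combinations(core, 2)])
    # Assumed rule: allow one standard deviation of within-core spread.
    return avg < threshold + spread


def cluster_sequences(seqs, threshold=0.03, min_size=3):
    """Greedy clustering; the core is recomputed whenever a cluster grows."""
    clusters = []
    for seq in seqs:
        for cl in clusters:
            if accepts(cl, seq, threshold):
                cl["members"].append(seq)
                if len(cl["members"]) >= min_size:
                    cl["core"] = pick_core(cl["members"], threshold)
                break
        else:
            clusters.append({"members": [seq], "core": None})
    return clusters
```

Calling `cluster_sequences(reads, threshold=0.03)` on a list of aligned reads returns a list of dictionaries, each holding the cluster members and, once the cluster is large enough, its current core sequences.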

Marchet C, Lecompte L, Silva CD, Cruaud C, Aury JM, Nicolas J, Peterlongo P. De novo clustering of long reads by gene from transcriptomics data. Nucleic Acids Res 2019;47:e2. [PMID: 30260405 PMCID: PMC6326815 DOI: 10.1093/nar/gky834] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 04/06/2018] [Revised: 09/04/2018] [Accepted: 09/10/2018] [Indexed: 02/07/2023] Open

NPBSS: a new PacBio sequencing simulator for generating the continuous long reads with an empirical model. BMC Bioinformatics 2018;19:177. [PMID: 29788930 PMCID: PMC5964698 DOI: 10.1186/s12859-018-2208-0] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 03/06/2018] [Accepted: 05/15/2018] [Indexed: 12/29/2022] Open

Abstract

Background

The PacBio sequencing platform offers longer read lengths than second-generation sequencing technologies. It has revolutionized de novo genome assembly and enabled the automated reconstruction of reference-quality genomes. Because of its extremely wide range of applications, fast, high-fidelity sequencing simulators are in great demand to facilitate the development and comparison of downstream analysis tools. Although several simulators (e.g., PBSIM, SimLoRD and FASTQSim) specifically target the generation of PacBio libraries, the error rate of their simulated sequences is not well matched to the quality values of raw PacBio datasets, especially for PacBio’s continuous long reads (CLR).

Results

By analyzing the characteristic features of CLR data from PacBio SMRT (single molecule real time) sequencing, we developed a new PacBio sequencing simulator (called NPBSS) for producing CLR reads. The NPBSS simulator first samples the read sequences according to the log-normal distribution of read lengths, and chooses different base quality values with different proportions. Then, NPBSS computes the overall error probability of each base in the read sequence with an empirical model, and calculates the deletion, substitution and insertion probabilities from the overall error probability to generate the PacBio CLR reads. Alignment results demonstrate that NPBSS fits the error rate of PacBio CLR reads better than PBSIM and FASTQSim. In addition, the assembly results show that the simulated sequences of NPBSS are more like real PacBio CLR data.
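The simulation steps described above can be sketched in Python as follows. This is an illustration only, not the NPBSS code: the function name `simulate_read`, every default parameter value, and the fixed deletion/substitution/insertion shares in `err_split` are hypothetical, and the standard Phred relation p = 10^(-Q/10) stands in for the paper's empirical error model, which the abstract does not specify.

```python
import random


def simulate_read(reference, rng, mu=8.0, sigma=0.5,
                  quals=(5, 10, 15), qual_weights=(0.2, 0.5, 0.3),
                  err_split=(0.2, 0.1, 0.7)):
    """Return one simulated read as a string. All defaults are illustrative."""
    # 1. Draw the read length from a log-normal distribution,
    #    capped at the reference length.
    length = min(len(reference), max(1, int(rng.lognormvariate(mu, sigma))))
    start = rng.randrange(len(reference) - length + 1)
    template = reference[start:start + length]

    # Assumed shares of the overall error probability (they sum to 1).
    p_del, p_sub, p_ins = err_split
    read = []
    for base in template:
        # 2. Choose a base quality value with the given proportions.
        q = rng.choices(quals, weights=qual_weights)[0]
        # 3. Overall error probability: Phred relation used as a stand-in
        #    for the empirical model mentioned in the abstract.
        p_err = 10 ** (-q / 10)
        # 4. Split the overall error into deletion, substitution, insertion.
        r = rng.random()
        if r < p_err * p_del:
            continue  # deletion: skip this base
        elif r < p_err * (p_del + p_sub):
            read.append(rng.choice([b for b in "ACGT" if b != base]))
        elif r < p_err:
            read.append(base)
            read.append(rng.choice("ACGT"))  # insertion after the base
        else:
            read.append(base)  # no error
    return "".join(read)
```

Passing a seeded `random.Random` instance as `rng` makes the output reproducible, which is convenient when comparing downstream tools on the same simulated reads.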

Conclusion

The NPBSS simulator is convenient to use, with efficient computation and flexible parameter settings. The PacBio CLR reads it generates are more like real PacBio datasets.

Electronic supplementary material

The online version of this article (10.1186/s12859-018-2208-0) contains supplementary material, which is available to authorized users.

Humphries A, Daud A. The gut microbiota and immune checkpoint inhibitors. Hum Vaccin Immunother 2018;14:2178-2182. [PMID: 29494275 PMCID: PMC6183319 DOI: 10.1080/21645515.2018.1442970] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Indexed: 12/12/2022] Open