Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Zhou Y, Wang G, Zhang J, Li H. A Hypothesis Testing Based Method for Normalization and Differential Expression Analysis of RNA-Seq Data. PLoS One 2017;12:e0169594. [PMID: 28072846 PMCID: PMC5224994 DOI: 10.1371/journal.pone.0169594] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 09/08/2016] [Accepted: 12/18/2016] [Indexed: 12/22/2022]

Cited by Other Article(s)

Annotation depth confounds direct comparison of gene expression across species. BMC Bioinformatics 2021;22:499. [PMID: 34654362 PMCID: PMC8518172 DOI: 10.1186/s12859-021-04414-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/02/2021] [Accepted: 09/30/2021] [Indexed: 11/10/2022]

Zhou Y, Yang B, Wang J, Zhu J, Tian G. A scaling-free minimum enclosing ball method to detect differentially expressed genes for RNA-seq data. BMC Genomics 2021;22:479. [PMID: 34174824 PMCID: PMC8234728 DOI: 10.1186/s12864-021-07790-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/17/2020] [Accepted: 06/10/2021] [Indexed: 12/13/2022]

Abstract

Background

Identifying differentially expressed genes within the same species or between different species is an urgent need in biological and medical research. For RNA-seq data, systematic technical effects and different sequencing depths are usually encountered when conducting experiments. Normalization is regarded as an essential step in the discovery of biologically important changes in expression. The present methods usually involve normalization of the data with a scaling factor, followed by detection of significant genes. However, more than one scaling factor may exist because of the complexity of real data. Consequently, methods that normalize data by a single scaling factor may deliver suboptimal performance or may not even work. The development of modern machine learning techniques has provided a new perspective regarding discrimination between differentially expressed (DE) and non-DE genes. However, in reality, the non-DE genes comprise only a small set and may contain housekeeping genes (in the same species) or conserved orthologous genes (in different species). Therefore, the process of detecting DE genes can be formulated as a one-class classification problem, where only non-DE genes are observed, while DE genes are completely absent from the training data.

Results

In this study, we transform the problem to an outlier detection problem by treating DE genes as outliers, and we propose a scaling-free minimum enclosing ball (SFMEB) method to construct a smallest possible ball to contain the known non-DE genes in a feature space. The genes outside the minimum enclosing ball can then be naturally considered to be DE genes. Compared with the existing methods, the proposed SFMEB method does not require data normalization, which is particularly attractive when the RNA-seq data include more than one scaling factor. Furthermore, the SFMEB method could be easily extended to different species without normalization.
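The geometric idea in this abstract can be illustrated with a minimal sketch. It is not the SFMEB implementation: it assumes plain Euclidean coordinates rather than the feature space the abstract mentions, and it swaps in the Badoiu-Clarkson iteration as a simple way to approximate a smallest enclosing ball. The function names `approx_min_enclosing_ball` and `flag_outside` and the toy points are hypothetical.

```python
import math


def approx_min_enclosing_ball(points, iterations=1000):
    """Approximate the smallest ball containing `points`.

    Badoiu-Clarkson iteration: repeatedly move the centre a shrinking
    step towards the point that is currently farthest away.
    """
    center = list(points[0])
    for i in range(1, iterations + 1):
        farthest = max(points, key=lambda p: math.dist(p, center))
        step = 1.0 / (i + 1)
        center = [c + step * (f - c) for c, f in zip(center, farthest)]
    # The radius is chosen so that every reference point lies inside the ball.
    radius = max(math.dist(p, center) for p in points)
    return center, radius


def flag_outside(points, center, radius):
    """Return True for each point that falls outside the ball."""
    return [math.dist(p, center) > radius for p in points]


# Hypothetical 2-D features for genes assumed to be non-DE (the reference set).
reference = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
center, radius = approx_min_enclosing_ball(reference)

# Candidates outside the ball are treated as outliers (putative DE genes).
candidates = [(1.0, 1.5), (5.0, 5.0)]
flags = flag_outside(candidates, center, radius)
```

In this toy setting the first candidate lies inside the ball and the second far outside it, so only the second is flagged.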

Conclusions

Simulation studies demonstrate that the SFMEB method works well in a wide range of settings, especially when the data are heterogeneous or come from biological replicates. Analysis of the real data also supports the conclusion that the SFMEB method outperforms other existing competitors. The R package of the proposed method is available at https://bioconductor.org/packages/MEB.

Supplementary Information

The online version contains supplementary material available at (10.1186/s12864-021-07790-0).

Chowdhury HA, Bhattacharyya DK, Kalita JK. Differential Expression Analysis of RNA-seq Reads: Overview, Taxonomy, and Tools. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020;17:566-586. [PMID: 30281477 DOI: 10.1109/tcbb.2018.2873010] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 06/08/2023]

Zhou Y, Wan X, Zhang B, Tong T. Classifying next-generation sequencing data using a zero-inflated Poisson model. Bioinformatics 2019;34:1329-1335. [PMID: 29186294 DOI: 10.1093/bioinformatics/btx768] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 07/18/2017] [Accepted: 11/24/2017] [Indexed: 11/14/2022]

Abstract

Motivation

With the development of high-throughput techniques, RNA-sequencing (RNA-seq) is becoming increasingly popular as an alternative for gene expression analysis, such as RNA profiling and classification. Identifying which disease type a new patient belongs to from RNA-seq data has been recognized as a vital problem in medical research. As RNA-seq data are discrete, statistical methods developed for classifying microarray data cannot be readily applied to RNA-seq data classification. Witten proposed a Poisson linear discriminant analysis (PLDA) to classify RNA-seq data in 2011. Note, however, that count datasets are frequently characterized by excess zeros in real RNA-seq or microRNA sequence data (e.g. when the sequencing depth is insufficient, or for small RNAs 18-30 nucleotides in length). Therefore, a new model is needed to analyze RNA-seq data with an excess of zeros.

Results

In this paper, we propose a Zero-Inflated Poisson Logistic Discriminant Analysis (ZIPLDA) for RNA-seq data with an excess of zeros. The new method assumes that the data are from a mixture of two distributions: one is a point mass at zero, and the other follows a Poisson distribution. We then consider a logistic relation between the probability of observing zeros and the mean of the genes and the sequencing depth in the model. Simulation studies show that the proposed method performs better than, or at least as well as, the existing methods in a wide range of settings. Two real datasets, a breast cancer RNA-seq dataset and a microRNA-seq dataset, are also analyzed, and the results agree with the simulations in that the proposed method outperforms the existing competitors.
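The two-component mixture described in this abstract corresponds to the standard zero-inflated Poisson probability mass function, which the short sketch below evaluates. It is an illustration only, not the ZIPLDA classifier: the zero-inflation probability `p_zero` is passed in directly, because the abstract does not give the form of the logistic relation that ties it to the gene mean and sequencing depth. The function name `zip_pmf` and the parameter values are hypothetical.

```python
import math


def zip_pmf(k, lam, p_zero):
    """Zero-inflated Poisson probability of observing count k.

    With probability p_zero the count is a structural zero; otherwise it
    is drawn from a Poisson distribution with mean lam.
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        return p_zero + (1.0 - p_zero) * poisson
    return (1.0 - p_zero) * poisson


# Hypothetical values: Poisson mean 3 with 20% structural zeros.
lam, p_zero = 3.0, 0.2
prob_zero_plain = zip_pmf(0, lam, 0.0)      # ordinary Poisson, exp(-lam)
prob_zero_inflated = zip_pmf(0, lam, p_zero)
total = sum(zip_pmf(k, lam, p_zero) for k in range(61))
```

Setting `p_zero` to 0 recovers the ordinary Poisson distribution, which is why the inflated model assigns strictly more probability to a zero count whenever `p_zero` is positive.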

Availability and implementation

The software is available at http://www.math.hkbu.edu.hk/~tongt.

Contact

xwan@comp.hkbu.edu.hk or tongt@hkbu.edu.hk.

Supplementary information

Supplementary data are available at Bioinformatics online.

A statistical normalization method and differential expression analysis for RNA-seq data between different species. BMC Bioinformatics 2019;20:163. [PMID: 30925894 PMCID: PMC6441199 DOI: 10.1186/s12859-019-2745-1] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 09/06/2018] [Accepted: 03/18/2019] [Indexed: 02/06/2023]

Athanasiadou R, Neymotin B, Brandt N, Wang W, Christiaen L, Gresham D, Tranchina D. A complete statistical model for calibration of RNA-seq counts using external spike-ins and maximum likelihood theory. PLoS Comput Biol 2019;15:e1006794. [PMID: 30856174 PMCID: PMC6428340 DOI: 10.1371/journal.pcbi.1006794] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 08/07/2018] [Revised: 03/21/2019] [Accepted: 01/16/2019] [Indexed: 01/09/2023]

Liu P, Yang X, Zhang H, Pu J, Wei K. Analysis of change in microRNA expression profiles of lung cancer A549 cells treated with Radix tetrastigma hemsleyani flavonoids. Onco Targets Ther 2018;11:4283-4300. [PMID: 30100735 PMCID: PMC6065472 DOI: 10.2147/ott.s164276] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 01/01/2023]

Zhou Y, Wang J, Zhao Y, Tong T. Discriminant Analysis and Normalization Methods for Next-Generation Sequencing Data. New Frontiers of Biostatistics and Bioinformatics 2018. [DOI: 10.1007/978-3-319-99389-8_18] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/03/2022]