Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Frazee AC, Jaffe AE, Langmead B, Leek JT. Polyester: simulating RNA-seq datasets with differential transcript expression. Bioinformatics 2015;31:2778-84. [PMID: 25926345 DOI: 10.1093/bioinformatics/btv272] [Citation(s) in RCA: 195] [Impact Index Per Article: 19.5] [Received: 06/05/2014] [Accepted: 04/18/2015] [Indexed: 12/26/2022]

Number

Cited by Other Article(s)

151

Hicks SC, Okrah K, Paulson JN, Quackenbush J, Irizarry RA, Bravo HC. Smooth quantile normalization. Biostatistics 2019;19:185-198. [PMID: 29036413 DOI: 10.1093/biostatistics/kxx028] [Citation(s) in RCA: 66] [Impact Index Per Article: 11.0] [Received: 11/01/2016] [Accepted: 05/07/2017] [Indexed: 11/14/2022]

Abstract

Between-sample normalization is a critical step in genomic data analysis to remove systematic bias and unwanted technical variation in high-throughput data. Global normalization methods are based on the assumption that observed variability in global properties is due to technical reasons and is unrelated to the biology of interest. For example, some methods correct for differences in sequencing read counts by scaling features to have similar median values across samples, but these fail to reduce other forms of unwanted technical variation. Methods such as quantile normalization transform the statistical distributions across samples to be the same and assume global differences in the distribution are induced by only technical variation. However, it remains unclear how to proceed with normalization if these assumptions are violated, for example, if there are global differences in the statistical distributions between biological conditions or groups, and external information, such as negative or control features, is not available. Here, we introduce a generalization of quantile normalization, referred to as smooth quantile normalization (qsmooth), which is based on the assumption that the statistical distribution of each sample should be the same (or have the same distributional shape) within biological groups or conditions, but allowing that they may differ between groups. We illustrate the advantages of our method on several high-throughput datasets with global differences in distributions corresponding to different biological conditions. We also perform a Monte Carlo simulation study to illustrate the bias-variance tradeoff and root mean squared error of qsmooth compared to other global normalization methods. A software implementation is available from https://github.com/stephaniehicks/qsmooth.
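
The idea of quantile normalization mentioned in this abstract can be illustrated with a minimal sketch. This is plain quantile normalization in standard-library Python, not the qsmooth algorithm itself; the function names `quantile_normalize` and `quantile_normalize_within_groups` are hypothetical and chosen only for illustration. The second helper simply normalizes each biological group separately, which mirrors the stated assumption that distributions agree within groups but may differ between them. How qsmooth actually combines group-level and overall reference distributions is not described here and is not reproduced.

```python
def quantile_normalize(samples):
    """Force every sample to share the same empirical distribution.

    samples: list of equal-length lists of numbers, one list per sample.
    Ties are broken by original position (an arbitrary simplification).
    """
    n = len(samples[0])
    # Reference distribution: mean across samples at each rank.
    sorted_samples = [sorted(s) for s in samples]
    reference = [sum(s[r] for s in sorted_samples) / len(samples) for r in range(n)]

    normalized = []
    for s in samples:
        # Feature indices ordered from smallest to largest value.
        order = sorted(range(n), key=lambda i: s[i])
        row = [0.0] * n
        for rank, idx in enumerate(order):
            row[idx] = reference[rank]
        normalized.append(row)
    return normalized


def quantile_normalize_within_groups(samples, groups):
    """Apply quantile normalization separately inside each group label."""
    result = [None] * len(samples)
    for label in set(groups):
        members = [i for i, g in enumerate(groups) if g == label]
        normed = quantile_normalize([samples[i] for i in members])
        for i, row in zip(members, normed):
            result[i] = row
    return result


if __name__ == "__main__":
    print(quantile_normalize([[5, 2, 3], [4, 1, 6]]))
    # [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

After normalization every sample contains the same set of values, only arranged according to each sample's own ranking of features.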

152

Genome-wide identification of mRNA 5-methylcytosine in mammals. Nat Struct Mol Biol 2019;26:380-388. [PMID: 31061524 DOI: 10.1038/s41594-019-0218-x] [Citation(s) in RCA: 189] [Impact Index Per Article: 31.5] [Received: 10/23/2018] [Accepted: 03/27/2019] [Indexed: 02/07/2023]

153

Pérez-Rubio P, Lottaz C, Engelmann JC. FastqPuri: high-performance preprocessing of RNA-seq data. BMC Bioinformatics 2019;20:226. [PMID: 31053060 PMCID: PMC6500068 DOI: 10.1186/s12859-019-2799-0] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 10/16/2018] [Accepted: 04/09/2019] [Indexed: 12/23/2022]

Abstract

Background

RNA sequencing (RNA-seq) has become the standard means of analyzing gene and transcript expression in high-throughput. While previously sequence alignment was a time demanding step, fast alignment methods and even more so transcript counting methods which avoid mapping and quantify gene and transcript expression by evaluating whether a read is compatible with a transcript, have led to significant speed-ups in data analysis. Now, the most time demanding step in the analysis of RNA-seq data is preprocessing the raw sequence data, such as running quality control and adapter, contamination and quality filtering before transcript or gene quantification. To do so, many researchers chain different tools, but a comprehensive, flexible and fast software that covers all preprocessing steps is currently missing.

Results

We here present FastqPuri, a light-weight and highly efficient preprocessing tool for fastq data. FastqPuri provides sequence quality reports on the sample and dataset level with new plots which facilitate decision making for subsequent quality filtering. Moreover, FastqPuri efficiently removes adapter sequences and sequences from biological contamination from the data. It accepts both single- and paired-end data in uncompressed or compressed fastq files. FastqPuri can be run stand-alone and is suitable to be run within pipelines. We benchmarked FastqPuri against existing tools and found that FastqPuri is superior in terms of speed, memory usage, versatility and comprehensiveness.

Conclusions

FastqPuri is a new tool which covers all aspects of short read sequence data preprocessing. It was designed for RNA-seq data to meet the needs for fast preprocessing of fastq data to allow transcript and gene counting, but it is suitable to process any short read sequencing data of which high sequence quality is needed, such as for genome assembly or SNV (single nucleotide variant) detection. FastqPuri is most flexible in filtering undesired biological sequences by offering two approaches to optimize speed and memory usage dependent on the total size of the potential contaminating sequences. FastqPuri is available at https://github.com/jengelmann/FastqPuri. It is implemented in C and R and licensed under GPL v3.

Electronic supplementary material

The online version of this article (10.1186/s12859-019-2799-0) contains supplementary material, which is available to authorized users.

154

Chung RH, Kang CY. A multi-omics data simulator for complex disease studies and its application to evaluate multi-omics data analysis methods for disease classification. Gigascience 2019;8:giz045. [PMID: 31029063 PMCID: PMC6486474 DOI: 10.1093/gigascience/giz045] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Received: 10/10/2018] [Revised: 03/05/2019] [Accepted: 03/28/2019] [Indexed: 01/16/2023]

Abstract

BACKGROUND

An integrative multi-omics analysis approach that combines multiple types of omics data including genomics, epigenomics, transcriptomics, proteomics, metabolomics, and microbiomics has become increasingly popular for understanding the pathophysiology of complex diseases. Although many multi-omics analysis methods have been developed for complex disease studies, only a few simulation tools that simulate multiple types of omics data and model their relationships with disease status are available, and these tools have their limitations in simulating the multi-omics data.

RESULTS

We developed the multi-omics data simulator OmicsSIMLA, which simulates genomics (i.e., single-nucleotide polymorphisms [SNPs] and copy number variations), epigenomics (i.e., bisulphite sequencing), transcriptomics (i.e., RNA sequencing), and proteomics (i.e., normalized reverse phase protein array) data at the whole-genome level. Furthermore, the relationships between different types of omics data, such as methylation quantitative trait loci (SNPs influencing methylation), expression quantitative trait loci (SNPs influencing gene expression), and expression quantitative trait methylations (methylations influencing gene expression), were modeled. More importantly, the relationships between these multi-omics data and the disease status were modeled as well. We used OmicsSIMLA to simulate a multi-omics dataset for breast cancer under a hypothetical disease model and used the data to compare the performance among existing multi-omics analysis methods in terms of disease classification accuracy and runtime. We also used OmicsSIMLA to simulate a multi-omics dataset with a scale similar to an ovarian cancer multi-omics dataset. The neural network-based multi-omics analysis method ATHENA was applied to both the real and simulated data and the results were compared. Our results demonstrated that complex disease mechanisms can be simulated by OmicsSIMLA, and ATHENA showed the highest prediction accuracy when the effects of multi-omics features (e.g., SNPs, copy number variations, and gene expression levels) on the disease were strong. Furthermore, similar results can be obtained from ATHENA when analyzing the simulated and real ovarian multi-omics data.

CONCLUSIONS

OmicsSIMLA will be useful to evaluate the performance of different multi-omics analysis methods. Sample sizes and power can also be calculated by OmicsSIMLA when planning a new multi-omics disease study.

155

Sherman TD, Kagohara LT, Cao R, Cheng R, Satriano M, Considine M, Krigsfeld G, Ranaweera R, Tang Y, Jablonski SA, Stein-O'Brien G, Gaykalova DA, Weiner LM, Chung CH, Fertig EJ. CancerInSilico: An R/Bioconductor package for combining mathematical and statistical modeling to simulate time course bulk and single cell gene expression data in cancer. PLoS Comput Biol 2019;14:e1006935. [PMID: 31002670 PMCID: PMC6504085 DOI: 10.1371/journal.pcbi.1006935] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 07/04/2018] [Revised: 05/07/2019] [Accepted: 03/11/2019] [Indexed: 11/18/2022]

Affiliation(s)

Thomas D. Sherman Department of Oncology, Division of Biostatistics and Bioinformatics, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD United States of America * E-mail: (TDS); (EJF)
Luciane T. Kagohara Department of Oncology, Division of Biostatistics and Bioinformatics, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD United States of America
Raymon Cao Department of Oncology, Division of Biostatistics and Bioinformatics, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD United States of America
Raymond Cheng Science, Math and Computer Science Magnet Program, Poolesville High School, Poolesville, MD United States of America
Matthew Satriano Department of Mathematics, University of Waterloo, Waterloo, Ontario, Canada
Michael Considine Department of Oncology, Division of Biostatistics and Bioinformatics, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD United States of America
Gabriel Krigsfeld Department of Oncology, Division of Biostatistics and Bioinformatics, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD United States of America
Ruchira Ranaweera Moffitt Cancer Center, Tampa, FL, United States of America
Yong Tang Lombardi Comprehensive Cancer Center, Georgetown University, Washington, DC United States of America
Sandra A. Jablonski Lombardi Comprehensive Cancer Center, Georgetown University, Washington, DC United States of America
Genevieve Stein-O'Brien Department of Oncology, Division of Biostatistics and Bioinformatics, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD United States of America; Institute of Genetic Medicine, Johns Hopkins University, Baltimore, MD United States of America
Daria A. Gaykalova Department of Otolaryngology-Head and Neck Surgery, Johns Hopkins University School of Medicine, Baltimore, MD United States of America
Louis M. Weiner Lombardi Comprehensive Cancer Center, Georgetown University, Washington, DC United States of America
Christine H. Chung Moffitt Cancer Center, Tampa, FL, United States of America
Elana J. Fertig Department of Oncology, Division of Biostatistics and Bioinformatics, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD United States of America; Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD United States of America; Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD United States of America * E-mail: (TDS); (EJF)

156

Aguiar VRC, César J, Delaneau O, Dermitzakis ET, Meyer D. Expression estimation and eQTL mapping for HLA genes with a personalized pipeline. PLoS Genet 2019;15:e1008091. [PMID: 31009447 PMCID: PMC6497317 DOI: 10.1371/journal.pgen.1008091] [Citation(s) in RCA: 54] [Impact Index Per Article: 9.0] [Received: 07/11/2018] [Revised: 05/02/2019] [Accepted: 03/13/2019] [Indexed: 01/07/2023]

Abstract

The HLA (Human Leukocyte Antigens) genes are well-documented targets of balancing selection, and variation at these loci is associated with many disease phenotypes. Variation in expression levels also influences disease susceptibility and resistance, but little information exists about the regulation and population-level patterns of expression. This results from the difficulty in mapping short reads originating from these highly polymorphic loci, and in accounting for the existence of several paralogues. We developed a computational pipeline to accurately estimate expression for HLA genes based on RNA-seq, improving both locus-level and allele-level estimates. First, reads are aligned to all known HLA sequences in order to infer HLA genotypes, then quantification of expression is carried out using a personalized index. We use simulations to show that expression estimates obtained in this way are not biased due to divergence from the reference genome. We applied our pipeline to the GEUVADIS dataset, and compared the quantifications to those obtained with a reference transcriptome. Although the personalized pipeline recovers more reads, we found that using the reference transcriptome produces estimates similar to the personalized pipeline (r ≥ 0.87) with the exception of HLA-DQA1. We describe the impact of the HLA-personalized approach on downstream analyses for nine classical HLA loci (HLA-A, HLA-C, HLA-B, HLA-DRA, HLA-DRB1, HLA-DQA1, HLA-DQB1, HLA-DPA1, HLA-DPB1). Although the influence of the HLA-personalized approach is modest for eQTL mapping, the p-values and the causality of the eQTLs obtained are better than when the reference transcriptome is used. We investigate how the eQTLs we identified explain variation in expression among lineages of HLA alleles. Finally, we discuss possible causes underlying differences between expression estimates obtained using RNA-seq, antibody-based approaches and qPCR.

157

Owen N, Moosajee M. RNA-sequencing in ophthalmology research: considerations for experimental design and analysis. Ther Adv Ophthalmol 2019;11:2515841419835460. [PMID: 30911735 PMCID: PMC6421592 DOI: 10.1177/2515841419835460] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 10/03/2018] [Accepted: 02/08/2019] [Indexed: 12/13/2022]

158

Soneson C, Love MI, Patro R, Hussain S, Malhotra D, Robinson MD. A junction coverage compatibility score to quantify the reliability of transcript abundance estimates and annotation catalogs. Life Sci Alliance 2019;2:2/1/e201800175. [PMID: 30655364 PMCID: PMC6337739 DOI: 10.26508/lsa.201800175] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 08/24/2018] [Revised: 01/07/2019] [Accepted: 01/08/2019] [Indexed: 02/01/2023]

159

Alasoo K, Rodrigues J, Danesh J, Freitag DF, Paul DS, Gaffney DJ. Genetic effects on promoter usage are highly context-specific and contribute to complex traits. eLife 2019;8:e41673. [PMID: 30618377 PMCID: PMC6349408 DOI: 10.7554/elife.41673] [Citation(s) in RCA: 44] [Impact Index Per Article: 7.3] [Received: 09/05/2018] [Accepted: 01/08/2019] [Indexed: 12/12/2022]

160

Garanina IA, Fisunov GY, Govorun VM. BAC-BROWSER: The Tool for Visualization and Analysis of Prokaryotic Genomes. Front Microbiol 2018;9:2827. [PMID: 30519231 PMCID: PMC6258810 DOI: 10.3389/fmicb.2018.02827] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 08/09/2018] [Accepted: 11/05/2018] [Indexed: 11/13/2022]

161

Lee D, Cheng A, Lawlor N, Bolisetty M, Ucar D. Detection of correlated hidden factors from single cell transcriptomes using Iteratively Adjusted-SVA (IA-SVA). Sci Rep 2018;8:17040. [PMID: 30451954 PMCID: PMC6242813 DOI: 10.1038/s41598-018-35365-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 12/04/2017] [Accepted: 11/01/2018] [Indexed: 01/01/2023]

162

Ye CJ, Chen J, Villani AC, Gate RE, Subramaniam M, Bhangale T, Lee MN, Raj T, Raychowdhury R, Li W, Rogel N, Simmons S, Imboywa SH, Chipendo PI, McCabe C, Lee MH, Frohlich IY, Stranger BE, De Jager PL, Regev A, Behrens T, Hacohen N. Genetic analysis of isoform usage in the human anti-viral response reveals influenza-specific regulation of ERAP2 transcripts under balancing selection. Genome Res 2018;28:1812-1825. [PMID: 30446528 PMCID: PMC6280757 DOI: 10.1101/gr.240390.118] [Citation(s) in RCA: 55] [Impact Index Per Article: 7.9] [Received: 06/08/2018] [Accepted: 10/09/2018] [Indexed: 02/02/2023]

Affiliation(s)

Chun Jimmie Ye Institute for Human Genetics, Institute for Health and Computational Sciences, Department of Biostatistics and Epidemiology, Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, California 94143, USA
Jenny Chen Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA; Division of Health Sciences and Technology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
Alexandra-Chloé Villani Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA; Department of Medicine, Massachusetts General Hospital Cancer Center, Boston, Massachusetts 02114, USA
Rachel E Gate Institute for Human Genetics, Institute for Health and Computational Sciences, Department of Biostatistics and Epidemiology, Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, California 94143, USA; Biomedical Informatics Program, University of California, San Francisco, California 94143, USA
Meena Subramaniam Institute for Human Genetics, Institute for Health and Computational Sciences, Department of Biostatistics and Epidemiology, Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, California 94143, USA; Biomedical Informatics Program, University of California, San Francisco, California 94143, USA
Tushar Bhangale Genentech Incorporated, South San Francisco, California 94080, USA
Mark N Lee Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA; Department of Medicine, Massachusetts General Hospital Cancer Center, Boston, Massachusetts 02114, USA; Harvard Medical School, Boston, Massachusetts 02116, USA
Towfique Raj Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA; Harvard Medical School, Boston, Massachusetts 02116, USA; Departments of Neurology and Psychiatry, Brigham and Women's Hospital, Boston, Massachusetts 02115, USA
Raktima Raychowdhury Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
Weibo Li Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
Noga Rogel Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
Sean Simmons Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
Selina H Imboywa Harvard Medical School, Boston, Massachusetts 02116, USA
Portia I Chipendo Harvard Medical School, Boston, Massachusetts 02116, USA
Cristin McCabe Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA; Departments of Neurology and Psychiatry, Brigham and Women's Hospital, Boston, Massachusetts 02115, USA
Michelle H Lee Harvard Medical School, Boston, Massachusetts 02116, USA
Irene Y Frohlich Harvard Medical School, Boston, Massachusetts 02116, USA
Barbara E Stranger Section of Genetic Medicine, Department of Medicine, Institute for Genomics and Systems Biology, Center for Data Intensive Science, The University of Chicago, Chicago, Illinois 60637, USA
Philip L De Jager Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA; Harvard Medical School, Boston, Massachusetts 02116, USA; Departments of Neurology and Psychiatry, Brigham and Women's Hospital, Boston, Massachusetts 02115, USA
Aviv Regev Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA; Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA; Howard Hughes Medical Institute, Chevy Chase, Maryland 20815, USA
Tim Behrens Genentech Incorporated, South San Francisco, California 94080, USA
Nir Hacohen Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA; Department of Medicine, Massachusetts General Hospital Cancer Center, Boston, Massachusetts 02114, USA

163

Westoby J, Herrera MS, Ferguson-Smith AC, Hemberg M. Simulation-based benchmarking of isoform quantification in single-cell RNA-seq. Genome Biol 2018;19:191. [PMID: 30404663 PMCID: PMC6223048 DOI: 10.1186/s13059-018-1571-5] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 01/16/2018] [Accepted: 10/19/2018] [Indexed: 11/18/2022]

164

Rigaill G, Balzergue S, Brunaud V, Blondet E, Rau A, Rogier O, Caius J, Maugis-Rabusseau C, Soubigou-Taconnat L, Aubourg S, Lurin C, Martin-Magniette ML, Delannoy E. Synthetic data sets for the identification of key ingredients for RNA-seq differential analysis. Brief Bioinform 2018;19:65-76. [PMID: 27742662 DOI: 10.1093/bib/bbw092] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 02/03/2016] [Indexed: 12/16/2022]

165

Sterne-Weiler T, Weatheritt RJ, Best AJ, Ha KC, Blencowe BJ. Efficient and Accurate Quantitative Profiling of Alternative Splicing Patterns of Any Complexity on a Laptop. Mol Cell 2018;72:187-200.e6. [DOI: 10.1016/j.molcel.2018.08.018] [Citation(s) in RCA: 84] [Impact Index Per Article: 12.0] [Received: 04/13/2018] [Revised: 06/24/2018] [Accepted: 08/09/2018] [Indexed: 01/08/2023]

166

Event Analysis: Using Transcript Events To Improve Estimates of Abundance in RNA-seq Data. G3-GENES GENOMES GENETICS 2018;8:2923-2940. [PMID: 30021829 PMCID: PMC6118309 DOI: 10.1534/g3.118.200373] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 02/07/2023]

Abstract

Alternative splicing leverages genomic content by allowing the synthesis of multiple transcripts and, by implication, protein isoforms, from a single gene. However, estimating the abundance of transcripts produced in a given tissue from short sequencing reads is difficult and can result in both the construction of transcripts that do not exist, and the failure to identify true transcripts. An alternative approach is to catalog the events that make up isoforms (splice junctions and exons). We present here the Event Analysis (EA) approach, where we project transcripts onto the genome and identify overlapping/unique regions and junctions. In addition, all possible logical junctions are assembled into a catalog. Transcripts are filtered before quantitation based on simple measures: the proportion of the events detected, and the coverage. We find that mapping to a junction catalog is more efficient at detecting novel junctions than mapping in a splice aware manner. We identify 99.8% of true transcripts while iReckon identifies 82% of the true transcripts and creates more transcripts not included in the simulation than were initially used in the simulation. Using PacBio Iso-seq data from a mouse neural progenitor cell model, EA detects 60% of the novel junctions that are combinations of existing exons while only 43% are detected by STAR. EA further detects ∼5,000 annotated junctions missed by STAR. Filtering transcripts based on the proportion of the transcript detected and the number of reads on average supporting that transcript captures 95% of the PacBio transcriptome. Filtering the reference transcriptome before quantitation results in a more stable estimate of isoform abundance, with improved correlation between replicates. This was particularly evident when EA is applied to an RNA-seq study of type 1 diabetes (T1D), where the coefficient of variation among subjects (n = 81) in the transcript abundance estimates was substantially reduced compared to the estimation using the full reference. EA focuses on individual transcriptional events. These events can be quantitated and analyzed directly or used to identify the probable set of expressed transcripts. Simple rules based on detected events and coverage used in filtering result in a dramatic improvement in isoform estimation without the use of ancillary data (e.g., ChIP, long reads) that may not be available for many studies.

167

Arzalluz-Luque Á, Conesa A. Single-cell RNAseq for the study of isoforms-how is that possible? Genome Biol 2018;19:110. [PMID: 30097058 PMCID: PMC6085759 DOI: 10.1186/s13059-018-1496-z] [Citation(s) in RCA: 76] [Impact Index Per Article: 10.9] [Indexed: 01/01/2023]

168

Quinn TP, Crowley TM, Richardson MF. Benchmarking differential expression analysis tools for RNA-Seq: normalization-based vs. log-ratio transformation-based methods. BMC Bioinformatics 2018;19:274. [PMID: 30021534 PMCID: PMC6052553 DOI: 10.1186/s12859-018-2261-8] [Citation(s) in RCA: 42] [Impact Index Per Article: 6.0] [Received: 12/11/2017] [Accepted: 06/25/2018] [Indexed: 12/20/2022]

Abstract

BACKGROUND

Count data generated by next-generation sequencing assays do not measure absolute transcript abundances. Instead, the data are constrained to an arbitrary "library size" by the sequencing depth of the assay, and typically must be normalized prior to statistical analysis. The constrained nature of these data means one could alternatively use a log-ratio transformation in lieu of normalization, as often done when testing for differential abundance (DA) of operational taxonomic units (OTUs) in 16S rRNA data. Therefore, we benchmark how well the ALDEx2 package, a transformation-based DA tool, detects differential expression in high-throughput RNA-sequencing data (RNA-Seq), compared to conventional RNA-Seq methods such as edgeR and DESeq2.

RESULTS

To evaluate the performance of log-ratio transformation-based tools, we apply the ALDEx2 package to two simulated, and two real, RNA-Seq data sets. One of the latter was previously used to benchmark dozens of conventional RNA-Seq differential expression methods, enabling us to directly compare transformation-based approaches. We show that ALDEx2, widely used in meta-genomics research, identifies differentially expressed genes (and transcripts) from RNA-Seq data with high precision and, given sufficient sample sizes, high recall too (regardless of the alignment and quantification procedure used). Although we show that the choice in log-ratio transformation can affect performance, ALDEx2 has high precision (i.e., few false positives) across all transformations. Finally, we present a novel, iterative log-ratio transformation (now implemented in ALDEx2) that further improves performance in simulations.

CONCLUSIONS

Our results suggest that log-ratio transformation-based methods can work to measure differential expression from RNA-Seq data, provided that certain assumptions are met. Moreover, these methods have very high precision (i.e., few false positives) in simulations and perform well on real data too. With previously demonstrated applicability to 16S rRNA data, ALDEx2 can thus serve as a single tool for data from multiple sequencing modalities.
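
As a rough illustration of what a log-ratio transformation does to library-size-constrained counts, the sketch below computes a centered log-ratio (CLR) in standard-library Python. It is an assumption-laden example of one common log-ratio transformation, not the ALDEx2 procedure or the iterative transformation described in the abstract; the function name `clr` and the default pseudocount of 0.5 are hypothetical choices made here for illustration.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio of one sample's counts.

    Each value is log(count + pseudocount) minus the mean of those logs,
    i.e. the log of the count relative to the sample's geometric mean.
    With pseudocount=0.0 any zero count raises ValueError from math.log.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]


if __name__ == "__main__":
    shallow = clr([1, 2, 4], pseudocount=0.0)
    deep = clr([10, 20, 40], pseudocount=0.0)  # same composition, 10x the depth
    print(shallow)
    print(deep)
```

With no pseudocount, the two samples in the usage example give the same transformed values up to floating-point error, because multiplying every count by a constant shifts all logs and their mean equally. That cancellation is why a log-ratio transformation can stand in for an explicit library-size normalization.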

169

Merleev AA, Marusina AI, Ma C, Elder JT, Tsoi LC, Raychaudhuri SP, Weidinger S, Wang EA, Adamopoulos IE, Luxardi G, Gudjonsson JE, Shimoda M, Maverakis E. Meta-analysis of RNA sequencing datasets reveals an association between TRAJ23, psoriasis, and IL-17A. JCI Insight 2018;3:120682. [PMID: 29997305 DOI: 10.1172/jci.insight.120682] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 02/26/2018] [Accepted: 05/23/2018] [Indexed: 12/20/2022]

170

Love MI, Soneson C, Patro R. Swimming downstream: statistical analysis of differential transcript usage following Salmon quantification. F1000Res 2018;7:952. [PMID: 30356428 PMCID: PMC6178912 DOI: 10.12688/f1000research.15398.3] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.9] [Accepted: 09/27/2018] [Indexed: 12/30/2022]

171

Love MI, Soneson C, Patro R. Swimming downstream: statistical analysis of differential transcript usage following Salmon quantification. F1000Res 2018;7:952. [PMID: 30356428 PMCID: PMC6178912 DOI: 10.12688/f1000research.15398.1] [Citation(s) in RCA: 71] [Impact Index Per Article: 10.1] [Accepted: 06/22/2018] [Indexed: 12/25/2022]

172

Love MI, Soneson C, Patro R. Swimming downstream: statistical analysis of differential transcript usage following Salmon quantification. F1000Res 2018;7:952. [PMID: 30356428 PMCID: PMC6178912 DOI: 10.12688/f1000research.15398.2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Accepted: 09/10/2018] [Indexed: 09/29/2023] Open

173

Robinson S, Nevalainen J, Pinna G, Campalans A, Radicella JP, Guyon L. Incorporating interaction networks into the determination of functionally related hit genes in genomic experiments with Markov random fields. Bioinformatics 2018;33:i170-i179. [PMID: 28881978 PMCID: PMC5870666 DOI: 10.1093/bioinformatics/btx244] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 12/24/2022] Open

174

Li M, Xie X, Zhou J, Sheng M, Yin X, Ko EA, Zhou T, Gu W. Quantifying circular RNA expression from RNA-seq data using model-based framework. Bioinformatics 2018;33:2131-2139. [PMID: 28334396 DOI: 10.1093/bioinformatics/btx129] [Citation(s) in RCA: 51] [Impact Index Per Article: 7.3] [Received: 06/28/2016] [Accepted: 03/07/2017] [Indexed: 11/13/2022] Open

Abstract

Motivation

Circular RNAs (circRNAs) are a class of non-coding RNAs that are widely expressed in various cell lines and tissues of many organisms. Although the exact function of many circRNAs is largely unknown, their cell type- and tissue-specific expression implicates them in many biological processes. Hence, quantifying circRNA expression from high-throughput RNA-seq data is becoming increasingly important. Although many model-based methods have been developed to quantify linear RNA expression from RNA-seq data, these methods are not applicable to circRNA quantification.

Results

Here, we propose a novel strategy that transforms circular transcripts into pseudo-linear transcripts and estimates the expression values of both circular and linear transcripts using an existing model-based algorithm, Sailfish. The new strategy can accurately estimate the expression of both linear and circular transcripts from RNA-seq data. Several factors, such as gene length, expression level and the ratio of circular to linear transcripts, affected quantification performance for circular transcripts. In comparison to count-based tools, the new computational framework had superior performance in estimating circRNA expression from both simulated and real ribosomal RNA-depleted (rRNA-depleted) RNA-seq datasets. In addition, accounting for circular transcripts when quantifying expression from rRNA-depleted RNA-seq data substantially increased the accuracy of linear transcript expression estimates. Our proposed strategy was implemented in a program named Sailfish-cir.

Availability and Implementation

Sailfish-cir is freely available at https://github.com/zerodel/Sailfish-cir.

Contact

tongz@medicine.nevada.edu or wanjun.gu@gmail.com.

Supplementary information

Supplementary data are available at Bioinformatics online.
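
As an illustration of the pseudo-linear idea in the Results above, the sketch below shows one plausible way to linearize a circular sequence: copy the first k-1 bases onto the end, so that k-mers spanning the back-splice junction appear in the linear string. The abstract does not specify how Sailfish-cir builds its pseudo-linear transcripts, so the function names, the k-1 overlap and the toy sequence are all assumptions made for illustration.

```python
def pseudo_linearize(circ_seq: str, overlap: int) -> str:
    """Return circ_seq with its first `overlap` bases appended to the end.

    Hypothetical helper: the wrap-around copy lets subsequences that cross
    the back-splice junction (end -> start) appear in a linear string.
    """
    if overlap < 0:
        raise ValueError("overlap must be non-negative")
    if not circ_seq:
        return circ_seq
    # Repeat the sequence enough times to handle overlap > len(circ_seq).
    reps = overlap // len(circ_seq) + 1
    return circ_seq + (circ_seq * reps)[:overlap]


def kmers(seq: str, k: int) -> set:
    """Return the set of all k-mers in a linear sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


if __name__ == "__main__":
    k = 4
    circ = "ACGTTGCA"                        # toy circular transcript
    linear = pseudo_linearize(circ, k - 1)   # append first k-1 bases
    # k-mers that exist only because of the wrap-around copy
    junction_only = kmers(linear, k) - kmers(circ, k)
    print(linear)
    print(sorted(junction_only))
```

With an overlap of k-1, the pseudo-linear string contains exactly one k-mer per start position on the circle, so a k-mer-based quantifier sees the junction-spanning k-mers without any change to its own model.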

175

Ha KCH, Blencowe BJ, Morris Q. QAPA: a new method for the systematic analysis of alternative polyadenylation from RNA-seq data. Genome Biol 2018;19:45. [PMID: 29592814 PMCID: PMC5874996 DOI: 10.1186/s13059-018-1414-4] [Citation(s) in RCA: 122] [Impact Index Per Article: 17.4] [Received: 06/30/2017] [Accepted: 02/28/2018] [Indexed: 12/21/2022] Open

176

Simion P, Belkhir K, François C, Veyssier J, Rink JC, Manuel M, Philippe H, Telford MJ. A software tool 'CroCo' detects pervasive cross-species contamination in next generation sequencing data. BMC Biol 2018;16:28. [PMID: 29506533 PMCID: PMC5838952 DOI: 10.1186/s12915-018-0486-7] [Citation(s) in RCA: 59] [Impact Index Per Article: 8.4] [Received: 08/17/2017] [Accepted: 01/11/2018] [Indexed: 01/20/2023] Open

177

Fang H, Huang YF, Radhakrishnan A, Siepel A, Lyon GJ, Schatz MC. Scikit-ribo Enables Accurate Estimation and Robust Modeling of Translation Dynamics at Codon Resolution. Cell Syst 2018;6:180-191.e4. [PMID: 29361467 PMCID: PMC5832574 DOI: 10.1016/j.cels.2017.12.007] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 07/07/2017] [Revised: 09/24/2017] [Accepted: 12/08/2017] [Indexed: 10/18/2022]

178

Liu R, Dickerson J. Strawberry: Fast and accurate genome-guided transcript reconstruction and quantification from RNA-Seq. PLoS Comput Biol 2017;13:e1005851. [PMID: 29176847 PMCID: PMC5720828 DOI: 10.1371/journal.pcbi.1005851] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 05/27/2017] [Revised: 12/07/2017] [Accepted: 10/26/2017] [Indexed: 12/14/2022] Open

Abstract

We propose a novel method and software tool, Strawberry, for transcript reconstruction and quantification from RNA-Seq data under the guidance of genome alignment and independent of gene annotation. Strawberry consists of two modules: assembly and quantification. The novelty of Strawberry is that the two modules use different optimization frameworks but share the same data graph structure, which allows a highly efficient, expandable and accurate algorithm for dealing with large data. The assembly module parses aligned reads into splicing graphs and uses network flow algorithms to select the most likely transcripts. The quantification module uses a latent class model to assign read counts from the nodes of splicing graphs to transcripts. Strawberry simultaneously estimates the transcript abundances and corrects for sequencing bias through an EM algorithm. Based on simulations, Strawberry outperforms Cufflinks and StringTie in terms of both assembly and quantification accuracy. When evaluated on a real data set, the transcript expression estimated by Strawberry has the highest correlation with Nanostring probe counts, an independent experimental measure of transcript expression. Availability: Strawberry is written in C++14, and is available as open source software at https://github.com/ruolin/strawberry under the MIT license.

Transcript assembly and quantification are important bioinformatics applications of RNA-Seq. The difficulty of solving these problems arises from the ambiguity of assigning reads uniquely to isoforms. This challenge is twofold: statistically, it requires a high-dimensional mixture model, and computationally, it needs to process datasets that commonly consist of tens of millions of reads. Existing algorithms either use very complex models that are too slow or use no model at all, relying instead on heuristics, and are thus less accurate. Strawberry seeks to strike a good balance between model complexity and speed. Strawberry effectively leverages a graph-based algorithm to utilize all possible information from paired-end reads and, to our knowledge, is the first to apply a flow network algorithm to the constrained assembly problem. We are also the first to formulate the quantification problem as a latent class model. All of these features not only lead to a more flexible and complex quantification model but also yield software that is easier to maintain and extend. In this methods paper, we show that the Strawberry method is novel, accurate, fast and scalable using both simulated data and real data.

179

Systematic Identification and Molecular Characteristics of Long Noncoding RNAs in Pig Tissues. Biomed Res Int 2017;2017:6152582. [PMID: 29062838 PMCID: PMC5618743 DOI: 10.1155/2017/6152582] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 04/19/2017] [Revised: 07/26/2017] [Accepted: 08/08/2017] [Indexed: 12/15/2022]

180

Ziegenhain C, Vieth B, Parekh S, Reinius B, Guillaumet-Adkins A, Smets M, Leonhardt H, Heyn H, Hellmann I, Enard W. Comparative Analysis of Single-Cell RNA Sequencing Methods. Mol Cell 2017;65:631-643.e4. [PMID: 28212749 DOI: 10.1016/j.molcel.2017.01.023] [Citation(s) in RCA: 949] [Impact Index Per Article: 118.6] [Received: 08/08/2016] [Revised: 12/01/2016] [Accepted: 01/17/2017] [Indexed: 02/06/2023]

181

Zhang C, Zhang B, Lin LL, Zhao S. Evaluation and comparison of computational tools for RNA-seq isoform quantification. BMC Genomics 2017;18:583. [PMID: 28784092 PMCID: PMC5547501 DOI: 10.1186/s12864-017-4002-1] [Citation(s) in RCA: 113] [Impact Index Per Article: 14.1] [Received: 04/19/2017] [Accepted: 08/01/2017] [Indexed: 11/10/2022] Open

182

Zakeri M, Srivastava A, Almodaresi F, Patro R. Improved data-driven likelihood factorizations for transcript abundance estimation. Bioinformatics 2017;33:i142-i151. [PMID: 28881996 PMCID: PMC5870700 DOI: 10.1093/bioinformatics/btx262] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Indexed: 12/18/2022] Open

Abstract

MOTIVATION

Many methods for transcript-level abundance estimation reduce the computational burden associated with the iterative algorithms they use by adopting an approximate factorization of the likelihood function they optimize. This leads to considerably faster convergence of the optimization procedure, since each round of, e.g., the EM algorithm can execute much more quickly. However, these approximate factorizations of the likelihood function simplify calculations at the expense of discarding certain information that can be useful for accurate transcript abundance estimation.

RESULTS

We demonstrate that model simplifications (i.e. factorizations of the likelihood function) adopted by certain abundance estimation methods can lead to a diminished ability to accurately estimate the abundances of highly related transcripts. In particular, considering factorizations based on transcript-fragment compatibility alone can result in a loss of accuracy compared to the per-fragment, unsimplified model. However, we show that such shortcomings are not an inherent limitation of approximately factorizing the underlying likelihood function. By considering the appropriate conditional fragment probabilities, and adopting improved, data-driven factorizations of this likelihood, we demonstrate that such approaches can achieve accuracy nearly indistinguishable from methods that consider the complete (i.e. per-fragment) likelihood, while retaining the computational efficiency of the compatibility-based factorizations.

AVAILABILITY AND IMPLEMENTATION

Our data-driven factorizations are incorporated into a branch of the Salmon transcript quantification tool: https://github.com/COMBINE-lab/salmon/tree/factorizations.

CONTACT

rob.patro@cs.stonybrook.edu.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

183

Pimentel H, Bray NL, Puente S, Melsted P, Pachter L. Differential analysis of RNA-seq incorporating quantification uncertainty. Nat Methods 2017;14:687-690. [PMID: 28581496 DOI: 10.1101/058164] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 01/27/2017] [Accepted: 05/04/2017] [Indexed: 05/22/2023]

184

Patro R, Duggal G, Love MI, Irizarry RA, Kingsford C. Salmon provides fast and bias-aware quantification of transcript expression. Nat Methods 2017;14:417-419. [PMID: 28263959 PMCID: PMC5600148 DOI: 10.1038/nmeth.4197] [Citation(s) in RCA: 6965] [Impact Index Per Article: 870.6] [Received: 08/29/2016] [Accepted: 01/22/2017] [Indexed: 12/12/2022]

185

Salmon provides fast and bias-aware quantification of transcript expression. Nat Methods 2017. [PMID: 28263959 DOI: 10.1038/nmeth.4197] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/08/2022]

186

Collado-Torres L, Nellore A, Frazee AC, Wilks C, Love MI, Langmead B, Irizarry RA, Leek JT, Jaffe AE. Flexible expressed region analysis for RNA-seq with derfinder. Nucleic Acids Res 2017;45:e9. [PMID: 27694310 PMCID: PMC5314792 DOI: 10.1093/nar/gkw852] [Citation(s) in RCA: 42] [Impact Index Per Article: 5.3] [Received: 05/16/2016] [Revised: 08/25/2016] [Accepted: 09/15/2016] [Indexed: 12/20/2022] Open

Affiliation(s)

Leonardo Collado-Torres: Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA; Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21205, USA; Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD 21205, USA
Abhinav Nellore: Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA; Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21205, USA; Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
Alyssa C Frazee: Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA; Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21205, USA
Christopher Wilks: Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21205, USA; Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
Michael I Love: Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Dana-Farber Cancer Institute, Harvard University, Boston, MA 02215, USA
Ben Langmead: Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA; Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21205, USA; Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
Rafael A Irizarry: Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Dana-Farber Cancer Institute, Harvard University, Boston, MA 02215, USA
Jeffrey T Leek: Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA; Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21205, USA
Andrew E Jaffe: Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA; Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21205, USA; Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD 21205, USA; Department of Mental Health, Johns Hopkins University, Baltimore, MD 21205, USA

187

Love MI, Hogenesch JB, Irizarry RA. Modeling of RNA-seq fragment sequence bias reduces systematic errors in transcript abundance estimation. Nat Biotechnol 2016. [PMID: 27669167 DOI: 10.1101/025767] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Indexed: 05/02/2023]

188

Alignment-free Transcriptomic and Metatranscriptomic Comparison Using Sequencing Signatures with Variable Length Markov Chains. Sci Rep 2016;6:37243. [PMID: 27876823 PMCID: PMC5120338 DOI: 10.1038/srep37243] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 04/08/2016] [Accepted: 10/27/2016] [Indexed: 11/08/2022] Open

189

Modeling of RNA-seq fragment sequence bias reduces systematic errors in transcript abundance estimation. Nat Biotechnol 2016;34:1287-1291. [PMID: 27669167 PMCID: PMC5143225 DOI: 10.1038/nbt.3682] [Citation(s) in RCA: 114] [Impact Index Per Article: 12.7] [Received: 09/01/2015] [Accepted: 08/22/2016] [Indexed: 11/17/2022]

190

Sun X, Dalpiaz D, Wu D, Liu JS, Zhong W, Ma P. Statistical inference for time course RNA-Seq data using a negative binomial mixed-effect model. BMC Bioinformatics 2016;17:324. [PMID: 27565575 PMCID: PMC5002174 DOI: 10.1186/s12859-016-1180-9] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 03/03/2016] [Accepted: 08/11/2016] [Indexed: 02/05/2023] Open

Abstract

Background

Accurate identification of differentially expressed (DE) genes in time course RNA-Seq data is crucial for understanding the dynamics of transcriptional regulatory networks. However, most of the available methods treat gene expression at different time points as replicates and test the significance of the mean expression difference between treatments or conditions irrespective of time. They thus fail to identify many DE genes with different profiles across time. In this article, we propose a negative binomial mixed-effect model (NBMM) to identify DE genes in time course RNA-Seq data. In the NBMM, mean gene expression is characterized by a fixed effect, and time dependency is described by random effects. The NBMM is very flexible and can be fitted to both unreplicated and replicated time course RNA-Seq data via a penalized likelihood method. By comparing gene expression profiles over time, we further classify the DE genes into two subtypes to enhance the understanding of expression dynamics. A significance test for detecting DE genes is derived using a Kullback-Leibler distance ratio. Additionally, a significance test for gene sets is developed using a gene set score.

Results

Simulation analysis shows that the NBMM outperforms currently available methods for detecting DE genes and gene sets. Moreover, our analysis of real fruit fly developmental time course RNA-Seq data demonstrates that the NBMM identifies biologically relevant genes, which are well supported by gene ontology analysis.

Conclusions

The proposed method is powerful and efficient for detecting biologically relevant DE genes and gene sets in time course RNA-Seq data.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-016-1180-9) contains supplementary material, which is available to authorized users.

191

Ziemann M, Kaspi A, El-Osta A. Evaluation of microRNA alignment techniques. RNA 2016;22:1120-38. [PMID: 27284164 PMCID: PMC4931105 DOI: 10.1261/rna.055509.115] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Received: 12/01/2015] [Accepted: 05/04/2016] [Indexed: 05/26/2023]

192

MetaTrans: an open-source pipeline for metatranscriptomics. Sci Rep 2016;6:26447. [PMID: 27211518 PMCID: PMC4876386 DOI: 10.1038/srep26447] [Citation(s) in RCA: 54] [Impact Index Per Article: 6.0] [Received: 12/08/2015] [Accepted: 04/29/2016] [Indexed: 01/08/2023] Open

193

Germain PL, Vitriolo A, Adamo A, Laise P, Das V, Testa G. RNAontheBENCH: computational and empirical resources for benchmarking RNAseq quantification and differential expression methods. Nucleic Acids Res 2016;44:5054-67. [PMID: 27190234 PMCID: PMC4914128 DOI: 10.1093/nar/gkw448] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 01/18/2016] [Accepted: 05/09/2016] [Indexed: 11/13/2022] Open

194

Teng M, Love MI, Davis CA, Djebali S, Dobin A, Graveley BR, Li S, Mason CE, Olson S, Pervouchine D, Sloan CA, Wei X, Zhan L, Irizarry RA. A benchmark for RNA-seq quantification pipelines. Genome Biol 2016;17:74. [PMID: 27107712 PMCID: PMC4842274 DOI: 10.1186/s13059-016-0940-1] [Citation(s) in RCA: 127] [Impact Index Per Article: 14.1] [Received: 08/06/2015] [Accepted: 04/08/2016] [Indexed: 02/07/2023] Open

Affiliation(s)

Mingxiang Teng: Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, 450 Brookline Avenue, Boston, MA, 02215, USA; Department of Biostatistics, Harvard TH Chan School of Public Health, 677 Huntington Avenue, Boston, MA, 02115, USA; School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
Michael I Love: Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, 450 Brookline Avenue, Boston, MA, 02215, USA; Department of Biostatistics, Harvard TH Chan School of Public Health, 677 Huntington Avenue, Boston, MA, 02115, USA
Carrie A Davis: Functional Genomics Group, Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY, 11724, USA
Sarah Djebali: Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG) and UPF, Doctor Aiguader, 88, Barcelona, 08003, Spain
Alexander Dobin: Functional Genomics Group, Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY, 11724, USA
Brenton R Graveley: Department of Genetics and Genome Sciences, Institute for System Genomics, UConn Health Center, Farmington, CT, 06030, USA
Sheng Li: Department of Physiology and Biophysics, Weill Cornell Medical College, New York, New York, USA
Christopher E Mason: Department of Physiology and Biophysics, Weill Cornell Medical College, New York, New York, USA
Sara Olson: Department of Genetics and Genome Sciences, Institute for System Genomics, UConn Health Center, Farmington, CT, 06030, USA
Dmitri Pervouchine: Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG) and UPF, Doctor Aiguader, 88, Barcelona, 08003, Spain
Cricket A Sloan: Department of Genetics, Stanford University, 300 Pasteur Drive, MC-5477, Stanford, CA, 94305, USA
Xintao Wei: Department of Genetics and Genome Sciences, Institute for System Genomics, UConn Health Center, Farmington, CT, 06030, USA
Lijun Zhan: Department of Genetics and Genome Sciences, Institute for System Genomics, UConn Health Center, Farmington, CT, 06030, USA
Rafael A Irizarry: Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, 450 Brookline Avenue, Boston, MA, 02215, USA; Department of Biostatistics, Harvard TH Chan School of Public Health, 677 Huntington Avenue, Boston, MA, 02115, USA

195

Hirsch CD, Springer NM, Hirsch CN. Genomic limitations to RNA sequencing expression profiling. Plant J 2015;84:491-503. [PMID: 26331235 DOI: 10.1111/tpj.13014] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 04/28/2015] [Accepted: 08/25/2015] [Indexed: 05/24/2023]