Reference Citation Analysis

For: Frazee AC, Jaffe AE, Langmead B, Leek JT. Polyester: simulating RNA-seq datasets with differential transcript expression. Bioinformatics 2015;31:2778-84. [PMID: 25926345 DOI: 10.1093/bioinformatics/btv272] [Citation(s) in RCA: 195] [Impact Index Per Article: 19.5] [Received: 06/05/2014] [Accepted: 04/18/2015] [Indexed: 12/26/2022] Open

Number

Cited by Other Article(s)

101

Shi X, Neuwald AF, Wang X, Wang TL, Hilakivi-Clarke L, Clarke R, Xuan J. IntAPT: integrated assembly of phenotype-specific transcripts from multiple RNA-seq profiles. Bioinformatics 2021;37:650-658. [PMID: 33016988 PMCID: PMC8097681 DOI: 10.1093/bioinformatics/btaa852] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2019] [Revised: 08/27/2020] [Accepted: 09/21/2020] [Indexed: 11/14/2022] Open

102

Stupnikov A, McInerney CE, Savage KI, McIntosh SA, Emmert-Streib F, Kennedy R, Salto-Tellez M, Prise KM, McArt DG. Robustness of differential gene expression analysis of RNA-seq. Comput Struct Biotechnol J 2021;19:3470-3481. [PMID: 34188784 PMCID: PMC8214188 DOI: 10.1016/j.csbj.2021.05.040] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Received: 12/03/2020] [Revised: 05/25/2021] [Accepted: 05/25/2021] [Indexed: 01/05/2023] Open

103

Sarantopoulou D, Brooks TG, Nayak S, Mrčela A, Lahens NF, Grant GR. Comparative evaluation of full-length isoform quantification from RNA-Seq. BMC Bioinformatics 2021;22:266. [PMID: 34034652 PMCID: PMC8145802 DOI: 10.1186/s12859-021-04198-1] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 05/29/2020] [Accepted: 05/16/2021] [Indexed: 11/18/2022] Open

104

Dent CI, Singh S, Mukherjee S, Mishra S, Sarwade RD, Shamaya N, Loo KP, Harrison P, Sureshkumar S, Powell D, Balasubramanian S. Quantifying splice-site usage: a simple yet powerful approach to analyze splicing. NAR Genom Bioinform 2021;3:lqab041. [PMID: 34017946 PMCID: PMC8121094 DOI: 10.1093/nargab/lqab041] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 12/18/2020] [Revised: 03/24/2021] [Accepted: 04/28/2021] [Indexed: 02/07/2023] Open

105

Ma C, Zheng H, Kingsford C. Exact transcript quantification over splice graphs. Algorithms Mol Biol 2021;16:5. [PMID: 33971903 PMCID: PMC8112020 DOI: 10.1186/s13015-021-00184-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/10/2021] [Accepted: 04/19/2021] [Indexed: 11/10/2022] Open

106

Chung M, Bruno VM, Rasko DA, Cuomo CA, Muñoz JF, Livny J, Shetty AC, Mahurkar A, Dunning Hotopp JC. Best practices on the differential expression analysis of multi-species RNA-seq. Genome Biol 2021;22:121. [PMID: 33926528 PMCID: PMC8082843 DOI: 10.1186/s13059-021-02337-8] [Citation(s) in RCA: 52] [Impact Index Per Article: 13.0] [Received: 10/21/2020] [Accepted: 04/01/2021] [Indexed: 02/07/2023] Open

107

Melnick M, Gonzales P, LaRocca TJ, Song Y, Wuu J, Benatar M, Oskarsson B, Petrucelli L, Dowell RD, Link CD, Prudencio M. Application of a bioinformatic pipeline to RNA-seq data identifies novel viruslike sequence in human blood. G3-GENES GENOMES GENETICS 2021;11:6259144. [PMID: 33914880 PMCID: PMC8661426 DOI: 10.1093/g3journal/jkab141] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 01/26/2021] [Accepted: 04/20/2021] [Indexed: 12/11/2022]

108

Wolf SA, Epping L, Andreotti S, Reinert K, Semmler T. SCORE: Smart Consensus Of RNA Expression-a consensus tool for detecting differentially expressed genes in bacteria. Bioinformatics 2021;37:426-428. [PMID: 32717040 DOI: 10.1093/bioinformatics/btaa681] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2020] [Revised: 06/11/2020] [Accepted: 07/24/2020] [Indexed: 11/13/2022] Open

109

Gerber S, Schratt G, Germain PL. Streamlining differential exon and 3' UTR usage with diffUTR. BMC Bioinformatics 2021;22:189. [PMID: 33849458 PMCID: PMC8045333 DOI: 10.1186/s12859-021-04114-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 02/12/2021] [Accepted: 03/30/2021] [Indexed: 12/13/2022] Open

110

Behera S, Voshall A, Moriyama EN. Plant Transcriptome Assembly: Review and Benchmarking. Bioinformatics 2021. [DOI: 10.36255/exonpublications.bioinformatics.2021.ch7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022] Open

111

Liu P, Ewald J, Galvez JH, Head J, Crump D, Bourque G, Basu N, Xia J. Ultrafast functional profiling of RNA-seq data for nonmodel organisms. Genome Res 2021;31:713-720. [PMID: 33731361 PMCID: PMC8015844 DOI: 10.1101/gr.269894.120] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 08/05/2020] [Accepted: 02/18/2021] [Indexed: 12/02/2022]

112

Sarkar H, Srivastava A, Bravo HC, Love MI, Patro R. Terminus enables the discovery of data-driven, robust transcript groups from RNA-seq data. Bioinformatics 2021;36:i102-i110. [PMID: 32657377 PMCID: PMC7355257 DOI: 10.1093/bioinformatics/btaa448] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 11/13/2022] Open

113

Sánchez-Ramírez S, Weiss JG, Thomas CG, Cutter AD. Widespread misregulation of inter-species hybrid transcriptomes due to sex-specific and sex-chromosome regulatory evolution. PLoS Genet 2021;17:e1009409. [PMID: 33667233 PMCID: PMC7968742 DOI: 10.1371/journal.pgen.1009409] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 11/27/2020] [Revised: 03/17/2021] [Accepted: 02/09/2021] [Indexed: 01/04/2023] Open

Abstract

When gene regulatory networks diverge between species, their dysfunctional expression in inter-species hybrid individuals can create genetic incompatibilities that generate the developmental defects responsible for intrinsic post-zygotic reproductive isolation. Both cis- and trans-acting regulatory divergence can be hastened by directional selection through adaptation, sexual selection, and inter-sexual conflict, in addition to cryptic evolution under stabilizing selection. Dysfunctional sex-biased gene expression, in particular, may provide an important source of sexually-dimorphic genetic incompatibilities. Here, we characterize and compare male and female/hermaphrodite transcriptome profiles for sibling nematode species Caenorhabditis briggsae and C. nigoni, along with allele-specific expression in their F1 hybrids, to deconvolve features of expression divergence and regulatory dysfunction. Despite evidence of widespread stabilizing selection on gene expression, misexpression of sex-biased genes pervades F1 hybrids of both sexes. This finding implicates greater fragility of male genetic networks to produce dysfunctional organismal phenotypes. Spermatogenesis genes are especially prone to high divergence in both expression and coding sequences, consistent with a "faster male" model for Haldane's rule and elevated sterility of hybrid males. Moreover, underdominant expression pervades male-biased genes compared to female-biased and sex-neutral genes and an excess of cis-trans compensatory regulatory divergence for X-linked genes underscores a "large-X effect" for hybrid male expression dysfunction. Extensive regulatory divergence in sex determination pathway genes likely contributes to demasculinization of XX hybrids. Genetic incompatibilities due to regulatory versus coding sequence divergence, however, are expected to arise in an uncorrelated fashion. This study identifies important differences between the sexes in how regulatory networks diverge to contribute to sex-biases in how genetic incompatibilities manifest during the speciation process.

114

Sokolowski DJ, Faykoo-Martinez M, Erdman L, Hou H, Chan C, Zhu H, Holmes MM, Goldenberg A, Wilson MD. Single-cell mapper (scMappR): using scRNA-seq to infer the cell-type specificities of differentially expressed genes. NAR Genom Bioinform 2021;3:lqab011. [PMID: 33655208 PMCID: PMC7902236 DOI: 10.1093/nargab/lqab011] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 07/24/2020] [Revised: 12/23/2020] [Accepted: 02/04/2021] [Indexed: 12/11/2022] Open

115

Chen SY, Liu CJ, Zhang Q, Guo AY. An ultra-sensitive T-cell receptor detection method for TCR-Seq and RNA-Seq data. Bioinformatics 2021;36:4255-4262. [PMID: 32399561 DOI: 10.1093/bioinformatics/btaa432] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 02/12/2020] [Revised: 04/14/2020] [Accepted: 05/06/2020] [Indexed: 12/30/2022] Open

116

Varabyou A, Salzberg SL, Pertea M. Effects of transcriptional noise on estimates of gene and transcript expression in RNA sequencing experiments. Genome Res 2021;31:301-308. [PMID: 33361112 PMCID: PMC7849408 DOI: 10.1101/gr.266213.120] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 05/21/2020] [Accepted: 12/18/2020] [Indexed: 12/25/2022]

117

Parada GE, Munita R, Georgakopoulos-Soares I, Fernandes HJR, Kedlian VR, Metzakopian E, Andres ME, Miska EA, Hemberg M. MicroExonator enables systematic discovery and quantification of microexons across mouse embryonic development. Genome Biol 2021;22:43. [PMID: 33482885 PMCID: PMC7821500 DOI: 10.1186/s13059-020-02246-2] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 02/19/2020] [Accepted: 12/15/2020] [Indexed: 12/12/2022] Open

118

FADU: a Quantification Tool for Prokaryotic Transcriptomic Analyses. mSystems 2021;6:6/1/e00917-20. [PMID: 33436511 PMCID: PMC7901478 DOI: 10.1128/msystems.00917-20] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 12/13/2022] Open

Abstract

Quantification tools for RNA sequencing (RNA-Seq) analyses are often designed and tested using human transcriptomics data sets, in which full-length transcript sequences are well annotated. For prokaryotic transcriptomics experiments, full-length transcript sequences are seldom known, and coding sequences must instead be used for quantification steps in RNA-Seq analyses. However, operons confound accurate quantification of coding sequences since a single transcript does not necessarily equate to a single gene. Here, we introduce FADU (Feature Aggregate Depth Utility), a quantification tool designed specifically for prokaryotic RNA-Seq analyses. FADU assigns partial count values proportional to the length of the fragment overlapping the target feature. To assess the ability of FADU to quantify genes in prokaryotic transcriptomics analyses, we compared its performance to those of eXpress, featureCounts, HTSeq, kallisto, and Salmon across three paired-end read data sets of (i) Ehrlichia chaffeensis, (ii) Escherichia coli, and (iii) the Wolbachia endosymbiont wBm. Across each of the three data sets, we find that FADU can more accurately quantify operonic genes by deriving proportional counts for multigene fragments within operons. FADU is available at https://github.com/IGS/FADU.

IMPORTANCE Most currently available quantification tools for transcriptomics analyses have been designed for human data sets, in which full-length transcript sequences, including the untranslated regions, are well annotated. In most prokaryotic systems, full-length transcript sequences have yet to be characterized, leading to prokaryotic transcriptomics analyses being performed based on only the coding sequences. In contrast to eukaryotes, prokaryotes contain polycistronic transcripts, and when genes are quantified based on coding sequences instead of transcript sequences, this leads to an increased abundance of improperly assigned ambiguous multigene fragments, specifically those mapping to multiple genes in operons. Here, we describe FADU, a quantification tool for prokaryotic RNA-Seq analyses designed to assign proportional counts with the purpose of better quantifying operonic genes while minimizing the pitfalls associated with improperly assigning fragment counts from ambiguous transcripts.
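
The proportional-count idea in the abstract above can be illustrated with a short Python sketch. This is not FADU's implementation: the function name `proportional_counts`, the half-open coordinate convention, and the choice to divide the overlap by the full fragment length are all assumptions made for illustration. Each fragment contributes to a feature the fraction of its length that overlaps that feature, so a fragment spanning two adjacent genes in an operon is split between them.

```python
from collections import defaultdict


def proportional_counts(fragments, features):
    """Assign fractional counts to features by overlap length.

    fragments: list of (start, end) half-open intervals (hypothetical input format).
    features:  dict mapping feature name -> (start, end) half-open interval.
    Returns a dict mapping feature name -> summed fractional count.
    """
    counts = defaultdict(float)
    for f_start, f_end in fragments:
        frag_len = f_end - f_start
        if frag_len <= 0:
            continue  # skip empty or malformed fragments
        for name, (g_start, g_end) in features.items():
            # Length of the intersection of the fragment and the feature.
            overlap = min(f_end, g_end) - max(f_start, g_start)
            if overlap > 0:
                # Assumed rule: contribution = overlapping bases / fragment length.
                counts[name] += overlap / frag_len
    return dict(counts)


# Two adjacent genes, as in an operon, and three toy fragments.
features = {"geneA": (0, 100), "geneB": (100, 200)}
fragments = [(10, 60), (80, 120), (150, 250)]
counts = proportional_counts(fragments, features)
print(counts)
```

In this toy input the first fragment lies entirely in `geneA`, the second straddles the gene boundary and is split evenly, and the third only half overlaps `geneB`, so partially overlapping fragments never contribute a full count.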

119

Shao W, Wang T. Transcript assembly improves expression quantification of transposable elements in single-cell RNA-seq data. Genome Res 2021;31:88-100. [PMID: 33355230 PMCID: PMC7849386 DOI: 10.1101/gr.265173.120] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Received: 04/27/2020] [Accepted: 11/24/2020] [Indexed: 12/28/2022]

120

Duan Y, Zhang W, Cheng Y, Shi M, Xia XQ. A systematic evaluation of bioinformatics tools for identification of long noncoding RNAs. RNA (NEW YORK, N.Y.) 2021;27:80-98. [PMID: 33055239 PMCID: PMC7749630 DOI: 10.1261/rna.074724.120] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 01/07/2020] [Accepted: 10/07/2020] [Indexed: 06/11/2023]

121

Banerjee S, Velásquez-Zapata V, Fuerst G, Elmore JM, Wise RP. NGPINT: a next-generation protein-protein interaction software. Brief Bioinform 2020;22:6046042. [PMID: 33367498 DOI: 10.1093/bib/bbaa351] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 09/29/2020] [Revised: 10/23/2020] [Accepted: 11/02/2020] [Indexed: 12/27/2022] Open

122

Spinozzi G, Tini V, Adorni A, Falini B, Martelli MP. ARPIR: automatic RNA-Seq pipelines with interactive report. BMC Bioinformatics 2020;21:574. [PMID: 33349239 PMCID: PMC7751108 DOI: 10.1186/s12859-020-03846-2] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 10/20/2020] [Accepted: 10/27/2020] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

RNA-Seq is an increasingly used methodology for studying both coding and non-coding RNA expression. There are many software tools available for each phase of the RNA-Seq analysis, and each of them uses different algorithms. Furthermore, the analysis consists of several steps, namely alignment (primary-analysis), quantification, differential analysis (secondary-analysis) and any tertiary-analysis, so dealing with each step separately can be time-consuming and also requires computer knowledge. For this reason, an automated pipeline that allows the entire analysis to be managed through a single initial command, and that is easy to use even for those without computer skills, can be useful. Given the vast number of RNA-Seq analysis tools available, it is first necessary to select a limited number of pipelines to include. For this purpose, we compared eight pipelines obtained by combining the most used tools, and for each one we evaluated peak RAM usage, run time, sensitivity and specificity.

RESULTS

The pipeline with shorter run times, lower RAM consumption and higher sensitivity is the one consisting of HISAT2 for alignment, featureCounts for quantification and edgeR for differential analysis. Here, we developed ARPIR, an automated pipeline that uses this pipeline by default but also allows users to choose, among different tools, those of the best-performing pipelines.

CONCLUSIONS

ARPIR allows the analysis of RNA-Seq data from groups undergoing different treatments, allowing multiple comparisons in a single launch, and can be used for either paired-end or single-end analysis. All the required prerequisites can be installed via a configuration script, and the analysis can be launched via a graphical interface or a template script. In addition, ARPIR performs a final tertiary-analysis that includes a Gene Ontology and Pathway analysis. The results can be viewed in an interactive Shiny App and exported in a report (pdf, word or html formats). ARPIR is an efficient and easy-to-use tool for RNA-Seq analysis, from quality control to Pathway analysis, that allows you to choose between different pipelines.

123

Puntambekar S, Newhouse R, San-Miguel J, Chauhan R, Vernaz G, Willis T, Wayland MT, Umrania Y, Miska EA, Prabakaran S. Evolutionary divergence of novel open reading frames in cichlids speciation. Sci Rep 2020;10:21570. [PMID: 33299045 PMCID: PMC7726158 DOI: 10.1038/s41598-020-78555-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 04/02/2020] [Accepted: 11/26/2020] [Indexed: 01/02/2023] Open

124

Chen L, Lang K, Mei Y, Shi Z, He K, Li F, Xiao H, Ye G, Han Z. FastD: Fast detection of insecticide target-site mutations and overexpressed detoxification genes in insect populations from RNA-Seq data. Ecol Evol 2020;10:14346-14358. [PMID: 33391720 PMCID: PMC7771117 DOI: 10.1002/ece3.7037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2019] [Revised: 08/26/2020] [Accepted: 09/21/2020] [Indexed: 11/24/2022] Open

Abstract

Target-site mutations and detoxification gene overexpression are two major mechanisms conferring insecticide resistance. Molecular assays applied to detect these genetic resistance markers are time-consuming and have high false-positive rates. RNA-Seq data contains information on the variations within expressed genomic regions and on the expression of detoxification genes. However, there is no corresponding method to detect resistance markers at present. Here, we collected 66 reported resistance mutations of four insecticide targets (AChE, VGSC, RyR, and nAChR) from 82 insect species. Next, we obtained 403 sequences of the four target genes and 12,665 sequences of three kinds of detoxification genes including P450s, GSTs, and CCEs. Then, we developed a Perl program, FastD, to detect target-site mutations and overexpressed detoxification genes from RNA-Seq data and constructed a web server for FastD (http://www.insect-genome.com/fastd). Evaluation of FastD on simulated RNA-Seq data showed high sensitivity and specificity. We applied FastD to detect resistance markers in 15 populations of six insects, Plutella xylostella, Aphis gossypii, Anopheles arabiensis, Musca domestica, Leptinotarsa decemlineata and Apis mellifera. Results showed that 11 RyR mutations in P. xylostella, one nAChR mutation in A. gossypii, one VGSC mutation in A. arabiensis and five VGSC mutations in M. domestica had a frequency difference >40% between resistant and susceptible populations, including previously confirmed mutations G4946E in RyR, R81T in nAChR and L1014F in VGSC. In addition, 49 detoxification genes were found to be overexpressed in resistant populations compared with susceptible populations, including previously confirmed detoxification genes CYP6BG1, CYP6CY22, CYP6CY13, CYP6P3, CYP6M2, CYP6P4 and CYP4G16. The candidate target-site mutations and detoxification genes are worth further validation. Resistance estimates according to confirmed markers were consistent with population phenotypes, confirming the reliability of this program in predicting population resistance at omics-level.

125

Zhang Y, Parmigiani G, Johnson WE. ComBat-seq: batch effect adjustment for RNA-seq count data. NAR Genom Bioinform 2020;2:lqaa078. [PMID: 33015620 PMCID: PMC7518324 DOI: 10.1093/nargab/lqaa078] [Citation(s) in RCA: 697] [Impact Index Per Article: 139.4] [Received: 04/19/2020] [Revised: 08/09/2020] [Accepted: 09/17/2020] [Indexed: 12/25/2022] Open

126

Srivastava A, Malik L, Sarkar H, Zakeri M, Almodaresi F, Soneson C, Love MI, Kingsford C, Patro R. Alignment and mapping methodology influence transcript abundance estimation. Genome Biol 2020;21:239. [PMID: 32894187 PMCID: PMC7487471 DOI: 10.1186/s13059-020-02151-8] [Citation(s) in RCA: 90] [Impact Index Per Article: 18.0] [Received: 10/31/2019] [Accepted: 08/19/2020] [Indexed: 01/23/2023] Open

127

Germain PL, Sonrel A, Robinson MD. pipeComp, a general framework for the evaluation of computational pipelines, reveals performant single cell RNA-seq preprocessing tools. Genome Biol 2020;21:227. [PMID: 32873325 PMCID: PMC7465801 DOI: 10.1186/s13059-020-02136-7] [Citation(s) in RCA: 58] [Impact Index Per Article: 11.6] [Received: 03/02/2020] [Accepted: 08/06/2020] [Indexed: 11/13/2022] Open

128

Chen X, Zhang B, Wang T, Bonni A, Zhao G. Robust principal component analysis for accurate outlier sample detection in RNA-Seq data. BMC Bioinformatics 2020;21:269. [PMID: 32600248 PMCID: PMC7324992 DOI: 10.1186/s12859-020-03608-0] [Citation(s) in RCA: 48] [Impact Index Per Article: 9.6] [Received: 01/06/2020] [Accepted: 06/16/2020] [Indexed: 01/07/2023] Open

Abstract

BACKGROUND

High-throughput RNA sequencing is a powerful approach to study gene expression. Because of the complex multi-step protocols in data acquisition, extreme deviation of a sample from samples of the same treatment group may occur due to technical variation or true biological differences. The high dimensionality of the data, combined with few biological replicates, makes it challenging to accurately detect those samples, and this issue is not currently well studied in the literature. Robust statistics is a family of theories and techniques that aim to detect outliers by first fitting the majority of the data and then flagging data points that deviate from it. Robust statistics have been widely used in multivariate data analysis for outlier detection in chemometrics and engineering. Here we apply robust statistics to RNA-seq data analysis.

RESULTS

We report the use of two robust principal component analysis (rPCA) methods, PcaHubert and PcaGrid, to detect outlier samples in multiple simulated and real biological RNA-seq data sets with positive control outlier samples. PcaGrid achieved 100% sensitivity and 100% specificity in all the tests using positive control outliers with varying degrees of divergence. We applied rPCA methods and classical principal component analysis (cPCA) on an RNA-Seq data set profiling gene expression of the external granule layer in the cerebellum of control and conditional SnoN knockout mice. Both rPCA methods detected the same two outlier samples but cPCA failed to detect any. We performed differentially expressed gene detection before and after outlier removal as well as with and without batch effect modeling. We validated gene expression changes using quantitative reverse transcription PCR and used the result as reference to compare the performance of eight different data analysis strategies. Removing outliers without batch effect modeling performed the best in terms of detecting biologically relevant differentially expressed genes.

CONCLUSIONS

rPCA implemented in the PcaGrid function is an accurate and objective method to detect outlier samples. It is well suited for high-dimensional data with small sample sizes like RNA-seq data. Outlier removal can significantly improve the performance of differential gene detection and downstream functional analysis.

129

Boonekamp FJ, Dashko S, Duiker D, Gehrmann T, van den Broek M, den Ridder M, Pabst M, Robert V, Abeel T, Postma ED, Daran JM, Daran-Lapujade P. Design and Experimental Evaluation of a Minimal, Innocuous Watermarking Strategy to Distinguish Near-Identical DNA and RNA Sequences. ACS Synth Biol 2020;9:1361-1375. [PMID: 32413257 PMCID: PMC7309318 DOI: 10.1021/acssynbio.0c00045] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 11/28/2022]

130

Naraine R, Abaffy P, Sidova M, Tomankova S, Pocherniaieva K, Smolik O, Kubista M, Psenicka M, Sindelka R. NormQ: RNASeq normalization based on RT-qPCR derived size factors. Comput Struct Biotechnol J 2020;18:1173-1181. [PMID: 32514328 PMCID: PMC7264052 DOI: 10.1016/j.csbj.2020.05.010] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 12/30/2019] [Revised: 05/07/2020] [Accepted: 05/07/2020] [Indexed: 02/04/2023] Open

131

Wilson-Sánchez D, Lup SD, Sarmiento-Mañús R, Ponce MR, Micol JL. Next-generation forward genetic screens: using simulated data to improve the design of mapping-by-sequencing experiments in Arabidopsis. Nucleic Acids Res 2020;47:e140. [PMID: 31544937 PMCID: PMC6868388 DOI: 10.1093/nar/gkz806] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 03/30/2019] [Revised: 09/07/2019] [Accepted: 09/10/2019] [Indexed: 12/25/2022] Open

132

Marcelino VR, Clausen PTLC, Buchmann JP, Wille M, Iredell JR, Meyer W, Lund O, Sorrell TC, Holmes EC. CCMetagen: comprehensive and accurate identification of eukaryotes and prokaryotes in metagenomic data. Genome Biol 2020;21:103. [PMID: 32345331 PMCID: PMC7189439 DOI: 10.1186/s13059-020-02014-2] [Citation(s) in RCA: 87] [Impact Index Per Article: 17.4] [Received: 05/31/2019] [Accepted: 04/13/2020] [Indexed: 01/19/2023] Open

Affiliation(s)

Vanessa R Marcelino Marie Bashir Institute for Infectious Diseases and Biosecurity and Faculty of Medicine and Health, Sydney Medical School, Westmead Clinical School, The University of Sydney, Sydney, NSW, 2006, Australia. Centre for Infectious Diseases and Microbiology, Westmead Institute for Medical Research, Westmead, NSW, 2145, Australia. School of Life & Environmental Sciences, Charles Perkins Centre, The University of Sydney, Sydney, NSW, 2006, Australia.
Philip T L C Clausen National Food Institute, Technical University of Denmark, 2800, Kgs Lyngby, Denmark
Jan P Buchmann School of Life & Environmental Sciences, Charles Perkins Centre, The University of Sydney, Sydney, NSW, 2006, Australia
Michelle Wille WHO Collaborating Centre for Reference and Research on Influenza, The Peter Doherty Institute for Infection and Immunity, Melbourne, VIC, 3000, Australia
Jonathan R Iredell Marie Bashir Institute for Infectious Diseases and Biosecurity and Faculty of Medicine and Health, Sydney Medical School, Westmead Clinical School, The University of Sydney, Sydney, NSW, 2006, Australia Centre for Infectious Diseases and Microbiology, Westmead Institute for Medical Research, Westmead, NSW, 2145, Australia Westmead Hospital (Research and Education Network), Westmead, NSW, 2145, Australia
Wieland Meyer Marie Bashir Institute for Infectious Diseases and Biosecurity and Faculty of Medicine and Health, Sydney Medical School, Westmead Clinical School, The University of Sydney, Sydney, NSW, 2006, Australia Westmead Hospital (Research and Education Network), Westmead, NSW, 2145, Australia Molecular Mycology Research Laboratory, Centre for Infectious Diseases and Microbiology, Westmead Institute for Medical Research, Westmead, NSW, 2145, Australia
Ole Lund National Food Institute, Technical University of Denmark, 2800, Kgs Lyngby, Denmark
Tania C Sorrell Marie Bashir Institute for Infectious Diseases and Biosecurity and Faculty of Medicine and Health, Sydney Medical School, Westmead Clinical School, The University of Sydney, Sydney, NSW, 2006, Australia Centre for Infectious Diseases and Microbiology, Westmead Institute for Medical Research, Westmead, NSW, 2145, Australia
Edward C Holmes Marie Bashir Institute for Infectious Diseases and Biosecurity and Faculty of Medicine and Health, Sydney Medical School, Westmead Clinical School, The University of Sydney, Sydney, NSW, 2006, Australia School of Life & Environmental Sciences, Charles Perkins Centre, The University of Sydney, Sydney, NSW, 2006, Australia

133

Ozuna A, Liberto D, Joyce RM, Arnvig KB, Nobeli I. baerhunter: an R package for the discovery and analysis of expressed non-coding regions in bacterial RNA-seq data. Bioinformatics 2020;36:966-969. [PMID: 31418770 DOI: 10.1093/bioinformatics/btz643] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 04/17/2019] [Revised: 07/29/2019] [Accepted: 08/13/2019] [Indexed: 12/12/2022]

134

Górczak K, Claesen J, Burzykowski T. A Conceptual Framework for Abundance Estimation of Genomic Targets in the Presence of Ambiguous Short Sequencing Reads. J Comput Biol 2020;27:1232-1247. [PMID: 31895597 DOI: 10.1089/cmb.2019.0272] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/13/2022]

135

Yang A, Kishore A, Phipps B, Ho JWK. Cloud accelerated alignment and assembly of full-length single-cell RNA-seq data using Falco. BMC Genomics 2019;20:927. [PMID: 31888474 PMCID: PMC6936136 DOI: 10.1186/s12864-019-6341-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 11/17/2019] [Accepted: 11/26/2019] [Indexed: 12/18/2022]

136

Ma C, Kingsford C. Detecting, Categorizing, and Correcting Coverage Anomalies of RNA-Seq Quantification. Cell Syst 2019;9:589-599.e7. [PMID: 31786209 DOI: 10.1016/j.cels.2019.10.005] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 02/06/2019] [Revised: 07/09/2019] [Accepted: 10/17/2019] [Indexed: 11/13/2022]

137

Zheng H, Brennan K, Hernaez M, Gevaert O. Benchmark of long non-coding RNA quantification for RNA sequencing of cancer samples. Gigascience 2019;8:giz145. [PMID: 31808800 PMCID: PMC6897288 DOI: 10.1093/gigascience/giz145] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Received: 04/08/2019] [Revised: 09/30/2019] [Accepted: 11/15/2019] [Indexed: 12/14/2022]

138

Li WV, Li S, Tong X, Deng L, Shi H, Li JJ. AIDE: annotation-assisted isoform discovery with high precision. Genome Res 2019;29:2056-2072. [PMID: 31694868 PMCID: PMC6886511 DOI: 10.1101/gr.251108.119] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 04/01/2019] [Accepted: 09/27/2019] [Indexed: 02/06/2023]

139

Song L, Sabunciyan S, Yang G, Florea L. A multi-sample approach increases the accuracy of transcript assembly. Nat Commun 2019;10:5000. [PMID: 31676772 PMCID: PMC6825223 DOI: 10.1038/s41467-019-12990-0] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 02/16/2019] [Accepted: 10/11/2019] [Indexed: 01/21/2023]

140

Zhu A, Srivastava A, Ibrahim JG, Patro R, Love MI. Nonparametric expression analysis using inferential replicate counts. Nucleic Acids Res 2019;47:e105. [PMID: 31372651 PMCID: PMC6765120 DOI: 10.1093/nar/gkz622] [Citation(s) in RCA: 59] [Impact Index Per Article: 9.8] [Received: 05/01/2019] [Revised: 06/11/2019] [Accepted: 07/11/2019] [Indexed: 11/13/2022]

141

Zhou L, Chi-Hau Sue A, Bin Goh WW. Examining the practical limits of batch effect-correction algorithms: When should you care about batch effects? J Genet Genomics 2019;46:433-443. [PMID: 31611172 DOI: 10.1016/j.jgg.2019.08.002] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 05/11/2019] [Revised: 08/02/2019] [Accepted: 08/04/2019] [Indexed: 12/20/2022]

142

Kerkvliet J, de Fouchier A, van Wijk M, Groot AT. The Bellerophon pipeline, improving de novo transcriptomes and removing chimeras. Ecol Evol 2019;9:10513-10521. [PMID: 31624564 PMCID: PMC6787812 DOI: 10.1002/ece3.5571] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 03/07/2019] [Revised: 07/22/2019] [Accepted: 07/28/2019] [Indexed: 12/31/2022]

143

Gunady MK, Mount SM, Corrada Bravo H. Yanagi: Fast and interpretable segment-based alternative splicing and gene expression analysis. BMC Bioinformatics 2019;20:421. [PMID: 31409274 PMCID: PMC6693274 DOI: 10.1186/s12859-019-2947-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 09/27/2018] [Accepted: 06/12/2019] [Indexed: 12/13/2022]

144

Deng W, Mou T, Kalari KR, Niu N, Wang L, Pawitan Y, Vu TN. Alternating EM algorithm for a bilinear model in isoform quantification from RNA-seq data. Bioinformatics 2019;36:805-812. [PMID: 31400221 PMCID: PMC9883676 DOI: 10.1093/bioinformatics/btz640] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 08/15/2018] [Revised: 06/13/2019] [Accepted: 08/09/2019] [Indexed: 02/02/2023]

145

Anwar MZ, Lanzen A, Bang-Andreasen T, Jacobsen CS. To assemble or not to resemble-A validated Comparative Metatranscriptomics Workflow (CoMW). Gigascience 2019;8:giz096. [PMID: 31363751 PMCID: PMC6667343 DOI: 10.1093/gigascience/giz096] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Received: 01/10/2019] [Revised: 05/15/2019] [Accepted: 07/16/2019] [Indexed: 01/08/2023]

Abstract

BACKGROUND

Metatranscriptomics has been used widely for investigation and quantification of microbial communities' activity in response to external stimuli. By assessing the genes expressed, metatranscriptomics provides an understanding of the interactions between different major functional guilds and the environment. Here, we present a de novo assembly-based Comparative Metatranscriptomics Workflow (CoMW) implemented in a modular, reproducible structure. Metatranscriptomics typically uses short sequence reads, which can either be directly aligned to external reference databases ("assembly-free approach") or first assembled into contigs before alignment ("assembly-based approach"). We also compare CoMW (assembly-based implementation) with an assembly-free alternative workflow, using simulated and real-world metatranscriptomes from Arctic and temperate terrestrial environments. We evaluate their accuracy in precision and recall using generic and specialized hierarchical protein databases.

RESULTS

CoMW provided significantly fewer false-positive results, resulting in more precise identification and quantification of functional genes in metatranscriptomes. Using the comprehensive database M5nr, the assembly-based approach identified genes with only 0.6% false-positive results at thresholds ranging from inclusive to stringent compared with the assembly-free approach, which yielded up to 15% false-positive results. Using specialized databases (carbohydrate-active enzyme and nitrogen cycle), the assembly-based approach identified and quantified genes with 3-5 times fewer false-positive results. We also evaluated the impact of both approaches on real-world datasets.

CONCLUSIONS

We present an open source de novo assembly-based CoMW. Our benchmarking findings support assembling short reads into contigs before alignment to a reference database because this provides higher precision and minimizes false-positive results.

146

Raghupathy N, Choi K, Vincent MJ, Beane GL, Sheppard KS, Munger SC, Korstanje R, Pardo-Manual de Villena F, Churchill GA. Hierarchical analysis of RNA-seq reads improves the accuracy of allele-specific expression. Bioinformatics 2019;34:2177-2184. [PMID: 29444201 DOI: 10.1093/bioinformatics/bty078] [Citation(s) in RCA: 58] [Impact Index Per Article: 9.7] [Received: 09/01/2017] [Accepted: 02/09/2018] [Indexed: 02/06/2023]

Abstract

Motivation

Allele-specific expression (ASE) refers to the differential abundance of the allelic copies of a transcript. RNA sequencing (RNA-seq) can provide quantitative estimates of ASE for genes with transcribed polymorphisms. When short-read sequences are aligned to a diploid transcriptome, read-mapping ambiguities confound our ability to directly count reads. Multi-mapping reads aligning equally well to multiple genomic locations, isoforms or alleles can comprise the majority (>85%) of reads. Discarding them can result in biases and substantial loss of information. Methods have been developed that use weighted allocation of read counts but these methods treat the different types of multi-reads equivalently. We propose a hierarchical approach to allocation of read counts that first resolves ambiguities among genes, then among isoforms, and lastly between alleles. We have implemented our model in EMASE software (Expectation-Maximization for Allele Specific Expression) to estimate total gene expression, isoform usage and ASE based on this hierarchical allocation.

Results

Methods that align RNA-seq reads to a diploid transcriptome incorporating known genetic variants improve estimates of ASE and total gene expression compared to methods that use reference genome alignments. Weighted allocation methods outperform methods that discard multi-reads. Hierarchical allocation of reads improves estimation of ASE even when data are simulated from a non-hierarchical model. Analysis of RNA-seq data from F1 hybrid mice using EMASE reveals widespread ASE associated with cis-acting polymorphisms and a small number of parent-of-origin effects.

Availability and implementation

EMASE software is available at https://github.com/churchill-lab/emase.

Supplementary information

Supplementary data are available at Bioinformatics online.

147

Sarkar H, Srivastava A, Patro R. Minnow: a principled framework for rapid simulation of dscRNA-seq data at the read level. Bioinformatics 2019;35:i136-i144. [PMID: 31510649 PMCID: PMC6612833 DOI: 10.1093/bioinformatics/btz351] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 11/18/2022]

Abstract

SUMMARY

With the advancements of high-throughput single-cell RNA-sequencing protocols, there has been a rapid increase in the tools available to perform an array of analyses on the gene expression data that results from such studies. For example, there exist methods for pseudo-time series analysis, differential cell usage, cell-type detection, RNA-velocity in single cells, etc. Most analysis pipelines validate their results using known marker genes (which are not widely available for all types of analysis) and by using simulated data from gene-count-level simulators. Typically, the impact of using different read-alignment or unique molecular identifier (UMI) deduplication methods has not been widely explored. Assessments based on simulation tend to start at the level of assuming a simulated count matrix, ignoring the effect that different approaches for resolving UMI counts from the raw read data may produce. Here, we present minnow, a comprehensive sequence-level droplet-based single-cell RNA-sequencing (dscRNA-seq) experiment simulation framework. Minnow accounts for important sequence-level characteristics of experimental scRNA-seq datasets and models effects such as polymerase chain reaction amplification, cellular barcodes (CB) and UMI selection and sequence fragmentation and sequencing. It also closely matches the gene-level ambiguity characteristics that are observed in real scRNA-seq experiments. Using minnow, we explore the performance of some common processing pipelines to produce gene-by-cell count matrices from droplet-based scRNA-seq data, demonstrate the effect that realistic levels of gene-level sequence ambiguity can have on accurate quantification and show a typical use-case of minnow in assessing the output generated by different quantification pipelines on the simulated experiment.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

148

Afsari B, Guo T, Considine M, Florea L, Kagohara LT, Stein-O'Brien GL, Kelley D, Flam E, Zambo KD, Ha PK, Geman D, Ochs MF, Califano JA, Gaykalova DA, Favorov AV, Fertig EJ. Splice Expression Variation Analysis (SEVA) for inter-tumor heterogeneity of gene isoform usage in cancer. Bioinformatics 2019;34:1859-1867. [PMID: 29342249 DOI: 10.1093/bioinformatics/bty004] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 06/09/2017] [Accepted: 01/10/2018] [Indexed: 12/22/2022]

Abstract

Motivation

Current bioinformatics methods to detect changes in gene isoform usage in distinct phenotypes compare the relative expected isoform usage in phenotypes. These statistics model differences in isoform usage in normal tissues, which have stable regulation of gene splicing. Pathological conditions, such as cancer, can have broken regulation of splicing that increases the heterogeneity of the expression of splice variants. Inferring events with such differential heterogeneity in gene isoform usage requires new statistical approaches.

Results

We introduce Splice Expression Variability Analysis (SEVA) to model increased heterogeneity of splice variant usage between conditions (e.g. tumor and normal samples). SEVA uses a rank-based multivariate statistic that compares the variability of junction expression profiles within one condition to the variability within another. Simulated data show that SEVA is unique in modeling heterogeneity of gene isoform usage, and benchmark SEVA's performance against EBSeq, DiffSplice and rMATS that model differential isoform usage instead of heterogeneity. We confirm the accuracy of SEVA in identifying known splice variants in head and neck cancer and perform cross-study validation of novel splice variants. A novel comparison of splice variant heterogeneity between subtypes of head and neck cancer demonstrated unanticipated similarity between the heterogeneity of gene isoform usage in HPV-positive and HPV-negative subtypes and anticipated increased heterogeneity among HPV-negative samples with mutations in genes that regulate the splice variant machinery. These results show that SEVA accurately models differential heterogeneity of gene isoform usage from RNA-seq data.

Availability and implementation

SEVA is implemented in the R/Bioconductor package GSReg.

Contact

bahman@jhu.edu or favorov@sensi.org or ejfertig@jhmi.edu.

Supplementary information

Supplementary data are available at Bioinformatics online.

149

Korthauer K, Kimes PK, Duvallet C, Reyes A, Subramanian A, Teng M, Shukla C, Alm EJ, Hicks SC. A practical guide to methods controlling false discoveries in computational biology. Genome Biol 2019;20:118. [PMID: 31164141 PMCID: PMC6547503 DOI: 10.1186/s13059-019-1716-1] [Citation(s) in RCA: 238] [Impact Index Per Article: 39.7] [Received: 11/02/2018] [Accepted: 05/10/2019] [Indexed: 01/06/2023]

150

Singer JM, Fu DY, Hughey JJ. Simphony: simulating large-scale, rhythmic data. PeerJ 2019;7:e6985. [PMID: 31198637 PMCID: PMC6535214 DOI: 10.7717/peerj.6985] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 12/19/2018] [Accepted: 04/15/2019] [Indexed: 12/26/2022]