1
|
Systematic comparison and assessment of RNA-seq procedures for gene expression quantitative analysis. Sci Rep 2020; 10:19737. [PMID: 33184454 PMCID: PMC7665074 DOI: 10.1038/s41598-020-76881-x] [Citation(s) in RCA: 121] [Impact Index Per Article: 24.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/15/2020] [Accepted: 11/03/2020] [Indexed: 01/16/2023] Open
Abstract
RNA-seq is currently considered the most powerful, robust and adaptable technique for measuring gene expression and transcription activation at genome-wide level. As the analysis of RNA-seq data is complex, it has prompted a large amount of research on algorithms and methods. This has resulted in a substantial increase in the number of options available at each step of the analysis. Consequently, there is no clear consensus about the most appropriate algorithms and pipelines that should be used to analyse RNA-seq data. In the present study, 192 pipelines using alternative methods were applied to 18 samples from two human cell lines and the performance of the results was evaluated. Raw gene expression signal was quantified by non-parametric statistics to measure precision and accuracy. Differential gene expression performance was estimated by testing 17 differential expression methods. The procedures were validated by qRT-PCR in the same samples. This study weighs up the advantages and disadvantages of the tested algorithms and pipelines providing a comprehensive guide to the different methods and procedures applied to the analysis of RNA-seq data, both for the quantification of the raw expression signal and for the differential gene expression.
Collapse
|
2
|
Machado FB, Moharana KC, Almeida-Silva F, Gazara RK, Pedrosa-Silva F, Coelho FS, Grativol C, Venancio TM. Systematic analysis of 1298 RNA-Seq samples and construction of a comprehensive soybean (Glycine max) expression atlas. THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2020; 103:1894-1909. [PMID: 32445587 DOI: 10.1111/tpj.14850] [Citation(s) in RCA: 44] [Impact Index Per Article: 8.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/29/2020] [Revised: 04/15/2020] [Accepted: 05/06/2020] [Indexed: 05/23/2023]
Abstract
Soybean (Glycine max [L.] Merr.) is a major crop in animal feed and human nutrition, mainly for its rich protein and oil contents. The remarkable rise in soybean transcriptome studies over the past 5 years generated an enormous amount of RNA-seq data, encompassing various tissues, developmental conditions and genotypes. In this study, we have collected data from 1298 publicly available soybean transcriptome samples, processed the raw sequencing reads and mapped them to the soybean reference genome in a systematic fashion. We found that 94% of the annotated genes (52 737/56 044) had detectable expression in at least one sample. Unsupervised clustering revealed three major groups, comprising samples from aerial, underground and seed/seed-related parts. We found 452 genes with uniform and constant expression levels, supporting their roles as housekeeping genes. On the other hand, 1349 genes showed heavily biased expression patterns towards particular tissues. A transcript-level analysis revealed that 95% (70 963 of 74 490) of the assembled transcripts have intron chains exactly matching those from known transcripts, whereas 3256 assembled transcripts represent potentially novel splicing isoforms. The dataset compiled here constitute a new resource for the community, which can be downloaded or accessed through a user-friendly web interface at http://venanciogroup.uenf.br/resources/. This comprehensive transcriptome atlas will likely accelerate research on soybean genetics and genomics.
Collapse
Affiliation(s)
- Fabricio B Machado
- Laboratório de Química e Função de Proteínas e Peptídeos, Centro de Biociências e Biotecnologia, Universidade Estadual do Norte Fluminense Darcy Ribeiro, Campos dos Goytacazes, Brazil
| | - Kanhu C Moharana
- Laboratório de Química e Função de Proteínas e Peptídeos, Centro de Biociências e Biotecnologia, Universidade Estadual do Norte Fluminense Darcy Ribeiro, Campos dos Goytacazes, Brazil
| | - Fabricio Almeida-Silva
- Laboratório de Química e Função de Proteínas e Peptídeos, Centro de Biociências e Biotecnologia, Universidade Estadual do Norte Fluminense Darcy Ribeiro, Campos dos Goytacazes, Brazil
| | - Rajesh K Gazara
- Laboratório de Química e Função de Proteínas e Peptídeos, Centro de Biociências e Biotecnologia, Universidade Estadual do Norte Fluminense Darcy Ribeiro, Campos dos Goytacazes, Brazil
| | - Francisnei Pedrosa-Silva
- Laboratório de Química e Função de Proteínas e Peptídeos, Centro de Biociências e Biotecnologia, Universidade Estadual do Norte Fluminense Darcy Ribeiro, Campos dos Goytacazes, Brazil
| | - Fernanda S Coelho
- Laboratório de Química e Função de Proteínas e Peptídeos, Centro de Biociências e Biotecnologia, Universidade Estadual do Norte Fluminense Darcy Ribeiro, Campos dos Goytacazes, Brazil
| | - Clícia Grativol
- Laboratório de Química e Função de Proteínas e Peptídeos, Centro de Biociências e Biotecnologia, Universidade Estadual do Norte Fluminense Darcy Ribeiro, Campos dos Goytacazes, Brazil
| | - Thiago M Venancio
- Laboratório de Química e Função de Proteínas e Peptídeos, Centro de Biociências e Biotecnologia, Universidade Estadual do Norte Fluminense Darcy Ribeiro, Campos dos Goytacazes, Brazil
| |
Collapse
|