1
|
Evans C, Hardin J, Stoebel DM. Selecting between-sample RNA-Seq normalization methods from the perspective of their assumptions. Brief Bioinform 2018; 19:776-792. [PMID: 28334202 PMCID: PMC6171491 DOI: 10.1093/bib/bbx008] [Citation(s) in RCA: 161] [Impact Index Per Article: 26.8] [Received: 09/25/2016] [Revised: 01/06/2017] [Indexed: 11/13/2022] Open
Abstract
RNA-Seq is a widely used method for studying the behavior of genes under different biological conditions. An essential step in an RNA-Seq study is normalization, in which raw data are adjusted to account for factors that prevent direct comparison of expression measures. Errors in normalization can have a significant impact on downstream analysis, such as inflated false positives in differential expression analysis. An underemphasized feature of normalization is the assumptions on which the methods rely and how the validity of these assumptions can have a substantial impact on the performance of the methods. In this article, we explain how assumptions provide the link between raw RNA-Seq read counts and meaningful measures of gene expression. We examine normalization methods from the perspective of their assumptions, as an understanding of methodological assumptions is necessary for choosing methods appropriate for the data at hand. Furthermore, we discuss why normalization methods perform poorly when their assumptions are violated and how this causes problems in subsequent analysis. To analyze a biological experiment, researchers must select a normalization method with assumptions that are met and that produces a meaningful measure of expression for the given experiment.
Affiliation(s)
- Ciaran Evans
- Department of Statistics, Baker Hall, Carnegie Mellon University, Pittsburgh, PA, USA
|
2
|
Minnier J, Pennock ND, Guo Q, Schedin P, Harrington CA. RNA-Seq and Expression Arrays: Selection Guidelines for Genome-Wide Expression Profiling. Methods Mol Biol 2018; 1783:7-33. [PMID: 29767356 DOI: 10.1007/978-1-4939-7834-2_2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 06/08/2023]
Abstract
The development of genome-wide gene expression profiling technologies over the past two decades has produced great opportunity for researchers to explore the transcriptome and to better understand biological systems and their perturbation. In this chapter we provide an overview of microarray and massively parallel sequencing technologies and their application to gene expression analysis. We discuss factors that impact expression data generation and analysis and that should be considered in the application of these technology platforms. We further present the results of a simple illustration study to highlight performance similarities and differences in expression profiling of protein-coding mRNAs with each platform. Based on technical and analytical differences between the two platforms, reports in the literature comparing arrays and RNA-Seq for gene expression, and our own example study and experience, we provide recommendations for platform selection for gene expression studies.
Affiliation(s)
- Jessica Minnier
- School of Public Health, Oregon Health and Science University, Portland, OR, USA
- Nathan D Pennock
- Department of Cell, Developmental and Cancer Biology, Oregon Health and Science University, Portland, OR, USA
- Qiuchen Guo
- Department of Cell, Developmental and Cancer Biology, Oregon Health and Science University, Portland, OR, USA
- Pepper Schedin
- Department of Cell, Developmental and Cancer Biology, Knight Cancer Institute, Oregon Health and Science University, Portland, OR, USA
- Young Women's Breast Cancer Translational Program, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Christina A Harrington
- Integrated Genomics Laboratory, Department of Molecular and Medical Genetics, Oregon Health and Science University, Portland, OR, USA.
|
3
|
Lim JH, Lee SY, Kim JH. TRAPR: R Package for Statistical Analysis and Visualization of RNA-Seq Data. Genomics Inform 2017; 15:51-53. [PMID: 28416950 PMCID: PMC5389949 DOI: 10.5808/gi.2017.15.1.51] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 01/23/2017] [Revised: 02/07/2017] [Accepted: 02/15/2017] [Indexed: 11/20/2022] Open
Abstract
High-throughput transcriptome sequencing, also known as RNA sequencing (RNA-Seq), is a standard technology for measuring gene expression with unprecedented accuracy. Numerous Bioconductor packages have been developed for the statistical analysis of RNA-Seq data. However, these tools focus on specific aspects of the data analysis pipeline, and are difficult to integrate appropriately with one another because of their disparate data structures and processing methods. They also lack visualization methods to confirm the integrity of the data and the process. In this paper, we propose an R-based RNA-Seq analysis pipeline called TRAPR, an integrated tool that facilitates the statistical analysis and visualization of RNA-Seq expression data. TRAPR provides various functions for data management, the filtering of low-quality data, normalization, transformation, statistical analysis, data visualization, and result visualization that allow researchers to build customized analysis pipelines.
Affiliation(s)
- Jae Hyun Lim
- Seoul National University Biomedical Informatics (SNUBI), Division of Biomedical Informatics and Systems Biomedical Informatics Research Center, Seoul National University College of Medicine, Seoul 110799, Korea
- Soo Youn Lee
- Seoul National University Biomedical Informatics (SNUBI), Division of Biomedical Informatics and Systems Biomedical Informatics Research Center, Seoul National University College of Medicine, Seoul 110799, Korea
- Ju Han Kim
- Seoul National University Biomedical Informatics (SNUBI), Division of Biomedical Informatics and Systems Biomedical Informatics Research Center, Seoul National University College of Medicine, Seoul 110799, Korea
|
4
|
Gallagher IJ, Jacobi C, Tardif N, Rooyackers O, Fearon K. Omics/systems biology and cancer cachexia. Semin Cell Dev Biol 2016; 54:92-103. [DOI: 10.1016/j.semcdb.2015.12.022] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 12/29/2015] [Accepted: 12/30/2015] [Indexed: 10/22/2022]
|
5
|
Solano-Aguilar G, Molokin A, Botelho C, Fiorino AM, Vinyard B, Li R, Chen C, Urban J, Dawson H, Andreyeva I, Haverkamp M, Hibberd PL. Transcriptomic Profile of Whole Blood Cells from Elderly Subjects Fed Probiotic Bacteria Lactobacillus rhamnosus GG ATCC 53103 (LGG) in a Phase I Open Label Study. PLoS One 2016; 11:e0147426. [PMID: 26859761 PMCID: PMC4747532 DOI: 10.1371/journal.pone.0147426] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 12/30/2014] [Accepted: 12/31/2015] [Indexed: 02/07/2023] Open
Abstract
We examined gene expression of whole blood cells (WBC) from 11 healthy elderly volunteers participating in a Phase I open label study before and after oral treatment with Lactobacillus rhamnosus GG-ATCC 53103 (LGG) using RNA-sequencing (RNA-Seq). Elderly patients (65–80 yrs) completed a clinical assessment for health status and had blood drawn for cellular RNA extraction at study admission (Baseline), after 28 days of daily LGG treatment (Day 28) and at the end of the study (Day 56) after LGG treatment had been suspended for 28 days. Treatment compliance was verified by measuring LGG-DNA copy levels detected in host fecal samples. Normalized gene expression levels in WBC RNA were analyzed using a paired design built within three analysis platforms (edgeR, DESeq2 and TSPM) commonly used for gene count data analysis. From the 25,990 transcripts detected, 95 differentially expressed genes (DEGs) were detected in common by all analysis platforms with a nominal significant difference in gene expression at Day 28 following LGG treatment (FDR<0.1; 77 decreased and 18 increased). With a more stringent significance threshold (FDR<0.05), only two genes (FCER2 and LY86) were down-regulated more than 1.5 fold and met the criteria for differential expression across two analysis platforms. The remaining 93 genes were only detected at this threshold level with the DESeq2 platform. Data analysis for biological interpretation of DEGs with an absolute fold change of 1.5 revealed down-regulation of overlapping genes involved with Cellular movement, Cell to cell signaling interactions, Immune cell trafficking and Inflammatory response. These data provide evidence for LGG-induced transcriptional modulation in healthy elderly volunteers because pre-treatment transcription levels were restored at 28 days after LGG treatment was stopped.
To gain insight into the signaling pathways affected in response to LGG treatment, DEGs were mapped using biological pathways and genomic data mining packages to indicate significant biological relevance. Trial Registration: ClinicalTrials.gov NCT01274598
Affiliation(s)
- Gloria Solano-Aguilar
- Diet, Genomics, and Immunology Laboratory, Beltsville Human Nutrition Research Center, Agricultural Research Service, United States Department of Agriculture, Beltsville, Maryland, United States of America
- Aleksey Molokin
- Diet, Genomics, and Immunology Laboratory, Beltsville Human Nutrition Research Center, Agricultural Research Service, United States Department of Agriculture, Beltsville, Maryland, United States of America
- Christine Botelho
- Division of Global Health, Massachusetts General Hospital, Boston, Massachusetts, United States of America
- Anne-Maria Fiorino
- Division of Global Health, Massachusetts General Hospital, Boston, Massachusetts, United States of America
- Bryan Vinyard
- Statistics Group, Northeast Area, Agricultural Research Service, United States Department of Agriculture, Beltsville, Maryland, United States of America
- Robert Li
- Animal Genomics and Improvement Laboratory, Agricultural Research Service, United States Department of Agriculture, Beltsville, Maryland, United States of America
- Celine Chen
- Diet, Genomics, and Immunology Laboratory, Beltsville Human Nutrition Research Center, Agricultural Research Service, United States Department of Agriculture, Beltsville, Maryland, United States of America
- Joseph Urban
- Diet, Genomics, and Immunology Laboratory, Beltsville Human Nutrition Research Center, Agricultural Research Service, United States Department of Agriculture, Beltsville, Maryland, United States of America
- Harry Dawson
- Diet, Genomics, and Immunology Laboratory, Beltsville Human Nutrition Research Center, Agricultural Research Service, United States Department of Agriculture, Beltsville, Maryland, United States of America
- Irina Andreyeva
- Division of Global Health, Massachusetts General Hospital, Boston, Massachusetts, United States of America
- Miriam Haverkamp
- Division of Global Health, Massachusetts General Hospital, Boston, Massachusetts, United States of America
- Patricia L. Hibberd
- Division of Global Health, Massachusetts General Hospital, Boston, Massachusetts, United States of America
|
6
|
Lin Y, Golovnina K, Chen ZX, Lee HN, Negron YLS, Sultana H, Oliver B, Harbison ST. Comparison of normalization and differential expression analyses using RNA-Seq data from 726 individual Drosophila melanogaster. BMC Genomics 2016; 17:28. [PMID: 26732976 PMCID: PMC4702322 DOI: 10.1186/s12864-015-2353-z] [Citation(s) in RCA: 102] [Impact Index Per Article: 12.8] [Received: 04/28/2015] [Accepted: 12/21/2015] [Indexed: 11/29/2022] Open
Abstract
Background: A generally accepted approach to the analysis of RNA-Seq read count data does not yet exist. We sequenced the mRNA of 726 individuals from the Drosophila Genetic Reference Panel in order to quantify differences in gene expression among single flies. One of our experimental goals was to identify the optimal analysis approach for the detection of differential gene expression among the factors we varied in the experiment: genotype, environment, sex, and their interactions. Here we evaluate three different filtering strategies, eight normalization methods, and two statistical approaches using our data set. We assessed differential gene expression among factors and performed a statistical power analysis using the eight biological replicates per genotype, environment, and sex in our data set. Results: We found that the most critical considerations for the analysis of RNA-Seq read count data were the normalization method, underlying data distribution assumption, and numbers of biological replicates, an observation consistent with previous RNA-Seq and microarray analysis comparisons. Some common normalization methods, such as Total Count, Quantile, and RPKM normalization, did not align the data across samples. Furthermore, analyses using the Median, Quantile, and Trimmed Mean of M-values normalization methods were sensitive to the removal of low-expressed genes from the data set. Although it is robust in many types of analysis, the normal data distribution assumption produced results vastly different than the negative binomial distribution. In addition, at least three biological replicates per condition were required in order to have sufficient statistical power to detect expression differences among the three-way interaction of genotype, environment, and sex.
Conclusions: The best analysis approach to our data was to normalize the read counts using the DESeq method and apply a generalized linear model assuming a negative binomial distribution using either edgeR or DESeq software. Genes having very low read counts were removed after normalizing the data and fitting it to the negative binomial distribution. We describe the results of this evaluation and include recommended analysis strategies for RNA-Seq read count data. Electronic supplementary material: The online version of this article (doi:10.1186/s12864-015-2353-z) contains supplementary material, which is available to authorized users.
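For readers unfamiliar with the DESeq normalization named in the abstract above, the following is a minimal sketch of the median-of-ratios idea behind it, written in plain Python. It is not the authors' code and not the DESeq implementation; the function names `size_factors` and `normalize` and the toy count matrix are assumptions made purely for illustration.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors (illustrative sketch).

    counts: list of rows (one per gene), each a list of raw counts per sample.
    """
    n_samples = len(counts[0])
    # Use only genes with a positive count in every sample so the log is defined.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no gene has positive counts in every sample")
    # Log of each gene's geometric mean across samples (a pseudo-reference sample).
    log_ref = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Per-gene log ratio of this sample to the pseudo-reference.
        log_ratios = [math.log(row[j]) - ref for row, ref in zip(usable, log_ref)]
        # The median ratio is taken as the sample's size factor.
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts):
    """Divide every count by the size factor of its sample."""
    sf = size_factors(counts)
    return [[c / s for c, s in zip(row, sf)] for row in counts]


# Hypothetical toy data: sample 2 was sequenced twice as deeply as sample 1.
toy = [
    [10, 20],
    [100, 200],
    [5, 10],
    [0, 3],  # skipped when estimating factors because of the zero count
]
factors = size_factors(toy)
normalized = normalize(toy)
```

In this toy case the second size factor is twice the first, so the normalized counts of the first three genes agree across samples. The sketch relies on the assumption that most genes are not differentially expressed, which is the kind of assumption whose validity the abstracts in this list discuss.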
Affiliation(s)
- Yanzhu Lin
- Laboratory of Systems Genetics, Center for Systems Biology, National Heart Lung and Blood Institute, 10 Center Drive, MSC 1640, Bethesda, MD, 20892, USA.
- Kseniya Golovnina
- Developmental Genomics Section, Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, Bethesda, MD, USA.
- Zhen-Xia Chen
- Developmental Genomics Section, Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, Bethesda, MD, USA.
- Hang Noh Lee
- Developmental Genomics Section, Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, Bethesda, MD, USA.
- Yazmin L Serrano Negron
- Laboratory of Systems Genetics, Center for Systems Biology, National Heart Lung and Blood Institute, 10 Center Drive, MSC 1640, Bethesda, MD, 20892, USA.
- Hina Sultana
- Developmental Genomics Section, Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, Bethesda, MD, USA.
- Brian Oliver
- Developmental Genomics Section, Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, Bethesda, MD, USA.
- Susan T Harbison
- Laboratory of Systems Genetics, Center for Systems Biology, National Heart Lung and Blood Institute, 10 Center Drive, MSC 1640, Bethesda, MD, 20892, USA.
|
7
|
Khang TF, Lau CY. Getting the most out of RNA-seq data analysis. PeerJ 2015; 3:e1360. [PMID: 26539333 PMCID: PMC4631466 DOI: 10.7717/peerj.1360] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 06/25/2015] [Accepted: 10/08/2015] [Indexed: 11/20/2022] Open
Abstract
Background. A common research goal in transcriptome projects is to find genes that are differentially expressed in different phenotype classes. Biologists might wish to validate such gene candidates experimentally, or use them for downstream systems biology analysis. Producing a coherent differential gene expression analysis from RNA-seq count data requires an understanding of how numerous sources of variation such as the replicate size, the hypothesized biological effect size, and the specific method for making differential expression calls interact. We believe an explicit demonstration of such interactions in real RNA-seq data sets is of practical interest to biologists. Results. Using two large public RNA-seq data sets (one representing strong, and another mild, biological effect size), we simulated different replicate size scenarios, and tested the performance of several commonly-used methods for calling differentially expressed genes in each of them. We found that, when biological effect size was mild, RNA-seq experiments should focus on experimental validation of differentially expressed gene candidates. Importantly, at least triplicates must be used, and the differentially expressed genes should be called using methods with high positive predictive value (PPV), such as NOISeq or GFOLD. In contrast, when biological effect size was strong, differentially expressed genes mined from unreplicated experiments using NOISeq, ASC and GFOLD had between 30 and 50% mean PPV, an increase of more than 30-fold compared to the cases of mild biological effect size. Among methods with good PPV performance, having triplicates or more substantially improved mean PPV to over 90% for GFOLD, 60% for DESeq2, 50% for NOISeq, and 30% for edgeR. At a replicate size of six, we found DESeq2 and edgeR to be reasonable methods for calling differentially expressed genes at systems level analysis, as their PPV and sensitivity trade-off were superior to the other methods'. Conclusion.
When biological effect size is weak, systems level investigation is not possible using RNAseq data, and no meaningful result can be obtained in unreplicated experiments. Nonetheless, NOISeq or GFOLD may yield limited numbers of gene candidates with good validation potential, when triplicates or more are available. When biological effect size is strong, NOISeq and GFOLD are effective tools for detecting differentially expressed genes in unreplicated RNA-seq experiments for qPCR validation. When triplicates or more are available, GFOLD is a sharp tool for identifying high confidence differentially expressed genes for targeted qPCR validation; for downstream systems level analysis, combined results from DESeq2 and edgeR are useful.
Affiliation(s)
- Tsung Fei Khang
- Institute of Mathematical Sciences, University of Malaya, Kuala Lumpur, Malaysia
- Ching Yee Lau
- Institute of Biological Sciences, University of Malaya, Kuala Lumpur, Malaysia
|
8
|
Cui S, Guha S, Ferreira MAR, Tegge AN. hmmSeq: A hidden Markov model for detecting differentially expressed genes from RNA-seq data. Ann Appl Stat 2015. [DOI: 10.1214/15-aoas815] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 11/19/2022]
|
9
|
Bourdon-Lacombe JA, Moffat ID, Deveau M, Husain M, Auerbach S, Krewski D, Thomas RS, Bushel PR, Williams A, Yauk CL. Technical guide for applications of gene expression profiling in human health risk assessment of environmental chemicals. Regul Toxicol Pharmacol 2015; 72:292-309. [PMID: 25944780 DOI: 10.1016/j.yrtph.2015.04.010] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.6] [Received: 04/02/2015] [Revised: 04/10/2015] [Accepted: 04/13/2015] [Indexed: 01/14/2023]
Abstract
Toxicogenomics promises to be an important part of future human health risk assessment of environmental chemicals. The application of gene expression profiles (e.g., for hazard identification, chemical prioritization, chemical grouping, mode of action discovery, and quantitative analysis of response) is growing in the literature, but their use in formal risk assessment by regulatory agencies is relatively infrequent. Although additional validations for specific applications are required, gene expression data can be of immediate use for increasing confidence in chemical evaluations. We believe that a primary reason for the current lack of integration is the limited practical guidance available for risk assessment specialists with limited experience in genomics. The present manuscript provides basic information on gene expression profiling, along with guidance on evaluating the quality of genomic experiments and data, and interpretation of results presented in the form of heat maps, pathway analyses and other common approaches. Moreover, potential ways to integrate information from gene expression experiments into current risk assessment are presented using published studies as examples. The primary objective of this work is to facilitate integration of gene expression data into human health risk assessments of environmental chemicals.
Affiliation(s)
- Ivy D Moffat
- Water and Air Quality Bureau, Health Canada, Ottawa, ON, Canada.
- Michelle Deveau
- Water and Air Quality Bureau, Health Canada, Ottawa, ON, Canada
- Mainul Husain
- Environmental Health Science and Research Bureau, Health Canada, Ottawa, ON, Canada
- Scott Auerbach
- Biomolecular Screening Branch, Division of the National Toxicology Program, National Institute of Environmental Health Sciences, Research Triangle Park, NC, United States
- Daniel Krewski
- McLaughlin Centre for Population Health Risk Assessment, University of Ottawa, Ottawa, ON, Canada
- Russell S Thomas
- National Centre for Computational Toxicology, U.S. Environmental Protection Agency, Research Triangle Park, NC, United States
- Pierre R Bushel
- Biostatistics and Computational Biology Branch, Division of Intramural Research, National Institute of Environmental Health Sciences, Research Triangle Park, NC, United States
- Andrew Williams
- Environmental Health Science and Research Bureau, Health Canada, Ottawa, ON, Canada
- Carole L Yauk
- Environmental Health Science and Research Bureau, Health Canada, Ottawa, ON, Canada
|
10
|
Khatoon Z, Figler B, Zhang H, Cheng F. Introduction to RNA-Seq and its applications to drug discovery and development. Drug Dev Res 2015; 75:324-30. [PMID: 25160072 DOI: 10.1002/ddr.21215] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Indexed: 12/22/2022]
Abstract
Preclinical Research: RNA sequencing (RNA-Seq) is a novel high-throughput technology for comprehensive transcriptome study. It can measure the expression levels of thousands of genes simultaneously and provide insight into functional pathways and regulations in biological processes. In addition, RNA-Seq can provide copious information on alternative splicing, allele-specific expression, unannotated exons, and novel transcripts (gene or noncoding RNAs). This technology has revolutionized the way biologists examine transcriptomes and has been successfully applied in drug discovery and development, being able to identify drug-related genes, microRNAs, and fusion proteins. In this overview, we will review this technology including data analysis, and its recent applications in drug discovery and development.
Affiliation(s)
- Zainab Khatoon
- Department of Pharmacodynamics, College of Pharmacy, University of Florida, Gainesville, FL, 32610-0484, USA
|
11
|
Rau A, Maugis-Rabusseau C, Martin-Magniette ML, Celeux G. Co-expression analysis of high-throughput transcriptome sequencing data with Poisson mixture models. Bioinformatics 2015; 31:1420-7. [PMID: 25563332 DOI: 10.1093/bioinformatics/btu845] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Received: 10/30/2013] [Accepted: 12/19/2014] [Indexed: 11/12/2022]
Abstract
MOTIVATION: In recent years, gene expression studies have increasingly made use of high-throughput sequencing technology. In turn, research concerning the appropriate statistical methods for the analysis of digital gene expression (DGE) has flourished, primarily in the context of normalization and differential analysis. RESULTS: In this work, we focus on the question of clustering DGE profiles as a means to discover groups of co-expressed genes. We propose a Poisson mixture model using a rigorous framework for parameter estimation as well as the choice of the appropriate number of clusters. We illustrate co-expression analyses using our approach on two real RNA-seq datasets. A set of simulation studies also compares the performance of the proposed model with that of several related approaches developed to cluster RNA-seq or serial analysis of gene expression data. AVAILABILITY AND IMPLEMENTATION: The proposed method is implemented in the open-source R package HTSCluster, available on CRAN. CONTACT: andrea.rau@jouy.inra.fr SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Andrea Rau
- INRA, UMR1313 Génétique animale et biologie intégrative, Jouy-en-Josas, France, AgroParisTech, UMR1313 Génétique animale et biologie intégrative, Paris 05, France, Institut de Mathématiques de Toulouse, INSA de Toulouse, Université de Toulouse, Toulouse, France, UMR AgroParisTech/INRA MIA 518, Paris, France, INRA, UMR 1165 URGV, Saclay Plant Sciences, Evry, France, UEVE, UMR URGV, Saclay Plant Sciences, Evry, France, CNRS, ERL 8196, URGV, Saclay Plant Sciences, Evry, France and Inria Saclay - Île-de-France, Orsay, France
- Cathy Maugis-Rabusseau
- INRA, UMR1313 Génétique animale et biologie intégrative, Jouy-en-Josas, France, AgroParisTech, UMR1313 Génétique animale et biologie intégrative, Paris 05, France, Institut de Mathématiques de Toulouse, INSA de Toulouse, Université de Toulouse, Toulouse, France, UMR AgroParisTech/INRA MIA 518, Paris, France, INRA, UMR 1165 URGV, Saclay Plant Sciences, Evry, France, UEVE, UMR URGV, Saclay Plant Sciences, Evry, France, CNRS, ERL 8196, URGV, Saclay Plant Sciences, Evry, France and Inria Saclay - Île-de-France, Orsay, France
- Marie-Laure Martin-Magniette
- INRA, UMR1313 Génétique animale et biologie intégrative, Jouy-en-Josas, France, AgroParisTech, UMR1313 Génétique animale et biologie intégrative, Paris 05, France, Institut de Mathématiques de Toulouse, INSA de Toulouse, Université de Toulouse, Toulouse, France, UMR AgroParisTech/INRA MIA 518, Paris, France, INRA, UMR 1165 URGV, Saclay Plant Sciences, Evry, France, UEVE, UMR URGV, Saclay Plant Sciences, Evry, France, CNRS, ERL 8196, URGV, Saclay Plant Sciences, Evry, France and Inria Saclay - Île-de-France, Orsay, France
- Gilles Celeux
- INRA, UMR1313 Génétique animale et biologie intégrative, Jouy-en-Josas, France, AgroParisTech, UMR1313 Génétique animale et biologie intégrative, Paris 05, France, Institut de Mathématiques de Toulouse, INSA de Toulouse, Université de Toulouse, Toulouse, France, UMR AgroParisTech/INRA MIA 518, Paris, France, INRA, UMR 1165 URGV, Saclay Plant Sciences, Evry, France, UEVE, UMR URGV, Saclay Plant Sciences, Evry, France, CNRS, ERL 8196, URGV, Saclay Plant Sciences, Evry, France and Inria Saclay - Île-de-France, Orsay, France
|
12
|
Ellis J, Lange EM, Li J, Dupuis J, Baumert J, Walston JD, Keating BJ, Durda P, Fox ER, Palmer CD, Meng YA, Young T, Farlow DN, Schnabel RB, Marzi CS, Larkin E, Martin LW, Bis JC, Auer P, Ramachandran VS, Gabriel SB, Willis MS, Pankow JS, Papanicolaou GJ, Rotter JI, Ballantyne CM, Gross MD, Lettre G, Wilson JG, Peters U, Koenig W, Tracy RP, Redline S, Reiner AP, Benjamin EJ, Lange LA. Large multiethnic Candidate Gene Study for C-reactive protein levels: identification of a novel association at CD36 in African Americans. Hum Genet 2014; 133:985-95. [PMID: 24643644 DOI: 10.1007/s00439-014-1439-z] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 11/13/2013] [Accepted: 03/06/2014] [Indexed: 10/25/2022]
Abstract
C-reactive protein (CRP) is a heritable biomarker of systemic inflammation and a predictor of cardiovascular disease (CVD). Large-scale genetic association studies for CRP have largely focused on individuals of European descent. We sought to uncover novel genetic variants for CRP in a multiethnic sample using the ITMAT Broad-CARe (IBC) array, a custom 50,000 SNP gene-centric array having dense coverage of over 2,000 candidate CVD genes. We performed analyses on 7,570 African Americans (AA) from the Candidate gene Association Resource (CARe) study and race-combined meta-analyses that included 29,939 additional individuals of European descent from CARe, the Women's Health Initiative (WHI) and KORA studies. We observed array-wide significance (p < 2.2 × 10⁻⁶) for four loci in AA, three of which have been reported previously in individuals of European descent (IL6R, p = 2.0 × 10⁻⁶; CRP, p = 4.2 × 10⁻⁷¹; APOE, p = 1.6 × 10⁻⁶). The fourth significant locus, CD36 (p = 1.6 × 10⁻⁶), was observed at a functional variant (rs3211938) that is extremely rare in individuals of European descent. We replicated the CD36 finding (p = 1.8 × 10⁻⁵) in an independent sample of 8,041 AA women from WHI; a meta-analysis combining the CARe and WHI AA results at rs3211938 reached genome-wide significance (p = 1.5 × 10⁻¹⁰). In the race-combined meta-analyses, 13 loci reached significance, including ten (CRP, TOMM40/APOE/APOC1, HNF1A, LEPR, GCKR, IL6R, IL1RN, NLRP3, HNF4A and BAZ1B/BCL7B) previously associated with CRP, and one (ARNTL) previously reported to be nominally associated with CRP. Two novel loci were also detected (RPS6KB1, p = 2.0 × 10⁻⁶; CD36, p = 1.4 × 10⁻⁶). These results highlight both shared and unique genetic risk factors for CRP in AA compared to populations of European descent.
Affiliation(s)
- Jaclyn Ellis
- Department of Genetics, University of North Carolina, 5112 Genetic Medicine Bldg., Chapel Hill, NC, 27599-7264, USA
|
13
|
Trabzuni D, Thomson PC. Analysis of gene expression data using a linear mixed model/finite mixture model approach: application to regional differences in the human brain. Bioinformatics 2014; 30:1555-61. [PMID: 24519379 DOI: 10.1093/bioinformatics/btu088] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Indexed: 11/14/2022]
Abstract
MOTIVATION: Gene expression data exhibit common information over the genome. This article shows how data can be analysed from an efficient whole-genome perspective. Further, the methods have been developed so that users with limited expertise in bioinformatics and statistical computing techniques could use and modify this procedure to their own needs. The method outlined first uses a large-scale linear mixed model for the expression data genome-wide, and then uses finite mixture models to separate differentially expressed (DE) from non-DE transcripts. These methods are illustrated through application to an exceptional UK Brain Expression Consortium involving 12 human frozen post-mortem brain regions. RESULTS: Fitting linear mixed models has allowed variation in gene expression between different biological states (e.g. brain regions, gender, age) to be investigated. The model can be extended to allow for differing levels of variation between different biological states. Predicted values of the random effects show the effects of each transcript in a particular biological state. Using the UK Brain Expression Consortium data, this approach yielded striking patterns of co-regional gene expression. Fitting the finite mixture model to the effects within each state provides a convenient method to filter transcripts that are DE: these DE transcripts can then be extracted for advanced functional analysis. AVAILABILITY: The data for all regions except HYPO and SPCO are available at the Gene Expression Omnibus (GEO) site, accession number GSE46706. R code for the analysis is available in the Supplementary file.
Affiliation(s)
- Daniah Trabzuni
- Department of Molecular Neuroscience, UCL Institute of Neurology, Queen Square, London WC1N 3BG, UK, Department of Genetics, King Faisal Specialist Hospital and Research Centre, Riyadh 11211, Saudi Arabia and ReproGen - Animal Bioscience Group, Faculty of Veterinary Science, The University of Sydney, 425 Werombi Road, Camden, NSW 2570, Australia
- Peter C Thomson
- Department of Molecular Neuroscience, UCL Institute of Neurology, Queen Square, London WC1N 3BG, UK, Department of Genetics, King Faisal Specialist Hospital and Research Centre, Riyadh 11211, Saudi Arabia and ReproGen - Animal Bioscience Group, Faculty of Veterinary Science, The University of Sydney, 425 Werombi Road, Camden, NSW 2570, Australia
|
14
|
Rodríguez Cubillos AE, Perlaza-Jiménez L, Bernal Giraldo AJ. RNA-Seq Data Analysis in Prokaryotes: A Review for Non-experts. Acta Biológica Colombiana 2014. [DOI: 10.15446/abc.v19n2.41010] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/23/2022] [Open Access]
|
15
|
Wood SL, Westbrook JA, Brown JE. Omic-profiling in breast cancer metastasis to bone: implications for mechanisms, biomarkers and treatment. Cancer Treat Rev 2013; 40:139-52. [PMID: 23958309 DOI: 10.1016/j.ctrv.2013.07.006] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Received: 03/28/2013] [Revised: 07/16/2013] [Accepted: 07/21/2013] [Indexed: 01/25/2023]
Abstract
Despite well-recognised advances in breast cancer treatment, there remain substantial numbers of patients who develop metastatic disease, of which up to 70% involves spread to bone, resulting in skeletal complications which have a major negative impact on mortality and quality of life. Bisphosphonates and newer bone-targeted agents have reduced the prevalence of skeletal complications, yet there remains significant unmet clinical need, particularly for the development of more specific therapies for the prevention and treatment of metastatic bone disease, for the prediction of risk of its development in individual patients and for the prediction of response to treatments. Modern 'omic' strategies can potentially make a major contribution to meeting this need. Technological advances in the field of nucleic acid sequencing, mass spectrometry and metabolic profiling have driven progress in genomics, transcriptomics (functional genomics), proteomics and metabolomics. This review appraises the recent application of these approaches to studies of breast cancer metastasis (particularly to bone), with a focus on understanding how omic approaches may lead to new therapeutic options and to novel biomarker molecules or molecular signatures with potential value in clinical practice. The increasingly recognised need for rigorous sample quality control and both pre-clinical and clinical validation to meet the ultimate goals of clinical utility and patient benefit is discussed. Future directions of omic-driven research in breast cancer metastasis are considered, in particular micro-RNAs and their role in the post-transcriptional regulation of gene function and the possible role of cancer stem cells and epigenetic modifications in the development of distant metastases.
Affiliation(s)
- Steven L Wood
- Wolfson Molecular Imaging Centre, University of Manchester, Manchester M20 3LJ, UK.
|
16
|
Østrup O, Olbricht G, Østrup E, Hyttel P, Collas P, Cabot R. RNA profiles of porcine embryos during genome activation reveal complex metabolic switch sensitive to in vitro conditions. PLoS One 2013; 8:e61547. [PMID: 23637850 PMCID: PMC3639270 DOI: 10.1371/journal.pone.0061547] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 12/28/2012] [Accepted: 03/11/2013] [Indexed: 11/18/2022] [Open Access]
Abstract
Fertilization is followed by complex changes in cytoplasmic composition and extensive chromatin reprogramming, which results in the abundant activation of the totipotent embryonic genome at embryonic genome activation (EGA). While chromatin reprogramming has been widely studied in several species, only a handful of reports characterize changing transcriptome profiles and resulting metabolic changes in cleavage stage embryos. The aims of the current study were to investigate RNA profiles of in vivo developed (ivv) and in vitro produced (ivt) porcine embryos before (2-cell stage) and after (late 4-cell stage) EGA and determine major metabolic changes that regulate totipotency. The period before EGA was dominated by transcripts responsible for cell cycle regulation, mitosis, RNA translation and processing (including ribosomal machinery), protein catabolism, and chromatin remodelling. Following EGA, an increase in the abundance of transcripts involved in transcription, translation, DNA metabolism, histone and chromatin modification, as well as protein catabolism was detected. Further analysis of members of overlapping GO terms revealed that, although comparable cellular processes take place before and after EGA (RNA splicing, protein catabolism), different metabolic pathways are involved. This strongly suggests that a complex metabolic switch accompanies EGA. In vitro conditions significantly altered RNA profiles before EGA, and the character of these changes indicates that they originate from the oocyte and are imposed either before oocyte aspiration or during in vitro maturation. IVT embryos have an altered content of apoptotic factors, cell cycle regulation factors and spindle components, and transcription factors, all of which may contribute to the reduced developmental competence of embryos produced in vitro. Overall, our data are in good accordance with previously published, genome-wide profiling data in other species. Moreover, comparison with mouse and human embryos showed striking overlap in functional annotation of transcripts during the EGA, suggesting conserved basic mechanisms regulating establishment of totipotency in mammalian development.
Affiliation(s)
- Olga Østrup
- Institute for Basic Medical Sciences, Faculty of Medicine, University of Oslo and Norwegian Center for Stem Cell Research, Oslo, Norway.
|
17
|
Time series expression analyses using RNA-seq: a statistical approach. BioMed Research International 2013; 2013:203681. [PMID: 23586021 PMCID: PMC3622290 DOI: 10.1155/2013/203681] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 10/06/2012] [Revised: 01/10/2013] [Accepted: 01/15/2013] [Indexed: 11/29/2022]
Abstract
RNA-seq is becoming the de facto standard approach for transcriptome analysis with ever-reducing cost. It has considerable advantages over conventional technologies (microarrays) because it allows for direct identification and quantification of transcripts. Many time series RNA-seq datasets have been collected to study the dynamic regulations of transcripts. However, statistically rigorous and computationally efficient methods are needed to explore the time-dependent changes of gene expression in biological systems. These methods should explicitly account for the dependencies of expression patterns across time points. Here, we discuss several methods that can be applied to model timecourse RNA-seq data, including statistical evolutionary trajectory index (SETI), autoregressive time-lagged regression (AR(1)), and hidden Markov model (HMM) approaches. We use three real datasets and simulation studies to demonstrate the utility of these dynamic methods in temporal analysis.
|
18
|
Soneson C, Delorenzi M. A comparison of methods for differential expression analysis of RNA-seq data. BMC Bioinformatics 2013; 14:91. [PMID: 23497356 PMCID: PMC3608160 DOI: 10.1186/1471-2105-14-91] [Citation(s) in RCA: 532] [Impact Index Per Article: 48.4] [Received: 10/02/2012] [Accepted: 03/01/2013] [Indexed: 02/08/2023] [Open Access]
Abstract
BACKGROUND: Finding genes that are differentially expressed between conditions is an integral part of understanding the molecular basis of phenotypic variation. In the past decades, DNA microarrays have been used extensively to quantify the abundance of mRNA corresponding to different genes, and more recently high-throughput sequencing of cDNA (RNA-seq) has emerged as a powerful competitor. As the cost of sequencing decreases, it is conceivable that the use of RNA-seq for differential expression analysis will increase rapidly. To exploit the possibilities and address the challenges posed by this relatively new type of data, a number of software packages have been developed especially for differential expression analysis of RNA-seq data. RESULTS: We conducted an extensive comparison of eleven methods for differential expression analysis of RNA-seq data. All methods are freely available within the R framework and take as input a matrix of counts, i.e. the number of reads mapping to each genomic feature of interest in each of a number of samples. We evaluate the methods based on both simulated data and real RNA-seq data. CONCLUSIONS: Very small sample sizes, which are still common in RNA-seq experiments, impose problems for all evaluated methods and any results obtained under such conditions should be interpreted with caution. For larger sample sizes, the methods combining a variance-stabilizing transformation with the 'limma' method for differential expression analysis perform well under many different conditions, as does the nonparametric SAMseq method.
Affiliation(s)
- Charlotte Soneson
- Bioinformatics Core Facility, SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland.
|
19
|
Ambrose KV, Belanger FC. SOLiD-SAGE of endophyte-infected red fescue reveals numerous effects on host transcriptome and an abundance of highly expressed fungal secreted proteins. PLoS One 2012; 7:e53214. [PMID: 23285269 PMCID: PMC3532157 DOI: 10.1371/journal.pone.0053214] [Citation(s) in RCA: 66] [Impact Index Per Article: 5.5] [Received: 08/23/2012] [Accepted: 11/27/2012] [Indexed: 11/19/2022] [Open Access]
Abstract
One of the most important plant-fungal symbiotic relationships is that of cool season grasses with endophytic fungi of the genera Epichloë and Neotyphodium. These associations often confer benefits, such as resistance to herbivores and improved drought tolerance, to the hosts. One benefit that appears to be unique to fine fescue grasses is disease resistance. As a first step towards understanding the basis of the endophyte-mediated disease resistance in Festuca rubra we carried out a SOLiD-SAGE quantitative transcriptome comparison of endophyte-free and Epichloë festucae-infected F. rubra. Over 200 plant genes involved in a wide variety of physiological processes were statistically significantly differentially expressed between the two samples. Many of the endophyte expressed genes were surprisingly abundant, with the most abundant fungal tag representing over 10% of the fungal mapped tags. Many of the abundant fungal tags were for secreted proteins. The second most abundantly expressed fungal gene was for a secreted antifungal protein and is of particular interest regarding the endophyte-mediated disease resistance. Similar genes in Penicillium and Aspergillus spp. have been demonstrated to have antifungal activity. Of the 10 epichloae whole genome sequences available, only one isolate of E. festucae and Neotyphodium gansuense var inebrians have an antifungal protein gene. The uniqueness of this gene in E. festucae from F. rubra, its transcript abundance, and the secreted nature of the protein, all suggest it may be involved in the disease resistance conferred to the host, which is a unique feature of the fine fescue-endophyte symbiosis.
Affiliation(s)
- Karen V. Ambrose
- Department of Plant Biology and Pathology, Rutgers University, New Brunswick, New Jersey, United States of America
- Faith C. Belanger
- Department of Plant Biology and Pathology, Rutgers University, New Brunswick, New Jersey, United States of America
|
20
|
Robles JA, Qureshi SE, Stephen SJ, Wilson SR, Burden CJ, Taylor JM. Efficient experimental design and analysis strategies for the detection of differential expression using RNA-Sequencing. BMC Genomics 2012; 13:484. [PMID: 22985019 PMCID: PMC3560154 DOI: 10.1186/1471-2164-13-484] [Citation(s) in RCA: 149] [Impact Index Per Article: 12.4] [Received: 07/20/2012] [Accepted: 08/10/2012] [Indexed: 12/18/2022] [Open Access]
Abstract
BACKGROUND: RNA sequencing (RNA-Seq) has emerged as a powerful approach for the detection of differential gene expression, with both high-throughput and high-resolution capabilities possible depending upon the experimental design chosen. Multiplex experimental designs are now readily available; these can be utilised to increase the numbers of samples or replicates profiled at the cost of decreased sequencing depth generated per sample. These strategies affect the power of the approach to accurately identify differential expression. This study presents a detailed analysis of the power to detect differential expression in a range of scenarios including simulated null and differential expression distributions with varying numbers of biological or technical replicates, sequencing depths and analysis methods. RESULTS: Differential and non-differential expression datasets were simulated using a combination of negative binomial and exponential distributions derived from real RNA-Seq data. These datasets were used to evaluate the performance of three commonly used differential expression analysis algorithms and to quantify the changes in power with respect to true and false positive rates when simulating variations in sequencing depth, biological replication and multiplex experimental design choices. CONCLUSIONS: This work quantitatively explores comparisons between contemporary analysis tools and experimental design choices for the detection of differential expression using RNA-Seq. We found that the DESeq algorithm performs more conservatively than edgeR and NBPSeq. With regard to testing of various experimental designs, this work strongly suggests that greater power is gained through the use of biological replicates relative to library (technical) replicates and sequencing depth. Strikingly, sequencing depth could be reduced as low as 15% without substantial impacts on false positive or true positive rates.
Affiliation(s)
- José A Robles
- CSIRO Plant Industry, Black Mountain Laboratories, Canberra, Australia
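Illustrative note on the entry above: the abstract mentions simulating count data from negative binomial distributions. The following is a minimal sketch of that one ingredient only, not the authors' simulation pipeline. It draws overdispersed counts as a gamma-Poisson mixture; the function names, the mean of 20 and the dispersion of 0.1 are hypothetical choices made for illustration.

```python
import math
import random


def poisson(rng, lam):
    """Draw one Poisson(lam) value with Knuth's method (suitable for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def nb_count(rng, mean, dispersion):
    """Draw one negative binomial count as a gamma-Poisson mixture.

    The resulting variance is mean + dispersion * mean**2.
    """
    # Gamma with shape 1/dispersion and scale mean*dispersion has expectation `mean`.
    lam = rng.gammavariate(1.0 / dispersion, mean * dispersion)
    return poisson(rng, lam)


def simulate_gene(rng, mean, dispersion, n_replicates):
    """Simulate read counts for one gene across replicates (hypothetical helper)."""
    return [nb_count(rng, mean, dispersion) for _ in range(n_replicates)]


rng = random.Random(1)
print(simulate_gene(rng, mean=20.0, dispersion=0.1, n_replicates=6))
```

With these assumed parameters the expected variance is 20 + 0.1 × 20² = 60, three times the mean, which is the kind of overdispersion that a plain Poisson model cannot represent.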
|
21
|
Kliebenstein DJ. Exploring the shallow end; estimating information content in transcriptomics studies. Frontiers in Plant Science 2012; 3:213. [PMID: 22973290 PMCID: PMC3437520 DOI: 10.3389/fpls.2012.00213] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 08/01/2012] [Accepted: 08/23/2012] [Indexed: 05/20/2023]
Abstract
Transcriptomics is a major platform to study organismal biology. The advent of new parallel sequencing technologies has opened up a new avenue of transcriptomics with ever deeper and deeper sequencing to identify and quantify each and every transcript in a sample. However, this may not be the best usage of the parallel sequencing technology for all transcriptomics experiments. I utilized the Shannon Entropy approach to estimate the information contained within a transcriptomics experiment and tested the ability of shallow RNAseq to capture the majority of this information. This analysis showed that it was possible to capture nearly all of the network or genomic information present in a variety of transcriptomics experiments using a subset of the most abundant 5000 transcripts or less within any given sample. Thus, it appears that it should be possible and affordable to conduct large scale factorial analysis with a high degree of replication using parallel sequencing technologies.
Affiliation(s)
- Daniel J. Kliebenstein
- Department of Plant Sciences, University of California, Davis, CA, USA
- *Correspondence: Daniel J. Kliebenstein, Department of Plant Sciences, University of California, Davis, One Shields Avenue, Davis, CA 95616 USA. e-mail:
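Illustrative note on the entry above: the abstract refers to Shannon entropy as a measure of the information in a transcriptomics sample. The sketch below shows one simple way to compute it from read counts and to compare the full sample with only its most abundant transcripts. It is an assumption-laden illustration, not the author's procedure; the function names and the example counts are hypothetical.

```python
import math


def shannon_entropy(counts):
    """Shannon entropy (in bits) of a count vector treated as a distribution."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive value")
    entropy = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            entropy -= p * math.log2(p)
    return entropy


def top_n_entropy(counts, n):
    """Entropy of the n most abundant transcripts, renormalised among themselves."""
    return shannon_entropy(sorted(counts, reverse=True)[:n])


# Hypothetical read counts for eight transcripts in one sample.
sample_counts = [5000, 2500, 1200, 600, 300, 150, 10, 5]
full = shannon_entropy(sample_counts)
shallow = top_n_entropy(sample_counts, 4)
print(f"all transcripts: {full:.3f} bits; top 4 only: {shallow:.3f} bits")
```

Because the top-n value is renormalised over the retained transcripts, it describes the shape of the abundant end of the distribution rather than a strict fraction of the full-sample entropy; other comparisons are equally defensible.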
|