1
|
|
2
|
Transcriptomic Studies of Malaria: a Paradigm for Investigation of Systemic Host-Pathogen Interactions. Microbiol Mol Biol Rev 2018; 82:e00071-17. [PMID: 29695497 PMCID: PMC5968457 DOI: 10.1128/mmbr.00071-17] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Indexed: 12/14/2022] Open
Abstract
Transcriptomics, the analysis of genome-wide RNA expression, is a common approach to investigate host and pathogen processes in infectious diseases. Technical and bioinformatic advances have permitted increasingly thorough analyses of the association of RNA expression with fundamental biology, immunity, pathogenesis, diagnosis, and prognosis. Transcriptomic approaches can now be used to realize a previously unattainable goal, the simultaneous study of RNA expression in host and pathogen, in order to better understand their interactions. This exciting prospect is not without challenges, especially as focus moves from interactions in vitro under tightly controlled conditions to tissue- and systems-level interactions in animal models and natural and experimental infections in humans. Here we review the contribution of transcriptomic studies to the understanding of malaria, a parasitic disease which has exerted a major influence on human evolution and continues to cause a huge global burden of disease. We consider malaria a paradigm for the transcriptomic assessment of systemic host-pathogen interactions in humans, because much of the direct host-pathogen interaction occurs within the blood, a readily sampled compartment of the body. We illustrate lessons learned from transcriptomic studies of malaria and how these lessons may guide studies of host-pathogen interactions in other infectious diseases. We propose that the potential of transcriptomic studies to improve the understanding of malaria as a disease remains partly untapped because of limitations in study design rather than as a consequence of technological constraints. Further advances will require the integration of transcriptomic data with analytical approaches from other scientific disciplines, including epidemiology and mathematical modeling.
|
3
|
Abstract
High-throughput, genome-wide transcriptome analysis is now commonly used in all fields of life science research and is on the cusp of medical and veterinary diagnostic application. Transcriptomic methods such as microarrays and next-generation sequencing generate enormous amounts of data. The pathogenetic expertise acquired from understanding of general pathology provides veterinary pathologists with a profound background, which is essential in translating transcriptomic data into meaningful biological knowledge, thereby leading to a better understanding of underlying disease mechanisms. The scientific literature concerning high-throughput data-mining techniques usually addresses mathematicians or computer scientists as the target audience. In contrast, the present review provides the reader with a clear and systematic basis from a veterinary pathologist's perspective. Therefore, the aims are (1) to introduce the reader to the necessary methodological background; (2) to introduce the sequential steps commonly performed in a microarray analysis including quality control, annotation, normalization, selection of differentially expressed genes, clustering, gene ontology and pathway analysis, analysis of manually selected genes, and biomarker discovery; and (3) to provide references to publicly available and user-friendly software suites. In summary, the data analysis methods presented within this review will enable veterinary pathologists to analyze high-throughput transcriptome data obtained from their own experiments, supplemental data that accompany scientific publications, or public repositories in order to obtain a more in-depth insight into underlying disease mechanisms.
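The normalization step named in this abstract can be illustrated with a minimal sketch. The example below uses quantile normalization, which is one common choice for microarray data; the abstract does not say which method the review recommends, so the technique, the function name `quantile_normalize`, and the toy values are all assumptions made for illustration. Ties are broken by input order rather than averaged.

```python
from typing import List


def quantile_normalize(arrays: List[List[float]]) -> List[List[float]]:
    """Give every array the same empirical distribution.

    `arrays` holds one list of probe intensities per array (sample);
    all lists must have the same length. Illustrative sketch only.
    """
    if not arrays or any(len(a) != len(arrays[0]) for a in arrays):
        raise ValueError("all arrays must be non-empty and of equal length")

    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean of the k-th smallest value across arrays.
    reference = [
        sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n)
    ]

    normalized = []
    for a in arrays:
        # Indices of the probes in ascending order of intensity.
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]  # replace value by reference quantile
        normalized.append(out)
    return normalized


if __name__ == "__main__":
    print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
```

After this transformation each array contains the same set of values, so remaining differences between arrays reflect only the rank of each probe within its array.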
|
4
|
Type of gonadotropin used during controlled ovarian stimulation induces differential gene expression in human cumulus cells: A randomized study. Eur J Obstet Gynecol Reprod Biol 2017. [PMID: 28622634 DOI: 10.1016/j.ejogrb.2017.06.014] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 01/28/2023]
Abstract
BACKGROUND The cumulus-oocyte complex plays a central role in the regulation of folliculogenesis where it is important for the maturation, reprogramming, and fertilization of oocytes. Consequently, cumulus cell gene expression profiling is being explored as a promising method for assessing oocyte competence in the near future. Through DNA microarray technology, we analyzed the potential differences in the gene expression profiles of cumulus cells from preovulatory follicles after controlled ovarian stimulation using different types of gonadotropins. METHODS A prospective, randomized study was performed among 90 women participating in an oocyte donation program. Subjects were assigned to receive recombinant follicle-stimulating hormone (FSH), urinary FSH, or human menopausal gonadotropin (hMG). The gene expression profile in cumulus cells was analyzed according to the type of gonadotropin received during ovarian stimulation. Furthermore, we also performed a gene ontology analysis to provide structural knowledge. RESULTS Hierarchical clustering, principal component analysis, and gene enrichment analysis revealed greater differences between the urinary FSH and hMG groups than in the other pair-wise comparisons (recombinant FSH vs hMG and urinary FSH vs recombinant FSH). CONCLUSIONS Data suggest that controlled ovarian stimulation induces specific gene expression profiles in human cumulus cells depending on the type of gonadotropin used. TRIAL REGISTRATION Registered at clinicaltrials.gov; identifier NCT022437032.
|
5
|
Sexual Dimorphism and Aging in the Human Hyppocampus: Identification, Validation, and Impact of Differentially Expressed Genes by Factorial Microarray and Network Analysis. Front Aging Neurosci 2016; 8:229. [PMID: 27761111 PMCID: PMC5050216 DOI: 10.3389/fnagi.2016.00229] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Received: 06/22/2016] [Accepted: 09/14/2016] [Indexed: 01/09/2023] Open
Abstract
Motivation: In the brain of elderly-healthy individuals, the effects of sexual dimorphism and those due to normal aging appear overlapped. Discrimination of these two dimensions would powerfully contribute to a better understanding of the etiology of some neurodegenerative diseases, such as “sporadic” Alzheimer. Methods: Following a systems biology approach, top-down and bottom-up strategies were combined. First, public transcriptome data corresponding to the transition from adulthood to the aging stage in normal, human hippocampus were analyzed through an optimized microarray post-processing (Q-GDEMAR method) together with a proper experimental design (full factorial analysis). Second, the identified genes were placed in context by building compatible networks. The subsequent ontology analyses carried out on these networks clarify the main functionalities involved. Results: Noticeably we could identify large sets of genes according to three groups: those that exclusively depend on the sex, those that exclusively depend on the age, and those that depend on the particular combinations of sex and age (interaction). The genes identified were validated against three independent sources (a proteomic study of aging, a senescence database, and a mitochondrial genetic database). We arrived at several new inferences about the biological functions compromised during aging in two ways: by taking into account the sex-independent effects of aging, and considering the interaction between age and sex where pertinent. In particular, we discuss the impact of our findings on the functions of mitochondria, autophagy, mitophagia, and microRNAs. Conclusions: The evidence obtained herein supports the occurrence of significant neurobiological differences in the hippocampus, not only between adult and elderly individuals, but between old-healthy women and old-healthy men. Hence, to obtain realistic results in further analysis of the transition from normal aging to incipient Alzheimer, the features derived from the sexual dimorphism in hippocampus should be explicitly considered.
|
6
|
SMARCA4/BRG1 Is a Novel Prognostic Biomarker Predictive of Cisplatin-Based Chemotherapy Outcomes in Resected Non-Small Cell Lung Cancer. Clin Cancer Res 2015; 22:2396-404. [PMID: 26671993 DOI: 10.1158/1078-0432.ccr-15-1468] [Citation(s) in RCA: 93] [Impact Index Per Article: 10.3] [Received: 06/21/2015] [Accepted: 12/06/2015] [Indexed: 01/18/2023]
Abstract
PURPOSE Identification of predictive biomarkers is critically needed to improve selection of patients who derive the most benefit from platinum-based chemotherapy. We hypothesized that decreased expression of SMARCA4/BRG1, a known regulator of transcription and DNA repair, is a novel predictive biomarker of increased sensitivity to adjuvant platinum-based therapies in non-small cell lung cancer (NSCLC). EXPERIMENTAL DESIGN The prognostic value was tested using a gene-expression microarray from the Director's Challenge Lung Study (n = 440). The predictive significance of SMARCA4 was determined using a gene-expression microarray (n = 133) from control and treatment arms of the JBR.10 trial of adjuvant cisplatin/vinorelbine. Kaplan-Meier method and log-rank tests were used to estimate and test the differences of probabilities in overall survival (OS) and disease-specific survival (DSS) between expression groups and treatment arms. Multivariate Cox regression models were used while adjusting for other clinical covariates. RESULTS In the Director's Challenge Study, reduced expression of SMARCA4 was associated with poor OS compared with high and intermediate expression (P < 0.001 and P = 0.009, respectively). In multivariate analysis, compared with low, high SMARCA4 expression predicted a decrease in risk of death [HR, 0.6; 95% confidence interval (CI), 0.4-0.8; P = 0.002]. In the JBR.10 trial, improved 5-year DSS was noted only in patients with low SMARCA4 expression when treated with adjuvant cisplatin/vinorelbine [HR, 0.1; 95% CI, 0.0-0.5, P = 0.002 (low); HR, 1.0; 95% CI, 0.5-2.3, P = 0.92 (high)]. An interaction test was highly significant (P = 0.01). CONCLUSIONS Low expression of SMARCA4/BRG1 is significantly associated with worse prognosis; however, it is a novel significant predictive biomarker for increased sensitivity to platinum-based chemotherapy in NSCLC. Clin Cancer Res; 22(10); 2396-404. ©2015 AACR.
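The Kaplan-Meier method mentioned in this abstract can be sketched in a few lines. The example below is a generic product-limit estimator, not the authors' analysis code; the function name `kaplan_meier` and the toy follow-up times are hypothetical, and no data from the study are used.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(
    times: Sequence[float], events: Sequence[bool]
) -> List[Tuple[float, float]]:
    """Product-limit estimate of the survival function.

    `times` are follow-up times; `events[i]` is True if the event (e.g.
    death) was observed at `times[i]` and False if the subject was
    censored. Returns (time, survival probability) at each event time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")

    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # events and censored subjects leave the risk set
    return curve


if __name__ == "__main__":
    # Hypothetical toy data: the subject at t=2 is censored.
    print(kaplan_meier([1, 2, 3, 4], [True, False, True, True]))
```

Curves estimated this way for each expression group are what a log-rank test then compares; that test and the Cox models named in the abstract are not implemented here.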
|
7
|
Microarray experiments and factors which affect their reliability. Biol Direct 2015; 10:46. [PMID: 26335588 PMCID: PMC4559324 DOI: 10.1186/s13062-015-0077-2] [Citation(s) in RCA: 64] [Impact Index Per Article: 7.1] [Received: 04/15/2015] [Accepted: 08/24/2015] [Indexed: 12/12/2022] Open
Abstract
Oligonucleotide microarrays belong to the basic tools of molecular biology and allow for simultaneous assessment of the expression level of thousands of genes. Analysis of microarray data is however very complex, requiring sophisticated methods to control for various factors that are inherent to the procedures used. In this article we describe the individual steps of a microarray experiment, highlighting important elements and factors that may affect the processes involved and that influence the interpretation of the results. Additionally, we describe methods that can be used to estimate the influence of these factors, and to control the way in which they affect the expression estimates. A comprehensive understanding of the experimental protocol used in a microarray experiment aids the interpretation of the obtained results. By describing known factors which affect expression estimates this article provides guidelines for appropriate quality control and pre-processing of the data, additionally applicable to other transcriptome analysis methods that utilize similar sample handling protocols.
|
8
|
Analysis of discordant Affymetrix probesets casts serious doubt on idea of microarray data reutilization. BMC Genomics 2014; 15 Suppl 12:S8. [PMID: 25563078 PMCID: PMC4303952 DOI: 10.1186/1471-2164-15-s12-s8] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 12/23/2022] Open
Abstract
Background Affymetrix microarray technology allows one to investigate expression of thousands of genes simultaneously under a variety of conditions. In the popular U133A microarray platform, the expression of 37% of genes is measured by more than one probeset. The discordant expression observed for two different probesets that match the same gene is a widespread phenomenon which is usually underestimated, ignored or disregarded. Results Here we evaluate the prevalence of discordant expression in data collected using the Affymetrix HG-U133A microarray platform. In U133A, about 30% of genes annotated by two different probesets demonstrate a substantial correlation between independently measured expression values. To our surprise, sorting the probesets according to the nature of the discrepancy in their expression levels allowed the classification of the respective genes according to their fundamental functional properties, including observed enrichment by tissue-specific transcripts and alternatively spliced variants. On the other hand, an absence of discrepancies in probesets that simultaneously match several different genes allowed us to pinpoint non-expressed pseudogenes and gene groups with highly correlated expression patterns. Nevertheless, in many cases, the nature of discordant expression of two probesets that match the same transcript remains unexplained. It is possible that these probesets report differently regulated sets of transcripts, or, in the best-case scenario, two different sets of transcripts that represent the same gene. Conclusion The majority of absolute gene expression values collected using Affymetrix microarrays may not be suitable for typical interpretative downstream analysis.
|
9
|
Sources of high variance between probe signals in Affymetrix short oligonucleotide microarrays. SENSORS 2013; 14:532-48. [PMID: 24385030 PMCID: PMC3926573 DOI: 10.3390/s140100532] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 10/15/2013] [Revised: 12/16/2013] [Accepted: 12/24/2013] [Indexed: 01/21/2023]
Abstract
High density oligonucleotide microarrays present a big challenge for statistical data processing methods which aim to separate changes induced by experimental factors from those caused by artifacts and measurement inaccuracies. Despite huge advances in the field of microarray probe design methods, the signal variation between probes that target a single transcript is substantially larger than their between-replicate array variability, suggesting a large influence of various probe-specific effects that introduce bias to the data. In this work we present the influence of probe-related design variations on the expression intensities of individual probes, focusing on five potential sources of high probe signal variance: the GC composition of the probe, the distance between individual probe target sites, G-quadruplex formation in the probe sequence, the occurrence of sequence motifs complementary to the oligo(dT) primer, and the specificity of unrecognized alternative splicing probeset assignment. By focusing on two high quality microarray datasets based on two distinct array designs we show the extent of variance between probes that target a specific transcript providing guidelines for the future design of microarrays and data processing methods.
|
10
|
Impact of heat shock transcription factor 1 on global gene expression profiles in cells which induce either cytoprotective or pro-apoptotic response following hyperthermia. BMC Genomics 2013; 14:456. [PMID: 23834426 PMCID: PMC3711851 DOI: 10.1186/1471-2164-14-456] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Received: 03/05/2013] [Accepted: 07/01/2013] [Indexed: 11/23/2022] Open
Abstract
Background Elevated temperatures induce activation of the heat shock transcription factor 1 (HSF1), which in somatic cells leads to heat shock protein synthesis and cytoprotection. However, in the male germ cells (spermatocytes) caspase-3 dependent apoptosis is induced upon HSF1 activation and spermatogenic cells are actively eliminated. Results To elucidate a mechanism of such diverse HSF1 activity we carried out genome-wide transcriptional analysis in control and heat-shocked cells, either spermatocytes or hepatocytes. Additionally, to identify direct molecular targets of active HSF1 we used chromatin immunoprecipitation assay (ChIP) combined with promoter microarrays (ChIP on chip). Genes that are differently regulated after HSF1 binding during hyperthermia in both types of cells have been identified. Despite HSF1 binding to promoter sequences in both types of cells, strong up-regulation of Hsps and other genes typically activated by the heat shock was observed only in hepatocytes. In spermatocytes HSF1 binding correlates with transcriptional repression on a large scale. HSF1-bound and negatively regulated genes mainly encode proteins required for cell division or involved in RNA processing and piRNA biogenesis. Conclusions Observed suppression of the transcription could lead to genomic instability caused by meiotic recombination disturbances, which in turn might induce apoptosis of spermatogenic cells. We propose that HSF1-dependent induction of cell death is caused by the simultaneous repression of many genes required for spermatogenesis, which guarantees the elimination of cells damaged during heat shock. Such activity of HSF1 prevents transmission of damaged genetic material to the next generation.
|
11
|
Knowledge Driven Variable Selection (KDVS) - a new approach to enrichment analysis of gene signatures obtained from high-throughput data. SOURCE CODE FOR BIOLOGY AND MEDICINE 2013; 8:2. [PMID: 23302187 PMCID: PMC3605163 DOI: 10.1186/1751-0473-8-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 09/28/2012] [Accepted: 12/13/2012] [Indexed: 11/10/2022]
Abstract
Background High–throughput (HT) technologies provide huge amount of gene expression data that can be used to identify biomarkers useful in the clinical practice. The most frequently used approaches first select a set of genes (i.e. gene signature) able to characterize differences between two or more phenotypical conditions, and then provide a functional assessment of the selected genes with an a posteriori enrichment analysis, based on biological knowledge. However, this approach comes with some drawbacks. First, gene selection procedure often requires tunable parameters that affect the outcome, typically producing many false hits. Second, a posteriori enrichment analysis is based on mapping between biological concepts and gene expression measurements, which is hard to compute because of constant changes in biological knowledge and genome analysis. Third, such mapping is typically used in the assessment of the coverage of gene signature by biological concepts, that is either score–based or requires tunable parameters as well, limiting its power. Results We present Knowledge Driven Variable Selection (KDVS), a framework that uses a priori biological knowledge in HT data analysis. The expression data matrix is transformed, according to prior knowledge, into smaller matrices, easier to analyze and to interpret from both computational and biological viewpoints. Therefore KDVS, unlike most approaches, does not exclude a priori any function or process potentially relevant for the biological question under investigation. Differently from the standard approach where gene selection and functional assessment are applied independently, KDVS embeds these two steps into a unified statistical framework, decreasing the variability derived from the threshold–dependent selection, the mapping to the biological concepts, and the signature coverage. We present three case studies to assess the usefulness of the method. 
Conclusions We showed that KDVS not only enables the selection of known biological functionalities with accuracy, but also the identification of new ones. An efficient implementation of KDVS was devised to obtain results in a fast and robust way. Computing time is drastically reduced by the effective use of distributed resources. Finally, integrated visualization techniques immediately increase the interpretability of results. Overall, the KDVS approach can be considered a viable alternative to enrichment–based approaches.
|
12
|
Abstract
Many platforms for genome-wide analysis of gene expression contain ‘redundant’ measures for the same gene. For example, the most highly utilized platforms for gene expression microarrays, Affymetrix GeneChip® arrays, have as many as ten or more probe sets for some genes. Occasionally, individual probe sets for the same gene report different trends in expression across experimental conditions, a situation that must be resolved in order to accurately interpret the data. We developed an algorithm, SCOREM, for determining the level of agreement between such probe sets, utilizing a statistical test of concordance, Kendall's W coefficient of concordance, and a graph-searching algorithm for the identification of concordant probe sets. We also present methods for consolidating concordant groups into a single value for its corresponding gene and for post hoc analysis of discordant groups. By combining statistical consolidation with sequence analysis, SCOREM possesses the unique ability to identify biologically meaningful discordant behaviors, including differing behaviors in alternate RNA isoforms and tissue-specific patterns of expression. When consolidating concordant behaviors, SCOREM outperforms other methods in detecting both differential expression and overrepresented functional categories.
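Kendall's W, the concordance statistic this abstract names, can be computed directly from ranks. The sketch below is a generic textbook implementation and not SCOREM itself; the function names and the toy expression values are hypothetical. It uses the uncorrected formula W = 12S / (m²(n³ − n)) for m probe sets measured over n conditions, so it is exact only when no probe set has tied values.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kendalls_w(probe_sets: Sequence[Sequence[float]]) -> float:
    """Kendall's coefficient of concordance (no tie correction).

    Each row is one probe set's expression across the same n conditions.
    Returns 1.0 for identical rank orderings across all rows.
    """
    m = len(probe_sets)
    if m < 2:
        raise ValueError("need at least two probe sets")
    n = len(probe_sets[0])
    if n < 2 or any(len(p) != n for p in probe_sets):
        raise ValueError("probe sets must share the same n >= 2 conditions")

    rank_rows = [average_ranks(p) for p in probe_sets]
    rank_sums = [sum(row[i] for row in rank_rows) for i in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


if __name__ == "__main__":
    # Hypothetical probe sets with the same trend across four conditions.
    print(kendalls_w([[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]]))
```

Values near 1 indicate probe sets that rank the conditions alike, and values near 0 indicate disagreement; how SCOREM thresholds W and searches for concordant groups is not described in enough detail here to reproduce.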
|
13
|
Integrating multiple microarray datasets on oral squamous cell carcinoma to reveal dysregulated networks. Head Neck 2011; 34:1789-97. [PMID: 22179951 DOI: 10.1002/hed.22013] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Accepted: 09/29/2011] [Indexed: 12/26/2022] Open
Abstract
BACKGROUND Oral squamous cell carcinoma (OSCC) is the sixth most common type of carcinoma worldwide. The pathogenic pathways involved in this cancer are mostly unknown; therefore, a better characterization of the OSCC gene expression profile would represent a considerable advance. The public availability of gene expression datasets was meant to obtain new insights on biological processes. METHODS We integrated 4 public microarray datasets on OSCC to evaluate the degree of consistency among the biological results obtained in these different studies and to identify common regulatory pathways that could be responsible for tumor growth. RESULTS Twelve altered cellular pathways implicated in OSCC and 4 genes altered in the extracellular matrix (ECM) receptor pathway were validated by quantitative real-time polymerase chain reaction (qRT-PCR). CONCLUSION Using 4 expression array datasets, we have developed a robust method for analyzing pathways altered in OSCC.
|
14
|
Bystander Effects Induced by Medium From Irradiated Cells: Similar Transcriptome Responses in Irradiated and Bystander K562 Cells. Int J Radiat Oncol Biol Phys 2010; 77:244-52. [DOI: 10.1016/j.ijrobp.2009.11.033] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.7] [Received: 08/25/2009] [Revised: 11/11/2009] [Accepted: 11/11/2009] [Indexed: 11/30/2022]
|
15
|
SplicerAV: a tool for mining microarray expression data for changes in RNA processing. BMC Bioinformatics 2010; 11:108. [PMID: 20184770 PMCID: PMC2838864 DOI: 10.1186/1471-2105-11-108] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 09/23/2009] [Accepted: 02/25/2010] [Indexed: 12/22/2022] Open
Abstract
Background Over the past two decades more than fifty thousand unique clinical and biological samples have been assayed using the Affymetrix HG-U133 and HG-U95 GeneChip microarray platforms. This substantial repository has been used extensively to characterize changes in gene expression between biological samples, but has not been previously mined en masse for changes in mRNA processing. We explored the possibility of using HG-U133 microarray data to identify changes in alternative mRNA processing in several available archival datasets. Results Data from these and other gene expression microarrays can now be mined for changes in transcript isoform abundance using a program described here, SplicerAV. Using in vivo and in vitro breast cancer microarray datasets, SplicerAV was able to perform both gene and isoform specific expression profiling within the same microarray dataset. Our reanalysis of Affymetrix U133 plus 2.0 data generated by in vitro over-expression of HRAS, E2F3, beta-catenin (CTNNB1), SRC, and MYC identified several hundred oncogene-induced mRNA isoform changes, one of which recognized a previously unknown mechanism of EGFR family activation. Using clinical data, SplicerAV predicted 241 isoform changes between low and high grade breast tumors; with changes enriched among genes coding for guanyl-nucleotide exchange factors, metalloprotease inhibitors, and mRNA processing factors. Isoform changes in 15 genes were associated with aggressive cancer across the three breast cancer datasets. Conclusions Using SplicerAV, we identified several hundred previously uncharacterized isoform changes induced by in vitro oncogene over-expression and revealed a previously unknown mechanism of EGFR activation in human mammary epithelial cells. We analyzed Affymetrix GeneChip data from over 400 human breast tumors in three independent studies, making this the largest clinical dataset analyzed for en masse changes in alternative mRNA processing. 
The capacity to detect RNA isoform changes in archival microarray data using SplicerAV allowed us to carry out the first analysis of isoform specific mRNA changes directly associated with cancer survival.
|