1. Pradines JR, Farutin V, Cilfone NA, Ghavami A, Kurtagic E, Guess J, Manning AM, Capila I. Enhancing reproducibility of gene expression analysis with known protein functional relationships: The concept of well-associated protein. PLoS Comput Biol 2020; 16:e1007684. [PMID: 32058996 PMCID: PMC7046299 DOI: 10.1371/journal.pcbi.1007684] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/20/2019] [Revised: 02/27/2020] [Accepted: 01/27/2020] [Indexed: 12/27/2022]
Abstract
Identification of differentially expressed genes (DEGs) is well recognized to be variable across independent replications of genome-wide transcriptional studies. Such studies are often employed to characterize disease state early in the process of discovery and prioritize novel targets aimed at addressing unmet medical need. Increasing reproducibility of biological findings from these studies could potentially positively impact the success rate of new clinical interventions. This work demonstrates that statistically sound combination of gene expression data with prior knowledge about biology in the form of large protein interaction networks can yield quantitatively more reproducible observations from studies characterizing human disease. The novel concept of Well-Associated Proteins (WAPs) introduced herein (gene products significantly associated on protein interaction networks with the differences in transcript levels between control and disease) does not require choosing a differential expression threshold and can be computed efficiently enough to enable false discovery rate estimation via permutation. Reproducibility of WAPs is shown to be on average superior to that of DEGs under easily quantifiable conditions, suggesting that they can yield a significantly more robust description of disease. Enhanced reproducibility of WAPs versus DEGs is first demonstrated with four independent data sets focused on systemic sclerosis. This finding is then validated over thousands of pairs of data sets obtained by random partitions of large studies in several other diseases. Conditions that individual data sets must satisfy to yield robust WAP scores are examined. Reproducible identification of WAPs can potentially benefit drug target selection and precision medicine studies.
Affiliation(s)
- Joël R. Pradines
  - Momenta Pharmaceuticals, 301 Binney Street, Cambridge, Massachusetts, United States of America
- Victor Farutin
  - Momenta Pharmaceuticals, 301 Binney Street, Cambridge, Massachusetts, United States of America
  - * E-mail: (VF); (IC)
- Nicholas A. Cilfone
  - Momenta Pharmaceuticals, 301 Binney Street, Cambridge, Massachusetts, United States of America
- Abouzar Ghavami
  - Momenta Pharmaceuticals, 301 Binney Street, Cambridge, Massachusetts, United States of America
- Elma Kurtagic
  - Momenta Pharmaceuticals, 301 Binney Street, Cambridge, Massachusetts, United States of America
- Jamey Guess
  - Momenta Pharmaceuticals, 301 Binney Street, Cambridge, Massachusetts, United States of America
- Anthony M. Manning
  - Momenta Pharmaceuticals, 301 Binney Street, Cambridge, Massachusetts, United States of America
- Ishan Capila
  - Momenta Pharmaceuticals, 301 Binney Street, Cambridge, Massachusetts, United States of America
  - * E-mail: (VF); (IC)

2. Node-Structured Integrative Gaussian Graphical Model Guided by Pathway Information. Computational and Mathematical Methods in Medicine 2017; 2017:8520480. [PMID: 28487748 PMCID: PMC5405575 DOI: 10.1155/2017/8520480] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 10/31/2016] [Revised: 02/20/2017] [Accepted: 03/06/2017] [Indexed: 12/23/2022]
Abstract
To date, many biological pathways related to cancer have been extensively applied thanks to outputs of burgeoning biomedical research. This leads to a new technical challenge of exploring and validating biological pathways that can characterize transcriptomic mechanisms across different disease subtypes. In pursuit of accommodating multiple studies, the joint Gaussian graphical model was previously proposed to incorporate nonzero edge effects. However, this model is inevitably dependent on post hoc analysis in order to confirm biological significance. To circumvent this drawback, we attempt not only to combine transcriptomic data but also to embed pathway information, itself well-ascertained biological evidence, into the model. To this end, we propose a novel statistical framework for fitting a joint Gaussian graphical model simultaneously with informative pathways consistently expressed across multiple studies. In theory, structured nodes can be prespecified with multiple genes. The optimization rule employs the structured input-output lasso model in order to estimate a sparse precision matrix constructed by simultaneous effects of multiple studies and structured nodes. With an application to breast cancer data sets, we found that the proposed model is superior in efficiently capturing structures of biological evidence (e.g., pathways). An R software package, nsiGGM, is publicly available at the author's webpage.

3. Parfett C, Williams A, Zheng J, Zhou G. Gene batteries and synexpression groups applied in a multivariate statistical approach to dose–response analysis of toxicogenomic data. Regul Toxicol Pharmacol 2013; 67:63-74. [DOI: 10.1016/j.yrtph.2013.06.005] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 05/02/2013] [Accepted: 06/26/2013] [Indexed: 12/28/2022]

4. Maglietta R, Liuzzi VC, Cattaneo E, Laczko E, Piepoli A, Panza A, Carella M, Palumbo O, Staiano T, Buffoli F, Andriulli A, Marra G, Ancona N. Molecular pathways undergoing dramatic transcriptomic changes during tumor development in the human colon. BMC Cancer 2012; 12:608. [PMID: 23253212 PMCID: PMC3541196 DOI: 10.1186/1471-2407-12-608] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Received: 06/09/2012] [Accepted: 12/13/2012] [Indexed: 12/22/2022]
Abstract
BACKGROUND: The malignant transformation of precancerous colorectal lesions involves progressive alterations at both the molecular and morphologic levels, the latter consisting of increases in size and in the degree of cellular atypia. Analyzing preinvasive tumors of different sizes can therefore shed light on the sequence of these alterations.
METHODS: We used a molecular pathway-based approach to analyze transcriptomic profiles of 59 colorectal tumors representing early and late preinvasive stages and the invasive stage of tumorigenesis. Random set analysis was used to identify biological pathways enriched for genes differentially regulated in tumors (compared with 59 samples of normal mucosa).
RESULTS: Of the 880 canonical pathways we investigated, 112 displayed significant tumor-related upregulation or downregulation at one or more stages of tumorigenesis. This allowed us to distinguish between pathways whose dysregulation is probably necessary throughout tumorigenesis and those whose involvement specifically drives progression from one stage to the next. We were also able to pinpoint specific changes within each gene set that seem to play key roles at each transition. The early preinvasive stage was characterized by cell-cycle checkpoint activation triggered by DNA replication stress and dramatic downregulation of basic transmembrane signaling processes that maintain epithelial/stromal homeostasis in the normal mucosa. In late preinvasive lesions, there was also downregulation of signal transduction pathways (e.g., those mediated by G proteins and nuclear hormone receptors) involved in cell differentiation and upregulation of pathways governing nuclear envelope dynamics and the G2>M transition in the cell cycle. The main features of the invasive stage were activation of the G1>S transition in the cell cycle, upregulated expression of tumor-promoting microenvironmental factors, and profound dysregulation of metabolic pathways (e.g., increased aerobic glycolysis, downregulation of pathways that metabolize drugs and xenobiotics).
CONCLUSIONS: Our analysis revealed specific pathways whose dysregulation might play a role in each transition of the transformation process. This is the first study in which such an approach has been used to gain further insights into colorectal tumorigenesis. Therefore, these data provide a launchpad for further exploration of the molecular characterization of colorectal tumorigenesis using systems biology approaches.
Affiliation(s)
- Rosalia Maglietta
  - Istituto di Studi sui Sistemi Intelligenti per l'Automazione - CNR, Via Amendola 122/D-I, 70126 Bari, Italy

5. Piepoli A, Palmieri O, Maglietta R, Panza A, Cattaneo E, Latiano A, Laczko E, Gentile A, Carella M, Mazzoccoli G, Ancona N, Marra G, Andriulli A. The expression of leucine-rich repeat gene family members in colorectal cancer. Exp Biol Med (Maywood) 2012; 237:1123-8. [PMID: 23045723 DOI: 10.1258/ebm.2012.012042] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Indexed: 11/18/2022]
Abstract
This study was conducted to evaluate the association of the leucine-rich repeat (LRR) gene family with colorectal cancer (CRC). The expression of members of the LRR gene family was analyzed in 17 CRC specimens and in 59 healthy colorectal tissues by using Human Exon1.0ST microarray, and in 25 CRC specimens and 32 healthy colorectal tissues by U133Plus2.0 microarray. An association was found for 25 genes belonging to the plant-specific (PS) class of LRR genes (P = 0.05 for Exon1.0 ST and P = 0.04 for U133Plus2.0). In both data-sets, in CRC, we found down-regulation of SHOC2 (P < 0.00003) and LRRC28 (P < 0.01) and up-regulation of LRSAM1 (P < 0.000001), while up-regulation of MFHAS1 (P = 0.0005) and down-regulation of WDFY3 (P = 0.026) were found only in the Exon1.0 ST data-set. The PS LRR gene class encodes proteins that activate immune cells and might play a key role in programmed cell death and autophagy. SHOC2 and LRRC28 genes involved in RAS-mediated signaling, which hinders nutrient deprivation-induced autophagy, might be a possible link between the negative control of autophagy and tumorigenesis.
Affiliation(s)
- Ada Piepoli
  - Laboratory and Division of Gastroenterology, IRCCS Casa Sollievo della Sofferenza Hospital, Viale Cappuccini n.1, San Giovanni Rotondo (FG), Italy.

6. A predictive framework for integrating disparate genomic data types using sample-specific gene set enrichment analysis and multi-task learning. PLoS One 2012; 7:e44635. [PMID: 23028573 PMCID: PMC3441565 DOI: 10.1371/journal.pone.0044635] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 04/27/2012] [Accepted: 08/06/2012] [Indexed: 11/19/2022]
Abstract
Understanding the root molecular and genetic causes driving complex traits is a fundamental challenge in genomics and genetics. Numerous studies have used variation in gene expression to understand complex traits, but the underlying genomic variation that contributes to these expression changes is not well understood. In this study, we developed a framework to integrate gene expression and genotype data to identify biological differences between samples from opposing complex trait classes that are driven by expression changes and genotypic variation. This framework utilizes pathway analysis and multi-task learning to build a predictive model and discover pathways relevant to the complex trait of interest. We simulated expression and genotype data to test the predictive ability of our framework and to measure how well it uncovered pathways with genes both differentially expressed and genetically associated with a complex trait. We found that the predictive performance of the multi-task model was comparable to other similar methods. Also, methods like multi-task learning that considered enrichment analysis scores from both data sets found pathways with both genetic and expression differences related to the phenotype. We used our framework to analyze differences between estrogen receptor (ER) positive and negative breast cancer samples. An analysis of the top 15 gene sets from the multi-task model showed they were all related to estrogen, steroids, cell signaling, or the cell cycle. Although our study suggests that multi-task learning does not enhance predictive accuracy, the models generated by our framework do provide valuable biological pathway knowledge for complex traits.

7. Danielsen SA, Cekaite L, Ågesen TH, Sveen A, Nesbakken A, Thiis-Evensen E, Skotheim RI, Lind GE, Lothe RA. Phospholipase C isozymes are deregulated in colorectal cancer--insights gained from gene set enrichment analysis of the transcriptome. PLoS One 2011; 6:e24419. [PMID: 21909432 PMCID: PMC3164721 DOI: 10.1371/journal.pone.0024419] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.5] [Received: 05/20/2011] [Accepted: 08/10/2011] [Indexed: 12/19/2022]
Abstract
Colorectal cancer (CRC) is one of the most common cancer types in developed countries. To identify molecular networks and biological processes that are deregulated in CRC compared to normal colonic mucosa, we applied Gene Set Enrichment Analysis to two independent transcriptome datasets, including a total of 137 CRC and ten normal colonic mucosa samples. Eighty-two gene sets as described by the Kyoto Encyclopedia of Genes and Genomes database had significantly altered gene expression in both datasets. These included networks associated with cell division, DNA maintenance, and metabolism. Among signaling pathways with known changes in key genes, the “Phosphatidylinositol signaling network”, comprising part of the PI3K pathway, was found deregulated. The downregulated genes in this pathway included several members of the Phospholipase C protein family, and the reduced expression of two of these, PLCD1 and PLCE1, was successfully validated in CRC biopsies (n = 70) and cell lines (n = 19) by quantitative analyses. The repression of both genes was found associated with KRAS mutations (P = 0.005 and 0.006, respectively), and we observed that microsatellite stable carcinomas with reduced PLCD1 expression more frequently had TP53 mutations (P = 0.002). Promoter methylation analyses of PLCD1 and PLCE1 performed in cell lines and tumor biopsies revealed that methylation of PLCD1 can contribute to reduced expression in 40% of the microsatellite instable carcinomas. In conclusion, we have identified significantly deregulated pathways in CRC, and validated repression of PLCD1 and PLCE1 expression. This illustrates that the GSEA approach may guide discovery of novel biomarkers in cancer.
Affiliation(s)
- Stine A. Danielsen
  - Department of Cancer Prevention, Institute for Cancer Research, Oslo University Hospital, Oslo, Norway
  - Centre for Cancer Biomedicine, Faculty of Medicine, University of Oslo, Oslo, Norway
- Lina Cekaite
  - Department of Cancer Prevention, Institute for Cancer Research, Oslo University Hospital, Oslo, Norway
  - Centre for Cancer Biomedicine, Faculty of Medicine, University of Oslo, Oslo, Norway
- Trude H. Ågesen
  - Department of Cancer Prevention, Institute for Cancer Research, Oslo University Hospital, Oslo, Norway
  - Centre for Cancer Biomedicine, Faculty of Medicine, University of Oslo, Oslo, Norway
- Anita Sveen
  - Department of Cancer Prevention, Institute for Cancer Research, Oslo University Hospital, Oslo, Norway
  - Centre for Cancer Biomedicine, Faculty of Medicine, University of Oslo, Oslo, Norway
- Arild Nesbakken
  - Centre for Cancer Biomedicine, Faculty of Medicine, University of Oslo, Oslo, Norway
  - Department of Gastrointestinal Surgery, Oslo University Hospital, Oslo, Norway
- Espen Thiis-Evensen
  - Department of Organ Transplantation, Gastroenterology, and Nephrology, Oslo University Hospital, Oslo, Norway
- Rolf I. Skotheim
  - Department of Cancer Prevention, Institute for Cancer Research, Oslo University Hospital, Oslo, Norway
  - Centre for Cancer Biomedicine, Faculty of Medicine, University of Oslo, Oslo, Norway
- Guro E. Lind
  - Department of Cancer Prevention, Institute for Cancer Research, Oslo University Hospital, Oslo, Norway
  - Centre for Cancer Biomedicine, Faculty of Medicine, University of Oslo, Oslo, Norway
- Ragnhild A. Lothe
  - Department of Cancer Prevention, Institute for Cancer Research, Oslo University Hospital, Oslo, Norway
  - Centre for Cancer Biomedicine, Faculty of Medicine, University of Oslo, Oslo, Norway
  - * E-mail:

8. Azuaje F, Zheng H, Camargo A, Wang H. Systems-based biological concordance and predictive reproducibility of gene set discovery methods in cardiovascular disease. J Biomed Inform 2011; 44:637-47. [PMID: 21315182 DOI: 10.1016/j.jbi.2011.02.003] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 08/25/2010] [Revised: 01/31/2011] [Accepted: 02/07/2011] [Indexed: 01/13/2023]
Abstract
The discovery of novel disease biomarkers is a crucial challenge for translational bioinformatics. Demonstration of both their classification power and reproducibility across independent datasets are essential requirements to assess their potential clinical relevance. Small datasets and multiplicity of putative biomarker sets may explain lack of predictive reproducibility. Studies based on pathway-driven discovery approaches have suggested that, despite such discrepancies, the resulting putative biomarkers tend to be implicated in common biological processes. Investigations of this problem have been mainly focused on datasets derived from cancer research. We investigated the predictive and functional concordance of five methods for discovering putative biomarkers in four independently-generated datasets from the cardiovascular disease domain. A diversity of biosignatures was identified by the different methods. However, we found strong biological process concordance between them, especially in the case of methods based on gene set analysis. With a few exceptions, we observed lack of classification reproducibility using independent datasets. Partial overlaps between our putative sets of biomarkers and the primary studies exist. Despite the observed limitations, pathway-driven or gene set analysis can predict potentially novel biomarkers and can jointly point to biomedically-relevant underlying molecular mechanisms.
Affiliation(s)
- Francisco Azuaje
  - Laboratory of Cardiovascular Research, Public Research Centre for Health (CRP-Santé), 120 Route d'Arlon L-1150, Luxembourg.

9. Lussier YA, Butte AJ, Hunter L. Current methodologies for translational bioinformatics. J Biomed Inform 2010; 43:355-7. [PMID: 20470899 DOI: 10.1016/j.jbi.2010.05.002] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 05/05/2010] [Revised: 05/06/2010] [Accepted: 05/06/2010] [Indexed: 10/19/2022]