1
Riquelme-Perez M, Perez-Sanz F, Deleuze JF, Escartin C, Bonnet E, Brohard S. DEVEA: an interactive shiny application for Differential Expression analysis, data Visualization and Enrichment Analysis of transcriptomics data. F1000Res 2023; 11:711. [PMID: 36999088 PMCID: PMC10043628.2 DOI: 10.12688/f1000research.122949.2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 02/21/2023] [Indexed: 03/29/2023] Open
Abstract
We are at a time of considerable growth in transcriptomics studies and subsequent in silico analysis. RNA sequencing (RNA-Seq) is the most widely used approach to analyse the transcriptome and is integrated in many studies. The processing of transcriptomic data typically requires a noteworthy number of steps, statistical knowledge, and coding skills, which are not accessible to all scientists. Despite the development of a plethora of software applications over the past few years to address this concern, there is still room for improvement. Here we present DEVEA, an R shiny application tool developed to perform differential expression analysis, data visualization and enrichment pathway analysis mainly from transcriptomics data, but also from simpler gene lists with or without statistical values. The intuitive and easy-to-manipulate interface facilitates gene expression exploration through numerous interactive figures and tables, and statistical comparisons of expression profile levels between groups. Further meta-analysis such as enrichment analysis is also possible, without the need for prior bioinformatics expertise. DEVEA performs a comprehensive analysis from multiple and flexible data sources representing distinct analytical steps. Consequently, it produces dynamic graphs and tables, to explore the expression levels and statistical results from differential expression analysis. Moreover, it generates a comprehensive pathway analysis to extend biological insights. Finally, a complete and customizable HTML report can be extracted to enable the scientists to explore results beyond the application. DEVEA is freely accessible at https://shiny.imib.es/devea/ and the source code is available on our GitHub repository https://github.com/MiriamRiquelmeP/DEVEA.
Affiliation(s)
- Miriam Riquelme-Perez
  - Université Paris-Saclay, CEA, CNRS, MIRCen, Laboratoire des Maladies Neurodégénératives, Fontenay-aux-Roses, 92265, France
  - Centre National de Recherche en Génomique Humaine (CNRGH), Institut de Biologie François Jacob, CEA, Université Paris-Saclay, Evry, 91000, France
- Fernando Perez-Sanz
  - Biomedical Informatics & Bioinformatics Service, Institute for Biomedical Research of Murcia (IMIB), Murcia, 30120, Spain
- Jean-François Deleuze
  - Centre National de Recherche en Génomique Humaine (CNRGH), Institut de Biologie François Jacob, CEA, Université Paris-Saclay, Evry, 91000, France
- Carole Escartin
  - Université Paris-Saclay, CEA, CNRS, MIRCen, Laboratoire des Maladies Neurodégénératives, Fontenay-aux-Roses, 92265, France
- Eric Bonnet
  - Centre National de Recherche en Génomique Humaine (CNRGH), Institut de Biologie François Jacob, CEA, Université Paris-Saclay, Evry, 91000, France
- Solène Brohard
  - Centre National de Recherche en Génomique Humaine (CNRGH), Institut de Biologie François Jacob, CEA, Université Paris-Saclay, Evry, 91000, France
2
Riquelme-Perez M, Perez-Sanz F, Deleuze JF, Escartin C, Bonnet E, Brohard S. DEVEA: an interactive shiny application for Differential Expression analysis, data Visualization and Enrichment Analysis of transcriptomics data. F1000Res 2022; 11:711. [PMID: 36999088 PMCID: PMC10043628 DOI: 10.12688/f1000research.122949.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 06/20/2022] [Indexed: 11/20/2022] Open
Abstract
We are at a time of considerable growth in the use and development of transcriptomics studies and subsequent in silico analysis. RNA sequencing is one of the most widely used approaches, now integrated in many studies. Processing these data typically requires a noteworthy number of steps, statistical knowledge, and coding skills, which are not accessible to all scientists. Despite the undeniable development of software applications over the years to address this concern, there is still room for improvement. Here we present DEVEA, an R shiny application tool developed to perform differential expression analysis, data visualization and enrichment pathway analysis mainly from transcriptomics data, but also from simpler gene lists with or without statistical values. Its intuitive and easy-to-manipulate interface facilitates gene expression exploration through numerous interactive figures and tables, statistical comparisons of expression profile levels between groups and further meta-analysis such as enrichment analysis, without bioinformatics expertise. DEVEA performs a thorough analysis from multiple and flexible input data representing distinct analysis stages. From these, it produces dynamic graphs and tables to explore the expression levels and the statistical results of differential expression analysis. Moreover, it generates a comprehensive pathway analysis to extend biological insights. Finally, a complete and customizable HTML report can be extracted for further result exploration outside the application. DEVEA is accessible at https://shiny.imib.es/devea/ and the source code is available on our GitHub repository https://github.com/MiriamRiquelmeP/DEVEA.
Affiliation(s)
- Miriam Riquelme-Perez
  - Université Paris-Saclay, CEA, CNRS, MIRCen, Laboratoire des Maladies Neurodégénératives, Fontenay-aux-Roses, 92265, France
  - Centre National de Recherche en Génomique Humaine (CNRGH), Institut de Biologie François Jacob, CEA, Université Paris-Saclay, Evry, 91000, France
- Fernando Perez-Sanz
  - Biomedical Informatics & Bioinformatics Service, Institute for Biomedical Research of Murcia (IMIB), Murcia, 30120, Spain
- Jean-François Deleuze
  - Centre National de Recherche en Génomique Humaine (CNRGH), Institut de Biologie François Jacob, CEA, Université Paris-Saclay, Evry, 91000, France
- Carole Escartin
  - Université Paris-Saclay, CEA, CNRS, MIRCen, Laboratoire des Maladies Neurodégénératives, Fontenay-aux-Roses, 92265, France
- Eric Bonnet
  - Centre National de Recherche en Génomique Humaine (CNRGH), Institut de Biologie François Jacob, CEA, Université Paris-Saclay, Evry, 91000, France
- Solène Brohard
  - Centre National de Recherche en Génomique Humaine (CNRGH), Institut de Biologie François Jacob, CEA, Université Paris-Saclay, Evry, 91000, France
3
Park C, Kim B, Park T. DeepHisCoM: deep learning pathway analysis using hierarchical structural component models. Brief Bioinform 2022; 23:6590446. [DOI: 10.1093/bib/bbac171] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2021] [Revised: 04/04/2022] [Accepted: 04/18/2022] [Indexed: 11/13/2022] Open
Abstract
Many statistical methods for pathway analysis have been used to identify pathways associated with a disease, along with biological factors such as genes and proteins. However, most pathway analysis methods neglect the complex nonlinear relationship between biological factors and pathways. In this study, we propose a Deep-learning pathway analysis using Hierarchical structured CoMponent models (DeepHisCoM) that utilizes deep learning to consider the nonlinear, complex contribution of biological factors to pathways by constructing a multilayered model that accounts for hierarchical biological structure. Through simulation studies, DeepHisCoM was shown to have higher power for nonlinear pathway effects and comparable power for linear pathway effects when compared to conventional pathway methods. Application to hepatocellular carcinoma (HCC) omics datasets, including metabolomic, transcriptomic and metagenomic datasets, demonstrated that DeepHisCoM successfully identified three well-known pathways that are highly associated with HCC, namely lysine degradation; valine, leucine and isoleucine biosynthesis; and phenylalanine, tyrosine and tryptophan. Application to the coronavirus disease-2019 (COVID-19) single-nucleotide polymorphism (SNP) dataset also showed that DeepHisCoM identified four pathways that are highly associated with the severity of COVID-19, namely the mitogen-activated protein kinase (MAPK) signaling pathway, the gonadotropin-releasing hormone (GnRH) signaling pathway, hypertrophic cardiomyopathy and dilated cardiomyopathy. Codes are available at https://github.com/chanwoo-park-official/DeepHisCoM.
Affiliation(s)
- Chanwoo Park
  - Department of Statistics, Seoul National University, Seoul 08826, Korea
- Boram Kim
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
- Taesung Park
  - Department of Statistics, Seoul National University, Seoul 08826, Korea
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
4
NMR in Metabolomics: From Conventional Statistics to Machine Learning and Neural Network Approaches. APPLIED SCIENCES-BASEL 2022. [DOI: 10.3390/app12062824] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 12/17/2022]
Abstract
NMR measurements combined with chemometrics provide a great deal of information for identifying potential biomarkers responsible for a specific metabolic pathway. Such data are useful in fields ranging from food to biomedicine, including health science. The investigation of the whole set of metabolites in a sample, representing its fingerprint in the considered condition, is known as metabolomics and may take advantage of different statistical tools. The new frontier is to adopt self-learning techniques to enhance clustering or classification and so improve predictive power over large amounts of data. Although machine learning is already employed in metabolomics, deep learning and artificial neural network approaches have only recently been applied successfully. In this work, we give an overview of the statistical approaches underlying the wide range of opportunities that machine learning and neural networks offer for accurate metabolite assignment and quantification. Various current challenges are discussed, such as proper metabolomics, deep learning architectures and model accuracy.
5
Nguyen H, Tran D, Galazka JM, Costes SV, Beheshti A, Petereit J, Draghici S, Nguyen T. CPA: a web-based platform for consensus pathway analysis and interactive visualization. Nucleic Acids Res 2021; 49:W114-W124. [PMID: 34037798 PMCID: PMC8262702 DOI: 10.1093/nar/gkab421] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 03/09/2021] [Revised: 04/16/2021] [Accepted: 05/05/2021] [Indexed: 01/06/2023] Open
Abstract
In molecular biology and genetics, there is a large gap between the ease of data collection and our ability to extract knowledge from these data. Contributing to this gap is the fact that living organisms are complex systems whose emerging phenotypes are the results of multiple complex interactions taking place on various pathways. This demands powerful yet user-friendly pathway analysis tools to translate the now abundant high-throughput data into a better understanding of the underlying biological phenomena. Here we introduce Consensus Pathway Analysis (CPA), a web-based platform that allows researchers to (i) perform pathway analysis using eight established methods (GSEA, GSA, FGSEA, PADOG, Impact Analysis, ORA/Webgestalt, KS-test, Wilcox-test), (ii) perform meta-analysis of multiple datasets, (iii) combine methods and datasets to accurately identify the impacted pathways underlying the studied condition and (iv) interactively explore impacted pathways, and browse relationships between pathways and genes. The platform supports three types of input: (i) a list of differentially expressed genes, (ii) genes and fold changes and (iii) an expression matrix. It also allows users to import data from NCBI GEO. The CPA platform currently supports the analysis of multiple organisms using KEGG and Gene Ontology, and it is freely available at http://cpa.tinnguyen-lab.com.
Affiliation(s)
- Hung Nguyen
  - University of Nevada Reno, Department of Computer Science and Engineering, Reno, NV 89557, USA
- Duc Tran
  - University of Nevada Reno, Department of Computer Science and Engineering, Reno, NV 89557, USA
- Jonathan M Galazka
  - NASA Ames Research Center, Space Biosciences Division, Moffett Field, CA 94035, USA
- Sylvain V Costes
  - NASA Ames Research Center, Space Biosciences Division, Moffett Field, CA 94035, USA
- Afshin Beheshti
  - KBR, NASA Ames Research Center, Space Biosciences Division, Moffett Field, CA 94035, USA
- Juli Petereit
  - University of Nevada Reno, Nevada Bioinformatics Center, Reno, NV 89557, USA
- Sorin Draghici
  - Wayne State University, Department of Computer Science, Detroit, MI 48202, USA
- Tin Nguyen
  - University of Nevada Reno, Department of Computer Science and Engineering, Reno, NV 89557, USA
6
Bi G, Bian Y, Liang J, Yin J, Li R, Zhao M, Huang Y, Lu T, Zhan C, Fan H, Wang Q. Pan-cancer characterization of metabolism-related biomarkers identifies potential therapeutic targets. J Transl Med 2021; 19:219. [PMID: 34030708 PMCID: PMC8142489 DOI: 10.1186/s12967-021-02889-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/12/2021] [Accepted: 05/17/2021] [Indexed: 02/07/2023] Open
Abstract
Background: Generally, cancer cells undergo metabolic reprogramming to adapt to energetic and biosynthetic requirements that support their uncontrolled proliferation. However, the mutual relationship between two critical metabolic pathways, glycolysis and oxidative phosphorylation (OXPHOS), remains poorly defined. Methods: We developed a “double-score” system to quantify glycolysis and OXPHOS in 9668 patients across 33 tumor types from The Cancer Genome Atlas and classified them into four metabolic subtypes. Multi-omics bioinformatic analyses were conducted to detect metabolism-related molecular features. Results: Compared with patients with low glycolysis and high OXPHOS (LGHO), those with high glycolysis and low OXPHOS (HGLO) were consistently associated with worse prognosis. We identified common dysregulated molecular features between different metabolic subgroups across multiple cancers, including gene, miRNA, transcription factor, methylation, and somatic alteration, and investigated their mutual interfering relationships. Conclusion: Overall, this work provides a comprehensive atlas of metabolic heterogeneity on a pan-cancer scale and identifies several potential drivers of metabolic rewiring, suggesting corresponding prognostic and therapeutic utility. Supplementary Information: The online version contains supplementary material available at 10.1186/s12967-021-02889-0.
Affiliation(s)
- Guoshu Bi
  - Department of Thoracic Surgery, Zhongshan Hospital, Fudan University, No. 180 Fenglin Rd, Xuhui District, Shanghai, 200032, China
- Yunyi Bian
  - Department of Thoracic Surgery, Zhongshan Hospital, Fudan University, No. 180 Fenglin Rd, Xuhui District, Shanghai, 200032, China
- Jiaqi Liang
  - Department of Thoracic Surgery, Zhongshan Hospital, Fudan University, No. 180 Fenglin Rd, Xuhui District, Shanghai, 200032, China
- Jiacheng Yin
  - Department of Thoracic Surgery, Zhongshan Hospital, Fudan University, No. 180 Fenglin Rd, Xuhui District, Shanghai, 200032, China
- Runmei Li
  - Department of Biostatistics, Public Health, Fudan University, Shanghai, 200000, China
- Mengnan Zhao
  - Department of Thoracic Surgery, Zhongshan Hospital, Fudan University, No. 180 Fenglin Rd, Xuhui District, Shanghai, 200032, China
- Yiwei Huang
  - Department of Thoracic Surgery, Zhongshan Hospital, Fudan University, No. 180 Fenglin Rd, Xuhui District, Shanghai, 200032, China
- Tao Lu
  - Department of Thoracic Surgery, Zhongshan Hospital, Fudan University, No. 180 Fenglin Rd, Xuhui District, Shanghai, 200032, China
- Cheng Zhan
  - Department of Thoracic Surgery, Zhongshan Hospital, Fudan University, No. 180 Fenglin Rd, Xuhui District, Shanghai, 200032, China
- Hong Fan
  - Department of Thoracic Surgery, Zhongshan Hospital, Fudan University, No. 180 Fenglin Rd, Xuhui District, Shanghai, 200032, China
- Qun Wang
  - Department of Thoracic Surgery, Zhongshan Hospital, Fudan University, No. 180 Fenglin Rd, Xuhui District, Shanghai, 200032, China
7
Maleki F, Ovens K, Hogan DJ, Kusalik AJ. Gene Set Analysis: Challenges, Opportunities, and Future Research. Front Genet 2020; 11:654. [PMID: 32695141 PMCID: PMC7339292 DOI: 10.3389/fgene.2020.00654] [Citation(s) in RCA: 90] [Impact Index Per Article: 22.5] [Received: 02/01/2020] [Accepted: 05/29/2020] [Indexed: 12/14/2022] Open
Abstract
Gene set analysis methods are widely used to provide insight into high-throughput gene expression data. There are many gene set analysis methods available. These methods rely on various assumptions and have different requirements, strengths and weaknesses. In this paper, we classify gene set analysis methods based on their components, describe the underlying requirements and assumptions for each class, and provide directions for future research in developing and evaluating gene set analysis methods.
8
Tripathi H, Mukhopadhyay S, Mohapatra SK. Sepsis-associated pathways segregate cancer groups. BMC Cancer 2020; 20:309. [PMID: 32293345 PMCID: PMC7160985 DOI: 10.1186/s12885-020-06774-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/20/2019] [Accepted: 03/23/2020] [Indexed: 12/27/2022] Open
Abstract
BACKGROUND: Sepsis and cancer are both leading causes of death, and the occurrence of either increases the likelihood of the other. While cancer patients are susceptible to sepsis, survivors of sepsis are also susceptible to developing certain cancers. This mutual dependence for susceptibility suggests shared biology between the two disease categories. Earlier analysis had revealed a cancer-related pathway to be up-regulated in Septic Shock (SS), an advanced stage of sepsis. This has motivated a more comprehensive comparison of the transcriptomes of SS and cancer. METHODS: Gene Set Enrichment Analysis was performed to detect the pathways enriched in SS and cancer. Thereafter, hierarchical clustering was applied to identify relative segregation of 17 cancer types into two groups vis-a-vis SS. Biological significance of the selected pathways was explored by network analysis. Clinical significance of the pathways was tested by survival analysis. A robust classifier of cancer groups was developed based on machine learning. RESULTS: A total of 66 pathways were observed to be enriched in both SS and cancer. However, clustering segregated cancer types into two categories based on the direction of transcriptomic change. In general, there was up-regulation in SS and one group of cancer (termed Sepsis-Like Cancer, or SLC), but not in other cancers (termed Cancer Alone, or CA). The SLC group mainly consisted of malignancies of the gastrointestinal tract (head and neck, oesophagus, stomach, liver and biliary system) often associated with infection. A machine learning classifier successfully segregated the two cancer groups with high accuracy (> 98%). Additionally, pathway up-regulation was observed to be associated with survival in the SLC group of cancers. CONCLUSION: A transcriptome-based systems biology approach segregates cancer into two groups (SLC and CA) based on similarity with SS. Host response to infection plays a key role in pathogenesis of SS and SLC. However, we hypothesize that some component of the host response is protective in both SS and SLC.
Affiliation(s)
- Himanshu Tripathi
  - National Institute of Biomedical Genomics, P.O. NSS, Kalyani, Nadia, West Bengal, 741251, India
- Samanwoy Mukhopadhyay
  - National Institute of Biomedical Genomics, P.O. NSS, Kalyani, Nadia, West Bengal, 741251, India
- Saroj Kant Mohapatra
  - National Institute of Biomedical Genomics, P.O. NSS, Kalyani, Nadia, West Bengal, 741251, India
9
Fifteen Years of Gene Set Analysis for High-Throughput Genomic Data: A Review of Statistical Approaches and Future Challenges. ENTROPY 2020; 22:e22040427. [PMID: 33286201 PMCID: PMC7516904 DOI: 10.3390/e22040427] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 02/24/2020] [Revised: 03/18/2020] [Accepted: 04/03/2020] [Indexed: 12/22/2022]
Abstract
Over the last decade, gene set analysis has become the first choice for gaining insights into underlying complex biology of diseases through gene expression and gene association studies. It also reduces the complexity of statistical analysis and enhances the explanatory power of the obtained results. Although gene set analysis approaches are extensively used in gene expression and genome wide association data analysis, the statistical structure and steps common to these approaches have not yet been comprehensively discussed, which limits their utility. In this article, we provide a comprehensive overview, statistical structure and steps of gene set analysis approaches used for microarrays, RNA-sequencing and genome wide association data analysis. Further, we also classify the gene set analysis approaches and tools by the type of genomic study, null hypothesis, sampling model and nature of the test statistic, etc. Rather than reviewing the gene set analysis approaches individually, we provide the generation-wise evolution of such approaches for microarrays, RNA-sequencing and genome wide association studies and discuss their relative merits and limitations. Here, we identify the key biological and statistical challenges in current gene set analysis, which will be addressed by statisticians and biologists collectively in order to develop the next generation of gene set analysis approaches. Further, this study will serve as a catalog and provide guidelines to genome researchers and experimental biologists for choosing the proper gene set analysis approach based on several factors.
10
Nguyen TM, Shafi A, Nguyen T, Draghici S. Identifying significantly impacted pathways: a comprehensive review and assessment. Genome Biol 2019; 20:203. [PMID: 31597578 PMCID: PMC6784345 DOI: 10.1186/s13059-019-1790-4] [Citation(s) in RCA: 90] [Impact Index Per Article: 18.0] [Received: 02/15/2019] [Accepted: 08/13/2019] [Indexed: 01/01/2023] Open
Abstract
BACKGROUND: Many high-throughput experiments compare two phenotypes such as disease vs. healthy, with the goal of understanding the underlying biological phenomena characterizing the given phenotype. Because of the importance of this type of analysis, more than 70 pathway analysis methods have been proposed so far. These can be categorized into two main categories: non-topology-based (non-TB) and topology-based (TB). Although some review papers discuss this topic from different aspects, there is no systematic, large-scale assessment of such methods. Furthermore, the majority of the pathway analysis approaches rely on the assumption of uniformity of p values under the null hypothesis, which is often not true. RESULTS: This article presents the most comprehensive comparative study on pathway analysis methods available to date. We compare the actual performance of 13 widely used pathway analysis methods in over 1085 analyses. These comparisons were performed using 2601 samples from 75 human disease data sets and 121 samples from 11 knockout mouse data sets. In addition, we investigate the extent to which each method is biased under the null hypothesis. Together, these data and results constitute a reliable benchmark against which future pathway analysis methods could and should be tested. CONCLUSION: Overall, the result shows that no method is perfect. In general, TB methods appear to perform better than non-TB methods. This is somewhat expected since the TB methods take into consideration the structure of the pathway which is meant to describe the underlying phenomena. We also discover that most, if not all, listed approaches are biased and can produce skewed results under the null.
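The uniformity assumption mentioned in this abstract can be illustrated with a small simulation. The sketch below is not taken from the cited paper: it assumes a simple two-sided z-test on two groups drawn from the same normal distribution with known unit variance, and the function names and parameters are hypothetical. For a well-calibrated test, roughly a fraction alpha of the null p values should fall below alpha.

```python
import math
import random


def null_pvalues(n_tests=2000, n_per_group=10, seed=0):
    """Simulate two-sided z-test p values when both groups share one distribution."""
    rng = random.Random(seed)
    pvals = []
    for _ in range(n_tests):
        a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        diff = sum(a) / n_per_group - sum(b) / n_per_group
        # Standard error of the difference in means with known unit variance.
        z = diff / math.sqrt(2.0 / n_per_group)
        # Two-sided p value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
        pvals.append(math.erfc(abs(z) / math.sqrt(2.0)))
    return pvals


def fraction_below(pvals, alpha):
    """Share of p values strictly below alpha."""
    return sum(p < alpha for p in pvals) / len(pvals)


pvals = null_pvalues()
frac_005 = fraction_below(pvals, 0.05)
frac_050 = fraction_below(pvals, 0.50)
print(f"fraction below 0.05: {frac_005:.3f}")
print(f"fraction below 0.50: {frac_050:.3f}")
```

A method whose null p values pile up near 0 or near 1 in such a check would be biased in the sense the abstract describes; the same check could in principle be run on a pathway analysis method by feeding it data with no true signal.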
Affiliation(s)
- Tuan-Minh Nguyen
  - Department of Computer Science, Wayne State University, Detroit, 48202 USA
- Adib Shafi
  - Department of Computer Science, Wayne State University, Detroit, 48202 USA
- Tin Nguyen
  - Department of Computer Science and Engineering, University of Nevada, Reno, 89557 USA
- Sorin Draghici
  - Department of Computer Science, Wayne State University, Detroit, 48202 USA
  - Department of Obstetrics and Gynecology, Wayne State University, Detroit, 48202 USA
11
Understanding Statistical Hypothesis Testing: The Logic of Statistical Inference. MACHINE LEARNING AND KNOWLEDGE EXTRACTION 2019. [DOI: 10.3390/make1030054] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Indexed: 11/17/2022]
Abstract
Statistical hypothesis testing is among the most misunderstood quantitative analysis methods in data science. Despite its seeming simplicity, it has complex interdependencies between its procedural components. In this paper, we discuss the underlying logic behind statistical hypothesis testing, the formal meaning of its components and their connections. Our presentation is applicable to all statistical hypothesis tests as a generic backbone and, hence, useful across all application domains in data science and artificial intelligence.
12
Powers RK, Goodspeed A, Pielke-Lombardo H, Tan AC, Costello JC. GSEA-InContext: identifying novel and common patterns in expression experiments. Bioinformatics 2019; 34:i555-i564. [PMID: 29950010 PMCID: PMC6022535 DOI: 10.1093/bioinformatics/bty271] [Citation(s) in RCA: 134] [Impact Index Per Article: 26.8] [Indexed: 12/20/2022] Open
Abstract
Motivation: Gene Set Enrichment Analysis (GSEA) is routinely used to analyze and interpret coordinate pathway-level changes in transcriptomics experiments. For an experiment where fewer than seven samples per condition are compared, GSEA employs a competitive null hypothesis to test significance. A gene set enrichment score is tested against a null distribution of enrichment scores generated from permuted gene sets, where genes are randomly selected from the input experiment. Looking across a variety of biological conditions, however, genes are not randomly distributed, with many showing consistent patterns of up- or down-regulation. As a result, common patterns of positively and negatively enriched gene sets are observed across experiments. Placing a single experiment into the context of a relevant set of background experiments allows us to identify both the common and experiment-specific patterns of gene set enrichment. Results: We compiled a compendium of 442 small molecule transcriptomic experiments and used GSEA to characterize common patterns of positively and negatively enriched gene sets. To identify experiment-specific gene set enrichment, we developed the GSEA-InContext method that accounts for gene expression patterns within a background set of experiments to identify statistically significantly enriched gene sets. We evaluated GSEA-InContext on experiments using small molecules with known targets to show that it successfully prioritizes gene sets that are specific to each experiment, thus providing valuable insights that complement standard GSEA analysis. Availability and implementation: GSEA-InContext (implemented in Python), supplementary results and the background expression compendium are available at https://github.com/CostelloLab/GSEA-InContext.
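The competitive null described in this abstract (random gene sets drawn from the same experiment) can be sketched in a few lines. This is a simplified illustration, not the GSEA or GSEA-InContext implementation: it swaps the GSEA running-sum enrichment score for the mean gene-level statistic of the set, and the function name, parameters, and toy data are all hypothetical.

```python
import random


def gene_set_permutation_pvalue(scores, gene_set, n_perm=1000, seed=0):
    """Competitive permutation test for one gene set.

    scores: dict mapping gene name -> gene-level statistic (e.g. log fold change).
    gene_set: iterable of gene names; genes absent from `scores` are ignored.
    The enrichment score here is the mean statistic, a stand-in for the
    running-sum score used by GSEA.
    """
    rng = random.Random(seed)
    genes = list(scores)
    members = [g for g in gene_set if g in scores]
    if not members:
        raise ValueError("gene set has no genes in the experiment")
    k = len(members)
    observed = sum(scores[g] for g in members) / k
    hits = 0
    for _ in range(n_perm):
        # Null gene set: k genes picked at random from the input experiment.
        sampled = rng.sample(genes, k)
        null_score = sum(scores[g] for g in sampled) / k
        if null_score >= observed:
            hits += 1
    # Add-one correction so the estimated p value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy experiment: 100 genes with statistics 0.0 .. 99.0.
scores = {f"g{i}": float(i) for i in range(100)}
top_set = [f"g{i}" for i in range(90, 100)]   # strongly up-regulated genes
mid_set = [f"g{i}" for i in range(45, 55)]    # unremarkable genes

observed, p = gene_set_permutation_pvalue(scores, top_set)
observed_mid, p_mid = gene_set_permutation_pvalue(scores, mid_set)
print(observed, p)
print(observed_mid, p_mid)
```

As the abstract notes, this null treats genes as interchangeable; a background-aware approach would instead draw the null from gene behavior observed across a compendium of related experiments.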
Affiliation(s)
- Rani K Powers
  - Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
  - Department of Pharmacology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Andrew Goodspeed
  - Department of Pharmacology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Harrison Pielke-Lombardo
  - Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
  - Department of Pharmacology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Aik-Choon Tan
  - Department of Medical Oncology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- James C Costello
  - Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
  - Department of Pharmacology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
13
Li Y, Wu Y, Zhang X, Bai Y, Akthar LM, Lu X, Shi M, Zhao J, Jiang Q, Li Y. SCIA: A Novel Gene Set Analysis Applicable to Data With Different Characteristics. Front Genet 2019; 10:598. [PMID: 31293623 PMCID: PMC6603225 DOI: 10.3389/fgene.2019.00598] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 05/05/2019] [Accepted: 06/05/2019] [Indexed: 01/06/2023] Open
Abstract
Gene set analysis is commonly used in functional enrichment and molecular pathway analyses. Most current methods are based on competitive testing, which assumes that each gene is independent of the others. However, the false discovery rates of competitive methods are inflated when they are applied to datasets with high inter-gene correlations. Self-contained testing methods could solve this problem, but they impose other restrictions on data characteristics. Therefore, a statistically rigorous testing method applicable to different datasets with various complex characteristics is needed to obtain unbiased and comparable results. We propose a self-contained and competitive incorporated analysis (SCIA) to alleviate the bias caused by the limited application scope of existing gene set analysis methods. This is accomplished through a novel permutation strategy using a priori biological networks to selectively permute gene labels with different probabilities. In simulation studies, SCIA was compared with four representative analysis methods (GSEA, CAMERA, ROAST, and NES), and produced the best performance in both false discovery rate and sensitivity under most conditions with different parameter settings. Further, in KEGG pathway analysis of two real lung cancer datasets, SCIA returned considerably more results than GSEA in both datasets, and most of them could be supported by the literature. Overall, SCIA promises to offer researchers more reliable and comparable results across different datasets.
Affiliation(s)
- Yiqun Li
  - Department of Laboratory of Cancer Biology, School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Ying Wu
  - Department of Biostatistics, School of Public Health, Southern Medical University, Guangzhou, China
- Xiaohan Zhang
  - Department of Laboratory of Cancer Biology, School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Yunfan Bai
  - Department of Laboratory of Cancer Biology, School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Luqman Muhammad Akthar
  - Department of Laboratory of Cancer Biology, School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Xin Lu
  - Department of Laboratory of Cancer Biology, School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Ming Shi
  - Department of Laboratory of Cancer Biology, School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Jianxiang Zhao
  - Department of Laboratory of Cancer Biology, School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Qinghua Jiang
  - Department of Laboratory of Cancer Biology, School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Yu Li
  - Department of Laboratory of Cancer Biology, School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
14
Alaimo S, Micale G, La Ferlita A, Ferro A, Pulvirenti A. Computational Methods to Investigate the Impact of miRNAs on Pathways. Methods Mol Biol 2019; 1970:183-209. [PMID: 30963494 DOI: 10.1007/978-1-4939-9207-2_11] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 01/28/2023]
Abstract
Pathway analysis is a broad class of methods for determining the alteration of functional processes in complex diseases. However, biological pathways are still incomplete, and knowledge coming from posttranscriptional regulators has started to be considered in a systematic way only recently. Here we give a global and updated view of the main pathway and subpathway analysis methodologies, focusing on the improvements obtained through the recent introduction of microRNAs as regulatory elements in these frameworks.
Affiliation(s)
- Salvatore Alaimo
  - Department of Clinical and Experimental Medicine, University of Catania, Catania, Italy
- Giovanni Micale
  - Department of Clinical and Experimental Medicine, University of Catania, Catania, Italy
- Alfredo Ferro
  - Department of Clinical and Experimental Medicine, University of Catania, Catania, Italy
- Alfredo Pulvirenti
  - Department of Clinical and Experimental Medicine, University of Catania, Catania, Italy
15
Wang S, Yuan M. Combined Hypothesis Testing on Graphs With Applications to Gene Set Enrichment Analysis. J Am Stat Assoc 2018; 114:1320-1338. [DOI: 10.1080/01621459.2018.1497501] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 10/28/2022]
Affiliation(s)
- Shulei Wang
  - Department of Statistics, University of Wisconsin, Madison, WI
- Ming Yuan
  - Department of Statistics, Columbia University, New York, NY
16
Jambusaria A, Klomp J, Hong Z, Rafii S, Dai Y, Malik AB, Rehman J. A computational approach to identify cellular heterogeneity and tissue-specific gene regulatory networks. BMC Bioinformatics 2018; 19:217. [PMID: 29940845 PMCID: PMC6019795 DOI: 10.1186/s12859-018-2190-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 10/13/2017] [Accepted: 05/04/2018] [Indexed: 01/26/2023] Open
Abstract
Background: The heterogeneity of cells across tissue types represents a major challenge for studying biological mechanisms as well as for therapeutic targeting of distinct tissues. Computational prediction of tissue-specific gene regulatory networks may provide important insights into the mechanisms underlying the cellular heterogeneity of cells in distinct organs and tissues.
Results: Using three pathway analysis techniques, gene set enrichment analysis (GSEA), parametric analysis of gene set enrichment (PGSEA), alongside our novel model (HeteroPath), which assesses heterogeneously upregulated and downregulated genes within the context of pathways, we generated distinct tissue-specific gene regulatory networks. We analyzed gene expression data derived from freshly isolated heart, brain, and lung endothelial cells and populations of neurons in the hippocampus, cingulate cortex, and amygdala. In both datasets, we found that HeteroPath segregated the distinct cellular populations by identifying regulatory pathways that were not identified by GSEA or PGSEA. Using simulated datasets, HeteroPath demonstrated robustness that was comparable to what was seen using existing gene set enrichment methods. Furthermore, we generated tissue-specific gene regulatory networks involved in vascular heterogeneity and neuronal heterogeneity by performing motif enrichment of the heterogeneous genes identified by HeteroPath and linking the enriched motifs to regulatory transcription factors in the ENCODE database.
Conclusions: HeteroPath assesses contextual bidirectional gene expression within pathways and thus allows for transcriptomic assessment of cellular heterogeneity. Unraveling tissue-specific heterogeneity of gene expression can lead to a better understanding of the molecular underpinnings of tissue-specific phenotypes.
Electronic supplementary material The online version of this article (10.1186/s12859-018-2190-6) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Ankit Jambusaria
  - Department of Pharmacology, The University of Illinois College of Medicine, 835 S. Wolcott Ave. Rm. E403, Chicago, IL, 60612, USA
  - Department of Bioengineering, The University of Illinois at Chicago, Chicago, IL, USA
- Jeff Klomp
  - Department of Pharmacology, The University of Illinois College of Medicine, 835 S. Wolcott Ave. Rm. E403, Chicago, IL, 60612, USA
- Zhigang Hong
  - Department of Pharmacology, The University of Illinois College of Medicine, 835 S. Wolcott Ave. Rm. E403, Chicago, IL, 60612, USA
- Shahin Rafii
  - Division of Regenerative Medicine, Department of Medicine, Ansary Stem Cell Institute, Weill Cornell Medicine, New York, NY, USA
- Yang Dai
  - Department of Bioengineering, The University of Illinois at Chicago, Chicago, IL, USA
- Asrar B Malik
  - Department of Pharmacology, The University of Illinois College of Medicine, 835 S. Wolcott Ave. Rm. E403, Chicago, IL, 60612, USA
- Jalees Rehman
  - Department of Pharmacology, The University of Illinois College of Medicine, 835 S. Wolcott Ave. Rm. E403, Chicago, IL, 60612, USA
  - Department of Bioengineering, The University of Illinois at Chicago, Chicago, IL, USA
  - Division of Cardiology, Department of Medicine, The University of Illinois College of Medicine, Chicago, IL, USA
17
Zhang Y, Topham DJ, Thakar J, Qiu X. FUNNEL-GSEA: FUNctioNal ELastic-net regression in time-course gene set enrichment analysis. Bioinformatics 2018; 33:1944-1952. [PMID: 28334094 PMCID: PMC5939227 DOI: 10.1093/bioinformatics/btx104] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 09/26/2016] [Accepted: 02/17/2017] [Indexed: 01/26/2023] Open
Abstract
Motivation: Gene set enrichment analyses (GSEAs) are widely used in genomic research to identify underlying biological mechanisms (defined by the gene sets), such as Gene Ontology terms and molecular pathways. There are two caveats in the currently available methods: (i) they are typically designed for group comparisons or regression analyses, which do not utilize temporal information efficiently in time-series of transcriptomics measurements; and (ii) genes overlapping in multiple molecular pathways are considered multiple times in hypothesis testing.
Results: We propose an inferential framework for GSEA based on functional data analysis, which utilizes the temporal information based on functional principal component analysis, and disentangles the effects of overlapping genes by a functional extension of the elastic-net regression. Furthermore, the hypothesis testing for the gene sets is performed by an extension of the Mann-Whitney U test, which is based on weighted rank sums computed from correlated observations. By using both simulated datasets and large-scale time-course gene expression data on human influenza infection, we demonstrate that our method has uniformly better receiver operating characteristic curves, and identifies more pathways relevant to the immune response to human influenza infection than the competing approaches.
Availability and Implementation: The methods are implemented in the R package FUNNEL, freely and publicly available at: https://github.com/yunzhang813/FUNNEL-GSEA-R-Package.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yun Zhang
  - Department of Biostatistics and Computational Biology
- David J Topham
  - Department of Microbiology and Immunology, University of Rochester, Rochester, NY 14642, USA
- Juilee Thakar
  - Department of Biostatistics and Computational Biology
  - Department of Microbiology and Immunology, University of Rochester, Rochester, NY 14642, USA
- Xing Qiu
  - Department of Biostatistics and Computational Biology
18
POST: A framework for set-based association analysis in high-dimensional data. Methods 2018; 145:76-81. [PMID: 29777750 DOI: 10.1016/j.ymeth.2018.05.011] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 01/09/2018] [Revised: 05/11/2018] [Accepted: 05/13/2018] [Indexed: 01/08/2023] Open
Abstract
Evaluating the differential expression of a set of genes belonging to a common biological process or ontology has proven to be a very useful tool for biological discovery. However, existing gene-set association methods are limited to applications that evaluate differential expression across k⩾2 treatment groups or biological categories. This limitation precludes researchers from most effectively evaluating the association with other phenotypes that may be more clinically meaningful, such as quantitative variables or censored survival time variables. Projection onto the Orthogonal Space Testing (POST) is proposed as a general procedure that can robustly evaluate the association of a gene-set with several different types of phenotypic data (categorical, ordinal, continuous, or censored). For each gene-set, POST transforms the gene profiles into a set of eigenvectors and then uses statistical modeling to compute a set of z-statistics that measure the association of each eigenvector with the phenotype. The overall gene-set statistic is the sum of squared z-statistics weighted by the corresponding eigenvalues. Finally, bootstrapping is used to compute a p-value. POST may evaluate associations with or without adjustment for covariates. In simulation studies, it is shown that the performance of POST in evaluating the association with a categorical phenotype is similar to or exceeds that of existing methods. In evaluating the association of 875 biological processes with the time to relapse of pediatric acute myeloid leukemia, POST identified the well-known oncogenic WNT signaling pathway as its top hit. These results indicate that POST can be a very useful tool for evaluating the association of a gene-set with a variety of different phenotypes. We have developed an R package named POST which is freely available in Bioconductor.
19
Alaimo S, Giugno R, Acunzo M, Veneziano D, Ferro A, Pulvirenti A. Post-transcriptional knowledge in pathway analysis increases the accuracy of phenotypes classification. Oncotarget 2018; 7:54572-54582. [PMID: 27275538 PMCID: PMC5342365 DOI: 10.18632/oncotarget.9788] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 10/21/2015] [Accepted: 05/11/2016] [Indexed: 01/27/2023] Open
Abstract
Motivation: Prediction of phenotypes from high-dimensional data is a crucial task in precision biology and medicine. Many technologies employ genomic biomarkers to characterize phenotypes. However, such elements are not sufficient to explain the underlying biology. To improve this, pathway analysis techniques have been proposed. Nevertheless, such methods have shown a lack of accuracy in phenotype classification.
Results: Here we propose a novel methodology called MITHrIL (Mirna enrIched paTHway Impact anaLysis) for the analysis of signaling pathways, which extends the work of Tarca et al., 2009. MITHrIL augments pathways with missing regulatory elements, such as microRNAs, and their interactions with genes. The method takes as input the expression values of genes and/or microRNAs and returns a list of pathways sorted according to their degree of deregulation, together with the corresponding statistical significance (p-values). Our analysis shows that MITHrIL outperforms its competitors even in the worst case. In addition, our method is able to correctly classify sets of tumor samples drawn from TCGA.
Availability: MITHrIL is freely available at the following URL: http://alpha.dmi.unict.it/mithril/
Affiliation(s)
- Salvatore Alaimo
  - Department of Clinical and Experimental Medicine, University of Catania, Catania, Italy
- Rosalba Giugno
  - Department of Computer Science, University of Verona, Verona, Italy
- Mario Acunzo
  - Department of Molecular Virology, Immunology and Medical Genetics, Comprehensive Cancer Center, The Ohio State University, Columbus, OH, USA
- Dario Veneziano
  - Department of Molecular Virology, Immunology and Medical Genetics, Comprehensive Cancer Center, The Ohio State University, Columbus, OH, USA
- Alfredo Ferro
  - Department of Clinical and Experimental Medicine, University of Catania, Catania, Italy
- Alfredo Pulvirenti
  - Department of Clinical and Experimental Medicine, University of Catania, Catania, Italy
20
Statistical Approach for Gene Set Analysis with Trait Specific Quantitative Trait Loci. Sci Rep 2018; 8:2391. [PMID: 29402907 PMCID: PMC5799309 DOI: 10.1038/s41598-018-19736-w] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 05/03/2017] [Accepted: 12/06/2017] [Indexed: 11/20/2022] Open
Abstract
The analysis of gene sets is usually carried out based on gene ontology terms and known biological pathways. These approaches may not establish any formal relation between genotype and trait-specific phenotype. In plant biology and breeding, analysis of gene sets with trait-specific Quantitative Trait Loci (QTL) data is considered a great source for biological knowledge discovery. Therefore, we proposed an innovative statistical approach called Gene Set Analysis with QTLs (GSAQ) for interpreting gene expression data in the context of gene sets with traits. The utility of GSAQ was studied on five different complex abiotic and biotic stress scenarios in rice, which yielded specific trait/stress enriched gene sets. Further, the GSAQ approach was more innovative and effective in performing gene set analysis with underlying QTLs and identifying QTL candidate genes than the existing approach. The GSAQ approach also provided two potentially biologically relevant criteria for performance analysis of gene selection methods. Based on this proposed approach, an R package, GSAQ (https://cran.r-project.org/web/packages/GSAQ), has been developed. The GSAQ approach provides a valuable platform for integrating gene expression data with genetically rich QTL data.
21
Wei W, Sun Z, da Silveira WA, Yu Z, Lawson A, Hardiman G, Kelemen LE, Chung D. Semi-supervised identification of cancer subgroups using survival outcomes and overlapping grouping information. Stat Methods Med Res 2018; 28:2137-2149. [PMID: 29336210 DOI: 10.1177/0962280217752980] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 01/01/2023]
Abstract
Identification of cancer patient subgroups using high-throughput genomic data is of critical importance to clinicians and scientists because it can offer opportunities for more personalized treatment and overlapping treatments of cancers. In spite of tremendous efforts, this problem remains challenging because of low reproducibility and instability of identified cancer subgroups and molecular features. To address this challenge, we developed Integrative Genomics Robust iDentification of cancer subgroups (InGRiD), a statistical approach that integrates information from biological pathway databases with high-throughput genomic data to improve the robustness of identification and interpretation of molecularly defined subgroups of cancer patients. We applied InGRiD to the gene expression data of high-grade serous ovarian cancer from The Cancer Genome Atlas and the Australian Ovarian Cancer Study. The results indicate clear benefits of the pathway-level approaches over the gene-level approaches. In addition, using the proposed InGRiD framework, we also investigate and address the issue of gene sharing among pathways, which often occurs in practice, to further facilitate biological interpretation of key molecular features associated with cancer progression. The R package "InGRiD" implementing the proposed approach is currently available on our research group's GitHub webpage (https://dongjunchung.github.io/INGRID/).
Affiliation(s)
- Wei Wei
  - Department of Public Health Sciences, Medical University of South Carolina, Charleston, USA
  - Department of Biostatistics, Yale University, New Haven, USA
- Zequn Sun
  - Department of Public Health Sciences, Medical University of South Carolina, Charleston, USA
- Willian A da Silveira
  - Department of Pathology and Laboratory Medicine, Medical University of South Carolina, Charleston, USA
  - Center for Genomic Medicine, Medical University of South Carolina, Charleston, USA
- Zhenning Yu
  - Department of Public Health Sciences, Medical University of South Carolina, Charleston, USA
- Andrew Lawson
  - Department of Public Health Sciences, Medical University of South Carolina, Charleston, USA
- Gary Hardiman
  - Department of Public Health Sciences, Medical University of South Carolina, Charleston, USA
  - Center for Genomic Medicine, Medical University of South Carolina, Charleston, USA
  - Department of Medicine, Medical University of South Carolina, Charleston, USA
- Linda E Kelemen
  - Department of Public Health Sciences, Medical University of South Carolina, Charleston, USA
- Dongjun Chung
  - Department of Public Health Sciences, Medical University of South Carolina, Charleston, USA
22
Brightbill HD, Suto E, Blaquiere N, Ramamoorthi N, Sujatha-Bhaskar S, Gogol EB, Castanedo GM, Jackson BT, Kwon YC, Haller S, Lesch J, Bents K, Everett C, Kohli PB, Linge S, Christian L, Barrett K, Jaochico A, Berezhkovskiy LM, Fan PW, Modrusan Z, Veliz K, Townsend MJ, DeVoss J, Johnson AR, Godemann R, Lee WP, Austin CD, McKenzie BS, Hackney JA, Crawford JJ, Staben ST, Alaoui Ismaili MH, Wu LC, Ghilardi N. NF-κB inducing kinase is a therapeutic target for systemic lupus erythematosus. Nat Commun 2018; 9:179. [PMID: 29330524 PMCID: PMC5766581 DOI: 10.1038/s41467-017-02672-0] [Citation(s) in RCA: 84] [Impact Index Per Article: 14.0] [Received: 09/11/2017] [Accepted: 12/18/2017] [Indexed: 02/06/2023] Open
Abstract
NF-κB-inducing kinase (NIK) mediates non-canonical NF-κB signaling downstream of multiple TNF family members, including BAFF, TWEAK, CD40, and OX40, which are implicated in the pathogenesis of systemic lupus erythematosus (SLE). Here, we show that experimental lupus in NZB/W F1 mice can be treated with a highly selective and potent NIK small molecule inhibitor. Both in vitro as well as in vivo, NIK inhibition recapitulates the pharmacological effects of BAFF blockade, which is clinically efficacious in SLE. Furthermore, NIK inhibition also affects T cell parameters in the spleen and proinflammatory gene expression in the kidney, which may be attributable to inhibition of OX40 and TWEAK signaling, respectively. As a consequence, NIK inhibition results in improved survival, reduced renal pathology, and lower proteinuria scores. Collectively, our data suggest that NIK inhibition is a potential therapeutic approach for SLE.
Affiliation(s)
- Hans D Brightbill
  - Department of Immunology Discovery, Genentech, 1 DNA Way, South San Francisco, CA-94080, USA
- Eric Suto
  - Department of Translational Immunology, Genentech, 1 DNA Way, South San Francisco, CA-94080, USA
- Nicole Blaquiere
  - Department of Discovery Chemistry, Genentech, 1 DNA Way, South San Francisco, CA-94080, USA
- Nandhini Ramamoorthi
  - Department of Biomarker Discovery, Genentech, 1 DNA Way, South San Francisco, CA-94080, USA
- Swathi Sujatha-Bhaskar
  - Department of Immunology Discovery, Genentech, 1 DNA Way, South San Francisco, CA-94080, USA
- Emily B Gogol
  - Department of Immunology Discovery, Genentech, 1 DNA Way, South San Francisco, CA-94080, USA
- Georgette M Castanedo
  - Department of Discovery Chemistry, Genentech, 1 DNA Way, South San Francisco, CA-94080, USA
- Benjamin T Jackson
  - Department of Immunology Discovery, Genentech, 1 DNA Way, South San Francisco, CA-94080, USA
- Youngsu C Kwon
  - Department of Translational Immunology, Genentech, 1 DNA Way, South San Francisco, CA-94080, USA
- Susan Haller
  - Department of Pathology, Genentech, 1 DNA Way, South San Francisco, CA-94080, USA
- Justin Lesch
  - Department of Translational Immunology, Genentech, 1 DNA Way, South San Francisco, CA-94080, USA
- Karin Bents
  - Evotec, Inc., Essener Bogen 7, Hamburg, 22419, Germany
- Christine Everett
  - Department of Biochemical and Cellular Pharmacology, Genentech, 1 DNA Way, South San Francisco, CA-94080, USA
- Pawan Bir Kohli
  - Department of Biochemical and Cellular Pharmacology, Genentech, 1 DNA Way, South San Francisco, CA-94080, USA
- Sandra Linge
  - Evotec, Inc., Essener Bogen 7, Hamburg, 22419, Germany
- Laura Christian
  - Department of Immunology Discovery, Genentech, 1 DNA Way, South San Francisco, CA-94080, USA
- Kathy Barrett
  - Department of Biochemical and Cellular Pharmacology, Genentech, 1 DNA Way, South San Francisco, CA-94080, USA
- Allan Jaochico
  - Department of Drug Metabolism and Pharmacokinetics, Genentech, 1 DNA Way, South San Francisco, CA-94080, USA
- Leonid M Berezhkovskiy
  - Department of Drug Metabolism and Pharmacokinetics, Genentech, 1 DNA Way, South San Francisco, CA-94080, USA
- Peter W Fan
  - Department of Drug Metabolism and Pharmacokinetics, Genentech, 1 DNA Way, South San Francisco, CA-94080, USA
- Zora Modrusan
  - Department of Molecular Biology, Genentech, 1 DNA Way, South San Francisco, CA-94080, USA
- Kelli Veliz
  - Department of Laboratory Animal Resources, Genentech, 1 DNA Way, South San Francisco, CA-94080, USA
- Michael J Townsend
  - Department of Biomarker Discovery, Genentech, 1 DNA Way, South San Francisco, CA-94080, USA
- Jason DeVoss
  - Department of Translational Immunology, Genentech, 1 DNA Way, South San Francisco, CA-94080, USA
- Adam R Johnson
  - Department of Biochemical and Cellular Pharmacology, Genentech, 1 DNA Way, South San Francisco, CA-94080, USA
- Wyne P Lee
  - Department of Translational Immunology, Genentech, 1 DNA Way, South San Francisco, CA-94080, USA
- Cary D Austin
  - Department of Pathology, Genentech, 1 DNA Way, South San Francisco, CA-94080, USA
- Brent S McKenzie
  - Department of Translational Immunology, Genentech, 1 DNA Way, South San Francisco, CA-94080, USA
- Jason A Hackney
  - Department of Bioinformatics and Computational Biology, Genentech, 1 DNA Way, South San Francisco, CA-94080, USA
- James J Crawford
  - Department of Discovery Chemistry, Genentech, 1 DNA Way, South San Francisco, CA-94080, USA
- Steven T Staben
  - Department of Discovery Chemistry, Genentech, 1 DNA Way, South San Francisco, CA-94080, USA
- Moulay H Alaoui Ismaili
  - Department of Biochemical and Cellular Pharmacology, Genentech, 1 DNA Way, South San Francisco, CA-94080, USA
- Lawren C Wu
  - Department of Immunology Discovery, Genentech, 1 DNA Way, South San Francisco, CA-94080, USA
- Nico Ghilardi
  - Department of Immunology Discovery, Genentech, 1 DNA Way, South San Francisco, CA-94080, USA
23
Deconvolution of Transcriptional Networks in Post-Traumatic Stress Disorder Uncovers Master Regulators Driving Innate Immune System Function. Sci Rep 2017; 7:14486. [PMID: 29101382 PMCID: PMC5670244 DOI: 10.1038/s41598-017-15221-y] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 07/13/2017] [Accepted: 10/23/2017] [Indexed: 01/05/2023] Open
Abstract
Post-Traumatic Stress Disorder (PTSD) is a psychiatric disorder that develops in individuals experiencing a shocking incident, but the underlying disease susceptibility gene networks remain poorly understood. Breen et al. conducted a Weighted Gene Co-expression Network Analysis on PTSD, and identified a dysregulated innate immune module associated with PTSD development. To further identify the Master Regulators (MRs) driving the network function, here we deconvoluted the transcriptional networks on the same datasets using ARACNe (Algorithm for Reconstruction of Accurate Cellular Networks) followed by protein activity analysis. We successfully identified several MRs including SOX3, TNFAIP3, TRAFD1, POU3F3, STAT2, and PML that govern the expression of a large collection of genes. Transcription factor binding site enrichment analysis verified the binding of these MRs to their predicted targets. Notably, the sub-networks regulated by TNFAIP3, TRAFD1 and PML are involved in innate immune response, suggesting that these MRs may correlate with the innate immune module identified by Breen et al. These findings were replicated in an independent dataset generated on expression microarrays. In conclusion, our analysis corroborated previous findings that innate immunity may be involved in the progression of PTSD, yet also identified candidate MRs driving the disease progression in the innate immunity pathways.
24
Lavallée-Adam M, Cloutier P, Coulombe B, Blanchette M. Functional 5' UTR motif discovery with LESMoN: Local Enrichment of Sequence Motifs in biological Networks. Nucleic Acids Res 2017; 45:10415-10427. [PMID: 28977652 PMCID: PMC5737372 DOI: 10.1093/nar/gkx751] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 04/22/2016] [Accepted: 08/17/2017] [Indexed: 01/09/2023] Open
Abstract
Biological networks are rich representations of the relationships between entities such as genes or proteins and have become increasingly complete thanks to various high-throughput network mapping experimental approaches. Here, we propose a method to use such networks to guide the search for functional sequence motifs. Specifically, we introduce Local Enrichment of Sequence Motifs in biological Networks (LESMoN), an enumerative motif discovery algorithm that identifies 5' untranslated region (UTR) sequence motifs whose associated proteins form unexpectedly dense clusters in a given biological network. When applied to the human protein-protein interaction network from BioGRID, LESMoN identifies several highly significant 5' UTR sequence motifs, including both previously known motifs and uncharacterized ones. The vast majority of these motifs are evolutionarily conserved, and the genes containing them are significantly enriched for various gene ontology terms, suggesting new associations between 5' UTR motifs and a number of biological processes. We validate in vivo the role of three motifs identified by LESMoN in the regulation of protein expression.
Affiliation(s)
- Mathieu Lavallée-Adam
  - McGill Centre for Bioinformatics and School of Computer Science, McGill University, Montréal, Québec H3A 0E9, Canada
  - Ottawa Institute of Systems Biology and Department of Biochemistry, Microbiology and Immunology, Faculty of Medicine, University of Ottawa, Ottawa, Ontario K1H 8M5, Canada
- Philippe Cloutier
  - Translational Proteomics Laboratory, Institut de recherches cliniques de Montréal, Montréal, Québec H2W 1R7, Canada
- Benoit Coulombe
  - Translational Proteomics Laboratory, Institut de recherches cliniques de Montréal, Montréal, Québec H2W 1R7, Canada
  - Département de biochimie et médecine moléculaire, Université de Montréal, Montréal, Québec H3C 3J7, Canada
- Mathieu Blanchette
  - McGill Centre for Bioinformatics and School of Computer Science, McGill University, Montréal, Québec H3A 0E9, Canada
25
Abstract
Approaches to identify significant pathways from high-throughput quantitative data have been developed in recent years. Still, the analysis of proteomic data remains difficult because of limited sample size. This limitation also leads to the common practice of using a competitive null, which fundamentally treats genes or proteins as independent units. The independence assumption ignores the associations among biomolecules with similar functions or cellular localization, as well as the interactions among them manifested as changes in expression ratios. Consequently, these methods often underestimate the associations among biomolecules and cause false positives in practice. Some studies incorporate the sample covariance matrix into the calculation to address this issue. However, the sample covariance may not be a precise estimate if the sample size is very limited, which is usually the case for data produced by mass spectrometry. In this study, we introduce a multivariate test under a self-contained null to perform pathway analysis for quantitative proteomic data. The covariance matrix used in the test statistic is constructed from the confidence scores retrieved from the STRING database or the HitPredict database. We also design an integrating procedure to retain pathways with sufficient evidence as a pathway group. The performance of the proposed T2-statistic is demonstrated using five published experimental datasets: the T-cell activation, the cAMP/PKA signaling, the myoblast differentiation, and the effect of dasatinib on the BCR-ABL pathway are proteomic datasets produced by mass spectrometry; and the protective effect of myocilin via the MAPK signaling pathway is a gene expression dataset of limited sample size. Compared with other popular statistics, the proposed T2-statistic yields more accurate descriptions in agreement with the discussion of the original publication.
We implemented the T2-statistic into an R package T2GA, which is available at https://github.com/roqe/T2GA. Pathway analysis is a common approach to quickly access the pathways being regulated in the experiments. There are numerous statistics to perform pathway analysis; most of them assume that the genes or proteins are independent of each other for statistical ease. This assumption, however, is unrealistic to the real biological system and may cause false positives in practice. A standard way to address this issue is to measure the associations among genes or proteins. Unfortunately, the estimation of associations requires sufficient sample size, which is usually not available for proteomic data produced by mass spectrometry. In this study, we propose a T2-statistic, which estimates the associations among gene products, to perform pathway analysis for quantitative proteomic data. Instead of calculating the associations directly from data, we use the confidence scores retrieved from protein-protein interaction databases. We also design an integrating procedure to reserve pathways of sufficient evidence as a regulated pathway group. We compare the proposed T2-statistic to other popular statistics using five published experimental datasets, and the T2-statistic yields more accurate descriptions in agreement with the discussion of the original papers.
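The idea of borrowing a covariance structure from interaction confidence scores, rather than estimating it from a handful of samples, can be sketched as follows. This is a minimal illustration and not the T2GA implementation: it assumes unit variances, uses each confidence score directly as an off-diagonal correlation, and takes the statistic to be the quadratic form z'Σ⁻¹z. The function names and the toy scores are hypothetical, and a matrix built this way is not guaranteed to be positive definite for real data.

```python
def covariance_from_scores(proteins, scores):
    """Build a correlation-like matrix from pairwise confidence scores.

    Assumption for illustration: diagonal = 1, off-diagonal = score in [0, 1],
    and 0 for pairs without a recorded interaction.
    """
    return [
        [1.0 if a == b else float(scores.get(frozenset((a, b)), 0.0))
         for b in proteins]
        for a in proteins
    ]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def t2_statistic(z, cov):
    """Quadratic form z' * inv(cov) * z for standardized expression changes z."""
    return sum(zi * yi for zi, yi in zip(z, solve(cov, z)))


# Two proteins that change in the same direction.
cov = covariance_from_scores(["A", "B"], {frozenset(("A", "B")): 0.5})
print(t2_statistic([1.0, 1.0], cov))                       # correlated pair
print(t2_statistic([1.0, 1.0], [[1.0, 0.0], [0.0, 1.0]]))  # independence
```

In this toy case the statistic is smaller when the two proteins are treated as associated than when they are treated as independent, which is the mechanism by which accounting for associations can reduce false positives.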
26
Alhamdoosh M, Ng M, Wilson NJ, Sheridan JM, Huynh H, Wilson MJ, Ritchie ME. Combining multiple tools outperforms individual methods in gene set enrichment analyses. Bioinformatics 2017; 33:414-424. [PMID: 27694195 PMCID: PMC5408797 DOI: 10.1093/bioinformatics/btw623] [Citation(s) in RCA: 88] [Impact Index Per Article: 12.6] [Received: 03/10/2016] [Accepted: 09/23/2016] [Indexed: 12/22/2022] Open
Abstract
Motivation: Gene set enrichment (GSE) analysis allows researchers to efficiently extract biological insight from long lists of differentially expressed genes by interrogating them at a systems level. In recent years, there has been a proliferation of GSE analysis methods and hence it has become increasingly difficult for researchers to select an optimal GSE tool based on their particular dataset. Moreover, the majority of GSE analysis methods do not allow researchers to simultaneously compare gene set level results between multiple experimental conditions. Results: The ensemble of genes set enrichment analyses (EGSEA) is a method developed for RNA-sequencing data that combines results from twelve algorithms and calculates collective gene set scores to improve the biological relevance of the highest ranked gene sets. EGSEA’s gene set database contains around 25 000 gene sets from sixteen collections. It has multiple visualization capabilities that allow researchers to view gene sets at various levels of granularity. EGSEA has been tested on simulated data and on a number of human and mouse datasets and, based on biologists’ feedback, consistently outperforms the individual tools that have been combined. Our evaluation demonstrates the superiority of the ensemble approach for GSE analysis, and its utility to effectively and efficiently extrapolate biological functions and potential involvement in disease processes from lists of differentially regulated genes. Availability and Implementation: EGSEA is available as an R package at http://www.bioconductor.org/packages/EGSEA/. The gene sets collections are available in the R package EGSEAdata from http://www.bioconductor.org/packages/EGSEAdata/. Supplementary information: Supplementary data are available at Bioinformatics online.
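The ensemble principle described in this abstract, combining per-method results into one collective ordering, can be illustrated with a small sketch. Mean rank is used here purely as an assumed, simple combination rule; it is not claimed to be EGSEA's scoring scheme, and the method names, gene set names, and p-values are made up.

```python
from statistics import mean


def rank_gene_sets(p_values):
    """Map each gene set to its rank (1 = smallest p-value) for one method."""
    ordered = sorted(p_values, key=p_values.get)
    return {gene_set: i + 1 for i, gene_set in enumerate(ordered)}


def ensemble_ranking(results_by_method):
    """Order gene sets by mean rank across methods.

    Assumes every method reports the same gene sets.
    """
    ranks = [rank_gene_sets(p) for p in results_by_method.values()]
    gene_sets = list(ranks[0])
    avg = {gs: mean(r[gs] for r in ranks) for gs in gene_sets}
    # Ties in mean rank are broken alphabetically to keep the output stable.
    return sorted(gene_sets, key=lambda gs: (avg[gs], gs))


results = {
    "method_a": {"apoptosis": 0.001, "cell_cycle": 0.02, "glycolysis": 0.30},
    "method_b": {"apoptosis": 0.04, "cell_cycle": 0.01, "glycolysis": 0.50},
    "method_c": {"apoptosis": 0.002, "cell_cycle": 0.03, "glycolysis": 0.20},
}
print(ensemble_ranking(results))
```

Here one method disagrees with the other two about the top gene set, and the averaged ranking follows the majority.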
Affiliation(s)
- Milica Ng
- CSL Limited, Bio21 Institute, Parkville, Australia
- Julie M Sheridan
- ACRF Stem Cells and Cancer Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, Australia; Department of Medical Biology, The University of Melbourne, Parkville, Australia
- Huy Huynh
- CSL Limited, Bio21 Institute, Parkville, Australia
- Matthew E Ritchie
- Molecular Medicine Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, Australia; School of Mathematics and Statistics, The University of Melbourne, Parkville, Australia
27
Hayashi N, Iwamoto T, Qi Y, Niikura N, Santarpia L, Yamauchi H, Nakamura S, Hortobagyi GN, Pusztai L, Symmans WF, Ueno NT. Bone metastasis-related signaling pathways in breast cancers stratified by estrogen receptor status. J Cancer 2017; 8:1045-1052. [PMID: 28529618 PMCID: PMC5436258 DOI: 10.7150/jca.13690] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 01/21/2023] Open
Abstract
Background: Breast cancer bone metastasis (BCBM)-specific genes have been reported without considering biological differences based on estrogen receptor (ER) status. The aims of this study were to identify BCBM-specific genes using our patient dataset and validate previously reported BCBM-specific genes, and to determine whether ER-status-related biological differences matter in identification of BCBM-specific genes. Methods: We used Affymetrix GeneChips to analyze 365 primary human epidermal growth factor receptor 2 (HER2)-negative invasive breast cancer specimens. Genes that were differentially expressed between patients who developed bone metastasis and those who developed non-bone metastasis were identified using Cox proportional hazards model, and differential expression of gene sets was assessed using gene set analysis. We performed gene set analysis to determine whether biological function associated with bone metastasis were different by ER status using 2,246 functionally annotated gene sets assembled from Gene Ontology data base. Results: Among 16,712 probe sets, 592 were overexpressed in the bone metastasis cohort compared to the non-bone-metastasis cohort (false discovery rate ≤ 0.05). However, no BCBM-specific genes met our significance tests when the cancers were stratified by ER status. In ER-positive and ER-negative breast cancers, 151 and 125 gene sets, respectively, were overexpressed for BCBM and the majority of BCBM-related pathways were different. Of significant gene sets, only 13 gene sets were overlapped between ER-positive and -negative cohorts. Conclusion: ER-positive and ER-negative breast cancers have different biological pathways in BCBM development. We have yet to explore BCBM-related biomarkers and targets considering the biological features associated with BCBM depending on the ER status.
Affiliation(s)
- Naoki Hayashi
- Department of Breast Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Takayuki Iwamoto
- Department of Breast Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Yuan Qi
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Naoki Niikura
- Department of Breast Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Libero Santarpia
- Department of Breast Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA; Department of Pathology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Hideko Yamauchi
- Department of Breast Surgical Oncology, St. Luke's International Hospital, Tokyo, Japan
- Seigo Nakamura
- Department of Surgery, Division of Breast Surgical Oncology, Showa University School of Medicine, Tokyo, Japan
- Gabriel N Hortobagyi
- Department of Breast Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Lajos Pusztai
- Department of Breast Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- W Fraser Symmans
- Department of Pathology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Naoto T Ueno
- Department of Breast Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
28
Katewa A, Wang Y, Hackney JA, Huang T, Suto E, Ramamoorthi N, Austin CD, Bremer M, Chen JZ, Crawford JJ, Currie KS, Blomgren P, DeVoss J, DiPaolo JA, Hau J, Johnson A, Lesch J, DeForge LE, Lin Z, Liimatta M, Lubach JW, McVay S, Modrusan Z, Nguyen A, Poon C, Wang J, Liu L, Lee WP, Wong H, Young WB, Townsend MJ, Reif K. Btk-specific inhibition blocks pathogenic plasma cell signatures and myeloid cell-associated damage in IFNα-driven lupus nephritis. JCI Insight 2017; 2:e90111. [PMID: 28405610 DOI: 10.1172/jci.insight.90111] [Citation(s) in RCA: 47] [Impact Index Per Article: 6.7] [Indexed: 01/17/2023] Open
Abstract
Systemic lupus erythematosus (SLE) is often associated with exaggerated B cell activation promoting plasma cell generation, immune-complex deposition in the kidney, renal infiltration of myeloid cells, and glomerular nephritis. Type-I IFNs amplify these autoimmune processes and promote severe disease. Bruton's tyrosine kinase (Btk) inhibitors are considered novel therapies for SLE. We describe the characterization of a highly selective reversible Btk inhibitor, G-744. G-744 is efficacious, and superior to blocking BAFF and Syk, in ameliorating severe lupus nephritis in both spontaneous and IFNα-accelerated lupus in NZB/W F1 mice in therapeutic regimens. Selective Btk inhibition ablated plasmablast generation, reduced autoantibodies, and - similar to cyclophosphamide - improved renal pathology in IFNα-accelerated lupus. Employing global transcriptional profiling of spleen and kidney coupled with cross-species human modular repertoire analyses, we identify similarities in the inflammatory process between mice and humans, and we demonstrate that G-744 reduced gene expression signatures essential for splenic B cell terminal differentiation, particularly the secretory pathway, as well as renal transcriptional profiles coupled with myeloid cell-mediated pathology and glomerular plus tubulointerstitial disease in human glomerulonephritis patients. These findings reveal the mechanism through which a selective Btk inhibitor blocks murine autoimmune kidney disease, highlighting pathway activity that may translate to human SLE.
Affiliation(s)
- James J Crawford
- Discovery Chemistry, at Genentech, South San Francisco, California, USA
- Lichuan Liu
- Clinical Pharmacology at Genentech, South San Francisco, California, USA
- Wendy B Young
- Discovery Chemistry, at Genentech, South San Francisco, California, USA
29
Simillion C, Liechti R, Lischer HEL, Ioannidis V, Bruggmann R. Avoiding the pitfalls of gene set enrichment analysis with SetRank. BMC Bioinformatics 2017; 18:151. [PMID: 28259142 PMCID: PMC5336655 DOI: 10.1186/s12859-017-1571-6] [Citation(s) in RCA: 68] [Impact Index Per Article: 9.7] [Received: 09/08/2016] [Accepted: 02/24/2017] [Indexed: 02/06/2023] Open
Abstract
Background: The purpose of gene set enrichment analysis (GSEA) is to find general trends in the huge lists of genes or proteins generated by many functional genomics techniques and bioinformatics analyses. Results: Here we present SetRank, an advanced GSEA algorithm which is able to eliminate many false positive hits. The key principle of the algorithm is that it discards gene sets that have initially been flagged as significant, if their significance is only due to the overlap with another gene set. The algorithm is explained in detail and its performance is compared to that of other methods using objective benchmarking criteria. Furthermore, we explore how sample source bias can affect the results of a GSEA analysis. Conclusions: The benchmarking results show that SetRank is a highly specific tool for GSEA. Furthermore, we show that the reliability of results can be improved by taking sample source bias into account. SetRank is available as an R package and through an online web interface.
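The key principle stated above, dropping a gene set whose significance comes only from its overlap with another set, can be sketched with a hypergeometric test. This is an illustrative simplification under stated assumptions, not SetRank's actual procedure: the set is retested after removing the other set's genes from the set, the hit list, and the universe, and the function names, threshold, and toy data are hypothetical.

```python
from math import comb


def enrichment_p(gene_set, hits, universe_size):
    """Upper-tail hypergeometric p-value for the overlap of gene_set with hits."""
    k = len(gene_set & hits)
    n_set, n_hits = len(gene_set), len(hits)
    total = comb(universe_size, n_hits)
    upper = min(n_set, n_hits)
    tail = sum(
        comb(n_set, i) * comb(universe_size - n_set, n_hits - i)
        for i in range(k, upper + 1)
    )
    return tail / total


def survives_overlap(set_a, set_b, hits, universe_size, alpha=0.05):
    """True if set_a is still enriched once set_b's genes are removed."""
    remainder = set_a - set_b
    if not remainder:
        return False
    p = enrichment_p(remainder, hits - set_b, universe_size - len(set_b))
    return p <= alpha


# Toy universe of 100 genes g0..g99, of which g0..g9 are significant hits.
hits = {f"g{i}" for i in range(10)}
set_b = {f"g{i}" for i in range(8)} | {"g50", "g51"}
set_a = {f"g{i}" for i in range(6)} | {"g60", "g61", "g62", "g63"}

print(enrichment_p(set_a, hits, 100) < 0.05)     # set_a looks enriched on its own
print(survives_overlap(set_a, set_b, hits, 100)) # but only through set_b's genes
print(survives_overlap(set_b, set_a, hits, 100)) # set_b keeps independent signal
```

In the toy data every hit in `set_a` also belongs to `set_b`, so `set_a` would be discarded while `set_b` is retained.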
Affiliation(s)
- Cedric Simillion
- Interfaculty Bioinformatics Unit and SIB Swiss Institute of Bioinformatics, University of Bern, Baltzerstrasse 6, 3012, Berne, Switzerland; Department of Clinical Research, University of Bern, Murtenstrasse 35, 3008, Berne, Switzerland
- Robin Liechti
- Vital-IT, SIB Swiss Institute of Bioinformatics, Quartier Sorge - Batiment Genopode, 1015, Lausanne, Switzerland
- Heidi E L Lischer
- Interfaculty Bioinformatics Unit and SIB Swiss Institute of Bioinformatics, University of Bern, Baltzerstrasse 6, 3012, Berne, Switzerland; Present Address: URPP Evolution in Action; Institute of Evolutionary Biology and Environmental Studies (IEU), University of Zurich, Winterthurerstrasse 190, 8057, Zurich, Switzerland
- Vassilios Ioannidis
- Vital-IT, SIB Swiss Institute of Bioinformatics, Quartier Sorge - Batiment Genopode, 1015, Lausanne, Switzerland; SIB Technology, SIB Swiss Institute of Bioinformatics, Quartier Sorge - Batiment Genopode, 1015, Lausanne, Switzerland
- Rémy Bruggmann
- Interfaculty Bioinformatics Unit and SIB Swiss Institute of Bioinformatics, University of Bern, Baltzerstrasse 6, 3012, Berne, Switzerland
30
Ren X, Hu Q, Liu S, Wang J, Miecznikowski JC. Gene set analysis controlling for length bias in RNA-seq experiments. BioData Min 2017; 10:5. [PMID: 28184252 PMCID: PMC5294840 DOI: 10.1186/s13040-017-0125-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 03/23/2016] [Accepted: 01/11/2017] [Indexed: 01/29/2023] Open
Abstract
Background: In gene set analysis, researchers are interested in determining the gene sets that are significantly correlated with an outcome, e.g. disease status or treatment. With the rapid development of high-throughput sequencing technologies, ribonucleic acid sequencing (RNA-seq) has become an important alternative to traditional expression arrays in gene expression studies. Challenges exist in adapting existing algorithms to RNA-seq data given the intrinsic differences between the technologies and their data. In RNA-seq experiments, the measure of gene expression is correlated with gene length. This inherent correlation may cause bias in gene set analysis. Results: We develop SeqGSA, a new method for gene set analysis with length bias adjustment for RNA-seq data. It extends the R package GSA designed for microarrays. Our method compares the gene set maxmean statistic against permutations, while also taking into account the statistics of the other gene sets. To adjust for the gene length bias, we implement a flexible weighted sampling scheme in the restandardization step of our algorithm. We show that our method improves the power of identifying significant gene sets that are affected by the length bias. We also show that our method maintains the type I error compared with another representative method for gene set enrichment testing. Conclusions: SeqGSA is a promising tool for testing significant gene pathways with RNA-seq data while adjusting for the inherent gene length effect. It enhances the power to detect gene sets affected by the bias and maintains type I error under various situations.
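The maxmean statistic mentioned in this abstract comes from the GSA method that SeqGSA extends. A minimal sketch of the unsigned form is given below: the positive and negative parts of the gene-level scores are each averaged over all genes in the set, and the larger average is returned. The permutation comparison and the length-weighted restandardization described in the abstract are not shown, and the example scores are made up.

```python
def maxmean(scores):
    """Unsigned maxmean statistic of gene-level scores for one gene set."""
    n = len(scores)
    if n == 0:
        raise ValueError("gene set has no scores")
    # Both averages divide by the full set size, not by the count of
    # positive or negative genes.
    positive_part = sum(max(s, 0.0) for s in scores) / n
    negative_part = sum(max(-s, 0.0) for s in scores) / n
    return max(positive_part, negative_part)


print(maxmean([2.0, 1.0, -0.5, 0.0]))
print(maxmean([-3.0, 1.0]))
```

Dividing by the full set size means a set with a few strong scores in one direction is not diluted by averaging them against scores of the opposite sign.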
Affiliation(s)
- Xing Ren
- Department of Biostatistics, SUNY University at Buffalo, Buffalo, 14214 USA
- Qiang Hu
- Department of Biostatistics and Bioinformatics, Roswell Park Cancer Institute, Buffalo, 14263 USA
- Song Liu
- Department of Biostatistics and Bioinformatics, Roswell Park Cancer Institute, Buffalo, 14263 USA
- Jianmin Wang
- Department of Biostatistics and Bioinformatics, Roswell Park Cancer Institute, Buffalo, 14263 USA
31
PerSubs: A Graph-Based Algorithm for the Identification of Perturbed Subpathways Caused by Complex Diseases. Advances in Experimental Medicine and Biology 2017; 988:215-224. [DOI: 10.1007/978-3-319-56246-9_17] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/11/2023]
32
Haddick PCG, Larson JL, Rathore N, Bhangale TR, Phung QT, Srinivasan K, Hansen DV, Lill JR, Pericak-Vance MA, Haines J, Farrer LA, Kauwe JS, Schellenberg GD, Cruchaga C, Goate AM, Behrens TW, Watts RJ, Graham RR, Kaminker JS, van der Brug M. A Common Variant of IL-6R is Associated with Elevated IL-6 Pathway Activity in Alzheimer's Disease Brains. J Alzheimers Dis 2017; 56:1037-1054. [PMID: 28106546 PMCID: PMC5667357 DOI: 10.3233/jad-160524] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Indexed: 02/07/2023]
Abstract
The common p.D358A variant (rs2228145) in IL-6R is associated with risk for multiple diseases and with increased levels of soluble IL-6R in the periphery and central nervous system (CNS). Here, we show that the p.D358A allele leads to increased proteolysis of membrane bound IL-6R and demonstrate that IL-6R peptides with A358 are more susceptible to cleavage by ADAM10 and ADAM17. IL-6 responsive genes were identified in primary astrocytes and microglia and an IL-6 gene signature was increased in the CNS of late onset Alzheimer's disease subjects in an IL6R allele dependent manner. We conducted a screen to identify variants associated with the age of onset of Alzheimer's disease in APOE ɛ4 carriers. Across five datasets, p.D358A had a meta P = 3 × 10⁻⁴ and an odds ratio = 1.3, 95% confidence interval 1.12–1.48. Our study suggests that a common coding region variant of the IL-6 receptor results in neuroinflammatory changes that may influence the age of onset of Alzheimer's disease in APOE ɛ4 carriers.
Affiliation(s)
- Patrick C G Haddick
- Department of Diagnostic Discovery, Genentech Inc., South San Francisco, CA, USA
- Jessica L Larson
- Department of Bioinformatics and Computational Biology, Genentech Inc., South San Francisco, CA, USA
- Nisha Rathore
- Department of Human Genetics, Genentech Inc., South San Francisco, CA, USA
- Tushar R Bhangale
- Department of Human Genetics, Genentech Inc., South San Francisco, CA, USA
- Qui T Phung
- Department of Protein Chemistry, Genentech Inc., South San Francisco, CA, USA
- David V Hansen
- Department of Neuroscience, Genentech Inc., South San Francisco, CA, USA
- Jennie R Lill
- Department of Protein Chemistry, Genentech Inc., South San Francisco, CA, USA
- Margaret A Pericak-Vance
- The John P. Hussman Institute for Human Genomics, University of Miami, Miami, FL, USA
- Dr. John T. Macdonald Foundation Department of Human Genetics, University of Miami, Miami, FL, USA
- Jonathan Haines
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, USA
- Lindsay A Farrer
- Department of Medicine (Biomedical Genetics), Boston University Schools of Medicine and Public Health, Boston, MA, USA
- Department of Neurology, Boston University Schools of Medicine and Public Health, Boston, MA, USA
- Department of Ophthalmology, Boston University Schools of Medicine and Public Health, Boston, MA, USA
- Department of Epidemiology, Boston University Schools of Medicine and Public Health, Boston, MA, USA
- Department of Biostatistics, Boston University Schools of Medicine and Public Health, Boston, MA, USA
- John S Kauwe
- Department of Biology, Brigham Young University, Provo, UT, USA
- Gerard D Schellenberg
- Department of Pathology and Laboratory Medicine, University of Pennsylvania School of Medicine, Philadelphia, PA, USA
- Carlos Cruchaga
- Department of Psychiatry, Washington University School of Medicine, St. Louis, MO, USA
- Hope Center for Neurological Disorders, Washington University School of Medicine, St. Louis, MO, USA
- Alison M Goate
- Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York City, NY, USA
- Ronald M. Loeb Center for Alzheimer's Disease, Icahn School of Medicine at Mount Sinai, New York City, NY, USA
- Timothy W Behrens
- Department of Human Genetics, Genentech Inc., South San Francisco, CA, USA
- Ryan J Watts
- Department of Neuroscience, Genentech Inc., South San Francisco, CA, USA
- Robert R Graham
- Department of Human Genetics, Genentech Inc., South San Francisco, CA, USA
- Joshua S Kaminker
- Department of Bioinformatics and Computational Biology, Genentech Inc., South San Francisco, CA, USA
- Marcel van der Brug
- Department of Diagnostic Discovery, Genentech Inc., South San Francisco, CA, USA
33
Lee J, Jo K, Lee S, Kang J, Kim S. Prioritizing biological pathways by recognizing context in time-series gene expression data. BMC Bioinformatics 2016; 17:477. [PMID: 28155707 PMCID: PMC5259824 DOI: 10.1186/s12859-016-1335-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/29/2022] Open
Abstract
Background: The primary goal of pathway analysis using transcriptome data is to find significantly perturbed pathways. However, pathway analysis is not always successful in identifying pathways that are truly relevant to the context under study. A major reason for this difficulty is that a single gene is involved in multiple pathways. In the KEGG pathway database, there are 146 genes, each of which is involved in more than 20 pathways. Thus activation of even a single gene will result in activation of many pathways. This complex relationship often makes the pathway analysis very difficult. While we need much more powerful pathway analysis methods, a readily available alternative way is to incorporate the literature information. Results: In this study, we propose a novel approach for prioritizing pathways by combining results from both pathway analysis tools and literature information. The basic idea is as follows. Whenever there are enough articles that provide evidence on which pathways are relevant to the context, we can be assured that the pathways are indeed related to the context, which is termed as relevance in this paper. However, if there are few or no articles reported, then we should rely on the results from the pathway analysis tools, which is termed as significance in this paper. We realized this concept as an algorithm by introducing Context Score and Impact Score and then combining the two into a single score. Our method ranked truly relevant pathways significantly higher than existing pathway analysis tools in experiments with two data sets. Conclusions: Our novel framework was implemented as ContextTRAP by utilizing two existing tools, TRAP and BEST. ContextTRAP will be a useful tool for the pathway based analysis of gene expression data since the user can specify the context of the biological experiment in a set of keywords. The web version of ContextTRAP is available at http://biohealth.snu.ac.kr/software/contextTRAP.
Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-1335-8) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Jusang Lee
- Department of Computer Science and Engineering, Seoul National University, Seoul, Republic of Korea
- Kyuri Jo
- Department of Computer Science and Engineering, Seoul National University, Seoul, Republic of Korea
- Sunwon Lee
- Department of Computer Science and Engineering, Korea University, Seoul, Republic of Korea
- Jaewoo Kang
- Department of Computer Science and Engineering, Korea University, Seoul, Republic of Korea
- Sun Kim
- Department of Computer Science and Engineering, Seoul National University, Seoul, Republic of Korea; Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea; Bioinformatics Institute, Seoul National University, Seoul, Republic of Korea
34
Du J, Li M, Yuan Z, Guo M, Song J, Xie X, Chen Y. A decision analysis model for KEGG pathway analysis. BMC Bioinformatics 2016; 17:407. [PMID: 27716040 PMCID: PMC5053338 DOI: 10.1186/s12859-016-1285-1] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.8] [Received: 12/24/2015] [Accepted: 09/28/2016] [Indexed: 11/18/2022] Open
Abstract
Background: Knowledge base-driven pathway analysis is becoming the first choice for many investigators, in that it not only reduces the complexity of functional analysis by grouping thousands of genes into just several hundred pathways, but also increases the explanatory power of the experiment by identifying active pathways in different conditions. However, current approaches are designed to analyze a biological system assuming that each pathway is independent of the other pathways. Results: A decision analysis model is developed in this article that accounts for dependence among pathways in time-course experiments and multiple-treatment experiments. This model introduces a decision coefficient, a designed index, to identify the most relevant pathways in a given experiment by taking into account not only the direct determination factor of each Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway itself, but also the indirect determination factors from its related pathways. Meanwhile, the direct and indirect determination factors of each pathway are employed to demonstrate the regulation mechanisms among KEGG pathways, and the sign of the decision coefficient can be used to preliminarily estimate the impact direction of each KEGG pathway. The simulation study demonstrated the application of the decision analysis model to KEGG pathway analysis. Conclusions: A microarray dataset from bovine mammary tissue over the entire lactation cycle was used to further illustrate our strategy. The results showed that the decision analysis model can provide promising and more biologically meaningful results. Therefore, the decision analysis model is an initial attempt at optimizing pathway analysis methodology. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-1285-1) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Junli Du
- College of sciences, Northwest A&F University, Yangling, 712100, People's Republic of China; College of Animal Science and Technology, Northwest A&F University, Yangling, 712100, People's Republic of China
- Manlin Li
- College of sciences, Northwest A&F University, Yangling, 712100, People's Republic of China
- Zhifa Yuan
- College of sciences, Northwest A&F University, Yangling, 712100, People's Republic of China
- Mancai Guo
- College of sciences, Northwest A&F University, Yangling, 712100, People's Republic of China
- Jiuzhou Song
- Department of Animal and Avian Sciences, University of Maryland, College Park, MD, 20742, USA
- Xiaozhen Xie
- College of sciences, Northwest A&F University, Yangling, 712100, People's Republic of China
- Yulin Chen
- College of Animal Science and Technology, Northwest A&F University, Yangling, 712100, People's Republic of China
35
Sugimoto M. Metabolomic pathway visualization tool outsourcing editing function. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2016; 2015:7659-62. [PMID: 26738066 DOI: 10.1109/embc.2015.7320166] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/10/2022]
Abstract
Recent rapid improvements in measuring instruments enable various omics studies that simultaneously profile multiple molecules, providing a holistic view of molecular interactions such as signal transduction, protein interactions, and metabolic pathways. Metabolomics is a recently emerged omics field that identifies and quantifies low-molecular-weight metabolites, usually defined as organic molecules smaller than 1500 Da. Compared with the other omics fields, software tools for handling metabolomic data are not yet mature. Conventional pathway drawing and visualization tools provide tool-specific functions; however, their user interfaces require users to learn each tool, which hinders their use. Here, we developed a more generic pathway visualization tool. The tool incorporates pathway data produced by common drawing tools, e.g. MS PowerPoint, and visualizes the quantified values on the pathways. Statistical results can also be overlaid on each metabolite. The developed tool facilitates the interpretation of metabolomic data in pathway form.
36
Lee S, Choi S, Kim YJ, Kim BJ, Hwang H, Park T. Pathway-based approach using hierarchical components of collapsed rare variants. Bioinformatics 2016; 32:i586-i594. [PMID: 27587678 PMCID: PMC5013912 DOI: 10.1093/bioinformatics/btw425] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Indexed: 02/06/2023] Open
Abstract
Motivation: To address the 'missing heritability' issue, many statistical methods for pathway-based analyses using rare variants have been proposed to analyze pathways individually. However, neglecting correlations between multiple pathways can result in misleading solutions, and pathway-based analyses of large-scale genetic datasets impose a massive computational burden. We propose a Pathway-based approach using HierArchical components of collapsed RAre variants Of High-throughput sequencing data (PHARAOH) for the analysis of rare variants. PHARAOH constructs a single hierarchical model that consists of collapsed gene-level summaries and pathways, and it analyzes entire pathways simultaneously by imposing ridge-type penalties on both gene and pathway coefficient estimates; hence our method considers the correlation of pathways without being constrained by a multiple testing problem. Results: Through simulation studies, the proposed method was shown to have higher statistical power than the existing pathway-based methods. In addition, our method was applied to large-scale whole-exome sequencing data with levels of a liver enzyme, using two well-known pathway databases, Biocarta and KEGG. This application demonstrated that our method not only identified associated pathways but also successfully detected biologically plausible pathways for a phenotype of interest. These findings were successfully replicated by an independent large-scale exome chip study. Availability and Implementation: An implementation of PHARAOH is available at http://statgen.snu.ac.kr/software/pharaoh/. Contact: tspark@stats.snu.ac.kr. Supplementary Information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Sungyoung Lee
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 151-747, Korea
- Sungkyoung Choi
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 151-747, Korea
- Young Jin Kim
  - Center for Genome Science, National Institute of Health, Osong Health Technology Administration Complex, Chungcheongbuk-Do 363-951, Korea
- Bong-Jo Kim
  - Center for Genome Science, National Institute of Health, Osong Health Technology Administration Complex, Chungcheongbuk-Do 363-951, Korea
- Heungsun Hwang
  - Department of Psychology, McGill University, Montreal, QC H3A 1B1, Canada
- Taesung Park
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 151-747, Korea
  - Department of Statistics, Seoul National University, Seoul 151-747, Korea
37
|
Alvarez MJ, Shen Y, Giorgi FM, Lachmann A, Ding BB, Ye BH, Califano A. Functional characterization of somatic mutations in cancer using network-based inference of protein activity. Nat Genet 2016; 48:838-47. [PMID: 27322546 PMCID: PMC5040167 DOI: 10.1038/ng.3593] [Citation(s) in RCA: 493] [Impact Index Per Article: 61.6] [Received: 02/21/2016] [Accepted: 05/23/2016] [Indexed: 01/05/2023]
Abstract
Identifying the multiple dysregulated oncoproteins that contribute to tumorigenesis in a given patient is crucial for developing personalized treatment plans. However, accurate inference of aberrant protein activity in biological samples is still challenging, as genetic alterations are only partially predictive and direct measurements of protein activity are generally not feasible. To address this problem, we introduce and experimentally validate a new algorithm, virtual inference of protein activity by enriched regulon analysis (VIPER), for accurate assessment of protein activity from gene expression data. We used VIPER to evaluate the functional relevance of genetic alterations in regulatory proteins across all samples in The Cancer Genome Atlas (TCGA). In addition to accurately inferring aberrant protein activity induced by established mutations, we also identified a fraction of tumors with aberrant activity of druggable oncoproteins despite a lack of mutations, and vice versa. In vitro assays confirmed that VIPER-inferred protein activity outperformed mutational analysis in predicting sensitivity to targeted inhibitors.
Affiliation(s)
- Mariano J. Alvarez
  - Department of Systems Biology, Columbia University, New York, USA
  - DarwinHealth Inc., New York, USA
- Yao Shen
  - Department of Systems Biology, Columbia University, New York, USA
  - DarwinHealth Inc., New York, USA
- B. Belinda Ding
  - Department of Cell Biology, Albert Einstein College of Medicine, New York, USA
- B. Hilda Ye
  - Department of Cell Biology, Albert Einstein College of Medicine, New York, USA
- Andrea Califano
  - Department of Systems Biology, Columbia University, New York, USA
  - Department of Biomedical Informatics, Columbia University, New York, USA
  - Department of Biochemistry & Molecular Biophysics, Columbia University, New York, USA
  - Institute for Cancer Genetics, Columbia University, New York, USA
  - Motor Neuron Center, Columbia University, New York, USA
  - Columbia Initiative in Stem Cells, Columbia University, New York, USA
38
|
Pham LM, Carvalho L, Schaus S, Kolaczyk ED. Perturbation Detection Through Modeling of Gene Expression on a Latent Biological Pathway Network: A Bayesian hierarchical approach. J Am Stat Assoc 2016; 111:73-92. [PMID: 27647944 DOI: 10.1080/01621459.2015.1110523] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 10/22/2022]
Abstract
Cellular response to a perturbation is the result of a dynamic system of biological variables linked in a complex network. A major challenge in drug and disease studies is identifying the key factors of a biological network that are essential in determining the cell's fate. Here our goal is the identification of perturbed pathways from high-throughput gene expression data. We develop a three-level hierarchical model, where (i) the first level captures the relationship between gene expression and biological pathways using confirmatory factor analysis, (ii) the second level models the behavior within an underlying network of pathways induced by an unknown perturbation using a conditional autoregressive model, and (iii) the third level is a spike-and-slab prior on the perturbations. We then identify perturbations through posterior-based variable selection. We illustrate our approach using gene transcription drug perturbation profiles from the DREAM7 drug sensitivity prediction challenge data set. Our proposed method identified regulatory pathways that are known to play a causative role and that were not readily resolved using gene set enrichment analysis or exploratory factor models. Simulation results are presented assessing the performance of this model relative to a network-free variant and its robustness to inaccuracies in biological databases.
39
|
García-Marqués F, Trevisan-Herraz M, Martínez-Martínez S, Camafeita E, Jorge I, Lopez JA, Méndez-Barbero N, Méndez-Ferrer S, Del Pozo MA, Ibáñez B, Andrés V, Sánchez-Madrid F, Redondo JM, Bonzon-Kulichenko E, Vázquez J. A Novel Systems-Biology Algorithm for the Analysis of Coordinated Protein Responses Using Quantitative Proteomics. Mol Cell Proteomics 2016; 15:1740-60. [PMID: 26893027 DOI: 10.1074/mcp.m115.055905] [Citation(s) in RCA: 64] [Impact Index Per Article: 8.0] [Received: 09/30/2015] [Indexed: 11/06/2022] Open
Abstract
The coordinated behavior of proteins is central to systems biology. However, the underlying mechanisms are poorly understood and methods to analyze coordination by conventional quantitative proteomics are still lacking. We present the Systems Biology Triangle (SBT), a new algorithm that allows the study of protein coordination by pairwise quantitative proteomics. The Systems Biology Triangle detected statistically significant coordination in diverse biological models of very different nature and subjected to different kinds of perturbations. The Systems Biology Triangle also revealed with unprecedented molecular detail an array of coordinated, early protein responses in vascular smooth muscle cells treated at different times with angiotensin-II (AngII). These responses included activation of protein synthesis, folding, turnover, and muscle contraction (consistent with a differentiated phenotype), as well as the induction of migration and the repression of cell proliferation and secretion. Remarkably, the majority of the altered functional categories were protein complexes, interaction networks, or metabolic pathways. These changes could not be detected by other algorithms widely used by the proteomics community, and the vast majority of proteins involved have not previously been described as regulated by AngII. The unique capabilities of the Systems Biology Triangle to detect functional protein alterations produced by the coordinated action of proteins in pairwise quantitative proteomics experiments make this algorithm an attractive choice for the biological interpretation of results on a routine basis.
Affiliation(s)
- Fernando García-Marqués
  - Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), Madrid, Spain
- Marco Trevisan-Herraz
  - Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), Madrid, Spain
- Sara Martínez-Martínez
  - Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), Madrid, Spain
- Emilio Camafeita
  - Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), Madrid, Spain
- Inmaculada Jorge
  - Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), Madrid, Spain
- Juan Antonio Lopez
  - Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), Madrid, Spain
- Nerea Méndez-Barbero
  - Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), Madrid, Spain
- Simón Méndez-Ferrer
  - Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), Madrid, Spain
- Miguel Angel Del Pozo
  - Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), Madrid, Spain
- Borja Ibáñez
  - Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), Madrid, Spain
- Vicente Andrés
  - Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), Madrid, Spain
- Juan Miguel Redondo
  - Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), Madrid, Spain
- Elena Bonzon-Kulichenko
  - Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), Madrid, Spain
- Jesús Vázquez
  - Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), Madrid, Spain
40
|
Hsueh HM, Tsai CA. Gene set analysis using sufficient dimension reduction. BMC Bioinformatics 2016; 17:74. [PMID: 26852017 PMCID: PMC4744442 DOI: 10.1186/s12859-016-0928-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 09/29/2015] [Accepted: 02/01/2016] [Indexed: 01/31/2023] Open
Abstract
BACKGROUND Gene set analysis (GSA) aims to evaluate the association between the expression of biological pathways, or a priori defined gene sets, and a particular phenotype. Numerous GSA methods have been proposed to assess the enrichment of sets of genes. However, most methods are developed with respect to a specific alternative scenario, such as a differential mean pattern or a differential coexpression. Moreover, a very limited number of methods can handle either binary, categorical, or continuous phenotypes. In this paper, we develop two novel GSA tests, called SDRs, based on the sufficient dimension reduction technique, which aims to capture sufficient information about the relationship between genes and the phenotype. The advantages of our proposed methods are that they allow for categorical and continuous phenotypes, and they are also able to identify a variety of enriched gene sets. RESULTS Through simulation studies, we compared the type I error and power of SDRs with existing GSA methods for binary, triple, and continuous phenotypes. We found that SDR methods adequately control the type I error rate at the pre-specified nominal level, and they have a satisfactory power to detect gene sets with differential coexpression and to test non-linear associations between gene sets and a continuous phenotype. In addition, the SDR methods were compared with seven widely-used GSA methods using two real microarray datasets for illustration. CONCLUSIONS We concluded that the SDR methods outperform the others because of their flexibility with regard to handling different kinds of phenotypes and their power to detect a wide range of alternative scenarios. Our real data analysis highlights the differences between GSA methods for detecting enriched gene sets.
Affiliation(s)
- Huey-Miin Hsueh
  - Department of Statistics, National Chengchi University, Zhinan Road, Taipei, 116, Taiwan.
- Chen-An Tsai
  - Department of Agronomy, National Taiwan University, No. 1, Section 4, Roosevelt Road, Taipei, 106, Taiwan.
41
|
Liu Z, Roy NC, Guo Y, Jia H, Ryan L, Samuelsson L, Thomas A, Plowman J, Clerens S, Day L, Young W. Human Breast Milk and Infant Formulas Differentially Modify the Intestinal Microbiota in Human Infants and Host Physiology in Rats. J Nutr 2016; 146:191-9. [PMID: 26674765 DOI: 10.3945/jn.115.223552] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.8] [Received: 09/08/2015] [Accepted: 11/11/2015] [Indexed: 12/30/2022] Open
Abstract
BACKGROUND In the absence of human breast milk, infant and follow-on formulas can still promote efficient growth and development. However, infant formulas can differ in their nutritional value. OBJECTIVE The objective of this study was to compare the effects of human milk (HM) and infant formulas in human infants and a weanling rat model. METHODS In a 3 wk clinical randomized controlled trial, babies (7- to 90-d-old, male-to-female ratio 1:1) were exclusively breastfed (BF), exclusively fed Synlait Pure Canterbury Stage 1 infant formula (SPCF), or fed assorted standard formulas (SFs) purchased by their parents. We also compared feeding HM or SPCF in weanling male Sprague-Dawley rats for 28 d. We examined the effects of HM and infant formulas on fecal short chain fatty acids (SCFAs) and bacterial composition in human infants, and intestinal SCFAs, the microbiota, and host physiology in weanling rats. RESULTS Fecal Bifidobacterium concentrations (mean log copy number ± SEM) were higher (P = 0.003) in BF (8.17 ± 0.3) and SPCF-fed infants (8.29 ± 0.3) compared with those fed the SFs (6.94 ± 0.3). Fecal acetic acid (mean ± SEM) was also higher (P = 0.007) in the BF (5.5 ± 0.2 mg/g) and SPCF (5.3 ± 2.4 mg/g) groups compared with SF-fed babies (4.3 ± 0.2 mg/g). Colonic SCFAs did not differ between HM- and SPCF-fed rats. However, cecal acetic acid concentrations were higher (P = 0.001) in rats fed HM (42.6 ± 2.6 mg/g) than in those fed SPCF (30.6 ± 0.8 mg/g). Cecal transcriptome, proteome, and plasma metabolite analyses indicated that the growth and maturation of intestinal tissue was more highly promoted by HM than SPCF. CONCLUSIONS Fecal bacterial composition and SCFA concentrations were similar in babies fed SPCF or HM. However, results from the rat study showed substantial differences in host physiology between rats fed HM and SPCF. This trial was registered at Shanghai Jiao Tong University School of Medicine as XHEC-C-2012-024.
Affiliation(s)
- Zhenmin Liu
  - State Key Laboratory of Dairy Biotechnology, Dairy Research Institute, Bright Dairy and Food Co. Ltd., Shanghai, China
- Nicole C Roy
  - Food Nutrition and Health Team, Food and Bio-Based Products Group, AgResearch Ltd., Palmerston North, New Zealand
  - Riddet Institute, Massey University, Palmerston North, New Zealand
- Yanhong Guo
  - State Key Laboratory of Dairy Biotechnology, Dairy Research Institute, Bright Dairy and Food Co. Ltd., Shanghai, China
- Hongxin Jia
  - State Key Laboratory of Dairy Biotechnology, Dairy Research Institute, Bright Dairy and Food Co. Ltd., Shanghai, China
- Leigh Ryan
  - Food Nutrition and Health Team, Food and Bio-Based Products Group, AgResearch Ltd., Palmerston North, New Zealand
- Linda Samuelsson
  - Food Nutrition and Health Team, Food and Bio-Based Products Group, AgResearch Ltd., Palmerston North, New Zealand
- Ancy Thomas
  - Proteins and Biomaterials Team, Food and Bio-Based Products Group, AgResearch Ltd., Christchurch, New Zealand
- Jeff Plowman
  - Proteins and Biomaterials Team, Food and Bio-Based Products Group, AgResearch Ltd., Christchurch, New Zealand
- Stefan Clerens
  - Proteins and Biomaterials Team, Food and Bio-Based Products Group, AgResearch Ltd., Christchurch, New Zealand
  - Biomolecular Interaction Centre, University of Canterbury, Christchurch, New Zealand
- Li Day
  - Food Nutrition and Health Team, Food and Bio-Based Products Group, AgResearch Ltd., Palmerston North, New Zealand
- Wayne Young
  - Food Nutrition and Health Team, Food and Bio-Based Products Group, AgResearch Ltd., Palmerston North, New Zealand
42
|
Tew GW, Hackney JA, Gibbons D, Lamb CA, Luca D, Egen JG, Diehl L, Eastham Anderson J, Vermeire S, Mansfield JC, Feagan BG, Panes J, Baumgart DC, Schreiber S, Dotan I, Sandborn WJ, Kirby JA, Irving PM, De Hertogh G, Van Assche GA, Rutgeerts P, O'Byrne S, Hayday A, Keir ME. Association Between Response to Etrolizumab and Expression of Integrin αE and Granzyme A in Colon Biopsies of Patients With Ulcerative Colitis. Gastroenterology 2016; 150:477-87.e9. [PMID: 26522261 DOI: 10.1053/j.gastro.2015.10.041] [Citation(s) in RCA: 107] [Impact Index Per Article: 13.4] [Received: 06/22/2015] [Revised: 10/05/2015] [Accepted: 10/22/2015] [Indexed: 12/13/2022]
Abstract
BACKGROUND & AIMS Etrolizumab is a humanized monoclonal antibody against the β7 integrin subunit that has shown efficacy vs placebo in patients with moderate to severely active ulcerative colitis (UC). Patients with colon tissues that expressed high levels of the integrin αE gene (ITGAE) appeared to have the best response. We compared differences in colonic expression of ITGAE and other genes between patients who achieved clinical remission with etrolizumab vs those who did not. METHODS We performed a retrospective analysis of data collected from 110 patients with UC who participated in a phase 2 placebo-controlled trial of etrolizumab, as well as from 21 patients with UC or without inflammatory bowel disease (controls) enrolled in an observational study at a separate site. Colon biopsies were collected from patients in both studies and analyzed by immunohistochemistry and gene expression profiling. Mononuclear cells were isolated and analyzed by flow cytometry. We identified biomarkers associated with response to etrolizumab. In the placebo-controlled trial, clinical remission was defined as total Mayo Clinic Score ≤2, with no individual subscore >1, and mucosal healing was defined as endoscopic score ≤1. RESULTS Colon tissues collected at baseline from patients who had a clinical response to etrolizumab expressed higher levels of T-cell-associated genes than patients who did not respond (P < .05). Colonic CD4(+) integrin αE(+) cells from patients with UC expressed higher levels of granzyme A messenger RNA (GZMA mRNA) than CD4(+) αE(-) cells (P < .0001); granzyme A and integrin αE protein were detected in the same cells. Of patients receiving 100 mg etrolizumab, a higher proportion of those with high levels of GZMA mRNA (41%) or ITGAE mRNA (38%) than those with low levels of GZMA (6%) or ITGAE mRNA (13%) achieved clinical remission (P < .05) and mucosal healing (41% GZMA(high) vs 19% GZMA(low) and 44% ITGAE(high) vs 19% ITGAE(low)).
Compared with ITGAE(low) and GZMA(low) patients, patients with ITGAE(high) and GZMA(high) had higher baseline numbers of epithelial crypt-associated integrin αE(+) cells (P < .01 for both), but a smaller number of crypt-associated integrin αE(+) cells after etrolizumab treatment (P < .05 for both). After 10 weeks of etrolizumab treatment, expression of genes associated with T-cell activation and genes encoding inflammatory cytokines decreased by 40%-80% from baseline (P < .05) in patients with colon tissues expressing high levels of GZMA at baseline. CONCLUSIONS Levels of GZMA and ITGAE mRNAs in colon tissues can identify patients with UC who are most likely to benefit from etrolizumab; expression levels decrease with etrolizumab administration in biomarker(high) patients. Larger, prospective studies of markers are needed to assess their clinical value.
Affiliation(s)
- Gaik W Tew
  - Genentech Research and Early Development, South San Francisco, California
- Jason A Hackney
  - Genentech Research and Early Development, South San Francisco, California
- Diana Luca
  - Genentech Research and Early Development, South San Francisco, California
- Jackson G Egen
  - Genentech Research and Early Development, South San Francisco, California
- Lauri Diehl
  - Genentech Research and Early Development, South San Francisco, California
- Julian Panes
  - Hospital Clinic de Barcelona, Institut d'Investigacions Biomèdiques August Pi i Sunyer, Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas, Barcelona, Spain
- Stefan Schreiber
  - Department of Medicine I, University Hospital Schleswig-Holstein, Christian Albrechts University, Kiel, Germany
- Iris Dotan
  - Inflammatory Bowel Disease Center, Department of Gastroenterology and Liver Diseases, Tel Aviv Medical Center and Sackler Faculty of Medicine, Tel Aviv, Israel
- John A Kirby
  - Newcastle University, Newcastle upon Tyne, United Kingdom
- Gert A Van Assche
  - University of Leuven, Leuven, Belgium
  - University of Toronto, Toronto, Ontario, Canada
- Sharon O'Byrne
  - Genentech Research and Early Development, South San Francisco, California
- Mary E Keir
  - Genentech Research and Early Development, South San Francisco, California
43
|
García-Campos MA, Espinal-Enríquez J, Hernández-Lemus E. Pathway Analysis: State of the Art. Front Physiol 2015; 6:383. [PMID: 26733877 PMCID: PMC4681784 DOI: 10.3389/fphys.2015.00383] [Citation(s) in RCA: 151] [Impact Index Per Article: 16.8] [Received: 09/28/2015] [Accepted: 11/26/2015] [Indexed: 12/02/2022] Open
Abstract
Pathway analysis is a set of widely used tools for research in life sciences intended to give meaning to high-throughput biological data. The methodology of these tools rests on gathering and using knowledge about biomolecular functioning, coupled with statistical testing and other algorithms. Despite their wide use, the foundations and overall background of pathway analysis may not be fully understood, leading to misinterpretation of analysis results. This review attempts to gather the fundamental knowledge to take into consideration when using pathway analysis as a hypothesis generation tool. We discuss the key elements that are part of these methodologies, their capabilities and current deficiencies. We also present an overview of current and all-time popular methods, highlighting different classes across them. In doing so, we show the exploding diversity of methods that pathway analysis encompasses, point out commonly overlooked caveats, and direct attention to a potential new class of methods that attempt to narrow the analysis scope to the sample scale.
Affiliation(s)
- Jesús Espinal-Enríquez
  - Computational Genomics, National Institute of Genomic Medicine, México City, México
  - Complejidad en Biología de Sistemas, Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Ciudad de México, México
- Enrique Hernández-Lemus
  - Computational Genomics, National Institute of Genomic Medicine, México City, México
  - Complejidad en Biología de Sistemas, Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Ciudad de México, México
44
|
Hamzić E, Buitenhuis B, Hérault F, Hawken R, Abrahamsen MS, Servin B, Elsen JM, Pinard-van der Laan MH, Bed'Hom B. Genome-wide association study and biological pathway analysis of the Eimeria maxima response in broilers. Genet Sel Evol 2015; 47:91. [PMID: 26607727 PMCID: PMC4659166 DOI: 10.1186/s12711-015-0170-0] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 05/23/2015] [Accepted: 11/05/2015] [Indexed: 02/22/2023] Open
Abstract
Background Coccidiosis is the most common and costly disease in the poultry industry and is caused by protozoans of the Eimeria genus. The current control of coccidiosis, based on the use of anticoccidial drugs and vaccination, faces serious obstacles such as drug resistance and the high costs for the development of efficient vaccines, respectively. Therefore, the current control programs must be expanded with complementary approaches such as the use of genetics to improve the host response to Eimeria infections. Recently, we have performed a large-scale challenge study on Cobb500 broilers using E. maxima for which we investigated variability among animals in response to the challenge. As a follow-up to this challenge study, we performed a genome-wide association study (GWAS) to identify genomic regions underlying variability of the measured traits in the response to Eimeria maxima in broilers. Furthermore, we conducted a post-GWAS functional analysis to increase our biological understanding of the underlying response to Eimeria maxima challenge. Results In total, we identified 22 single nucleotide polymorphisms (SNPs) with q value <0.1 distributed across five chromosomes. The highly significant SNPs were associated with body weight gain (three SNPs on GGA5, one SNP on GGA1 and one SNP on GGA3), plasma coloration measured as optical density at wavelengths in the range 465–510 nm (10 SNPs and all on GGA10) and the percentage of β2-globulin in blood plasma (15 SNPs on GGA1 and one SNP on GGA2). Biological pathways related to metabolic processes, cell proliferation, and primary innate immune processes were among the most frequent significantly enriched biological pathways. Furthermore, the network-based analysis produced two networks of high confidence, with one centered on large tumor suppressor kinase 1 (LATS1) and 2 (LATS2) and the second involving the myosin heavy chain 6 (MYH6). 
Conclusions We identified several strong candidate genes and genomic regions associated with traits measured in response to Eimeria maxima in broilers. Furthermore, the post-GWAS functional analysis indicates that biological pathways and networks involved in tissue proliferation and repair along with the primary innate immune response may play the most important role during the early stage of Eimeria maxima infection in broilers. Electronic supplementary material The online version of this article (doi:10.1186/s12711-015-0170-0) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Edin Hamzić
  - UMR1313 Animal Genetics and Integrative Biology Unit, AgroParisTech, 16 rue Claude Bernard, 75005, Paris, France.
  - UMR1313 Animal Genetics and Integrative Biology Unit, INRA, Domaine de Vilvert, 78350, Jouy-en-Josas, France.
  - Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, Blichers Allé 20, P.O. Box 50, 8830, Tjele, Denmark.
- Bart Buitenhuis
  - Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, Blichers Allé 20, P.O. Box 50, 8830, Tjele, Denmark.
- Frédéric Hérault
  - UMR1348 Physiology, Environment and Genetics for the Animal and Livestock Systems Unit, INRA, Domaine de la Prise, 35590, Saint Gilles, France.
- Bertrand Servin
  - UMR1388 Genetics, Physiology and Breeding Systems, INRA, 24 chemin de Borde-Rouge, 31326, Castanet-Tolosan, France.
- Jean-Michel Elsen
  - UMR1388 Genetics, Physiology and Breeding Systems, INRA, 24 chemin de Borde-Rouge, 31326, Castanet-Tolosan, France.
- Marie-Hélène Pinard-van der Laan
  - UMR1313 Animal Genetics and Integrative Biology Unit, AgroParisTech, 16 rue Claude Bernard, 75005, Paris, France.
  - UMR1313 Animal Genetics and Integrative Biology Unit, INRA, Domaine de Vilvert, 78350, Jouy-en-Josas, France.
- Bertrand Bed'Hom
  - UMR1313 Animal Genetics and Integrative Biology Unit, AgroParisTech, 16 rue Claude Bernard, 75005, Paris, France.
  - UMR1313 Animal Genetics and Integrative Biology Unit, INRA, Domaine de Vilvert, 78350, Jouy-en-Josas, France.
45
|
Yu X, Zeng T, Li G. Integrative enrichment analysis: a new computational method to detect dysregulated pathways in heterogeneous samples. BMC Genomics 2015; 16:918. [PMID: 26556243 PMCID: PMC4641376 DOI: 10.1186/s12864-015-2188-7] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 08/08/2015] [Accepted: 11/02/2015] [Indexed: 12/27/2022] Open
Abstract
BACKGROUND Pathway enrichment analysis is a useful tool to study biology and biomedicine, because it performs functional screening on well-defined biological procedures rather than on separate molecules. Measuring the malfunction of a pathway under a phenotype change, e.g., from normal to diseased, is the key issue when applying enrichment analysis to a pathway. Conventional analysis focuses on differentially expressed genes (DEGs), which assumes highly pure samples. However, disease samples are usually heterogeneous, so genes with large differential expression variance (DEVGs) are becoming attractive and important indicators of the specific state of a biological system. In the context of differential expression variance, it is still a challenge to measure the enrichment or status of a pathway. To address this issue, we proposed Integrative Enrichment Analysis (IEA) based on a novel enrichment measurement. RESULTS The main strength of IEA is its ability to identify dysregulated pathways containing DEGs and DEVGs simultaneously, which are usually under-scored by other methods. In addition, IEA provides two auxiliary approaches to investigate such dysregulated pathways. One is to infer the association among identified dysregulated pathways and expected target pathways by estimating pathway crosstalks. The other is to recognize subtype-factors as dysregulated pathways associated with particular clinical indices according to the DEVGs' relative expressions rather than conventional raw expressions. Based on a previously established evaluation scheme, we found that, in particular cohorts (i.e., a group of real gene expression datasets from human patients), a few target disease pathways can be ranked significantly high by IEA, which is more effective than other state-of-the-art methods. Furthermore, we present a proof-of-concept study on diabetes to indicate that: IEA, rather than conventional ORA or GSEA, can capture the under-estimated dysregulated pathways full of DEVGs and DEGs; these newly identified pathways could be significantly linked to previously known disease pathways by estimated crosstalks; and many candidate subtype-factors recognized by IEA also have a significant relation with the risk of subtypes of genotype-phenotype associations. CONCLUSIONS Overall, IEA supplies a new tool to carry out enrichment analysis in the complicated context of clinical application (i.e., heterogeneity of disease), as a necessary complementary and cooperative approach to conventional ones.
Affiliation(s)
- Xiangtian Yu
  - School of Mathematics, Shandong University, Jinan, 250100, China.
- Tao Zeng
  - Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Cell Building Level 3, YueYang Road 320, Shanghai, 200031, China.
- Guojun Li
  - School of Mathematics, Shandong University, Jinan, 250100, China.
46
|
Bayerlová M, Jung K, Kramer F, Klemm F, Bleckmann A, Beißbarth T. Comparative study on gene set and pathway topology-based enrichment methods. BMC Bioinformatics 2015; 16:334. [PMID: 26489510 PMCID: PMC4618947 DOI: 10.1186/s12859-015-0751-5] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.9] [Received: 08/20/2015] [Accepted: 09/29/2015] [Indexed: 01/08/2023] Open
Abstract
Background: Enrichment analysis is a popular approach to identify pathways or sets of genes which are significantly enriched in the context of differentially expressed genes. The traditional gene set enrichment approach considers a pathway as a simple gene list disregarding any knowledge of gene or protein interactions. In contrast, the new group of so called pathway topology-based methods integrates the topological structure of a pathway into the analysis.
Methods: We comparatively investigated gene set and pathway topology-based enrichment approaches, considering three gene set and four topological methods. These methods were compared in two extensive simulation studies and on a benchmark of 36 real datasets, providing the same pathway input data for all methods.
Results: In the benchmark data analysis both types of methods showed a comparable ability to detect enriched pathways. The first simulation study was conducted with KEGG pathways, which showed considerable gene overlaps between each other. In this study with original KEGG pathways, none of the topology-based methods outperformed the gene set approach. Therefore, a second simulation study was performed on non-overlapping pathways created by unique gene IDs. Here, methods accounting for pathway topology reached higher accuracy than the gene set methods, however their sensitivity was lower.
Conclusions: We conducted one of the first comprehensive comparative works on evaluating gene set against pathway topology-based enrichment methods. The topological methods showed better performance in the simulation scenarios with non-overlapping pathways, however, they were not conclusively better in the other scenarios. This suggests that simple gene set approach might be sufficient to detect an enriched pathway under realistic circumstances. Nevertheless, more extensive studies and further benchmark data are needed to systematically evaluate these methods and to assess what gain and cost pathway topology information introduces into enrichment analysis. Both types of methods for enrichment analysis require further improvements in order to deal with the problem of pathway overlaps.
Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0751-5) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Michaela Bayerlová
- Department of Medical Statistics, University Medical Center Göttingen, 37099, Göttingen, Germany.
- Klaus Jung
- Department of Medical Statistics, University Medical Center Göttingen, 37099, Göttingen, Germany.
- Frank Kramer
- Department of Medical Statistics, University Medical Center Göttingen, 37099, Göttingen, Germany.
- Florian Klemm
- Department of Hematology and Medical Oncology, University Medical Center Göttingen, 37099, Göttingen, Germany.
- Annalen Bleckmann
- Department of Medical Statistics, University Medical Center Göttingen, 37099, Göttingen, Germany.
- Department of Hematology and Medical Oncology, University Medical Center Göttingen, 37099, Göttingen, Germany.
- Tim Beißbarth
- Department of Medical Statistics, University Medical Center Göttingen, 37099, Göttingen, Germany.
|
47
|
Meijer RJ, Goeman JJ. Multiple Testing of Gene Sets from Gene Ontology: Possibilities and Pitfalls. Brief Bioinform 2015; 17:808-18. [DOI: 10.1093/bib/bbv091] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 08/06/2015] [Indexed: 11/14/2022]
|
48
|
Turner JA, Bolen CR, Blankenship DM. Quantitative gene set analysis generalized for repeated measures, confounder adjustment, and continuous covariates. BMC Bioinformatics 2015; 16:272. [PMID: 26316107 PMCID: PMC4551517 DOI: 10.1186/s12859-015-0707-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 03/20/2015] [Accepted: 08/17/2015] [Indexed: 12/20/2022]
Abstract
Background: Gene set analysis (GSA) of gene expression data can be highly powerful when the biological signal is weak compared to other sources of variability in the data. However, many gene set analysis approaches utilize permutation tests which are not appropriate for complex study designs. For example, the correlation of subjects is broken when comparing time points within a longitudinal study. Linear mixed models provide a method to analyze longitudinal studies as well as adjust for potential confounding factors and account for sources of variability that are not of primary interest. Currently, there are no known gene set analysis approaches that fully account for these study design and analysis aspects. In order to do so, we generalize the QuSAGE gene set analysis algorithm, denoted Q-Gen, and provide the necessary estimation adjustments to incorporate linear mixed model analyses.
Results: We assessed the performance of our generalized method in comparison to the original QuSAGE method in settings such as longitudinal repeated measures analysis and accounting for potential confounders. We demonstrate that the original QuSAGE method can not control for type-I error when these complexities exist. In addition to statistical appropriateness, analysis of a longitudinal influenza study suggests Q-Gen can allow for greater sensitivity when exploring a large number of gene sets.
Conclusions: Q-Gen is an extension to the gene set analysis method of QuSAGE, and allows for linear mixed models to be applied appropriately within a gene set analysis framework. It provides GSA an added layer of flexibility that was not currently available. This flexibility allows for more appropriate statistical modeling of complex data structures that are inherent to many microarray study designs and can provide more sensitivity.
Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0707-9) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Jacob A Turner
- Baylor Research Institute, 3310 Live Oak, Dallas, 75204, TX, USA.
- Christopher R Bolen
- Department of Microbiology and Immunology, Stanford University School, Stanford, 94305, CA, USA.
|
49
|
Metabolite profiling stratifies pancreatic ductal adenocarcinomas into subtypes with distinct sensitivities to metabolic inhibitors. Proc Natl Acad Sci U S A 2015. [PMID: 26216984 DOI: 10.1073/pnas.1501605112] [Citation(s) in RCA: 231] [Impact Index Per Article: 25.7] [Indexed: 12/13/2022]
Abstract
Although targeting cancer metabolism is a promising therapeutic strategy, clinical success will depend on an accurate diagnostic identification of tumor subtypes with specific metabolic requirements. Through broad metabolite profiling, we successfully identified three highly distinct metabolic subtypes in pancreatic ductal adenocarcinoma (PDAC). One subtype was defined by reduced proliferative capacity, whereas the other two subtypes (glycolytic and lipogenic) showed distinct metabolite levels associated with glycolysis, lipogenesis, and redox pathways, confirmed at the transcriptional level. The glycolytic and lipogenic subtypes showed striking differences in glucose and glutamine utilization, as well as mitochondrial function, and corresponded to differences in cell sensitivity to inhibitors of glycolysis, glutamine metabolism, lipid synthesis, and redox balance. In PDAC clinical samples, the lipogenic subtype associated with the epithelial (classical) subtype, whereas the glycolytic subtype strongly associated with the mesenchymal (QM-PDA) subtype, suggesting functional relevance in disease progression. Pharmacogenomic screening of an additional ∼200 non-PDAC cell lines validated the association between mesenchymal status and metabolic drug response in other tumor indications. Our findings highlight the utility of broad metabolite profiling to predict sensitivity of tumors to a variety of metabolic inhibitors.
|
50
|
MacNeil SM, Johnson WE, Li DY, Piccolo SR, Bild AH. Inferring pathway dysregulation in cancers from multiple types of omic data. Genome Med 2015; 7:61. [PMID: 26170901 PMCID: PMC4499940 DOI: 10.1186/s13073-015-0189-4] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 10/03/2014] [Accepted: 06/16/2015] [Indexed: 11/10/2022]
Abstract
Although in some cases individual genomic aberrations may drive disease development in isolation, a complex interplay among multiple aberrations is common. Accordingly, we developed Gene Set Omic Analysis (GSOA), a bioinformatics tool that can evaluate multiple types and combinations of omic data at the pathway level. GSOA uses machine learning to identify dysregulated pathways and improves upon other methods because of its ability to decipher complex, multigene patterns. We compare GSOA to alternative methods and demonstrate its ability to identify pathways known to play a role in various cancer phenotypes. Software implementing the GSOA method is freely available from https://bitbucket.org/srp33/gsoa.
Affiliation(s)
- Shelley M MacNeil
- Department of Oncological Sciences, University of Utah, Salt Lake City, UT USA
- Department of Pharmacology and Toxicology, University of Utah, Salt Lake City, UT USA
- William E Johnson
- Department of Oncological Sciences, University of Utah, Salt Lake City, UT USA
- Division of Computational Biomedicine, Boston University School of Medicine, Boston, MA USA
- Dean Y Li
- Department of Oncological Sciences, University of Utah, Salt Lake City, UT USA
- Department of Medicine, University of Utah, Salt Lake City, UT USA
- Department of Human Genetics, University of Utah, Salt Lake City, UT USA
- Stephen R Piccolo
- Department of Pharmacology and Toxicology, University of Utah, Salt Lake City, UT USA
- Division of Computational Biomedicine, Boston University School of Medicine, Boston, MA USA
- Department of Biology, Brigham Young University, Provo, UT USA
- Andrea H Bild
- Department of Oncological Sciences, University of Utah, Salt Lake City, UT USA
- Department of Pharmacology and Toxicology, University of Utah, Salt Lake City, UT USA
|