1
Li S, Zeng Y, He L, Xie X. Exploring Prognostic Immune Microenvironment-Related Genes in Head and Neck Squamous Cell Carcinoma from the TCGA Database. J Cancer 2024; 15:632-644. [PMID: 38213736 PMCID: PMC10777048 DOI: 10.7150/jca.89581] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Accepted: 11/13/2023] [Indexed: 01/13/2024]
Abstract
Purpose: Head and neck squamous cell carcinoma (HNSCC) has a high rate of local and distant metastases. In tumor tissues, the interaction between tumor cells and the tumor microenvironment (TME) is closely related to cancer development and prognosis. Therefore, screening for TME-related genes in HNSCC is crucial for understanding metastatic patterns.
Methods: Our research relied mainly on a novel algorithm called Estimation of STromal and Immune cells in MAlignant Tumors using Expression data (ESTIMATE). Fragments Per Kilobase of exon model per Million mapped fragments (FPKM) data and HNSCC clinical data were obtained from the TCGA database, and the purity of HNSCC tissue and the features of stromal and immune cell infiltration were determined. Furthermore, differentially expressed genes (DEGs) were screened based on immune, stromal, and ESTIMATE scores, and their protein-protein interaction (PPI) networks and ClueGO functions were evaluated. Finally, the expression profiles of DEGs related to immunity in HNSCC were determined. Differential gene expression was verified in the highly invasive oral cancer cell lines (SCC-25, CAL-27, and FaDu) and oral cancer tissues.
Results: Our analysis found that both the immune and ESTIMATE scores were significantly associated with the prognosis of HNSCC. Moreover, cross-validation using the Venn algorithm revealed that 433 genes were significantly upregulated, and 394 genes were significantly downregulated. All DEGs were associated with both ESTIMATE and immune scores. The enrichment of cytokine-cytokine receptor interactions and chemokine signaling pathways was observed using pathway enrichment analyses. We initially screened 25 genes after analyzing the key sub-networks of the PPI network. Survival analysis revealed the significance of CCR4, CXCR3, P2RY14, CCR2, CCR8, and CCL19 in relation to survival and their association with immune infiltration-related metastasis in HNSCC.
Conclusions: The expression profiles of relevant TME-related genes were screened following stromal and immune cell scoring using ESTIMATE, and DEGs associated with survival were identified. These TME-related gene markers offer valuable utility as both prognostic indicators and markers denoting metastatic traits in HNSCC.
Affiliation(s)
- Shuangjiang Li
  - Department of Stomatology, Changsha Stomatological Hospital, Changsha, P. R. China
- Yiyu Zeng
  - Department of Stomatology, The Second Xiangya Hospital, Central South University, Changsha, P. R. China
- Liming He
  - Department of Stomatology, Changsha Stomatological Hospital, Changsha, P. R. China
- Xiaoyan Xie
  - Department of Stomatology, The Second Xiangya Hospital, Central South University, Changsha, P. R. China
2
Tiwari A, Trivedi R, Lin SY. Tumor microenvironment: barrier or opportunity towards effective cancer therapy. J Biomed Sci 2022; 29:83. [PMID: 36253762 PMCID: PMC9575280 DOI: 10.1186/s12929-022-00866-3] [Citation(s) in RCA: 183] [Impact Index Per Article: 61.0] [Received: 04/21/2022] [Accepted: 10/01/2022] [Indexed: 12/24/2022]
Abstract
Tumor microenvironment (TME) is a specialized ecosystem of host components, designed by tumor cells for successful development and metastasis of tumor. With the advent of 3D culture and advanced bioinformatic methodologies, it is now possible to study TME’s individual components and their interplay at higher resolution. Deeper understanding of the immune cell’s diversity, stromal constituents, repertoire profiling, neoantigen prediction of TMEs has provided the opportunity to explore the spatial and temporal regulation of immune therapeutic interventions. The variation of TME composition among patients plays an important role in determining responders and non-responders towards cancer immunotherapy. Therefore, there could be a possibility of reprogramming of TME components to overcome the widely prevailing issue of immunotherapeutic resistance. The focus of the present review is to understand the complexity of TME and comprehending future perspective of its components as potential therapeutic targets. The later part of the review describes the sophisticated 3D models emerging as valuable means to study TME components and an extensive account of advanced bioinformatic tools to profile TME components and predict neoantigens. Overall, this review provides a comprehensive account of the current knowledge available to target TME.
Affiliation(s)
- Aadhya Tiwari
  - Department of Systems Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Rakesh Trivedi
  - Department of Translational Molecular Pathology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Shiaw-Yih Lin
  - Department of Systems Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
3
Shi C, Zhu J, Shen Y, Luo S, Zhu H, Song R. Off-Policy Confidence Interval Estimation with Confounded Markov Decision Process. J Am Stat Assoc 2022. [DOI: 10.1080/01621459.2022.2110876] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/15/2022]
Affiliation(s)
- Ye Shen
  - North Carolina State University
- Hongtu Zhu
  - University of North Carolina at Chapel Hill
4
Jaakkola MK, Elo LL. Estimating cell type-specific differential expression using deconvolution. Brief Bioinform 2021; 23:6396788. [PMID: 34651640 PMCID: PMC8769698 DOI: 10.1093/bib/bbab433] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/03/2021] [Revised: 09/17/2021] [Accepted: 09/23/2021] [Indexed: 12/02/2022]
Affiliation(s)
- Maria K Jaakkola
  - Department of Mathematics and Statistics, University of Turku, Yliopistonmäki, 20014, Turku, Finland
- Laura L Elo
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, Tykistökatu 6, FI-20520, Turku, Finland
  - Institute of Biomedicine, University of Turku, Kiinamyllynkatu 10, FI-20520, Turku, Finland
5
Spade DA. A Monte Carlo integration approach to estimating drift and minorization coefficients for Metropolis–Hastings samplers. Braz J Probab Stat 2021. [DOI: 10.1214/20-bjps486] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/19/2022]
Affiliation(s)
- David A. Spade
  - Department of Mathematical Sciences, University of Wisconsin–Milwaukee, 3200 N. Cramer Street, EMS E403, Milwaukee, Wisconsin 53211, USA
6
Kang K, Huang C, Li Y, Umbach DM, Li L. CDSeqR: fast complete deconvolution for gene expression data from bulk tissues. BMC Bioinformatics 2021; 22:262. [PMID: 34030626 PMCID: PMC8142515 DOI: 10.1186/s12859-021-04186-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 09/24/2020] [Accepted: 05/12/2021] [Indexed: 11/10/2022]
Abstract
Background: Biological tissues consist of heterogenous populations of cells. Because gene expression patterns from bulk tissue samples reflect the contributions from all cells in the tissue, understanding the contribution of individual cell types to the overall gene expression in the tissue is fundamentally important. We recently developed a computational method, CDSeq, that can simultaneously estimate both sample-specific cell-type proportions and cell-type-specific gene expression profiles using only bulk RNA-Seq counts from multiple samples. Here we present an R implementation of CDSeq (CDSeqR) with significant performance improvement over the original implementation in MATLAB and an added new function to aid cell type annotation. The R package would be of interest for the broader R community.
Results: We developed a novel strategy to substantially improve computational efficiency in both speed and memory usage. In addition, we designed and implemented a new function for annotating the CDSeq estimated cell types using single-cell RNA sequencing (scRNA-seq) data. This function allows users to readily interpret and visualize the CDSeq estimated cell types. In addition, this new function further allows the users to annotate CDSeq-estimated cell types using marker genes. We carried out additional validations of the CDSeqR software using synthetic, real cell mixtures, and real bulk RNA-seq data from the Cancer Genome Atlas (TCGA) and the Genotype-Tissue Expression (GTEx) project.
Conclusions: The existing bulk RNA-seq repositories, such as TCGA and GTEx, provide enormous resources for better understanding changes in transcriptomics and human diseases. They are also potentially useful for studying cell-cell interactions in the tissue microenvironment. Bulk level analyses neglect tissue heterogeneity, however, and hinder investigation of a cell-type-specific expression. The CDSeqR package may aid in silico dissection of bulk expression data, enabling researchers to recover cell-type-specific information.
Affiliation(s)
- Kai Kang
  - Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Research Triangle Park, Durham, NC, 27709, USA
- Caizhi Huang
  - Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Research Triangle Park, Durham, NC, 27709, USA
- Yuanyuan Li
  - Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Research Triangle Park, Durham, NC, 27709, USA
- David M Umbach
  - Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Research Triangle Park, Durham, NC, 27709, USA
- Leping Li
  - Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Research Triangle Park, Durham, NC, 27709, USA
7
Bayesian Joint Modeling of Single-Cell Expression Data and Bulk Spatial Transcriptomic Data. Statistics in Biosciences 2021. [DOI: 10.1007/s12561-021-09308-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
8
Takeuchi F, Kato N. Nonlinear ridge regression improves cell-type-specific differential expression analysis. BMC Bioinformatics 2021; 22:141. [PMID: 33752591 PMCID: PMC7986289 DOI: 10.1186/s12859-021-03982-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/25/2020] [Accepted: 01/27/2021] [Indexed: 12/28/2022]
Abstract
Background: Epigenome-wide association studies (EWAS) and differential gene expression analyses are generally performed on tissue samples, which consist of multiple cell types. Cell-type-specific effects of a trait, such as disease, on the omics expression are of interest but difficult or costly to measure experimentally. By measuring omics data for the bulk tissue, cell type composition of a sample can be inferred statistically. Subsequently, cell-type-specific effects are estimated by linear regression that includes terms representing the interaction between the cell type proportions and the trait. This approach involves two issues, scaling and multicollinearity.
Results: First, although cell composition is analyzed in linear scale, differential methylation/expression is analyzed suitably in the logit/log scale. To simultaneously analyze two scales, we applied nonlinear regression. Second, we show that the interaction terms are highly collinear, which is obstructive to ordinary regression. To cope with the multicollinearity, we applied ridge regularization. In simulated data, nonlinear ridge regression attained well-balanced sensitivity, specificity and precision. Marginal model attained the lowest precision and highest sensitivity and was the only algorithm to detect weak signal in real data.
Conclusion: Nonlinear ridge regression performed cell-type-specific association test on bulk omics data with well-balanced performance. The omicwas package for R implements nonlinear ridge regression for cell-type-specific EWAS, differential gene expression and QTL analyses. The software is freely available from https://github.com/fumi-github/omicwas.
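As a rough illustration of the linear interaction model that this abstract starts from, the following Python sketch builds a design matrix with one baseline column and one proportion-by-trait interaction column per cell type, then fits it with a ridge penalty. It is a plain linear ridge fit on made-up numbers, not the nonlinear ridge regression of the paper and not the omicwas API. The function names (`interaction_design`, `solve`, `ridge_fit`) and all data values are assumptions introduced here.

```python
def interaction_design(props, trait):
    """One row per sample: [w_1..w_K, w_1*x..w_K*x] (hypothetical layout)."""
    return [list(w) + [wk * x for wk in w] for w, x in zip(props, trait)]


def solve(a, b):
    """Solve a @ z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ridge_fit(X, y, lam):
    """Minimise ||y - X b||^2 + lam * ||b||^2 via the normal equations."""
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


# Toy data (assumed): two cell types, six samples, binary trait.
props = [(0.2, 0.8), (0.5, 0.5), (0.8, 0.2), (0.3, 0.7), (0.6, 0.4), (0.9, 0.1)]
trait = [0, 0, 0, 1, 1, 1]
truth = [1.0, 2.0, 0.5, 0.0]  # baselines, then cell-type-specific trait effects

X = interaction_design(props, trait)
y = [sum(c * t for c, t in zip(row, truth)) for row in X]  # noise-free bulk values

beta_small = ridge_fit(X, y, lam=1e-8)  # nearly unpenalised
beta_large = ridge_fit(X, y, lam=1.0)   # visibly shrunk towards zero
print(beta_small)
print(beta_large)
```

With a near-zero penalty the fit recovers the simulated coefficients; a larger penalty trades some bias for stability, which is the usual motivation for ridge regularization when the interaction columns are close to collinear.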
Affiliation(s)
- Fumihiko Takeuchi
  - Department of Gene Diagnostics and Therapeutics, Research Institute, National Center for Global Health and Medicine (NCGM), 1-21-1 Toyama, Shinjuku-ku, Tokyo, 162-8655, Japan
- Norihiro Kato
  - Department of Gene Diagnostics and Therapeutics, Research Institute, National Center for Global Health and Medicine (NCGM), 1-21-1 Toyama, Shinjuku-ku, Tokyo, 162-8655, Japan
9
Amrhein L, Fuchs C. stochprofML: stochastic profiling using maximum likelihood estimation in R. BMC Bioinformatics 2021; 22:123. [PMID: 33722188 PMCID: PMC7958472 DOI: 10.1186/s12859-021-03970-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/22/2020] [Accepted: 01/15/2021] [Indexed: 11/10/2022]
Abstract
Background: Tissues are often heterogeneous in their single-cell molecular expression, and this can govern the regulation of cell fate. For the understanding of development and disease, it is important to quantify heterogeneity in a given tissue.
Results: We present the R package stochprofML which uses the maximum likelihood principle to parameterize heterogeneity from the cumulative expression of small random pools of cells. We evaluate the algorithm's performance in simulation studies and present further application opportunities.
Conclusion: Stochastic profiling outweighs the necessary demixing of mixed samples with a saving in experimental cost and effort and less measurement error. It offers possibilities for parameterizing heterogeneity, estimating underlying pool compositions and detecting differences between cell populations between samples.
Affiliation(s)
- Lisa Amrhein
  - Institute of Computational Biology, Helmholtz Zentrum München, Ingolstädter Landstrasse 1, 85764 Neuherberg, Germany
  - Department of Mathematics, Technical University Munich, Boltzmannstrasse 3, 85748 Garching, Germany
- Christiane Fuchs
  - Institute of Computational Biology, Helmholtz Zentrum München, Ingolstädter Landstrasse 1, 85764 Neuherberg, Germany
  - Department of Mathematics, Technical University Munich, Boltzmannstrasse 3, 85748 Garching, Germany
  - Faculty of Business Administration and Economics, Bielefeld University, Universitätsstrasse 25, 33615 Bielefeld, Germany
10
Jaakkola MK, Elo LL. Computational deconvolution to estimate cell type-specific gene expression from bulk data. NAR Genom Bioinform 2021; 3:lqaa110. [PMID: 33575652 PMCID: PMC7803005 DOI: 10.1093/nargab/lqaa110] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 08/20/2020] [Revised: 12/14/2020] [Accepted: 12/17/2020] [Indexed: 12/24/2022]
Abstract
Computational deconvolution is a time and cost-efficient approach to obtain cell type-specific information from bulk gene expression of heterogeneous tissues like blood. Deconvolution can aim to either estimate cell type proportions or abundances in samples, or estimate how strongly each present cell type expresses different genes, or both tasks simultaneously. Among the two separate goals, the estimation of cell type proportions/abundances is widely studied, but less attention has been paid on defining the cell type-specific expression profiles. Here, we address this gap by introducing a novel method Rodeo and empirically evaluating it and the other available tools from multiple perspectives utilizing diverse datasets.
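The two deconvolution goals named in this abstract can be sketched for the simplest case of two cell types, assuming bulk expression is a proportion-weighted sum of the cell type-specific profiles. The Python below is only a least-squares toy under that linear mixing assumption; it is not the Rodeo method, and the function names and numbers are hypothetical.

```python
def estimate_proportion(bulk, profile_a, profile_b):
    """Goal 1: share of cell type A in one bulk sample, given both profiles.

    Least-squares fit of bulk = p * profile_a + (1 - p) * profile_b,
    clipped to the valid range [0, 1].
    """
    diff = [a - b for a, b in zip(profile_a, profile_b)]
    num = sum((m - b) * d for m, b, d in zip(bulk, profile_b, diff))
    den = sum(d * d for d in diff)
    return min(1.0, max(0.0, num / den))


def estimate_expression(bulk_values, share_a):
    """Goal 2: expression of one gene in cell types A and B, given proportions.

    Ordinary least squares on bulk = expr_b + share_a * (expr_a - expr_b),
    using one bulk value per sample.
    """
    n = len(share_a)
    mean_p = sum(share_a) / n
    mean_m = sum(bulk_values) / n
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(share_a, bulk_values))
    var = sum((p - mean_p) ** 2 for p in share_a)
    slope = cov / var
    expr_b = mean_m - slope * mean_p
    return expr_b + slope, expr_b


# Goal 1 on made-up profiles of three genes.
profile_a = [10.0, 0.0, 5.0]
profile_b = [0.0, 8.0, 5.0]
bulk_sample = [3.0, 5.6, 5.0]  # generated with 30% of cell type A
share = estimate_proportion(bulk_sample, profile_a, profile_b)

# Goal 2 on one made-up gene measured in three samples.
shares = [0.2, 0.5, 0.9]
bulk_gene = [3.6, 6.0, 9.2]    # generated with expr_a = 10, expr_b = 2
expr_a, expr_b = estimate_expression(bulk_gene, shares)
print(share, expr_a, expr_b)
```

Real tools handle many cell types, noise, and non-negativity constraints; the closed forms here only work because the toy problem has two components.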
Affiliation(s)
- Maria K Jaakkola
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, Tykistökatu 6, FI-20520 Turku, Finland
- Laura L Elo
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, Tykistökatu 6, FI-20520 Turku, Finland
11
Qin Y, Zhang W, Sun X, Nan S, Wei N, Wu HJ, Zheng X. Deconvolution of heterogeneous tumor samples using partial reference signals. PLoS Comput Biol 2020; 16:e1008452. [PMID: 33253170 PMCID: PMC7728196 DOI: 10.1371/journal.pcbi.1008452] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 05/15/2020] [Revised: 12/10/2020] [Accepted: 10/19/2020] [Indexed: 12/16/2022]
Abstract
Deconvolution of heterogeneous bulk tumor samples into distinct cellular populations is an important yet challenging problem, particularly when only partial references are available. A common approach to dealing with this problem is to deconvolve the mixed signals using available references and leverage the remaining signal as a new cell component. However, as indicated in our simulation, such an approach tends to over-estimate the proportions of known cell types and fails to detect novel cell types. Here, we propose PREDE, a partial reference-based deconvolution method using an iterative non-negative matrix factorization algorithm. Our method is verified to be effective in estimating cell proportions and expression profiles of unknown cell types based on simulated datasets at a variety of parameter settings. Applying our method to TCGA tumor samples, we found that proportions of pure cancer cells better indicate different subtypes of tumor samples. We also detected several cell types for each cancer type whose proportions successfully predicted patient survival. Our method makes a significant contribution to deconvolution of heterogeneous tumor samples and could be widely applied to varieties of high throughput bulk data. PREDE is implemented in R and is freely available from GitHub (https://xiaoqizheng.github.io/PREDE). Tumor tissues are mixtures of different cell types. Identification and quantification of constitutional cell types within tumor tissues are important tasks in cancer research. The problem can be readily solved using regression-based methods if reference signals are available. But in most clinical applications, only partial references are available, which significantly reduces the deconvolution accuracy of the existing regression-based methods. In this paper, we propose a partial-reference based deconvolution model, PREDE, integrating the non-negative matrix factorization framework with an iterative optimization strategy. 
We conducted comprehensive evaluations for PREDE using both simulation and real data analyses, demonstrating better performance of our method than other existing methods.
Affiliation(s)
- Yufang Qin
  - College of Information Technology, Shanghai Ocean University, Shanghai, China
  - Key Laboratory of Fisheries Information Ministry of Agriculture, Shanghai, China
- Weiwei Zhang
  - School of Science, East China University of Technology, Nanchang, Jiangxi, China
- Xiaoqiang Sun
  - Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, China
- Siwei Nan
  - Department of Mathematics, Shanghai Normal University, Shanghai, China
- Nana Wei
  - Department of Mathematics, Shanghai Normal University, Shanghai, China
- Hua-Jun Wu
  - Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Harvard School of Public Health, Boston, Massachusetts, United States of America
- Xiaoqi Zheng (corresponding author)
  - Department of Mathematics, Shanghai Normal University, Shanghai, China
12
Devaraj V, Bose B. DEBay: A computational tool for deconvolution of quantitative PCR data for estimation of cell type-specific gene expression in a mixed population. Heliyon 2020; 6:e04489. [PMID: 32728643 PMCID: PMC7381708 DOI: 10.1016/j.heliyon.2020.e04489] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2020] [Revised: 07/12/2020] [Accepted: 07/14/2020] [Indexed: 11/30/2022]
Abstract
The expression of a gene is commonly estimated by quantitative PCR (qPCR) using RNA isolated from a large number of pooled cells. Such pooled samples often have subpopulations of cells with different levels of expression of the target gene. Estimation of gene expression from an ensemble of cells obscures the pattern of expression in different subpopulations. Physical separation of various subpopulations is a demanding task. We have developed a computational tool, Deconvolution of Ensemble through Bayes-approach (DEBay), to estimate cell type-specific gene expression from qPCR data of a mixed population. DEBay estimates Normalized Gene Expression Coefficient (NGEC), which is a relative measure of the expression of the target gene in each cell type in a population. NGEC has a direct algebraic correspondence with the normalized fold change in gene expression measured by qPCR. DEBay can deconvolute both time-dependent and -independent gene expression profiles. It uses the Bayesian method of model selection and parameter estimation. We have evaluated DEBay using synthetic and real experimental data. DEBay is implemented in Python. A GUI of DEBay and its source code are available for download at SourceForge (https://sourceforge.net/projects/debay).
Affiliation(s)
- Vimalathithan Devaraj
  - Department of Biosciences and Bioengineering, Indian Institute of Technology Guwahati, Guwahati, 781039, India
- Biplab Bose
  - Department of Biosciences and Bioengineering, Indian Institute of Technology Guwahati, Guwahati, 781039, India
13
Li Z, Wu Z, Jin P, Wu H. Dissecting differential signals in high-throughput data from complex tissues. Bioinformatics 2020; 35:3898-3905. [PMID: 30903684 DOI: 10.1093/bioinformatics/btz196] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 10/17/2018] [Revised: 03/08/2019] [Accepted: 03/20/2019] [Indexed: 11/13/2022]
Abstract
Motivation: Samples from clinical practices are often mixtures of different cell types. The high-throughput data obtained from these samples are thus mixed signals. The cell mixture brings complications to data analysis, and will lead to biased results if not properly accounted for.
Results: We develop a method to model the high-throughput data from mixed, heterogeneous samples, and to detect differential signals. Our method allows flexible statistical inference for detecting a variety of cell-type specific changes. Extensive simulation studies and analyses of two real datasets demonstrate the favorable performance of our proposed method compared with existing ones serving similar purpose.
Availability and implementation: The proposed method is implemented as an R package and is freely available on GitHub (https://github.com/ziyili20/TOAST).
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ziyi Li
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA, USA
- Zhijin Wu
  - Department of Biostatistics, Brown University, Providence, RI, USA
- Peng Jin
  - Department of Human Genetics, Emory University, Atlanta, GA, USA
- Hao Wu
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA, USA
14
Li H, Sharma A, Luo K, Qin ZS, Sun X, Liu H. DeconPeaker, a Deconvolution Model to Identify Cell Types Based on Chromatin Accessibility in ATAC-Seq Data of Mixture Samples. Front Genet 2020; 11:392. [PMID: 32547592 PMCID: PMC7269180 DOI: 10.3389/fgene.2020.00392] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 01/09/2020] [Accepted: 03/30/2020] [Indexed: 12/26/2022]
Abstract
While our understanding of cellular and molecular processes has grown exponentially, issues related to the cell microenvironment and cellular heterogeneity have sparked a new debate concerning the cell identity. Cell composition (chromatin and nuclear architecture) poses a strong risk for dynamic changes in the diseased condition. Since chromatin accessibility patterns play a major role in human diseases, it is therefore anticipated that a deconvolution tool based on open chromatin data will provide better performance in identifying cell composition. Herein, we have designed the deconvolution tool "DeconPeaker," which can precisely define the uniqueness among subpopulations of cells using open chromatin datasets. Using this tool, we simultaneously evaluated chromatin accessibility and gene expression datasets to estimate cell types and their respective proportions in a mixture of samples. In comparison to other known deconvolution methods, we observed the lowest average root-mean-square error (RMSE = 0.042) and the highest average correlation coefficient (r = 0.919) between the prediction and "true" proportion. As a proof-of-concept, we also tested chromatin accessibility data from acute myeloid leukemia (AML) and successfully obtained unique cell types associated with AML progression. Furthermore, we showed that chromatin accessibility represents more essential characteristics in the identification of cell types than gene expression. Taken together, DeconPeaker as a powerful tool has the potential to combine different datasets (primarily, chromatin accessibility and gene expression) and define different cell types in mixtures. The Python package of DeconPeaker is now available at https://github.com/lihuamei/DeconPeaker.
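The two accuracy measures quoted in this abstract, root-mean-square error and the correlation coefficient between predicted and "true" proportions, can be computed as in the short Python sketch below. It assumes the correlation meant is Pearson's r, and the function names and example proportions are hypothetical; it does not use or reproduce DeconPeaker itself.

```python
import math


def rmse(predicted, truth):
    """Root-mean-square error between predicted and true proportions."""
    n = len(truth)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predicted, truth)) / n)


def pearson_r(x, y):
    """Pearson correlation coefficient; assumes neither input is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Made-up proportions for three cell types in one mixture.
predicted = [0.1, 0.4, 0.5]
truth = [0.2, 0.3, 0.5]
print(rmse(predicted, truth), pearson_r(predicted, truth))
```

Lower RMSE and higher r both indicate closer agreement, but they are not interchangeable: r ignores constant offsets and scaling, while RMSE penalises them.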
Affiliation(s)
- Huamei Li
  - State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
- Amit Sharma
  - Department of Ophthalmology, University Hospital Bonn, Bonn, Germany
- Kun Luo
  - Department of Neurosurgery, Xinjiang Evidence-Based Medicine Research Institute, First Affiliated Hospital of Xinjiang Medical University, Ürümqi, China
- Zhaohui S. Qin
  - Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA, United States
- Xiao Sun
  - State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
- Hongde Liu
  - State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
15
Kang K, Meng Q, Shats I, Umbach DM, Li M, Li Y, Li X, Li L. CDSeq: A novel complete deconvolution method for dissecting heterogeneous samples using gene expression data. PLoS Comput Biol 2019; 15:e1007510. [PMID: 31790389 PMCID: PMC6907860 DOI: 10.1371/journal.pcbi.1007510] [Citation(s) in RCA: 38] [Impact Index Per Article: 6.3] [Received: 07/03/2019] [Revised: 12/12/2019] [Accepted: 10/25/2019] [Indexed: 11/18/2022]
Abstract
Quantifying cell-type proportions and their corresponding gene expression profiles in tissue samples would enhance understanding of the contributions of individual cell types to the physiological states of the tissue. Current approaches that address tissue heterogeneity have drawbacks. Experimental techniques, such as fluorescence-activated cell sorting, and single cell RNA sequencing are expensive. Computational approaches that use expression data from heterogeneous samples are promising, but most of the current methods estimate either cell-type proportions or cell-type-specific expression profiles by requiring the other as input. Although such partial deconvolution methods have been successfully applied to tumor samples, the additional input required may be unavailable. We introduce a novel complete deconvolution method, CDSeq, that uses only RNA-Seq data from bulk tissue samples to simultaneously estimate both cell-type proportions and cell-type-specific expression profiles. Using several synthetic and real experimental datasets with known cell-type composition and cell-type-specific expression profiles, we compared CDSeq’s complete deconvolution performance with seven other established deconvolution methods. Complete deconvolution using CDSeq represents a substantial technical advance over partial deconvolution approaches and will be useful for studying cell mixtures in tissue samples. CDSeq is available at GitHub repository (MATLAB and Octave code): https://github.com/kkang7/CDSeq. Understanding the cellular composition of bulk tissues is critical to investigate the underlying mechanisms of many biological processes. Single cell sequencing is a promising technique, however, it is expensive and the analysis of single cell data is non-trivial. Therefore, tissue samples are still routinely processed in bulk. To estimate cell-type composition using bulk gene expression data, computational deconvolution methods are needed. 
Many deconvolution methods have been proposed, however, they often estimate only cell type proportions using a reference cell type gene expression profile, which in many cases may not be available. We present a novel complete deconvolution method that uses only bulk gene expression data to simultaneously estimate cell-type-specific gene expression profiles and sample-specific cell-type proportions. We showed that, using multiple RNA-Seq and microarray datasets where the cell-type composition was previously known, our method could accurately determine the cell-type composition. By providing a method that requires a single input to determine both cell-type proportion and cell-type-specific expression profiles, we expect that our method will be beneficial to biologists and facilitate the research and identification of mechanisms underlying many biological processes.
Affiliation(s)
- Kai Kang (corresponding author)
  - Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, North Carolina, United States of America
- Qian Meng
  - Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, North Carolina, United States of America
- Igor Shats
  - Signal Transduction Laboratory, National Institute of Environmental Health Sciences, Durham, North Carolina, United States of America
- David M. Umbach
  - Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, North Carolina, United States of America
- Melissa Li
  - Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, North Carolina, United States of America
- Yuanyuan Li
  - Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, North Carolina, United States of America
- Xiaoling Li
  - Signal Transduction Laboratory, National Institute of Environmental Health Sciences, Durham, North Carolina, United States of America
- Leping Li (corresponding author)
  - Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, North Carolina, United States of America
16
|
Danziger SA, Gibbs DL, Shmulevich I, McConnell M, Trotter MWB, Schmitz F, Reiss DJ, Ratushny AV. ADAPTS: Automated deconvolution augmentation of profiles for tissue specific cells. PLoS One 2019; 14:e0224693. [PMID: 31743345 PMCID: PMC6863530 DOI: 10.1371/journal.pone.0224693] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 07/11/2019] [Accepted: 10/18/2019] [Indexed: 12/19/2022] Open
Abstract
Immune cell infiltration of tumors and the tumor microenvironment can be an important component in determining patient outcomes. For example, immune and stromal cell presence inferred by deconvolving patient gene expression data may help identify high-risk patients or suggest a course of treatment. One particularly powerful family of deconvolution techniques uses signature matrices of genes that uniquely identify each cell type, as determined from purified single-cell-type gene expression data. Many methods from this family have been recently published, often including new signature matrices appropriate for a single purpose, such as investigating a specific type of tumor. The package ADAPTS helps users make the most of this expanding knowledge base by introducing a framework for cell type deconvolution. ADAPTS implements modular tools for customizing signature matrices for new tissue types by adding custom cell types or building new matrices de novo, including from single cell RNAseq data. It includes a common interface to several popular deconvolution algorithms that use a signature matrix to estimate the proportion of cell types present in heterogeneous samples. ADAPTS also implements a novel method for clustering cell types into groups that are difficult to distinguish by deconvolution and then re-splitting those clusters using hierarchical deconvolution. We demonstrate that the techniques implemented in ADAPTS improve the ability to reconstruct the cell types present in a single cell RNAseq data set in a blind predictive analysis. ADAPTS is currently available for use in R on CRAN and GitHub.
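The signature-matrix idea described in this abstract can be illustrated with a minimal sketch. It is not the ADAPTS implementation or its API: it assumes a small, hypothetical signature matrix and uses a plain projected-gradient non-negative least squares solver, one simple stand-in for the deconvolution algorithms such packages wrap. The function name `deconvolve` and all numbers are invented for illustration.

```python
def deconvolve(signature, bulk, n_iter=2000):
    """Estimate cell-type fractions with projected-gradient non-negative least squares.

    signature: list of gene rows, each holding one expression value per cell type.
    bulk:      list with the mixed-sample expression of the same genes.
    """
    n_types = len(signature[0])
    # The squared Frobenius norm bounds the largest eigenvalue of S^T S,
    # so its reciprocal is a safe (if conservative) gradient step size.
    step = 1.0 / sum(v * v for row in signature for v in row)
    weights = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        residual = [
            sum(s * w for s, w in zip(row, weights)) - b
            for row, b in zip(signature, bulk)
        ]
        gradient = [
            sum(row[k] * r for row, r in zip(signature, residual))
            for k in range(n_types)
        ]
        # Gradient step followed by projection onto the non-negative orthant.
        weights = [max(0.0, w - step * g) for w, g in zip(weights, gradient)]
    total = sum(weights)
    # Rescale so the estimates read as proportions (left unscaled if all are zero).
    return [w / total for w in weights] if total > 0 else weights


# Hypothetical signature matrix: 4 genes (rows) x 3 cell types (columns).
SIGNATURE = [
    [10.0, 0.0, 0.0],
    [0.0, 10.0, 0.0],
    [0.0, 0.0, 10.0],
    [5.0, 5.0, 0.0],
]
TRUE_FRACTIONS = [0.5, 0.3, 0.2]
# Simulated bulk profile: each gene is the fraction-weighted sum of its row.
BULK = [sum(s * f for s, f in zip(row, TRUE_FRACTIONS)) for row in SIGNATURE]

estimated = deconvolve(SIGNATURE, BULK)
print([round(x, 3) for x in estimated])
```

On this noise-free toy mixture the estimate should recover the simulated fractions; real signature matrices contain hundreds of genes and noisy measurements, where a dedicated solver would normally be preferred.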
Affiliation(s)
- Samuel A. Danziger
- Celgene Corporation, Seattle, Washington, United States of America
- * E-mail: (SAD); (AVR)
- David L. Gibbs
- Institute for Systems Biology, Seattle, Washington, United States of America
- Ilya Shmulevich
- Institute for Systems Biology, Seattle, Washington, United States of America
- Mark McConnell
- Celgene Corporation, Seattle, Washington, United States of America
- Matthew W. B. Trotter
- Celgene Corporation, Seattle, Washington, United States of America
- Celgene Institute for Translational Research Europe, Seville, Sevilla, Spain
- Frank Schmitz
- Celgene Corporation, Seattle, Washington, United States of America
- David J. Reiss
- Celgene Corporation, Seattle, Washington, United States of America
- Alexander V. Ratushny
- Celgene Corporation, Seattle, Washington, United States of America
- * E-mail: (SAD); (AVR)
|
17
|
Petralia F, Wang L, Peng J, Yan A, Zhu J, Wang P. A new method for constructing tumor specific gene co-expression networks based on samples with tumor purity heterogeneity. Bioinformatics 2019; 34:i528-i536. [PMID: 29949994 PMCID: PMC6022554 DOI: 10.1093/bioinformatics/bty280] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Indexed: 11/14/2022] Open
Abstract
Motivation: Tumor tissue samples often contain an unknown fraction of stromal cells. This problem, widely known as tumor purity heterogeneity (TPH), was recently recognized as a severe issue in omics studies. Specifically, if TPH is ignored when inferring co-expression networks, edges are likely to be estimated among genes with a mean shift between non-tumor and tumor cells rather than among gene pairs interacting with each other in tumor cells. To address this issue, we propose Tumor Specific Net (TSNet), a new method which constructs tumor-cell specific gene/protein co-expression networks based on gene/protein expression profiles of tumor tissues. TSNet treats the observed expression profile as a mixture of expressions from different cell types and explicitly models tumor purity percentage in each tumor sample. Results: Using extensive synthetic data experiments, we demonstrate that TSNet outperforms a standard graphical model which does not account for TPH. We then apply TSNet to estimate tumor specific gene co-expression networks based on TCGA ovarian cancer RNAseq data. We identify novel co-expression modules and hub structure specific to tumor cells. Availability and implementation: R code can be found at https://github.com/petraf01/TSNet. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Francesca Petralia
- Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Li Wang
- Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Sema4, a Mount Sinai Venture, Stamford, CT, USA
- Jie Peng
- Department of Statistics, University of California, Davis, Davis, CA, USA
- Arthur Yan
- Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Jun Zhu
- Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Sema4, a Mount Sinai Venture, Stamford, CT, USA
- Pei Wang
- Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
|
18
|
Avila Cobos F, Vandesompele J, Mestdagh P, De Preter K. Computational deconvolution of transcriptomics data from mixed cell populations. Bioinformatics 2019; 34:1969-1979. [PMID: 29351586 DOI: 10.1093/bioinformatics/bty019] [Citation(s) in RCA: 146] [Impact Index Per Article: 24.3] [Received: 08/18/2017] [Accepted: 01/10/2018] [Indexed: 12/22/2022] Open
Abstract
Summary: Gene expression analyses of bulk tissues often ignore cell type composition as an important confounding factor, resulting in a loss of signal from lowly abundant cell types. In this review, we highlight the importance and value of computational deconvolution methods to infer the abundance of different cell types and/or cell type-specific expression profiles in heterogeneous samples without performing physical cell sorting. We also explain the various deconvolution scenarios, the mathematical approaches used to solve them and the effect of data processing and different confounding factors on the accuracy of the deconvolution results. Contact: katleen.depreter@ugent.be. Supplementary information: Supplementary data are available at Bioinformatics online.
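The deconvolution scenarios mentioned in this abstract are commonly written as a linear mixing model. The notation below is an assumption made for illustration and is not taken from the review: X is the bulk expression matrix (g genes by n samples), S holds the cell-type-specific profiles (g genes by k cell types), and P holds the cell-type proportions (k cell types by n samples).

```latex
X \approx S\,P, \qquad
X \in \mathbb{R}_{\ge 0}^{g \times n},\quad
S \in \mathbb{R}_{\ge 0}^{g \times k},\quad
P \in \mathbb{R}_{\ge 0}^{k \times n},\quad
\sum_{c=1}^{k} P_{cj} = 1 \ \text{for every sample } j .
```

Under this notation, partial deconvolution estimates P given X and S (or S given X and P), whereas complete deconvolution estimates both S and P from X alone.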
Affiliation(s)
- Francisco Avila Cobos
- Center for Medical Genetics Ghent (CMGG), Ghent University, 9000 Ghent, Belgium; Cancer Research Institute Ghent (CRIG), 9000 Ghent, Belgium; Bioinformatics Institute Ghent from Nucleotides to Networks (BIG N2N), 9000 Ghent, Belgium
- Jo Vandesompele
- Center for Medical Genetics Ghent (CMGG), Ghent University, 9000 Ghent, Belgium; Cancer Research Institute Ghent (CRIG), 9000 Ghent, Belgium; Bioinformatics Institute Ghent from Nucleotides to Networks (BIG N2N), 9000 Ghent, Belgium
- Pieter Mestdagh
- Center for Medical Genetics Ghent (CMGG), Ghent University, 9000 Ghent, Belgium; Cancer Research Institute Ghent (CRIG), 9000 Ghent, Belgium; Bioinformatics Institute Ghent from Nucleotides to Networks (BIG N2N), 9000 Ghent, Belgium
- Katleen De Preter
- Center for Medical Genetics Ghent (CMGG), Ghent University, 9000 Ghent, Belgium; Cancer Research Institute Ghent (CRIG), 9000 Ghent, Belgium; Bioinformatics Institute Ghent from Nucleotides to Networks (BIG N2N), 9000 Ghent, Belgium
|
19
|
Boufaied N, Takhar M, Nash C, Erho N, Bismar TA, Davicioni E, Thomson AA. Development of a predictive model for stromal content in prostate cancer samples to improve signature performance. J Pathol 2019; 249:411-424. [PMID: 31206668 PMCID: PMC6900085 DOI: 10.1002/path.5315] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 11/27/2018] [Revised: 05/27/2019] [Accepted: 06/13/2019] [Indexed: 01/23/2023]
Abstract
Prostate cancer is heterogeneous in both cellular composition and patient outcome, and development of biomarker signatures to distinguish indolent from aggressive tumours is a high priority. Stroma plays an important role during prostate cancer progression and undergoes histological and transcriptional changes associated with disease. However, identification and validation of stromal markers is limited by a lack of datasets with defined stromal/tumour ratio. We have developed a prostate-selective signature to estimate the stromal content in cancer samples of mixed cellular composition. We identified stromal-specific markers from transcriptomic datasets of developmental prostate mesenchyme and prostate cancer stroma. These were experimentally validated in cell lines, datasets of known stromal content, and by immunohistochemistry in tissue samples to verify stromal-specific expression. Linear models based upon six transcripts were able to infer the stromal content and estimate stromal composition in mixed tissues. The best model had a coefficient of determination R² of 0.67. Application of our stromal content estimation model in various prostate cancer datasets led to improved performance of stromal predictive signatures for disease progression and metastasis. The stromal content of prostate tumours varies considerably; consequently, deconvolution of stromal proportion may yield better results than tumour cell deconvolution. We suggest that adjusting expression data for cell composition will improve stromal signature performance and lead to better prognosis and stratification of men with prostate cancer. © 2019 The Authors. The Journal of Pathology published by John Wiley & Sons Ltd on behalf of Pathological Society of Great Britain and Ireland.
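To make the coefficient of determination quoted in this abstract concrete, the sketch below fits an ordinary least squares line and computes R² as one minus the ratio of residual to total sum of squares. It is a deliberate simplification: the authors' models use six transcripts, whereas this example uses a single hypothetical marker, and every value in it is invented for illustration.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(x, y, intercept, slope):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical data: expression of one stromal marker per sample and the
# known stromal fraction of the same samples (both invented).
marker_expression = [1.0, 2.0, 3.0, 4.0, 5.0]
stromal_fraction = [0.15, 0.22, 0.41, 0.38, 0.60]

intercept, slope = fit_line(marker_expression, stromal_fraction)
r2 = r_squared(marker_expression, stromal_fraction, intercept, slope)
print(f"slope={slope:.3f} intercept={intercept:.3f} R^2={r2:.3f}")
```

An R² of 0.67, as reported for the best model, would mean that about two thirds of the variance in stromal content is explained by the fitted transcripts.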
Affiliation(s)
- Nadia Boufaied
- Division of Urology and Cancer Research Program, McGill University Health Centre Research Institute, Quebec, Canada
- Mandeep Takhar
- Research and Development, GenomeDx Biosciences, Vancouver, Canada
- Claire Nash
- Division of Urology and Cancer Research Program, McGill University Health Centre Research Institute, Quebec, Canada
- Nicholas Erho
- Research and Development, GenomeDx Biosciences, Vancouver, Canada
- Tarek A Bismar
- Department of Pathology and Laboratory Medicine, University of Calgary Cumming School of Medicine, Calgary, Canada; Department of Oncology, Biochemistry and Molecular Biology, University of Calgary Cumming School of Medicine, Calgary, Canada
- Elai Davicioni
- Research and Development, GenomeDx Biosciences, Vancouver, Canada
- Axel A Thomson
- Division of Urology and Cancer Research Program, McGill University Health Centre Research Institute, Quebec, Canada
|
20
|
Rombaut D, Chiu HS, Decaesteker B, Everaert C, Yigit N, Peltier A, Janoueix-Lerosey I, Bartenhagen C, Fischer M, Roberts S, D'Haene N, De Preter K, Speleman F, Denecker G, Sumazin P, Vandesompele J, Lefever S, Mestdagh P. Integrative analysis identifies lincRNAs up- and downstream of neuroblastoma driver genes. Sci Rep 2019; 9:5685. [PMID: 30952905 PMCID: PMC6451017 DOI: 10.1038/s41598-019-42107-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 01/14/2019] [Accepted: 03/20/2019] [Indexed: 12/13/2022] Open
Abstract
Long intergenic non-coding RNAs (lincRNAs) are emerging as integral components of signaling pathways in various cancer types. In neuroblastoma, only a handful of lincRNAs are known as upstream regulators or downstream effectors of oncogenes. Here, we exploit RNA sequencing data of primary neuroblastoma tumors, neuroblast precursor cells, neuroblastoma cell lines and various cellular perturbation model systems to define the neuroblastoma lincRNome and map lincRNAs up- and downstream of neuroblastoma driver genes MYCN, ALK and PHOX2B. Each of these driver genes controls the expression of a particular subset of lincRNAs, several of which are associated with poor survival and are differentially expressed in neuroblastoma tumors compared to neuroblasts. By integrating RNA sequencing data from both primary tumor tissue and cancer cell lines, we demonstrate that several of these lincRNAs are expressed in stromal cells. Deconvolution of primary tumor gene expression data revealed a strong association between stromal cell composition and driver gene status, resulting in differential expression of these lincRNAs. We also explored lincRNAs that putatively act upstream of neuroblastoma driver genes, either as presumed modulators of driver gene activity, or as modulators of effectors regulating driver gene expression. This analysis revealed strong associations between the neuroblastoma lincRNAs MIAT and MEG3 and MYCN and PHOX2B activity or expression. Together, our results provide a comprehensive catalogue of the neuroblastoma lincRNome, highlighting lincRNAs up- and downstream of key neuroblastoma driver genes. This catalogue forms a solid basis for further functional validation of candidate neuroblastoma lincRNAs.
Affiliation(s)
- Dries Rombaut
- Center for Medical Genetics, Ghent University, Ghent, 9000, Belgium; Cancer Research Institute Ghent (CRIG), Ghent, 9000, Belgium
- Hua-Sheng Chiu
- Texas Children's Cancer Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Bieke Decaesteker
- Center for Medical Genetics, Ghent University, Ghent, 9000, Belgium; Cancer Research Institute Ghent (CRIG), Ghent, 9000, Belgium
- Celine Everaert
- Center for Medical Genetics, Ghent University, Ghent, 9000, Belgium; Cancer Research Institute Ghent (CRIG), Ghent, 9000, Belgium
- Nurten Yigit
- Center for Medical Genetics, Ghent University, Ghent, 9000, Belgium; Cancer Research Institute Ghent (CRIG), Ghent, 9000, Belgium
- Agathe Peltier
- Institut Curie, PSL Research University, Inserm U830, Equipe Labellisée contre le Cancer, F-75005, Paris, France; SIREDO: Care, Innovation and Research for Children, Adolescents and Young Adults with Cancer, Institut Curie, F-75005, Paris, France
- Isabelle Janoueix-Lerosey
- Institut Curie, PSL Research University, Inserm U830, Equipe Labellisée contre le Cancer, F-75005, Paris, France; SIREDO: Care, Innovation and Research for Children, Adolescents and Young Adults with Cancer, Institut Curie, F-75005, Paris, France
- Christoph Bartenhagen
- Department of Experimental Pediatric Oncology, University Children's Hospital of Cologne, Medical Faculty, University of Cologne, 50937, Cologne, Germany
- Matthias Fischer
- Center for Molecular Medicine Cologne (CMMC), University of Cologne, 50931, Cologne, Germany; Department of Experimental Pediatric Oncology, University Children's Hospital of Cologne, Medical Faculty, University of Cologne, 50937, Cologne, Germany
- Stephen Roberts
- Department of Pediatrics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Nicky D'Haene
- Hôpital Erasme, Cliniques Universitaires de Bruxelles, Bruxelles, 1070, Belgium
- Katleen De Preter
- Center for Medical Genetics, Ghent University, Ghent, 9000, Belgium; Cancer Research Institute Ghent (CRIG), Ghent, 9000, Belgium
- Frank Speleman
- Center for Medical Genetics, Ghent University, Ghent, 9000, Belgium; Cancer Research Institute Ghent (CRIG), Ghent, 9000, Belgium
- Geertrui Denecker
- Center for Medical Genetics, Ghent University, Ghent, 9000, Belgium; Cancer Research Institute Ghent (CRIG), Ghent, 9000, Belgium
- Pavel Sumazin
- Texas Children's Cancer Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Jo Vandesompele
- Center for Medical Genetics, Ghent University, Ghent, 9000, Belgium; Cancer Research Institute Ghent (CRIG), Ghent, 9000, Belgium
- Steve Lefever
- Center for Medical Genetics, Ghent University, Ghent, 9000, Belgium; Cancer Research Institute Ghent (CRIG), Ghent, 9000, Belgium
- Pieter Mestdagh
- Center for Medical Genetics, Ghent University, Ghent, 9000, Belgium; Cancer Research Institute Ghent (CRIG), Ghent, 9000, Belgium
|
21
|
Dimitrakopoulou K, Wik E, Akslen LA, Jonassen I. Deblender: a semi-/unsupervised multi-operational computational method for complete deconvolution of expression data from heterogeneous samples. BMC Bioinformatics 2018; 19:408. [PMID: 30404611 PMCID: PMC6223087 DOI: 10.1186/s12859-018-2442-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 02/16/2018] [Accepted: 10/22/2018] [Indexed: 12/24/2022] Open
Abstract
BACKGROUND: Towards discovering robust cancer biomarkers, it is imperative to unravel the cellular heterogeneity of patient samples and comprehend the interactions between cancer cells and the various cell types in the tumor microenvironment. The first generation of 'partial' computational deconvolution methods required prior information either on the cell/tissue type proportions or the cell/tissue type-specific expression signatures and the number of involved cell/tissue types. The second generation of 'complete' approaches allowed estimating both the cell/tissue type proportions and the cell/tissue type-specific expression profiles directly from the mixed gene expression data, based on known (or automatically identified) cell/tissue type-specific marker genes. RESULTS: We present Deblender, a flexible complete deconvolution tool operating in semi-/unsupervised mode based on the user's access to known marker gene lists and information about cell/tissue composition. When no prior knowledge is available, global gene expression variability is used in clustering the mixed data to substitute marker sets with cluster sets. In addition, we integrate a model selection criterion to predict the number of constituent cell/tissue types. Moreover, we provide a tailored algorithmic scheme to estimate mixture proportions for realistic experimental cases where the number of involved cell/tissue types exceeds the number of mixed samples. We assess the performance of Deblender and a set of state-of-the-art existing tools on a comprehensive set of benchmark and patient cancer mixture expression datasets (including TCGA). CONCLUSION: Our results corroborate that Deblender can be a valuable tool to improve understanding of gene expression datasets with implications for prediction and clinical utilization. Deblender is implemented in MATLAB and is available from https://github.com/kondim1983/Deblender/.
Affiliation(s)
- Konstantina Dimitrakopoulou
- Centre for Cancer Biomarkers CCBIO, Department of Informatics, University of Bergen, Bergen, Norway; Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
- Elisabeth Wik
- Centre for Cancer Biomarkers CCBIO, Department of Clinical Medicine, Section for Pathology, University of Bergen, Bergen, Norway; Department of Pathology, Haukeland University Hospital, Bergen, Norway
- Lars A Akslen
- Centre for Cancer Biomarkers CCBIO, Department of Clinical Medicine, Section for Pathology, University of Bergen, Bergen, Norway; Department of Pathology, Haukeland University Hospital, Bergen, Norway
- Inge Jonassen
- Centre for Cancer Biomarkers CCBIO, Department of Informatics, University of Bergen, Bergen, Norway; Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
|
22
|
Stein-O'Brien GL, Arora R, Culhane AC, Favorov AV, Garmire LX, Greene CS, Goff LA, Li Y, Ngom A, Ochs MF, Xu Y, Fertig EJ. Enter the Matrix: Factorization Uncovers Knowledge from Omics. Trends Genet 2018; 34:790-805. [PMID: 30143323 PMCID: PMC6309559 DOI: 10.1016/j.tig.2018.07.003] [Citation(s) in RCA: 132] [Impact Index Per Article: 18.9] [Received: 03/23/2018] [Revised: 06/01/2018] [Accepted: 07/16/2018] [Indexed: 12/20/2022]
Abstract
Omics data contain signals from the molecular, physical, and kinetic inter- and intracellular interactions that control biological systems. Matrix factorization (MF) techniques can reveal low-dimensional structure from high-dimensional data that reflect these interactions. These techniques can uncover new biological knowledge from diverse high-throughput omics data in applications ranging from pathway discovery to timecourse analysis. We review exemplary applications of MF for systems-level analyses. We discuss appropriate applications of these methods, their limitations, and focus on the analysis of results to facilitate optimal biological interpretation. The inference of biologically relevant features with MF enables discovery from high-throughput data beyond the limits of current biological knowledge - answering questions from high-dimensional data that we have not yet thought to ask.
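As a hedged illustration of how a matrix factorization exposes low-dimensional structure, the sketch below applies non-negative matrix factorization with the Lee and Seung multiplicative update rules to a tiny matrix. NMF is only one member of the MF family, chosen here for illustration rather than because the review singles it out; the data, rank, and helper names are hypothetical.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m):
    """Return the transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - W H."""
    wh = matmul(w, h)
    return sum((a - b) ** 2 for vr, rr in zip(v, wh) for a, b in zip(vr, rr))


def nmf(v, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (m x n) into W (m x rank) and H (rank x n)."""
    rng = random.Random(seed)
    m, n = len(v), len(v[0])
    # Strictly positive random start so the multiplicative updates can move.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    h = [[rng.random() + 0.1 for _ in range(n)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[x * a / (b + eps) for x, a, b in zip(hr, nr, dr)]
             for hr, nr, dr in zip(h, num, den)]
        # W <- W * (V H^T) / (W H H^T), element-wise.
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[x * a / (b + eps) for x, a, b in zip(wr, nr, dr)]
             for wr, nr, dr in zip(w, num, den)]
    return w, h


# Hypothetical expression matrix: 4 genes (rows) x 3 samples (columns),
# constructed to have an exact non-negative rank-2 structure.
V = [
    [1.0, 2.0, 3.0],
    [3.0, 2.0, 1.0],
    [4.0, 4.0, 4.0],
    [5.0, 6.0, 7.0],
]

w0, h0 = nmf(V, rank=2, n_iter=0)   # random starting factors (no updates)
w, h = nmf(V, rank=2, n_iter=200)   # same start, after 200 updates
initial_error = frobenius_error(V, w0, h0)
final_error = frobenius_error(V, w, h)
print(f"reconstruction error: {initial_error:.3f} -> {final_error:.6f}")
```

In this reading, the columns of W act as gene-level patterns and the rows of H as their per-sample weights; interpreting such factors biologically still depends on the analysis steps the review discusses.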
Affiliation(s)
- Genevieve L Stein-O'Brien
- Department of Oncology, Division of Biostatistics and Bioinformatics, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins School of Medicine, Baltimore, MD, USA; Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, MD, USA; McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Raman Arora
- Department of Computer Science, Institute for Data Intensive Engineering and Science, Johns Hopkins University, Baltimore, MD, USA
- Aedin C Culhane
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA, USA; Department of Biostatistics, Harvard TH Chan School of Public Health, Boston, MA, USA
- Alexander V Favorov
- Department of Oncology, Division of Biostatistics and Bioinformatics, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins School of Medicine, Baltimore, MD, USA; Vavilov Institute of General Genetics, Moscow, Russia
- Casey S Greene
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, PA, USA; Childhood Cancer Data Lab, Alex's Lemonade Stand Foundation, PA, USA
- Loyal A Goff
- Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, MD, USA; McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Yifeng Li
- Digital Technologies Research Centre, National Research Council of Canada, Ottawa, ON, Canada
- Aloune Ngom
- School of Computer Science, University of Windsor, Windsor, ON, Canada
- Michael F Ochs
- Department of Mathematics and Statistics, The College of New Jersey, Ewing, NJ, USA
- Yanxun Xu
- Department of Applied Mathematics and Statistics, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD, USA
- Elana J Fertig
- Department of Oncology, Division of Biostatistics and Bioinformatics, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins School of Medicine, Baltimore, MD, USA
|
23
|
Dou H, Fang Y, Zheng X. Universal informative CpG sites for inferring tumor purity from DNA methylation microarray data. J Bioinform Comput Biol 2018; 16:1750030. [PMID: 29347875 DOI: 10.1142/s0219720017500305] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
Tumor purity is an intrinsic property of tumor samples and can severely affect many types of data analysis. We previously developed a statistical method, InfiniumPurify, which can infer the purity of a tumor sample given its tumor type (available in TCGA) or a set of informative CpG (iDMC) sites. However, in clinical practice, researchers may focus on a specific tumor type that is not included in TCGA and may have too few samples to identify reliable iDMCs. This greatly restricts the application of InfiniumPurify in cancer research. In this paper, we propose an updated version of InfiniumPurify (termed uiInfiniumPurify) by identifying a universal set of iDMCs (uiDMCs) and redesigning the algorithm to determine the hyper- and hypo-methylation status of each uiDMC. Using this method, we estimated the tumor purities of 8830 tumor samples from TCGA. The results show that our estimates are highly consistent with those of other available methods. Consequently, the updated uiInfiniumPurify can be applied to a single sample (or a few samples) of interest whose tumor type is not included in TCGA. This characteristic will greatly broaden the application of uiInfiniumPurify in cancer research.
Affiliation(s)
- Haixia Dou
- Department of Mathematics, Shanghai Normal University, Shanghai 200234, P. R. China
- Yun Fang
- Department of Mathematics, Shanghai Normal University, Shanghai 200234, P. R. China
- Xiaoqi Zheng
- Department of Mathematics, Shanghai Normal University, Shanghai 200234, P. R. China
|
24
|
Gogolewski K, Wronowska W, Lech A, Lesyng B, Gambin A. Inferring Molecular Processes Heterogeneity from Transcriptional Data. Biomed Res Int 2017; 2017:6961786. [PMID: 29362714 PMCID: PMC5736944 DOI: 10.1155/2017/6961786] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 06/09/2017] [Revised: 09/23/2017] [Accepted: 10/08/2017] [Indexed: 12/01/2022]
Abstract
RNA microarrays and RNA-seq are nowadays standard technologies for studying the transcriptional activity of cells. Most studies focus on tracking transcriptional changes caused by specific experimental conditions. Information on gene up- and downregulation is obtained by analyzing the behaviour of a relatively large population of cells and averaging its properties. However, even assuming perfect sample homogeneity, different subpopulations of cells can exhibit diverse transcriptomic profiles, as they may follow different regulatory/signaling pathways. The purpose of this study is to provide a novel methodological scheme to account for possible internal, functional heterogeneity in homogeneous cell lines, including cancer ones. We propose a novel computational method to infer the proportions of subpopulations of cells that manifest different functional behaviour in a given sample. Our method was validated using two datasets from RNA microarray experiments. Both experiments aimed to examine cell viability in specific experimental conditions. The presented methodology can be easily extended to RNA-seq data as well as other molecular processes. Moreover, it complements standard tools for identifying the most important networks from transcriptomic data and could be particularly useful in the analysis of cancer cell lines affected by biologically active compounds or drugs.
Affiliation(s)
- Krzysztof Gogolewski
- Institute of Informatics, University of Warsaw, Banacha 2, 02-097 Warsaw, Poland
- Weronika Wronowska
- Faculty of Biology, University of Warsaw, Miecznikowa 1, 02-096 Warsaw, Poland
- Agnieszka Lech
- College of Inter-Faculty Individual Studies in Mathematics and Natural Sciences, University of Warsaw, Banacha 2c, 02-097 Warsaw, Poland
- Bogdan Lesyng
- Bioinformatics Laboratory, Mossakowski Medical Research Centre, Polish Academy of Sciences, Pawińskiego 5, 02-106 Warsaw, Poland
- Anna Gambin
- Institute of Informatics, University of Warsaw, Banacha 2, 02-097 Warsaw, Poland
|
25
|
Ogundijo OE, Wang X. A sequential Monte Carlo approach to gene expression deconvolution. PLoS One 2017; 12:e0186167. [PMID: 29049343 PMCID: PMC5648148 DOI: 10.1371/journal.pone.0186167] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 02/26/2017] [Accepted: 09/26/2017] [Indexed: 01/06/2023] Open
Abstract
High-throughput gene expression data are often obtained from pure or complex (heterogeneous) biological samples. In the latter case, data obtained are a mixture of different cell types and the heterogeneity imposes some difficulties in the analysis of such data. In order to make conclusions on gene expression data obtained from heterogeneous samples, methods such as microdissection and flow cytometry have been employed to physically separate the constituting cell types. However, these manual approaches are time consuming when measuring the responses of multiple cell types simultaneously. In addition, exposed samples, on many occasions, end up being contaminated with external perturbations and this may result in an altered yield of molecular content. In this paper, we model the heterogeneous gene expression data using a Bayesian framework, treating the cell type proportions and the cell-type specific expressions as the parameters of the model. Specifically, we present a novel sequential Monte Carlo (SMC) sampler for estimating the model parameters by approximating their posterior distributions with a set of weighted samples. The SMC framework is a robust and efficient approach where we construct a sequence of artificial target (posterior) distributions on spaces of increasing dimensions which admit the distributions of interest as marginals. The proposed algorithm is evaluated on simulated datasets and publicly available real datasets, including Affymetrix oligonucleotide arrays and National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) data, with varying numbers of cell types. The results obtained on all datasets show a superior performance with an improved accuracy in the estimation of cell type proportions and the cell-type specific expressions, and in addition, more accurate identification of differentially expressed genes when compared to other widely known methods for blind decomposition of heterogeneous gene expression data such as DSection and the nonnegative matrix factorization (NMF) algorithms. MATLAB implementation of the proposed SMC algorithm is available to download at https://github.com/moyanre/smcgenedeconv.git.
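The mixture model behind this kind of blind deconvolution can be illustrated with a small sketch. The code below is not the authors' SMC sampler; it assumes the common linear model X = S · P (genes × samples = cell-type specific expression × cell-type proportions) and fits it with plain NMF multiplicative updates, one of the baseline families the abstract mentions. All function names, dimensions, and noise levels are hypothetical choices made for illustration.

```python
import numpy as np

def simulate_mixture(n_genes=200, n_types=3, n_samples=20, noise=0.01, seed=0):
    """Simulate heterogeneous expression X = S @ P + noise (assumed linear model)."""
    rng = np.random.default_rng(seed)
    # S: cell-type specific expression, one column per cell type
    S = rng.gamma(shape=2.0, scale=1.0, size=(n_genes, n_types))
    # P: cell-type proportions, one column per sample, each column sums to 1
    P = rng.dirichlet(np.ones(n_types), size=n_samples).T
    X = S @ P + noise * rng.standard_normal((n_genes, n_samples))
    return np.clip(X, 1e-9, None), S, P

def nmf_deconvolve(X, n_types, n_iter=500, seed=1):
    """Blind decomposition X ~ S @ P by Lee-Seung multiplicative updates (not SMC)."""
    rng = np.random.default_rng(seed)
    eps = 1e-12  # guards against division by zero
    S = rng.random((X.shape[0], n_types)) + 0.1
    P = rng.random((n_types, X.shape[1])) + 0.1
    for _ in range(n_iter):
        P *= (S.T @ X) / (S.T @ S @ P + eps)
        S *= (X @ P.T) / (S @ (P @ P.T) + eps)
    rel_err = np.linalg.norm(X - S @ P) / np.linalg.norm(X)
    # Heuristic: rescale each sample's column so it reads as proportions
    proportions = P / P.sum(axis=0, keepdims=True)
    return S, proportions, rel_err

X, S_true, P_true = simulate_mixture()
S_hat, P_hat, rel_err = nmf_deconvolve(X, n_types=3)
print("relative reconstruction error:", round(float(rel_err), 4))
```

NMF recovers cell types only up to permutation and scaling, so estimated columns have to be matched to known cell types before comparison. A Bayesian sampler such as the one described above additionally yields posterior uncertainty, which this point estimate does not.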
Affiliation(s)
- Oyetunji E. Ogundijo
- Department of Electrical Engineering, Columbia University, New York, New York, United States of America
- Xiaodong Wang
- Department of Electrical Engineering, Columbia University, New York, New York, United States of America

26
Zheng X, Zhang N, Wu HJ, Wu H. Estimating and accounting for tumor purity in the analysis of DNA methylation data from cancer studies. Genome Biol 2017; 18:17. [PMID: 28122605 PMCID: PMC5267453 DOI: 10.1186/s13059-016-1143-5] [Citation(s) in RCA: 93] [Impact Index Per Article: 11.6] [Received: 09/20/2016] [Accepted: 12/20/2016] [Indexed: 01/03/2023] Open
Abstract
We present a set of statistical methods for the analysis of DNA methylation microarray data, which account for tumor purity. These methods are an extension of our previously developed method for purity estimation; our updated method is flexible, efficient, and does not require data from reference samples or matched normal controls. We also present a method for incorporating purity information for differential methylation analysis. In addition, we propose a control-free differential methylation calling method when normal controls are not available. Extensive analyses of TCGA data demonstrate that our methods provide accurate results. All methods are implemented in InfiniumPurify.
Affiliation(s)
- Xiaoqi Zheng
- Department of Mathematics, Shanghai Normal University, Shanghai, 200234, China.
- Naiqian Zhang
- Department of Mathematics, Weifang University, Weifang, Shandong, 261061, China
- Hua-Jun Wu
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Harvard School of Public Health, Boston, MA, 02215, USA
- Hao Wu
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, 1518 Clifton Road, Atlanta, Georgia, 30322, USA.

27
Glass ER, Dozmorov MG. Improving sensitivity of linear regression-based cell type-specific differential expression deconvolution with per-gene vs. global significance threshold. BMC Bioinformatics 2016; 17:334. [PMID: 27766949 PMCID: PMC5073979 DOI: 10.1186/s12859-016-1226-z] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Indexed: 02/06/2023] Open
Abstract
Background: The goal of many human disease-oriented studies is to detect molecular mechanisms different between healthy controls and patients. Yet, commonly used gene expression measurements from blood samples suffer from variability of cell composition. This variability hinders the detection of differentially expressed genes and is often ignored. Combined with cell counts, heterogeneous gene expression may provide deeper insights into the gene expression differences on the cell type-specific level. Published computational methods use linear regression to estimate cell type-specific differential expression, and a global cutoff to judge significance, such as False Discovery Rate (FDR). Yet, they do not consider many artifacts hidden in high-dimensional gene expression data that may negatively affect linear regression. In this paper we quantify the parameter space affecting the performance of linear regression (sensitivity of cell type-specific differential expression detection) on a per-gene basis. Results: We evaluated the effect of sample sizes, cell type-specific proportion variability, and mean squared error on sensitivity of cell type-specific differential expression detection using linear regression. Each parameter affected variability of cell type-specific expression estimates and, subsequently, the sensitivity of differential expression detection. We provide the R package, LRCDE, which performs linear regression-based cell type-specific differential expression (deconvolution) detection on a gene-by-gene basis. Accounting for variability around cell type-specific gene expression estimates, it computes per-gene t-statistics of differential detection, p-values, t-statistic-based sensitivity, group-specific mean squared error, and several gene-specific diagnostic metrics.
Conclusions: The sensitivity of linear regression-based cell type-specific differential expression detection differed for each gene as a function of mean squared error, per group sample sizes, and variability of the proportions of target cell (cell type being analyzed). We demonstrate that LRCDE, which uses Welch’s t-test to compare per-gene cell type-specific gene expression estimates, is more sensitive in detecting cell type-specific differential expression at α < 0.05 missed by the global false discovery rate threshold FDR < 0.3.
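The regression idea summarized above can be sketched for a single gene as follows. This is a simplified illustration, not the LRCDE package: it assumes that measured expression in each sample is a proportion-weighted sum of cell type-specific expression, fits that by least squares within each group, and forms a Welch-style t-statistic from the per-group coefficient estimates and their standard errors. The function names, simulated proportions, and effect sizes are all hypothetical.

```python
import numpy as np

def cell_type_estimates(y, props):
    """Least-squares estimate of cell type-specific expression for one gene.

    y: (n_samples,) heterogeneous expression; props: (n_samples, n_types) proportions.
    Returns coefficient estimates and their standard errors.
    """
    coef, *_ = np.linalg.lstsq(props, y, rcond=None)
    resid = y - props @ coef
    dof = props.shape[0] - props.shape[1]
    mse = float(resid @ resid) / dof  # group-specific mean squared error
    se = np.sqrt(mse * np.diag(np.linalg.inv(props.T @ props)))
    return coef, se

def welch_like_t(coef_a, se_a, coef_b, se_b):
    """Unequal-variance t-statistic per cell type (p-values omitted in this sketch)."""
    return (coef_b - coef_a) / np.sqrt(se_a ** 2 + se_b ** 2)

# Simulated example: only cell type 1 differs between controls and cases
rng = np.random.default_rng(42)
n = 30
props_ctrl = rng.dirichlet(np.ones(3), size=n)
props_case = rng.dirichlet(np.ones(3), size=n)
true_ctrl = np.array([5.0, 2.0, 8.0])
true_case = np.array([5.0, 6.0, 8.0])
y_ctrl = props_ctrl @ true_ctrl + 0.3 * rng.standard_normal(n)
y_case = props_case @ true_case + 0.3 * rng.standard_normal(n)

coef_ctrl, se_ctrl = cell_type_estimates(y_ctrl, props_ctrl)
coef_case, se_case = cell_type_estimates(y_case, props_case)
t_stats = welch_like_t(coef_ctrl, se_ctrl, coef_case, se_case)
print("per-cell-type t-statistics:", np.round(t_stats, 2))
```

The standard errors shrink with larger groups, lower residual error, and more variable proportions, which is the per-gene dependence the abstract describes.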
Affiliation(s)
- Edmund R Glass
- Department of Biostatistics, Virginia Commonwealth University, School of Medicine, PO Box 980032, Richmond, VA, 23298, USA
- Mikhail G Dozmorov
- Department of Biostatistics, Virginia Commonwealth University, School of Medicine, PO Box 980032, Richmond, VA, 23298, USA.

28
Houseman EA, Kile ML, Christiani DC, Ince TA, Kelsey KT, Marsit CJ. Reference-free deconvolution of DNA methylation data and mediation by cell composition effects. BMC Bioinformatics 2016; 17:259. [PMID: 27358049 PMCID: PMC4928286 DOI: 10.1186/s12859-016-1140-4] [Citation(s) in RCA: 171] [Impact Index Per Article: 19.0] [Received: 04/04/2016] [Accepted: 06/19/2016] [Indexed: 12/28/2022] Open
Abstract
Background: Recent interest in reference-free deconvolution of DNA methylation data has led to several supervised methods, but these methods do not easily permit the interpretation of underlying cell types. Results: We propose a simple method for reference-free deconvolution that provides both proportions of putative cell types defined by their underlying methylomes, the number of these constituent cell types, as well as a method for evaluating the extent to which the underlying methylomes reflect specific types of cells. We demonstrate these methods in an analysis of 23 Infinium data sets from 13 distinct data collection efforts; these empirical evaluations show that our algorithm can reasonably estimate the number of constituent types, return cell proportion estimates that demonstrate anticipated associations with underlying phenotypic data; and methylomes that reflect the underlying biology of constituent cell types. Conclusions: Our methodology permits an explicit quantitation of the mediation of phenotypic associations with DNA methylation by cell composition effects. Although more work is needed to investigate functional information related to estimated methylomes, our proposed method provides a novel and useful foundation for conducting DNA methylation studies on heterogeneous tissues lacking reference data.
Affiliation(s)
- E Andres Houseman
- School of Biological and Population Health Sciences, College of Public Health and Human Sciences, Oregon State University, Corvallis, OR, USA.
- Molly L Kile
- School of Biological and Population Health Sciences, College of Public Health and Human Sciences, Oregon State University, Corvallis, OR, USA
- David C Christiani
- Department of Environmental Health, Harvard T. H. Chan School of Public Health, Boston, MA, USA
- Tan A Ince
- Department of Pathology, University of Miami, Miller School of Medicine, Miami, FL, USA
- Karl T Kelsey
- Department of Epidemiology, Department of Pathology and Laboratory Medicine, Brown University, Providence, USA
- Carmen J Marsit
- Department of Community and Family Medicine, Dartmouth Medical School, Hanover, NH, USA

29
Reinartz S, Finkernagel F, Adhikary T, Rohnalter V, Schumann T, Schober Y, Nockher WA, Nist A, Stiewe T, Jansen JM, Wagner U, Müller-Brüsselbach S, Müller R. A transcriptome-based global map of signaling pathways in the ovarian cancer microenvironment associated with clinical outcome. Genome Biol 2016; 17:108. [PMID: 27215396 PMCID: PMC4877997 DOI: 10.1186/s13059-016-0956-6] [Citation(s) in RCA: 80] [Impact Index Per Article: 8.9] [Received: 12/17/2015] [Accepted: 04/15/2016] [Indexed: 01/05/2023] Open
Abstract
BACKGROUND Soluble protein and lipid mediators play essential roles in the tumor environment, but their cellular origins, targets, and clinical relevance are only partially known. We have addressed this question for the most abundant cell types in human ovarian carcinoma ascites, namely tumor cells and tumor-associated macrophages. RESULTS Transcriptome-derived datasets were adjusted for errors caused by contaminating cell types by an algorithm using expression data derived from pure cell types as references. These data were utilized to construct a network of autocrine and paracrine signaling pathways comprising 358 common and 58 patient-specific signaling mediators and their receptors. RNA sequencing based predictions were confirmed for several proteins and lipid mediators. Published expression microarray results for 1018 patients were used to establish clinical correlations for a number of components with distinct cellular origins and target cells. Clear associations with early relapse were found for STAT3-inducing cytokines, specific components of WNT and fibroblast growth factor signaling, ephrin and semaphorin axon guidance molecules, and TGFβ/BMP-triggered pathways. An association with early relapse was also observed for secretory macrophage-derived phospholipase PLA2G7, its product arachidonic acid (AA) and signaling pathways controlled by the AA metabolites PGE2, PGI2, and LTB4. By contrast, the genes encoding norrin and its receptor frizzled 4, both selectively expressed by cancer cells and previously not linked to tumor suppression, show a striking association with a favorable clinical course. CONCLUSIONS We have established a signaling network operating in the ovarian cancer microenvironment with previously unidentified pathways and have defined clinically relevant components within this network.
Affiliation(s)
- Silke Reinartz
- Clinic for Gynecology, Gynecological Oncology and Gynecological Endocrinology, Center for Tumor Biology and Immunology (ZTI), Philipps University, Marburg, Germany
- Florian Finkernagel
- Institute of Molecular Biology and Tumor Research (IMT), Center for Tumor Biology and Immunology (ZTI), Philipps University, Hans-Meerwein-Str. 3, Marburg, 35043, Germany
- Till Adhikary
- Institute of Molecular Biology and Tumor Research (IMT), Center for Tumor Biology and Immunology (ZTI), Philipps University, Hans-Meerwein-Str. 3, Marburg, 35043, Germany
- Verena Rohnalter
- Institute of Molecular Biology and Tumor Research (IMT), Center for Tumor Biology and Immunology (ZTI), Philipps University, Hans-Meerwein-Str. 3, Marburg, 35043, Germany
- Tim Schumann
- Institute of Molecular Biology and Tumor Research (IMT), Center for Tumor Biology and Immunology (ZTI), Philipps University, Hans-Meerwein-Str. 3, Marburg, 35043, Germany
- Yvonne Schober
- Metabolomics Core Facility and Institute of Laboratory Medicine and Pathobiochemistry, Center for Tumor Biology and Immunology (ZTI), Philipps University, Marburg, Germany
- W Andreas Nockher
- Metabolomics Core Facility and Institute of Laboratory Medicine and Pathobiochemistry, Center for Tumor Biology and Immunology (ZTI), Philipps University, Marburg, Germany
- Andrea Nist
- Genomics Core Facility, Center for Tumor Biology and Immunology (ZTI), Philipps University, Marburg, Germany
- Thorsten Stiewe
- Genomics Core Facility, Center for Tumor Biology and Immunology (ZTI), Philipps University, Marburg, Germany
- Julia M Jansen
- Clinic for Gynecology, Gynecological Oncology and Gynecological Endocrinology, Center for Tumor Biology and Immunology (ZTI), Philipps University, Marburg, Germany
- Uwe Wagner
- Clinic for Gynecology, Gynecological Oncology and Gynecological Endocrinology, Center for Tumor Biology and Immunology (ZTI), Philipps University, Marburg, Germany
- Sabine Müller-Brüsselbach
- Institute of Molecular Biology and Tumor Research (IMT), Center for Tumor Biology and Immunology (ZTI), Philipps University, Hans-Meerwein-Str. 3, Marburg, 35043, Germany
- Rolf Müller
- Institute of Molecular Biology and Tumor Research (IMT), Center for Tumor Biology and Immunology (ZTI), Philipps University, Hans-Meerwein-Str. 3, Marburg, 35043, Germany.

30
Wang F, Zhang N, Wang J, Wu H, Zheng X. Tumor purity and differential methylation in cancer epigenomics. Brief Funct Genomics 2016; 15:408-419. [PMID: 27199459 DOI: 10.1093/bfgp/elw016] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Indexed: 12/21/2022] Open
Abstract
DNA methylation is an epigenetic modification of DNA molecule that plays a vital role in gene expression regulation. It is not only involved in many basic biological processes, but also considered an important factor for tumorigenesis and other human diseases. Study of DNA methylation has been an active field in cancer epigenomics research. With the advances of high-throughput technologies and the accumulation of enormous amount of data, method development for analyzing these data has gained tremendous interests in the fields of computational biology and bioinformatics. In this review, we systematically summarize the recent developments of computational methods and software tools in high-throughput methylation data analysis with focus on two aspects: differential methylation analysis and tumor purity estimation in cancer studies.

31
Gabitto MI, Pakman A, Bikoff JB, Abbott LF, Jessell TM, Paninski L. Bayesian Sparse Regression Analysis Documents the Diversity of Spinal Inhibitory Interneurons. Cell 2016; 165:220-233. [PMID: 26949187 DOI: 10.1016/j.cell.2016.01.026] [Citation(s) in RCA: 57] [Impact Index Per Article: 6.3] [Received: 07/27/2015] [Revised: 11/30/2015] [Accepted: 01/15/2016] [Indexed: 12/14/2022]
Abstract
Documenting the extent of cellular diversity is a critical step in defining the functional organization of tissues and organs. To infer cell-type diversity from partial or incomplete transcription factor expression data, we devised a sparse Bayesian framework that is able to handle estimation uncertainty and can incorporate diverse cellular characteristics to optimize experimental design. Focusing on spinal V1 inhibitory interneurons, for which the spatial expression of 19 transcription factors has been mapped, we infer the existence of ~50 candidate V1 neuronal types, many of which localize in compact spatial domains in the ventral spinal cord. We have validated the existence of inferred cell types by direct experimental measurement, establishing this Bayesian framework as an effective platform for cell-type characterization in the nervous system and elsewhere.
Affiliation(s)
- Mariano I Gabitto
- Department of Neuroscience, Columbia University, New York, NY 10032, USA; Department of Biochemistry and Molecular Biophysics, Howard Hughes Medical Institute, Kavli Institute for Brain Science, Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY 10032, USA.
- Ari Pakman
- Department of Statistics and Grossman Center for the Statistics of Mind, Columbia University, New York, NY 10027, USA
- Jay B Bikoff
- Department of Neuroscience, Columbia University, New York, NY 10032, USA; Department of Biochemistry and Molecular Biophysics, Howard Hughes Medical Institute, Kavli Institute for Brain Science, Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY 10032, USA
- L F Abbott
- Department of Neuroscience, Columbia University, New York, NY 10032, USA; Department of Physiology and Cellular Biophysics, Columbia University, New York, NY 10032, USA
- Thomas M Jessell
- Department of Neuroscience, Columbia University, New York, NY 10032, USA; Department of Biochemistry and Molecular Biophysics, Howard Hughes Medical Institute, Kavli Institute for Brain Science, Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY 10032, USA
- Liam Paninski
- Department of Neuroscience, Columbia University, New York, NY 10032, USA; Department of Statistics and Grossman Center for the Statistics of Mind, Columbia University, New York, NY 10027, USA.

32
Rautio S, Lähdesmäki H. MixChIP: a probabilistic method for cell type specific protein-DNA binding analysis. BMC Bioinformatics 2015; 16:413. [PMID: 26703974 PMCID: PMC4690251 DOI: 10.1186/s12859-015-0834-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/24/2015] [Accepted: 11/24/2015] [Indexed: 08/30/2023] Open
Abstract
Background Transcription factors (TFs) are proteins that bind to DNA and regulate gene expression. To understand details of gene regulation, characterizing TF binding sites in different cell types, diseases and among individuals is essential. However, sometimes TF binding can only be measured from biological samples that contain multiple cell or tissue types. Sample heterogeneity can have a considerable effect on TF binding site detection. While manual separation techniques can be used to isolate a cell type of interest from heterogeneous samples, such techniques are challenging and can change intra-cellular interactions, including protein-DNA binding. Computational deconvolution methods have emerged as an alternative strategy to study heterogeneous samples and numerous methods have been proposed to analyze gene expression. However, no computational method exists to deconvolve cell type specific TF binding from heterogeneous samples. Results We present a probabilistic method, MixChIP, to identify cell type specific TF binding sites from heterogeneous chromatin immunoprecipitation sequencing (ChIP-seq) data. Our method simultaneously estimates the binding strength in different cell types as well as the proportions of different cell types in each sample when only partial prior information about cell type composition is available. We demonstrate the utility of MixChIP by analyzing ChIP-seq data from two cell lines which we artificially mix to generate (simulated) heterogeneous samples and by analyzing ChIP-seq data from breast cancer patients measuring oestrogen receptor (ER) binding in primary breast cancer tissues. We show that MixChIP is more accurate in detecting TF binding sites from multiple heterogeneous ChIP-seq samples than the standard methods which do not account for sample heterogeneity. Conclusions Our results show that MixChIP can estimate cell-type proportions and identify cell type specific TF binding sites from heterogeneous ChIP-seq samples. 
Thus, MixChIP can be an invaluable tool in analyzing heterogeneous ChIP-seq samples, such as those originating from cancer studies. R implementation is available at http://research.ics.aalto.fi/csb/software/mixchip/.
Affiliation(s)
- Sini Rautio
- Department of Computer Science, Aalto University, Aalto, FI-00076, Finland.
- Harri Lähdesmäki
- Department of Computer Science, Aalto University, Aalto, FI-00076, Finland.

33
The influence of cancer tissue sampling on the identification of cancer characteristics. Sci Rep 2015; 5:15474. [PMID: 26490514 PMCID: PMC4614546 DOI: 10.1038/srep15474] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 03/27/2015] [Accepted: 09/24/2015] [Indexed: 12/21/2022] Open
Abstract
Cancer tissue sampling affects the identification of cancer characteristics. We aimed to clarify the source of differentially expressed genes (DEGs) in macro-dissected cancer tissue and develop a robust prognostic signature against the effects of tissue sampling. For estrogen receptor (ER)+ breast cancer patients, we identified DEGs in macro-dissected cancer tissues, malignant epithelial cells and stromal cells, defined as Macro-Dissected-DEGs, Epithelial-DEGs and Stromal-DEGs, respectively. Comparing Epithelial-DEGs to Stromal-DEGs (false discovery rate (FDR) < 10%), 86% of the overlapping genes exhibited consistent dysregulation (defined as Consistent-DEGs), and the other 14% of genes were dysregulated inconsistently (defined as Inconsistent-DEGs). The consistency score of dysregulation directions between Macro-Dissected-DEGs and Consistent-DEGs was 91% (P-value < 2.2 × 10−16, binomial test), whereas the score was only 52% between Macro-Dissected-DEGs and Inconsistent-DEGs (P-value = 0.9, binomial test). Among the gene ontology (GO) terms significantly enriched in Macro-Dissected-DEGs (FDR < 10%), 18 immune-related terms were enriched in Inconsistent-DEGs. DEGs associated with proliferation could reflect common changes of malignant epithelial and stromal cells; DEGs associated with immune functions are sensitive to the percentage of malignant epithelial cells in macro-dissected tissues. A prognostic signature which was insensitive to the cellular composition of macro-dissected tissues was developed and validated for ER+ breast patients.
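The consistency score and binomial test mentioned in this abstract can be made concrete with a small sketch. The gene names and direction calls below are invented for illustration and do not come from the study; the one-sided test against a 50% chance of agreement is also an assumption, since the abstract does not state the exact test setup.

```python
from math import comb

def consistency_score(dir_a, dir_b):
    """Count shared genes whose dysregulation direction (+1 up, -1 down) agrees."""
    shared = sorted(set(dir_a) & set(dir_b))
    same = sum(dir_a[g] == dir_b[g] for g in shared)
    return same, len(shared)

def binom_upper_tail(k, n, p=0.5):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Hypothetical direction calls for two DEG lists
epithelial = {"GENE_A": 1, "GENE_B": 1, "GENE_C": -1, "GENE_D": -1, "GENE_E": 1,
              "GENE_F": -1, "GENE_G": 1, "GENE_H": 1, "GENE_I": -1, "GENE_J": 1}
stromal = dict(epithelial, GENE_J=-1, GENE_K=1)  # one disagreement, one unshared gene

same, n_shared = consistency_score(epithelial, stromal)
score = same / n_shared
p_value = binom_upper_tail(same, n_shared)
print(f"consistency = {score:.0%} over {n_shared} genes, one-sided p = {p_value:.4f}")
```

With 9 of 10 shared genes agreeing, the tail probability is 11/1024, so agreement at that level would be unlikely under random direction calls. A score near 50% gives a large p-value, matching the qualitative contrast the abstract reports between its two comparisons.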

34
Dozmorov MG, Dominguez N, Bean K, Macwana SR, Roberts V, Glass E, James JA, Guthridge JM. B-Cell and Monocyte Contribution to Systemic Lupus Erythematosus Identified by Cell-Type-Specific Differential Expression Analysis in RNA-Seq Data. Bioinform Biol Insights 2015; 9:11-9. [PMID: 26512198 PMCID: PMC4599594 DOI: 10.4137/bbi.s29470] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 07/03/2015] [Revised: 08/24/2015] [Accepted: 08/26/2015] [Indexed: 12/18/2022] Open
Abstract
Systemic lupus erythematosus (SLE) is an autoimmune disease characterized by complex interplay among immune cell types. SLE activity is experimentally assessed by several blood tests, including gene expression profiling of heterogeneous populations of cells in peripheral blood. To better understand the contribution of different cell types in SLE pathogenesis, we applied the two methods in cell-type-specific differential expression analysis, csSAM and DSection, to identify cell-type-specific gene expression differences in heterogeneous gene expression measures obtained using RNA-seq technology. We identified B-cell-, monocyte-, and neutrophil-specific gene expression differences. Immunoglobulin-coding gene expression was altered in B-cells, while a ribosomal signature was prominent in monocytes. On the contrary, genes differentially expressed in the heterogeneous mixture of cells did not show any functional enrichment. Our results identify antigen binding and structural constituents of ribosomes as functions altered by B-cell- and monocyte-specific gene expression differences, respectively. Finally, these results position both csSAM and DSection methods as viable techniques for cell-type-specific differential expression analysis, which may help uncover pathogenic, cell-type-specific processes in SLE.
Affiliation(s)
- Mikhail G Dozmorov
- Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, USA
- Nicolas Dominguez
- Arthritis and Clinical Immunology, Oklahoma Medical Research Foundation, Oklahoma City, OK, USA
- Krista Bean
- Arthritis and Clinical Immunology, Oklahoma Medical Research Foundation, Oklahoma City, OK, USA
- Susan R Macwana
- Arthritis and Clinical Immunology, Oklahoma Medical Research Foundation, Oklahoma City, OK, USA
- Virginia Roberts
- Arthritis and Clinical Immunology, Oklahoma Medical Research Foundation, Oklahoma City, OK, USA
- Edmund Glass
- Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, USA
- Judith A James
- Arthritis and Clinical Immunology, Oklahoma Medical Research Foundation, Oklahoma City, OK, USA
- Joel M Guthridge
- Arthritis and Clinical Immunology, Oklahoma Medical Research Foundation, Oklahoma City, OK, USA

35
Anghel CV, Quon G, Haider S, Nguyen F, Deshwar AG, Morris QD, Boutros PC. ISOpureR: an R implementation of a computational purification algorithm of mixed tumour profiles. BMC Bioinformatics 2015; 16:156. [PMID: 25972088 PMCID: PMC4429941 DOI: 10.1186/s12859-015-0597-x] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Received: 11/21/2014] [Accepted: 04/27/2015] [Indexed: 01/23/2023] Open
Abstract
Background: Tumour samples containing distinct sub-populations of cancer and normal cells present challenges in the development of reproducible biomarkers, as these biomarkers are based on bulk signals from mixed tumour profiles. ISOpure is the only mRNA computational purification method to date that does not require a paired tumour-normal sample, provides a personalized cancer profile for each patient, and has been tested on clinical data. Replacing mixed tumour profiles with ISOpure-preprocessed cancer profiles led to better prognostic gene signatures for lung and prostate cancer. Results: To simplify the integration of ISOpure into standard R-based bioinformatics analysis pipelines, the algorithm has been implemented as an R package. The ISOpureR package performs analogously to the original code in estimating the fraction of cancer cells and the patient cancer mRNA abundance profile from tumour samples in four cancer datasets. Conclusions: The ISOpureR package estimates the fraction of cancer cells and personalized patient cancer mRNA abundance profile from a mixed tumour profile. This open-source R implementation enables integration into existing computational pipelines, as well as easy testing, modification and extension of the model.
Affiliation(s)
- Catalina V Anghel
- Informatics and Biocomputing Program, Ontario Institute for Cancer Research, 661 University Avenue, Toronto, Suite 510, M5G 0A3, ON, Canada.
- Gerald Quon
- Department of Computer Science, University of Toronto, 10 King's College Road, Room 3303, M5S 3G4, Toronto, ON, Canada.
- Syed Haider
- Informatics and Biocomputing Program, Ontario Institute for Cancer Research, 661 University Avenue, Toronto, Suite 510, M5G 0A3, ON, Canada; Department of Oncology, University of Oxford, Old Road Campus Research Building, Roosevelt Drive, Oxford, OX3 7DQ, United Kingdom.
- Francis Nguyen
- Informatics and Biocomputing Program, Ontario Institute for Cancer Research, 661 University Avenue, Toronto, Suite 510, M5G 0A3, ON, Canada.
- Amit G Deshwar
- Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto, 10 King's College, Room SFB540, Toronto, M5S 3G4, ON, Canada.
- Quaid D Morris
- Department of Computer Science, University of Toronto, 10 King's College Road, Room 3303, M5S 3G4, Toronto, ON, Canada; Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto, 10 King's College, Room SFB540, Toronto, M5S 3G4, ON, Canada; Department of Molecular Genetics, University of Toronto, 1 King's College Circle, Room 4396, Toronto, M4S 1A8, ON, Canada; The Donnelly Centre, 160 College Street, Room 230, Toronto, M5S 3E1, ON, Canada.
- Paul C Boutros
- Informatics and Biocomputing Program, Ontario Institute for Cancer Research, 661 University Avenue, Toronto, Suite 510, M5G 0A3, ON, Canada; Department of Medical Biophysics, University of Toronto, 101 College Street, Toronto, M5G 1L7, ON, Canada; Department of Pharmacology and Toxicology, University of Toronto, 1 King's College Circle, Toronto, M5S 1A8, ON, Canada.

36
Abstract
RNA sequencing (RNA-Seq) uses the capabilities of high-throughput sequencing methods to provide insight into the transcriptome of a cell. Compared to previous Sanger sequencing- and microarray-based methods, RNA-Seq provides far higher coverage and greater resolution of the dynamic nature of the transcriptome. Beyond quantifying gene expression, the data generated by RNA-Seq facilitate the discovery of novel transcripts, identification of alternatively spliced genes, and detection of allele-specific expression. Recent advances in the RNA-Seq workflow, from sample preparation to library construction to data analysis, have enabled researchers to further elucidate the functional complexity of the transcription. In addition to polyadenylated messenger RNA (mRNA) transcripts, RNA-Seq can be applied to investigate different populations of RNA, including total RNA, pre-mRNA, and noncoding RNA, such as microRNA and long ncRNA. This article provides an introduction to RNA-Seq methods, including applications, experimental design, and technical challenges.
Affiliation(s)
- Kimberly R Kukurba
- Department of Pathology, Stanford University School of Medicine, Stanford, California 94305; Department of Genetics, Stanford University School of Medicine, Stanford, California 94305
- Stephen B Montgomery
- Department of Pathology, Stanford University School of Medicine, Stanford, California 94305; Department of Genetics, Stanford University School of Medicine, Stanford, California 94305; Department of Computer Science, Stanford University School of Medicine, Stanford, California 94305

37
Klemm F, Joyce JA. Microenvironmental regulation of therapeutic response in cancer. Trends Cell Biol 2014; 25:198-213. [PMID: 25540894 DOI: 10.1016/j.tcb.2014.11.006] [Citation(s) in RCA: 552] [Impact Index Per Article: 50.2] [Received: 10/07/2014] [Revised: 11/20/2014] [Accepted: 11/21/2014] [Indexed: 02/08/2023]
Abstract
The tumor microenvironment (TME) not only plays a pivotal role during cancer progression and metastasis but also has profound effects on therapeutic efficacy. In the case of microenvironment-mediated resistance this can involve an intrinsic response, including the co-option of pre-existing structural elements and signaling networks, or an acquired response of the tumor stroma following the therapeutic insult. Alternatively, in other contexts, the TME has a multifaceted ability to enhance therapeutic efficacy. This review examines recent advances in our understanding of the contribution of the TME during cancer therapy and discusses key concepts that may be amenable to therapeutic intervention.
Affiliation(s)
- Florian Klemm
- Cancer Biology and Genetics Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Johanna A Joyce
- Cancer Biology and Genetics Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA.
|
38
|
Yoshihara K, Shahmoradgoli M, Martínez E, Vegesna R, Kim H, Torres-Garcia W, Treviño V, Shen H, Laird PW, Levine DA, Carter SL, Getz G, Stemke-Hale K, Mills GB, Verhaak RGW. Inferring tumour purity and stromal and immune cell admixture from expression data. Nat Commun 2014; 4:2612. [PMID: 24113773 PMCID: PMC3826632 DOI: 10.1038/ncomms3612] [Citation(s) in RCA: 6290] [Impact Index Per Article: 571.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/12/2013] [Accepted: 09/13/2013] [Indexed: 02/06/2023] Open
Abstract
Infiltrating stromal and immune cells form the major fraction of normal cells in tumour tissue and not only perturb the tumour signal in molecular studies but also have an important role in cancer biology. Here we describe ‘Estimation of STromal and Immune cells in MAlignant Tumours using Expression data’ (ESTIMATE)—a method that uses gene expression signatures to infer the fraction of stromal and immune cells in tumour samples. ESTIMATE scores correlate with DNA copy number-based tumour purity across samples from 11 different tumour types, profiled on Agilent, Affymetrix platforms or based on RNA sequencing and available through The Cancer Genome Atlas. The prediction accuracy is further corroborated using 3,809 transcriptional profiles available elsewhere in the public domain. The ESTIMATE method allows consideration of tumour-associated normal cells in genomic and transcriptomic studies. An R-library is available on https://sourceforge.net/projects/estimateproject/. Tumour biopsies contain contaminating normal cells and these can influence the analysis of tumour samples. In this study, Yoshihara et al. develop an algorithm based on gene expression profiles from The Cancer Genome Atlas to estimate the number of contaminating normal cells in tumour samples.
Affiliation(s)
- Kosuke Yoshihara
- [1] Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Centre, Houston, Texas 77030, USA; [2] Department of Obstetrics and Gynecology, Niigata University Graduate School of Medical and Dental Sciences, Niigata 951-8510, Japan
|
39
|
Yadav VK, De S. An assessment of computational methods for estimating purity and clonality using genomic data derived from heterogeneous tumor tissue samples. Brief Bioinform 2014; 16:232-41. [PMID: 24562872 DOI: 10.1093/bib/bbu002] [Citation(s) in RCA: 61] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/31/2022] Open
Abstract
Solid tumor samples typically contain multiple distinct clonal populations of cancer cells, and also stromal and immune cell contamination. A majority of the cancer genomics and transcriptomics studies do not explicitly consider genetic heterogeneity and impurity, and draw inferences based on mixed populations of cells. Deconvolution of genomic data from heterogeneous samples provides a powerful tool to address this limitation. We discuss several computational tools, which enable deconvolution of genomic and transcriptomic data from heterogeneous samples. We also performed a systematic comparative assessment of these tools. If properly used, these tools have potentials to complement single-cell genomics and immunoFISH analyses, and provide novel insights into tumor heterogeneity.
|
40
|
Parameterizing cell-to-cell regulatory heterogeneities via stochastic transcriptional profiles. Proc Natl Acad Sci U S A 2014; 111:E626-35. [PMID: 24449900 DOI: 10.1073/pnas.1311647111] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/07/2023] Open
Abstract
Regulated changes in gene expression underlie many biological processes, but globally profiling cell-to-cell variations in transcriptional regulation is problematic when measuring single cells. Transcriptome-wide identification of regulatory heterogeneities can be robustly achieved by randomly collecting small numbers of cells followed by statistical analysis. However, this stochastic-profiling approach blurs out the expression states of the individual cells in each pooled sample. Here, we show that the underlying distribution of single-cell regulatory states can be deconvolved from stochastic-profiling data through maximum-likelihood inference. Guided by the mechanisms of transcriptional regulation, we formulated plausible mixture models for cell-to-cell regulatory heterogeneity and maximized the resulting likelihood functions to infer model parameters. Inferences were validated both computationally and experimentally for different mixture models, which included regulatory states for multicellular function that were occupied by as few as 1 in 40 cells of the population. Importantly, when the method was extended to programs of heterogeneously coexpressed transcripts, we found that population-level inferences were much more accurate with pooled samples than with one-cell samples when the extent of sampling was limited. Our deconvolution method provides a means to quantify the heterogeneous regulation of molecular states efficiently and gain a deeper understanding of the heterogeneous execution of cell decisions.
|
41
|
Shen-Orr SS, Gaujoux R. Computational deconvolution: extracting cell type-specific information from heterogeneous samples. Curr Opin Immunol 2013; 25:571-8. [PMID: 24148234 DOI: 10.1016/j.coi.2013.09.015] [Citation(s) in RCA: 203] [Impact Index Per Article: 16.9] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/21/2013] [Revised: 09/22/2013] [Accepted: 09/30/2013] [Indexed: 12/31/2022]
Abstract
The quanta unit of the immune system is the cell, yet analyzed samples are often heterogeneous with respect to cell subsets which can mislead result interpretation. Experimentally, researchers face a difficult choice whether to profile heterogeneous samples with the ensuing confounding effects, or a priori focus on a few cell subsets of interest, potentially limiting new discoveries. An attractive alternative solution is to extract cell subset-specific information directly from heterogeneous samples via computational deconvolution techniques, thereby capturing both cell-centered and whole system level context. Such approaches are capable of unraveling novel biology, undetectable otherwise. Here we review the present state of available deconvolution techniques, their advantages and limitations, with a focus on blood expression data and immunological studies in general.
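The linear mixing idea behind many of the deconvolution techniques this review covers can be sketched as follows. This is a minimal illustration, not a method from the review: the signature matrix, the proportions, and the function name `estimate_proportions` are all hypothetical, and ordinary least squares with clipping stands in for the constrained solvers that real tools typically use.

```python
import numpy as np


def estimate_proportions(signatures, mixture):
    """Estimate cell-type proportions from a mixed expression profile.

    signatures: (genes x cell types) reference expression, linear scale.
    mixture:    (genes,) expression of the heterogeneous sample.
    Assumes mixture ~= signatures @ proportions (a linear mixing model).
    """
    # Unconstrained least-squares fit of the mixing weights.
    coef, *_ = np.linalg.lstsq(signatures, mixture, rcond=None)
    # Crude stand-in for a non-negativity constraint: clip, then renormalize.
    coef = np.clip(coef, 0.0, None)
    total = coef.sum()
    if total == 0:
        raise ValueError("all estimated weights are zero")
    return coef / total


# Hypothetical reference signatures for three cell subsets over four genes.
signatures = np.array([
    [10.0, 1.0, 2.0],
    [1.0, 8.0, 1.0],
    [2.0, 1.0, 9.0],
    [5.0, 5.0, 1.0],
])
true_props = np.array([0.6, 0.3, 0.1])

# Simulate a noise-free heterogeneous sample and recover its composition.
mixture = signatures @ true_props
estimated = estimate_proportions(signatures, mixture)
```

With noise-free input the fit recovers the simulated proportions. With real data, noise and collinear signatures make the constrained formulation matter much more.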
Affiliation(s)
- Shai S Shen-Orr
- Rappaport Institute of Medical Research, Technion-Israel Institute of Technology, Haifa 31096, Israel; Department of Immunology, Faculty of Medicine, Technion-Israel Institute of Technology, Haifa 31096, Israel; Faculty of Biology, Technion-Israel Institute of Technology, Haifa 31096, Israel.
|
42
|
Liebner DA, Huang K, Parvin JD. MMAD: microarray microdissection with analysis of differences is a computational tool for deconvoluting cell type-specific contributions from tissue samples. Bioinformatics 2013; 30:682-9. [PMID: 24085566 DOI: 10.1093/bioinformatics/btt566] [Citation(s) in RCA: 55] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022]
Abstract
BACKGROUND One of the significant obstacles in the development of clinically relevant microarray-derived biomarkers and classifiers is tissue heterogeneity. Physical cell separation techniques, such as cell sorting and laser-capture microdissection, can enrich samples for cell types of interest, but are costly, labor intensive and can limit investigation of important interactions between different cell types. RESULTS We developed a new computational approach, called microarray microdissection with analysis of differences (MMAD), which performs microdissection in silico. Notably, MMAD (i) allows for simultaneous estimation of cell fractions and gene expression profiles of contributing cell types, (ii) adjusts for microarray normalization bias, (iii) uses the corrected Akaike information criterion during model optimization to minimize overfitting and (iv) provides mechanisms for comparing gene expression and cell fractions between samples in different classes. Computational microdissection of simulated and experimental tissue mixture datasets showed tight correlations between predicted and measured gene expression of pure tissues as well as tight correlations between reported and estimated cell fraction for each of the individual cell types. In simulation studies, MMAD showed superior ability to detect differentially expressed genes in mixed tissue samples when compared with standard metrics, including both significance analysis of microarrays and cell type-specific significance analysis of microarrays. CONCLUSIONS We have developed a new computational tool called MMAD, which is capable of performing robust tissue microdissection in silico, and which can improve the detection of differentially expressed genes. MMAD software as implemented in MATLAB is publically available for download at http://sourceforge.net/projects/mmad/.
Affiliation(s)
- David A Liebner
- Division of Medical Oncology, Department of Internal Medicine, Department of Biomedical Informatics and Comprehensive Cancer Center, Biomedical Informatics Shared Resource, The Ohio State University, Columbus OH 43210, USA
|
43
|
Strino F, Parisi F, Micsinai M, Kluger Y. TrAp: a tree approach for fingerprinting subclonal tumor composition. Nucleic Acids Res 2013; 41:e165. [PMID: 23892400 PMCID: PMC3783191 DOI: 10.1093/nar/gkt641] [Citation(s) in RCA: 78] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/24/2013] [Revised: 06/11/2013] [Accepted: 07/02/2013] [Indexed: 01/01/2023] Open
Abstract
Revealing the clonal composition of a single tumor is essential for identifying cell subpopulations with metastatic potential in primary tumors or with resistance to therapies in metastatic tumors. Sequencing technologies provide only an overview of the aggregate of numerous cells. Computational approaches to de-mix a collective signal composed of the aberrations of a mixed cell population of a tumor sample into its individual components are not available. We propose an evolutionary framework for deconvolving data from a single genome-wide experiment to infer the composition, abundance and evolutionary paths of the underlying cell subpopulations of a tumor. We have developed an algorithm (TrAp) for solving this mixture problem. In silico analyses show that TrAp correctly deconvolves mixed subpopulations when the number of subpopulations and the measurement errors are moderate. We demonstrate the applicability of the method using tumor karyotypes and somatic hypermutation data sets. We applied TrAp to Exome-Seq experiment of a renal cell carcinoma tumor sample and compared the mutational profile of the inferred subpopulations to the mutational profiles of single cells of the same tumor. Finally, we deconvolve sequencing data from eight acute myeloid leukemia patients and three distinct metastases of one melanoma patient to exhibit the evolutionary relationships of their subpopulations.
Affiliation(s)
- Francesco Strino
- Department of Pathology, Yale University School of Medicine, New Haven, CT 06520, USA, NYU Center for Health Informatics and Bioinformatics, New York University Langone Medical Center, 227 East 30th Street, New York, NY 10016, USA and Yale Cancer Center, New Haven, CT 06520, USA
- Fabio Parisi
- Department of Pathology, Yale University School of Medicine, New Haven, CT 06520, USA, NYU Center for Health Informatics and Bioinformatics, New York University Langone Medical Center, 227 East 30th Street, New York, NY 10016, USA and Yale Cancer Center, New Haven, CT 06520, USA
- Mariann Micsinai
- Department of Pathology, Yale University School of Medicine, New Haven, CT 06520, USA, NYU Center for Health Informatics and Bioinformatics, New York University Langone Medical Center, 227 East 30th Street, New York, NY 10016, USA and Yale Cancer Center, New Haven, CT 06520, USA
- Yuval Kluger
- Department of Pathology, Yale University School of Medicine, New Haven, CT 06520, USA, NYU Center for Health Informatics and Bioinformatics, New York University Langone Medical Center, 227 East 30th Street, New York, NY 10016, USA and Yale Cancer Center, New Haven, CT 06520, USA
|
44
|
A self-directed method for cell-type identification and separation of gene expression microarrays. PLoS Comput Biol 2013; 9:e1003189. [PMID: 23990767 PMCID: PMC3749952 DOI: 10.1371/journal.pcbi.1003189] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/18/2012] [Accepted: 07/07/2013] [Indexed: 11/19/2022] Open
Abstract
Gene expression analysis is generally performed on heterogeneous tissue samples consisting of multiple cell types. Current methods developed to separate heterogeneous gene expression rely on prior knowledge of the cell-type composition and/or signatures - these are not available in most public datasets. We present a novel method to identify the cell-type composition, signatures and proportions per sample without need for a-priori information. The method was successfully tested on controlled and semi-controlled datasets and performed as accurately as current methods that do require additional information. As such, this method enables the analysis of cell-type specific gene expression using existing large pools of publically available microarray datasets. Gene expression microarrays are widely used to uncover biological insights. Most microarray experiments profile whole tissues containing mixtures of multiple cell-types. As such, gene expression differences between samples may be due to different cellular compositions or biological differences, highly limiting the conclusions derived from the analysis. All current approaches to computationally separate the heterogeneous gene expression to individual cell-types require that the identity, relative amount of the cell-types in the tissue or their individual gene expression are known. Publically available microarray-based datasets, which include thousands of patient samples, do not usually measure this information, rendering existing separation methods unusable. We developed a novel approach to estimate the number of cell-types, identities, individual gene expression and relative proportions in heterogeneous tissues with no a-priori information except for an initial estimate of the cell-types in the tissue analyzed and general reference signatures of these cell-types that may be easily obtained from public databases. 
We successfully applied our method to microarray datasets, yielding highly accurate estimations, which often exceed the performance of separation methods that require prior information. Thus, our method can be accurately applied to any heterogeneous dataset, where re-examination and analysis of the individual cell-types in the heterogeneous tissue can aid in discovering new aspects regarding these diseases.
|
45
|
Burdick JT, Murray JI. Deconvolution of gene expression from cell populations across the C. elegans lineage. BMC Bioinformatics 2013; 14:204. [PMID: 23800200 PMCID: PMC3704917 DOI: 10.1186/1471-2105-14-204] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/23/2013] [Accepted: 06/11/2013] [Indexed: 11/11/2022] Open
Abstract
Background Knowledge of when and in which cells each gene is expressed across multicellular organisms is critical in understanding both gene function and regulation of cell type diversity. However, methods for measuring expression typically involve a trade-off between imaging-based methods, which give the precise location of a limited number of genes, and higher throughput methods such as RNA-seq, which include all genes, but are more limited in their resolution to apply to many tissues. We propose an intermediate method, which estimates expression in individual cells, based on high-throughput measurements of expression from multiple overlapping groups of cells. This approach has particular benefits in organisms such as C. elegans where invariant developmental patterns make it possible to define these overlapping populations of cells at single-cell resolution. Result We implement several methods to deconvolve the gene expression in individual cells from population-level data and determine the accuracy of these estimates on simulated data from the C. elegans embryo. Conclusion These simulations suggest that a high-resolution map of expression in the C. elegans embryo may be possible with expression data from as few as 30 cell populations.
Affiliation(s)
- Joshua T Burdick
- Genomics and Computational Biology Group, University of Pennsylvania, 440 Clinical Research Building, 415 Curie Boulevard, Philadelphia, PA 19104, USA
|
46
|
Ahn J, Yuan Y, Parmigiani G, Suraokar MB, Diao L, Wistuba II, Wang W. DeMix: deconvolution for mixed cancer transcriptomes using raw measured data. Bioinformatics 2013; 29:1865-71. [PMID: 23712657 DOI: 10.1093/bioinformatics/btt301] [Citation(s) in RCA: 71] [Impact Index Per Article: 5.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022]
Abstract
MOTIVATION Tissue samples of tumor cells mixed with stromal cells cause underdetection of gene expression signatures associated with cancer prognosis or response to treatment. In silico dissection of mixed cell samples is essential for analyzing expression data generated in cancer studies. Currently, a systematic approach is lacking to address three challenges in computational deconvolution: (i) violation of linear addition of expression levels from multiple tissues when log-transformed microarray data are used; (ii) estimation of both tumor proportion and tumor-specific expression, when neither is known a priori; and (iii) estimation of expression profiles for individual patients. RESULTS We have developed a statistical method for deconvolving mixed cancer transcriptomes, DeMix, which addresses the aforementioned issues in array-based expression data. We demonstrate the performance of our model in synthetic and real, publicly available, datasets. DeMix can be applied to ongoing biomarker-based clinical studies and to the vast expression datasets previously generated from mixed tumor and stromal cell samples. AVAILABILITY All codes are written in C and integrated into an R function, which is available at http://odin.mdacc.tmc.edu/~wwang7/DeMix.html. CONTACT wwang7@mdanderson.org SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
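Challenge (i) in the abstract above can be illustrated with a small numeric sketch. It is not DeMix code: the expression values and the tumor fraction `pi` are made-up numbers, chosen only to show that a mixture that is additive on the raw scale is no longer additive after a log2 transform.

```python
import numpy as np

# Hypothetical raw-scale expression of three genes in pure tumor and pure stroma.
tumor = np.array([100.0, 20.0, 400.0])
stroma = np.array([10.0, 200.0, 400.0])
pi = 0.7  # assumed tumor fraction of the mixed sample

# On the raw scale, the mixed sample is a weighted sum of the two components.
mixed_raw = pi * tumor + (1.0 - pi) * stroma

# What a log-transformed measurement of the mixed sample reports ...
log_of_mix = np.log2(mixed_raw)
# ... versus what a linear model applied to log-transformed data would assume.
mix_of_logs = pi * np.log2(tumor) + (1.0 - pi) * np.log2(stroma)

# The gap is non-negative because the logarithm is concave, and it vanishes
# only for genes expressed equally in both components (the third gene here).
gap = log_of_mix - mix_of_logs
```

The size of `gap` grows with the difference between the two components, so the genes that distinguish tumor from stroma are the ones most distorted by fitting a linear model to log-scale data.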
Affiliation(s)
- Jaeil Ahn
- Department of Bioinformatics and Computational Biology and Department of Biostatistics, The University of Texas, MD Anderson Cancer Center, Houston, TX 77030, USA
|
47
|
Abstract
BACKGROUND RNA-seq, a next-generation sequencing based method for transcriptome analysis, is rapidly emerging as the method of choice for comprehensive transcript abundance estimation. The accuracy of RNA-seq can be highly impacted by the purity of samples. A prominent, outstanding problem in RNA-seq is how to estimate transcript abundances in heterogeneous tissues, where a sample is composed of more than one cell type and the inhomogeneity can substantially confound the transcript abundance estimation of each individual cell type. Although experimental methods have been proposed to dissect multiple distinct cell types, computationally "deconvoluting" heterogeneous tissues provides an attractive alternative, since it keeps the tissue sample as well as the subsequent molecular content yield intact. RESULTS Here we propose a probabilistic model-based approach, Transcript Estimation from Mixed Tissue samples (TEMT), to estimate the transcript abundances of each cell type of interest from RNA-seq data of heterogeneous tissue samples. TEMT incorporates positional and sequence-specific biases, and its online EM algorithm only requires a runtime proportional to the data size and a small constant memory. We test the proposed method on both simulation data and recently released ENCODE data, and show that TEMT significantly outperforms current state-of-the-art methods that do not take tissue heterogeneity into account. Currently, TEMT only resolves the tissue heterogeneity resulting from two cell types, but it can be extended to handle tissue heterogeneity resulting from multi cell types. TEMT is written in python, and is freely available at https://github.com/uci-cbcl/TEMT. CONCLUSIONS The probabilistic model-based approach proposed here provides a new method for analyzing RNA-seq data from heterogeneous tissue samples. 
By applying the method to both simulation data and ENCODE data, we show that explicitly accounting for tissue heterogeneity can significantly improve the accuracy of transcript abundance estimation.
Affiliation(s)
- Yi Li
- Department of Computer Science, University of California, Irvine, CA, USA
- Xiaohui Xie
- Department of Computer Science, University of California, Irvine, CA, USA
- Institute for Genomics and Bioinformatics, University of California, Irvine, CA, USA
- Center for Machine Learning and Intelligent Systems, University of California, Irvine, CA, USA
|
48
|
Quon G, Haider S, Deshwar AG, Cui A, Boutros PC, Morris Q. Computational purification of individual tumor gene expression profiles leads to significant improvements in prognostic prediction. Genome Med 2013; 5:29. [PMID: 23537167 PMCID: PMC3706990 DOI: 10.1186/gm433] [Citation(s) in RCA: 72] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/13/2012] [Accepted: 03/28/2013] [Indexed: 11/10/2022] Open
Abstract
Tumor heterogeneity is a limiting factor in cancer treatment and in the discovery of biomarkers to personalize it. We describe a computational purification tool, ISOpure, to directly address the effects of variable normal tissue contamination in clinical tumor specimens. ISOpure uses a set of tumor expression profiles and a panel of healthy tissue expression profiles to generate a purified cancer profile for each tumor sample and an estimate of the proportion of RNA originating from cancerous cells. Applying ISOpure before identifying gene signatures leads to significant improvements in the prediction of prognosis and other clinical variables in lung and prostate cancer.
|
49
|
Lehmusvaara S, Erkkilä T, Urbanucci A, Jalava S, Seppälä J, Kaipia A, Kujala P, Lähdesmäki H, Tammela TLJ, Visakorpi T. Goserelin and bicalutamide treatments alter the expression of microRNAs in the prostate. Prostate 2013; 73:101-12. [PMID: 22674191 DOI: 10.1002/pros.22545] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 02/21/2012] [Accepted: 05/14/2012] [Indexed: 12/19/2022]
Abstract
BACKGROUND Although endocrine therapy has been used for decades, its influence on the expression of microRNAs (miRNAs) in clinical tissue specimens has not been analyzed. Moreover, the effects of the TMPRSS2:ERG fusion on the expression of miRNAs in hormone naïve and endocrine-treated prostate cancers are poorly understood. METHODS We used clinical material from a neoadjuvant trial consisting of 28 men treated with goserelin (n = 8), bicalutamide (n = 9), or no treatment (n = 11) for 3 months prior to radical prostatectomy. Freshly frozen specimens were used for microarray analysis of 723 human miRNAs. Specific miRNA expression in cancer, benign epithelium and stromal tissue compartments was predicted with an in silico Bayesian modeling tool. RESULTS The expression of 52, 44, and 34 miRNAs was affected >1.4-fold by the endocrine treatment in the cancer, non-malignant epithelium, and stromal compartments, respectively. Of the 52 miRNAs, only 10 were equally affected by the two treatment modalities in the cancer compartment. Twenty-six of the 52 genes (50%) showed AR binding sites in their proximity in either VCaP or LNCaP cell lines. Forty-seven miRNAs were differentially expressed in TMPRSS2:ERG fusion positive compared with fusion negative cases. Endocrine treatment reduced the differences between fusion positive and negative cases. CONCLUSIONS Goserelin treatment and bicalutamide treatment mostly affected the expression of different miRNAs. The effect clearly varied in different tissue compartments. TMPRSS2:ERG fusion positive and negative cases showed differential expression of miRNAs, and the difference was diminished by androgen ablation.
Affiliation(s)
- Saara Lehmusvaara
- Institute of Biomedical Technology and BioMediTech, University of Tampere and Tampere University Hospital, Tampere, Finland
|
50
|
Lehmusvaara S, Erkkilä T, Urbanucci A, Waltering K, Seppälä J, Larjo A, Tuominen VJ, Isola J, Kujala P, Lähdesmäki H, Kaipia A, Tammela TL, Visakorpi T. Chemical castration and anti-androgens induce differential gene expression in prostate cancer. J Pathol 2012; 227:336-45. [PMID: 22431170 DOI: 10.1002/path.4027] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/08/2011] [Revised: 02/04/2012] [Accepted: 03/09/2012] [Indexed: 11/08/2022]
Abstract
Endocrine therapy by castration or anti-androgens is the gold standard treatment for advanced prostate cancer. Although it has been used for decades, the molecular consequences of androgen deprivation are incompletely known and biomarkers of its resistance are lacking. In this study, we studied the molecular mechanisms of hormonal therapy by comparing the effect of bicalutamide (anti-androgen), goserelin (GnRH agonist) and no therapy, followed by radical prostatectomy. For this purpose, 28 men were randomly assigned to treatment groups. Freshly frozen specimens were used for gene expression profiling for all known protein-coding genes. An in silico Bayesian modelling tool was used to assess cancer-specific gene expression from heterogeneous tissue specimens. The expression of 128 genes was > two-fold reduced by the treatments. Only 16% of the altered genes were common in both treatment groups. Of the 128 genes, only 24 were directly androgen-regulated genes, according to re-analysis of previous data on gene expression, androgen receptor-binding sites and histone modifications in prostate cancer cell line models. The tumours containing TMPRSS2-ERG fusion showed higher gene expression of genes related to proliferation compared to the fusion-negative tumours in untreated cases. Interestingly, endocrine therapy reduced the expression of one-half of these genes and thus diminished the differences between the fusion-positive and -negative samples. This study reports the significantly different effects of an anti-androgen and a GnRH agonist on gene expression in prostate cancer cells. TMPRSS2-ERG fusion seems to bring many proliferation-related genes under androgen regulation.
Affiliation(s)
- Saara Lehmusvaara
- Institute of Biomedical Technology and BioMediTech, University of Tampere and Tampere University Hospital, Finland
|