Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Erkkilä T, Lehmusvaara S, Ruusuvuori P, Visakorpi T, Shmulevich I, Lähdesmäki H. Probabilistic analysis of gene expression measurements from heterogeneous tissues. Bioinformatics 2010;26:2571-7. [PMID: 20631160 PMCID: PMC2951082 DOI: 10.1093/bioinformatics/btq406] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.6] [Indexed: 01/21/2023]

Cited by Other Article(s)

Li S, Zeng Y, He L, Xie X. Exploring Prognostic Immune Microenvironment-Related Genes in Head and Neck Squamous Cell Carcinoma from the TCGA Database. J Cancer 2024;15:632-644. [PMID: 38213736 PMCID: PMC10777048 DOI: 10.7150/jca.89581] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Accepted: 11/13/2023] [Indexed: 01/13/2024] Open

Abstract

Purpose: Head and neck squamous cell carcinoma (HNSCC) has a high rate of local and distant metastases. In tumor tissues, the interaction between tumor cells and the tumor microenvironment (TME) is closely related to cancer development and prognosis. Therefore, screening for TME-related genes in HNSCC is crucial for understanding metastatic patterns. Methods: Our research relied mainly on a novel algorithm called Estimation of STromal and Immune cells in MAlignant Tumors using Expression data (ESTIMATE). Fragments Per Kilobase of exon model per Million mapped fragments (FPKM) data and HNSCC clinical data were obtained from the TCGA database, and the purity of HNSCC tissue and the features of stromal and immune cell infiltration were determined. Furthermore, differentially expressed genes (DEGs) were screened based on immune, stromal, and ESTIMATE scores, and their protein-protein interaction (PPI) networks and ClueGO functions were evaluated. Finally, the expression profiles of DEGs related to immunity in HNSCC were determined. Differential gene expression was verified in the highly invasive oral cancer cell lines (SCC-25, CAL-27, and FaDu) and oral cancer tissues. Results: Our analysis found that both the immune and ESTIMATE scores were significantly associated with the prognosis of HNSCC. Moreover, cross-validation using the Venn algorithm revealed that 433 genes were significantly upregulated, and 394 genes were significantly downregulated. All DEGs were associated with both ESTIMATE and immune scores. The enrichment of cytokine-cytokine receptor interactions and chemokine signaling pathways was observed using pathway enrichment analyses. We initially screened 25 genes after analyzing the key sub-networks of the PPI network. Survival analysis revealed the significance of CCR4, CXCR3, P2RY14, CCR2, CCR8, and CCL19 in relation to survival and their association with immune infiltration-related metastasis in HNSCC. 
Conclusions: The expression profiles of relevant TME-related genes were screened following stromal and immune cell scoring using ESTIMATE, and DEGs associated with survival were identified. These TME-related gene markers offer valuable utility as both prognostic indicators and markers denoting metastatic traits in HNSCC.

Tiwari A, Trivedi R, Lin SY. Tumor microenvironment: barrier or opportunity towards effective cancer therapy. J Biomed Sci 2022;29:83. [PMID: 36253762 PMCID: PMC9575280 DOI: 10.1186/s12929-022-00866-3] [Citation(s) in RCA: 183] [Impact Index Per Article: 61.0] [Received: 04/21/2022] [Accepted: 10/01/2022] [Indexed: 12/24/2022] Open

Shi C, Zhu J, Shen Y, Luo S, Zhu H, Song R. Off-Policy Confidence Interval Estimation with Confounded Markov Decision Process. J Am Stat Assoc 2022. [DOI: 10.1080/01621459.2022.2110876] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/15/2022]

Jaakkola MK, Elo LL. Estimating cell type-specific differential expression using deconvolution. Brief Bioinform 2021;23:6396788. [PMID: 34651640 PMCID: PMC8769698 DOI: 10.1093/bib/bbab433] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/03/2021] [Revised: 09/17/2021] [Accepted: 09/23/2021] [Indexed: 12/02/2022] Open

Spade DA. A Monte Carlo integration approach to estimating drift and minorization coefficients for Metropolis–Hastings samplers. BRAZ J PROBAB STAT 2021. [DOI: 10.1214/20-bjps486] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/19/2022]

Kang K, Huang C, Li Y, Umbach DM, Li L. CDSeqR: fast complete deconvolution for gene expression data from bulk tissues. BMC Bioinformatics 2021;22:262. [PMID: 34030626 PMCID: PMC8142515 DOI: 10.1186/s12859-021-04186-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 09/24/2020] [Accepted: 05/12/2021] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

Biological tissues consist of heterogeneous populations of cells. Because gene expression patterns from bulk tissue samples reflect the contributions from all cells in the tissue, understanding the contribution of individual cell types to the overall gene expression in the tissue is fundamentally important. We recently developed a computational method, CDSeq, that can simultaneously estimate both sample-specific cell-type proportions and cell-type-specific gene expression profiles using only bulk RNA-Seq counts from multiple samples. Here we present an R implementation of CDSeq (CDSeqR) with significant performance improvement over the original implementation in MATLAB and an added new function to aid cell type annotation. The R package would be of interest for the broader R community.

RESULT

We developed a novel strategy to substantially improve computational efficiency in both speed and memory usage. In addition, we designed and implemented a new function for annotating the CDSeq estimated cell types using single-cell RNA sequencing (scRNA-seq) data. This function allows users to readily interpret and visualize the CDSeq estimated cell types. In addition, this new function further allows the users to annotate CDSeq-estimated cell types using marker genes. We carried out additional validations of the CDSeqR software using synthetic, real cell mixtures, and real bulk RNA-seq data from the Cancer Genome Atlas (TCGA) and the Genotype-Tissue Expression (GTEx) project.

CONCLUSIONS

The existing bulk RNA-seq repositories, such as TCGA and GTEx, provide enormous resources for better understanding changes in transcriptomics and human diseases. They are also potentially useful for studying cell-cell interactions in the tissue microenvironment. Bulk level analyses neglect tissue heterogeneity, however, and hinder investigation of cell-type-specific expression. The CDSeqR package may aid in silico dissection of bulk expression data, enabling researchers to recover cell-type-specific information.

Bayesian Joint Modeling of Single-Cell Expression Data and Bulk Spatial Transcriptomic Data. STATISTICS IN BIOSCIENCES 2021. [DOI: 10.1007/s12561-021-09308-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]

Takeuchi F, Kato N. Nonlinear ridge regression improves cell-type-specific differential expression analysis. BMC Bioinformatics 2021;22:141. [PMID: 33752591 PMCID: PMC7986289 DOI: 10.1186/s12859-021-03982-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/25/2020] [Accepted: 01/27/2021] [Indexed: 12/28/2022] Open

Amrhein L, Fuchs C. stochprofML: stochastic profiling using maximum likelihood estimation in R. BMC Bioinformatics 2021;22:123. [PMID: 33722188 PMCID: PMC7958472 DOI: 10.1186/s12859-021-03970-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/22/2020] [Accepted: 01/15/2021] [Indexed: 11/10/2022] Open

Jaakkola MK, Elo LL. Computational deconvolution to estimate cell type-specific gene expression from bulk data. NAR Genom Bioinform 2021;3:lqaa110. [PMID: 33575652 PMCID: PMC7803005 DOI: 10.1093/nargab/lqaa110] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 08/20/2020] [Revised: 12/14/2020] [Accepted: 12/17/2020] [Indexed: 12/24/2022] Open

Qin Y, Zhang W, Sun X, Nan S, Wei N, Wu HJ, Zheng X. Deconvolution of heterogeneous tumor samples using partial reference signals. PLoS Comput Biol 2020;16:e1008452. [PMID: 33253170 PMCID: PMC7728196 DOI: 10.1371/journal.pcbi.1008452] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 05/15/2020] [Revised: 12/10/2020] [Accepted: 10/19/2020] [Indexed: 12/16/2022] Open

Abstract

Deconvolution of heterogeneous bulk tumor samples into distinct cellular populations is an important yet challenging problem, particularly when only partial references are available. A common approach to dealing with this problem is to deconvolve the mixed signals using available references and leverage the remaining signal as a new cell component. However, as indicated in our simulation, such an approach tends to over-estimate the proportions of known cell types and fails to detect novel cell types. Here, we propose PREDE, a partial reference-based deconvolution method using an iterative non-negative matrix factorization algorithm. Our method is verified to be effective in estimating cell proportions and expression profiles of unknown cell types based on simulated datasets at a variety of parameter settings. Applying our method to TCGA tumor samples, we found that proportions of pure cancer cells better indicate different subtypes of tumor samples. We also detected several cell types for each cancer type whose proportions successfully predicted patient survival. Our method makes a significant contribution to deconvolution of heterogeneous tumor samples and could be widely applied to varieties of high throughput bulk data. PREDE is implemented in R and is freely available from GitHub (https://xiaoqizheng.github.io/PREDE).

Tumor tissues are mixtures of different cell types. Identification and quantification of constitutional cell types within tumor tissues are important tasks in cancer research. The problem can be readily solved using regression-based methods if reference signals are available. But in most clinical applications, only partial references are available, which significantly reduces the deconvolution accuracy of the existing regression-based methods. In this paper, we propose a partial-reference based deconvolution model, PREDE, integrating the non-negative matrix factorization framework with an iterative optimization strategy. We conducted comprehensive evaluations for PREDE using both simulation and real data analyses, demonstrating better performance of our method than other existing methods.
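The regression-based setting mentioned above, in which reference signals for every cell type are available, can be illustrated with non-negative least squares. This is only a minimal sketch on synthetic data, not the PREDE algorithm: the function name `estimate_proportions`, the toy matrix sizes, and the noise-free mixture are all assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import nnls


def estimate_proportions(reference, bulk):
    """Estimate cell-type proportions for one bulk sample.

    reference: (genes x cell_types) matrix of reference expression profiles.
    bulk:      (genes,) vector of mixed expression values.
    Returns non-negative proportions rescaled to sum to 1.
    """
    coef, _residual_norm = nnls(reference, bulk)
    total = coef.sum()
    if total == 0:
        raise ValueError("bulk profile is not explained by any reference profile")
    return coef / total


# Synthetic example (hypothetical data): 50 genes, 3 cell types.
rng = np.random.default_rng(0)
reference = rng.uniform(1.0, 10.0, size=(50, 3))
true_props = np.array([0.6, 0.3, 0.1])
bulk = reference @ true_props  # noise-free linear mixture

estimated = estimate_proportions(reference, bulk)
```

When a cell type is missing from `reference`, its signal is forced onto the known columns, which is one way to see why the abstract reports over-estimated proportions under partial references.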

Devaraj V, Bose B. DEBay: A computational tool for deconvolution of quantitative PCR data for estimation of cell type-specific gene expression in a mixed population. Heliyon 2020;6:e04489. [PMID: 32728643 PMCID: PMC7381708 DOI: 10.1016/j.heliyon.2020.e04489] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2020] [Revised: 07/12/2020] [Accepted: 07/14/2020] [Indexed: 11/30/2022] Open

Li Z, Wu Z, Jin P, Wu H. Dissecting differential signals in high-throughput data from complex tissues. Bioinformatics 2020;35:3898-3905. [PMID: 30903684 DOI: 10.1093/bioinformatics/btz196] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 10/17/2018] [Revised: 03/08/2019] [Accepted: 03/20/2019] [Indexed: 11/13/2022] Open

Li H, Sharma A, Luo K, Qin ZS, Sun X, Liu H. DeconPeaker, a Deconvolution Model to Identify Cell Types Based on Chromatin Accessibility in ATAC-Seq Data of Mixture Samples. Front Genet 2020;11:392. [PMID: 32547592 PMCID: PMC7269180 DOI: 10.3389/fgene.2020.00392] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 01/09/2020] [Accepted: 03/30/2020] [Indexed: 12/26/2022] Open

Kang K, Meng Q, Shats I, Umbach DM, Li M, Li Y, Li X, Li L. CDSeq: A novel complete deconvolution method for dissecting heterogeneous samples using gene expression data. PLoS Comput Biol 2019;15:e1007510. [PMID: 31790389 PMCID: PMC6907860 DOI: 10.1371/journal.pcbi.1007510] [Citation(s) in RCA: 38] [Impact Index Per Article: 6.3] [Received: 07/03/2019] [Revised: 12/12/2019] [Accepted: 10/25/2019] [Indexed: 11/18/2022] Open

Abstract

Quantifying cell-type proportions and their corresponding gene expression profiles in tissue samples would enhance understanding of the contributions of individual cell types to the physiological states of the tissue. Current approaches that address tissue heterogeneity have drawbacks. Experimental techniques, such as fluorescence-activated cell sorting, and single cell RNA sequencing are expensive. Computational approaches that use expression data from heterogeneous samples are promising, but most of the current methods estimate either cell-type proportions or cell-type-specific expression profiles by requiring the other as input. Although such partial deconvolution methods have been successfully applied to tumor samples, the additional input required may be unavailable. We introduce a novel complete deconvolution method, CDSeq, that uses only RNA-Seq data from bulk tissue samples to simultaneously estimate both cell-type proportions and cell-type-specific expression profiles. Using several synthetic and real experimental datasets with known cell-type composition and cell-type-specific expression profiles, we compared CDSeq’s complete deconvolution performance with seven other established deconvolution methods. Complete deconvolution using CDSeq represents a substantial technical advance over partial deconvolution approaches and will be useful for studying cell mixtures in tissue samples. CDSeq is available at GitHub repository (MATLAB and Octave code): https://github.com/kkang7/CDSeq.

Understanding the cellular composition of bulk tissues is critical to investigate the underlying mechanisms of many biological processes. Single cell sequencing is a promising technique; however, it is expensive and the analysis of single cell data is non-trivial. Therefore, tissue samples are still routinely processed in bulk. To estimate cell-type composition using bulk gene expression data, computational deconvolution methods are needed. Many deconvolution methods have been proposed; however, they often estimate only cell type proportions using a reference cell type gene expression profile, which in many cases may not be available. We present a novel complete deconvolution method that uses only bulk gene expression data to simultaneously estimate cell-type-specific gene expression profiles and sample-specific cell-type proportions. We showed that, using multiple RNA-Seq and microarray datasets where the cell-type composition was previously known, our method could accurately determine the cell-type composition. By providing a method that requires a single input to determine both cell-type proportion and cell-type-specific expression profiles, we expect that our method will be beneficial to biologists and facilitate the research and identification of mechanisms underlying many biological processes.

Danziger SA, Gibbs DL, Shmulevich I, McConnell M, Trotter MWB, Schmitz F, Reiss DJ, Ratushny AV. ADAPTS: Automated deconvolution augmentation of profiles for tissue specific cells. PLoS One 2019;14:e0224693. [PMID: 31743345 PMCID: PMC6863530 DOI: 10.1371/journal.pone.0224693] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 07/11/2019] [Accepted: 10/18/2019] [Indexed: 12/19/2022] Open

Petralia F, Wang L, Peng J, Yan A, Zhu J, Wang P. A new method for constructing tumor specific gene co-expression networks based on samples with tumor purity heterogeneity. Bioinformatics 2019;34:i528-i536. [PMID: 29949994 PMCID: PMC6022554 DOI: 10.1093/bioinformatics/bty280] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Indexed: 11/14/2022] Open

Avila Cobos F, Vandesompele J, Mestdagh P, De Preter K. Computational deconvolution of transcriptomics data from mixed cell populations. Bioinformatics 2019;34:1969-1979. [PMID: 29351586 DOI: 10.1093/bioinformatics/bty019] [Citation(s) in RCA: 146] [Impact Index Per Article: 24.3] [Received: 08/18/2017] [Accepted: 01/10/2018] [Indexed: 12/22/2022] Open

Boufaied N, Takhar M, Nash C, Erho N, Bismar TA, Davicioni E, Thomson AA. Development of a predictive model for stromal content in prostate cancer samples to improve signature performance. J Pathol 2019;249:411-424. [PMID: 31206668 PMCID: PMC6900085 DOI: 10.1002/path.5315] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 11/27/2018] [Revised: 05/27/2019] [Accepted: 06/13/2019] [Indexed: 01/23/2023]

Rombaut D, Chiu HS, Decaesteker B, Everaert C, Yigit N, Peltier A, Janoueix-Lerosey I, Bartenhagen C, Fischer M, Roberts S, D'Haene N, De Preter K, Speleman F, Denecker G, Sumazin P, Vandesompele J, Lefever S, Mestdagh P. Integrative analysis identifies lincRNAs up- and downstream of neuroblastoma driver genes. Sci Rep 2019;9:5685. [PMID: 30952905 PMCID: PMC6451017 DOI: 10.1038/s41598-019-42107-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 01/14/2019] [Accepted: 03/20/2019] [Indexed: 12/13/2022] Open

Affiliation(s)

Dries Rombaut Center for Medical Genetics, Ghent University, Ghent, 9000, Belgium; Cancer Research Institute Ghent (CRIG), Ghent, 9000, Belgium
Hua-Sheng Chiu Texas Children's Cancer Center, Baylor College of Medicine, Houston, TX, 77030, USA
Bieke Decaesteker Center for Medical Genetics, Ghent University, Ghent, 9000, Belgium; Cancer Research Institute Ghent (CRIG), Ghent, 9000, Belgium
Celine Everaert Center for Medical Genetics, Ghent University, Ghent, 9000, Belgium; Cancer Research Institute Ghent (CRIG), Ghent, 9000, Belgium
Nurten Yigit Center for Medical Genetics, Ghent University, Ghent, 9000, Belgium; Cancer Research Institute Ghent (CRIG), Ghent, 9000, Belgium
Agathe Peltier Institut Curie, PSL Research University, Inserm U830, Equipe Labellisée contre le Cancer, F-75005, Paris, France; SIREDO: Care, Innovation and Research for Children, Adolescents and Young Adults with Cancer, Institut Curie, F-75005, Paris, France
Isabelle Janoueix-Lerosey Institut Curie, PSL Research University, Inserm U830, Equipe Labellisée contre le Cancer, F-75005, Paris, France; SIREDO: Care, Innovation and Research for Children, Adolescents and Young Adults with Cancer, Institut Curie, F-75005, Paris, France
Christoph Bartenhagen Department of Experimental Pediatric Oncology, University Children's Hospital of Cologne, Medical Faculty, University of Cologne, 50937, Cologne, Germany
Matthias Fischer Center for Molecular Medicine Cologne (CMMC), University of Cologne, 50931, Cologne, Germany; Department of Experimental Pediatric Oncology, University Children's Hospital of Cologne, Medical Faculty, University of Cologne, 50937, Cologne, Germany
Stephen Roberts Department of Pediatrics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
Nicky D'Haene Hôpital Erasme, Cliniques Universitaires de Bruxelles, Bruxelles, 1070, Belgium
Katleen De Preter Center for Medical Genetics, Ghent University, Ghent, 9000, Belgium; Cancer Research Institute Ghent (CRIG), Ghent, 9000, Belgium
Frank Speleman Center for Medical Genetics, Ghent University, Ghent, 9000, Belgium; Cancer Research Institute Ghent (CRIG), Ghent, 9000, Belgium
Geertrui Denecker Center for Medical Genetics, Ghent University, Ghent, 9000, Belgium; Cancer Research Institute Ghent (CRIG), Ghent, 9000, Belgium
Pavel Sumazin Texas Children's Cancer Center, Baylor College of Medicine, Houston, TX, 77030, USA
Jo Vandesompele Center for Medical Genetics, Ghent University, Ghent, 9000, Belgium; Cancer Research Institute Ghent (CRIG), Ghent, 9000, Belgium
Steve Lefever Center for Medical Genetics, Ghent University, Ghent, 9000, Belgium; Cancer Research Institute Ghent (CRIG), Ghent, 9000, Belgium
Pieter Mestdagh Center for Medical Genetics, Ghent University, Ghent, 9000, Belgium; Cancer Research Institute Ghent (CRIG), Ghent, 9000, Belgium.

Dimitrakopoulou K, Wik E, Akslen LA, Jonassen I. Deblender: a semi-/unsupervised multi-operational computational method for complete deconvolution of expression data from heterogeneous samples. BMC Bioinformatics 2018;19:408. [PMID: 30404611 PMCID: PMC6223087 DOI: 10.1186/s12859-018-2442-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 02/16/2018] [Accepted: 10/22/2018] [Indexed: 12/24/2022] Open

Abstract

BACKGROUND

Towards discovering robust cancer biomarkers, it is imperative to unravel the cellular heterogeneity of patient samples and comprehend the interactions between cancer cells and the various cell types in the tumor microenvironment. The first generation of 'partial' computational deconvolution methods required prior information either on the cell/tissue type proportions or the cell/tissue type-specific expression signatures and the number of involved cell/tissue types. The second generation of 'complete' approaches allowed estimating both of the cell/tissue type proportions and cell/tissue type-specific expression profiles directly from the mixed gene expression data, based on known (or automatically identified) cell/tissue type-specific marker genes.

RESULTS

We present Deblender, a flexible complete deconvolution tool operating in semi-/unsupervised mode based on the user's access to known marker gene lists and information about cell/tissue composition. In case of no prior knowledge, global gene expression variability is used in clustering the mixed data to substitute marker sets with cluster sets. In addition, we integrate a model selection criterion to predict the number of constituent cell/tissue types. Moreover, we provide a tailored algorithmic scheme to estimate mixture proportions for realistic experimental cases where the number of involved cell/tissue types exceeds the number of mixed samples. We assess the performance of Deblender and a set of state-of-the-art existing tools on a comprehensive set of benchmark and patient cancer mixture expression datasets (including TCGA).

CONCLUSION

Our results corroborate that Deblender can be a valuable tool to improve understanding of gene expression datasets with implications for prediction and clinical utilization. Deblender is implemented in MATLAB and is available from (https://github.com/kondim1983/Deblender/).

Stein-O'Brien GL, Arora R, Culhane AC, Favorov AV, Garmire LX, Greene CS, Goff LA, Li Y, Ngom A, Ochs MF, Xu Y, Fertig EJ. Enter the Matrix: Factorization Uncovers Knowledge from Omics. Trends Genet 2018;34:790-805. [PMID: 30143323 PMCID: PMC6309559 DOI: 10.1016/j.tig.2018.07.003] [Citation(s) in RCA: 132] [Impact Index Per Article: 18.9] [Received: 03/23/2018] [Revised: 06/01/2018] [Accepted: 07/16/2018] [Indexed: 12/20/2022]

Affiliation(s)

Genevieve L Stein-O'Brien Department of Oncology, Division of Biostatistics and Bioinformatics, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins School of Medicine, Baltimore, MD, USA; Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, MD, USA; McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA
Raman Arora Department of Computer Science, Institute for Data Intensive Engineering and Science, Johns Hopkins University, Baltimore, MD, USA
Aedin C Culhane Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA, USA; Department of Biostatistics, Harvard TH Chan School of Public Health, Boston, MA, USA
Alexander V Favorov Department of Oncology, Division of Biostatistics and Bioinformatics, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins School of Medicine, Baltimore, MD, USA; Vavilov Institute of General Genetics, Moscow, Russia
Lana X Garmire University of Hawaii Cancer Center, Honolulu, HI, USA
Casey S Greene Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, PA, USA; Childhood Cancer Data Lab, Alex's Lemonade Stand Foundation, PA, USA
Loyal A Goff Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, MD, USA; McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA
Yifeng Li Digital Technologies Research Centre, National Research Council of Canada, Ottawa, ON, Canada
Aloune Ngom School of Computer Science, University of Windsor, Windsor, ON, Canada
Michael F Ochs Department of Mathematics and Statistics, The College of New Jersey, Ewing, NJ, USA
Yanxun Xu Department of Applied Mathematics and Statistics, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD, USA
Elana J Fertig Department of Oncology, Division of Biostatistics and Bioinformatics, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins School of Medicine, Baltimore, MD, USA.

Dou H, Fang Y, Zheng X. Universal informative CpG sites for inferring tumor purity from DNA methylation microarray data. J Bioinform Comput Biol 2018;16:1750030. [PMID: 29347875 DOI: 10.1142/s0219720017500305] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]

Gogolewski K, Wronowska W, Lech A, Lesyng B, Gambin A. Inferring Molecular Processes Heterogeneity from Transcriptional Data. BIOMED RESEARCH INTERNATIONAL 2017;2017:6961786. [PMID: 29362714 PMCID: PMC5736944 DOI: 10.1155/2017/6961786] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 06/09/2017] [Revised: 09/23/2017] [Accepted: 10/08/2017] [Indexed: 12/01/2022]

Ogundijo OE, Wang X. A sequential Monte Carlo approach to gene expression deconvolution. PLoS One 2017;12:e0186167. [PMID: 29049343 PMCID: PMC5648148 DOI: 10.1371/journal.pone.0186167] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 02/26/2017] [Accepted: 09/26/2017] [Indexed: 01/06/2023] Open

Abstract

High-throughput gene expression data are often obtained from pure or complex (heterogeneous) biological samples. In the latter case, data obtained are a mixture of different cell types and the heterogeneity imposes some difficulties in the analysis of such data. In order to make conclusions on gene expression data obtained from heterogeneous samples, methods such as microdissection and flow cytometry have been employed to physically separate the constituting cell types. However, these manual approaches are time consuming when measuring the responses of multiple cell types simultaneously. In addition, exposed samples, on many occasions, end up being contaminated with external perturbations and this may result in an altered yield of molecular content. In this paper, we model the heterogeneous gene expression data using a Bayesian framework, treating the cell type proportions and the cell-type specific expressions as the parameters of the model. Specifically, we present a novel sequential Monte Carlo (SMC) sampler for estimating the model parameters by approximating their posterior distributions with a set of weighted samples. The SMC framework is a robust and efficient approach where we construct a sequence of artificial target (posterior) distributions on spaces of increasing dimensions which admit the distributions of interest as marginals. The proposed algorithm is evaluated on simulated datasets and publicly available real datasets, including Affymetrix oligonucleotide arrays and National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO), with varying number of cell types. 
The results obtained on all datasets show a superior performance with an improved accuracy in the estimation of cell type proportions and the cell-type specific expressions, and in addition, more accurate identification of differentially expressed genes when compared to other widely known methods for blind decomposition of heterogeneous gene expression data such as Dsection and the nonnegative matrix factorization (NMF) algorithms. MATLAB implementation of the proposed SMC algorithm is available to download at https://github.com/moyanre/smcgenedeconv.git.

Zheng X, Zhang N, Wu HJ, Wu H. Estimating and accounting for tumor purity in the analysis of DNA methylation data from cancer studies. Genome Biol 2017;18:17. [PMID: 28122605 PMCID: PMC5267453 DOI: 10.1186/s13059-016-1143-5] [Citation(s) in RCA: 93] [Impact Index Per Article: 11.6] [Received: 09/20/2016] [Accepted: 12/20/2016] [Indexed: 01/03/2023] Open

Glass ER, Dozmorov MG. Improving sensitivity of linear regression-based cell type-specific differential expression deconvolution with per-gene vs. global significance threshold. BMC Bioinformatics 2016;17:334. [PMID: 27766949 PMCID: PMC5073979 DOI: 10.1186/s12859-016-1226-z] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Indexed: 02/06/2023] Open

Abstract

Background

The goal of many human disease-oriented studies is to detect molecular mechanisms that differ between healthy controls and patients. Yet commonly used gene expression measurements from blood samples suffer from variability in cell composition. This variability hinders the detection of differentially expressed genes and is often ignored. Combined with cell counts, heterogeneous gene expression may provide deeper insights into gene expression differences at the cell type-specific level.

Published computational methods use linear regression to estimate cell type-specific differential expression, and a global cutoff to judge significance, such as False Discovery Rate (FDR). Yet, they do not consider many artifacts hidden in high-dimensional gene expression data that may negatively affect linear regression. In this paper we quantify the parameter space affecting the performance of linear regression (sensitivity of cell type-specific differential expression detection) on a per-gene basis.

Results

We evaluated the effect of sample sizes, cell type-specific proportion variability, and mean squared error on sensitivity of cell type-specific differential expression detection using linear regression. Each parameter affected variability of cell type-specific expression estimates and, subsequently, the sensitivity of differential expression detection. We provide the R package, LRCDE, which performs linear regression-based cell type-specific differential expression (deconvolution) detection on a gene-by-gene basis. Accounting for variability around cell type-specific gene expression estimates, it computes per-gene t-statistics of differential detection, p-values, t-statistic-based sensitivity, group-specific mean squared error, and several gene-specific diagnostic metrics.

Conclusions

The sensitivity of linear regression-based cell type-specific differential expression detection differed for each gene as a function of mean squared error, per-group sample sizes, and variability of the proportions of the target cell type (the cell type being analyzed). We demonstrate that LRCDE, which uses Welch’s t-test to compare per-gene cell type-specific gene expression estimates, is more sensitive in detecting cell type-specific differential expression at α < 0.05 that is missed by the global false discovery rate threshold FDR < 0.3.
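
The regression idea described in this abstract can be sketched as follows. For one gene, expression in each group is regressed on measured cell proportions, and the per-cell-type estimates are compared with a Welch-style t-statistic. This is a minimal sketch under stated assumptions, not the LRCDE package: the function names, the no-intercept model, and the toy numbers are all hypothetical.

```python
import numpy as np

def cell_type_estimates(C, y):
    """Least-squares fit y ~ C @ b (no intercept) for one gene.

    C: samples x cell_types matrix of measured proportions
    y: expression of the gene in each heterogeneous sample
    Returns the per-cell-type expression estimates and standard errors.
    """
    n, k = C.shape
    b, *_ = np.linalg.lstsq(C, y, rcond=None)
    resid = y - C @ b
    mse = resid @ resid / (n - k)          # group-specific mean squared error
    cov = mse * np.linalg.inv(C.T @ C)     # covariance of the estimates
    return b, np.sqrt(np.diag(cov))

def welch_style_t(C_ctrl, y_ctrl, C_case, y_case, cell):
    """t-statistic comparing one cell type's estimate between two groups."""
    b0, se0 = cell_type_estimates(C_ctrl, y_ctrl)
    b1, se1 = cell_type_estimates(C_case, y_case)
    return (b1[cell] - b0[cell]) / np.sqrt(se0[cell] ** 2 + se1[cell] ** 2)

# Hypothetical toy data: two cell types, 20 samples per group
rng = np.random.default_rng(0)

def simulate(expr, n=20, sd=0.1):
    p = rng.uniform(0.2, 0.8, size=n)
    C = np.column_stack([p, 1.0 - p])
    return C, C @ np.asarray(expr) + rng.normal(0.0, sd, size=n)

C0, y0 = simulate([5.0, 3.0])   # controls
C1, y1 = simulate([8.0, 3.0])   # cases: cell type 0 is up-regulated
t_cell0 = welch_style_t(C0, y0, C1, y1, cell=0)
```

The standard errors grow as the proportions vary less across samples, which is one way to see why the abstract reports that sensitivity depends on proportion variability, sample size, and mean squared error.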

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-016-1226-z) contains supplementary material, which is available to authorized users.

Houseman EA, Kile ML, Christiani DC, Ince TA, Kelsey KT, Marsit CJ. Reference-free deconvolution of DNA methylation data and mediation by cell composition effects. BMC Bioinformatics 2016;17:259. [PMID: 27358049 PMCID: PMC4928286 DOI: 10.1186/s12859-016-1140-4] [Citation(s) in RCA: 171] [Impact Index Per Article: 19.0] [Received: 04/04/2016] [Accepted: 06/19/2016] [Indexed: 12/28/2022] Open

Reinartz S, Finkernagel F, Adhikary T, Rohnalter V, Schumann T, Schober Y, Nockher WA, Nist A, Stiewe T, Jansen JM, Wagner U, Müller-Brüsselbach S, Müller R. A transcriptome-based global map of signaling pathways in the ovarian cancer microenvironment associated with clinical outcome. Genome Biol 2016;17:108. [PMID: 27215396 PMCID: PMC4877997 DOI: 10.1186/s13059-016-0956-6] [Citation(s) in RCA: 80] [Impact Index Per Article: 8.9] [Received: 12/17/2015] [Accepted: 04/15/2016] [Indexed: 01/05/2023] Open

Abstract

BACKGROUND

Soluble protein and lipid mediators play essential roles in the tumor environment, but their cellular origins, targets, and clinical relevance are only partially known. We have addressed this question for the most abundant cell types in human ovarian carcinoma ascites, namely tumor cells and tumor-associated macrophages.

RESULTS

Transcriptome-derived datasets were adjusted for errors caused by contaminating cell types by an algorithm using expression data derived from pure cell types as references. These data were utilized to construct a network of autocrine and paracrine signaling pathways comprising 358 common and 58 patient-specific signaling mediators and their receptors. RNA sequencing based predictions were confirmed for several proteins and lipid mediators. Published expression microarray results for 1018 patients were used to establish clinical correlations for a number of components with distinct cellular origins and target cells. Clear associations with early relapse were found for STAT3-inducing cytokines, specific components of WNT and fibroblast growth factor signaling, ephrin and semaphorin axon guidance molecules, and TGFβ/BMP-triggered pathways. An association with early relapse was also observed for secretory macrophage-derived phospholipase PLA2G7, its product arachidonic acid (AA) and signaling pathways controlled by the AA metabolites PGE2, PGI2, and LTB4. By contrast, the genes encoding norrin and its receptor frizzled 4, both selectively expressed by cancer cells and previously not linked to tumor suppression, show a striking association with a favorable clinical course.

CONCLUSIONS

We have established a signaling network operating in the ovarian cancer microenvironment with previously unidentified pathways and have defined clinically relevant components within this network.

Affiliation(s)

Silke Reinartz Clinic for Gynecology, Gynecological Oncology and Gynecological Endocrinology, Center for Tumor Biology and Immunology (ZTI), Philipps University, Marburg, Germany
Florian Finkernagel Institute of Molecular Biology and Tumor Research (IMT), Center for Tumor Biology and Immunology (ZTI), Philipps University, Hans-Meerwein-Str. 3, Marburg, 35043, Germany
Till Adhikary Institute of Molecular Biology and Tumor Research (IMT), Center for Tumor Biology and Immunology (ZTI), Philipps University, Hans-Meerwein-Str. 3, Marburg, 35043, Germany
Verena Rohnalter Institute of Molecular Biology and Tumor Research (IMT), Center for Tumor Biology and Immunology (ZTI), Philipps University, Hans-Meerwein-Str. 3, Marburg, 35043, Germany
Tim Schumann Institute of Molecular Biology and Tumor Research (IMT), Center for Tumor Biology and Immunology (ZTI), Philipps University, Hans-Meerwein-Str. 3, Marburg, 35043, Germany
Yvonne Schober Metabolomics Core Facility and Institute of Laboratory Medicine and Pathobiochemistry, Center for Tumor Biology and Immunology (ZTI), Philipps University, Marburg, Germany
W Andreas Nockher Metabolomics Core Facility and Institute of Laboratory Medicine and Pathobiochemistry, Center for Tumor Biology and Immunology (ZTI), Philipps University, Marburg, Germany
Andrea Nist Genomics Core Facility, Center for Tumor Biology and Immunology (ZTI), Philipps University, Marburg, Germany
Thorsten Stiewe Genomics Core Facility, Center for Tumor Biology and Immunology (ZTI), Philipps University, Marburg, Germany
Julia M Jansen Clinic for Gynecology, Gynecological Oncology and Gynecological Endocrinology, Center for Tumor Biology and Immunology (ZTI), Philipps University, Marburg, Germany
Uwe Wagner Clinic for Gynecology, Gynecological Oncology and Gynecological Endocrinology, Center for Tumor Biology and Immunology (ZTI), Philipps University, Marburg, Germany
Sabine Müller-Brüsselbach Institute of Molecular Biology and Tumor Research (IMT), Center for Tumor Biology and Immunology (ZTI), Philipps University, Hans-Meerwein-Str. 3, Marburg, 35043, Germany
Rolf Müller Institute of Molecular Biology and Tumor Research (IMT), Center for Tumor Biology and Immunology (ZTI), Philipps University, Hans-Meerwein-Str. 3, Marburg, 35043, Germany.

Wang F, Zhang N, Wang J, Wu H, Zheng X. Tumor purity and differential methylation in cancer epigenomics. Brief Funct Genomics 2016;15:408-419. [PMID: 27199459 DOI: 10.1093/bfgp/elw016] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Indexed: 12/21/2022] Open

Gabitto MI, Pakman A, Bikoff JB, Abbott LF, Jessell TM, Paninski L. Bayesian Sparse Regression Analysis Documents the Diversity of Spinal Inhibitory Interneurons. Cell 2016;165:220-233. [PMID: 26949187 DOI: 10.1016/j.cell.2016.01.026] [Citation(s) in RCA: 57] [Impact Index Per Article: 6.3] [Received: 07/27/2015] [Revised: 11/30/2015] [Accepted: 01/15/2016] [Indexed: 12/14/2022]

Rautio S, Lähdesmäki H. MixChIP: a probabilistic method for cell type specific protein-DNA binding analysis. BMC Bioinformatics 2015;16:413. [PMID: 26703974 PMCID: PMC4690251 DOI: 10.1186/s12859-015-0834-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/24/2015] [Accepted: 11/24/2015] [Indexed: 08/30/2023] Open

Abstract

Background

Transcription factors (TFs) are proteins that bind to DNA and regulate gene expression. To understand details of gene regulation, characterizing TF binding sites in different cell types, diseases and among individuals is essential. However, sometimes TF binding can only be measured from biological samples that contain multiple cell or tissue types. Sample heterogeneity can have a considerable effect on TF binding site detection. While manual separation techniques can be used to isolate a cell type of interest from heterogeneous samples, such techniques are challenging and can change intra-cellular interactions, including protein-DNA binding. Computational deconvolution methods have emerged as an alternative strategy to study heterogeneous samples and numerous methods have been proposed to analyze gene expression. However, no computational method exists to deconvolve cell type specific TF binding from heterogeneous samples.

Results

We present a probabilistic method, MixChIP, to identify cell type specific TF binding sites from heterogeneous chromatin immunoprecipitation sequencing (ChIP-seq) data. Our method simultaneously estimates the binding strength in different cell types as well as the proportions of different cell types in each sample when only partial prior information about cell type composition is available. We demonstrate the utility of MixChIP by analyzing ChIP-seq data from two cell lines which we artificially mix to generate (simulated) heterogeneous samples and by analyzing ChIP-seq data from breast cancer patients measuring oestrogen receptor (ER) binding in primary breast cancer tissues. We show that MixChIP is more accurate in detecting TF binding sites from multiple heterogeneous ChIP-seq samples than the standard methods which do not account for sample heterogeneity.

Conclusions

Our results show that MixChIP can estimate cell-type proportions and identify cell type specific TF binding sites from heterogeneous ChIP-seq samples. Thus, MixChIP can be an invaluable tool in analyzing heterogeneous ChIP-seq samples, such as those originating from cancer studies. An R implementation is available at http://research.ics.aalto.fi/csb/software/mixchip/.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-015-0834-3) contains supplementary material, which is available to authorized users.

The influence of cancer tissue sampling on the identification of cancer characteristics. Sci Rep 2015;5:15474. [PMID: 26490514 PMCID: PMC4614546 DOI: 10.1038/srep15474] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 03/27/2015] [Accepted: 09/24/2015] [Indexed: 12/21/2022] Open

Dozmorov MG, Dominguez N, Bean K, Macwana SR, Roberts V, Glass E, James JA, Guthridge JM. B-Cell and Monocyte Contribution to Systemic Lupus Erythematosus Identified by Cell-Type-Specific Differential Expression Analysis in RNA-Seq Data. Bioinform Biol Insights 2015;9:11-9. [PMID: 26512198 PMCID: PMC4599594 DOI: 10.4137/bbi.s29470] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 07/03/2015] [Revised: 08/24/2015] [Accepted: 08/26/2015] [Indexed: 12/18/2022] Open

Anghel CV, Quon G, Haider S, Nguyen F, Deshwar AG, Morris QD, Boutros PC. ISOpureR: an R implementation of a computational purification algorithm of mixed tumour profiles. BMC Bioinformatics 2015;16:156. [PMID: 25972088 PMCID: PMC4429941 DOI: 10.1186/s12859-015-0597-x] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Received: 11/21/2014] [Accepted: 04/27/2015] [Indexed: 01/23/2023] Open

Kukurba KR, Montgomery SB. RNA Sequencing and Analysis. Cold Spring Harb Protoc 2015;2015:951-69. [PMID: 25870306 DOI: 10.1101/pdb.top084970] [Citation(s) in RCA: 461] [Impact Index Per Article: 46.1] [Indexed: 11/24/2022]

Klemm F, Joyce JA. Microenvironmental regulation of therapeutic response in cancer. Trends Cell Biol 2014;25:198-213. [PMID: 25540894 DOI: 10.1016/j.tcb.2014.11.006] [Citation(s) in RCA: 552] [Impact Index Per Article: 50.2] [Received: 10/07/2014] [Revised: 11/20/2014] [Accepted: 11/21/2014] [Indexed: 02/08/2023]

Yoshihara K, Shahmoradgoli M, Martínez E, Vegesna R, Kim H, Torres-Garcia W, Treviño V, Shen H, Laird PW, Levine DA, Carter SL, Getz G, Stemke-Hale K, Mills GB, Verhaak RGW. Inferring tumour purity and stromal and immune cell admixture from expression data. Nat Commun 2014;4:2612. [PMID: 24113773 PMCID: PMC3826632 DOI: 10.1038/ncomms3612] [Citation(s) in RCA: 6290] [Impact Index Per Article: 571.8] [Received: 07/12/2013] [Accepted: 09/13/2013] [Indexed: 02/06/2023] Open

Yadav VK, De S. An assessment of computational methods for estimating purity and clonality using genomic data derived from heterogeneous tumor tissue samples. Brief Bioinform 2014;16:232-41. [PMID: 24562872 DOI: 10.1093/bib/bbu002] [Citation(s) in RCA: 61] [Impact Index Per Article: 5.5] [Indexed: 12/31/2022] Open

Parameterizing cell-to-cell regulatory heterogeneities via stochastic transcriptional profiles. Proc Natl Acad Sci U S A 2014;111:E626-35. [PMID: 24449900 DOI: 10.1073/pnas.1311647111] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.2] [Indexed: 02/07/2023] Open

Shen-Orr SS, Gaujoux R. Computational deconvolution: extracting cell type-specific information from heterogeneous samples. Curr Opin Immunol 2013;25:571-8. [PMID: 24148234 DOI: 10.1016/j.coi.2013.09.015] [Citation(s) in RCA: 203] [Impact Index Per Article: 16.9] [Received: 07/21/2013] [Revised: 09/22/2013] [Accepted: 09/30/2013] [Indexed: 12/31/2022]

Liebner DA, Huang K, Parvin JD. MMAD: microarray microdissection with analysis of differences is a computational tool for deconvoluting cell type-specific contributions from tissue samples. Bioinformatics 2013;30:682-9. [PMID: 24085566 DOI: 10.1093/bioinformatics/btt566] [Citation(s) in RCA: 55] [Impact Index Per Article: 4.6] [Indexed: 11/13/2022]

Abstract

BACKGROUND

One of the significant obstacles in the development of clinically relevant microarray-derived biomarkers and classifiers is tissue heterogeneity. Physical cell separation techniques, such as cell sorting and laser-capture microdissection, can enrich samples for cell types of interest, but are costly, labor intensive and can limit investigation of important interactions between different cell types.

RESULTS

We developed a new computational approach, called microarray microdissection with analysis of differences (MMAD), which performs microdissection in silico. Notably, MMAD (i) allows for simultaneous estimation of cell fractions and gene expression profiles of contributing cell types, (ii) adjusts for microarray normalization bias, (iii) uses the corrected Akaike information criterion during model optimization to minimize overfitting and (iv) provides mechanisms for comparing gene expression and cell fractions between samples in different classes. Computational microdissection of simulated and experimental tissue mixture datasets showed tight correlations between predicted and measured gene expression of pure tissues as well as tight correlations between reported and estimated cell fraction for each of the individual cell types. In simulation studies, MMAD showed superior ability to detect differentially expressed genes in mixed tissue samples when compared with standard metrics, including both significance analysis of microarrays and cell type-specific significance analysis of microarrays.

CONCLUSIONS

We have developed a new computational tool called MMAD, which is capable of performing robust tissue microdissection in silico, and which can improve the detection of differentially expressed genes. MMAD software as implemented in MATLAB is publicly available for download at http://sourceforge.net/projects/mmad/.
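
The corrected Akaike information criterion mentioned in this abstract is a standard formula, and a minimal sketch of using it to choose between candidate models is shown here. How MMAD itself counts parameters or defines the likelihood is not stated in the abstract, so the function names and the example numbers are assumptions for illustration only.

```python
import math

def aicc(log_likelihood, n_params, n_obs):
    """Corrected AIC: AIC plus a small-sample penalty term."""
    if n_obs - n_params - 1 <= 0:
        raise ValueError("AICc needs n_obs > n_params + 1")
    aic = 2 * n_params - 2 * log_likelihood
    return aic + (2 * n_params * (n_params + 1)) / (n_obs - n_params - 1)

def pick_model(candidates, n_obs):
    """Return the name of the candidate with the lowest AICc.

    candidates: dict mapping a name to (log_likelihood, n_params)
    """
    return min(candidates, key=lambda name: aicc(*candidates[name], n_obs))

# Hypothetical fits of a mixture with two versus three cell types
fits = {"2 types": (-100.0, 3), "3 types": (-99.5, 6)}
best = pick_model(fits, n_obs=50)
```

In this toy case the extra parameters of the three-type model buy only a small gain in likelihood, so the penalty favors the simpler model, which is the overfitting guard the abstract refers to.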

Strino F, Parisi F, Micsinai M, Kluger Y. TrAp: a tree approach for fingerprinting subclonal tumor composition. Nucleic Acids Res 2013;41:e165. [PMID: 23892400 PMCID: PMC3783191 DOI: 10.1093/nar/gkt641] [Citation(s) in RCA: 78] [Impact Index Per Article: 6.5] [Received: 01/24/2013] [Revised: 06/11/2013] [Accepted: 07/02/2013] [Indexed: 01/01/2023] Open

A self-directed method for cell-type identification and separation of gene expression microarrays. PLoS Comput Biol 2013;9:e1003189. [PMID: 23990767 PMCID: PMC3749952 DOI: 10.1371/journal.pcbi.1003189] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.7] [Received: 12/18/2012] [Accepted: 07/07/2013] [Indexed: 11/19/2022] Open

Abstract

Gene expression analysis is generally performed on heterogeneous tissue samples consisting of multiple cell types. Current methods developed to separate heterogeneous gene expression rely on prior knowledge of the cell-type composition and/or signatures, which are not available in most public datasets. We present a novel method to identify the cell-type composition, signatures and proportions per sample without need for a priori information. The method was successfully tested on controlled and semi-controlled datasets and performed as accurately as current methods that do require additional information. As such, this method enables the analysis of cell-type specific gene expression using existing large pools of publicly available microarray datasets.

Gene expression microarrays are widely used to uncover biological insights. Most microarray experiments profile whole tissues containing mixtures of multiple cell-types. As such, gene expression differences between samples may be due to different cellular compositions or biological differences, highly limiting the conclusions derived from the analysis. All current approaches to computationally separate the heterogeneous gene expression to individual cell-types require that the identity, relative amount of the cell-types in the tissue or their individual gene expression are known. Publicly available microarray-based datasets, which include thousands of patient samples, do not usually measure this information, rendering existing separation methods unusable. We developed a novel approach to estimate the number of cell-types, identities, individual gene expression and relative proportions in heterogeneous tissues with no a priori information except for an initial estimate of the cell-types in the tissue analyzed and general reference signatures of these cell-types that may be easily obtained from public databases. We successfully applied our method to microarray datasets, yielding highly accurate estimations, which often exceed the performance of separation methods that require prior information. Thus, our method can be accurately applied to any heterogeneous dataset, where re-examination and analysis of the individual cell-types in the heterogeneous tissue can aid in discovering new aspects regarding these diseases.

Burdick JT, Murray JI. Deconvolution of gene expression from cell populations across the C. elegans lineage. BMC Bioinformatics 2013;14:204. [PMID: 23800200 PMCID: PMC3704917 DOI: 10.1186/1471-2105-14-204] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/23/2013] [Accepted: 06/11/2013] [Indexed: 11/11/2022] Open

Ahn J, Yuan Y, Parmigiani G, Suraokar MB, Diao L, Wistuba II, Wang W. DeMix: deconvolution for mixed cancer transcriptomes using raw measured data. Bioinformatics 2013;29:1865-71. [PMID: 23712657 DOI: 10.1093/bioinformatics/btt301] [Citation(s) in RCA: 71] [Impact Index Per Article: 5.9] [Indexed: 11/13/2022]

Li Y, Xie X. A mixture model for expression deconvolution from RNA-seq in heterogeneous tissues. BMC Bioinformatics 2013;14 Suppl 5:S11. [PMID: 23735186 PMCID: PMC3622628 DOI: 10.1186/1471-2105-14-s5-s11] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.3] [Indexed: 12/21/2022] Open

Abstract

BACKGROUND

RNA-seq, a next-generation sequencing based method for transcriptome analysis, is rapidly emerging as the method of choice for comprehensive transcript abundance estimation. The accuracy of RNA-seq can be highly impacted by the purity of samples. A prominent, outstanding problem in RNA-seq is how to estimate transcript abundances in heterogeneous tissues, where a sample is composed of more than one cell type and the inhomogeneity can substantially confound the transcript abundance estimation of each individual cell type. Although experimental methods have been proposed to dissect multiple distinct cell types, computationally "deconvoluting" heterogeneous tissues provides an attractive alternative, since it keeps the tissue sample as well as the subsequent molecular content yield intact.

RESULTS

Here we propose a probabilistic model-based approach, Transcript Estimation from Mixed Tissue samples (TEMT), to estimate the transcript abundances of each cell type of interest from RNA-seq data of heterogeneous tissue samples. TEMT incorporates positional and sequence-specific biases, and its online EM algorithm only requires a runtime proportional to the data size and a small constant memory. We test the proposed method on both simulation data and recently released ENCODE data, and show that TEMT significantly outperforms current state-of-the-art methods that do not take tissue heterogeneity into account. Currently, TEMT only resolves the tissue heterogeneity resulting from two cell types, but it can be extended to handle tissue heterogeneity resulting from multiple cell types. TEMT is written in Python, and is freely available at https://github.com/uci-cbcl/TEMT.

CONCLUSIONS

The probabilistic model-based approach proposed here provides a new method for analyzing RNA-seq data from heterogeneous tissue samples. By applying the method to both simulation data and ENCODE data, we show that explicitly accounting for tissue heterogeneity can significantly improve the accuracy of transcript abundance estimation.
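
To make the two-cell-type setting in this abstract concrete, the sketch below solves the underlying linear mixture by plain algebra. It is not TEMT's online EM algorithm and ignores read-level biases. It assumes, purely for illustration, that the fraction of the target cell type and the abundances of the other cell type are already known; the function name and numbers are hypothetical.

```python
import numpy as np

def unmix_two_types(mixed, other_pure, frac_target):
    """Recover target-cell abundances from a two-component mixture.

    Assumes mixed = frac_target * target + (1 - frac_target) * other_pure,
    with all vectors being relative transcript abundances.
    """
    mixed = np.asarray(mixed, dtype=float)
    other_pure = np.asarray(other_pure, dtype=float)
    if not 0.0 < frac_target <= 1.0:
        raise ValueError("frac_target must be in (0, 1]")
    target = (mixed - (1.0 - frac_target) * other_pure) / frac_target
    target = np.clip(target, 0.0, None)   # noise can push values below zero
    return target / target.sum()          # renormalize to relative abundances

# Hypothetical example with three transcripts and 60% target cells
target_true = np.array([0.5, 0.3, 0.2])
other = np.array([0.1, 0.1, 0.8])
mixed = 0.6 * target_true + 0.4 * other
recovered = unmix_two_types(mixed, other, frac_target=0.6)
```

Treating the mixed sample as if it were pure would report the third transcript at 0.44 instead of 0.2 here, which illustrates the confounding that the abstract says heterogeneity-aware estimation corrects.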

Quon G, Haider S, Deshwar AG, Cui A, Boutros PC, Morris Q. Computational purification of individual tumor gene expression profiles leads to significant improvements in prognostic prediction. Genome Med 2013;5:29. [PMID: 23537167 PMCID: PMC3706990 DOI: 10.1186/gm433] [Citation(s) in RCA: 72] [Impact Index Per Article: 6.0] [Received: 09/13/2012] [Accepted: 03/28/2013] [Indexed: 11/10/2022] Open

Lehmusvaara S, Erkkilä T, Urbanucci A, Jalava S, Seppälä J, Kaipia A, Kujala P, Lähdesmäki H, Tammela TLJ, Visakorpi T. Goserelin and bicalutamide treatments alter the expression of microRNAs in the prostate. Prostate 2013;73:101-12. [PMID: 22674191 DOI: 10.1002/pros.22545] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 02/21/2012] [Accepted: 05/14/2012] [Indexed: 12/19/2022]

Lehmusvaara S, Erkkilä T, Urbanucci A, Waltering K, Seppälä J, Larjo A, Tuominen VJ, Isola J, Kujala P, Lähdesmäki H, Kaipia A, Tammela TL, Visakorpi T. Chemical castration and anti-androgens induce differential gene expression in prostate cancer. J Pathol 2012;227:336-45. [PMID: 22431170 DOI: 10.1002/path.4027] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Received: 12/08/2011] [Revised: 02/04/2012] [Accepted: 03/09/2012] [Indexed: 11/08/2022]