1
Guo X, Huang Z, Ju F, Zhao C, Yu L. Highly Accurate Estimation of Cell Type Abundance in Bulk Tissues Based on Single-Cell Reference and Domain Adaptive Matching. ADVANCED SCIENCE (WEINHEIM, BADEN-WURTTEMBERG, GERMANY) 2024; 11:e2306329. [PMID: 38072669 PMCID: PMC10870031 DOI: 10.1002/advs.202306329] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2023] [Revised: 10/27/2023] [Indexed: 02/17/2024]
Abstract
Accurately identifying the cellular composition of complex tissues is critical for understanding disease pathogenesis, early diagnosis, and prevention. However, current methods for deconvoluting bulk RNA sequencing (RNA-seq) typically rely on matched single-cell RNA sequencing (scRNA-seq) as a reference, which can be limiting due to differences in sequencing distribution and the potential for invalid information from single-cell references. Hence, a novel computational method named SCROAM is introduced to address these challenges. SCROAM transforms scRNA-seq and bulk RNA-seq into a shared feature space, effectively eliminating distributional differences in the latent space. Subsequently, cell-type-specific expression matrices are generated from the scRNA-seq data, facilitating the precise identification of cell types within bulk tissues. The performance of SCROAM is assessed through benchmarking against simulated and real datasets, demonstrating its accuracy and robustness. To further validate SCROAM's performance, single-cell and bulk RNA-seq experiments are conducted on mouse spinal cord tissue, with SCROAM applied to identify cell types in bulk tissue. Results indicate that SCROAM is a highly effective tool for identifying similar cell types. An integrated analysis of liver cancer and primary glioblastoma is then performed. Overall, this research offers a novel perspective for delivering precise insights into disease pathogenesis and potential therapeutic strategies.
Affiliation(s)
- Xinyang Guo
- School of Computer Science and Technology, Xidian University, Xi'an 710071, China
- Zhaoyang Huang
- School of Computer Science and Technology, Xidian University, Xi'an 710071, China
- Fen Ju
- Department of Rehabilitation Medicine, Xijing Hospital, Fourth Military Medical University, Xi'an 710032, China
- Chenguang Zhao
- Department of Rehabilitation Medicine, Xijing Hospital, Fourth Military Medical University, Xi'an 710032, China
- Liang Yu
- School of Computer Science and Technology, Xidian University, Xi'an 710071, China
2
Heiling HM, Wilson DR, Rashid NU, Sun W, Ibrahim JG. Estimating cell type composition using isoform expression one gene at a time. Biometrics 2023; 79:854-865. [PMID: 34921386 PMCID: PMC11245124 DOI: 10.1111/biom.13614] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2020] [Accepted: 12/08/2021] [Indexed: 11/29/2022]
Abstract
Human tissue samples are often mixtures of heterogeneous cell types, which can confound the analyses of gene expression data derived from such tissues. The cell type composition of a tissue sample may itself be of interest and is needed for proper analysis of differential gene expression. A variety of computational methods have been developed to estimate cell type proportions using gene-level expression data. However, RNA isoforms can also be differentially expressed across cell types, and isoform-level expression could be equally or more informative for determining cell type origin than gene-level expression. We propose a new computational method, IsoDeconvMM, which estimates cell type fractions using isoform-level gene expression data. A novel and useful feature of IsoDeconvMM is that it can estimate cell type proportions using only a single gene, though in practice we recommend aggregating estimates of a few dozen genes to obtain more accurate results. We demonstrate the performance of IsoDeconvMM using a unique data set with cell type-specific RNA-seq data across more than 135 individuals. This data set allows us to evaluate different methods given the biological variation of cell type-specific gene expression data across individuals. We further complement this analysis with additional simulations.
Affiliation(s)
- Hillary M Heiling
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Douglas R Wilson
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Naim U Rashid
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Wei Sun
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington
- Joseph G Ibrahim
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
3
Otto R, Detjen KM, Riemer P, Fattohi M, Grötzinger C, Rindi G, Wiedenmann B, Sers C, Leser U. Transcriptomic Deconvolution of Neuroendocrine Neoplasms Predicts Clinically Relevant Characteristics. Cancers (Basel) 2023; 15:cancers15030936. [PMID: 36765893 PMCID: PMC9913692 DOI: 10.3390/cancers15030936] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2022] [Revised: 01/20/2023] [Accepted: 01/26/2023] [Indexed: 02/05/2023] Open
Abstract
Pancreatic neuroendocrine neoplasms (panNENs) are a rare yet diverse type of neoplasia whose precise clinical-pathological classification is frequently challenging. Since incorrect classifications can affect treatment decisions, additional tools which support the diagnosis, such as machine learning (ML) techniques, are critically needed but generally unavailable due to the scarcity of suitable ML training data for rare panNENs. Here, we demonstrate that a multi-step ML framework predicts clinically relevant panNEN characteristics while being exclusively trained on widely available data of a healthy origin. The approach classifies panNENs by deconvolving their transcriptomes into cell type proportions based on shared gene expression profiles with healthy pancreatic cell types. The deconvolution results were found to provide a prognostic value with respect to the prediction of the overall patient survival time, neoplastic grading, and carcinoma versus tumor subclassification. The performance with which a proliferation rate agnostic deconvolution ML model could predict the clinical characteristics was found to be comparable to that of a comparative baseline model trained on the proliferation rate-informed MKI67 levels. The approach is novel in that it complements established proliferation rate-oriented classification schemes whose results can be reproduced and further refined by differentiating between identically graded subgroups. By including non-endocrine cell types, the deconvolution approach furthermore provides an in silico quantification of panNEN dedifferentiation, optimizing it for challenging clinical classification tasks in more aggressive panNEN subtypes.
Affiliation(s)
- Raik Otto
- Knowledge Management in Bioinformatics, Institute for Computer Science, Humboldt-Universität zu Berlin, 10099 Berlin, Germany
- Correspondence: Tel.: +49-030-2093-3086
- Katharina M. Detjen
- Department of Hepatology and Gastroenterology, Charité—Universitätsmedizin Berlin, Campus Virchow-Klinikum and Campus Charité Mitte, 13353 Berlin, Germany
- Pamela Riemer
- Laboratory of Molecular Tumor Pathology and Systems Biology, Institute of Pathology, Charité—Universitätsmedizin Berlin, 10117 Berlin, Germany
- German Cancer Consortium (DKTK), Partner Site Berlin and German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Melanie Fattohi
- Knowledge Management in Bioinformatics, Institute for Computer Science, Humboldt-Universität zu Berlin, 10099 Berlin, Germany
- Carsten Grötzinger
- Department of Hepatology and Gastroenterology, Charité—Universitätsmedizin Berlin, Campus Virchow-Klinikum and Campus Charité Mitte, 13353 Berlin, Germany
- Guido Rindi
- Section of Anatomic Pathology, Department of Life Sciences and Public Health, Università Cattolica del Sacro Cuore, 00168 Roma, Italy
- Anatomic Pathology Unit, Department of Woman and Child Health and Public Health, Fondazione Policlinico Universitario A. Gemelli IRCCS, 00168 Roma, Italy
- Bertram Wiedenmann
- Department of Hepatology and Gastroenterology, Charité—Universitätsmedizin Berlin, Campus Virchow-Klinikum and Campus Charité Mitte, 13353 Berlin, Germany
- Christine Sers
- Laboratory of Molecular Tumor Pathology and Systems Biology, Institute of Pathology, Charité—Universitätsmedizin Berlin, 10117 Berlin, Germany
- German Cancer Consortium (DKTK), Partner Site Berlin and German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Ulf Leser
- Knowledge Management in Bioinformatics, Institute for Computer Science, Humboldt-Universität zu Berlin, 10099 Berlin, Germany
4
Tiwari A, Trivedi R, Lin SY. Tumor microenvironment: barrier or opportunity towards effective cancer therapy. J Biomed Sci 2022; 29:83. [PMID: 36253762 PMCID: PMC9575280 DOI: 10.1186/s12929-022-00866-3] [Citation(s) in RCA: 183] [Impact Index Per Article: 61.0] [Received: 04/21/2022] [Accepted: 10/01/2022] [Indexed: 12/24/2022] Open
Abstract
The tumor microenvironment (TME) is a specialized ecosystem of host components, shaped by tumor cells to support tumor development and metastasis. With the advent of 3D culture and advanced bioinformatic methodologies, it is now possible to study the TME's individual components and their interplay at higher resolution. A deeper understanding of immune cell diversity, stromal constituents, repertoire profiling, and neoantigen prediction in TMEs has provided the opportunity to explore the spatial and temporal regulation of immune therapeutic interventions. The variation of TME composition among patients plays an important role in determining responders and non-responders to cancer immunotherapy. Therefore, reprogramming TME components may offer a way to overcome the widely prevailing issue of immunotherapeutic resistance. The focus of the present review is to understand the complexity of the TME and to consider the future prospects of its components as potential therapeutic targets. The latter part of the review describes the sophisticated 3D models emerging as valuable means to study TME components and gives an extensive account of advanced bioinformatic tools to profile TME components and predict neoantigens. Overall, this review provides a comprehensive account of the current knowledge available to target the TME.
Affiliation(s)
- Aadhya Tiwari
- Department of Systems Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA.
- Rakesh Trivedi
- Department of Translational Molecular Pathology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Shiaw-Yih Lin
- Department of Systems Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA.
5
MRI Radiogenomics in Precision Oncology: New Diagnosis and Treatment Method. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022; 2022:2703350. [PMID: 35845886 PMCID: PMC9282990 DOI: 10.1155/2022/2703350] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2022] [Revised: 05/04/2022] [Accepted: 05/25/2022] [Indexed: 11/21/2022]
Abstract
Precision medicine for cancer offers a new route to the most accurate and effective treatment for each individual cancer. Given the highly time-evolving intertumor and intratumor heterogeneity that personalized medicine must contend with, several obstacles still hinder diagnosis and treatment in clinical practice despite extensive exploration in recent years. This paper surveys radiogenomics methods in the literature on precision medicine for cancer, focusing on the heterogeneity analysis of tumors. Based on integrative analysis of multimodal (parametric) imaging and molecular data in bulk tumors, a comprehensive analysis and discussion of the characterization of tumor heterogeneity in imaging and molecular expression are conducted. These investigations are intended to (i) fully exploit the multidimensional spatial, temporal, and semantic information in high-dimensional breast magnetic resonance imaging data, integrating the highly specific structured data of genomics and the diagnostic and cognitive process of doctors, and (ii) establish a radiogenomics data representation model based on multidimensional consistency analysis with multilevel spatial-temporal correlations.
6
Comprehensive evaluation of deconvolution methods for human brain gene expression. Nat Commun 2022; 13:1358. [PMID: 35292647 PMCID: PMC8924248 DOI: 10.1038/s41467-022-28655-4] [Citation(s) in RCA: 64] [Impact Index Per Article: 21.3] [Received: 09/05/2019] [Accepted: 01/28/2022] [Indexed: 11/08/2022] Open
Abstract
Transcriptome deconvolution aims to estimate the cellular composition of an RNA sample from its gene expression data, which in turn can be used to correct for composition differences across samples. The human brain is unique in its transcriptomic diversity, and comprises a complex mixture of cell-types, including transcriptionally similar subtypes of neurons. Here, we carry out a comprehensive evaluation of deconvolution methods for human brain transcriptome data, and assess the tissue-specificity of our key observations by comparison with human pancreas and heart. We evaluate eight transcriptome deconvolution approaches and nine cell-type signatures, testing the accuracy of deconvolution using in silico mixtures of single-cell RNA-seq data, RNA mixtures, as well as nearly 2000 human brain samples. Our results identify the main factors that drive deconvolution accuracy for brain data, and highlight the importance of biological factors influencing cell-type signatures, such as brain region and in vitro cell culturing. Transcriptome deconvolution aims to estimate cellular composition based on gene expression data. Here the authors evaluate deconvolution methods for human brain transcriptome and conclude that partial deconvolution algorithms work best, but that appropriate cell-type signatures are also important.
7
Chen L, Wu CT, Lin CH, Dai R, Liu C, Clarke R, Yu G, Van Eyk JE, Herrington DM, Wang Y. swCAM: estimation of subtype-specific expressions in individual samples with unsupervised sample-wise deconvolution. Bioinformatics 2022; 38:1403-1410. [PMID: 34904628 PMCID: PMC8826012 DOI: 10.1093/bioinformatics/btab839] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 03/10/2021] [Revised: 10/30/2021] [Accepted: 12/10/2021] [Indexed: 02/04/2023] Open
Abstract
MOTIVATION: Complex biological tissues are often a heterogeneous mixture of several molecularly distinct cell subtypes. Both subtype compositions and subtype-specific (STS) expressions can vary across biological conditions. Computational deconvolution aims to dissect patterns of bulk tissue data into subtype compositions and STS expressions. Existing deconvolution methods can only estimate averaged STS expressions in a population, while many downstream analyses such as inferring co-expression networks in particular subtypes require subtype expression estimates in individual samples. However, individual-level deconvolution is a mathematically underdetermined problem because there are more variables than observations.
RESULTS: We report a sample-wise Convex Analysis of Mixtures (swCAM) method that can estimate subtype proportions and STS expressions in individual samples from bulk tissue transcriptomes. We extend our previous CAM framework to include a new term accounting for between-sample variations and formulate swCAM as a nuclear-norm and ℓ2,1-norm regularized matrix factorization problem. We determine hyperparameter values using cross-validation with random entry exclusion and obtain a swCAM solution using an efficient alternating direction method of multipliers. Experimental results on realistic simulation data show that swCAM can accurately estimate STS expressions in individual samples and successfully extract co-expression networks in particular subtypes that are otherwise unobtainable using bulk data. In two real-world applications, swCAM analysis of bulk RNASeq data from brain tissue of cases and controls with bipolar disorder or Alzheimer's disease identified significant changes in cell proportion, expression pattern and co-expression module in patient neurons. Comparative evaluation of swCAM versus peer methods is also provided.
AVAILABILITY AND IMPLEMENTATION: The R Scripts of swCAM are freely available at https://github.com/Lululuella/swCAM. A user's guide and a vignette are provided.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
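For readers unfamiliar with the two regularizers named in this abstract, their standard textbook definitions are given below. This is general notation, not the swCAM objective itself: the matrix symbol $X \in \mathbb{R}^{m \times n}$ is an assumption introduced here, and so is the choice to sum over rows in the $\ell_{2,1}$-norm, since papers differ on whether rows or columns are grouped.

```latex
\|X\|_{*} \;=\; \sum_{i=1}^{\min(m,n)} \sigma_i(X),
\qquad
\|X\|_{2,1} \;=\; \sum_{i=1}^{m} \Bigl( \sum_{j=1}^{n} X_{ij}^{2} \Bigr)^{1/2}
```

Here $\sigma_i(X)$ denotes the $i$-th singular value of $X$. Penalizing the nuclear norm generally favors low-rank solutions, while penalizing the $\ell_{2,1}$-norm favors solutions in which whole rows are zero.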
Affiliation(s)
- Lulu Chen
- Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Chiung-Ting Wu
- Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Chia-Hsiang Lin
- Department of Electrical Engineering, National Cheng Kung University, Tainan 70101, Taiwan
- Rujia Dai
- Department of Psychiatry, SUNY Upstate Medical University, Syracuse, NY 13210, USA
- Chunyu Liu
- Department of Psychiatry, SUNY Upstate Medical University, Syracuse, NY 13210, USA
- Robert Clarke
- The Hormel Institute, University of Minnesota, Austin, MN 55912, USA
- Guoqiang Yu
- Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Jennifer E Van Eyk
- Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, CA 90048, USA
- David M Herrington
- Department of Internal Medicine, Wake Forest University, Winston-Salem, NC 27157, USA
- Yue Wang
- Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
8
Saddic L, Orosco A, Guo D, Milewicz DM, Troxlair D, Heide RV, Herrington D, Wang Y, Azizzadeh A, Parker SJ. Proteomic analysis of descending thoracic aorta identifies unique and universal signatures of aneurysm and dissection. JVS Vasc Sci 2022; 3:85-181. [PMID: 35280433 PMCID: PMC8914561 DOI: 10.1016/j.jvssci.2022.01.001] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 07/21/2021] [Accepted: 01/05/2022] [Indexed: 01/05/2023] Open
Abstract
Diseases of the descending thoracic aorta such as aneurysms and dissections carry a high degree of morbidity and mortality. At present, a complete understanding is still lacking of the genetics that drive these diseases and why some aortic segments dissect in the presence or absence of an aneurysm. We compared and contrasted the whole proteome expression of descending aortas from patients with normal, dissected, aneurysmal, and aneurysmal with dissected pathology aortic tissue. We uncovered potential tissue markers that might serve as future targets for therapy or predictors of disease progression.
Affiliation(s)
- Louis Saddic
- Department of Anesthesiology and Perioperative Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, Calif
- Amanda Orosco
- Department of Cardiology, Smidt Heart Institute, Cedars-Sinai Medical Center, Los Angeles, Calif
- Dongchuan Guo
- Department of Internal Medicine, McGovern Medical School, University of Texas Health Science Center, Houston, Tex
- Dianna M. Milewicz
- Department of Internal Medicine, McGovern Medical School, University of Texas Health Science Center, Houston, Tex
- Dana Troxlair
- Department of Pathology, Louisiana State University, New Orleans, La
- David Herrington
- Department of Cardiovascular Medicine, Wake Forest University, Winston-Salem, NC
- Yue Wang
- Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, Va
- Ali Azizzadeh
- Department of Cardiology, Smidt Heart Institute, Cedars-Sinai Medical Center, Los Angeles, Calif
- Sarah J. Parker
- Department of Cardiology, Smidt Heart Institute, Cedars-Sinai Medical Center, Los Angeles, Calif
- Correspondence: Sarah J. Parker, PhD, Department of Cardiology, Smidt Heart Institute, Cedars Sinai Medical Center, AHSP A9228, 8700 Beverly Blvd, Los Angeles, CA 90048
9
Jaakkola MK, Elo LL. Estimating cell type-specific differential expression using deconvolution. Brief Bioinform 2021; 23:6396788. [PMID: 34651640 PMCID: PMC8769698 DOI: 10.1093/bib/bbab433] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/03/2021] [Revised: 09/17/2021] [Accepted: 09/23/2021] [Indexed: 12/02/2022] Open
Affiliation(s)
- Maria K Jaakkola
- Department of Mathematics and Statistics, University of Turku, Yliopistonmäki, 20014, Turku, Finland
- Laura L Elo
- Turku Bioscience Centre, University of Turku and Åbo Akademi University, Tykistökatu 6, FI-20520, Turku, Finland
- Institute of Biomedicine, University of Turku, Kiinamyllynkatu 10, FI-20520, Turku, Finland
10
Peng J, Han L, Shang X. A novel method for predicting cell abundance based on single-cell RNA-seq data. BMC Bioinformatics 2021; 22:281. [PMID: 34433409 PMCID: PMC8386079 DOI: 10.1186/s12859-021-04187-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/29/2021] [Accepted: 05/12/2021] [Indexed: 01/01/2023] Open
Abstract
Background: It is important to understand the cell type composition of intact tissues and the proportion of each type, as changes in certain cell types are the underlying cause of disease in humans. Although cell type compositions and ratios can be obtained by single-cell sequencing, single-cell sequencing is currently expensive and cannot be applied in clinical studies involving a large number of subjects. Therefore, it is useful to apply the bulk RNA-Seq dataset and the single-cell RNA dataset to deconvolute and obtain the cell type composition in the tissue.
Results: By analyzing the existing cell population prediction methods, we found that most of the existing methods need the cell-type-specific gene expression profile as the input of the signature matrix. However, in real applications, it is not always possible to find an available signature matrix. To solve this problem, we proposed a novel method, named DCap, to predict cell abundance. DCap is a deconvolution method based on non-negative least squares. During the non-negative least squares calculation, DCap takes into account weights arising from the measurement noise of bulk RNA-seq and the calculation error of single-cell RNA-seq data, and performs a weighted iterative calculation based on least squares. By weighting the bulk tissue gene expression matrix and single-cell gene expression matrix, DCap minimizes the measurement error of bulk RNA-Seq and also reduces errors resulting from differences in the number of expressed genes in the same type of cells in different samples. Evaluation tests show that DCap performs better in cell type abundance prediction than existing methods.
Conclusion: DCap solves the deconvolution problem using weighted non-negative least squares to predict cell type abundance in tissues. DCap has better prediction results and does not need to prepare a signature matrix that gives the cell-type-specific gene expression profile in advance. By using DCap, we can better study the changes in cell proportion in diseased tissues and provide more information on the follow-up treatment of diseases.
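To make the idea of weighted non-negative least squares concrete, here is a minimal pure-Python sketch. It is not DCap's implementation: the function name `weighted_nnls`, the toy signature matrix, and the fixed per-gene weights are all hypothetical, and the solver is plain projected gradient descent rather than whatever iterative scheme the authors use. It only illustrates how per-gene weights enter the objective sum_g w_g (S p - b)_g^2 under the constraint p >= 0.

```python
def weighted_nnls(signature, bulk, weights, iterations=2000):
    """Minimise 0.5 * sum_g w_g * ((S p)_g - b_g)^2 subject to p >= 0.

    signature: genes x cell-types expression matrix S (list of rows)
    bulk:      observed bulk expression b, one value per gene
    weights:   hypothetical per-gene weights w (larger = more trusted gene)
    """
    n_genes, n_types = len(signature), len(signature[0])
    # trace(S^T W S) bounds the largest eigenvalue, so 1/trace is a safe step size.
    trace = sum(weights[g] * signature[g][k] ** 2
                for g in range(n_genes) for k in range(n_types))
    step = 1.0 / trace
    p = [1.0 / n_types] * n_types  # start from uniform proportions
    for _ in range(iterations):
        residual = [sum(signature[g][k] * p[k] for k in range(n_types)) - bulk[g]
                    for g in range(n_genes)]
        grad = [sum(weights[g] * signature[g][k] * residual[g] for g in range(n_genes))
                for k in range(n_types)]
        # Gradient step followed by projection onto the non-negative orthant.
        p = [max(0.0, p[k] - step * grad[k]) for k in range(n_types)]
    total = sum(p)
    # Rescale so the estimates read as proportions.
    return [x / total for x in p] if total > 0 else p


# Toy example: 4 genes, 3 cell types, noise-free bulk built from known proportions.
signature = [[10.0, 1.0, 0.0],
             [0.0, 8.0, 1.0],
             [1.0, 0.0, 9.0],
             [2.0, 2.0, 2.0]]
true_proportions = [0.5, 0.3, 0.2]
bulk = [sum(s * p for s, p in zip(row, true_proportions)) for row in signature]
weights = [1.0, 0.5, 1.0, 0.25]  # illustrative values only

estimate = weighted_nnls(signature, bulk, weights)
print(estimate)
```

With noise-free input, any positive weights lead back to the true proportions; the weights only start to matter once the bulk measurements are noisy and some genes deserve less trust than others.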
Affiliation(s)
- Jiajie Peng
- School of Computer Science, Northwestern Polytechnical University, Chang'an Ave, Changan Qu, Xi'an City, Shaanxi Province, China
- Lu Han
- School of Computer Science, Northwestern Polytechnical University, Chang'an Ave, Changan Qu, Xi'an City, Shaanxi Province, China.
- Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, Chang'an Ave, Changan Qu, Xi'an City, Shaanxi Province, China.
11
Dong M, Thennavan A, Urrutia E, Li Y, Perou CM, Zou F, Jiang Y. SCDC: bulk gene expression deconvolution by multiple single-cell RNA sequencing references. Brief Bioinform 2021; 22:416-427. [PMID: 31925417 PMCID: PMC7820884 DOI: 10.1093/bib/bbz166] [Citation(s) in RCA: 147] [Impact Index Per Article: 36.8] [Received: 09/03/2019] [Revised: 11/04/2019] [Accepted: 12/02/2019] [Indexed: 12/14/2022] Open
Abstract
Recent advances in single-cell RNA sequencing (scRNA-seq) enable characterization of transcriptomic profiles with single-cell resolution and circumvent averaging artifacts associated with traditional bulk RNA sequencing (RNA-seq) data. Here, we propose SCDC, a deconvolution method for bulk RNA-seq that leverages cell-type specific gene expression profiles from multiple scRNA-seq reference datasets. SCDC adopts an ENSEMBLE method to integrate deconvolution results from different scRNA-seq datasets that are produced in different laboratories and at different times, implicitly addressing the problem of batch-effect confounding. SCDC is benchmarked against existing methods using both in silico generated pseudo-bulk samples and experimentally mixed cell lines, whose known cell-type compositions serve as ground truths. We show that SCDC outperforms existing methods with improved accuracy of cell-type decomposition under both settings. To illustrate how the ENSEMBLE framework performs in complex tissues under different scenarios, we further apply our method to a human pancreatic islet dataset and a mouse mammary gland dataset. SCDC returns results that are more consistent with experimental designs and that reproduce more significant associations between cell-type proportions and measured phenotypes.
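One plausible way to picture an ensemble over references is a weighted average of the per-reference proportion estimates, with the weight chosen so that the reconstructed bulk profile best matches the observed one. The sketch below does this for two references with a simple grid search. It is an illustration under stated assumptions, not the published SCDC procedure: the function names, the L1 error criterion, the grid resolution, and the toy matrices are all hypothetical.

```python
def reconstruct(signature, proportions):
    """Predicted bulk expression: signature (genes x types) times proportions."""
    return [sum(s * p for s, p in zip(row, proportions)) for row in signature]


def l1_error(observed, predicted):
    """Sum of absolute differences between two expression vectors."""
    return sum(abs(o - q) for o, q in zip(observed, predicted))


def ensemble_two_references(bulk, sig_a, prop_a, sig_b, prop_b, steps=20):
    """Grid-search a weight w in [0, 1] for reference A (1 - w for reference B).

    The weight is picked to minimise the L1 error between the observed bulk
    profile and the weighted mix of the two reconstructed profiles.
    Returns (w, combined proportions).
    """
    pred_a = reconstruct(sig_a, prop_a)
    pred_b = reconstruct(sig_b, prop_b)
    best_err, best_w = None, None
    for i in range(steps + 1):
        w = i / steps
        mixed = [w * a + (1 - w) * b for a, b in zip(pred_a, pred_b)]
        err = l1_error(bulk, mixed)
        if best_err is None or err < best_err:
            best_err, best_w = err, w
    combined = [best_w * a + (1 - best_w) * b for a, b in zip(prop_a, prop_b)]
    return best_w, combined


# Toy example: reference A matches the bulk sample, reference B does not.
sig_a = [[10.0, 1.0], [1.0, 8.0], [3.0, 3.0]]
sig_b = [[7.0, 2.0], [2.0, 6.0], [3.0, 4.0]]
true_p = [0.6, 0.4]
bulk = reconstruct(sig_a, true_p)
prop_a = [0.6, 0.4]   # estimate obtained with reference A (assumed given)
prop_b = [0.3, 0.7]   # estimate obtained with reference B (assumed given)

w, combined = ensemble_two_references(bulk, sig_a, prop_a, sig_b, prop_b)
print(w, combined)
```

In this toy setup the search puts all of the weight on the reference that reproduces the bulk profile. With real data, neither reference fits perfectly and the weight would typically land somewhere in between.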
Affiliation(s)
- Fei Zou
- Corresponding authors: Fei Zou and Yuchao Jiang, Department of Biostatistics and Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA.
- Yuchao Jiang
- Corresponding authors: Fei Zou and Yuchao Jiang, Department of Biostatistics and Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA.
12
Jaakkola MK, Elo LL. Computational deconvolution to estimate cell type-specific gene expression from bulk data. NAR Genom Bioinform 2021; 3:lqaa110. [PMID: 33575652 PMCID: PMC7803005 DOI: 10.1093/nargab/lqaa110] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 08/20/2020] [Revised: 12/14/2020] [Accepted: 12/17/2020] [Indexed: 12/24/2022] Open
Abstract
Computational deconvolution is a time- and cost-efficient approach to obtain cell type-specific information from bulk gene expression of heterogeneous tissues like blood. Deconvolution can aim to estimate cell type proportions or abundances in samples, to estimate how strongly each present cell type expresses different genes, or to do both simultaneously. Of the two separate goals, the estimation of cell type proportions/abundances is widely studied, but less attention has been paid to defining the cell type-specific expression profiles. Here, we address this gap by introducing a novel method, Rodeo, and empirically evaluating it and the other available tools from multiple perspectives utilizing diverse datasets.
Affiliation(s)
- Maria K Jaakkola
- Turku Bioscience Centre, University of Turku and Åbo Akademi University, Tykistökatu 6, FI-20520 Turku, Finland
- Laura L Elo
- Turku Bioscience Centre, University of Turku and Åbo Akademi University, Tykistökatu 6, FI-20520 Turku, Finland
13
Radiogenomic signatures reveal multiscale intratumour heterogeneity associated with biological functions and survival in breast cancer. Nat Commun 2020; 11:4861. [PMID: 32978398 PMCID: PMC7519071 DOI: 10.1038/s41467-020-18703-2] [Citation(s) in RCA: 52] [Impact Index Per Article: 10.4] [Received: 01/21/2020] [Accepted: 09/08/2020] [Indexed: 12/24/2022] Open
Abstract
Advanced tumours are often heterogeneous, consisting of subclones with various genetic alterations and functional roles. The precise molecular features that characterize the contributions of multiscale intratumour heterogeneity to malignant progression, metastasis, and poor survival are largely unknown. Here, we address these challenges in breast cancer by defining the landscape of heterogeneous tumour subclones and their biological functions using radiogenomic signatures. Molecular heterogeneity is identified by a fully unsupervised deconvolution of gene expression data. Relative prevalence of two subclones associated with cell cycle and primary immunodeficiency pathways identifies patients with significantly different survival outcomes. Radiogenomic signatures of imaging scale heterogeneity are extracted and used to classify patients into groups with distinct subclone compositions. Prognostic value is confirmed by survival analysis accounting for clinical variables. These findings provide insight into how a radiogenomic analysis can identify the biological activities of specific subclones that predict prognosis in a noninvasive and clinically relevant manner. Tumours are made up of heterogeneous subclones. Here, the authors show using breast cancer imaging and gene expression datasets that these subclones can be inferred by the deconvolution of gene expression data, mapped to MRI derived radiogenomic signatures and used to estimate prognosis.
14
Clarke R, Kraikivski P, Jones BC, Sevigny CM, Sengupta S, Wang Y. A systems biology approach to discovering pathway signaling dysregulation in metastasis. Cancer Metastasis Rev 2020; 39:903-918. [PMID: 32776157 PMCID: PMC7487029 DOI: 10.1007/s10555-020-09921-7] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 07/11/2020] [Accepted: 07/13/2020] [Indexed: 02/07/2023]
Abstract
Total metastatic burden is the primary cause of death for many cancer patients. While the process of metastasis has been studied widely, much remains to be understood. Moreover, few agents have been developed that specifically target the major steps of the metastatic cascade. Many individual genes and pathways have been implicated in metastasis but a holistic view of how these interact and cooperate to regulate and execute the process remains somewhat rudimentary. It is unclear whether all of the signaling features that regulate and execute metastasis are yet fully understood. Novel features of a complex system such as metastasis can often be discovered by taking a systems-based approach. We introduce the concepts of systems modeling and define some of the central challenges facing the application of a multidisciplinary systems-based approach to understanding metastasis and finding actionable targets therein. These challenges include appreciating the unique properties of the high-dimensional omics data often used for modeling, limitations in knowledge of the system (metastasis), tumor heterogeneity and sampling bias, and some of the issues key to understanding critical features of molecular signaling in the context of metastasis. We also provide a brief introduction to integrative modeling that focuses on both the nodes and edges of molecular signaling networks. Finally, we offer some observations on future directions as they relate to developing a systems-based model of the metastatic cascade.
Affiliation(s)
- Robert Clarke
  - Department of Oncology, Georgetown University Medical Center, 3970 Reservoir Rd NW, Washington, DC, 20057, USA
  - Hormel Institute and Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Austin, MN, 55912, USA
- Pavel Kraikivski
  - Academy of Integrated Science, Division of Systems Biology, Virginia Polytechnic Institute and State University, Blacksburg, VA, 24061, USA
- Brandon C Jones
  - Department of Oncology, Georgetown University Medical Center, 3970 Reservoir Rd NW, Washington, DC, 20057, USA
- Catherine M Sevigny
  - Department of Oncology, Georgetown University Medical Center, 3970 Reservoir Rd NW, Washington, DC, 20057, USA
- Surojeet Sengupta
  - Department of Oncology, Georgetown University Medical Center, 3970 Reservoir Rd NW, Washington, DC, 20057, USA
- Yue Wang
  - Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA, 22203, USA
|
15
|
Lee D, Park Y, Kim S. Towards multi-omics characterization of tumor heterogeneity: a comprehensive review of statistical and machine learning approaches. Brief Bioinform 2020; 22:5896573. [PMID: 34020548 DOI: 10.1093/bib/bbaa188] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 06/02/2020] [Revised: 06/29/2020] [Accepted: 07/21/2020] [Indexed: 12/19/2022] Open
Abstract
The multi-omics molecular characterization of cancer opened a new horizon for our understanding of cancer biology and therapeutic strategies. However, a tumor biopsy comprises diverse types of cells, not only cancerous cells but also tumor microenvironmental cells and adjacent normal cells. This heterogeneity is a major confounding factor that hampers a robust and reproducible bioinformatic analysis for biomarker identification using multi-omics profiles. Besides, the heterogeneity itself has been recognized over the years for its significant prognostic value in some cancer types, thus offering another promising avenue for therapeutic intervention. A number of computational approaches to unravel such heterogeneity from high-throughput molecular profiles of a tumor sample have been proposed, but most of them rely on the data from an individual omics layer. Since the heterogeneity of cells is widely distributed across multi-omics layers, methods based on an individual layer can only partially characterize the heterogeneous admixture of cells. To help facilitate further development of the methodologies that simultaneously account for several multi-omics profiles, we wrote a comprehensive review of diverse approaches to characterize tumor heterogeneity based on three different omics layers: genome, epigenome and transcriptome. As a result, this review can be useful for the analysis of multi-omics profiles produced by many large-scale consortia. Contact: sunkim.bioinfo@snu.ac.kr.
Affiliation(s)
- Dohoon Lee
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
- Youngjune Park
  - Department of Computer Science and Engineering, Institute of Engineering Research, Seoul National University, Seoul 08826, Korea
- Sun Kim
  - Bioinformatics Institute, Seoul National University, Seoul 08826, Korea
|
16
|
Du R, Carey V, Weiss ST. deconvSeq: deconvolution of cell mixture distribution in sequencing data. Bioinformatics 2020; 35:5095-5102. [PMID: 31147676 DOI: 10.1093/bioinformatics/btz444] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 08/11/2018] [Revised: 05/17/2019] [Accepted: 05/27/2019] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: Although single-cell sequencing is becoming more widely available, many tissue samples such as intracranial aneurysms are both fibrous and minute, and therefore not easily dissociated into single cells. Accounting for the cell type heterogeneity in such tissues therefore requires a computational method. We present a computational deconvolution method, deconvSeq, for sequencing data (RNA and bisulfite) obtained from bulk tissue. This method can also be applied to single-cell RNA sequencing data. RESULTS: DeconvSeq utilizes a generalized linear model to model effects of tissue type on feature quantification, which is specific to the data structure of the sequencing type used. Estimated model coefficients can then be used to predict the cell type mixture within a tissue. Predicted cell type mixtures were validated against actual cell counts in whole blood samples. Using this method, we obtained a mean correlation of 0.998 (95% CI 0.995-0.999) from the RNA sequencing data of 35 whole blood samples and 0.95 (95% CI 0.91-0.98) from the reduced representation bisulfite sequencing data from 35 whole blood samples. Using symmetric balances to obtain the correlation between compositional parts, we found that the lowest correlation occurred for monocytes for both RNA and bisulfite sequencing. Comparison with other methods of decomposition such as deconRNAseq, CIBERSORT, MuSiC and EpiDISH showed that deconvSeq is able to achieve good prediction using mean correlation with far fewer genes or CpG sites in the signature set. AVAILABILITY AND IMPLEMENTATION: Software implementing deconvSeq is available at https://github.com/rosedu1/deconvSeq. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Rose Du
  - Department of Neurosurgery, Boston, MA, USA
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Vince Carey
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Scott T Weiss
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
|
17
|
Ji Y, Yu C, Zhang H. contamDE-lm: linear model-based differential gene expression analysis using next-generation RNA-seq data from contaminated tumor samples. Bioinformatics 2020; 36:2492-2499. [PMID: 31917401 DOI: 10.1093/bioinformatics/btaa006] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/23/2019] [Revised: 11/30/2019] [Accepted: 01/03/2020] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: Tumor and adjacent normal RNA samples are commonly used to screen differentially expressed genes between normal and tumor samples or among tumor subtypes. Such a paired-sample design can avoid numerous confounders in differential expression (DE) analysis, but the cellular contamination of tumor samples can be an important source of noise and confounding, which can both inflate the false-positive rate and deflate the true-positive rate. The existing DE tools that use next-generation RNA-seq data either do not account for cellular contamination or become computationally intensive as sample sizes grow. RESULTS: A novel linear model was proposed to avoid the problem that could arise from tumor-normal correlation for paired samples. A statistically robust and computationally very fast DE analysis procedure, contamDE-lm, was developed based on the novel model to account for cellular contamination, boosting DE analysis power through the reduction in individual residual variances using gene-wise information. The advantages of contamDE-lm over some state-of-the-art methods (limma and DESeq2) were evaluated through applications to simulated data, the TCGA database and the Gene Expression Omnibus (GEO) database. AVAILABILITY AND IMPLEMENTATION: The proposed method contamDE-lm was implemented in an updated R package contamDE (version 2.0), which is freely available at https://github.com/zhanghfd/contamDE. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yifan Ji
  - Institute of Biostatistics, School of Life Sciences, Fudan University, Shanghai 200438, People's Republic of China
- Chang Yu
  - Department of Statistics and Finance, School of Management, University of Science and Technology of China, Hefei, Anhui 230026, People's Republic of China
- Hong Zhang
  - Department of Statistics and Finance, School of Management, University of Science and Technology of China, Hefei, Anhui 230026, People's Republic of China
|
18
|
Parker SJ, Chen L, Spivia W, Saylor G, Mao C, Venkatraman V, Holewinski RJ, Mastali M, Pandey R, Athas G, Yu G, Fu Q, Troxlair D, Vander Heide R, Herrington D, Van Eyk JE, Wang Y. Identification of Putative Early Atherosclerosis Biomarkers by Unsupervised Deconvolution of Heterogeneous Vascular Proteomes. J Proteome Res 2020; 19:2794-2806. [PMID: 32202800 DOI: 10.1021/acs.jproteome.0c00118] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Indexed: 12/18/2022]
Abstract
Coronary artery disease remains a leading cause of death in industrialized nations, and early detection of disease is a critical intervention target to effectively treat patients and manage risk. Proteomic analysis of mixed tissue homogenates may obscure subtle protein changes that occur uniquely in underlying tissue subtypes. The unsupervised 'convex analysis of mixtures' (CAM) tool has previously been shown to effectively segregate cellular subtypes from mixed expression data. In this study, we hypothesized that CAM would identify proteomic information specifically informative to early atherosclerosis lesion involvement that could lead to potential markers of early disease detection. We quantified the proteome of 99 paired abdominal aorta (AA) and left anterior descending coronary artery (LAD) specimens (N = 198 specimens total) acquired during autopsy of young adults free of diagnosed cardiac disease. The CAM tool was then used to segregate protein subsets uniquely associated with different underlying tissue types, yielding markers of normal and fibrous plaque (FP) tissues in LAD and AA (N = 62 lesion markers). CAM-derived FP marker expression was validated against pathologist-estimated luminal surface involvement of FP, as well as in an orthogonal cohort of "pure" fibrous plaque, fatty streak, and normal vascular specimens. A targeted mass spectrometry (MS) assay quantified 39 of 62 CAM-FP markers in plasma from women with angiographically verified coronary artery disease (CAD, N = 46) or free from apparent CAD (control, N = 40). Elastic net variable selection with logistic regression reduced this list to 10 proteins capable of classifying CAD status in this cohort with <6% misclassification error, and a mean area under the receiver operating characteristic curve of 0.992 (confidence interval 0.968-0.998) after cross validation. The proteomics-CAM workflow identified lesion-specific molecular biomarker candidates by distilling the most representative molecules from heterogeneous tissue types.
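The classification result in this abstract is reported as an area under the receiver operating characteristic curve (AUC). As a minimal sketch of what that metric measures, the AUC equals the fraction of case/control pairs in which the case receives the higher score, with ties counted as half. This is not the authors' pipeline (which used elastic net variable selection, logistic regression, and cross-validation); the function name `roc_auc` and the toy labels and scores are hypothetical.

```python
import numpy as np

def roc_auc(labels, scores):
    """AUC as the probability that a random case outscores a random control."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels]    # scores of cases (label 1)
    neg = scores[~labels]   # scores of controls (label 0)
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one case and one control")
    # Compare every case score with every control score.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (pos.size * neg.size))

# Hypothetical toy data: 3 cases and 3 controls with classifier scores.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc_toy = roc_auc(toy_labels, toy_scores)  # 8 of 9 pairs are ranked correctly
print(f"AUC = {auc_toy:.3f}")
```

In a cross-validated setting, one such value would typically be computed per held-out fold and then averaged.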
Affiliation(s)
- Sarah J Parker
  - Heart Institute & Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, California 90048, United States
- Lulu Chen
  - Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, Virginia 24061, United States
- Weston Spivia
  - Heart Institute & Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, California 90048, United States
- Georgia Saylor
  - Department of Cardiovascular Medicine, Wake Forest University, Winston-Salem, North Carolina 27101, United States
- Chunhong Mao
  - Biocomplexity Institute & Initiative, University of Virginia, Charlottesville, Virginia 22904, United States
- Vidya Venkatraman
  - Heart Institute & Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, California 90048, United States
- Ronald J Holewinski
  - Heart Institute & Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, California 90048, United States
- Mitra Mastali
  - Heart Institute & Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, California 90048, United States
- Rakhi Pandey
  - Heart Institute & Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, California 90048, United States
- Grace Athas
  - Department of Pathology, Louisiana State University, New Orleans, Louisiana 70112, United States
- Guoqiang Yu
  - Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, Virginia 24061, United States
- Qin Fu
  - Heart Institute & Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, California 90048, United States
- Dana Troxlair
  - Department of Pathology, Louisiana State University, New Orleans, Louisiana 70112, United States
- Richard Vander Heide
  - Department of Pathology, Louisiana State University, New Orleans, Louisiana 70112, United States
- David Herrington
  - Department of Cardiovascular Medicine, Wake Forest University, Winston-Salem, North Carolina 27101, United States
- Jennifer E Van Eyk
  - Heart Institute & Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, California 90048, United States
- Yue Wang
  - Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, Virginia 24061, United States
|
19
|
Kang K, Meng Q, Shats I, Umbach DM, Li M, Li Y, Li X, Li L. CDSeq: A novel complete deconvolution method for dissecting heterogeneous samples using gene expression data. PLoS Comput Biol 2019; 15:e1007510. [PMID: 31790389 PMCID: PMC6907860 DOI: 10.1371/journal.pcbi.1007510] [Citation(s) in RCA: 38] [Impact Index Per Article: 6.3] [Received: 07/03/2019] [Revised: 12/12/2019] [Accepted: 10/25/2019] [Indexed: 11/18/2022] Open
Abstract
Quantifying cell-type proportions and their corresponding gene expression profiles in tissue samples would enhance understanding of the contributions of individual cell types to the physiological states of the tissue. Current approaches that address tissue heterogeneity have drawbacks. Experimental techniques, such as fluorescence-activated cell sorting and single cell RNA sequencing, are expensive. Computational approaches that use expression data from heterogeneous samples are promising, but most of the current methods estimate either cell-type proportions or cell-type-specific expression profiles by requiring the other as input. Although such partial deconvolution methods have been successfully applied to tumor samples, the additional input required may be unavailable. We introduce a novel complete deconvolution method, CDSeq, that uses only RNA-Seq data from bulk tissue samples to simultaneously estimate both cell-type proportions and cell-type-specific expression profiles. Using several synthetic and real experimental datasets with known cell-type composition and cell-type-specific expression profiles, we compared CDSeq’s complete deconvolution performance with seven other established deconvolution methods. Complete deconvolution using CDSeq represents a substantial technical advance over partial deconvolution approaches and will be useful for studying cell mixtures in tissue samples. CDSeq is available at GitHub repository (MATLAB and Octave code): https://github.com/kkang7/CDSeq.
Understanding the cellular composition of bulk tissues is critical to investigate the underlying mechanisms of many biological processes. Single cell sequencing is a promising technique; however, it is expensive and the analysis of single cell data is non-trivial. Therefore, tissue samples are still routinely processed in bulk. To estimate cell-type composition using bulk gene expression data, computational deconvolution methods are needed. Many deconvolution methods have been proposed; however, they often estimate only cell type proportions using a reference cell type gene expression profile, which in many cases may not be available. We present a novel complete deconvolution method that uses only bulk gene expression data to simultaneously estimate cell-type-specific gene expression profiles and sample-specific cell-type proportions. We showed that, using multiple RNA-Seq and microarray datasets where the cell-type composition was previously known, our method could accurately determine the cell-type composition. By providing a method that requires a single input to determine both cell-type proportion and cell-type-specific expression profiles, we expect that our method will be beneficial to biologists and facilitate the research and identification of mechanisms underlying many biological processes.
Affiliation(s)
- Kai Kang
  - Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, North Carolina, United States of America
  - Corresponding author
- Qian Meng
  - Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, North Carolina, United States of America
- Igor Shats
  - Signal Transduction Laboratory, National Institute of Environmental Health Sciences, Durham, North Carolina, United States of America
- David M. Umbach
  - Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, North Carolina, United States of America
- Melissa Li
  - Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, North Carolina, United States of America
- Yuanyuan Li
  - Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, North Carolina, United States of America
- Xiaoling Li
  - Signal Transduction Laboratory, National Institute of Environmental Health Sciences, Durham, North Carolina, United States of America
- Leping Li
  - Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, North Carolina, United States of America
  - Corresponding author
|
20
|
Bogucka-Kocka A, Zalewski DP, Ruszel KP, Stępniewski A, Gałkowski D, Bogucki J, Komsta Ł, Kołodziej P, Zubilewicz T, Feldo M, Kocki J. Dysregulation of MicroRNA Regulatory Network in Lower Extremities Arterial Disease. Front Genet 2019; 10:1200. [PMID: 31827490 PMCID: PMC6892359 DOI: 10.3389/fgene.2019.01200] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 06/21/2019] [Accepted: 10/29/2019] [Indexed: 01/12/2023] Open
Abstract
Atherosclerosis and its comorbidities are major contributors to the global burden of death. Lower extremities arterial disease (LEAD) is a common manifestation of atherosclerotic disease of arteries of lower extremities. MicroRNAs are epigenetic factors that regulate gene expression and have not yet been extensively studied in LEAD. We aimed to identify the most promising microRNA and gene expression signatures of LEAD, to identify interactions between microRNAs and genes, and to describe the potential effects of modulated gene expression. High-throughput sequencing was employed to examine the microRNAome and transcriptome of peripheral blood mononuclear cells of patients with LEAD, in relation to controls. Statistical significance of the microRNA and gene analysis results was evaluated using DESeq2 and uninformative variable elimination by partial least squares methods. Twenty-six microRNAs (hsa-let-7f-1-3p, hsa-miR-34a-5p, -122-5p, -3591-3p, -34a-3p, -1261, -21-5p, -15a-5p, -548d-5p, -34b-5p, -424-3p, -548aa, -548t-3p, -4423-3p, -196a-5p, -330-3p, -766-3p, -30e-3p, -125b-5p, -1301-3p, -3184-5p, -423-3p, -339-3p, -138-5p, -99a-3p, and -6087) and 14 genes (AK5, CD248, CDS2, FAM129A, FBLN2, GGT1, NOG, NRCAM, PDE7A, RP11-545E17.3, SLC12A2, SLC16A10, SLC4A10, and ZSCAN18) were the most significantly differentially expressed in the LEAD group compared to controls. The discriminative value of the revealed microRNAs and genes was confirmed by receiver operating characteristic analysis. Dysregulations of 26 microRNAs and 14 genes were used to propose novel biomarkers of LEAD. Regulatory interactions between biomarker microRNAs and genes were studied in silico using the R multiMiR package. Functional analysis of genes modulated by the proposed biomarker microRNAs was performed using DAVID 6.8 tools and revealed terms closely related to atherosclerosis and, interestingly, processes involving the nervous system. The study provides new insight into microRNA-dependent regulatory mechanisms involved in the pathology of LEAD. Proposed microRNA and gene biomarkers of LEAD may provide new diagnostic and therapeutic opportunities.
Affiliation(s)
- Anna Bogucka-Kocka
  - Chair and Department of Biology and Genetics, Medical University of Lublin, Lublin, Poland
- Daniel P Zalewski
  - Chair and Department of Biology and Genetics, Medical University of Lublin, Lublin, Poland
- Karol P Ruszel
  - Department of Clinical Genetics, Chair of Medical Genetics, Medical University of Lublin, Lublin, Poland
- Andrzej Stępniewski
  - Ecotech Complex, Analytical and Programme Centre for Advanced Environmentally-Friendly Technologies, University of Marie Curie-Sklodowska, Lublin, Poland
- Dariusz Gałkowski
  - Department of Pathology and Laboratory Medicine, Rutgers-Robert Wood Johnson Medical School, New Brunswick, NJ, United States
- Jacek Bogucki
  - Department of Clinical Genetics, Chair of Medical Genetics, Medical University of Lublin, Lublin, Poland
- Łukasz Komsta
  - Chair and Department of Medicinal Chemistry, Medical University of Lublin, Lublin, Poland
- Przemysław Kołodziej
  - Chair and Department of Biology and Genetics, Medical University of Lublin, Lublin, Poland
- Tomasz Zubilewicz
  - Department of Vascular Surgery and Angiology, Medical University of Lublin, Lublin, Poland
- Marcin Feldo
  - Department of Vascular Surgery and Angiology, Medical University of Lublin, Lublin, Poland
- Janusz Kocki
  - Department of Clinical Genetics, Chair of Medical Genetics, Medical University of Lublin, Lublin, Poland
|
21
|
De novo compartment deconvolution and weight estimation of tumor samples using DECODER. Nat Commun 2019; 10:4729. [PMID: 31628300 PMCID: PMC6802116 DOI: 10.1038/s41467-019-12517-7] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Received: 02/20/2019] [Accepted: 09/06/2019] [Indexed: 12/11/2022] Open
Abstract
Tumors are mixtures of different compartments. While global gene expression analysis profiles the average expression of all compartments in a sample, identifying the specific contribution of each compartment remains a challenge. With the increasing recognition of the importance of non-neoplastic components, the ability to break down the gene expression contribution of each is critical. Here, we develop DECODER, an integrated framework which performs de novo deconvolution and single-sample compartment weight estimation. We use DECODER to deconvolve 33 TCGA tumor RNA-seq data sets and show that it may be applied to other data types including ATAC-seq. We demonstrate that it can be utilized to reproducibly estimate cellular compartment weights in pancreatic cancer that are clinically meaningful. Application of DECODER across cancer types advances the capability of identifying cellular compartments in an unknown sample and may have implications for identifying the tumor of origin for cancers of unknown primary. Separating different cell compartments from bulk gene expression data can be challenging. Here the authors present DECODER, which can perform de novo deconvolutions on non-negative matrices including microarray, RNA-seq and ATAC-seq data sets.
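De novo deconvolution of a non-negative expression matrix is often framed as non-negative matrix factorization (NMF): the genes-by-samples matrix V is approximated by W (compartment expression profiles) times H (per-sample compartment loadings). The sketch below illustrates that general idea with standard multiplicative updates. It is not the DECODER implementation, and the function names `nmf` and `compartment_weights`, the toy data, and the choice of update rule are all assumptions made for illustration.

```python
import numpy as np

def nmf(V, k, n_iter=500, seed=0, eps=1e-9):
    """Factor non-negative V (genes x samples) as W (genes x k) @ H (k x samples)."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], k)) + eps
    H = rng.random((k, V.shape[1])) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep W and H non-negative and
        # reduce the Frobenius reconstruction error.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def compartment_weights(H):
    """Rescale each sample's loadings so that they sum to 1."""
    return H / H.sum(axis=0, keepdims=True)

# Hypothetical toy data: 20 genes, 12 samples, 3 hidden compartments.
rng = np.random.default_rng(42)
W_true = rng.random((20, 3))
H_true = rng.random((3, 12))
V = W_true @ H_true

W, H = nmf(V, k=3)
weights = compartment_weights(H)
rel_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {rel_error:.4f}")
```

Factors recovered this way are identifiable only up to permutation and scaling of the compartments, so assigning biological labels to them requires marker genes or other external information.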
|
22
|
Sepulveda JL. Using R and Bioconductor in Clinical Genomics and Transcriptomics. J Mol Diagn 2019; 22:3-20. [PMID: 31605800 DOI: 10.1016/j.jmoldx.2019.08.006] [Citation(s) in RCA: 76] [Impact Index Per Article: 12.7] [Received: 12/18/2017] [Revised: 05/02/2019] [Accepted: 08/08/2019] [Indexed: 02/08/2023] Open
Abstract
Bioinformatics pipelines are essential in the analysis of genomic and transcriptomic data generated by next-generation sequencing (NGS). Recent guidelines emphasize the need for rigorous validation and assessment of robustness, reproducibility, and quality of NGS analytic pipelines intended for clinical use. Software tools written in the R statistical language and, in particular, the set of tools available in the Bioconductor repository are widely used in research bioinformatics; and these frameworks offer several advantages for use in clinical bioinformatics, including the breadth of available tools, modular nature of software packages, ease of installation, enforcement of interoperability, version control, and short learning curve. This review provides an introduction to R and Bioconductor software, its advantages and limitations for clinical bioinformatics, and illustrative examples of tools that can be used in various steps of NGS analysis.
Affiliation(s)
- Jorge L Sepulveda
  - Department of Pathology and Cell Biology, Columbia University Irving Medical Center, New York, New York
  - Informatics Subdivision Leadership, Association for Molecular Pathology, Bethesda, Maryland
|
23
|
Avila Cobos F, Vandesompele J, Mestdagh P, De Preter K. Computational deconvolution of transcriptomics data from mixed cell populations. Bioinformatics 2019; 34:1969-1979. [PMID: 29351586 DOI: 10.1093/bioinformatics/bty019] [Citation(s) in RCA: 146] [Impact Index Per Article: 24.3] [Received: 08/18/2017] [Accepted: 01/10/2018] [Indexed: 12/22/2022] Open
Abstract
SUMMARY: Gene expression analyses of bulk tissues often ignore cell type composition as an important confounding factor, resulting in a loss of signal from lowly abundant cell types. In this review, we highlight the importance and value of computational deconvolution methods to infer the abundance of different cell types and/or cell type-specific expression profiles in heterogeneous samples without performing physical cell sorting. We also explain the various deconvolution scenarios, the mathematical approaches used to solve them and the effect of data processing and different confounding factors on the accuracy of the deconvolution results. CONTACT: katleen.depreter@ugent.be. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
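One of the deconvolution scenarios such reviews cover is the reference-based case, where a bulk profile is modeled as a non-negative mixture of known cell type signatures: bulk ≈ signature × proportions. The following is a minimal sketch of that scenario using non-negative least squares, under the assumption of a linear mixing model with a known signature matrix. The function name `estimate_proportions` and the toy numbers are hypothetical and are not taken from the review.

```python
import numpy as np
from scipy.optimize import nnls

def estimate_proportions(signature, bulk):
    """Estimate cell type proportions from one bulk profile.

    signature: genes x cell_types matrix of reference expression profiles.
    bulk: vector of bulk expression values for the same genes.
    """
    coef, _residual = nnls(signature, bulk)  # non-negative least squares fit
    total = coef.sum()
    if total == 0:
        raise ValueError("all estimated coefficients are zero")
    return coef / total  # rescale so the proportions sum to 1

# Hypothetical toy signature: 4 genes (rows) x 3 cell types (columns).
signature = np.array([
    [10.0, 1.0, 0.0],
    [0.0, 8.0, 1.0],
    [1.0, 0.0, 9.0],
    [5.0, 5.0, 5.0],
])
true_props = np.array([0.6, 0.3, 0.1])
bulk = signature @ true_props  # noise-free simulated bulk sample

estimated = estimate_proportions(signature, bulk)
print(np.round(estimated, 3))
```

With noise-free simulated data the true proportions are recovered; on real data, the accuracy depends on normalization, the choice of marker genes, and how well the reference matches the bulk samples.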
Affiliation(s)
- Francisco Avila Cobos
  - Center for Medical Genetics Ghent (CMGG), Ghent University, 9000 Ghent, Belgium
  - Cancer Research Institute Ghent (CRIG), 9000 Ghent, Belgium
  - Bioinformatics Institute Ghent from Nucleotides to Networks (BIG N2N), 9000 Ghent, Belgium
- Jo Vandesompele
  - Center for Medical Genetics Ghent (CMGG), Ghent University, 9000 Ghent, Belgium
  - Cancer Research Institute Ghent (CRIG), 9000 Ghent, Belgium
  - Bioinformatics Institute Ghent from Nucleotides to Networks (BIG N2N), 9000 Ghent, Belgium
- Pieter Mestdagh
  - Center for Medical Genetics Ghent (CMGG), Ghent University, 9000 Ghent, Belgium
  - Cancer Research Institute Ghent (CRIG), 9000 Ghent, Belgium
  - Bioinformatics Institute Ghent from Nucleotides to Networks (BIG N2N), 9000 Ghent, Belgium
- Katleen De Preter
  - Center for Medical Genetics Ghent (CMGG), Ghent University, 9000 Ghent, Belgium
  - Cancer Research Institute Ghent (CRIG), 9000 Ghent, Belgium
  - Bioinformatics Institute Ghent from Nucleotides to Networks (BIG N2N), 9000 Ghent, Belgium
|
24
|
Fox NS, Haider S, Harris AL, Boutros PC. Landscape of transcriptomic interactions between breast cancer and its microenvironment. Nat Commun 2019; 10:3116. [PMID: 31308365 PMCID: PMC6629667 DOI: 10.1038/s41467-019-10929-z] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 03/01/2018] [Accepted: 06/04/2019] [Indexed: 12/31/2022] Open
Abstract
Solid tumours comprise mixtures of tumour cells (TCs) and tumour-adjacent cells (TACs), and the intricate interconnections between these diverse populations shape the tumour’s microenvironment. Despite this complexity, clinical genomic profiling is typically performed from bulk samples, without distinguishing TCs from TACs. To better understand TC–TAC interactions, we computationally distinguish their transcriptomes in 1780 primary breast tumours. We show that TC and TAC mRNA abundances are divergently associated with clinical phenotypes, including tumour subtypes and patient survival. These differences reflect distinct responses of TCs and TACs to specific somatic driver mutations, particularly TP53. These data further elucidate how the molecular interplay between breast tumours and their microenvironment drives aggressive tumour phenotypes. The transcriptomic profile of tumour-adjacent cells provides important information about tumour context but its clinical utility is unclear. Here, in breast cancer, Fox et al. show that the mRNA abundances of tumour and tumour-adjacent cells hold prognostic information.
Collapse
Affiliation(s)
- Natalie S Fox
- Ontario Institute for Cancer Research, Toronto, ON, M5G 0A3, Canada. .,Department of Medical Biophysics, University of Toronto, Toronto, ON, M5G 1L7, Canada.
| | - Syed Haider
- Ontario Institute for Cancer Research, Toronto, ON, M5G 0A3, Canada.,Department of Oncology, Weatherall Institute of Molecular Medicine, University of Oxford, Oxford, OX3 9DS, UK.,The Breast Cancer Now Toby Robins Research Centre, The Institute of Cancer Research, London, SW7 3RP, UK
| | - Adrian L Harris
- Department of Oncology, Weatherall Institute of Molecular Medicine, University of Oxford, Oxford, OX3 9DS, UK
| | - Paul C Boutros
- Ontario Institute for Cancer Research, Toronto, ON, M5G 0A3, Canada.,Department of Medical Biophysics, University of Toronto, Toronto, ON, M5G 1L7, Canada.,Department of Pharmacology and Toxicology, University of Toronto, Toronto, ON, M5S 1A8, Canada.,Department of Human Genetics, University of California, Los Angeles, CA, 90095, USA.,Department of Urology, University of California, Los Angeles, CA, 90024, USA.,Broad Stem Cell Research Center, University of California, Los Angeles, CA, 90095, USA.,Institute for Precision Health, University of California, Los Angeles, CA, 90095, USA.,Jonsson Comprehensive Cancer Center, University of California, Los Angeles, CA, 90024, USA.
| |
|
25
|
Clarke R, Tyson JJ, Tan M, Baumann WT, Jin L, Xuan J, Wang Y. Systems biology: perspectives on multiscale modeling in research on endocrine-related cancers. Endocr Relat Cancer 2019; 26:R345-R368. [PMID: 30965282 PMCID: PMC7045974 DOI: 10.1530/erc-18-0309] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 03/25/2019] [Accepted: 04/08/2019] [Indexed: 12/12/2022]
Abstract
Drawing on concepts from experimental biology, computer science, informatics, mathematics and statistics, systems biologists integrate data across diverse platforms and scales of time and space to create computational and mathematical models of the integrative, holistic functions of living systems. Endocrine-related cancers are well suited to study from a systems perspective because of the signaling complexities arising from the roles of growth factors, hormones and their receptors as critical regulators of cancer cell biology and from the interactions among cancer cells, normal cells and signaling molecules in the tumor microenvironment. Moreover, growth factors, hormones and their receptors are often effective targets for therapeutic intervention, such as estrogen biosynthesis, estrogen receptors or HER2 in breast cancer and androgen receptors in prostate cancer. Given the complexity underlying the molecular control networks in these cancers, a simple, intuitive understanding of how endocrine-related cancers respond to therapeutic protocols has proved incomplete and unsatisfactory. Systems biology offers an alternative paradigm for understanding these cancers and their treatment. To correctly interpret the results of systems-based studies requires some knowledge of how in silico models are built, and how they are used to describe a system and to predict the effects of perturbations on system function. In this review, we provide a general perspective on the field of cancer systems biology, and we explore some of the advantages, limitations and pitfalls associated with using predictive multiscale modeling to study endocrine-related cancers.
Affiliation(s)
- Robert Clarke
- Department of Oncology, Georgetown University Medical Center, Washington, District of Columbia, USA
| | - John J Tyson
- Department of Biological Sciences, Virginia Polytechnic Institute and State University, Blacksburg, Virginia, USA
| | - Ming Tan
- Department of Biostatistics, Bioinformatics & Biomathematics, Georgetown University Medical Center, Washington, District of Columbia, USA
| | - William T Baumann
- Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Blacksburg, Virginia, USA
| | - Lu Jin
- Department of Oncology, Georgetown University Medical Center, Washington, District of Columbia, USA
| | - Jianhua Xuan
- Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, Virginia, USA
| | - Yue Wang
- Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, Virginia, USA
| |
|
26
|
A novel matched-pairs feature selection method considering with tumor purity for differential gene expression analyses. Math Biosci 2019; 311:39-48. [DOI: 10.1016/j.mbs.2019.02.007] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 01/17/2019] [Revised: 02/21/2019] [Accepted: 02/22/2019] [Indexed: 12/13/2022]
|
27
|
Ng JCF, Quist J, Grigoriadis A, Malim MH, Fraternali F. Pan-cancer transcriptomic analysis dissects immune and proliferative functions of APOBEC3 cytidine deaminases. Nucleic Acids Res 2019; 47:1178-1194. [PMID: 30624727 PMCID: PMC6379723 DOI: 10.1093/nar/gky1316] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Received: 11/02/2018] [Revised: 12/19/2018] [Accepted: 01/04/2019] [Indexed: 12/25/2022]
Abstract
APOBEC3 cytidine deaminases are largely known for their innate immune protection from viral infections. Recently, members of the family have been associated with a distinct mutational activity in some cancer types. We report a pan-tissue, pan-cancer analysis of RNA-seq data specific to the APOBEC3 genes in 8,951 tumours, 786 cancer cell lines and 6,119 normal tissues. By deconvolution of levels of different cell types in tumour admixtures, we demonstrate that APOBEC3B (A3B), the primary candidate as a cancer mutagen, shows little association with immune cell types compared to its paralogues. We present a pipeline called RESPECTEx (REconstituting SPecific Cell-Type Expression) and use it to deconvolute cell-type specific expression levels in a given cohort of tumour samples. We functionally annotate APOBEC3 co-expressing genes, and create an interactive visualization tool which 'barcodes' the functional enrichment (http://fraternalilab.kcl.ac.uk/apobec-barcodes/). These analyses reveal that A3B expression correlates with cell cycle and DNA repair genes, whereas the other APOBEC3 members display specificity for immune processes and immune cell populations. We offer molecular insights into the functions of individual APOBEC3 proteins in antiviral and proliferative contexts, and demonstrate the diversification this family of enzymes displays at the transcriptomic level, despite their high similarity in protein sequences and structures.
Affiliation(s)
- Joseph C F Ng
- Randall Centre for Cell and Molecular Biophysics, King's College London, London, UK
| | - Jelmar Quist
- Cancer Bioinformatics, School of Cancer and Pharmaceutical Sciences, CRUK King's Health Partners Centre, Breast Cancer Now Research Unit, King's College London, London, UK
| | - Anita Grigoriadis
- Cancer Bioinformatics, School of Cancer and Pharmaceutical Sciences, CRUK King's Health Partners Centre, Breast Cancer Now Research Unit, King's College London, London, UK
| | - Michael H Malim
- Department of Infectious Diseases, School of Immunology and Microbial Sciences, King's College London, London, UK
| | - Franca Fraternali
- Randall Centre for Cell and Molecular Biophysics, King's College London, London, UK
| |
|
28
|
Radiomic analysis of imaging heterogeneity in tumours and the surrounding parenchyma based on unsupervised decomposition of DCE-MRI for predicting molecular subtypes of breast cancer. Eur Radiol 2019; 29:4456-4467. [PMID: 30617495 DOI: 10.1007/s00330-018-5891-3] [Citation(s) in RCA: 52] [Impact Index Per Article: 8.7] [Received: 05/28/2018] [Revised: 10/02/2018] [Accepted: 11/13/2018] [Indexed: 10/27/2022]
Abstract
OBJECTIVES This study aimed to predict the molecular subtypes of breast cancer via intratumoural and peritumoural radiomic analysis with subregion identification based on the decomposition of dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI). METHODS The study included 211 women with histopathologically confirmed breast cancer. We utilised a completely unsupervised convex analysis of mixtures (CAM) method by unmixing dynamic imaging series from heterogeneous tissues. Each tumour and the surrounding parenchyma were thus decomposed into multiple subregions, representing different vascular characterisations, from which radiomic features were extracted. A random forest model was trained and tested using a leave-one-out cross-validation (LOOCV) method to predict breast cancer subtypes. The predictive models from tumoural and peritumoural subregions were fused for classification. RESULTS Tumour and peritumour DCE-MR images were decomposed into three compartments, representing plasma input, fast-flow kinetics, and slow-flow kinetics. The tumour subregion related to fast-flow kinetics showed the best performance among the subregions for differentiating between patients with four molecular subtypes (area under the receiver operating characteristic curve (AUC) = 0.832), exhibiting an AUC value significantly (p < 0.0001) higher than that obtained with the entire tumour (AUC = 0.719). When the tumour- and parenchyma-based predictive models were fused, the performance, measured as the AUC, increased to 0.897; this value was significantly higher than that obtained with other tumour partition methods. CONCLUSIONS Radiomic analysis of intratumoural and peritumoural heterogeneity based on the decomposition of image time-series signals has the potential to more accurately identify tumour kinetic features and serve as a valuable clinical marker to enhance the prediction of breast cancer subtypes.
KEY POINTS • Decomposition of image time-series signals has the potential to more accurately identify tumour kinetic features. • Fusion of intratumoural- and peritumoural-based predictive models improves the prediction of breast cancer subtypes.
|
29
|
Dimitrakopoulou K, Wik E, Akslen LA, Jonassen I. Deblender: a semi-/unsupervised multi-operational computational method for complete deconvolution of expression data from heterogeneous samples. BMC Bioinformatics 2018; 19:408. [PMID: 30404611 PMCID: PMC6223087 DOI: 10.1186/s12859-018-2442-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 02/16/2018] [Accepted: 10/22/2018] [Indexed: 12/24/2022]
Abstract
BACKGROUND Towards discovering robust cancer biomarkers, it is imperative to unravel the cellular heterogeneity of patient samples and comprehend the interactions between cancer cells and the various cell types in the tumor microenvironment. The first generation of 'partial' computational deconvolution methods required prior information either on the cell/tissue type proportions or the cell/tissue type-specific expression signatures and the number of involved cell/tissue types. The second generation of 'complete' approaches allowed estimating both the cell/tissue type proportions and the cell/tissue type-specific expression profiles directly from the mixed gene expression data, based on known (or automatically identified) cell/tissue type-specific marker genes. RESULTS We present Deblender, a flexible complete deconvolution tool operating in semi-/unsupervised mode based on the user's access to known marker gene lists and information about cell/tissue composition. When no prior knowledge is available, global gene expression variability is used in clustering the mixed data to substitute marker sets with cluster sets. In addition, we integrate a model selection criterion to predict the number of constituent cell/tissue types. Moreover, we provide a tailored algorithmic scheme to estimate mixture proportions for realistic experimental cases where the number of involved cell/tissue types exceeds the number of mixed samples. We assess the performance of Deblender and a set of state-of-the-art existing tools on a comprehensive set of benchmark and patient cancer mixture expression datasets (including TCGA). CONCLUSION Our results corroborate that Deblender can be a valuable tool to improve understanding of gene expression datasets with implications for prediction and clinical utilization. Deblender is implemented in MATLAB and is available from https://github.com/kondim1983/Deblender/.
Affiliation(s)
- Konstantina Dimitrakopoulou
- Centre for Cancer Biomarkers CCBIO, Department of Informatics, University of Bergen, Bergen, Norway.,Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
| | - Elisabeth Wik
- Centre for Cancer Biomarkers CCBIO, Department of Clinical Medicine, Section for Pathology, University of Bergen, Bergen, Norway.,Department of Pathology, Haukeland University Hospital, Bergen, Norway
| | - Lars A Akslen
- Centre for Cancer Biomarkers CCBIO, Department of Clinical Medicine, Section for Pathology, University of Bergen, Bergen, Norway.,Department of Pathology, Haukeland University Hospital, Bergen, Norway
| | - Inge Jonassen
- Centre for Cancer Biomarkers CCBIO, Department of Informatics, University of Bergen, Bergen, Norway.,Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway.
| |
|
30
|
Lin CH, Chi CY, Chen L, Miller DJ, Wang Y. Detection of Sources in Non-Negative Blind Source Separation by Minimum Description Length Criterion. IEEE Trans Neural Netw Learn Syst 2018; 29:4022-4037. [PMID: 28981430 DOI: 10.1109/tnnls.2017.2749279] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/07/2023]
Abstract
While non-negative blind source separation (nBSS) has found many successful applications in science and engineering, model order selection, determining the number of sources, remains a critical yet unresolved problem. Various model order selection methods have been proposed and applied to real-world data sets but with limited success, with both order over- and under-estimation reported. By studying existing schemes, we have found that the unsatisfactory results are mainly due to invalid assumptions, model oversimplification, subjective thresholding, and/or to assumptions made solely for mathematical convenience. Building on our earlier work that reformulated model order selection for nBSS with more realistic assumptions and models, we report a newly and formally revised model order selection criterion rooted in the minimum description length (MDL) principle. Adopting widely invoked assumptions for achieving a unique nBSS solution, we consider the mixing matrix as consisting of deterministic unknowns, with the source signals following a multivariate Dirichlet distribution. We derive a computationally efficient, stochastic algorithm to obtain approximate maximum-likelihood estimates of model parameters and apply Monte Carlo integration to determine the description length. Our modeling and estimation strategy exploits the characteristic geometry of the data simplex in nBSS. We validate our nBSS-MDL criterion through extensive simulation studies and on four real-world data sets, demonstrating its strong performance and general applicability to nBSS. The proposed nBSS-MDL criterion consistently detects the true number of sources, in all of our case studies.
|
31
|
Zhang W, Long H, He B, Yang J. DECtp: Calling Differential Gene Expression Between Cancer and Normal Samples by Integrating Tumor Purity Information. Front Genet 2018; 9:321. [PMID: 30210526 PMCID: PMC6121016 DOI: 10.3389/fgene.2018.00321] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 06/14/2018] [Accepted: 07/30/2018] [Indexed: 11/13/2022]
Abstract
Identifying differentially expressed genes (DEGs) between tumor and normal samples is critical for studying tumorigenesis, and has been routinely applied to identify diagnostic, prognostic, and therapeutic biomarkers for many cancers. It is well-known that solid tumor tissue samples obtained from clinical settings are always mixtures of cancer and normal cells. However, tumor purity information is largely ignored in traditional differential expression analyses, which might decrease the power of differential gene identification or even bias the results. In this paper, we have developed a novel differential gene calling method called DECtp by integrating tumor purity information into a generalized least square procedure, followed by the Wald test. We compared DECtp with popular methods like t-test and limma on nine simulation datasets with different sample sizes and noise levels. DECtp achieved the highest areas under the curve (AUCs) for all the comparisons, suggesting that cancer purity information is critical for DEG calling between tumor and normal samples. In addition, we applied DECtp to cancer and normal samples of 14 tumor types collected from The Cancer Genome Atlas (TCGA) and compared the DEGs with those called by limma. As a result, DECtp achieved more sensitive, consistent, and biologically meaningful results and identified a few novel DEGs for further experimental validation.
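As a rough illustration of why purity matters for DEG calling, the sketch below regresses one gene's expression on per-sample tumor purity and forms a Wald statistic for the slope. It is not the DECtp implementation: ordinary least squares stands in for the generalized least square procedure named in the abstract, and the function name `purity_wald` and the toy data are assumptions made only for this example.

```python
import math

def purity_wald(expr, purity):
    """Fit expr = intercept + slope * purity by ordinary least squares.

    Assumption for this sketch: normal samples have purity 0, so the
    intercept approximates normal-cell expression and the slope
    approximates the pure-tumor minus normal difference.
    Returns (slope, intercept, wald_z), where wald_z = slope / se(slope).
    """
    n = len(expr)
    mean_p = sum(purity) / n
    mean_e = sum(expr) / n
    sxx = sum((p - mean_p) ** 2 for p in purity)
    sxy = sum((p - mean_p) * (e - mean_e) for p, e in zip(purity, expr))
    slope = sxy / sxx
    intercept = mean_e - slope * mean_p
    # Residual variance with n - 2 degrees of freedom
    rss = sum((e - intercept - slope * p) ** 2 for p, e in zip(purity, expr))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, intercept, slope / se

# Toy data (hypothetical): three normal samples and three impure tumors.
purity = [0.0, 0.0, 0.0, 0.4, 0.6, 0.8]
expr = [5.1, 4.9, 5.0, 6.65, 7.35, 8.2]

slope, intercept, wald_z = purity_wald(expr, purity)
print(f"slope={slope:.2f} intercept={intercept:.2f} wald_z={wald_z:.1f}")
```

In this toy setting, a plain tumor-versus-normal comparison would underestimate the effect because the tumor samples are only 40 to 80% pure, whereas the slope extrapolates to a fully pure tumor.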
Affiliation(s)
- Weiwei Zhang
- School of Science, East China University of Technology, Nanchang, China
| | - Haixia Long
- Department of Information Science and Technology, Hainan Normal University, Haikou, China
| | - Binsheng He
- The First Affiliated Hospital, Changsha Medical University, Changsha, China
| | - Jialiang Yang
- College of Information Engineering, Changsha Medical University, Changsha, China.,Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, United States
| |
|
32
|
Wang N, Chen L, Wang Y. Mathematical Modeling and Deconvolution of Molecular Heterogeneity Identifies Novel Subpopulations in Complex Tissues. Methods Mol Biol 2018; 1751:223-236. [PMID: 29508301 DOI: 10.1007/978-1-4939-7710-9_16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
Tissue heterogeneity is both a major confounding factor and an underexploited information source. While a handful of reports have demonstrated the potential of supervised methods to deconvolve tissue heterogeneity, these approaches require a priori information on the marker genes or composition of known subpopulations. To address the critical problem of the absence of validated marker genes for many (including novel) subpopulations, we develop a novel unsupervised deconvolution method, Convex Analysis of Mixtures (CAM), within a well-grounded mathematical framework, to dissect mixed gene expressions in heterogeneous tissue samples. To facilitate use of this method, we implement an R-Java CAM package that provides comprehensive analytic functions and a graphical user interface (GUI).
Affiliation(s)
- Niya Wang
- Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA, USA.
| | - Lulu Chen
- Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA, USA
| | - Yue Wang
- Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA, USA
| |
|
33
|
Cieślik M, Chinnaiyan AM. Cancer transcriptome profiling at the juncture of clinical translation. Nat Rev Genet 2017; 19:93-109. [PMID: 29279605 DOI: 10.1038/nrg.2017.96] [Citation(s) in RCA: 173] [Impact Index Per Article: 21.6] [Indexed: 12/13/2022]
Abstract
Methodological breakthroughs over the past four decades have repeatedly revolutionized transcriptome profiling. Using RNA sequencing (RNA-seq), it has now become possible to sequence and quantify the transcriptional outputs of individual cells or thousands of samples. These transcriptomes provide a link between cellular phenotypes and their molecular underpinnings, such as mutations. In the context of cancer, this link represents an opportunity to dissect the complexity and heterogeneity of tumours and to discover new biomarkers or therapeutic strategies. Here, we review the rationale, methodology and translational impact of transcriptome profiling in cancer.
Affiliation(s)
- Marcin Cieślik
- Michigan Center for Translational Pathology, University of Michigan.,Department of Pathology, University of Michigan
| | - Arul M Chinnaiyan
- Michigan Center for Translational Pathology, University of Michigan.,Department of Pathology, University of Michigan.,Comprehensive Cancer Center, University of Michigan.,Department of Urology, University of Michigan.,Howard Hughes Medical Institute, University of Michigan, Ann Arbor, Michigan 48109, USA
| |
|
34
|
Wen Y, Wei Y, Zhang S, Li S, Liu H, Wang F, Zhao Y, Zhang D, Zhang Y. Cell subpopulation deconvolution reveals breast cancer heterogeneity based on DNA methylation signature. Brief Bioinform 2017; 18:426-440. [PMID: 27016391 DOI: 10.1093/bib/bbw028] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 12/08/2015] [Indexed: 12/21/2022]
Abstract
Tumour heterogeneity describes the coexistence of divergent tumour cell clones within tumours, which is often caused by underlying epigenetic changes. DNA methylation is commonly regarded as a significant regulator that differs across cells and tissues. In this study, we comprehensively reviewed research progress on estimating tumour heterogeneity. Bioinformatics-based analysis of DNA methylation has revealed the evolutionary relationships between breast cancer cell lines and tissues. Further analysis of the DNA methylation profiles in 33 breast cancer-related cell lines identified cell line-specific methylation patterns. Next, we reviewed the computational methods for inferring clonal evolution of tumours from different perspectives and then proposed a deconvolution strategy for modelling the dynamics of subclonal cell populations in breast cancer tissues based on DNA methylation. Further analysis of simulated cancer tissues and real cell lines revealed that this approach exhibits satisfactory performance and relative stability in estimating the composition and proportions of cellular subpopulations. The application of this strategy to breast cancer samples from The Cancer Genome Atlas identified different cellular subpopulations with distinct molecular phenotypes. Moreover, the current and potential future applications of this deconvolution strategy to clinical breast cancer research are discussed, with emphasis on the DNA methylation-based recognition of intra-tumour heterogeneity. The wide use of these methods for estimating heterogeneity in further clinical cohorts will improve our understanding of neoplastic progression and the design of therapeutic interventions for treating breast cancer and other malignancies.
|
35
|
Søndergaard D, Nielsen S, Pedersen CNS, Besenbacher S. Prediction of Primary Tumors in Cancers of Unknown Primary. J Integr Bioinform 2017; 14:/j/jib.ahead-of-print/jib-2017-0013/jib-2017-0013.xml. [PMID: 28686574 PMCID: PMC6042823 DOI: 10.1515/jib-2017-0013] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 03/16/2017] [Accepted: 04/18/2017] [Indexed: 12/22/2022]
Abstract
A cancer of unknown primary (CUP) is a metastatic cancer for which standard diagnostic tests fail to identify the location of the primary tumor. CUPs account for 3–5% of cancer cases. Using molecular data to determine the location of the primary tumor in such cases can help doctors make the right treatment choice and thus improve the clinical outcome. In this paper, we present a new method for predicting the location of the primary tumor using gene expression data: locating cancers of unknown primary (LoCUP). The method models the data as a mixture of normal and tumor cells and thus allows correct classification even in impure samples, where the tumor biopsy is contaminated by a large fraction of normal cells. We find that our method provides a significant increase in classification accuracy (95.8% over 90.8%) on simulated low-purity metastatic samples and shows potential on a small dataset of real metastasis samples with known origin.
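The idea of modelling a biopsy as a mixture of normal and tumor cells can be sketched in a few lines. The abstract does not give the LoCUP model in detail, so the following is only a toy least-squares version under stated assumptions: each sample is `p * tumor + (1 - p) * normal`, the normal profile of the biopsy site is known, and the candidate primary whose profile leaves the smallest residual is chosen. The names `fit_purity` and `classify_primary` and all profiles are hypothetical.

```python
def fit_purity(mixed, tumor, normal):
    """Least-squares estimate of tumor fraction p in
    mixed ~ p * tumor + (1 - p) * normal, clipped to [0, 1].

    Assumes tumor and normal profiles differ in at least one gene.
    Returns (p, residual_sum_of_squares).
    """
    num = sum((m - n) * (t - n) for m, t, n in zip(mixed, tumor, normal))
    den = sum((t - n) ** 2 for t, n in zip(tumor, normal))
    p = min(1.0, max(0.0, num / den))
    rss = sum((m - (p * t + (1 - p) * n)) ** 2
              for m, t, n in zip(mixed, tumor, normal))
    return p, rss

def classify_primary(mixed, tumor_profiles, normal):
    """Pick the candidate primary whose mixture fit has the smallest residual."""
    return min(tumor_profiles,
               key=lambda name: fit_purity(mixed, tumor_profiles[name], normal)[1])

# Hypothetical four-gene profiles.
normal = [2.0, 2.0, 2.0, 2.0]
tumor_profiles = {
    "lung": [8.0, 2.0, 1.0, 5.0],
    "colon": [1.0, 9.0, 4.0, 2.0],
}
# A low-purity sample: 30% lung tumor, 70% normal tissue.
mixed = [0.3 * t + 0.7 * n for t, n in zip(tumor_profiles["lung"], normal)]

print(classify_primary(mixed, tumor_profiles, normal))
print(fit_purity(mixed, tumor_profiles["lung"], normal)[0])
```

Fitting the tumor fraction jointly with the class is what lets the comparison work on impure samples; comparing the raw mixed profile directly to pure tumor profiles would be dominated by the normal component.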
|
36
|
Dey KK, Hsiao CJ, Stephens M. Visualizing the structure of RNA-seq expression data using grade of membership models. PLoS Genet 2017; 13:e1006599. [PMID: 28333934 PMCID: PMC5363805 DOI: 10.1371/journal.pgen.1006599] [Citation(s) in RCA: 93] [Impact Index Per Article: 11.6] [Received: 05/30/2016] [Accepted: 01/24/2017] [Indexed: 02/07/2023]
Abstract
Grade of membership models, also known as “admixture models”, “topic models” or “Latent Dirichlet Allocation”, are a generalization of cluster models that allow each sample to have membership in multiple clusters. These models are widely used in population genetics to model admixed individuals who have ancestry from multiple “populations”, and in natural language processing to model documents having words from multiple “topics”. Here we illustrate the potential for these models to cluster samples of RNA-seq gene expression data, measured on either bulk samples or single cells. We also provide methods to help interpret the clusters, by identifying genes that are distinctively expressed in each cluster. By applying these methods to several example RNA-seq applications we demonstrate their utility in identifying and summarizing structure and heterogeneity. Applied to data from the GTEx project on 53 human tissues, the approach highlights similarities among biologically-related tissues and identifies distinctively-expressed genes that recapitulate known biology. Applied to single-cell expression data from mouse preimplantation embryos, the approach highlights both discrete and continuous variation through early embryonic development stages, and highlights genes involved in a variety of relevant processes—from germ cell development, through compaction and morula formation, to the formation of inner cell mass and trophoblast at the blastocyst stage. The methods are implemented in the Bioconductor package CountClust. Gene expression profile of a biological sample (either from single cells or pooled cells) results from a complex interplay of multiple related biological processes. Consequently, for example, distal tissue samples may share a similar gene expression profile through some common underlying biological processes. 
Our goal here is to illustrate that grade of membership (GoM) models—an approach widely used in population genetics to cluster admixed individuals who have ancestry from multiple populations—provide an attractive approach for clustering biological samples of RNA sequencing data. The GoM model allows each biological sample to have partial memberships in multiple biologically-distinct clusters, in contrast to traditional clustering methods that partition samples into distinct subgroups. We also provide methods for identifying genes that are distinctively expressed in each cluster to help biologically interpret the results. Applied to a dataset of 53 human tissues, the GoM approach highlights similarities among biologically-related tissues and identifies distinctively-expressed genes that recapitulate known biology. Applied to gene expression data of single cells from mouse preimplantation embryos, the approach highlights both discrete and continuous variation through early embryonic development stages, and genes involved in a variety of relevant processes. Our study highlights the potential of GoM models for elucidating biological structure in RNA-seq gene expression data.
Affiliation(s)
- Kushal K. Dey
- Department of Statistics, University of Chicago, Chicago, Illinois, United States of America
| | - Chiaowen Joyce Hsiao
- Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
| | - Matthew Stephens
- Department of Statistics, University of Chicago, Chicago, Illinois, United States of America
- Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
| |
|
37
|
Wang M, Tsai TH, Di Poto C, Ferrarini A, Yu G, Ressom HW. Topic model-based mass spectrometric data analysis in cancer biomarker discovery studies. BMC Genomics 2016; 17 Suppl 4:545. [PMID: 27535232 PMCID: PMC5001243 DOI: 10.1186/s12864-016-2796-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/19/2022]
Abstract
Background A fundamental challenge in quantitation of biomolecules for cancer biomarker discovery is the heterogeneous nature of human biospecimens. Although this issue has been a subject of discussion in cancer genomic studies, it has not yet been rigorously investigated in mass spectrometry based proteomic and metabolomic studies. Purification of mass spectrometric data is highly desired prior to subsequent analysis, e.g., quantitative comparison of the abundance of biomolecules in biological samples. Methods We investigated topic models to computationally analyze mass spectrometric data considering both integrated peak intensities and scan-level features, i.e., extracted ion chromatograms (EICs). Probabilistic generative models enable flexible representation in data structure and infer sample-specific pure resources. Scan-level modeling helps alleviate information loss during data preprocessing. We evaluated the capability of the proposed models in capturing mixture proportions of contaminants and cancer profiles on LC-MS based serum proteomic and GC-MS based tissue metabolomic datasets acquired from patients with hepatocellular carcinoma (HCC) and liver cirrhosis as well as synthetic data we generated based on the serum proteomic data. Results The results we obtained by analysis of the synthetic data demonstrated that both intensity-level and scan-level purification models can accurately infer the mixture proportions and the underlying true cancerous sources with small average error ratios (<7 %) between estimation and ground truth. By applying the topic model-based purification to mass spectrometric data, we found more proteins and metabolites with significant changes between HCC cases and cirrhotic controls. Candidate biomarkers selected after purification yielded biologically meaningful pathway analysis results and improved disease discrimination power in terms of the area under ROC curve compared to the results found prior to purification.
Conclusions We investigated topic model-based inference methods to computationally address the heterogeneity issue in samples analyzed by LC/GC-MS. We observed that incorporation of scan-level features have the potential to lead to more accurate purification results by alleviating the loss in information as a result of integrating peaks. We believe cancer biomarker discovery studies that use mass spectrometric analysis of human biospecimens can greatly benefit from topic model-based purification of the data prior to statistical and pathway analyses.
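The abstract evaluates purification by comparing estimated mixture proportions with ground truth on synthetic data. The sketch below illustrates that kind of check under a much simpler assumption than the authors' topic models: a two-component linear mixture with known reference profiles, fitted by least squares. All names (`estimate_fraction`, `error_ratio`, `cancer_ref`, `contaminant_ref`) and the simulated values are hypothetical and not taken from the paper.

```python
import numpy as np

def estimate_fraction(mixed, cancer_ref, contaminant_ref):
    """Least-squares estimate of p in: mixed ~ p*cancer_ref + (1-p)*contaminant_ref."""
    diff = cancer_ref - contaminant_ref
    p = float(np.dot(mixed - contaminant_ref, diff) / np.dot(diff, diff))
    return min(max(p, 0.0), 1.0)  # clip to a valid proportion

def error_ratio(estimate, truth):
    """Relative error between an estimate and the ground truth."""
    return abs(estimate - truth) / abs(truth)

# Simulated (hypothetical) intensity profiles over 50 features.
rng = np.random.default_rng(0)
cancer_ref = rng.uniform(1.0, 10.0, size=50)
contaminant_ref = rng.uniform(1.0, 10.0, size=50)

true_p = 0.7  # assumed cancer fraction used to build the synthetic mixture
clean = true_p * cancer_ref + (1.0 - true_p) * contaminant_ref
mixed = clean * (1.0 + 0.01 * rng.standard_normal(50))  # 1% multiplicative noise

p_hat = estimate_fraction(mixed, cancer_ref, contaminant_ref)
print(f"estimated fraction: {p_hat:.3f}, error ratio: {error_ratio(p_hat, true_p):.3%}")
```

A real purification model would also have to infer the pure profiles themselves; here they are given so that only the proportion estimate is exercised.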
Affiliation(s)
- Minkun Wang
  - Department of Oncology, Georgetown University, 4000 Reservoir Rd NW, Washington D.C., USA; Department of Electrical and Computer Engineering, Virginia Tech, 900 N Glebe Rd, Arlington, VA, USA
- Tsung-Heng Tsai
  - Department of Oncology, Georgetown University, 4000 Reservoir Rd NW, Washington D.C., USA
- Cristina Di Poto
  - Department of Oncology, Georgetown University, 4000 Reservoir Rd NW, Washington D.C., USA
- Alessia Ferrarini
  - Department of Oncology, Georgetown University, 4000 Reservoir Rd NW, Washington D.C., USA
- Guoqiang Yu
  - Department of Electrical and Computer Engineering, Virginia Tech, 900 N Glebe Rd, Arlington, VA, USA
- Habtom W Ressom
  - Department of Oncology, Georgetown University, 4000 Reservoir Rd NW, Washington D.C., USA
|
38
|
Reinartz S, Finkernagel F, Adhikary T, Rohnalter V, Schumann T, Schober Y, Nockher WA, Nist A, Stiewe T, Jansen JM, Wagner U, Müller-Brüsselbach S, Müller R. A transcriptome-based global map of signaling pathways in the ovarian cancer microenvironment associated with clinical outcome. Genome Biol 2016; 17:108. [PMID: 27215396 PMCID: PMC4877997 DOI: 10.1186/s13059-016-0956-6] [Citation(s) in RCA: 80] [Impact Index Per Article: 8.9] [Received: 12/17/2015] [Accepted: 04/15/2016] [Indexed: 01/05/2023]
Abstract
BACKGROUND Soluble protein and lipid mediators play essential roles in the tumor environment, but their cellular origins, targets, and clinical relevance are only partially known. We have addressed this question for the most abundant cell types in human ovarian carcinoma ascites, namely tumor cells and tumor-associated macrophages. RESULTS Transcriptome-derived datasets were adjusted for errors caused by contaminating cell types by an algorithm using expression data derived from pure cell types as references. These data were utilized to construct a network of autocrine and paracrine signaling pathways comprising 358 common and 58 patient-specific signaling mediators and their receptors. RNA sequencing based predictions were confirmed for several proteins and lipid mediators. Published expression microarray results for 1018 patients were used to establish clinical correlations for a number of components with distinct cellular origins and target cells. Clear associations with early relapse were found for STAT3-inducing cytokines, specific components of WNT and fibroblast growth factor signaling, ephrin and semaphorin axon guidance molecules, and TGFβ/BMP-triggered pathways. An association with early relapse was also observed for secretory macrophage-derived phospholipase PLA2G7, its product arachidonic acid (AA) and signaling pathways controlled by the AA metabolites PGE2, PGI2, and LTB4. By contrast, the genes encoding norrin and its receptor frizzled 4, both selectively expressed by cancer cells and previously not linked to tumor suppression, show a striking association with a favorable clinical course. CONCLUSIONS We have established a signaling network operating in the ovarian cancer microenvironment with previously unidentified pathways and have defined clinically relevant components within this network.
Affiliation(s)
- Silke Reinartz
  - Clinic for Gynecology, Gynecological Oncology and Gynecological Endocrinology, Center for Tumor Biology and Immunology (ZTI), Philipps University, Marburg, Germany
- Florian Finkernagel
  - Institute of Molecular Biology and Tumor Research (IMT), Center for Tumor Biology and Immunology (ZTI), Philipps University, Hans-Meerwein-Str. 3, Marburg, 35043, Germany
- Till Adhikary
  - Institute of Molecular Biology and Tumor Research (IMT), Center for Tumor Biology and Immunology (ZTI), Philipps University, Hans-Meerwein-Str. 3, Marburg, 35043, Germany
- Verena Rohnalter
  - Institute of Molecular Biology and Tumor Research (IMT), Center for Tumor Biology and Immunology (ZTI), Philipps University, Hans-Meerwein-Str. 3, Marburg, 35043, Germany
- Tim Schumann
  - Institute of Molecular Biology and Tumor Research (IMT), Center for Tumor Biology and Immunology (ZTI), Philipps University, Hans-Meerwein-Str. 3, Marburg, 35043, Germany
- Yvonne Schober
  - Metabolomics Core Facility and Institute of Laboratory Medicine and Pathobiochemistry, Center for Tumor Biology and Immunology (ZTI), Philipps University, Marburg, Germany
- W Andreas Nockher
  - Metabolomics Core Facility and Institute of Laboratory Medicine and Pathobiochemistry, Center for Tumor Biology and Immunology (ZTI), Philipps University, Marburg, Germany
- Andrea Nist
  - Genomics Core Facility, Center for Tumor Biology and Immunology (ZTI), Philipps University, Marburg, Germany
- Thorsten Stiewe
  - Genomics Core Facility, Center for Tumor Biology and Immunology (ZTI), Philipps University, Marburg, Germany
- Julia M Jansen
  - Clinic for Gynecology, Gynecological Oncology and Gynecological Endocrinology, Center for Tumor Biology and Immunology (ZTI), Philipps University, Marburg, Germany
- Uwe Wagner
  - Clinic for Gynecology, Gynecological Oncology and Gynecological Endocrinology, Center for Tumor Biology and Immunology (ZTI), Philipps University, Marburg, Germany
- Sabine Müller-Brüsselbach
  - Institute of Molecular Biology and Tumor Research (IMT), Center for Tumor Biology and Immunology (ZTI), Philipps University, Hans-Meerwein-Str. 3, Marburg, 35043, Germany
- Rolf Müller
  - Institute of Molecular Biology and Tumor Research (IMT), Center for Tumor Biology and Immunology (ZTI), Philipps University, Hans-Meerwein-Str. 3, Marburg, 35043, Germany
|
39
|
Wang N, Hoffman EP, Chen L, Chen L, Zhang Z, Liu C, Yu G, Herrington DM, Clarke R, Wang Y. Mathematical modelling of transcriptional heterogeneity identifies novel markers and subpopulations in complex tissues. Sci Rep 2016; 6:18909. [PMID: 26739359 PMCID: PMC4703969 DOI: 10.1038/srep18909] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.4] [Received: 09/18/2015] [Accepted: 11/23/2015] [Indexed: 01/18/2023]
Abstract
Tissue heterogeneity is both a major confounding factor and an underexploited information source. While a handful of reports have demonstrated the potential of supervised computational methods to deconvolute tissue heterogeneity, these approaches require a priori information on the marker genes or composition of known subpopulations. To address the critical problem of the absence of validated marker genes for many (including novel) subpopulations, we describe convex analysis of mixtures (CAM), a fully unsupervised in silico method for identifying subpopulation marker genes directly from the original mixed gene expression data in scatter space, which can improve molecular analyses in many biological contexts. Validated with predesigned mixtures, CAM applied to gene expression data from peripheral leukocytes, brain tissue, and yeast cell cycle revealed novel marker genes that were otherwise undetectable using existing methods. Importantly, CAM requires no a priori information on the number, identity, or composition of the subpopulations present in mixed samples, and does not require the presence of pure subpopulations in sample space. This advantage is significant in that CAM can achieve all of its goals using only a small number of heterogeneous samples, and has greater power to distinguish between phenotypically similar subpopulations.
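The geometric idea behind identifying marker genes "in scatter space" can be illustrated with a small simulation. The sketch below is not the CAM algorithm (which is unsupervised and must locate the vertices from the data alone); it only shows, under an assumed noise-free linear mixing model, that genes expressed in a single subpopulation land on the vertices of the scatter simplex while all other genes fall inside it. The matrices `S` and `A` and all numeric values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_genes, n_types = 200, 3

# Hypothetical pure expression profiles (genes x cell types).
S = rng.uniform(0.5, 5.0, size=(n_genes, n_types))
# Make the first three genes ideal markers: expressed in exactly one cell type.
for k in range(n_types):
    S[k] = 0.0
    S[k, k] = 4.0

# Hypothetical mixing proportions (samples x cell types); each row sums to 1.
A = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.2, 0.7]])

X = S @ A.T  # observed mixed expression (genes x samples)

def to_scatter_simplex(M):
    """Scale each row to sum to 1, projecting it onto the standard simplex."""
    return M / M.sum(axis=1, keepdims=True)

X_scatter = to_scatter_simplex(X)
vertices = to_scatter_simplex(A.T)  # one vertex per cell type

# Express every gene as a combination of the vertices; under this model the
# weights are non-negative and sum to 1, i.e. each gene lies in the convex hull.
weights = X_scatter @ np.linalg.inv(vertices)

print("marker genes sit on vertices:", np.allclose(X_scatter[:n_types], vertices))
print("smallest convex weight:", weights.min())
```

In this toy setting the mixing matrix is known, so the vertices can be computed directly; the point of an unsupervised method is to recover them from `X_scatter` without that knowledge.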
Affiliation(s)
- Niya Wang
  - Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Eric P. Hoffman
  - Research Center for Genetic Medicine, Children’s National Medical Center, Washington, DC 20007, USA
- Lulu Chen
  - Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Li Chen
  - Pediatric Oncology Branch, National Institutes of Health, Gaithersburg, MD 20877, USA
- Zhen Zhang
  - Department of Pathology, Johns Hopkins University, Baltimore, MD 21231, USA
- Chunyu Liu
  - Department of Psychiatry, University of Illinois at Chicago, Chicago, Illinois 60607, USA
- Guoqiang Yu
  - Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- David M. Herrington
  - Department of Medicine, Wake Forest University, Winston-Salem, NC 27157, USA
- Robert Clarke
  - Lombardi Comprehensive Cancer Center, Georgetown University, Washington, DC 20057, USA
- Yue Wang
  - Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
|
40
|
Shen Q, Hu J, Jiang N, Hu X, Luo Z, Zhang H. contamDE: differential expression analysis of RNA-seq data for contaminated tumor samples. Bioinformatics 2015; 32:705-12. [DOI: 10.1093/bioinformatics/btv657] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 06/11/2015] [Accepted: 11/03/2015] [Indexed: 11/14/2022]
|
41
|
Fu Y, Yu G, Levine DA, Wang N, Shih IM, Zhang Z, Clarke R, Wang Y. BACOM2.0 facilitates absolute normalization and quantification of somatic copy number alterations in heterogeneous tumor. Sci Rep 2015; 5:13955. [PMID: 26350498 PMCID: PMC4563570 DOI: 10.1038/srep13955] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 01/30/2015] [Accepted: 08/07/2015] [Indexed: 11/18/2022]
Abstract
Most published copy number datasets on solid tumors were obtained from specimens composed of mixed cell populations, for which the varying tumor-stroma proportions are unknown or unreported. The inability to correct for signal mixing represents a major limitation on the use of these datasets for subsequent analyses, such as discerning deletion types or detecting driver aberrations. We describe the BACOM2.0 method, which offers enhanced accuracy and functionality to normalize copy number signals, detect deletion types, estimate tumor purity, quantify true copy numbers, and calculate the average ploidy value. While BACOM has been validated and used with promising results, subsequent BACOM analysis of the TCGA ovarian cancer dataset found that the estimated average tumor purity was lower than expected. In this report, we first show that this lowered estimate of tumor purity is the combined result of imprecise signal normalization and parameter estimation. Then, we describe effective allele-specific absolute normalization and quantification methods that can enhance BACOM applications in many biological contexts in the presence of various confounders. Finally, we discuss the advantages of BACOM in relation to alternative approaches. Here we detail this revised computational approach, BACOM2.0, and validate its performance in real and simulated datasets.
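The effect of signal mixing on copy number can be made concrete with a short worked example. The sketch below uses the common two-component assumption that the observed copy number is a purity-weighted average of the tumor copy number and a diploid normal background; this is an illustration of why purity matters, not the BACOM2.0 estimator itself, and the function names and numeric values are hypothetical.

```python
def observed_copy_number(tumor_cn, purity, normal_cn=2.0):
    """Copy number seen in a mixed specimen with tumor fraction `purity`."""
    return purity * tumor_cn + (1.0 - purity) * normal_cn

def true_copy_number(observed_cn, purity, normal_cn=2.0):
    """Invert the mixing model to recover the tumor-cell copy number."""
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    return (observed_cn - (1.0 - purity) * normal_cn) / purity

# A hemizygous deletion (tumor copy number 1) in a specimen that is 60% tumor:
mixed_signal = observed_copy_number(tumor_cn=1.0, purity=0.6)
recovered = true_copy_number(mixed_signal, purity=0.6)
print(f"observed: {mixed_signal:.2f}, recovered tumor copy number: {recovered:.2f}")
```

Under this model, underestimating purity exaggerates the corrected deviation from two copies, which is one reason accurate normalization and purity estimation matter for calling deletion types.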
Affiliation(s)
- Yi Fu
  - Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Guoqiang Yu
  - Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Douglas A Levine
  - Department of Surgery, Memorial Sloan-Kettering Cancer Center, New York, NY 10021, USA
- Niya Wang
  - Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Ie-Ming Shih
  - Departments of Pathology and Oncology, Johns Hopkins University, Baltimore, MD 21231, USA
- Zhen Zhang
  - Departments of Pathology and Oncology, Johns Hopkins University, Baltimore, MD 21231, USA
- Robert Clarke
  - Lombardi Comprehensive Cancer Center, Georgetown University, Washington, DC 20057, USA
- Yue Wang
  - Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
|
42
|
Anghel CV, Quon G, Haider S, Nguyen F, Deshwar AG, Morris QD, Boutros PC. ISOpureR: an R implementation of a computational purification algorithm of mixed tumour profiles. BMC Bioinformatics 2015; 16:156. [PMID: 25972088 PMCID: PMC4429941 DOI: 10.1186/s12859-015-0597-x] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Received: 11/21/2014] [Accepted: 04/27/2015] [Indexed: 01/23/2023]
Abstract
Background Tumour samples containing distinct sub-populations of cancer and normal cells present challenges in the development of reproducible biomarkers, as these biomarkers are based on bulk signals from mixed tumour profiles. ISOpure is the only mRNA computational purification method to date that does not require a paired tumour-normal sample, provides a personalized cancer profile for each patient, and has been tested on clinical data. Replacing mixed tumour profiles with ISOpure-preprocessed cancer profiles led to better prognostic gene signatures for lung and prostate cancer. Results To simplify the integration of ISOpure into standard R-based bioinformatics analysis pipelines, the algorithm has been implemented as an R package. The ISOpureR package performs analogously to the original code in estimating the fraction of cancer cells and the patient cancer mRNA abundance profile from tumour samples in four cancer datasets. Conclusions The ISOpureR package estimates the fraction of cancer cells and personalized patient cancer mRNA abundance profile from a mixed tumour profile. This open-source R implementation enables integration into existing computational pipelines, as well as easy testing, modification and extension of the model.
Affiliation(s)
- Catalina V Anghel
  - Informatics and Biocomputing Program, Ontario Institute for Cancer Research, 661 University Avenue, Toronto, Suite 510, M5G 0A3, ON, Canada
- Gerald Quon
  - Department of Computer Science, University of Toronto, 10 King's College Road, Room 3303, M5S 3G4, Toronto, ON, Canada
- Syed Haider
  - Informatics and Biocomputing Program, Ontario Institute for Cancer Research, 661 University Avenue, Toronto, Suite 510, M5G 0A3, ON, Canada; Department of Oncology, University of Oxford, Old Road Campus Research Building, Roosevelt Drive, Oxford, OX3 7DQ, United Kingdom
- Francis Nguyen
  - Informatics and Biocomputing Program, Ontario Institute for Cancer Research, 661 University Avenue, Toronto, Suite 510, M5G 0A3, ON, Canada
- Amit G Deshwar
  - Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto, 10 King's College, Room SFB540, Toronto, M5S 3G4, ON, Canada
- Quaid D Morris
  - Department of Computer Science, University of Toronto, 10 King's College Road, Room 3303, M5S 3G4, Toronto, ON, Canada; Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto, 10 King's College, Room SFB540, Toronto, M5S 3G4, ON, Canada; Department of Molecular Genetics, University of Toronto, 1 King's College Circle, Room 4396, Toronto, M4S 1A8, ON, Canada; The Donnelly Centre, 160 College Street, Room 230, Toronto, M5S 3E1, ON, Canada
- Paul C Boutros
  - Informatics and Biocomputing Program, Ontario Institute for Cancer Research, 661 University Avenue, Toronto, Suite 510, M5G 0A3, ON, Canada; Department of Medical Biophysics, University of Toronto, 101 College Street, Toronto, M5G 1L7, ON, Canada; Department of Pharmacology and Toxicology, University of Toronto, 1 King's College Circle, Toronto, M5S 1A8, ON, Canada
|