1
Li Y, Xu S, Wang X, Ertekin-Taner N, Chen D. An augmented GSNMF model for complete deconvolution of bulk RNA-seq data. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2025; 22:988-1018. [PMID: 40296800 PMCID: PMC12043048 DOI: 10.3934/mbe.2025036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/30/2025]
Abstract
Performing complete deconvolution of bulk RNA-seq data to obtain both cell-type-specific gene expression profiles (GEPs) and relative cell abundances is a challenging task. One of the fundamental models used, nonnegative matrix factorization (NMF), is mathematically ill-posed. Although several complete deconvolution methods have been developed, and their estimates appear promising when compared with ground truth for some datasets, a comprehensive understanding of how to circumvent the ill-posedness and improve solution accuracy is lacking. In this paper, we first investigated the necessary requirements for a given dataset to satisfy the solvability conditions in NMF theory. Even when the solvability conditions hold, the "unique" solutions of NMF are determined only up to a rescaling matrix. Therefore, we provide estimates of the converged local minima and of the possible rescaling matrix, based on informative initial conditions. Using these strategies, we developed a new pipeline, the pseudo-bulk-tissue-data-augmented, geometric-structure-guided NMF model (GSNMF+). In our approach, pseudo-bulk tissue data were generated from single-cell RNA-seq (scRNA-seq) data and pseudo cellular compositions simulated from a statistical distribution, and then mixed with the original dataset. The constituent matrices of the hybrid dataset then satisfy the weak solvability conditions of NMF. Furthermore, an estimated rescaling matrix was used to adjust the minimizer of the NMF, which was expected to reduce the root mean square errors of the solutions. Our algorithms were tested on several realistic bulk-tissue datasets and showed significant improvements in scenarios with singular cellular compositions.
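The rescaling ambiguity described in this abstract can be seen in a few lines of NumPy. The sketch below covers only the simplest special case, assumed for illustration: the rescaling matrix is a positive diagonal matrix D, the true abundance matrix H has columns that sum to 1, and H has full row rank. It is not the GSNMF+ estimate of the rescaling matrix; `rescale_to_proportions` is a hypothetical helper, and W, H and X are toy matrices.

```python
import numpy as np

def rescale_to_proportions(W, H):
    """Remove a diagonal scaling ambiguity from an NMF factorization X ~ W @ H.

    Hypothetical helper. Assumes the true H holds cell-type proportions
    (every column sums to 1) and that H has full row rank.
    """
    # Solve H.T @ c = 1 for one scale factor per cell type (row of H).
    c, *_ = np.linalg.lstsq(H.T, np.ones(H.shape[1]), rcond=None)
    H_fixed = c[:, None] * H   # scale row k of H by c_k
    W_fixed = W / c[None, :]   # undo that scaling in column k of W
    return W_fixed, H_fixed

rng = np.random.default_rng(0)
W_true = rng.uniform(0, 10, size=(50, 3))      # genes x cell types (toy GEPs)
H_true = rng.dirichlet(np.ones(3), size=20).T  # cell types x samples, columns sum to 1
X = W_true @ H_true                            # noiseless toy bulk matrix

# Any positive diagonal D yields another exact factorization of X.
D = np.diag([2.0, 0.5, 3.0])
W_alt, H_alt = W_true @ D, np.linalg.inv(D) @ H_true
print(np.allclose(W_alt @ H_alt, X))  # True: the fit alone cannot tell the two apart

W_rec, H_rec = rescale_to_proportions(W_alt, H_alt)
print(np.allclose(W_rec, W_true), np.allclose(H_rec, H_true))  # True True
```

A general (non-diagonal) rescaling matrix that keeps both factors nonnegative is not pinned down by the sum-to-one constraint alone, so this sketch only covers the easy case.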
Affiliation(s)
- Yujie Li
  - Department of Mathematics and Statistics, University of North Carolina at Charlotte, USA
  - School of Data Science, University of North Carolina at Charlotte, USA
- Su Xu
  - Department of Mathematics and Statistics, University of North Carolina at Charlotte, USA
- Xue Wang
  - Department of Quantitative Health Sciences, Mayo Clinic, Florida, USA
- Nilüfer Ertekin-Taner
  - Department of Neurosciences, Mayo Clinic, Florida, USA
  - Department of Neurology, Mayo Clinic, Florida, USA
- Duan Chen
  - Department of Mathematics and Statistics, University of North Carolina at Charlotte, USA
2
Meng G, Pan Y, Tang W, Zhang L, Cui Y, Schumacher FR, Wang M, Wang R, He S, Krischer J, Li Q, Feng H. imply: improving cell-type deconvolution accuracy using personalized reference profiles. Genome Med 2024; 16:65. [PMID: 38685057 PMCID: PMC11057104 DOI: 10.1186/s13073-024-01338-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2023] [Accepted: 04/18/2024] [Indexed: 05/02/2024]
Abstract
Using computational tools, bulk transcriptomics can be deconvoluted to estimate the abundance of constituent cell types. However, existing deconvolution methods assume that a single reference panel serves the whole study population, ignoring person-to-person heterogeneity. Here, we present imply, a novel algorithm to deconvolute cell type proportions using personalized reference panels. Simulation studies demonstrate reduced bias compared with existing methods. Real data analyses on longitudinal consortia show that disparities in cell type proportions are associated with several disease phenotypes in type 1 diabetes and Parkinson's disease. imply is available through the R/Bioconductor package ISLET at https://bioconductor.org/packages/ISLET/.
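For contrast, the single-reference setup that imply moves away from can be written as bulk ≈ reference × proportions and solved by non-negative least squares. The sketch below is a hypothetical baseline, not imply's algorithm (which builds personalized reference panels): `deconvolve_nnls` is an illustrative helper that uses Lee-Seung-style multiplicative updates as its NNLS solver, and the 4-gene reference is toy data.

```python
import numpy as np

def deconvolve_nnls(bulk, reference, n_iter=1000, eps=1e-12):
    """Estimate cell-type proportions p >= 0 with bulk ~ reference @ p.

    Hypothetical single-reference baseline (not imply): multiplicative-update
    non-negative least squares, then rescaling p to sum to 1.
    Requires non-negative bulk and reference values, as expression data are.
    """
    p = np.full(reference.shape[1], 1.0 / reference.shape[1])  # strictly positive start
    rtb = reference.T @ bulk       # R^T b, constant across iterations
    rtr = reference.T @ reference  # R^T R
    for _ in range(n_iter):
        # Stays non-negative; an interior fixed point solves R^T R p = R^T b.
        p *= rtb / (rtr @ p + eps)
    return p / p.sum()

# Toy reference: 4 genes x 3 cell types; the first three genes act as markers.
reference = np.array([[10.0, 1.0, 1.0],
                      [1.0, 10.0, 1.0],
                      [1.0, 1.0, 10.0],
                      [5.0, 5.0, 5.0]])
true_p = np.array([0.5, 0.3, 0.2])
bulk = reference @ true_p
print(np.round(deconvolve_nnls(bulk, reference), 3))  # [0.5 0.3 0.2]
```

Multiplicative updates keep every estimate non-negative without an explicit projection step; `scipy.optimize.nnls` would be a more standard solver choice when SciPy is available.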
Affiliation(s)
- Guanqun Meng
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, 44106, OH, USA
- Yue Pan
  - Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, 38105, TN, USA
- Wen Tang
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, 44106, OH, USA
- Lijun Zhang
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, 44106, OH, USA
- Ying Cui
  - Department of Biomedical Data Science, Stanford University, Stanford, 94305, CA, USA
- Fredrick R Schumacher
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, 44106, OH, USA
- Ming Wang
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, 44106, OH, USA
- Rui Wang
  - Department of Surgery, Division of Surgical Oncology, University Hospitals Cleveland Medical Center, Cleveland, 44106, OH, USA
- Sijia He
  - Department of Biostatistics, University of Michigan, Ann Arbor, 48109, MI, USA
- Jeffrey Krischer
  - Health Informatics Institute, University of South Florida, Tampa, 38105, FL, USA
- Qian Li
  - Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, 38105, TN, USA
- Hao Feng
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, 44106, OH, USA
3
Meng G, Pan Y, Tang W, Zhang L, Cui Y, Schumacher FR, Wang M, Wang R, He S, Krischer J, Li Q, Feng H. imply: improving cell-type deconvolution accuracy using personalized reference profiles. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.09.27.559579. [PMID: 37808714 PMCID: PMC10557724 DOI: 10.1101/2023.09.27.559579] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/10/2023]
Abstract
Real-world clinical samples are often admixtures of signal mosaics from multiple pure cell types. Using computational tools, bulk transcriptomics can be deconvoluted to solve for the abundance of constituent cell types. However, existing deconvolution methods are conditioned on the assumption that the whole study population is served by a single reference panel, which ignores person-to-person heterogeneity. Here we present imply, a novel algorithm to deconvolute cell type proportions using personalized reference panels. imply can borrow information across repeatedly measured samples for each subject, and obtain precise cell type proportion estimations. Simulation studies demonstrate reduced bias in cell type abundance estimation compared with existing methods. Real data analyses on large longitudinal consortia show more realistic deconvolution results that align with biological facts. Our results suggest that disparities in cell type proportions are associated with several disease phenotypes in type 1 diabetes and Parkinson's disease. Our proposed tool imply is available through the R/Bioconductor package ISLET at https://bioconductor.org/packages/ISLET/.
Affiliation(s)
- Guanqun Meng
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, 44106, OH, USA
- Yue Pan
  - Department of Biostatistics, St. Jude Children’s Research Hospital, Memphis, 38105, TN, USA
- Wen Tang
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, 44106, OH, USA
- Lijun Zhang
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, 44106, OH, USA
- Ying Cui
  - Department of Biomedical Data Science, Stanford University, Stanford, 94305, CA, USA
- Fredrick R. Schumacher
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, 44106, OH, USA
- Ming Wang
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, 44106, OH, USA
- Rui Wang
  - Department of Surgery, Division of Surgical Oncology, University Hospitals Cleveland Medical Center, Cleveland, 44106, OH, USA
- Sijia He
  - Department of Biostatistics, University of Michigan, Ann Arbor, 48109, MI, USA
- Jeffrey Krischer
  - Health Informatics Institute, University of South Florida, Tampa, 38105, FL, USA
- Qian Li
  - Department of Biostatistics, St. Jude Children’s Research Hospital, Memphis, 38105, TN, USA
- Hao Feng
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, 44106, OH, USA
4
Tran KA, Addala V, Johnston RL, Lovell D, Bradley A, Koufariotis LT, Wood S, Wu SZ, Roden D, Al-Eryani G, Swarbrick A, Williams ED, Pearson JV, Kondrashova O, Waddell N. Performance of tumour microenvironment deconvolution methods in breast cancer using single-cell simulated bulk mixtures. Nat Commun 2023; 14:5758. [PMID: 37717006 PMCID: PMC10505141 DOI: 10.1038/s41467-023-41385-5] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 07/31/2022] [Accepted: 09/01/2023] [Indexed: 09/18/2023]
Abstract
Cells within the tumour microenvironment (TME) can impact tumour development and influence treatment response. Computational approaches have been developed to deconvolve the TME from bulk RNA-seq. Using scRNA-seq profiling from breast tumours we simulate thousands of bulk mixtures, representing tumour purities and cell lineages, to compare the performance of nine TME deconvolution methods (BayesPrism, Scaden, CIBERSORTx, MuSiC, DWLS, hspe, CPM, Bisque, and EPIC). Some methods are more robust in deconvolving mixtures with high tumour purity levels. Most methods tend to mis-predict normal epithelial for cancer epithelial as tumour purity increases, a finding that is validated in two independent datasets. The breast cancer molecular subtype influences this mis-prediction. BayesPrism and DWLS have the lowest combined numbers of false positives and false negatives, and have the best performance when deconvolving granular immune lineages. Our findings highlight the need for more single-cell characterisation of rarer cell types, and suggest that tumour cell compositions should be considered when deconvolving the TME.
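The simulation design described in this abstract, summing single cells into bulk-like mixtures at a chosen tumour purity, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' pipeline: `simulate_pseudobulk` and `purity_fractions` are hypothetical helpers, cells are drawn with replacement, the label `"cancer_epithelial"` is a placeholder, and splitting the non-tumour fraction by a Dirichlet draw is a choice made only for illustration.

```python
import numpy as np

def simulate_pseudobulk(counts, labels, fractions, n_cells=500, rng=None):
    """Sum randomly drawn single cells into one pseudo-bulk expression profile.

    Hypothetical helper. counts: cells x genes array; labels: one cell-type
    label per cell; fractions: dict mapping cell type -> target fraction.
    """
    rng = np.random.default_rng() if rng is None else rng
    labels = np.asarray(labels)
    profile = np.zeros(counts.shape[1])
    for cell_type, frac in fractions.items():
        k = int(round(frac * n_cells))              # cells of this type to draw
        if k == 0:
            continue
        pool = np.flatnonzero(labels == cell_type)  # indices of cells with this label
        picked = rng.choice(pool, size=k, replace=True)
        profile += counts[picked].sum(axis=0)       # add their counts gene by gene
    return profile

def purity_fractions(purity, other_types, rng):
    """Hypothetical scheme: fix the tumour fraction, split the rest by a Dirichlet draw."""
    rest = rng.dirichlet(np.ones(len(other_types))) * (1.0 - purity)
    fractions = {"cancer_epithelial": purity}
    fractions.update(zip(other_types, rest))
    return fractions

# Toy data: 30 cells x 5 genes, 10 cells per type, every count equal to 1.
rng = np.random.default_rng(1)
counts = np.ones((30, 5))
labels = ["cancer_epithelial"] * 10 + ["t_cell"] * 10 + ["fibroblast"] * 10
fractions = purity_fractions(0.7, ["t_cell", "fibroblast"], rng)
mixture = simulate_pseudobulk(counts, labels, fractions, n_cells=100, rng=rng)
```

Drawing with replacement keeps rare cell types usable at high target fractions; drawing without replacement would instead cap each type at the number of profiled cells. Because each type's cell count is rounded separately, the total can differ from `n_cells` by one or two cells.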
Affiliation(s)
- Khoa A Tran
  - Cancer Program, QIMR Berghofer Medical Research Institute, Brisbane, QLD, 4006, Australia
  - School of Biomedical Sciences, Queensland University of Technology (QUT), Brisbane, QLD, 4000, Australia
- Venkateswar Addala
  - Cancer Program, QIMR Berghofer Medical Research Institute, Brisbane, QLD, 4006, Australia
- Rebecca L Johnston
  - Cancer Program, QIMR Berghofer Medical Research Institute, Brisbane, QLD, 4006, Australia
- David Lovell
  - School of Computer Science, Queensland University of Technology, Brisbane, QLD, 4000, Australia
  - QUT Centre for Data Science, Brisbane, QLD, 4000, Australia
- Andrew Bradley
  - Faculty of Engineering, Queensland University of Technology, Brisbane, QLD, 4000, Australia
- Lambros T Koufariotis
  - Cancer Program, QIMR Berghofer Medical Research Institute, Brisbane, QLD, 4006, Australia
- Scott Wood
  - Cancer Program, QIMR Berghofer Medical Research Institute, Brisbane, QLD, 4006, Australia
- Sunny Z Wu
  - Cancer Ecosystems Program, Garvan Institute of Medical Research, Darlinghurst, NSW, 2010, Australia
  - School of Clinical Medicine, Faculty of Medicine and Health, UNSW Sydney, Kensington, NSW, 2052, Australia
- Daniel Roden
  - Cancer Ecosystems Program, Garvan Institute of Medical Research, Darlinghurst, NSW, 2010, Australia
  - School of Clinical Medicine, Faculty of Medicine and Health, UNSW Sydney, Kensington, NSW, 2052, Australia
- Ghamdan Al-Eryani
  - Cancer Ecosystems Program, Garvan Institute of Medical Research, Darlinghurst, NSW, 2010, Australia
  - School of Clinical Medicine, Faculty of Medicine and Health, UNSW Sydney, Kensington, NSW, 2052, Australia
- Alexander Swarbrick
  - Cancer Ecosystems Program, Garvan Institute of Medical Research, Darlinghurst, NSW, 2010, Australia
  - School of Clinical Medicine, Faculty of Medicine and Health, UNSW Sydney, Kensington, NSW, 2052, Australia
- Elizabeth D Williams
  - School of Biomedical Sciences, Queensland University of Technology (QUT), Brisbane, QLD, 4000, Australia
  - Australian Prostate Cancer Research Centre - Queensland (APCRC-Q) and Queensland Bladder Cancer Initiative (QBCI), Brisbane, QLD, 4000, Australia
- John V Pearson
  - Cancer Program, QIMR Berghofer Medical Research Institute, Brisbane, QLD, 4006, Australia
- Olga Kondrashova
  - Cancer Program, QIMR Berghofer Medical Research Institute, Brisbane, QLD, 4006, Australia
- Nicola Waddell
  - Cancer Program, QIMR Berghofer Medical Research Institute, Brisbane, QLD, 4006, Australia
  - School of Biomedical Sciences, Queensland University of Technology (QUT), Brisbane, QLD, 4000, Australia
5
Chiu Y, Ni C, Huang Y. Deconvolution of bulk gene expression profiles reveals the association between immune cell polarization and the prognosis of hepatocellular carcinoma patients. Cancer Med 2023; 12:15736-15760. [PMID: 37366298 PMCID: PMC10417088 DOI: 10.1002/cam4.6197] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2022] [Revised: 05/02/2023] [Accepted: 05/23/2023] [Indexed: 06/28/2023]
Abstract
BACKGROUND Many studies have used computational methods, including cell composition deconvolution (CCD), to correlate immune cell polarizations with the survival of cancer patients, including those with hepatocellular carcinoma (HCC). However, currently available cell deconvolution estimated (CDE) tools do not cover the wide range of immune cell changes that are known to influence tumor progression. RESULTS A new CCD tool, HCCImm, was designed to estimate the abundance of tumor cells and 16 immune cell types in the bulk gene expression profiles of HCC samples. HCCImm was validated using real datasets derived from human peripheral blood mononuclear cells (PBMCs) and HCC tissue samples, demonstrating that HCCImm outperforms other CCD tools. We used HCCImm to analyze the bulk RNA-seq datasets of The Cancer Genome Atlas (TCGA) liver hepatocellular carcinoma (LIHC) samples. We found that the proportions of memory CD8+ T cells and Tregs were negatively associated with patient overall survival (OS). Furthermore, the proportion of naïve CD8+ T cells was positively associated with patient OS. In addition, TCGA-LIHC samples with a high tumor mutational burden had a significantly higher abundance of nonmacrophage leukocytes. CONCLUSIONS HCCImm is equipped with a new set of reference gene expression profiles that allows for a more robust analysis of HCC patient expression data. The source code is provided at https://github.com/holiday01/HCCImm.
Affiliation(s)
- Yen-Jung Chiu
  - Institute of Biomedical Informatics, National Yang Ming Chiao Tung University, Taipei, Taiwan
  - Department of Biomedical Engineering, Ming Chuan University, Taoyuan, Taiwan
- Chung-En Ni
  - Institute of Biomedical Informatics, National Yang Ming Chiao Tung University, Taipei, Taiwan
- Yen-Hua Huang
  - Institute of Biomedical Informatics, National Yang Ming Chiao Tung University, Taipei, Taiwan
  - Center for Systems and Synthetic Biology, National Yang Ming Chiao Tung University, Taipei, Taiwan
6
Balog S, Fujiwara R, Pan SQ, El-Baradie KB, Choi HY, Sinha S, Yang Q, Asahina K, Chen Y, Li M, Salomon M, Ng SWK, Tsukamoto H. Emergence of highly profibrotic and proinflammatory Lrat+Fbln2+ HSC subpopulation in alcoholic hepatitis. Hepatology 2023; 78:212-224. [PMID: 36181700 PMCID: PMC10977045 DOI: 10.1002/hep.32793] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2022] [Revised: 09/03/2022] [Accepted: 09/10/2022] [Indexed: 01/18/2023]
Abstract
BACKGROUND AND AIMS The relative roles of HSCs and portal fibroblasts in alcoholic hepatitis (AH) are unknown. We aimed to identify subpopulations of collagen type 1 alpha 1 (Col1a1)-expressing cells in a mouse AH model by single-cell RNA sequencing (scRNA-seq), filtering the cells by the HSC marker (lecithin retinol acyltransferase [Lrat]), the portal fibroblast markers (Thy-1 cell surface antigen [Thy1] and fibulin 2 [Fbln2]), and vitamin A (VitA) storage. APPROACH AND RESULTS Col1a1-green fluorescent protein (GFP) mice underwent AH, CCl4, and bile duct ligation (BDL) procedures to induce comparable F1-F2 liver fibrosis. Col1a1-expressing cells were sorted via FACS by VitA autofluorescence and GFP for scRNA-seq. In AH, approximately 80% of Lrat+Thy1-Fbln2- activated HSCs were VitA-depleted (vs. ~13% in BDL and CCl4). Supervised clustering identified a subset co-expressing Lrat and Fbln2 (Lrat+Fbln2+), which expanded 44-fold, 17-fold, and 1.3-fold in AH, BDL, and CCl4, respectively. Lrat+Fbln2+ cells showed 3-15-fold induction of profibrotic, myofibroblastic, and immunoregulatory genes versus Lrat+Fbln2- cells, but 2-4-fold repression of HSC-selective genes. Activated HSCs in AH had up-regulated inflammatory (chemokine [C-X-C motif] ligand 2 [Cxcl2], chemokine [C-C motif] ligand 2), antimicrobial (Il-33, Zc3h12a), and antigen presentation (H2-Q6, H2-T23) genes versus BDL and CCl4. Computational deconvolution of AH versus normal human bulk-liver RNA-sequencing data supported an expansion of LRAT+FBLN2+ cells in AH; AH patient liver immunohistochemistry showed FBLN2 staining along fibrotic septa enriched with LRAT+ cells; and in situ hybridization confirmed co-expression of FBLN2 with CXCL2 and/or human leukocyte antigen E in patient AH. Finally, HSC tracing in Lrat-Cre;Rosa26mTmG mice detected GFP+FBLN2+ cells in AH.
CONCLUSION A highly profibrotic, inflammatory, and immunoregulatory Lrat+Fbln2+ subpopulation emerges from HSCs in AH and may contribute to the inflammatory and immunoreactive nature of AH.
Affiliation(s)
- Steven Balog
  - Southern California Research Center for ALPD and Cirrhosis, Department of Pathology, Keck School of Medicine of the University of Southern California, Los Angeles, California, USA
- Reika Fujiwara
  - Southern California Research Center for ALPD and Cirrhosis, Department of Pathology, Keck School of Medicine of the University of Southern California, Los Angeles, California, USA
  - University of Michigan, Ann Arbor, Michigan, USA
- Stephanie Q. Pan
  - Southern California Research Center for ALPD and Cirrhosis, Department of Pathology, Keck School of Medicine of the University of Southern California, Los Angeles, California, USA
- Khairat B. El-Baradie
  - Southern California Research Center for ALPD and Cirrhosis, Department of Pathology, Keck School of Medicine of the University of Southern California, Los Angeles, California, USA
- Hye Yeon Choi
  - Southern California Research Center for ALPD and Cirrhosis, Department of Pathology, Keck School of Medicine of the University of Southern California, Los Angeles, California, USA
- Sonal Sinha
  - Southern California Research Center for ALPD and Cirrhosis, Department of Pathology, Keck School of Medicine of the University of Southern California, Los Angeles, California, USA
- Qihong Yang
  - Southern California Research Center for ALPD and Cirrhosis, Department of Pathology, Keck School of Medicine of the University of Southern California, Los Angeles, California, USA
- Kinji Asahina
  - Southern California Research Center for ALPD and Cirrhosis, Department of Pathology, Keck School of Medicine of the University of Southern California, Los Angeles, California, USA
  - Central Research Laboratory, Shiga University of Medical Sciences, Seta Tsukinowa-cho Otsu, Shiga, Japan
- Yibu Chen
  - USC Libraries Bioinformatic Services of the University of Southern California, Los Angeles, California, USA
- Meng Li
  - USC Libraries Bioinformatic Services of the University of Southern California, Los Angeles, California, USA
- Matthew Salomon
  - Department of Medicine, Keck School of Medicine of the University of Southern California, Los Angeles, California, USA
- Stanley W.-K. Ng
  - Division of Computational Biomedicine, Department of Biological Chemistry, School of Medicine, University of California, Irvine, California, USA
- Hidekazu Tsukamoto
  - Southern California Research Center for ALPD and Cirrhosis, Department of Pathology, Keck School of Medicine of the University of Southern California, Los Angeles, California, USA
  - University of Michigan, Ann Arbor, Michigan, USA
  - Department of Veterans Affairs Greater Los Angeles Healthcare System, Los Angeles, California, USA
7
Alonso-Moreda N, Berral-González A, De La Rosa E, González-Velasco O, Sánchez-Santos JM, De Las Rivas J. Comparative Analysis of Cell Mixtures Deconvolution and Gene Signatures Generated for Blood, Immune and Cancer Cells. Int J Mol Sci 2023; 24:10765. [PMID: 37445946 DOI: 10.3390/ijms241310765] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 04/25/2023] [Revised: 06/19/2023] [Accepted: 06/21/2023] [Indexed: 07/15/2023]
Abstract
In the last two decades, many detailed whole-transcriptome studies of complex biological samples have been published and deposited in large gene expression repositories. These studies primarily provide a bulk expression signal for each sample, in which multiple cell types are mixed within the global signal. The cellular heterogeneity in these mixtures prevents the activity of specific genes in specific cell types from being identified. Therefore, inferring relative cellular composition is a very powerful tool for achieving more accurate molecular profiling of complex biological samples. Over the same period, computational techniques have been developed to solve this problem by applying deconvolution methods, designed to decompose cell mixtures into their cellular components and calculate the relative proportions of these components. Some of them only calculate the cell proportions (supervised methods), while other deconvolution algorithms can also identify the gene signatures specific to each cell type (unsupervised methods). In this work, five deconvolution methods (CIBERSORT, FARDEEP, DECONICA, LINSEED and ABIS) were implemented and used to analyze blood and immune cells, as well as cancer cells, in complex mixture samples (using three bulk expression datasets). Our study provides three analytical tools (corrplots, cell-signature plots and bar-mixture plots) that allow a thorough comparative analysis of the cell mixture data. The work indicates that CIBERSORT is a robust method optimized for the identification of immune cell types, but is less efficient at identifying cancer cells. We also found that LINSEED is a very powerful unsupervised method that provides precise and specific gene signatures for each of the main immune cell types tested (neutrophils and monocytes of the myeloid lineage; B cells, NK cells and T cells of the lymphoid lineage), and also for cancer cells.
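Method comparisons such as the corrplots mentioned in this abstract rest on how closely estimated proportions track known proportions for each cell type. A minimal sketch of such a summary is below, assuming samples × cell-types proportion arrays; `per_celltype_agreement` is a hypothetical helper reporting Pearson r and RMSE per cell type, not the study's own plotting code, and the proportions are made-up toy values.

```python
import numpy as np

def per_celltype_agreement(estimated, truth):
    """Pearson r and RMSE per cell type between estimated and known proportions.

    Hypothetical helper. Both inputs are samples x cell types arrays.
    """
    est = np.asarray(estimated, dtype=float)
    tru = np.asarray(truth, dtype=float)
    # Correlation across samples, computed separately for each cell-type column.
    r = np.array([np.corrcoef(est[:, k], tru[:, k])[0, 1] for k in range(tru.shape[1])])
    # Root mean square error across samples, per cell-type column.
    rmse = np.sqrt(((est - tru) ** 2).mean(axis=0))
    return r, rmse

# Toy example: 3 samples x 3 cell types, with small estimation errors.
truth = np.array([[0.6, 0.3, 0.1],
                  [0.2, 0.5, 0.3],
                  [0.4, 0.2, 0.4]])
estimated = truth + np.array([[0.02, -0.01, -0.01],
                              [-0.03, 0.02, 0.01],
                              [0.01, -0.01, 0.00]])
r, rmse = per_celltype_agreement(estimated, truth)
```

Reporting RMSE alongside r is useful because a method can rank samples correctly (high r) while still being systematically biased in absolute proportions (high RMSE).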
Affiliation(s)
- Natalia Alonso-Moreda
  - Cancer Research Center (CiC-IBMCC, CSIC/USAL & IBSAL), Consejo Superior de Investigaciones Científicas (CSIC), University of Salamanca (USAL), & Instituto de Investigación Biomédica de Salamanca (IBSAL), 37007 Salamanca, Spain
- Alberto Berral-González
  - Cancer Research Center (CiC-IBMCC, CSIC/USAL & IBSAL), Consejo Superior de Investigaciones Científicas (CSIC), University of Salamanca (USAL), & Instituto de Investigación Biomédica de Salamanca (IBSAL), 37007 Salamanca, Spain
- Enrique De La Rosa
  - Cancer Research Center (CiC-IBMCC, CSIC/USAL & IBSAL), Consejo Superior de Investigaciones Científicas (CSIC), University of Salamanca (USAL), & Instituto de Investigación Biomédica de Salamanca (IBSAL), 37007 Salamanca, Spain
- Oscar González-Velasco
  - Cancer Research Center (CiC-IBMCC, CSIC/USAL & IBSAL), Consejo Superior de Investigaciones Científicas (CSIC), University of Salamanca (USAL), & Instituto de Investigación Biomédica de Salamanca (IBSAL), 37007 Salamanca, Spain
  - Division of Applied Bioinformatics, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- José Manuel Sánchez-Santos
  - Cancer Research Center (CiC-IBMCC, CSIC/USAL & IBSAL), Consejo Superior de Investigaciones Científicas (CSIC), University of Salamanca (USAL), & Instituto de Investigación Biomédica de Salamanca (IBSAL), 37007 Salamanca, Spain
  - Department of Statistics, University of Salamanca (USAL), 37008 Salamanca, Spain
- Javier De Las Rivas
  - Cancer Research Center (CiC-IBMCC, CSIC/USAL & IBSAL), Consejo Superior de Investigaciones Científicas (CSIC), University of Salamanca (USAL), & Instituto de Investigación Biomédica de Salamanca (IBSAL), 37007 Salamanca, Spain
8
Li J, Li L, You P, Wei Y, Xu B. Towards artificial intelligence to multi-omics characterization of tumor heterogeneity in esophageal cancer. Semin Cancer Biol 2023; 91:35-49. [PMID: 36868394 DOI: 10.1016/j.semcancer.2023.02.009] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 01/07/2023] [Revised: 02/21/2023] [Accepted: 02/28/2023] [Indexed: 03/05/2023]
Abstract
Esophageal cancer is a unique and complex malignancy with substantial tumor heterogeneity: at the cellular level, tumors are composed of tumor and stromal cellular components; at the genetic level, they comprise genetically distinct tumor clones; at the phenotypic level, cells in distinct microenvironmental niches acquire diverse phenotypic features. This heterogeneity affects almost every stage of esophageal cancer progression, from onset to metastasis and recurrence. Intertumoral and intratumoral heterogeneity are major obstacles in the treatment of esophageal cancer, but they also offer the potential to manipulate the heterogeneity itself as a new therapeutic strategy. The high-dimensional, multi-faceted characterization of the genomics, epigenomics, transcriptomics, proteomics, metabonomics, etc. of esophageal cancer has opened novel horizons for dissecting tumor heterogeneity. Artificial intelligence, especially machine learning and deep learning algorithms, can make decisive interpretations of data from multi-omics layers. To date, artificial intelligence has emerged as a promising computational tool for analyzing and dissecting patient-specific multi-omics data in esophageal cancer. This article provides a comprehensive review of tumor heterogeneity from a multi-omics perspective. In particular, we discuss the novel techniques of single-cell sequencing and spatial transcriptomics, which have revolutionized our understanding of the cell composition of esophageal cancer and allowed us to identify novel cell types. We focus on the latest advances in artificial intelligence for integrating multi-omics data of esophageal cancer. Artificial intelligence-based computational tools for multi-omics data integration play a key role in tumor heterogeneity assessment and will potentially boost the development of precision oncology in esophageal cancer.
Affiliation(s)
- Junyu Li
  - Department of Radiation Oncology, Jiangxi Cancer Hospital, Nanchang 330029, Jiangxi, China
  - Jiangxi Health Committee Key (JHCK) Laboratory of Tumor Metastasis, Jiangxi Cancer Hospital, Nanchang 330029, Jiangxi, China
- Lin Li
  - Department of Thoracic Oncology, Jiangxi Cancer Hospital, Nanchang 330029, Jiangxi, China
- Peimeng You
  - Nanchang University, Department of Radiation Oncology, Jiangxi Cancer Hospital, Nanchang 330029, Jiangxi, China
- Yiping Wei
  - Department of Thoracic Surgery, The Second Affiliated Hospital of Nanchang University, Nanchang 330006, Jiangxi, China
- Bin Xu
  - Jiangxi Health Committee Key (JHCK) Laboratory of Tumor Metastasis, Jiangxi Cancer Hospital, Nanchang 330029, Jiangxi, China
9
Teh RQ, Liu GS, Wang JH. Bioinformatics Tools for Bulk Gene Expression Deconvolution in Diabetic Retinopathy. Methods Mol Biol 2023; 2678:107-115. [PMID: 37326707 DOI: 10.1007/978-1-0716-3255-0_7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/17/2023]
Abstract
Retinal neovascularization is one of the leading causes of vision loss and a hallmark of proliferative diabetic retinopathy (PDR). The immune system has been observed to be involved in the pathogenesis of diabetic retinopathy (DR). The specific immune cell type that contributes to retinal neovascularization can be identified via a bioinformatics analysis of RNA sequencing (RNA-seq) data known as deconvolution analysis. A previous study identified the infiltration of macrophages in the retina of rats with hypoxia-induced retinal neovascularization and of patients with PDR using a deconvolution algorithm known as CIBERSORTx. Here, we describe protocols for using CIBERSORTx to perform deconvolution analysis and downstream analysis of RNA-seq data.
Affiliation(s)
- Ru Qi Teh
  - Centre for Eye Research Australia, Royal Victorian Eye and Ear Hospital, East Melbourne, VIC, Australia
- Guei-Sheung Liu
  - Centre for Eye Research Australia, Royal Victorian Eye and Ear Hospital, East Melbourne, VIC, Australia
  - Menzies Institute for Medical Research, University of Tasmania, Hobart, TAS, Australia
  - Ophthalmology, Department of Surgery, University of Melbourne, East Melbourne, VIC, Australia
- Jiang-Hui Wang
  - Centre for Eye Research Australia, Royal Victorian Eye and Ear Hospital, East Melbourne, VIC, Australia
10
Tiwari A, Trivedi R, Lin SY. Tumor microenvironment: barrier or opportunity towards effective cancer therapy. J Biomed Sci 2022; 29:83. [PMID: 36253762 PMCID: PMC9575280 DOI: 10.1186/s12929-022-00866-3] [Citation(s) in RCA: 181] [Impact Index Per Article: 60.3] [Received: 04/21/2022] [Accepted: 10/01/2022] [Indexed: 12/24/2022]
Abstract
The tumor microenvironment (TME) is a specialized ecosystem of host components, designed by tumor cells for the successful development and metastasis of the tumor. With the advent of 3D culture and advanced bioinformatic methodologies, it is now possible to study the individual components of the TME and their interplay at higher resolution. A deeper understanding of immune cell diversity, stromal constituents, repertoire profiling, and neoantigen prediction in TMEs has provided the opportunity to explore the spatial and temporal regulation of immunotherapeutic interventions. Variation in TME composition among patients plays an important role in determining responders and non-responders to cancer immunotherapy. TME components could therefore potentially be reprogrammed to overcome the widespread problem of resistance to immunotherapy. The focus of the present review is to understand the complexity of the TME and to consider the future prospects of its components as potential therapeutic targets. The latter part of the review describes the sophisticated 3D models emerging as valuable means to study TME components, and gives an extensive account of advanced bioinformatic tools to profile TME components and predict neoantigens. Overall, this review provides a comprehensive account of the current knowledge available for targeting the TME.
Affiliation(s)
- Aadhya Tiwari
- Department of Systems Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA.
- Rakesh Trivedi
- Department of Translational Molecular Pathology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Shiaw-Yih Lin
- Department of Systems Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA.
11
Bronk JK, Kapadia C, Wu X, Chapman BV, Wang R, Karpinets TV, Song X, Futreal AM, Zhang J, Klopp AH, Colbert LE. Feasibility of a novel non-invasive swab technique for serial whole-exome sequencing of cervical tumors during chemoradiation therapy. PLoS One 2022; 17:e0274457. [PMID: 36201462 PMCID: PMC9536567 DOI: 10.1371/journal.pone.0274457] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/31/2022] [Accepted: 08/29/2022] [Indexed: 11/06/2022]
Abstract
Background Clinically relevant genetic predictors of radiation response for cervical cancer are understudied due to the morbidity of the repeat invasive biopsies required to obtain genetic material. Thus, we aimed to demonstrate the feasibility of a novel noninvasive cervical swab technique to (1) collect tumor DNA with adequate throughput and (2) perform whole-exome sequencing (WES) at serial time points over the course of chemoradiation therapy (CRT). Methods Cervical cancer tumor samples from patients undergoing chemoradiation were collected at baseline, week 1, week 3 and the completion of CRT (week 5) using a noninvasive swab-based biopsy technique. Swab samples were analyzed by WES, with mutation calling performed by a custom pipeline optimized for shallow whole-exome sequencing with low tumor purity (TP). Tumor mutation changes over the course of treatment were profiled. Results A total of 216 samples from 70 patients were collected and successfully sequenced (94% of the total number of tumor samples collected). A total of 33 patients had a complete set of samples at all four time points. The mean mapping rate was 98% for all samples, and the mean target coverage was 180. Estimated TP was greater than 5% for all samples. Overall mutation frequency decreased during CRT, but the mapping rate and mean target coverage remained at >98% and >180 reads at week 5. Conclusion This study demonstrates the feasibility and application of a noninvasive swab-based technique for WES analysis, which may be applied to investigate dynamic tumor mutational changes during treatment and to identify novel genes that confer radiation resistance.
Affiliation(s)
- Julianna K. Bronk
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, Texas, United States of America
- Chiraag Kapadia
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, Texas, United States of America
- Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, Texas, United States of America
- Xiaogang Wu
- Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, Texas, United States of America
- Bhavana V. Chapman
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, Texas, United States of America
- Rui Wang
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, Texas, United States of America
- Tatiana V. Karpinets
- Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, Texas, United States of America
- Xingzhi Song
- Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, Texas, United States of America
- Andrew M. Futreal
- Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, Texas, United States of America
- Jianhua Zhang
- Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, Texas, United States of America
- Ann H. Klopp
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, Texas, United States of America
- Corresponding author
- Lauren E. Colbert
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, Texas, United States of America
- Corresponding author
12
Chen D, Li S, Wang X. Geometric structure guided model and algorithms for complete deconvolution of gene expression data. Foundations of Data Science (Springfield, Mo.) 2022; 4:441-466. [PMID: 38250319 PMCID: PMC10798655 DOI: 10.3934/fods.2022013] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/23/2024]
Abstract
Complete deconvolution analysis of bulk RNA-seq data is important for distinguishing whether differences in disease-associated GEPs (gene expression profiles) between tissues of patients and normal controls are due to changes in the cellular composition of tissue samples or to GEP changes in specific cells. One of the major techniques for performing complete deconvolution is nonnegative matrix factorization (NMF), which also has a wide range of applications in the machine learning community. However, NMF is a well-known, strongly ill-posed problem, so a direct application of NMF to RNA-seq data suffers severe difficulties in the interpretability of solutions. In this paper, we develop an NMF-based mathematical model and corresponding computational algorithms to improve the solution identifiability of deconvoluting bulk RNA-seq data. In our approach, we combine the biological concept of marker genes with the solvability conditions of NMF theory and develop a geometric structure guided optimization model. In this strategy, the geometric structure of bulk tissue data is first explored by spectral clustering. Then, the identified marker-gene information is integrated as solvability constraints, while the overall correlation graph is used for manifold regularization. Both synthetic and biological data are used to validate the proposed model and algorithms, which significantly improve solution interpretability and accuracy.
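To make the NMF setting of this abstract concrete, the following is a minimal sketch of plain NMF using Lee-Seung multiplicative updates in Python/NumPy. It is not the GSNMF model described above. It has no marker-gene constraints, spectral clustering or manifold regularization. The function name `nmf_multiplicative` and the synthetic data are illustrative assumptions. The last lines show one reason the problem is ill-posed: rescaling W by any positive diagonal matrix D and H by D's inverse leaves the product unchanged.

```python
import numpy as np


def nmf_multiplicative(X, k, n_iter=500, seed=0, eps=1e-10):
    """Approximate a nonnegative matrix X (genes x samples) as W @ H.

    W (genes x k) plays the role of cell-type GEPs and H (k x samples) the
    role of relative cell abundances. Uses Lee-Seung multiplicative updates
    for the squared Frobenius-norm objective; this is plain NMF, not GSNMF.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_samples = X.shape
    W = rng.random((n_genes, k)) + eps
    H = rng.random((k, n_samples)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a nonnegative ratio, so W and H stay >= 0.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic "bulk" matrix: 50 genes, 3 cell types, 20 samples.
    W_true = rng.gamma(2.0, 1.0, size=(50, 3))
    H_true = rng.dirichlet(np.ones(3), size=20).T  # each column sums to 1
    X = W_true @ H_true

    W, H = nmf_multiplicative(X, k=3, n_iter=2000)
    rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
    print(f"relative reconstruction error: {rel_err:.2e}")

    # Ill-posedness: a positive diagonal rescaling D leaves the fit unchanged,
    # so (W @ D, inv(D) @ H) is an equally good solution.
    D = np.diag([2.0, 0.5, 3.0])
    W2, H2 = W @ D, np.linalg.inv(D) @ H
    print("same product:", np.allclose(W @ H, W2 @ H2))
```

A good reconstruction error alone therefore says nothing about whether W and H match the true GEPs and proportions. This is the gap that extra constraints such as marker genes are meant to close.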
Affiliation(s)
- Duan Chen
- Department of Mathematics and Statistics, School of Data Science, University of North Carolina at Charlotte, USA
- Shaoyu Li
- Department of Mathematics and Statistics, University of North Carolina at Charlotte, USA
- Xue Wang
- Department of Quantitative Health Sciences, Mayo Clinic, Florida 32224, USA
13
Zhang Y, Sun H, Mandava A, Aevermann BD, Kollmann TR, Scheuermann RH, Qiu X, Qian Y. FastMix: a versatile data integration pipeline for cell type-specific biomarker inference. Bioinformatics 2022; 38:4735-4744. [PMID: 36018232 PMCID: PMC9801972 DOI: 10.1093/bioinformatics/btac585] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2022] [Revised: 08/18/2022] [Accepted: 08/25/2022] [Indexed: 01/07/2023]
Abstract
MOTIVATION Flow cytometry (FCM) and transcription profiling are two widely used assays in translational immunology research. However, there is no data integration pipeline for analyzing these two types of assays together with experimental variables for biomarker inference. Current FCM data analysis mainly relies on subjective manual gating, which is difficult to integrate directly with other automated computational methods. Existing deconvolution analysis of bulk transcriptomics relies on predefined marker genes in the transcriptomics data, which are unavailable for novel cell types, and does not utilize FCM data, which provide canonical phenotypic definitions of the cell types. RESULTS We developed a novel analytics pipeline, FastMix, for computational immunology, which integrates flow cytometry, bulk transcriptomics and clinical covariates to identify cell type-specific gene expression signatures and biomarker genes. FastMix addresses the 'large p, small n' problem in gene expression and flow cytometry integration analysis via a linear mixed effects model (LMER) for both cross-sectional and longitudinal studies. Its novel moment-based estimator not only reduces bias in parameter estimation but is also more efficient than iterative optimization. The FastMix pipeline also includes a cutting-edge flow cytometry data analysis method, DAFi, for identifying cell populations of interest and their characteristics. Simulation studies showed that FastMix produced smaller type I/II errors than competing methods. Validation using real data from two vaccine studies showed that FastMix identified a set of signature genes consistent with an independent single-cell RNA-seq analysis, while also producing additional interesting findings. AVAILABILITY AND IMPLEMENTATION Source code of FastMix is publicly available at https://github.com/terrysun0302/FastMix. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Aishwarya Mandava
- Department of Informatics, J. Craig Venter Institute, La Jolla, CA 92037, USA
- Brian D Aevermann
- Department of Informatics, J. Craig Venter Institute, La Jolla, CA 92037, USA
- Tobias R Kollmann
- Systems Vaccinology, Telethon Kids Institute, Perth Children’s Hospital, University of Western Australia, Nedlands, WA 6009, Australia
- Richard H Scheuermann
- Department of Informatics, J. Craig Venter Institute, La Jolla, CA 92037, USA
- Department of Pathology, University of California, San Diego, La Jolla, CA 92093, USA
- Xing Qiu
- To whom correspondence should be addressed.
- Yu Qian
- To whom correspondence should be addressed.
14
Vorperian SK, Moufarrej MN, Quake SR. Cell types of origin of the cell-free transcriptome. Nat Biotechnol 2022; 40:855-861. [PMID: 35132263 PMCID: PMC9200634 DOI: 10.1038/s41587-021-01188-9] [Citation(s) in RCA: 49] [Impact Index Per Article: 16.3] [Received: 05/05/2021] [Accepted: 12/13/2021] [Indexed: 12/12/2022]
Abstract
Cell-free RNA from liquid biopsies can be analyzed to determine disease tissue of origin. We extend this concept to identify cell types of origin using the Tabula Sapiens transcriptomic cell atlas as well as individual tissue transcriptomic cell atlases in combination with the Human Protein Atlas RNA consensus dataset. We define cell type signature scores, which allow the inference of cell types that contribute to cell-free RNA for a variety of diseases.
Affiliation(s)
- Sevahn K Vorperian
- Department of Chemical Engineering, Stanford University, Stanford, CA, USA
- ChEM-H, Stanford University, Stanford, CA, USA
- Mira N Moufarrej
- Department of Bioengineering, Stanford University, Stanford, CA, USA
- Stephen R Quake
- Department of Bioengineering, Stanford University, Stanford, CA, USA.
- Department of Applied Physics, Stanford University, Stanford, CA, USA.
- Chan Zuckerberg Biohub, San Francisco, CA, USA.
15
Comprehensive evaluation of deconvolution methods for human brain gene expression. Nat Commun 2022; 13:1358. [PMID: 35292647 PMCID: PMC8924248 DOI: 10.1038/s41467-022-28655-4] [Citation(s) in RCA: 64] [Impact Index Per Article: 21.3] [Received: 09/05/2019] [Accepted: 01/28/2022] [Indexed: 11/08/2022]
Abstract
Transcriptome deconvolution aims to estimate the cellular composition of an RNA sample from its gene expression data, which in turn can be used to correct for composition differences across samples. The human brain is unique in its transcriptomic diversity and comprises a complex mixture of cell types, including transcriptionally similar subtypes of neurons. Here, we carry out a comprehensive evaluation of deconvolution methods for human brain transcriptome data and assess the tissue specificity of our key observations by comparison with human pancreas and heart. We evaluate eight transcriptome deconvolution approaches and nine cell-type signatures, testing the accuracy of deconvolution using in silico mixtures of single-cell RNA-seq data, RNA mixtures, and nearly 2000 human brain samples. Our results identify the main factors that drive deconvolution accuracy for brain data and highlight the importance of biological factors influencing cell-type signatures, such as brain region and in vitro cell culturing. The authors conclude that partial deconvolution algorithms work best for human brain transcriptome data, but that appropriate cell-type signatures are also important.
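As a hedged illustration of partial deconvolution, which estimates proportions when cell-type signatures are already known, the sketch below fits each in silico mixture by nonnegative least squares (NNLS) against an assumed signature matrix. It then rescales the coefficients to sum to one. NNLS is a common baseline, but it is not claimed to be one of the eight approaches evaluated in this study. The helper `nnls_proportions` and the synthetic data are hypothetical, and SciPy's `scipy.optimize.nnls` is assumed to be available.

```python
import numpy as np
from scipy.optimize import nnls


def nnls_proportions(signature, bulk):
    """Estimate cell-type proportions for each bulk sample (hypothetical helper).

    signature: genes x cell_types reference matrix, assumed known.
    bulk: genes x samples expression matrix.
    Returns a cell_types x samples matrix whose columns sum to 1.
    """
    n_types = signature.shape[1]
    props = np.zeros((n_types, bulk.shape[1]))
    for j in range(bulk.shape[1]):
        # Solve min ||signature @ c - bulk[:, j]||_2 subject to c >= 0.
        coef, _residual = nnls(signature, bulk[:, j])
        total = coef.sum()
        # Rescale the nonnegative coefficients so they read as proportions.
        props[:, j] = coef / total if total > 0 else coef
    return props


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    signature = rng.gamma(2.0, 1.0, size=(100, 4))    # synthetic GEPs
    true_props = rng.dirichlet(np.ones(4), size=5).T  # 4 x 5, columns sum to 1
    # In silico mixtures with mild multiplicative noise.
    bulk = signature @ true_props * rng.lognormal(0.0, 0.05, size=(100, 5))
    est = nnls_proportions(signature, bulk)
    print(np.round(est, 3))
    print("max abs error:", np.abs(est - true_props).max())
```

With noise-free mixtures and a full-rank signature matrix, NNLS recovers the mixing proportions exactly. In practice, a mismatch between the reference signatures and the bulk tissue (for example brain region or in vitro culturing, as noted above) is what drives the error.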
16
Bunis DG, Wang W, Vallvé-Juanico J, Houshdaran S, Sen S, Ben Soltane I, Kosti I, Vo KC, Irwin JC, Giudice LC, Sirota M. Whole-Tissue Deconvolution and scRNAseq Analysis Identify Altered Endometrial Cellular Compositions and Functionality Associated With Endometriosis. Front Immunol 2022; 12:788315. [PMID: 35069565 PMCID: PMC8766492 DOI: 10.3389/fimmu.2021.788315] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 10/02/2021] [Accepted: 12/09/2021] [Indexed: 12/13/2022]
Abstract
The uterine lining (endometrium) exhibits a pro-inflammatory phenotype in women with endometriosis, resulting in pain, infertility, and poor pregnancy outcomes. The full complement of cell types contributing to this phenotype has yet to be identified, as most studies have focused on bulk tissue or select cell populations. Herein, through integrating whole-tissue deconvolution and single-cell RNAseq, we comprehensively characterized immune and nonimmune cell types in the endometrium of women with or without disease and their dynamic changes across the menstrual cycle. We designed metrics to evaluate specificity of deconvolution signatures that resulted in single-cell identification of 13 novel signatures for immune cell subtypes in healthy endometrium. Guided by statistical metrics, we identified contributions of endometrial epithelial, endothelial, plasmacytoid dendritic cells, classical dendritic cells, monocytes, macrophages, and granulocytes to the endometrial pro-inflammatory phenotype, underscoring roles for nonimmune as well as immune cells to the dysfunctionality of this tissue.
Affiliation(s)
- Daniel G. Bunis
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, United States
- Wanxin Wang
- Center for Reproductive Sciences, University of California, San Francisco, San Francisco, CA, United States
- Júlia Vallvé-Juanico
- Center for Reproductive Sciences, University of California, San Francisco, San Francisco, CA, United States
- Sahar Houshdaran
- Center for Reproductive Sciences, University of California, San Francisco, San Francisco, CA, United States
- Sushmita Sen
- Center for Reproductive Sciences, University of California, San Francisco, San Francisco, CA, United States
- Isam Ben Soltane
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, United States
- Idit Kosti
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, United States
- Kim Chi Vo
- Center for Reproductive Sciences, University of California, San Francisco, San Francisco, CA, United States
- Juan C. Irwin
- Center for Reproductive Sciences, University of California, San Francisco, San Francisco, CA, United States
- Linda C. Giudice
- Center for Reproductive Sciences, University of California, San Francisco, San Francisco, CA, United States
- Marina Sirota
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, United States
- Department of Pediatrics, Division of Neonatology, University of California, San Francisco, San Francisco, CA, United States
17
Kalafati M, Kutmon M, Evelo CT, van der Kallen CJH, Schalkwijk CG, Stehouwer CDA, BIOS Consortium, Blaak EE, van Greevenbroek MMJ, Adriaens M. An interferon-related signature characterizes the whole blood transcriptome profile of insulin-resistant individuals—the CODAM study. Genes Nutr 2021; 16:22. [PMID: 34886800 PMCID: PMC8903498 DOI: 10.1186/s12263-021-00702-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2021] [Accepted: 11/16/2021] [Indexed: 12/03/2022]
Abstract
Background Worldwide, the prevalence of obesity and insulin resistance has grown dramatically. Gene expression profiling in blood represents a powerful means to explore disease pathogenesis, but the potential impact of inter-individual differences in a cell-type profile is not always taken into account. The objective of this project was to investigate the whole blood transcriptome profile of insulin-resistant as compared to insulin-sensitive individuals independent of inter-individual differences in white blood cell profile. Results We report a 3% higher relative amount of monocytes in the insulin-resistant individuals. Furthermore, independent of their white blood cell profile, insulin-resistant participants had (i) higher expression of interferon-stimulated genes and (ii) lower expression of genes involved in cellular differentiation and remodeling of the actin cytoskeleton. Conclusions We present an approach to investigate the whole blood transcriptome of insulin-resistant individuals, independent of their DNA methylation-derived white blood cell profile. An interferon-related signature characterizes the whole blood transcriptome profile of the insulin-resistant individuals, independent of their white blood cell profile. The observed signature indicates increased systemic inflammation possibly due to an innate immune response and whole-body insulin resistance, which can be a cause or a consequence of insulin resistance. Altered gene expression in specific organs may be reflected in whole blood; hence, our results may reflect obesity and/or insulin resistance-related organ dysfunction in the insulin-resistant individuals. Supplementary Information The online version contains supplementary material available at 10.1186/s12263-021-00702-7.
18
Yang T, Alessandri-Haber N, Fury W, Schaner M, Breese R, LaCroix-Fralish M, Kim J, Adler C, Macdonald LE, Atwal GS, Bai Y. AdRoit is an accurate and robust method to infer complex transcriptome composition. Commun Biol 2021; 4:1218. [PMID: 34686758 PMCID: PMC8536787 DOI: 10.1038/s42003-021-02739-1] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 02/08/2021] [Accepted: 10/04/2021] [Indexed: 12/31/2022]
Abstract
Bulk RNA sequencing provides the opportunity to understand biology at the whole-transcriptome level without the prohibitive cost of single-cell profiling. Advances in spatial transcriptomics make it possible to dissect tissue organization and function through genome-wide gene expression. However, both technologies read out the overall gene expression across potentially many cell types without directly providing information on cell type composition. Although several in silico approaches have been proposed to deconvolute RNA-Seq data composed of multiple cell types, many suffer from deteriorating performance in complex tissues. Here we present AdRoit, an accurate and robust method to infer the cell composition from transcriptome data of mixed cell types. AdRoit uses gene expression profiles obtained from single-cell RNA sequencing as a reference. It employs an adaptive learning approach to alleviate the sequencing technique differences between the single-cell and the bulk (or spatial) transcriptome data, enhancing cross-platform readout comparability. Our systematic benchmarking and applications, which include deconvoluting complex mixtures that encompass 30 cell types, demonstrate its favorable sensitivity and specificity compared to many existing methods, as well as its utility. In addition, AdRoit is computationally efficient and runs orders of magnitude faster than most methods.
Affiliation(s)
- Tao Yang
- Regeneron Pharmaceuticals, Inc., Tarrytown, NY, 10591, USA
- Wen Fury
- Regeneron Pharmaceuticals, Inc., Tarrytown, NY, 10591, USA
- Robert Breese
- Regeneron Pharmaceuticals, Inc., Tarrytown, NY, 10591, USA
- Jinrang Kim
- Regeneron Pharmaceuticals, Inc., Tarrytown, NY, 10591, USA
- Yu Bai
- Regeneron Pharmaceuticals, Inc., Tarrytown, NY, 10591, USA.
19
Li H, Huang Y, Sharma A, Ming W, Luo K, Gu Z, Sun X, Liu H. From Cellular Infiltration Assessment to a Functional Gene Set-Based Prognostic Model for Breast Cancer. Front Immunol 2021; 12:751530. [PMID: 34691065 PMCID: PMC8529968 DOI: 10.3389/fimmu.2021.751530] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 08/01/2021] [Accepted: 09/15/2021] [Indexed: 12/24/2022]
Abstract
Background Cancer heterogeneity is a major challenge in clinical practice, and to some extent, the varying combinations of different cell types and their cross-talk with tumor cells that modulate the tumor microenvironment (TME) are thought to be responsible. Despite recent methodological advances in cancer research, a reliable and robust model that can effectively investigate heterogeneity with direct prognostic/diagnostic clinical application has remained elusive. Results To investigate cancer heterogeneity, we took advantage of single-cell transcriptome data and constructed the first indication- and cell type-specific reference gene expression profile (RGEP) for breast cancer (BC) that can accurately predict cellular infiltration. By utilizing the BC-specific RGEP combined with a proven deconvolution model (LinDeconSeq), we were able to determine the intrinsic gene expression of 15 cell types in BC tissues. Besides identifying significant differences in cellular proportions between molecular subtypes, we also evaluated the varying degree of immune cell infiltration (basal-like subtype: highest; Her2 subtype: lowest) across all available TCGA-BRCA cohorts. By converting the cellular proportions into functional gene sets, we further developed a prognostic model based on 24 functional gene sets that can effectively discriminate overall survival (P = 5.9 × 10⁻³³, n = 1091, TCGA-BRCA cohort) and therapeutic response to chemotherapy and immunotherapy (P = 6.5 × 10⁻³, n = 348, IMvigor210 cohort) in tumor patients. Conclusions Herein, we have developed a highly reliable BC-RGEP that adequately annotates different cell types and estimates cellular infiltration. Importantly, the functional gene set-based prognostic model introduced here showed a strong ability to stratify patients by therapeutic response. More broadly, this work offers a template for generating similar models in other cancer types to identify shared factors that drive cancer heterogeneity.
Affiliation(s)
- Huamei Li
- State Key Laboratory of Bioelectronics, School of Biological Science & Medical Engineering, Southeast University, Nanjing, China
- Yiting Huang
- State Key Laboratory of Bioelectronics, School of Biological Science & Medical Engineering, Southeast University, Nanjing, China
- Amit Sharma
- Department of Neurosurgery, Center for Integrated Oncology (CIO), University Hospital Bonn, Bonn, Germany
- Department of Integrated Oncology, Center for Integrated Oncology (CIO), University Hospital Bonn, Bonn, Germany
- Wenglong Ming
- State Key Laboratory of Bioelectronics, School of Biological Science & Medical Engineering, Southeast University, Nanjing, China
- Kun Luo
- Department of Neurosurgery, Xinjiang Evidence-Based Medicine Research Institute, First Affiliated Hospital of Xinjiang Medical University, Urumqi, China
- Zhongze Gu
- State Key Laboratory of Bioelectronics, School of Biological Science & Medical Engineering, Southeast University, Nanjing, China
- Xiao Sun
- State Key Laboratory of Bioelectronics, School of Biological Science & Medical Engineering, Southeast University, Nanjing, China
- Hongde Liu
- State Key Laboratory of Bioelectronics, School of Biological Science & Medical Engineering, Southeast University, Nanjing, China
20
Ng SWK, Rouhani FJ, Brunner SF, Brzozowska N, Aitken SJ, Yang M, Abascal F, Moore L, Nikitopoulou E, Chappell L, Leongamornlert D, Ivovic A, Robinson P, Butler T, Sanders MA, Williams N, Coorens THH, Teague J, Raine K, Butler AP, Hooks Y, Wilson B, Birtchnell N, Naylor H, Davies SE, Stratton MR, Martincorena I, Rahbari R, Frezza C, Hoare M, Campbell PJ. Convergent somatic mutations in metabolism genes in chronic liver disease. Nature 2021; 598:473-478. [PMID: 34646017 DOI: 10.1038/s41586-021-03974-6] [Citation(s) in RCA: 102] [Impact Index Per Article: 25.5] [Received: 06/17/2020] [Accepted: 08/31/2021] [Indexed: 02/08/2023]
Abstract
The progression of chronic liver disease to hepatocellular carcinoma is caused by the acquisition of somatic mutations that affect 20-30 cancer genes [1-8]. Burdens of somatic mutations are higher and clonal expansions larger in chronic liver disease [9-13] than in normal liver [13-16], which enables positive selection to shape the genomic landscape [9-13]. Here we analysed somatic mutations from 1,590 genomes across 34 liver samples, including healthy controls, alcohol-related liver disease and non-alcoholic fatty liver disease. Seven of the 29 patients with liver disease had mutations in FOXO1, the major transcription factor in insulin signalling. These mutations affected a single hotspot within the gene, impairing the insulin-mediated nuclear export of FOXO1. Notably, six of the seven patients with FOXO1 S22W hotspot mutations showed convergent evolution, with variants acquired independently by up to nine distinct hepatocyte clones per patient. CIDEB, which regulates lipid droplet metabolism in hepatocytes [17-19], and GPAM, which produces storage triacylglycerol from free fatty acids [20,21], also had a significant excess of mutations. We again observed frequent convergent evolution: up to fourteen independent clones per patient with CIDEB mutations and up to seven clones per patient with GPAM mutations. Mutations in metabolism genes were distributed across multiple anatomical segments of the liver, increased clone size and were seen in both alcohol-related liver disease and non-alcoholic fatty liver disease, but rarely in hepatocellular carcinoma. Master regulators of metabolic pathways are a frequent target of convergent somatic mutation in alcohol-related and non-alcoholic fatty liver disease.
Affiliation(s)
- Stanley W K Ng
- Cancer Genome Project, Wellcome Sanger Institute, Hinxton, UK
- Foad J Rouhani
- Cancer Genome Project, Wellcome Sanger Institute, Hinxton, UK
- Department of Surgery, Addenbrooke's Hospital, Cambridge, UK
- Simon F Brunner
- Cancer Genome Project, Wellcome Sanger Institute, Hinxton, UK
- Sarah J Aitken
- CRUK Cambridge Institute, Cambridge, UK
- Department of Pathology, Addenbrooke's Hospital, Cambridge, UK
- MRC Toxicology Unit, University of Cambridge, Cambridge, UK
- Ming Yang
- MRC Cancer Unit, University of Cambridge, Cambridge, UK
- Luiza Moore
- Cancer Genome Project, Wellcome Sanger Institute, Hinxton, UK
- Lia Chappell
- Cancer Genome Project, Wellcome Sanger Institute, Hinxton, UK
- Philip Robinson
- Cancer Genome Project, Wellcome Sanger Institute, Hinxton, UK
- Timothy Butler
- Cancer Genome Project, Wellcome Sanger Institute, Hinxton, UK
- Mathijs A Sanders
- Cancer Genome Project, Wellcome Sanger Institute, Hinxton, UK
- Department of Hematology, Erasmus University Medical Center, Rotterdam, The Netherlands
- Tim H H Coorens
- Cancer Genome Project, Wellcome Sanger Institute, Hinxton, UK
- Jon Teague
- Cancer Genome Project, Wellcome Sanger Institute, Hinxton, UK
- Keiran Raine
- Cancer Genome Project, Wellcome Sanger Institute, Hinxton, UK
- Adam P Butler
- Cancer Genome Project, Wellcome Sanger Institute, Hinxton, UK
- Yvette Hooks
- Cancer Genome Project, Wellcome Sanger Institute, Hinxton, UK
- Beverley Wilson
- Cancer Genome Project, Wellcome Sanger Institute, Hinxton, UK
- Huw Naylor
- Department of Surgery, Addenbrooke's Hospital, Cambridge, UK
- Susan E Davies
- Department of Pathology, Addenbrooke's Hospital, Cambridge, UK
- Raheleh Rahbari
- Cancer Genome Project, Wellcome Sanger Institute, Hinxton, UK
- Matthew Hoare
- CRUK Cambridge Institute, Cambridge, UK.
- Department of Medicine, University of Cambridge, Addenbrooke's Hospital, Cambridge, UK.
- Peter J Campbell
- Cancer Genome Project, Wellcome Sanger Institute, Hinxton, UK.
- Stem Cell Institute, University of Cambridge, Cambridge, UK.
21
Zhang W, Xu H, Qiao R, Zhong B, Zhang X, Gu J, Zhang X, Wei L, Wang X. ARIC: accurate and robust inference of cell type proportions from bulk gene expression or DNA methylation data. Brief Bioinform 2021; 23:6361035. [PMID: 34472588 DOI: 10.1093/bib/bbab362] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 07/12/2021] [Revised: 08/13/2021] [Accepted: 08/16/2021] [Indexed: 11/12/2022]
Abstract
Quantifying cell proportions, especially for rare cell types in some scenarios, is of great value in tracking signals associated with certain phenotypes or diseases. Although some methods have been proposed to infer cell proportions from multicomponent bulk data, they are substantially less effective at estimating the proportions of rare cell types, which are highly sensitive to feature outliers and collinearity. Here we propose a new deconvolution algorithm named ARIC to estimate cell type proportions from gene expression or DNA methylation data. ARIC employs a novel two-step marker selection strategy, consisting of collinear feature elimination based on the component-wise condition number and adaptive removal of outlier markers. This strategy can systematically obtain effective markers for weighted ν-support vector regression to ensure robust and precise prediction of rare proportions. We showed that ARIC can accurately estimate fractions in both DNA methylation and gene expression data from different experiments. We further applied ARIC to survival prediction in ovarian cancer and condition monitoring in chronic kidney disease, and the results demonstrate the high accuracy, robustness and clinical potential of ARIC. Taken together, ARIC is a promising tool for deconvolving bulk data in which rare components are of vital importance.
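ARIC's exact marker-selection rules are not given in this abstract. The following is therefore only a simplified, hypothetical sketch of the general idea behind condition-number-based feature elimination. It greedily drops marker genes (rows of a genes x cell-types signature matrix) whenever doing so lowers the matrix's 2-norm condition number. It uses the global condition number from `numpy.linalg.cond` rather than ARIC's component-wise variant. It also omits the outlier-removal step and the weighted ν-SVR fit. `prune_markers_by_condition` is an illustrative name, not part of ARIC.

```python
import numpy as np


def prune_markers_by_condition(signature, max_removals=10, min_genes=None):
    """Greedily drop genes (rows) that most reduce cond(signature).

    Hypothetical, simplified stand-in for condition-number-based filtering:
    it uses the global 2-norm condition number, not ARIC's component-wise one.
    Returns the indices of the retained genes.
    """
    n_genes, n_types = signature.shape
    if min_genes is None:
        min_genes = n_types + 1
    keep = list(range(n_genes))
    current = np.linalg.cond(signature[keep])
    for _ in range(max_removals):
        if len(keep) <= min_genes:
            break
        best_cond, best_idx = current, None
        for i in keep:
            trial = [g for g in keep if g != i]
            c = np.linalg.cond(signature[trial])
            if c < best_cond:  # only strict improvements are accepted
                best_cond, best_idx = c, i
        if best_idx is None:  # no single removal improves conditioning
            break
        keep.remove(best_idx)
        current = best_cond
    return keep


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S = rng.gamma(2.0, 1.0, size=(30, 3))  # synthetic genes x cell types
    S[0] *= 200.0  # one extreme gene dominates the largest singular value
    keep = prune_markers_by_condition(S, max_removals=5)
    print("cond before:", round(float(np.linalg.cond(S)), 2))
    print("cond after:", round(float(np.linalg.cond(S[keep])), 2))
    print("gene 0 kept:", 0 in keep)
```

A lower condition number makes the downstream regression less sensitive to small perturbations in the bulk profile, which is the motivation for this kind of filtering. The greedy search evaluates one condition number per remaining gene at each step, so it is only practical for modest marker panels.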
Affiliation(s)
- Wei Zhang
- Ministry of Education Key Laboratory of Bioinformatics; Center for Synthetic and Systems Biology; Bioinformatics Division, Beijing National Research Center for Information Science and Technology; Department of Automation, Tsinghua University, Beijing 100084, China
- Hanwen Xu
- Ministry of Education Key Laboratory of Bioinformatics; Center for Synthetic and Systems Biology; Bioinformatics Division, Beijing National Research Center for Information Science and Technology; Department of Automation, Tsinghua University, Beijing 100084, China
- Rong Qiao
- Ministry of Education Key Laboratory of Bioinformatics; Center for Synthetic and Systems Biology; Bioinformatics Division, Beijing National Research Center for Information Science and Technology; Department of Automation, Tsinghua University, Beijing 100084, China
- Bixi Zhong
- Ministry of Education Key Laboratory of Bioinformatics; Center for Synthetic and Systems Biology; Bioinformatics Division, Beijing National Research Center for Information Science and Technology; Department of Automation, Tsinghua University, Beijing 100084, China
- Xianglin Zhang
- Ministry of Education Key Laboratory of Bioinformatics; Center for Synthetic and Systems Biology; Bioinformatics Division, Beijing National Research Center for Information Science and Technology; Department of Automation, Tsinghua University, Beijing 100084, China
- Jin Gu
- Ministry of Education Key Laboratory of Bioinformatics; Center for Synthetic and Systems Biology; Bioinformatics Division, Beijing National Research Center for Information Science and Technology; Department of Automation, Tsinghua University, Beijing 100084, China
- Xuegong Zhang
- Ministry of Education Key Laboratory of Bioinformatics; Center for Synthetic and Systems Biology; Bioinformatics Division, Beijing National Research Center for Information Science and Technology; Department of Automation, Tsinghua University, Beijing 100084, China
- Lei Wei
- Ministry of Education Key Laboratory of Bioinformatics; Center for Synthetic and Systems Biology; Bioinformatics Division, Beijing National Research Center for Information Science and Technology; Department of Automation, Tsinghua University, Beijing 100084, China
- Xiaowo Wang
- Ministry of Education Key Laboratory of Bioinformatics; Center for Synthetic and Systems Biology; Bioinformatics Division, Beijing National Research Center for Information Science and Technology; Department of Automation, Tsinghua University, Beijing 100084, China
22
Xu Y, Su GH, Ma D, Xiao Y, Shao ZM, Jiang YZ. Technological advances in cancer immunity: from immunogenomics to single-cell analysis and artificial intelligence. Signal Transduct Target Ther 2021; 6:312. [PMID: 34417437 PMCID: PMC8377461 DOI: 10.1038/s41392-021-00729-7] [Citation(s) in RCA: 63] [Impact Index Per Article: 15.8] [Received: 02/25/2021] [Revised: 07/06/2021] [Accepted: 07/18/2021] [Indexed: 02/07/2023]
Abstract
Immunotherapies play critical roles in cancer treatment. However, given that only a few patients respond to immune checkpoint blockade and other immunotherapeutic strategies, novel technologies are needed to decipher the complicated interplay between tumor cells and the components of the tumor immune microenvironment (TIME). Tumor immunomics refers to the integrated study of the TIME using immunogenomics, immunoproteomics, immune-bioinformatics, and other multi-omics data reflecting the immune states of tumors, and has relied on the rapid development of next-generation sequencing. The use of high-throughput genomic and transcriptomic data to calculate the abundance of immune cells and predict tumor antigens is referred to as immunogenomics. However, as bulk sequencing represents the average characteristics of a heterogeneous cell population, it fails to distinguish distinct cell subtypes. Single-cell-based technologies enable better dissection of the TIME through precise investigation of immune cell subpopulations and spatial architecture. In addition, radiomics and digital pathology-based deep learning models contribute substantially to research on cancer immunity. These artificial intelligence technologies have performed well in predicting response to immunotherapy, with profound significance in cancer therapy. In this review, we briefly summarize conventional and state-of-the-art technologies in the fields of immunogenomics, single-cell analysis and artificial intelligence, and present prospects for future research.
Affiliation(s)
- Ying Xu
- Key Laboratory of Breast Cancer in Shanghai, Department of Breast Surgery, Fudan University Shanghai Cancer Center, Shanghai, China
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China
- Guan-Hua Su
- Key Laboratory of Breast Cancer in Shanghai, Department of Breast Surgery, Fudan University Shanghai Cancer Center, Shanghai, China
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China
- Ding Ma
- Key Laboratory of Breast Cancer in Shanghai, Department of Breast Surgery, Fudan University Shanghai Cancer Center, Shanghai, China
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China
- Yi Xiao
- Key Laboratory of Breast Cancer in Shanghai, Department of Breast Surgery, Fudan University Shanghai Cancer Center, Shanghai, China.
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China.
- Zhi-Ming Shao
- Key Laboratory of Breast Cancer in Shanghai, Department of Breast Surgery, Fudan University Shanghai Cancer Center, Shanghai, China.
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China.
- Institutes of Biomedical Sciences, Fudan University, Shanghai, China.
- Yi-Zhou Jiang
- Key Laboratory of Breast Cancer in Shanghai, Department of Breast Surgery, Fudan University Shanghai Cancer Center, Shanghai, China.
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China.
24
Kang K, Huang C, Li Y, Umbach DM, Li L. CDSeqR: fast complete deconvolution for gene expression data from bulk tissues. BMC Bioinformatics 2021; 22:262. [PMID: 34030626 PMCID: PMC8142515 DOI: 10.1186/s12859-021-04186-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 09/24/2020] [Accepted: 05/12/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND Biological tissues consist of heterogeneous populations of cells. Because gene expression patterns from bulk tissue samples reflect the contributions from all cells in the tissue, understanding the contribution of individual cell types to the overall gene expression in the tissue is fundamentally important. We recently developed a computational method, CDSeq, that can simultaneously estimate both sample-specific cell-type proportions and cell-type-specific gene expression profiles using only bulk RNA-Seq counts from multiple samples. Here we present an R implementation of CDSeq (CDSeqR) with significant performance improvement over the original implementation in MATLAB and a new function to aid cell type annotation. The R package should be of interest to the broader R community. RESULT We developed a novel strategy to substantially improve computational efficiency in both speed and memory usage. In addition, we designed and implemented a new function for annotating the CDSeq estimated cell types using single-cell RNA sequencing (scRNA-seq) data. This function allows users to readily interpret and visualize the CDSeq estimated cell types, and also to annotate CDSeq-estimated cell types using marker genes. We carried out additional validations of the CDSeqR software using synthetic and real cell mixtures, and real bulk RNA-seq data from the Cancer Genome Atlas (TCGA) and the Genotype-Tissue Expression (GTEx) project. CONCLUSIONS The existing bulk RNA-seq repositories, such as TCGA and GTEx, provide enormous resources for better understanding changes in transcriptomics and human diseases. They are also potentially useful for studying cell-cell interactions in the tissue microenvironment. Bulk level analyses neglect tissue heterogeneity, however, and hinder investigation of cell-type-specific expression. The CDSeqR package may aid in silico dissection of bulk expression data, enabling researchers to recover cell-type-specific information.
Affiliation(s)
- Kai Kang
- Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Research Triangle Park, Durham, NC, 27709, USA.
- Caizhi Huang
- Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Research Triangle Park, Durham, NC, 27709, USA
- Yuanyuan Li
- Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Research Triangle Park, Durham, NC, 27709, USA
- David M Umbach
- Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Research Triangle Park, Durham, NC, 27709, USA
- Leping Li
- Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Research Triangle Park, Durham, NC, 27709, USA.
25
Ma J, Tran G, Wan AMD, Young EWK, Kumacheva E, Iscove NN, Zandstra PW. Microdroplet-based one-step RT-PCR for ultrahigh throughput single-cell multiplex gene expression analysis and rare cell detection. Sci Rep 2021; 11:6777. [PMID: 33762663 PMCID: PMC7990930 DOI: 10.1038/s41598-021-86087-4] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 12/03/2020] [Accepted: 03/10/2021] [Indexed: 01/31/2023]
Abstract
Gene expression analysis of individual cells enables characterization of heterogeneous and rare cell populations, yet widespread implementation of existing single-cell gene analysis techniques has been hindered due to limitations in scale, ease, and cost. Here, we present a novel microdroplet-based, one-step reverse-transcriptase polymerase chain reaction (RT-PCR) platform and demonstrate the detection of three targets simultaneously in over 100,000 single cells in a single experiment with a rapid read-out. Our customized reagent cocktail incorporates the bacteriophage T7 gene 2.5 protein to overcome cell lysate-mediated inhibition and allows for one-step RT-PCR of single cells encapsulated in nanoliter droplets. Fluorescent signals indicative of gene expressions are analyzed using a probabilistic deconvolution method to account for ambient RNA and cell doublets and produce single-cell gene signature profiles, as well as predict cell frequencies within heterogeneous samples. We also developed a simulation model to guide experimental design and optimize the accuracy and precision of the assay. Using mixtures of in vitro transcripts and murine cell lines, we demonstrated the detection of single RNA molecules and rare cell populations at a frequency of 0.1%. This low cost, sensitive, and adaptable technique will provide an accessible platform for high throughput single-cell analysis and enable a wide range of research and clinical applications.
Affiliation(s)
- Jennifer Ma
- Institute of Biomedical Engineering, University of Toronto, Toronto, ON, M5S 3G9, Canada
- Gary Tran
- Department of Medical Biophysics, University of Toronto, Toronto, ON, M5G 1L7, Canada
- Alwin M D Wan
- Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, ON, M5S 3G8, Canada
- Edmond W K Young
- Institute of Biomedical Engineering, University of Toronto, Toronto, ON, M5S 3G9, Canada
- Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, ON, M5S 3G8, Canada
- Eugenia Kumacheva
- Institute of Biomedical Engineering, University of Toronto, Toronto, ON, M5S 3G9, Canada
- Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, ON, M5S 3G8, Canada
- Department of Chemistry, University of Toronto, Toronto, ON, M5S 3H6, Canada
- Norman N Iscove
- Department of Medical Biophysics, University of Toronto, Toronto, ON, M5G 1L7, Canada
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, M5G 1L7, Canada
- Peter W Zandstra
- School of Biomedical Engineering, University of British Columbia, 2222 Health Sciences Mall, Vancouver, BC, V6T 1Z3, Canada.
- Michael Smith Laboratories, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada.
26
Hunt GJ, Gagnon-Bartsch JA. The role of scale in the estimation of cell-type proportions. Ann Appl Stat 2021. [DOI: 10.1214/20-aoas1395] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/19/2022]
27
Mer AS, Heath EM, Madani Tonekaboni SA, Dogan-Artun N, Nair SK, Murison A, Garcia-Prat L, Shlush L, Hurren R, Voisin V, Bader GD, Nislow C, Rantalainen M, Lehmann S, Gower M, Guidos CJ, Lupien M, Dick JE, Minden MD, Schimmer AD, Haibe-Kains B. Biological and therapeutic implications of a unique subtype of NPM1 mutated AML. Nat Commun 2021; 12:1054. [PMID: 33594052 PMCID: PMC7886883 DOI: 10.1038/s41467-021-21233-0] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 07/13/2020] [Accepted: 01/15/2021] [Indexed: 01/29/2023]
Abstract
In acute myeloid leukemia (AML), molecular heterogeneity across patients constitutes a major challenge for prognosis and therapy. AML with NPM1 mutation is a distinct genetic entity in the revised World Health Organization classification. However, differing patterns of co-mutation and response to therapy within this group necessitate further stratification. Here we report two distinct subtypes within NPM1 mutated AML patients, which we label as primitive and committed based on the respective presence or absence of a stem cell signature. Using gene expression (RNA-seq), epigenomic (ATAC-seq) and immunophenotyping (CyTOF) analysis, we associate each subtype with specific molecular characteristics, disease differentiation state and patient survival. Using ex vivo drug sensitivity profiling, we show a differential drug response of the subtypes to specific kinase inhibitors, irrespective of the FLT3-ITD status. Differential drug responses of the primitive and committed subtype are validated in an independent AML cohort. Our results highlight heterogeneity among NPM1 mutated AML patient samples based on stemness and suggest that the addition of kinase inhibitors to the treatment of cases with the primitive signature, lacking FLT3-ITD, could have therapeutic benefit.
Affiliation(s)
- Arvind Singh Mer
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
- Karolinska Institute, Stockholm, Sweden
- Emily M Heath
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- Seyed Ali Madani Tonekaboni
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
- Nergiz Dogan-Artun
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- Alex Murison
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- Laura Garcia-Prat
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- Liran Shlush
- Department of Immunology, Weizmann Institute of Science, Rehovot, Israel
- Rose Hurren
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- Gary D Bader
- The Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Corey Nislow
- Faculty of Pharmaceutical Sciences, The University of British Columbia, Vancouver, Canada
- Mark Gower
- The Hospital for Sick Children, Toronto, ON, Canada
- Mathieu Lupien
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
- Ontario Institute for Cancer Research, Toronto, ON, Canada
- John E Dick
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Mark D Minden
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- Aaron D Schimmer
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada.
- Benjamin Haibe-Kains
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada.
- Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada.
- Department of Computer Science, University of Toronto, Toronto, ON, Canada.
- Ontario Institute for Cancer Research, Toronto, ON, Canada.
- Vector Institute, Toronto, ON, Canada.
28
Data-driven detection of subtype-specific differentially expressed genes. Sci Rep 2021; 11:332. [PMID: 33432005 PMCID: PMC7801594 DOI: 10.1038/s41598-020-79704-1] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 03/17/2020] [Accepted: 12/11/2020] [Indexed: 11/08/2022]
Abstract
Among multiple subtypes of tissue or cell, subtype-specific differentially expressed genes (SDEGs) are defined as genes that are most upregulated in only one subtype and not in any other. Detecting SDEGs plays a critical role in the molecular characterization and deconvolution of multicellular complex tissues. Classic differential analysis assumes a null hypothesis whose test statistic is not subtype-specific and can therefore produce a high false positive rate and/or low detection power. Here we first introduce a One-Versus-Everyone Fold Change (OVE-FC) test for detecting SDEGs. We then propose a scaled test statistic (OVE-sFC) for assessing the statistical significance of SDEGs that applies a mixture null distribution model and a tailored permutation test. The OVE-FC/sFC test was validated for both type I error rate and detection power using extensive simulation data sets generated from real gene expression profiles of purified subtype samples. The OVE-FC/sFC test was then applied to two benchmark gene expression data sets of purified subtype samples and detected many known or previously unknown SDEGs. Subsequent supervised deconvolution results on synthesized bulk expression data, obtained using the SDEGs detected from the independent purified expression data by the OVE-FC/sFC test, showed superior deconvolution accuracy compared with popular peer methods.
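The SDEG definition above can be made concrete with a simplified fold-change calculation. The sketch below is a hedged reading of a one-versus-everyone fold change, assuming it compares a gene's highest subtype mean against its highest mean among all other subtypes; the function name is hypothetical, and the scaled statistic (OVE-sFC), mixture null and permutation test from the abstract are deliberately omitted.

```python
from typing import Sequence, Tuple


def one_vs_everyone_fold_change(subtype_means: Sequence[float]) -> Tuple[int, float]:
    """Return (index of top subtype, its mean divided by the largest other mean).

    Hypothetical simplification: a ratio above 1 means the gene is upregulated
    in that one subtype relative to every other subtype, which is the SDEG idea.
    No significance testing is performed.
    """
    if len(subtype_means) < 2:
        raise ValueError("need at least two subtypes")
    if any(m < 0 for m in subtype_means):
        raise ValueError("expression means must be nonnegative")
    # max() returns the first index on ties, so tied subtypes give a ratio of 1.0.
    top = max(range(len(subtype_means)), key=lambda i: subtype_means[i])
    runner_up = max(m for i, m in enumerate(subtype_means) if i != top)
    if runner_up == 0:
        # Expressed only in the top subtype (inf), or not expressed at all (nan).
        return top, float("inf") if subtype_means[top] > 0 else float("nan")
    return top, subtype_means[top] / runner_up


# Hypothetical gene with mean expression in three subtypes.
print(one_vs_everyone_fold_change([2.0, 8.0, 4.0]))  # (1, 2.0)
```

Comparing against the runner-up rather than the average of the other subtypes is what makes the statistic subtype-specific: a gene high in two subtypes gets a ratio near 1 and is not called an SDEG.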
29
Lee D, Park Y, Kim S. Towards multi-omics characterization of tumor heterogeneity: a comprehensive review of statistical and machine learning approaches. Brief Bioinform 2020; 22:5896573. [PMID: 34020548 DOI: 10.1093/bib/bbaa188] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 06/02/2020] [Revised: 06/29/2020] [Accepted: 07/21/2020] [Indexed: 12/19/2022]
Abstract
The multi-omics molecular characterization of cancer opened a new horizon for our understanding of cancer biology and therapeutic strategies. However, a tumor biopsy comprises diverse cell types, including not only cancerous cells but also tumor microenvironmental cells and adjacent normal cells. This heterogeneity is a major confounding factor that hampers robust and reproducible bioinformatic analysis for biomarker identification using multi-omics profiles. Moreover, the heterogeneity itself has been recognized over the years for its significant prognostic value in some cancer types, thus offering another promising avenue for therapeutic intervention. A number of computational approaches to unravel such heterogeneity from high-throughput molecular profiles of a tumor sample have been proposed, but most of them rely on data from an individual omics layer. Since the heterogeneity of cells is widely distributed across multi-omics layers, methods based on an individual layer can only partially characterize the heterogeneous admixture of cells. To help facilitate further development of methodologies that synchronously account for several multi-omics profiles, we present a comprehensive review of diverse approaches to characterize tumor heterogeneity based on three different omics layers: genome, epigenome and transcriptome. This review can be useful for the analysis of multi-omics profiles produced by many large-scale consortia. Contact: sunkim.bioinfo@snu.ac.kr.
Affiliation(s)
- Dohoon Lee
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
- Youngjune Park
- Department of Computer Science and Engineering, Institute of Engineering Research, Seoul National University, Seoul 08826, Korea
- Sun Kim
- Bioinformatics Institute, Seoul National University, Seoul 08826, Korea
30
Du R, Carey V, Weiss ST. deconvSeq: deconvolution of cell mixture distribution in sequencing data. Bioinformatics 2020; 35:5095-5102. [PMID: 31147676 DOI: 10.1093/bioinformatics/btz444] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 08/11/2018] [Revised: 05/17/2019] [Accepted: 05/27/2019] [Indexed: 11/14/2022]
Abstract
MOTIVATION Although single-cell sequencing is becoming more widely available, many tissue samples such as intracranial aneurysms are both fibrous and minute, and therefore not easily dissociated into single cells. Accounting for cell type heterogeneity in such tissues thus requires a computational method. We present a computational deconvolution method, deconvSeq, for sequencing data (RNA and bisulfite) obtained from bulk tissue. This method can also be applied to single-cell RNA sequencing data. RESULTS deconvSeq uses a generalized linear model to model the effects of tissue type on feature quantification, specific to the data structure of the sequencing type used. Estimated model coefficients can then be used to predict the cell type mixture within a tissue. Predicted cell type mixtures were validated against actual cell counts in whole blood samples. Using this method, we obtained a mean correlation of 0.998 (95% CI 0.995-0.999) from the RNA sequencing data of 35 whole blood samples and 0.95 (95% CI 0.91-0.98) from the reduced representation bisulfite sequencing data from 35 whole blood samples. Using symmetric balances to obtain the correlation between compositional parts, we found that the lowest correlation occurred for monocytes for both RNA and bisulfite sequencing. Comparison with other deconvolution methods such as deconRNAseq, CIBERSORT, MuSiC and EpiDISH showed that deconvSeq achieves good prediction, as measured by mean correlation, with far fewer genes or CpG sites in the signature set. AVAILABILITY AND IMPLEMENTATION Software implementing deconvSeq is available at https://github.com/rosedu1/deconvSeq. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Rose Du
- Department of Neurosurgery, Boston, MA, USA; Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Vince Carey
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Scott T Weiss
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
31
Bortolomeazzi M, Keddar MR, Ciccarelli FD, Benedetti L. Identification of non-cancer cells from cancer transcriptomic data. BIOCHIMICA ET BIOPHYSICA ACTA. GENE REGULATORY MECHANISMS 2020; 1863:194445. [PMID: 31654804 PMCID: PMC7346884 DOI: 10.1016/j.bbagrm.2019.194445] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 06/21/2019] [Revised: 09/20/2019] [Accepted: 10/07/2019] [Indexed: 02/07/2023]
Abstract
Interactions between cancer cells and non-cancer cells composing the tumour microenvironment play a primary role in determining cancer progression and shaping the response to therapy. The qualitative and quantitative characterisation of the different cell populations in the tumour microenvironment is therefore crucial to understand its role in cancer. In recent years, many experimental and computational approaches have been developed to identify the cell populations composing heterogeneous tissue samples, such as cancer. In this review, we describe the state-of-the-art approaches for the quantification of non-cancer cells from bulk and single-cell cancer transcriptomic data, with a focus on immune cells. We illustrate the main features of these approaches and highlight their applications for the analysis of the tumour microenvironment in solid cancers. We also discuss techniques that are complementary and alternative to RNA sequencing, particularly focusing on approaches that can provide spatial information on the distribution of the cells within the tumour in addition to their qualitative and quantitative measurements. This article is part of a Special Issue entitled: Transcriptional Profiles and Regulatory Gene Networks edited by Dr. Federico Manuel Giorgi and Dr. Shaun Mahony.
Affiliation(s)
- Michele Bortolomeazzi
- Cancer Systems Biology Laboratory, The Francis Crick Institute, London NW1 1AT, UK; School of Cancer and Pharmaceutical Sciences, King's College London, London SE1 1UL, UK
- Mohamed Reda Keddar
- Cancer Systems Biology Laboratory, The Francis Crick Institute, London NW1 1AT, UK; School of Cancer and Pharmaceutical Sciences, King's College London, London SE1 1UL, UK
- Francesca D Ciccarelli
- Cancer Systems Biology Laboratory, The Francis Crick Institute, London NW1 1AT, UK; School of Cancer and Pharmaceutical Sciences, King's College London, London SE1 1UL, UK.
- Lorena Benedetti
- Cancer Systems Biology Laboratory, The Francis Crick Institute, London NW1 1AT, UK; School of Cancer and Pharmaceutical Sciences, King's College London, London SE1 1UL, UK.
32
Li H, Sharma A, Luo K, Qin ZS, Sun X, Liu H. DeconPeaker, a Deconvolution Model to Identify Cell Types Based on Chromatin Accessibility in ATAC-Seq Data of Mixture Samples. Front Genet 2020; 11:392. [PMID: 32547592 PMCID: PMC7269180 DOI: 10.3389/fgene.2020.00392] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 01/09/2020] [Accepted: 03/30/2020] [Indexed: 12/26/2022]
Abstract
While our understanding of cellular and molecular processes has grown exponentially, issues related to the cell microenvironment and cellular heterogeneity have sparked a new debate concerning the cell identity. Cell composition (chromatin and nuclear architecture) poses a strong risk for dynamic changes in the diseased condition. Since chromatin accessibility patterns play a major role in human diseases, it is therefore anticipated that a deconvolution tool based on open chromatin data will provide better performance in identifying cell composition. Herein, we have designed the deconvolution tool "DeconPeaker," which can precisely define the uniqueness among subpopulations of cells using open chromatin datasets. Using this tool, we simultaneously evaluated chromatin accessibility and gene expression datasets to estimate cell types and their respective proportions in a mixture of samples. In comparison to other known deconvolution methods, we observed the lowest average root-mean-square error (RMSE = 0.042) and the highest average correlation coefficient (r = 0.919) between the prediction and "true" proportion. As a proof-of-concept, we also tested chromatin accessibility data from acute myeloid leukemia (AML) and successfully obtained unique cell types associated with AML progression. Furthermore, we showed that chromatin accessibility represents more essential characteristics in the identification of cell types than gene expression. Taken together, DeconPeaker as a powerful tool has the potential to combine different datasets (primarily, chromatin accessibility and gene expression) and define different cell types in mixtures. The Python package of DeconPeaker is now available at https://github.com/lihuamei/DeconPeaker.
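DeconPeaker's accuracy is summarized with two standard metrics between predicted and "true" proportions: root-mean-square error and Pearson's correlation coefficient. A minimal sketch of both, assuming all samples and cell types are pooled into one flat vector; the abstract reports averages, and the exact aggregation used there is not stated, so treat the pooling as an assumption. The toy proportions are hypothetical.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], truth: Sequence[float]) -> float:
    """Root-mean-square error between predicted and true cell-type proportions."""
    if len(predicted) != len(truth) or len(truth) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predicted, truth)) / len(truth))


def pearson_r(predicted: Sequence[float], truth: Sequence[float]) -> float:
    """Pearson correlation coefficient between predicted and true proportions."""
    if len(predicted) != len(truth) or len(truth) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(truth)
    mean_p = sum(predicted) / n
    mean_t = sum(truth) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(predicted, truth))
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    sd_t = math.sqrt(sum((t - mean_t) ** 2 for t in truth))
    if sd_p == 0 or sd_t == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / (sd_p * sd_t)


# Hypothetical proportions of three cell types in one sample, flattened.
pred = [0.2, 0.3, 0.5]
truth = [0.1, 0.4, 0.5]
print(rmse(pred, truth), pearson_r(pred, truth))
```

The two metrics are complementary: RMSE penalizes absolute errors in proportion units, while Pearson's r only checks that predictions rise and fall with the truth, so a method can score a high r while being systematically biased.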
Affiliation(s)
- Huamei Li
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
- Amit Sharma
- Department of Ophthalmology, University Hospital Bonn, Bonn, Germany
- Kun Luo
- Department of Neurosurgery, Xinjiang Evidence-Based Medicine Research Institute, First Affiliated Hospital of Xinjiang Medical University, Ürümqi, China
- Zhaohui S. Qin
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA, United States
- Xiao Sun
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
- Hongde Liu
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
33
Dong L, Kollipara A, Darville T, Zou F, Zheng X. Semi-CAM: A semi-supervised deconvolution method for bulk transcriptomic data with partial marker gene information. Sci Rep 2020; 10:5434. [PMID: 32214192 PMCID: PMC7096458 DOI: 10.1038/s41598-020-62330-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 08/23/2019] [Accepted: 02/26/2020] [Indexed: 01/03/2023]
Abstract
Deconvolution of bulk transcriptomics data from mixed cell populations is vital to identify the cellular mechanisms of complex diseases. Existing deconvolution approaches can be divided into two major groups: supervised and unsupervised methods. Supervised deconvolution methods use cell type-specific prior information including cell proportions, reference cell type-specific gene signatures, or marker genes for each cell type, which may not be available in practice. Unsupervised methods, such as non-negative matrix factorization (NMF) and Convex Analysis of Mixtures (CAM), in contrast, completely disregard prior information and thus are not efficient for data with partial cell type-specific information. In this paper, we propose a semi-supervised deconvolution method, semi-CAM, that extends CAM by utilizing marker information from partial cell types. Analyses of simulated data and two benchmark datasets demonstrate that semi-CAM outperforms CAM by yielding more accurate cell proportion estimates when markers from partial/all cell types are available. In addition, when markers from all cell types are available, semi-CAM achieves better or similar accuracy compared to the supervised method using signature genes, CIBERSORT, and the marker-based supervised methods semi-NMF and DSA. Furthermore, analysis of human chlamydia-infection data with bulk expression profiles from six cell types and prior marker information of only three cell types suggests that semi-CAM achieves more accurate cell proportion estimates than CAM.
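For readers unfamiliar with the unsupervised baseline the abstract contrasts against, the sketch below shows plain NMF with the classic Lee-Seung multiplicative updates for the Frobenius loss, written with NumPy. It is not semi-CAM and uses no marker information; the function names, iteration count and the simulated mixture are hypothetical choices for illustration.

```python
import numpy as np


def nmf_multiplicative(X: np.ndarray, k: int, n_iter: int = 500, seed: int = 0,
                       eps: float = 1e-10):
    """Factor a nonnegative genes x samples matrix as X ~= W @ H.

    W (genes x k) plays the role of cell-type expression profiles and H
    (k x samples) of unnormalized abundances. No prior or marker information
    is used, which is the limitation semi-CAM is designed to address.
    """
    if np.any(X < 0):
        raise ValueError("X must be nonnegative")
    rng = np.random.default_rng(seed)
    n_genes, n_samples = X.shape
    W = rng.random((n_genes, k)) + eps
    H = rng.random((k, n_samples)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep W and H nonnegative and are designed
        # not to increase ||X - W H||_F (eps only guards against division by zero).
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def proportions(H: np.ndarray) -> np.ndarray:
    """Column-normalize H so each sample's estimated abundances sum to 1."""
    return H / H.sum(axis=0, keepdims=True)


# Hypothetical mixture: 2 cell types, 30 genes, 8 bulk samples.
rng = np.random.default_rng(1)
true_W = rng.random((30, 2)) * 10
true_P = rng.dirichlet([1.0, 1.0], size=8).T
X = true_W @ true_P
W, H = nmf_multiplicative(X, k=2)
print(np.linalg.norm(X - W @ H) / np.linalg.norm(X))
print(proportions(H).round(2))
```

Even when the fit is good, the recovered factors are identifiable only up to permutation and rescaling of the k components, so estimated proportions must be matched to known cell types before comparing them with the truth; prior marker information is one way to resolve this.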
Affiliation(s)
- Li Dong
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Avinash Kollipara
- Department of Pediatrics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Toni Darville
- Department of Pediatrics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Fei Zou
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Xiaojing Zheng
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Department of Pediatrics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA

34
Görtler F, Schön M, Simeth J, Solbrig S, Wettig T, Oefner PJ, Spang R, Altenbuchinger M. Loss-Function Learning for Digital Tissue Deconvolution. J Comput Biol 2020; 27:342-355. [DOI: 10.1089/cmb.2019.0462] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 12/19/2022]
Affiliation(s)
- Franziska Görtler
- Department of Statistical Bioinformatics, Institute of Functional Genomics, University of Regensburg, Regensburg, Germany
- Marian Schön
- Department of Statistical Bioinformatics, Institute of Functional Genomics, University of Regensburg, Regensburg, Germany
- Jakob Simeth
- Department of Statistical Bioinformatics, Institute of Functional Genomics, University of Regensburg, Regensburg, Germany
- Stefan Solbrig
- Department of Physics, University of Regensburg, Regensburg, Germany
- Tilo Wettig
- Department of Physics, University of Regensburg, Regensburg, Germany
- Peter J. Oefner
- Institute of Functional Genomics, University of Regensburg, Regensburg, Germany
- Rainer Spang
- Department of Statistical Bioinformatics, Institute of Functional Genomics, University of Regensburg, Regensburg, Germany
- Michael Altenbuchinger
- Department of Statistical Bioinformatics, Institute of Functional Genomics, University of Regensburg, Regensburg, Germany

35
Yoosuf N, Navarro JF, Salmén F, Ståhl PL, Daub CO. Identification and transfer of spatial transcriptomics signatures for cancer diagnosis. Breast Cancer Res 2020; 22:6. [PMID: 31931856 PMCID: PMC6958738 DOI: 10.1186/s13058-019-1242-9] [Citation(s) in RCA: 59] [Impact Index Per Article: 11.8] [Received: 08/24/2019] [Accepted: 12/27/2019] [Indexed: 12/21/2022]
Abstract
BACKGROUND Distinguishing ductal carcinoma in situ (DCIS) from invasive ductal carcinoma (IDC) regions in clinical biopsies constitutes a diagnostic challenge. Spatial transcriptomics (ST) is an in situ capturing method, which allows quantification and visualization of transcriptomes in individual tissue sections. In the past, studies have shown that breast cancer samples can be used to study their transcriptomes with spatial resolution in individual tissue sections. Previously, supervised machine learning methods were used in clinical studies to predict the clinical outcomes for cancer types. METHODS We used four publicly available ST breast cancer datasets from breast tissue sections annotated by pathologists as non-malignant, DCIS, or IDC. We trained and tested a machine learning method (support vector machine) based on the expert annotation as well as based on automatic selection of cell types by their transcriptome profiles. RESULTS We identified expression signatures for expert annotated regions (non-malignant, DCIS, and IDC) and built machine learning models. Classification results for 798 expression signature transcripts showed high coincidence with the expert pathologist annotation for DCIS (100%) and IDC (96%). Extending our analysis to include all 25,179 expressed transcripts resulted in an accuracy of 99% for DCIS and 98% for IDC. Further, classification based on an automatically identified expression signature covering all ST spots of tissue sections resulted in prediction accuracy of 95% for DCIS and 91% for IDC. CONCLUSIONS This concept study suggests that the ST signatures learned from expert selected breast cancer tissue sections can be used to identify breast cancer regions in whole tissue sections including regions not trained on. Furthermore, the identified expression signatures can classify cancer regions in tissue sections not used for training with high accuracy.
Expert-generated and even automatically generated cancer signatures from ST data might be able to classify breast cancer regions and provide clinical decision support for pathologists in the future.
Affiliation(s)
- Niyaz Yoosuf
- Department of Biosciences and Nutrition, Karolinska Institutet, 141 83, Huddinge, Sweden
- Science for Life Laboratory, Department of Gene Technology, KTH Royal Institute of Technology, Stockholm, Sweden
- José Fernández Navarro
- Science for Life Laboratory, Department of Gene Technology, KTH Royal Institute of Technology, Stockholm, Sweden
- Fredrik Salmén
- Science for Life Laboratory, Department of Gene Technology, KTH Royal Institute of Technology, Stockholm, Sweden
- Hubrecht Institute-KNAW (Royal Netherlands Academy of Arts and Sciences) and University Medical Center Utrecht, Cancer Genomics Netherlands, Utrecht, the Netherlands
- Patrik L Ståhl
- Science for Life Laboratory, Department of Gene Technology, KTH Royal Institute of Technology, Stockholm, Sweden
- Carsten O Daub
- Department of Biosciences and Nutrition, Karolinska Institutet, 141 83, Huddinge, Sweden

36
Steen CB, Liu CL, Alizadeh AA, Newman AM. Profiling Cell Type Abundance and Expression in Bulk Tissues with CIBERSORTx. Methods Mol Biol 2020; 2117:135-157. [PMID: 31960376 DOI: 10.1007/978-1-0716-0301-7_7] [Citation(s) in RCA: 298] [Impact Index Per Article: 59.6] [Indexed: 12/31/2022]
Abstract
CIBERSORTx is a suite of machine learning tools for the assessment of cellular abundance and cell type-specific gene expression patterns from bulk tissue transcriptome profiles. With this framework, single-cell or bulk-sorted RNA sequencing data can be used to learn molecular signatures of distinct cell types from a small collection of biospecimens. These signatures can then be repeatedly applied to characterize cellular heterogeneity from bulk tissue transcriptomes without physical cell isolation. In this chapter, we provide a detailed primer on CIBERSORTx and demonstrate its capabilities for high-throughput profiling of cell types and cellular states in normal and neoplastic tissues.
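CIBERSORTx builds on support vector regression. As a simpler stand-in, the core idea of signature-based deconvolution can be sketched as a nonnegative least-squares (NNLS) fit of one bulk profile `b` against a signature matrix `S` (genes × cell types), with the coefficients rescaled to relative fractions. The solver here is a plain projected-gradient NNLS written only for illustration. It is not the CIBERSORTx algorithm, and the function names and synthetic data are hypothetical.

```python
import numpy as np


def nnls_projected_gradient(S, b, n_iter=10000, tol=1e-12):
    """Solve min_f 0.5 * ||S @ f - b||^2 subject to f >= 0 (hypothetical helper)."""
    StS = S.T @ S
    Stb = S.T @ b
    step = 1.0 / np.linalg.norm(StS, 2)  # 1 / Lipschitz constant of the gradient
    f = np.zeros(S.shape[1])
    for _ in range(n_iter):
        grad = StS @ f - Stb
        # Gradient step, then projection onto the nonnegative orthant.
        f_next = np.maximum(f - step * grad, 0.0)
        if np.linalg.norm(f_next - f) < tol:
            return f_next
        f = f_next
    return f


def estimate_fractions(S, b):
    """Relative cell-type fractions: NNLS coefficients rescaled to sum to 1."""
    f = nnls_projected_gradient(S, b)
    total = f.sum()
    return f / total if total > 0 else f


rng = np.random.default_rng(0)
S = rng.random((100, 4))                      # hypothetical signature matrix
f_true = np.array([0.5, 0.3, 0.15, 0.05])
b = S @ f_true                                # noise-free bulk profile
f_hat = estimate_fractions(S, b)
```

Projected gradient was chosen to keep the example dependency-free apart from NumPy. With real, noisy bulk profiles the fit is only approximate, and robustness to noise and to genes missing from the signature is what more elaborate tools address.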
Affiliation(s)
- Chloé B Steen
- Division of Oncology, Department of Medicine, Stanford Cancer Institute, Stanford University, Stanford, CA, USA
- Chih Long Liu
- Division of Oncology, Department of Medicine, Stanford Cancer Institute, Stanford University, Stanford, CA, USA
- Institute for Stem Cell Biology and Regenerative Medicine, Stanford University, Stanford, CA, USA
- Ash A Alizadeh
- Division of Oncology, Department of Medicine, Stanford Cancer Institute, Stanford University, Stanford, CA, USA
- Institute for Stem Cell Biology and Regenerative Medicine, Stanford University, Stanford, CA, USA
- Center for Cancer Systems Biology, Stanford University, Stanford, CA, USA
- Division of Hematology, Department of Medicine, Stanford Cancer Institute, Stanford University, Stanford, CA, USA
- Aaron M Newman
- Institute for Stem Cell Biology and Regenerative Medicine, Stanford University, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA

37
Chiu YJ, Hsieh YH, Huang YH. Improved cell composition deconvolution method of bulk gene expression profiles to quantify subsets of immune cells. BMC Med Genomics 2019; 12:169. [PMID: 31856824 PMCID: PMC6923925 DOI: 10.1186/s12920-019-0613-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 10/26/2019] [Accepted: 10/31/2019] [Indexed: 01/07/2023]
Abstract
Background To facilitate the investigation of the pathogenic roles played by various immune cells in complex tissues such as tumors, a few computational methods for deconvoluting bulk gene expression profiles to predict cell composition have been created. However, available methods were usually developed along with a set of reference gene expression profiles consisting of imbalanced replicates across different cell types. Therefore, the objective of this study was to create a new deconvolution method equipped with a new set of reference gene expression profiles that incorporate more microarray replicates of the immune cells that have been frequently implicated in the poor prognosis of cancers, such as T helper cells, regulatory T cells and macrophage M1/M2 cells. Methods Our deconvolution method was developed by choosing ε-support vector regression (ε-SVR) as the core algorithm, with a loss function subject to an L1-norm penalty. To construct the reference gene expression signature matrix for regression, a subset of differentially expressed genes was chosen from 148 microarray-based gene expression profiles for 9 types of immune cells by using ANOVA and minimizing the condition number. Agreement analyses including mean absolute percentage errors and Bland-Altman plots were carried out to compare the performances of our method and CIBERSORT. Results In silico cell mixtures, simulated bulk tissues, and real human samples with known immune-cell fractions were used as the test datasets for benchmarking. Our method outperformed CIBERSORT in the benchmarks using in silico breast tissue-immune cell mixtures in the proportions of 30:70 and 50:50, and in the benchmark using 164 human PBMC samples. Our results suggest that the performance of our method was at least comparable to that of a state-of-the-art tool, CIBERSORT.
Conclusions We developed a new cell composition deconvolution method and the implementation was entirely based on the publicly available R and Python packages. In addition, we compiled a new set of reference gene expression profiles, which might allow for a more robust prediction of the immune cell fractions from the expression profiles of cell mixtures. The source code of our method could be downloaded from https://github.com/holiday01/deconvolution-to-estimate-immune-cell-subsets.
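The abstract describes choosing signature genes by ANOVA and then minimizing the condition number of the signature matrix. This sketch covers only the condition-number step, under stated assumptions. Genes are ranked per cell type by a simple log-ratio specificity score, which stands in for the ANOVA ranking. The number of genes per cell type is then taken from a hypothetical candidate grid. The function name, score, grid, and synthetic reference are illustrative and are not taken from the authors' code.

```python
import numpy as np


def select_signature_genes(ref, candidate_sizes=range(5, 51, 5)):
    """Pick marker genes so the signature matrix is well conditioned (hypothetical helper).

    ref: genes x cell types matrix of mean reference expression (nonnegative).
    Returns (condition number, genes per cell type, selected gene indices).
    """
    _, n_types = ref.shape
    log_ref = np.log2(ref + 1.0)
    rankings = []
    for t in range(n_types):
        # Specificity: log-expression in type t minus the mean over the other types.
        # A stand-in for the ANOVA-based ranking, not the authors' statistic.
        others = np.delete(log_ref, t, axis=1).mean(axis=1)
        rankings.append(np.argsort(-(log_ref[:, t] - others)))
    best = None
    for g in candidate_sizes:
        # Union of the top-g genes of every cell type (sorted, duplicates removed).
        genes = np.unique(np.concatenate([r[:g] for r in rankings]))
        kappa = np.linalg.cond(ref[genes, :])  # largest / smallest singular value
        if best is None or kappa < best[0]:
            best = (kappa, g, genes)
    return best


rng = np.random.default_rng(0)
ref = rng.gamma(2.0, 1.0, size=(300, 3))   # hypothetical reference: 300 genes x 3 cell types
for t in range(3):
    ref[t * 20:(t + 1) * 20, t] *= 20.0     # plant 20 strong markers per cell type
kappa, n_per_type, genes = select_signature_genes(ref)
```

A small condition number bounds how much relative perturbations of a bulk profile can be amplified in the estimated fractions, which is why it serves as a selection criterion.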
Affiliation(s)
- Yen-Jung Chiu
- Institute of Biomedical Informatics, National Yang-Ming University, No.155, Sec. 2, Li-Nong St., Beitou Dist, Taipei, 11221, Taiwan
- Yi-Hsuan Hsieh
- Institute of Biomedical Informatics, National Yang-Ming University, No.155, Sec. 2, Li-Nong St., Beitou Dist, Taipei, 11221, Taiwan
- Yen-Hua Huang
- Institute of Biomedical Informatics, National Yang-Ming University, No.155, Sec. 2, Li-Nong St., Beitou Dist, Taipei, 11221, Taiwan
- Centre for Systems and Synthetic Biology, National Yang-Ming University, Taipei, 11221, Taiwan

38
Kang K, Meng Q, Shats I, Umbach DM, Li M, Li Y, Li X, Li L. CDSeq: A novel complete deconvolution method for dissecting heterogeneous samples using gene expression data. PLoS Comput Biol 2019; 15:e1007510. [PMID: 31790389 PMCID: PMC6907860 DOI: 10.1371/journal.pcbi.1007510] [Citation(s) in RCA: 38] [Impact Index Per Article: 6.3] [Received: 07/03/2019] [Revised: 12/12/2019] [Accepted: 10/25/2019] [Indexed: 11/18/2022]
Abstract
Quantifying cell-type proportions and their corresponding gene expression profiles in tissue samples would enhance understanding of the contributions of individual cell types to the physiological states of the tissue. Current approaches that address tissue heterogeneity have drawbacks. Experimental techniques, such as fluorescence-activated cell sorting and single cell RNA sequencing, are expensive. Computational approaches that use expression data from heterogeneous samples are promising, but most of the current methods estimate either cell-type proportions or cell-type-specific expression profiles by requiring the other as input. Although such partial deconvolution methods have been successfully applied to tumor samples, the additional input required may be unavailable. We introduce a novel complete deconvolution method, CDSeq, that uses only RNA-Seq data from bulk tissue samples to simultaneously estimate both cell-type proportions and cell-type-specific expression profiles. Using several synthetic and real experimental datasets with known cell-type composition and cell-type-specific expression profiles, we compared CDSeq's complete deconvolution performance with seven other established deconvolution methods. Complete deconvolution using CDSeq represents a substantial technical advance over partial deconvolution approaches and will be useful for studying cell mixtures in tissue samples. CDSeq is available at GitHub repository (MATLAB and Octave code): https://github.com/kkang7/CDSeq.

Author summary: Understanding the cellular composition of bulk tissues is critical to investigate the underlying mechanisms of many biological processes. Single cell sequencing is a promising technique; however, it is expensive and the analysis of single cell data is non-trivial. Therefore, tissue samples are still routinely processed in bulk. To estimate cell-type composition using bulk gene expression data, computational deconvolution methods are needed.
Many deconvolution methods have been proposed; however, they often estimate only cell type proportions using a reference cell type gene expression profile, which in many cases may not be available. We present a novel complete deconvolution method that uses only bulk gene expression data to simultaneously estimate cell-type-specific gene expression profiles and sample-specific cell-type proportions. We showed that, using multiple RNA-Seq and microarray datasets where the cell-type composition was previously known, our method could accurately determine the cell-type composition. By providing a method that requires a single input to determine both cell-type proportion and cell-type-specific expression profiles, we expect that our method will be beneficial to biologists and facilitate the research and identification of mechanisms underlying many biological processes.
Affiliation(s)
- Kai Kang
- Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, North Carolina, United States of America
- Qian Meng
- Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, North Carolina, United States of America
- Igor Shats
- Signal Transduction Laboratory, National Institute of Environmental Health Sciences, Durham, North Carolina, United States of America
- David M. Umbach
- Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, North Carolina, United States of America
- Melissa Li
- Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, North Carolina, United States of America
- Yuanyuan Li
- Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, North Carolina, United States of America
- Xiaoling Li
- Signal Transduction Laboratory, National Institute of Environmental Health Sciences, Durham, North Carolina, United States of America
- Leping Li
- Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, North Carolina, United States of America

39
Avila Cobos F, Vandesompele J, Mestdagh P, De Preter K. Computational deconvolution of transcriptomics data from mixed cell populations. Bioinformatics 2018; 34:1969-1979. [PMID: 29351586 DOI: 10.1093/bioinformatics/bty019] [Citation(s) in RCA: 145] [Impact Index Per Article: 24.2] [Received: 08/18/2017] [Accepted: 01/10/2018] [Indexed: 12/22/2022]
Abstract
Summary Gene expression analyses of bulk tissues often ignore cell type composition as an important confounding factor, resulting in a loss of signal from lowly abundant cell types. In this review, we highlight the importance and value of computational deconvolution methods to infer the abundance of different cell types and/or cell type-specific expression profiles in heterogeneous samples without performing physical cell sorting. We also explain the various deconvolution scenarios, the mathematical approaches used to solve them and the effect of data processing and different confounding factors on the accuracy of the deconvolution results. Contact katleen.depreter@ugent.be. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Francisco Avila Cobos
- Center for Medical Genetics Ghent (CMGG), Ghent University, 9000 Ghent, Belgium
- Cancer Research Institute Ghent (CRIG), 9000 Ghent, Belgium
- Bioinformatics Institute Ghent from Nucleotides to Networks (BIG N2N), 9000 Ghent, Belgium
- Jo Vandesompele
- Center for Medical Genetics Ghent (CMGG), Ghent University, 9000 Ghent, Belgium
- Cancer Research Institute Ghent (CRIG), 9000 Ghent, Belgium
- Bioinformatics Institute Ghent from Nucleotides to Networks (BIG N2N), 9000 Ghent, Belgium
- Pieter Mestdagh
- Center for Medical Genetics Ghent (CMGG), Ghent University, 9000 Ghent, Belgium
- Cancer Research Institute Ghent (CRIG), 9000 Ghent, Belgium
- Bioinformatics Institute Ghent from Nucleotides to Networks (BIG N2N), 9000 Ghent, Belgium
- Katleen De Preter
- Center for Medical Genetics Ghent (CMGG), Ghent University, 9000 Ghent, Belgium
- Cancer Research Institute Ghent (CRIG), 9000 Ghent, Belgium
- Bioinformatics Institute Ghent from Nucleotides to Networks (BIG N2N), 9000 Ghent, Belgium

40
Domanskyi S, Szedlak A, Hawkins NT, Wang J, Paternostro G, Piermarocchi C. Polled Digital Cell Sorter (p-DCS): Automatic identification of hematological cell types from single cell RNA-sequencing clusters. BMC Bioinformatics 2019; 20:369. [PMID: 31262249 PMCID: PMC6604348 DOI: 10.1186/s12859-019-2951-x] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 01/14/2019] [Accepted: 06/13/2019] [Indexed: 12/13/2022]
Abstract
BACKGROUND Single cell RNA sequencing (scRNA-seq) brings unprecedented opportunities for mapping the heterogeneity of complex cellular environments such as bone marrow, and provides insight into many cellular processes. Single cell RNA-seq has a far larger fraction of missing data reported as zeros (dropouts) than traditional bulk RNA-seq, and unsupervised clustering combined with Principal Component Analysis (PCA) can be used to overcome this limitation. After clustering, however, one has to interpret the average expression of markers on each cluster to identify the corresponding cell types, and this is normally done by hand by an expert curator. RESULTS We present a computational tool for processing single cell RNA-seq data that uses a voting algorithm to automatically identify cells based on approval votes received by known molecular markers. Using a stochastic procedure that accounts for imbalances in the number of known molecular signatures for different cell types, the method computes the statistical significance of the final approval score and automatically assigns a cell type to clusters without an expert curator. We demonstrate the utility of the tool in the analysis of eight samples of bone marrow from the Human Cell Atlas. The tool provides a systematic identification of cell types in bone marrow based on a list of markers of immune cell types, and incorporates a suite of visualization tools that can be overlaid on a t-SNE representation. The software is freely available as a Python package at https://github.com/sdomanskyi/DigitalCellSorter . CONCLUSIONS This methodology assures that extensive marker to cell type matching information is taken into account in a systematic way when assigning cell clusters to cell types. Moreover, the method allows for a high throughput processing of multiple scRNA-seq datasets, since it does not involve an expert curator, and it can be applied recursively to obtain cell sub-types. 
The software is designed to allow the user to substitute the marker to cell type matching information and apply the methodology to different cellular environments.
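The approval-voting idea can be illustrated with a simplified scheme. This sketch rests on several assumptions and is not the p-DCS scoring:

- Each marker that is "on" in a cluster casts one vote for every cell type it is annotated to.
- Votes are divided by each type's marker count.
- A permutation null reshuffles which markers are on while keeping their number fixed. This gives an empirical p-value per type and is one way to account for unequal marker-list sizes.

The function, variable names, and toy marker table are hypothetical.

```python
import numpy as np


def vote_cell_type(expressed, marker_matrix, n_perm=1000, seed=0):
    """Simplified approval vote for one cluster (hypothetical helper).

    expressed: boolean vector (n_markers,), True if the marker is 'on' in the cluster.
    marker_matrix: 0/1 array (n_markers, n_types), 1 if the marker is annotated to that type.
    Returns (index of winning type, normalized scores, permutation p-values).
    """
    rng = np.random.default_rng(seed)
    M = np.asarray(marker_matrix, dtype=float)
    e = np.asarray(expressed, dtype=float)
    per_type = np.maximum(M.sum(axis=0), 1.0)  # marker count per type (avoids divide by 0)
    # Each expressed marker approves every type it marks; normalize by list length.
    scores = (e @ M) / per_type
    # Null: shuffle which markers are 'on', keeping how many are on fixed.
    null = np.empty((n_perm, M.shape[1]))
    for i in range(n_perm):
        null[i] = (rng.permutation(e) @ M) / per_type
    # Empirical p-value with the usual +1 correction.
    p_values = (1 + (null >= scores).sum(axis=0)) / (n_perm + 1)
    return int(np.argmax(scores)), scores, p_values


n_markers, n_types = 30, 3
M = np.zeros((n_markers, n_types))
M[0:5, 0] = 1    # 5 markers for type 0
M[5:15, 1] = 1   # 10 markers for type 1
M[15:20, 2] = 1  # 5 markers for type 2; markers 20-29 are unannotated
on = np.zeros(n_markers, dtype=bool)
on[[0, 1, 2, 3, 4, 7, 25]] = True
best, scores, pvals = vote_cell_type(on, M)
```

In this toy cluster all five type-0 markers are on, so type 0 wins with a normalized score of 1.0. The permutation p-value indicates that so complete an approval is unlikely by chance.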
Affiliation(s)
- Sergii Domanskyi
- Department of Physics and Astronomy, Michigan State University, East Lansing, MI, 48824, USA
- Anthony Szedlak
- Department of Physics and Astronomy, Michigan State University, East Lansing, MI, 48824, USA
- Nathaniel T Hawkins
- Department of Physics and Astronomy, Michigan State University, East Lansing, MI, 48824, USA
- Carlo Piermarocchi
- Department of Physics and Astronomy, Michigan State University, East Lansing, MI, 48824, USA

41
Boufaied N, Takhar M, Nash C, Erho N, Bismar TA, Davicioni E, Thomson AA. Development of a predictive model for stromal content in prostate cancer samples to improve signature performance. J Pathol 2019; 249:411-424. [PMID: 31206668 PMCID: PMC6900085 DOI: 10.1002/path.5315] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 11/27/2018] [Revised: 05/27/2019] [Accepted: 06/13/2019] [Indexed: 01/23/2023]
Abstract
Prostate cancer is heterogeneous in both cellular composition and patient outcome, and development of biomarker signatures to distinguish indolent from aggressive tumours is a high priority. Stroma plays an important role during prostate cancer progression and undergoes histological and transcriptional changes associated with disease. However, identification and validation of stromal markers are limited by a lack of datasets with defined stromal/tumour ratio. We have developed a prostate‐selective signature to estimate the stromal content in cancer samples of mixed cellular composition. We identified stromal‐specific markers from transcriptomic datasets of developmental prostate mesenchyme and prostate cancer stroma. These were experimentally validated in cell lines, datasets of known stromal content, and by immunohistochemistry in tissue samples to verify stromal‐specific expression. Linear models based upon six transcripts were able to infer the stromal content and estimate stromal composition in mixed tissues. The best model had a coefficient of determination R² of 0.67. Application of our stromal content estimation model in various prostate cancer datasets led to improved performance of stromal predictive signatures for disease progression and metastasis. The stromal content of prostate tumours varies considerably; consequently, deconvolution of stromal proportion may yield better results than tumour cell deconvolution. We suggest that adjusting expression data for cell composition will improve stromal signature performance and lead to better prognosis and stratification of men with prostate cancer. © 2019 The Authors. The Journal of Pathology published by John Wiley & Sons Ltd on behalf of Pathological Society of Great Britain and Ireland.
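The stromal-content model is described as linear in six transcripts and evaluated by R². The following is a minimal sketch of that kind of model, assuming an ordinary least-squares fit with an intercept. The paper's exact model form, transcripts, and preprocessing are not given here, the data are synthetic, and the function names are hypothetical.

```python
import numpy as np


def fit_linear_model(expr, y):
    """OLS with intercept: y ~= b0 + expr @ b. expr is samples x transcripts."""
    design = np.column_stack([np.ones(expr.shape[0]), expr])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict(coef, expr):
    return coef[0] + expr @ coef[1:]


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


rng = np.random.default_rng(0)
expr = rng.normal(size=(80, 6))          # 80 synthetic samples x 6 transcripts
true_coef = np.array([0.4, 0.05, -0.02, 0.03, 0.0, 0.01, 0.04])
y_clean = true_coef[0] + expr @ true_coef[1:]
coef = fit_linear_model(expr, y_clean)
r2_clean = r_squared(y_clean, predict(coef, expr))
y_noisy = y_clean + rng.normal(scale=0.05, size=80)
r2_noisy = r_squared(y_noisy, predict(fit_linear_model(expr, y_noisy), expr))
```

On noise-free data the fit recovers the coefficients exactly and R² is 1. Added noise pulls R² below 1. An in-sample R² like this is optimistic, so evaluation on held-out samples is the safer basis for comparing models.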
Affiliation(s)
- Nadia Boufaied
- Division of Urology and Cancer Research Program, McGill University Health Centre Research Institute, Quebec, Canada
- Mandeep Takhar
- Research and Development, GenomeDx Biosciences, Vancouver, Canada
- Claire Nash
- Division of Urology and Cancer Research Program, McGill University Health Centre Research Institute, Quebec, Canada
- Nicholas Erho
- Research and Development, GenomeDx Biosciences, Vancouver, Canada
- Tarek A Bismar
- Department of Pathology and Laboratory Medicine, University of Calgary Cumming School of Medicine, Calgary, Canada
- Department of Oncology, Biochemistry and Molecular Biology, University of Calgary Cumming School of Medicine, Calgary, Canada
- Elai Davicioni
- Research and Development, GenomeDx Biosciences, Vancouver, Canada
- Axel A Thomson
- Division of Urology and Cancer Research Program, McGill University Health Centre Research Institute, Quebec, Canada

42
Complete deconvolution of cellular mixtures based on linearity of transcriptional signatures. Nat Commun 2019; 10:2209. [PMID: 31101809 PMCID: PMC6525259 DOI: 10.1038/s41467-019-09990-5] [Citation(s) in RCA: 65] [Impact Index Per Article: 10.8] [Received: 04/11/2018] [Accepted: 04/11/2019] [Indexed: 11/08/2022]
Abstract
Changes in bulk transcriptional profiles of heterogeneous samples often reflect changes in proportions of individual cell types. Several robust techniques have been developed to dissect the composition of such mixed samples given transcriptional signatures of the pure components or their proportions. These approaches are insufficient, however, in situations when no information about individual mixture components is available. This problem is known as the complete deconvolution problem, where the composition is revealed without any a priori knowledge about cell types and their proportions. Here, we identify a previously unrecognized property of tissue-specific genes - their mutual linearity - and use it to reveal the structure of the topological space of mixed transcriptional profiles and provide a noise-robust approach to the complete deconvolution problem. Furthermore, our analysis reveals systematic bias of all deconvolution techniques due to differences in cell size or RNA-content, and we demonstrate how to address this bias at the experimental design level.
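The linearity property can be checked numerically. Assume noise-free mixing `X = W @ H`. A gene expressed in only one cell type then has an expression vector across samples proportional to that type's row of `H`. Two genes specific to the same type are therefore exactly collinear (cosine similarity 1), while genes specific to different types are not. The sketch below demonstrates only this property on synthetic data. It is not the authors' complete-deconvolution procedure, and the helper name is hypothetical.

```python
import numpy as np


def pairwise_cosine(X):
    """Cosine similarity between the rows (genes) of a genes x samples matrix."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.maximum(norms, 1e-12)
    return Xn @ Xn.T


rng = np.random.default_rng(0)
H = rng.dirichlet(np.ones(3), size=20).T      # 3 cell types x 20 samples
W = np.array([
    [5.0, 0.0, 0.0],   # gene 0: specific to type 0
    [2.0, 0.0, 0.0],   # gene 1: specific to type 0
    [0.0, 3.0, 0.0],   # gene 2: specific to type 1
    [0.0, 7.0, 0.0],   # gene 3: specific to type 1
    [1.0, 1.0, 1.0],   # gene 4: equally expressed in all types
])
X = W @ H                                     # 5 genes x 20 samples
C = pairwise_cosine(X)
```

With real data, noise breaks exact collinearity, which is why a noise-robust approach matters in practice.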

43
Hao Y, Yan M, Heath BR, Lei YL, Xie Y. Fast and robust deconvolution of tumor infiltrating lymphocyte from expression profiles using least trimmed squares. PLoS Comput Biol 2019; 15:e1006976. [PMID: 31059559 PMCID: PMC6522071 DOI: 10.1371/journal.pcbi.1006976] [Citation(s) in RCA: 63] [Impact Index Per Article: 10.5] [Received: 07/10/2018] [Revised: 05/16/2019] [Accepted: 03/25/2019] [Indexed: 02/08/2023]
Abstract
Gene-expression deconvolution is used to quantify different types of cells in a mixed population. It provides a highly promising solution to rapidly characterize the tumor-infiltrating immune landscape and identify cold cancers. However, a major challenge is that gene-expression data are frequently contaminated by many outliers that decrease the estimation accuracy. Thus, it is imperative to develop a robust deconvolution method that automatically decontaminates data by reliably detecting and removing outliers. We developed a new machine learning tool, Fast And Robust DEconvolution of Expression Profiles (FARDEEP), to enumerate immune cell subsets from whole tumor tissue samples. To reduce noise in the tumor gene expression datasets, FARDEEP utilizes an adaptive least trimmed squares procedure to automatically detect and remove outliers before estimating the cell compositions. We show that FARDEEP is less susceptible to outliers and returns a better estimation of coefficients than the existing methods with both numerical simulations and real datasets. FARDEEP provides an estimate related to the absolute quantity of each immune cell subset in addition to relative percentages. Hence, FARDEEP represents a novel robust algorithm to complement the existing toolkit for the characterization of tissue-infiltrating immune cell landscape. The source code for FARDEEP is implemented in R and available for download at https://github.com/YuningHao/FARDEEP.git.

Author summary: Rapidly emerging evidence suggests that the tumor immune microenvironment not only predisposes cancer patients to diverse treatment outcomes but also represents a promising source of biomarkers for better patient stratification. Different from the immunohistochemistry-based scoring practice, which focuses on a few selected marker proteins, immune deconvolution pipelines offer a previously untapped way to comprehensively reveal the tumor-infiltrating immune landscape.
Recognizing the numerous strengths of existing immune deconvolution algorithms, here we show that data outliers, which are inevitable in whole tissue sequencing data sets, substantially skew estimation results. Moreover, an estimate related to the absolute amount of each immune subset offers valuable insight into the nature of the host response in addition to percentage information alone. Thus, we engineered a new immune deconvolution pipeline, coined as Fast and Robust Deconvolution of Expression Profiles (FARDEEP), to automatically detect and remove outliers prior to feeding data into the deconvolution algorithm and to provide estimates related to the absolute quantity of each immune subset. Utilizing both synthetic and real data sets, we found that FARDEEP returns superior coefficients and offers a robust tool to reveal the immune landscape of human cancers.
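Least trimmed squares (LTS), the idea FARDEEP builds on, fits coefficients to the h observations (here, genes) with the smallest squared residuals, so outlying genes cannot drag the fit. The sketch below is plain LTS with random starts and concentration steps. Unlike FARDEEP as described in the abstract, it does not choose the trimming level adaptively and does not enforce nonnegative coefficients. The names, defaults, and synthetic data are assumptions for illustration.

```python
import numpy as np


def lts_fit(X, y, h=None, n_starts=20, n_csteps=50, seed=0):
    """Plain least trimmed squares (hypothetical helper, not FARDEEP).

    Minimizes the sum of the h smallest squared residuals using random
    elemental starts followed by concentration steps.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    if h is None:
        h = (n + p + 1) // 2  # about half the data: the usual high-breakdown choice
    best_obj, best_coef = np.inf, None
    for _ in range(n_starts):
        subset = rng.choice(n, size=p, replace=False)  # random elemental start
        coef, *_ = np.linalg.lstsq(X[subset], y[subset], rcond=None)
        for _ in range(n_csteps):
            # Concentration step: keep the h best-fitting points and refit on them.
            resid2 = (y - X @ coef) ** 2
            new_subset = np.sort(np.argsort(resid2)[:h])
            if np.array_equal(new_subset, subset):
                break  # subset is stable, so this start has converged
            subset = new_subset
            coef, *_ = np.linalg.lstsq(X[subset], y[subset], rcond=None)
        obj = np.sort((y - X @ coef) ** 2)[:h].sum()
        if obj < best_obj:
            best_obj, best_coef = obj, coef
    return best_coef


rng = np.random.default_rng(1)
S = rng.random((200, 4))                 # hypothetical signature matrix, genes x cell types
f_true = np.array([0.4, 0.3, 0.2, 0.1])
y = S @ f_true + rng.normal(scale=0.01, size=200)
y[:40] += 5.0                            # 20% of genes contaminated by large outliers
f_lts = lts_fit(S, y)
f_ols, *_ = np.linalg.lstsq(S, y, rcond=None)
```

Ordinary least squares is pulled far from the true coefficients by the contaminated genes, while the trimmed fit ignores them. A nonnegativity constraint and an adaptive choice of h would be the next steps toward the behaviour the abstract describes.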
Affiliation(s)
- Yuning Hao
- Department of Statistics and Probability, Michigan State University, East Lansing, United States of America
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, United States of America
- Ming Yan
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, United States of America
- Department of Mathematics, Michigan State University, East Lansing, United States of America
- Blake R. Heath
- Department of Periodontics and Oral Medicine, University of Michigan School of Dentistry, Ann Arbor, United States of America
- Yu L. Lei
- Department of Periodontics and Oral Medicine, University of Michigan School of Dentistry, Ann Arbor, United States of America
- University of Michigan Rogel Cancer Center, Ann Arbor, United States of America
- Yuying Xie
- Department of Statistics and Probability, Michigan State University, East Lansing, United States of America
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, United States of America

44
Huang S, Sheng X, Susztak K. The kidney transcriptome, from single cells to whole organs and back. Curr Opin Nephrol Hypertens 2019; 28:219-226. [PMID: 30844884 PMCID: PMC6761926 DOI: 10.1097/mnh.0000000000000495] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 12/30/2022]
Abstract
PURPOSE OF REVIEW Transcriptome analysis of human kidney samples provides an integrated output of genetic, physiological, or environmental inputs. This review summarizes recent findings including gene expression and genetic variation integration, bulk and single cell gene expression analysis, and describes how such studies have improved our understanding of kidney disease development. RECENT FINDINGS Bulk or whole tissue analysis of patient kidney samples identified a large number of genes whose levels correlate with kidney function and/or structural damage. These genes were enriched for metabolic and immune functions. Using expression quantitative trait analysis, genetic variation-driven gene expression can be identified. Recent developments in single cell sequencing defined cell-type-specific gene expression changes and highlighted specific cell types for disease development. SUMMARY Recent advances in whole tissue transcriptomics, specifically those incorporating genotype information and single cell data, have proven powerful for identifying kidney disease-associated genes, pathways, and cell types.
Affiliation(s)
- Shizheng Huang
- Department of Medicine, Renal Electrolyte and Hypertension Division, University of Pennsylvania, Philadelphia, PA, USA
- Xin Sheng
- Department of Medicine, Renal Electrolyte and Hypertension Division, University of Pennsylvania, Philadelphia, PA, USA
- Katalin Susztak
- Department of Medicine, Renal Electrolyte and Hypertension Division, University of Pennsylvania, Philadelphia, PA, USA
45
Seneviratne AK, Xu M, Henao JJA, Fajardo VA, Hao Z, Voisin V, Xu GW, Hurren R, Kim S, MacLean N, Wang X, Gronda M, Jeyaraju D, Jitkova Y, Ketela T, Mullokandov M, Sharon D, Thomas G, Chouinard-Watkins R, Hawley JR, Schafer C, Yau HL, Khuchua Z, Aman A, Al-Awar R, Gross A, Claypool SM, Bazinet RP, Lupien M, Chan S, De Carvalho DD, Minden MD, Bader GD, Stark KD, LeBlanc P, Schimmer AD. The Mitochondrial Transacylase, Tafazzin, Regulates for AML Stemness by Modulating Intracellular Levels of Phospholipids. Cell Stem Cell 2019; 24:621-636.e16. [PMID: 30930145 DOI: 10.1016/j.stem.2019.02.020] [Received: 06/12/2018] [Revised: 12/19/2018] [Accepted: 02/27/2019]
Abstract
Tafazzin (TAZ) is a mitochondrial transacylase that remodels the mitochondrial cardiolipin into its mature form. Through a CRISPR screen, we identified TAZ as necessary for the growth and viability of acute myeloid leukemia (AML) cells. Genetic inhibition of TAZ reduced stemness and increased differentiation of AML cells both in vitro and in vivo. In contrast, knockdown of TAZ did not impair normal hematopoiesis under basal conditions. Mechanistically, inhibition of TAZ decreased levels of cardiolipin but also altered global levels of intracellular phospholipids, including phosphatidylserine, which controlled AML stemness and differentiation by modulating toll-like receptor (TLR) signaling.
Affiliation(s)
- Ayesh K Seneviratne
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada; Institute of Medical Sciences, Faculty of Medicine, University of Toronto, Toronto, ON, Canada
- Mingjing Xu
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- Juan J Aristizabal Henao
- Laboratory of Nutritional Lipidomics, Department of Kinesiology, University of Waterloo, Waterloo, ON, Canada
- Val A Fajardo
- Department of Health Sciences, Faculty of Applied Health Sciences, Brock University, St. Catharines, ON, Canada
- Zhenyue Hao
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- Veronique Voisin
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- G Wei Xu
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- Rose Hurren
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- S Kim
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- Neil MacLean
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- Xiaoming Wang
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- Marcela Gronda
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- Danny Jeyaraju
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- Yulia Jitkova
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- Troy Ketela
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- David Sharon
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- Geethu Thomas
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- James R Hawley
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada; Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
- Caitlin Schafer
- Department of Pediatrics, College of Medicine, University of Cincinnati, Cincinnati, OH, USA
- Helen Loo Yau
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada; Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
- Zaza Khuchua
- Department of Biochemistry, Sechenov Medical University, Moscow, Russian Federation; Institute of Medical Research Ilia State University, Tbilisi, Georgia
- Ahmed Aman
- Drug Discovery Program, Ontario Institute for Cancer Research, Toronto, ON, Canada; Department of Pharmacology and Toxicology, University of Toronto, ON, Canada
- Rima Al-Awar
- Drug Discovery Program, Ontario Institute for Cancer Research, Toronto, ON, Canada; Department of Pharmacology and Toxicology, University of Toronto, ON, Canada
- Atan Gross
- Department of Biological Regulation, Weizmann Institute, Rehovot, Israel
- Steven M Claypool
- Department of Physiology, School of Medicine, Johns Hopkins University, Baltimore, MD, USA
- Richard P Bazinet
- Department of Nutritional Sciences, Faculty of Medicine, University of Toronto, Toronto, ON, Canada
- Mathieu Lupien
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada; Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
- Steven Chan
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada; Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
- Daniel D De Carvalho
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada; Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
- Mark D Minden
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada; Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
- Gary D Bader
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Ken D Stark
- Laboratory of Nutritional Lipidomics, Department of Kinesiology, University of Waterloo, Waterloo, ON, Canada
- Paul LeBlanc
- Department of Health Sciences, Faculty of Applied Health Sciences, Brock University, St. Catharines, ON, Canada
- Aaron D Schimmer
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada; Institute of Medical Sciences, Faculty of Medicine, University of Toronto, Toronto, ON, Canada; Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
46
Klopfenstein Q, Truntzer C, Vincent J, Ghiringhelli F. Cell lines and immune classification of glioblastoma define patient's prognosis. Br J Cancer 2019; 120:806-814. [PMID: 30899088 PMCID: PMC6474266 DOI: 10.1038/s41416-019-0404-y] [Received: 09/25/2018] [Revised: 01/11/2019] [Accepted: 01/28/2019]
Abstract
Background Prognostic markers for glioblastoma are lacking. Both intrinsic tumour characteristics and the microenvironment could influence cancer prognosis. The aim of our study was to generate a pure glioblastoma cell line classification and an immune classification in order to decipher the respective roles of glioblastoma cells and the microenvironment in prognosis. Methods We worked on two large cohorts of patients suffering from glioblastoma (TCGA, n = 481 and Rembrandt, n = 180) for which clinical data, transcriptomic profiles and outcome were recorded. Transcriptomic profiles of 129 pure glioblastoma cell lines were clustered to generate a glioblastoma cell line classification. The presence of glioblastoma cell line subtypes and of immune cells was determined using deconvolution. Results The glioblastoma cell line classification defined three new molecular groups, called oncogenic-, metabolic- and neuronal communication-enriched. Neuronal communication-enriched tumours were associated with poor prognosis in both cohorts. Immune cell infiltrate was more frequent in the mesenchymal subgroup of the classical classification and in metabolic-enriched tumours. A combination of age, glioblastoma cell line classification and immune classification could be used to determine patients' outcome in both cohorts. Conclusions Our study shows that glioblastoma-bearing patients can be classified based on their age, glioblastoma cell line classification and immune classification. Combining this information improves the ability to predict prognosis.
Affiliation(s)
- Quentin Klopfenstein
- Research Platform in Biological Oncology, Dijon, France; GIMI Genetic and Immunology Medical Institute, Dijon, France
- Caroline Truntzer
- Research Platform in Biological Oncology, Dijon, France; GIMI Genetic and Immunology Medical Institute, Dijon, France
- Julie Vincent
- Department of Medical Oncology, Centre GF Leclerc, Dijon, France
- Francois Ghiringhelli
- Research Platform in Biological Oncology, Dijon, France; GIMI Genetic and Immunology Medical Institute, Dijon, France; Department of Medical Oncology, Centre GF Leclerc, Dijon, France; INSERM, UMR1231, Dijon, France
47
Lau D, Bobe AM, Khan AA. RNA Sequencing of the Tumor Microenvironment in Precision Cancer Immunotherapy. Trends Cancer 2019; 5:149-156. [PMID: 30898262 DOI: 10.1016/j.trecan.2019.02.006] [Received: 12/03/2018] [Revised: 02/11/2019] [Accepted: 02/12/2019]
Abstract
RNA sequencing (RNA-seq) provides an efficient high-throughput technique to robustly characterize the tumor immune microenvironment (TME). The increasing use of RNA-seq in clinical and basic science settings provides a powerful opportunity to access novel therapeutic biomarkers in the TME. Advanced computational methods are making it possible to resolve the composition of the tumor immune infiltrate, infer the immunological phenotypes of those cells, and assess the immune receptor repertoire in RNA-seq data. These immunological characterizations have increasingly important implications for guiding immunotherapy use. Here, we highlight recent studies that demonstrate the potential utility of RNA-seq in clinical settings, review key computational methods used for characterizing the TME for precision cancer immunotherapy, and discuss important considerations in data interpretation and current technological limitations.
Affiliation(s)
- Aly A Khan
- Tempus Labs, Chicago, IL 60654, USA; Toyota Technological Institute at Chicago, Chicago, IL 60637, USA.
48
Leveraging heterogeneity across multiple datasets increases cell-mixture deconvolution accuracy and reduces biological and technical biases. Nat Commun 2018; 9:4735. [PMID: 30413720 PMCID: PMC6226523 DOI: 10.1038/s41467-018-07242-6] [Received: 01/18/2018] [Accepted: 10/19/2018]
Abstract
In silico quantification of cell proportions from mixed-cell transcriptomics data (deconvolution) requires a reference expression matrix, called a basis matrix. We hypothesize that basis matrices created using only healthy samples from a single microarray platform would introduce biological and technical biases in deconvolution. We show the presence of such biases in two existing matrices, IRIS and LM22, irrespective of the deconvolution method. Here, we present immunoStates, a basis matrix built using 6160 samples with different disease states across 42 microarray platforms. We find that immunoStates significantly reduces biological and technical biases. Importantly, we find that the choice of deconvolution method has little or no effect once the basis matrix is chosen. We further show that cellular proportion estimates using immunoStates are consistently more correlated with measured proportions than those using IRIS or LM22, across all methods. Our results demonstrate the need to incorporate biological and technical heterogeneity into a basis matrix to achieve consistently high accuracy.
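The abstract above concerns reference-based deconvolution, where a basis matrix of cell-type expression profiles is given and only the proportions are estimated. A minimal sketch, assuming a linear mixing model m ≈ B p (m: mixture expression per gene, B: genes × cell types basis matrix, p ≥ 0: proportions), estimates p by non-negative least squares and rescales it to sum to 1. The function name `estimate_proportions`, the toy basis matrix and the projected-gradient solver are hypothetical illustrations; this is not the immunoStates pipeline or any of the methods compared in the paper.

```python
from typing import List

# Hypothetical layout: rows are genes, columns are cell types.
Matrix = List[List[float]]


def estimate_proportions(basis: Matrix, mixture: List[float],
                         n_iter: int = 5000) -> List[float]:
    """Estimate cell-type proportions p >= 0 with basis @ p ~ mixture.

    Solves non-negative least squares by projected gradient descent,
    then rescales p to sum to 1 (relative proportions).
    """
    n_genes, n_types = len(basis), len(basis[0])
    # Step size 1/||B||_F^2 never exceeds 1/lambda_max(B^T B), so each
    # projected step does not increase the least-squares objective.
    step = 1.0 / sum(v * v for row in basis for v in row)
    p = [1.0 / n_types] * n_types  # start from equal proportions
    for _ in range(n_iter):
        # Residual r = B p - m for every gene.
        residual = [sum(basis[g][k] * p[k] for k in range(n_types)) - mixture[g]
                    for g in range(n_genes)]
        # Gradient of 0.5 * ||B p - m||^2 is B^T r.
        grad = [sum(basis[g][k] * residual[g] for g in range(n_genes))
                for k in range(n_types)]
        # Gradient step followed by projection onto p >= 0.
        p = [max(0.0, p[k] - step * grad[k]) for k in range(n_types)]
    total = sum(p)
    return [x / total for x in p] if total > 0 else p


# Toy example: 4 genes x 3 cell types, mixture built from p = (0.5, 0.3, 0.2).
basis = [[10.0, 1.0, 0.0],
         [1.0, 8.0, 1.0],
         [0.0, 1.0, 12.0],
         [2.0, 2.0, 2.0]]
mixture = [5.3, 3.1, 2.7, 2.0]
print(estimate_proportions(basis, mixture))  # approximately [0.5, 0.3, 0.2]
```

The non-negativity constraint is what separates this from plain least squares; the abstract's point is that the quality of B, not the choice of solver, dominates the accuracy of p.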
49
Hunt GJ, Freytag S, Bahlo M, Gagnon-Bartsch JA. dtangle: accurate and robust cell type deconvolution. Bioinformatics 2018; 35:2093-2099. [DOI: 10.1093/bioinformatics/bty926] [Received: 12/21/2017] [Revised: 10/20/2018] [Accepted: 11/06/2018]
Abstract
Motivation
Cell type composition of tissues is important in many biological processes. To help understand cell type composition using gene expression data, methods of estimating (deconvolving) cell type proportions have been developed. Such estimates are often used to adjust for confounding effects of cell type in differential expression analysis (DEA).
Results
We propose dtangle, a new cell type deconvolution method. dtangle works on a range of DNA microarray and bulk RNA-seq platforms. It estimates cell type proportions using publicly available, often cross-platform, reference data. We evaluate dtangle on 11 benchmark datasets showing that dtangle is competitive with published deconvolution methods, is robust to outliers and selection of tuning parameters, and is fast. As a case study, we investigate the human immune response to Lyme disease. dtangle’s estimates reveal a temporal trend consistent with previous findings and are important covariates for DEA across disease status.
Availability and implementation
dtangle is available from CRAN (cran.r-project.org/package=dtangle) or GitHub (dtangle.github.io).
Supplementary information
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Gregory J Hunt
- Department of Statistics, University of Michigan, Ann Arbor, MI, USA
- Saskia Freytag
- Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia
- Department of Medical Biology, University of Melbourne, Parkville, VIC, Australia
- Melanie Bahlo
- Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia
- Department of Medical Biology, University of Melbourne, Parkville, VIC, Australia
50
Dimitrakopoulou K, Wik E, Akslen LA, Jonassen I. Deblender: a semi-/unsupervised multi-operational computational method for complete deconvolution of expression data from heterogeneous samples. BMC Bioinformatics 2018; 19:408. [PMID: 30404611 PMCID: PMC6223087 DOI: 10.1186/s12859-018-2442-5] [Received: 02/16/2018] [Accepted: 10/22/2018]
Abstract
BACKGROUND Towards discovering robust cancer biomarkers, it is imperative to unravel the cellular heterogeneity of patient samples and comprehend the interactions between cancer cells and the various cell types in the tumor microenvironment. The first generation of 'partial' computational deconvolution methods required prior information on either the cell/tissue type proportions or the cell/tissue type-specific expression signatures, as well as the number of involved cell/tissue types. The second generation of 'complete' approaches allowed estimating both the cell/tissue type proportions and the cell/tissue type-specific expression profiles directly from the mixed gene expression data, based on known (or automatically identified) cell/tissue type-specific marker genes. RESULTS We present Deblender, a flexible complete deconvolution tool operating in semi-/unsupervised mode depending on the user's access to known marker gene lists and information about cell/tissue composition. When no prior knowledge is available, global gene expression variability is used to cluster the mixed data, and the resulting cluster sets substitute for marker sets. In addition, we integrate a model selection criterion to predict the number of constituent cell/tissue types. Moreover, we provide a tailored algorithmic scheme to estimate mixture proportions for realistic experimental cases where the number of involved cell/tissue types exceeds the number of mixed samples. We assess the performance of Deblender and a set of state-of-the-art existing tools on a comprehensive set of benchmark and patient cancer mixture expression datasets (including TCGA). CONCLUSION Our results corroborate that Deblender can be a valuable tool to improve understanding of gene expression datasets, with implications for prediction and clinical utilization. Deblender is implemented in MATLAB and is available from ( https://github.com/kondim1983/Deblender/ ).
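The 'complete' setting described above, where both expression profiles and proportions are estimated from the mixtures alone, is commonly formulated as non-negative matrix factorization X ≈ W H (X: genes × samples, W: genes × cell types, H: cell types × samples). A minimal sketch, assuming the standard Lee and Seung multiplicative updates for the squared Frobenius loss, is given below. It is a generic NMF baseline, not Deblender's marker- or cluster-based MATLAB algorithm, and the names `nmf`, `relative_abundances` and the toy matrices are hypothetical.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def frobenius_error(x: Matrix, w: Matrix, h: Matrix) -> float:
    """Squared Frobenius norm ||X - W H||^2."""
    wh = matmul(w, h)
    return sum((x[i][j] - wh[i][j]) ** 2
               for i in range(len(x)) for j in range(len(x[0])))


def nmf(x: Matrix, k: int, n_iter: int = 500, seed: int = 0,
        eps: float = 1e-12) -> Tuple[Matrix, Matrix]:
    """Factor X (genes x samples) as W (genes x k) times H (k x samples)."""
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    # Strictly positive random start; multiplicative updates keep entries >= 0.
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(n_iter):
        # H <- H * (W^T X) / (W^T W H), elementwise
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(k)]
        # W <- W * (X H^T) / (W H H^T), elementwise
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
    return w, h


def relative_abundances(h: Matrix) -> Matrix:
    """Rescale each column (sample) of H to sum to 1."""
    sums = [sum(row[j] for row in h) for j in range(len(h[0]))]
    return [[row[j] / sums[j] if sums[j] > 0 else 0.0 for j in range(len(row))]
            for row in h]


# Toy data: 5 genes, 4 samples, 2 hypothetical cell types.
w_true = [[3.0, 0.0], [2.0, 1.0], [0.0, 4.0], [1.0, 1.0], [0.5, 2.0]]
h_true = [[0.7, 0.2, 0.5, 0.9], [0.3, 0.8, 0.5, 0.1]]
x = matmul(w_true, h_true)
w_est, h_est = nmf(x, k=2)
print(frobenius_error(x, w_est, h_est))
print(relative_abundances(h_est))
```

Without marker genes or other constraints, the factorization is only determined up to a permutation and a rescaling (W D, D⁻¹ H), so the estimated rows of H need not line up with h_true; prior knowledge such as the marker or cluster sets described in the abstract is what anchors each factor to a named cell type.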
Affiliation(s)
- Konstantina Dimitrakopoulou
- Centre for Cancer Biomarkers CCBIO, Department of Informatics, University of Bergen, Bergen, Norway; Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
- Elisabeth Wik
- Centre for Cancer Biomarkers CCBIO, Department of Clinical Medicine, Section for Pathology, University of Bergen, Bergen, Norway; Department of Pathology, Haukeland University Hospital, Bergen, Norway
- Lars A Akslen
- Centre for Cancer Biomarkers CCBIO, Department of Clinical Medicine, Section for Pathology, University of Bergen, Bergen, Norway; Department of Pathology, Haukeland University Hospital, Bergen, Norway
- Inge Jonassen
- Centre for Cancer Biomarkers CCBIO, Department of Informatics, University of Bergen, Bergen, Norway; Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway