101
Transcriptomic Changes of Murine Visceral Fat Exposed to Intermittent Hypoxia at Single Cell Resolution. Int J Mol Sci 2020; 22:ijms22010261. [PMID: 33383883 PMCID: PMC7795619 DOI: 10.3390/ijms22010261] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 11/04/2020] [Revised: 11/22/2020] [Accepted: 12/24/2020] [Indexed: 12/12/2022]
Abstract
Intermittent hypoxia (IH) is a hallmark of obstructive sleep apnea (OSA) and induces metabolic dysfunction manifesting as inflammation, increased lipolysis and insulin resistance in visceral white adipose tissues (vWAT). However, the cell types and their corresponding transcriptional pathways underlying these functional perturbations are unknown. Here, we applied single nucleus RNA sequencing (snRNA-seq) coupled with aggregate RNA-seq methods to evaluate the cellular heterogeneity in vWAT following IH exposures mimicking OSA. C57BL/6 male mice were exposed to IH and room air (RA) for 6 weeks, and nuclei from vWAT were isolated and processed for snRNA-seq followed by differentially expressed gene (DEG) analyses by cell type, along with gene ontology and canonical pathways enrichment tests of significance. IH induced significant transcriptional changes compared to RA across 14 different cell types identified in vWAT. We identified cell-specific signature markers, transcriptional networks, metabolic signaling pathways, and cellular subpopulation enrichment in vWAT. Globally, we also identified 298 commonly regulated genes across multiple cellular types that are associated with metabolic pathways. Deconvolution of cell types in vWAT using global RNA-seq revealed that distinct adipocytes appear to be differentially implicated in key aspects of metabolic dysfunction. Thus, the heterogeneity of vWAT and its response to IH at the cellular level provides important insights into the metabolic morbidity of OSA and may translate into therapeutic targets.
102
Chen L, Wu CT, Wang N, Herrington DM, Clarke R, Wang Y. debCAM: a bioconductor R package for fully unsupervised deconvolution of complex tissues. Bioinformatics 2020; 36:3927-3929. [PMID: 32219387 DOI: 10.1093/bioinformatics/btaa205] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 10/18/2019] [Revised: 03/05/2020] [Accepted: 03/23/2020] [Indexed: 11/14/2022]
Abstract
SUMMARY We develop a fully unsupervised deconvolution method to dissect complex tissues into molecularly distinctive tissue or cell subtypes based on bulk expression profiles. We implement an R package, deconvolution by Convex Analysis of Mixtures (debCAM) that can automatically detect tissue/cell-specific markers, determine the number of constituent subtypes, calculate subtype proportions in individual samples and estimate tissue/cell-specific expression profiles. We demonstrate the performance and biomedical utility of debCAM on gene expression, methylation, proteomics and imaging data. With enhanced data preprocessing and prior knowledge incorporation, debCAM software tool will allow biologists to perform a more comprehensive and unbiased characterization of tissue remodeling in many biomedical contexts. AVAILABILITY AND IMPLEMENTATION http://bioconductor.org/packages/debCAM. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Lulu Chen
- Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Chiung-Ting Wu
- Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Niya Wang
- Search Ranking Unit, Google LLC, Mountain View, CA 94043, USA
- David M Herrington
- Department of Internal Medicine, Wake Forest University, Winston-Salem, NC 27157, USA
- Robert Clarke
- Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University, Washington, DC 20057, USA
- Yue Wang
- Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
103
Fernández EA, Mahmoud YD, Veigas F, Rocha D, Miranda M, Merlo J, Balzarini M, Lujan HD, Rabinovich GA, Girotti MR. Unveiling the immune infiltrate modulation in cancer and response to immunotherapy by MIXTURE-an enhanced deconvolution method. Brief Bioinform 2020; 22:6035270. [PMID: 33320931 DOI: 10.1093/bib/bbaa317] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 08/23/2020] [Revised: 10/01/2020] [Accepted: 10/17/2020] [Indexed: 12/14/2022]
Abstract
The accurate quantification of tumor-infiltrating immune cells is crucial to uncover their role in tumor immune escape, to determine patient prognosis and to predict response to immune checkpoint blockade. Current state-of-the-art methods that quantify immune cells from tumor biopsies using gene expression data apply computational deconvolution methods that present multicollinearity and estimation errors, resulting in the overestimation or underestimation of the diversity of infiltrating immune cells and their quantity. To overcome such limitations, we developed MIXTURE, a new ν-support vector regression-based noise constrained recursive feature selection algorithm based on validated immune cell molecular signatures. MIXTURE provides increased robustness to cell type identification and proportion estimation, outperforms the current methods, and is available to the wider scientific community. We applied MIXTURE to transcriptomic data from tumor biopsies and found relevant novel associations between the components of the immune infiltrate and molecular subtypes, tumor driver biomarkers, tumor mutational burden, microsatellite instability, intratumor heterogeneity, cytolytic score, programmed cell death ligand 1 expression, patients' survival and response to anti-cytotoxic T-lymphocyte-associated antigen 4 and anti-programmed cell death protein 1 immunotherapy.
Affiliation(s)
- Yamil D Mahmoud
- Translational Immuno Oncology Lab at the Institute of Biology and Experimental Medicine in Buenos Aires, Argentina
- Florencia Veigas
- Translational Immuno Oncology Lab at the Institute of Biology and Experimental Medicine
- Joaquín Merlo
- Translational Immuno Oncology Lab at the Institute of Biology and Experimental Medicine
- Hugo D Lujan
- Argentinian National Council for Scientific and Technical Research
- María Romina Girotti
- Translational Immuno Oncology Lab at the Institute of Biology and Experimental Medicine in Buenos Aires, Argentina
104
Charting Extracellular Transcriptomes in The Human Biofluid RNA Atlas. Cell Rep 2020; 33:108552. [PMID: 33378673 DOI: 10.1016/j.celrep.2020.108552] [Citation(s) in RCA: 54] [Impact Index Per Article: 10.8] [Received: 06/18/2020] [Revised: 10/14/2020] [Accepted: 12/03/2020] [Indexed: 02/06/2023]
Abstract
Extracellular RNAs present in biofluids have emerged as potential biomarkers for disease. Whereas most studies focus on blood-derived fluids, other biofluids may be more informative. We present an atlas of messenger, circular, and small RNA transcriptomes of a comprehensive collection of 20 human biofluids. By means of synthetic spike-in controls, we compare RNA content across biofluids, revealing a 10,000-fold difference in concentration. The circular RNA fraction is increased in most biofluids compared to tissues. Each biofluid transcriptome is enriched for RNA molecules derived from specific tissues and cell types. Our atlas enables an informed selection of the most relevant biofluid to monitor particular diseases. To verify the biomarker potential in these biofluids, four validation cohorts representing a broad spectrum of diseases were profiled, revealing numerous differential RNAs between case and control subjects. Spike-normalized data are publicly available in the R2 web portal for further exploration.
105
Li Y, Xu Q, Wu D, Chen G. Exploring Additional Valuable Information From Single-Cell RNA-Seq Data. Front Cell Dev Biol 2020; 8:593007. [PMID: 33335900 PMCID: PMC7736616 DOI: 10.3389/fcell.2020.593007] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 09/04/2020] [Accepted: 10/26/2020] [Indexed: 12/28/2022]
Abstract
Single-cell RNA-seq (scRNA-seq) technologies are broadly applied to dissect the cellular heterogeneity and expression dynamics, providing unprecedented insights into single-cell biology. Most of the scRNA-seq studies mainly focused on the dissection of cell types/states, developmental trajectory, gene regulatory network, and alternative splicing. However, besides these routine analyses, many other valuable scRNA-seq investigations can be conducted. Here, we first review cell-to-cell communication exploration, RNA velocity inference, identification of large-scale copy number variations and single nucleotide changes, and chromatin accessibility prediction based on single-cell transcriptomics data. Next, we discuss the identification of novel genes/transcripts through transcriptome reconstruction approaches, as well as the profiling of long non-coding RNAs and circular RNAs. Additionally, we survey the integration of single-cell and bulk RNA-seq datasets for deconvoluting the cell composition of large-scale bulk samples and linking single-cell signatures to patient outcomes. These additional analyses could largely facilitate corresponding basic science and clinical applications.
Affiliation(s)
- Yunjin Li
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences, School of Life Sciences, East China Normal University, Shanghai, China
- Qiyue Xu
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences, School of Life Sciences, East China Normal University, Shanghai, China
- Duojiao Wu
- Institute of Clinical Science, Zhongshan Hospital, Fudan University, Shanghai, China
- Geng Chen
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences, School of Life Sciences, East China Normal University, Shanghai, China
106
Schmidt M, Maié T, Dahl E, Costa IG, Wagner W. Deconvolution of cellular subsets in human tissue based on targeted DNA methylation analysis at individual CpG sites. BMC Biol 2020; 18:178. [PMID: 33234153 PMCID: PMC7687708 DOI: 10.1186/s12915-020-00910-4] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 07/28/2020] [Accepted: 10/28/2020] [Indexed: 12/12/2022]
Abstract
Background The complex composition of different cell types within a tissue can be estimated by deconvolution of bulk gene expression profiles or with various single-cell sequencing approaches. Alternatively, DNA methylation (DNAm) profiles have been used to establish an atlas for multiple human tissues and cell types. DNAm is particularly suitable for deconvolution of cell types because each CG dinucleotide (CpG site) has only two states per DNA strand—methylated or non-methylated—and these epigenetic modifications are very consistent during cellular differentiation. So far, deconvolution of DNAm profiles implies complex signatures of many CpGs that are often measured by genome-wide analysis with Illumina BeadChip microarrays. In this study, we investigated if the characterization of cell types in tissue is also feasible with individual cell type-specific CpG sites, which can be addressed by targeted analysis, such as pyrosequencing. Results We compiled and curated 579 Illumina 450k BeadChip DNAm profiles of 14 different non-malignant human cell types. A training and validation strategy was applied to identify and test for cell type-specific CpGs. We initially focused on estimating the relative amount of fibroblasts using two CpGs that were either hypermethylated or hypomethylated in fibroblasts. The combination of these two DNAm levels into a “FibroScore” correlated with the state of fibrosis and was associated with overall survival in various types of cancer. Furthermore, we identified hypomethylated CpGs for leukocytes, endothelial cells, epithelial cells, hepatocytes, glia, neurons, fibroblasts, and induced pluripotent stem cells. The accuracy of this eight CpG signature was tested in additional BeadChip datasets of defined cell mixtures and the results were comparable to previously published signatures based on several thousand CpGs. 
Finally, we established and validated pyrosequencing assays for the relevant CpGs that can be utilized for classification and deconvolution of cell types. Conclusion This proof of concept study demonstrates that DNAm analysis at individual CpGs reflects the cellular composition of cellular mixtures and different tissues. Targeted analysis of these genomic regions facilitates robust methods for application in basic research and clinical settings.
Affiliation(s)
- Marco Schmidt
- Helmholtz-Institute for Biomedical Engineering, Stem Cell Biology and Cellular Engineering, RWTH Aachen University Medical School, 52074, Aachen, Germany; Institute for Biomedical Engineering - Cell Biology, University Hospital of RWTH Aachen, 52074, Aachen, Germany
- Tiago Maié
- Institute for Computational Genomics, Joint Research Center for Computational Biomedicine, RWTH Aachen University Medical School, 52074, Aachen, Germany
- Edgar Dahl
- RWTH centralized Biomaterial Bank (RWTH cBMB), Medical Faculty, RWTH Aachen University, Aachen, Germany
- Ivan G Costa
- Institute for Computational Genomics, Joint Research Center for Computational Biomedicine, RWTH Aachen University Medical School, 52074, Aachen, Germany
- Wolfgang Wagner
- Helmholtz-Institute for Biomedical Engineering, Stem Cell Biology and Cellular Engineering, RWTH Aachen University Medical School, 52074, Aachen, Germany; Institute for Biomedical Engineering - Cell Biology, University Hospital of RWTH Aachen, 52074, Aachen, Germany
107
Hippen AA, Greene CS. Expanding and Remixing the Metadata Landscape. Trends Cancer 2020; 7:276-278. [PMID: 33229213 PMCID: PMC8324015 DOI: 10.1016/j.trecan.2020.10.011] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 09/03/2020] [Revised: 10/26/2020] [Accepted: 10/27/2020] [Indexed: 12/12/2022]
Abstract
Genomic data sharing accelerates research. Data are most valuable when they are accompanied by detailed metadata. To date, metadata are often human-annotated descriptions of samples and their handling. We discuss how machine learning-derived elements complement such descriptions to enhance the research ecosystem around genomic data.
Affiliation(s)
- Ariel A Hippen
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Casey S Greene
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Childhood Cancer Data Lab, Alex's Lemonade Stand Foundation, Philadelphia, PA, USA
108
Benchmarking of cell type deconvolution pipelines for transcriptomics data. Nat Commun 2020; 11:5650. [PMID: 33159064 PMCID: PMC7648640 DOI: 10.1038/s41467-020-19015-1] [Citation(s) in RCA: 243] [Impact Index Per Article: 48.6] [Received: 12/05/2019] [Accepted: 09/16/2020] [Indexed: 01/05/2023]
Abstract
Many computational methods have been developed to infer cell type proportions from bulk transcriptomics data. However, an evaluation of the impact of data transformation, pre-processing, marker selection, cell type composition and choice of methodology on the deconvolution results is still lacking. Using five single-cell RNA-sequencing (scRNA-seq) datasets, we generate pseudo-bulk mixtures to evaluate the combined impact of these factors. Both bulk deconvolution methodologies and those that use scRNA-seq data as reference perform best when applied to data in linear scale and the choice of normalization has a dramatic impact on some, but not all methods. Overall, methods that use scRNA-seq data have comparable performance to the best performing bulk methods whereas semi-supervised approaches show higher error values. Moreover, failure to include cell types in the reference that are present in a mixture leads to substantially worse results, regardless of the previous choices. Altogether, we evaluate the combined impact of factors affecting the deconvolution task across different datasets and propose general guidelines to maximize its performance. Inferring cell type proportions from transcriptomics data is affected by data transformation, normalization, choice of method and the markers used. Here, the authors use single-cell RNAseq datasets to evaluate the impact of these factors and propose guidelines to maximise deconvolution performance.
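As a rough illustration of the pseudo-bulk idea described in this abstract, the sketch below sums the counts of randomly sampled single cells into one mixture and records the true cell type proportions, which a deconvolution method would then try to recover. It is not the authors' pipeline: the toy cells, the function names `make_pseudo_bulk` and `log_mixture_gap`, and the `+1` pseudocount are all assumptions made for this example. The second function hints at one plausible reason linear-scale data may suit deconvolution: a mixture is additive in linear scale, but the log of a mixture is not the mixture of the logs.

```python
import math
import random
from collections import Counter


def make_pseudo_bulk(cells, n_cells, rng):
    """Sum counts of randomly sampled cells into one pseudo-bulk profile.

    `cells` is a list of (cell_type, counts) pairs (hypothetical toy format).
    Returns the summed profile and the true cell type proportions.
    """
    sampled = [rng.choice(cells) for _ in range(n_cells)]
    n_genes = len(sampled[0][1])
    bulk = [sum(counts[g] for _, counts in sampled) for g in range(n_genes)]
    type_counts = Counter(cell_type for cell_type, _ in sampled)
    proportions = {ct: n / n_cells for ct, n in type_counts.items()}
    return bulk, proportions


def log_mixture_gap(a, b, weight):
    """Difference between log of a mixture and mixture of logs (log2(x + 1))."""
    mixed_then_logged = math.log2(weight * a + (1 - weight) * b + 1)
    logged_then_mixed = weight * math.log2(a + 1) + (1 - weight) * math.log2(b + 1)
    return mixed_then_logged - logged_then_mixed


# Made-up counts for three genes in four cells of two cell types.
cells = [
    ("T_cell", [90, 4, 20]),
    ("T_cell", [110, 6, 22]),
    ("B_cell", [8, 75, 18]),
    ("B_cell", [12, 85, 21]),
]
rng = random.Random(0)  # fixed seed so the sketch is reproducible
bulk, truth = make_pseudo_bulk(cells, n_cells=100, rng=rng)
gap = log_mixture_gap(100.0, 10.0, 0.5)  # positive: log scale breaks additivity
```

Because the proportions in `truth` are known by construction, the error of any estimate can be measured directly against them.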
109
Bolis M, Vallerga A, Fratelli M. Computational deconvolution of transcriptomic data for the study of tumor-infiltrating immune cells. Int J Biol Markers 2020; 35:20-22. [PMID: 32079462 DOI: 10.1177/1724600820903317] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 01/21/2023]
Abstract
Cancer is a complex disease characterized by a wide array of mutually interacting components constituting the tumor microenvironment (connective tissue, vascular system, immune cells), many of which are targeted therapeutically. In particular, immune checkpoint inhibitors have recently become an established part of the treatment of cancer. Despite great promise, only a portion of the patients display durable response. Current research efforts are concentrated on the determination of tumor-specific biomarkers predictive of response, such as tumor mutational burden, microsatellite instability, and neo-antigen presentation. However, it is clear that several additional characteristics pertaining to the tumor microenvironment play a critical role in the effectiveness of immunotherapy. Here we comment on the computational methods that are used for the analysis of the tumor microenvironment components from transcriptomic data, discuss the critical needs, and foresee potential evolutions in the field.
Affiliation(s)
- Marco Bolis
- Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Milano, Italy
- Arianna Vallerga
- Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Milano, Italy
110
Transcriptomic Analysis of Age-Associated Periventricular Lesions Reveals Dysregulation of the Immune Response. Int J Mol Sci 2020; 21:ijms21217924. [PMID: 33113879 PMCID: PMC7663268 DOI: 10.3390/ijms21217924] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 09/28/2020] [Revised: 10/23/2020] [Accepted: 10/23/2020] [Indexed: 12/22/2022]
Abstract
White matter lesions (WML) are a common feature of the ageing brain associated with cognitive impairment. The gene expression profiles of periventricular lesions (PVL, n = 7) and radiologically-normal-appearing (control) periventricular white matter cases (n = 11) obtained from the Cognitive Function and Ageing Study (CFAS) neuropathology cohort were interrogated using microarray analysis and NanoString to identify novel mechanisms potentially underlying their formation. Histological characterisation of control white matter cases identified a subgroup (n = 4) which contained high levels of MHC-II immunoreactive microglia, and were classified as “pre-lesional.” Microarray analysis identified 2256 significantly differentially-expressed genes (p ≤ 0.05, FC ≥ 1.2) in PVL compared to non-lesional control white matter (1378 upregulated and 878 downregulated); 2649 significantly differentially-expressed genes in “pre-lesional” cases compared to PVL (1390 upregulated and 1259 downregulated); and 2398 significantly differentially-expressed genes in “pre-lesional” versus non-lesional control cases (1527 upregulated and 871 downregulated). Whilst histological evaluation of a single marker (MHC-II) implicates immune-activated microglia in lesion pathology, transcriptomic analysis indicates significant downregulation of a number of activated microglial markers and suggests established PVL are part of a continuous spectrum of white matter injury. The gene expression profile of “pre-lesional” periventricular white matter suggests upregulation of several signalling pathways may be a neuroprotective response to prevent the pathogenesis of PVL.
111
Fair BJ, Blake LE, Sarkar A, Pavlovic BJ, Cuevas C, Gilad Y. Gene expression variability in human and chimpanzee populations share common determinants. eLife 2020; 9:59929. [PMID: 33084571 PMCID: PMC7644215 DOI: 10.7554/elife.59929] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 06/12/2020] [Accepted: 10/20/2020] [Indexed: 12/20/2022]
Abstract
Inter-individual variation in gene expression has been shown to be heritable and is often associated with differences in disease susceptibility between individuals. Many studies focused on mapping associations between genetic and gene regulatory variation, yet much less attention has been paid to the evolutionary processes that shape the observed differences in gene regulation between individuals in humans or any other primate. To begin addressing this gap, we performed a comparative analysis of gene expression variability and expression quantitative trait loci (eQTLs) in humans and chimpanzees, using gene expression data from primary heart samples. We found that expression variability in both species is often determined by non-genetic sources, such as cell-type heterogeneity. However, we also provide evidence that inter-individual variation in gene regulation can be genetically controlled, and that the degree of such variability is generally conserved in humans and chimpanzees. In particular, we found a significant overlap of orthologous genes associated with eQTLs in both species. We conclude that gene expression variability in humans and chimpanzees often evolves under similar evolutionary pressures.
Affiliation(s)
- Lauren E Blake
- Department of Human Genetics, University of Chicago, Chicago, United States
- Abhishek Sarkar
- Department of Human Genetics, University of Chicago, Chicago, United States
- Bryan J Pavlovic
- Department of Neurology, University of California, San Francisco (UCSF), San Francisco, United States
- Claudia Cuevas
- Department of Human Genetics, University of Chicago, Chicago, United States
- Yoav Gilad
- Department of Medicine, University of Chicago, Chicago, United States; Department of Human Genetics, University of Chicago, Chicago, United States
112
Groth EE, Weber M, Bahmer T, Pedersen F, Kirsten A, Börnigen D, Rabe KF, Watz H, Ammerpohl O, Goldmann T. Exploration of the sputum methylome and omics deconvolution by quadratic programming in molecular profiling of asthma and COPD: the road to sputum omics 2.0. Respir Res 2020; 21:274. [PMID: 33076907 PMCID: PMC7574293 DOI: 10.1186/s12931-020-01544-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 06/15/2020] [Accepted: 10/11/2020] [Indexed: 12/25/2022]
Abstract
BACKGROUND To date, most studies involving high-throughput analyses of sputum in asthma and COPD have focused on identifying transcriptomic signatures of disease. No whole-genome methylation analysis of sputum cells has been performed yet. In this context, the highly variable cellular composition of sputum has the potential to confound the molecular analyses. METHODS Whole-genome transcription (Agilent Human 4 × 44 k array) and methylation (Illumina 450 k BeadChip) analyses were performed on sputum samples of 9 asthmatics, 10 healthy and 10 COPD subjects. RNA integrity was checked by capillary electrophoresis and used to correct in silico for bias conferred by RNA degradation during biobank sample storage. Estimates of cell type-specific molecular profiles were derived via regression by quadratic programming based on sputum differential cell counts. All analyses were conducted using the open-source R/Bioconductor software framework. RESULTS A linear regression step was found to perform well in removing RNA degradation-related bias among the main principal components of the gene expression data, increasing the number of genes detectable as differentially expressed in asthma and COPD sputa (compared to controls). We observed a strong influence of the cellular composition on the results of mixed-cell sputum analyses. For example, upregulated genes derived from mixed-cell data in asthma were dominated by genes predominantly expressed in eosinophils after deconvolution. The deconvolution, however, made it possible to perform differential expression and methylation analyses on the level of individual cell types and, though we only analyzed a limited number of biological replicates, was found to provide good estimates compared to previously published data about gene expression in lung eosinophils in asthma. Analysis of the sputum methylome indicated the presence of differential methylation in genomic regions of interest, e.g. mapping to a number of human leukocyte antigen (HLA) genes related to both major histocompatibility complex (MHC) class I and II molecules in asthma and COPD macrophages. Furthermore, we found the SMAD3 (SMAD family member 3) gene, among others, to lie within differentially methylated regions, which has been previously reported in the context of asthma. CONCLUSIONS In this methodology-oriented study, we show that methylation profiling can be easily integrated into sputum analysis workflows and exhibits a strong potential to contribute to the profiling and understanding of pulmonary inflammation. Wherever RNA degradation is of concern, in silico correction can be effective in improving both sensitivity and specificity of downstream analyses. We suggest that deconvolution methods should be integrated in sputum omics analysis workflows whenever possible in order to facilitate the unbiased discovery and interpretation of molecular patterns of inflammation.
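To make the deconvolution step in this abstract more concrete, the sketch below estimates cell type-specific expression of a single gene from mixed-cell measurements and known cell proportions (such as differential cell counts), under the constraint that expression cannot be negative. This is only a sketch under stated assumptions: the study reports using quadratic programming in R/Bioconductor, whereas this example swaps in plain projected gradient descent for the same constrained least-squares problem, and the function name `estimate_cell_type_expression`, the two cell types and all numbers are hypothetical.

```python
def estimate_cell_type_expression(proportions, mixed, n_iter=5000):
    """Minimise 0.5 * ||P x - y||^2 subject to x >= 0.

    proportions: one row per sample, giving the fraction of each cell type (P).
    mixed: measured mixed-cell expression of one gene per sample (y).
    Returns the estimated expression of that gene in each cell type (x).
    """
    n_samples = len(proportions)
    n_types = len(proportions[0])
    # Step size 1 / ||P||_F^2 is a safe bound on 1 / (largest eigenvalue of P^T P).
    step = 1.0 / sum(p * p for row in proportions for p in row)
    x = [0.0] * n_types
    for _ in range(n_iter):
        residual = [
            sum(row[k] * x[k] for k in range(n_types)) - y
            for row, y in zip(proportions, mixed)
        ]
        grad = [
            sum(proportions[s][k] * residual[s] for s in range(n_samples))
            for k in range(n_types)
        ]
        # Gradient step followed by projection onto the non-negative orthant.
        x = [max(0.0, x[k] - step * grad[k]) for k in range(n_types)]
    return x


# Hypothetical data: four samples, two cell types (say eosinophils, macrophages).
proportions = [[0.8, 0.2], [0.5, 0.5], [0.3, 0.7], [0.1, 0.9]]
true_expression = [10.0, 2.0]  # made-up per-cell-type expression of one gene
mixed = [sum(p * e for p, e in zip(row, true_expression)) for row in proportions]
estimate = estimate_cell_type_expression(proportions, mixed)
```

With noise-free toy data the estimate converges to the made-up true values. With real measurements the fit would be approximate, and a dedicated quadratic programming solver would normally be preferred to this simple iteration.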
Affiliation(s)
- Espen E Groth
- LungenClinic Grosshansdorf, Großhansdorf, Germany; Airway Research Center North (ARCN), Member of the German Center for Lung Research (DZL), Großhansdorf, Germany; Department of Internal Medicine I, Pneumology, University Hospital Schleswig-Holstein, Campus Kiel, Kiel, Germany; Department of Oncology, Hematology and BMT with Section Pneumology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Melanie Weber
- Program in Applied and Computational Mathematics, Princeton University, Princeton, NJ, USA
- Thomas Bahmer
- LungenClinic Grosshansdorf, Großhansdorf, Germany; Airway Research Center North (ARCN), Member of the German Center for Lung Research (DZL), Großhansdorf, Germany; Department of Internal Medicine I, Pneumology, University Hospital Schleswig-Holstein, Campus Kiel, Kiel, Germany
- Frauke Pedersen
- LungenClinic Grosshansdorf, Großhansdorf, Germany; Airway Research Center North (ARCN), Member of the German Center for Lung Research (DZL), Großhansdorf, Germany; Pulmonary Research Institute at LungenClinic Grosshansdorf, Großhansdorf, Germany
- Anne Kirsten
- Airway Research Center North (ARCN), Member of the German Center for Lung Research (DZL), Großhansdorf, Germany; Pulmonary Research Institute at LungenClinic Grosshansdorf, Großhansdorf, Germany
- Daniela Börnigen
- Bioinformatics Core Unit, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Klaus F Rabe
- LungenClinic Grosshansdorf, Großhansdorf, Germany; Airway Research Center North (ARCN), Member of the German Center for Lung Research (DZL), Großhansdorf, Germany
- Henrik Watz
- Airway Research Center North (ARCN), Member of the German Center for Lung Research (DZL), Großhansdorf, Germany; Pulmonary Research Institute at LungenClinic Grosshansdorf, Großhansdorf, Germany
- Ole Ammerpohl
- Airway Research Center North (ARCN), Member of the German Center for Lung Research (DZL), Großhansdorf, Germany; Institute of Human Genetics, University Medical Center Ulm, Ulm, Germany
- Torsten Goldmann
- Airway Research Center North (ARCN), Member of the German Center for Lung Research (DZL), Großhansdorf, Germany; Research Center Borstel, Pathology, Borstel, Germany
113
Li Y, He X, Li Q, Lai H, Zhang H, Hu Z, Li Y, Huang S. EV-origin: Enumerating the tissue-cellular origin of circulating extracellular vesicles using exLR profile. Comput Struct Biotechnol J 2020; 18:2851-2859. [PMID: 33133426 PMCID: PMC7588739 DOI: 10.1016/j.csbj.2020.10.002] [Citation(s) in RCA: 119] [Impact Index Per Article: 23.8] [Received: 07/29/2020] [Revised: 09/29/2020] [Accepted: 10/02/2020] [Indexed: 02/07/2023]
Abstract
Extracellular vesicles (EVs) are complex ecosystems that can be derived from all body cells and circulate in body fluids. Characterizing the tissue-cellular source contributing to circulating EVs provides biological information about the cell or tissue of origin and their functional states. However, the relative proportion of tissue-cellular origin of circulating EVs in body fluid has not been thoroughly characterized. Here, we developed an approach for digital EV quantification, called EV-origin, that enables enumeration of the tissue-cellular source contributions of EVs from plasma extracellular vesicle long RNA sequencing (exLR-seq) profiles. EV-origin combines an input matrix of gene expression signatures with a robust deconvolution algorithm, which together separate the relative proportions of each tissue or cell type of interest. EV-origin predicted the relative enrichment of seven types of hematopoietic cells and sixteen solid tissue subsets from exLR-seq profiles. Using the EV-origin approach, we depicted an integrated landscape of the traceability system of plasma EVs for healthy individuals. We also compared the heterogeneous tissue-cellular source components from plasma EV samples with diverse disease status. Notably, the aberrant liver fraction could reflect the development and progression of hepatic disease. The liver fraction could also serve as a diagnostic indicator and effectively separate hepatocellular carcinoma (HCC) patients from normal individuals. EV-origin provides an approach to decipher the complex heterogeneity of tissue-cellular origin in circulating EVs. Our approach could inform the development of exLR-based applications for liquid biopsy.
Affiliation(s)
- Yuchen Li
- Department of Integrative Oncology, Fudan University Shanghai Cancer Center, and the Shanghai Key Laboratory of Medical Epigenetics, the International Co-laboratory of Medical Epigenetics and Metabolism, Ministry of Science and Technology, Institutes of Biomedical Sciences, Fudan University, Shanghai 200032, China
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China
| | - Xigan He
- Department of Hepatic Surgery, Fudan University Shanghai Cancer Center, Shanghai Medical College, Fudan University, Shanghai 200032, China
| | - Qin Li
- Department of Integrative Oncology, Fudan University Shanghai Cancer Center, and the Shanghai Key Laboratory of Medical Epigenetics, the International Co-laboratory of Medical Epigenetics and Metabolism, Ministry of Science and Technology, Institutes of Biomedical Sciences, Fudan University, Shanghai 200032, China
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China
| | - Hongyan Lai
- Department of Integrative Oncology, Fudan University Shanghai Cancer Center, and the Shanghai Key Laboratory of Medical Epigenetics, the International Co-laboratory of Medical Epigenetics and Metabolism, Ministry of Science and Technology, Institutes of Biomedical Sciences, Fudan University, Shanghai 200032, China
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China
| | - Hena Zhang
- Department of Integrative Oncology, Fudan University Shanghai Cancer Center, and the Shanghai Key Laboratory of Medical Epigenetics, the International Co-laboratory of Medical Epigenetics and Metabolism, Ministry of Science and Technology, Institutes of Biomedical Sciences, Fudan University, Shanghai 200032, China
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China
| | - Zhixiang Hu
- Department of Integrative Oncology, Fudan University Shanghai Cancer Center, and the Shanghai Key Laboratory of Medical Epigenetics, the International Co-laboratory of Medical Epigenetics and Metabolism, Ministry of Science and Technology, Institutes of Biomedical Sciences, Fudan University, Shanghai 200032, China
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China
| | - Yan Li
- Department of Integrative Oncology, Fudan University Shanghai Cancer Center, and the Shanghai Key Laboratory of Medical Epigenetics, the International Co-laboratory of Medical Epigenetics and Metabolism, Ministry of Science and Technology, Institutes of Biomedical Sciences, Fudan University, Shanghai 200032, China
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China
| | - Shenglin Huang
- Department of Integrative Oncology, Fudan University Shanghai Cancer Center, and the Shanghai Key Laboratory of Medical Epigenetics, the International Co-laboratory of Medical Epigenetics and Metabolism, Ministry of Science and Technology, Institutes of Biomedical Sciences, Fudan University, Shanghai 200032, China
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China
| |
|
114
|
Abstract
PURPOSE OF REVIEW The goal of this review is to summarize the state of big data analyses in the study of heart failure (HF). We discuss the use of big data in the HF space, focusing on "omics" and clinical data. We address some limitations of these data, as well as their future potential. RECENT FINDINGS Omics are providing insight into plasma and myocardial molecular profiles in HF patients. The introduction of single cell and spatial technologies is a major advance that will reshape our understanding of cell heterogeneity and function as well as tissue architecture. Clinical data analysis focuses on HF phenotyping and prognostic modeling. Big data approaches are increasingly common in HF research. The use of methods designed for big data, such as machine learning, may help elucidate the biology underlying HF. However, important challenges remain in the translation of this knowledge into improvements in clinical care.
Affiliation(s)
- Jan D Lanzer
- Institute for Computational Biomedicine, Bioquant, Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Heidelberg, Germany
- Faculty of Biosciences, Heidelberg University, Heidelberg, Germany
- Internal Medicine II, Heidelberg University Hospital, Heidelberg, Germany
| | - Florian Leuschner
- Department of Cardiology, Medical University Hospital, Heidelberg, Germany
- DZHK (German Centre for Cardiovascular Research), Heidelberg, Germany
| | - Rafael Kramann
- Department of Nephrology and Clinical Immunology, RWTH Aachen University, Aachen, Germany
- Department of Internal Medicine, Nephrology and Transplantation, Erasmus Medical Center, Rotterdam, The Netherlands
| | - Rebecca T Levinson
- Institute for Computational Biomedicine, Bioquant, Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Heidelberg, Germany
- Internal Medicine II, Heidelberg University Hospital, Heidelberg, Germany
| | - Julio Saez-Rodriguez
- Institute for Computational Biomedicine, Bioquant, Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Heidelberg, Germany.
- Joint Research Centre for Computational Biomedicine (JRC-COMBINE), Faculty of Medicine, RWTH Aachen University, Aachen, Germany.
| |
|
115
|
Radiogenomic signatures reveal multiscale intratumour heterogeneity associated with biological functions and survival in breast cancer. Nat Commun 2020; 11:4861. [PMID: 32978398 PMCID: PMC7519071 DOI: 10.1038/s41467-020-18703-2] [Citation(s) in RCA: 52] [Impact Index Per Article: 10.4] [Received: 01/21/2020] [Accepted: 09/08/2020] [Indexed: 12/24/2022] Open
Abstract
Advanced tumours are often heterogeneous, consisting of subclones with various genetic alterations and functional roles. The precise molecular features that characterize the contributions of multiscale intratumour heterogeneity to malignant progression, metastasis, and poor survival are largely unknown. Here, we address these challenges in breast cancer by defining the landscape of heterogeneous tumour subclones and their biological functions using radiogenomic signatures. Molecular heterogeneity is identified by a fully unsupervised deconvolution of gene expression data. Relative prevalence of two subclones associated with cell cycle and primary immunodeficiency pathways identifies patients with significantly different survival outcomes. Radiogenomic signatures of imaging scale heterogeneity are extracted and used to classify patients into groups with distinct subclone compositions. Prognostic value is confirmed by survival analysis accounting for clinical variables. These findings provide insight into how a radiogenomic analysis can identify the biological activities of specific subclones that predict prognosis in a noninvasive and clinically relevant manner. Tumours are made up of heterogeneous subclones. Here, the authors show using breast cancer imaging and gene expression datasets that these subclones can be inferred by the deconvolution of gene expression data, mapped to MRI derived radiogenomic signatures and used to estimate prognosis.
|
116
|
Li H, Sharma A, Ming W, Sun X, Liu H. A deconvolution method and its application in analyzing the cellular fractions in acute myeloid leukemia samples. BMC Genomics 2020; 21:652. [PMID: 32967610 PMCID: PMC7510109 DOI: 10.1186/s12864-020-06888-1] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 12/16/2019] [Accepted: 07/07/2020] [Indexed: 12/20/2022] Open
Abstract
BACKGROUND The identification of cell type-specific genes (markers) is an essential step for the deconvolution of cellular fractions, primarily from the gene expression data of a bulk sample. However, genes with significant changes identified by pair-wise comparisons cannot fully represent the specificity of gene expression across multiple conditions. In addition, knowledge about identifying gene expression markers across multiple conditions is still scarce. RESULTS Herein, we developed a hybrid tool, LinDeconSeq, which consists of 1) identifying marker genes using specificity scoring and mutual linearity strategies across any number of cell types, and 2) predicting cellular fractions of bulk samples using weighted robust linear regression with the marker genes identified in the first stage. On multiple publicly available datasets, the marker genes identified by LinDeconSeq demonstrated better accuracy and reproducibility compared to MGFM and RNentropy. Among deconvolution methods, LinDeconSeq showed low average deviations (≤0.0958) and high average Pearson correlations (≥0.8792) between the predicted and actual fractions on the benchmark datasets. Importantly, the cellular fractions predicted by LinDeconSeq appear to be relevant in the diagnosis of acute myeloid leukemia (AML). The distinct cellular fractions of granulocyte-monocyte progenitors (GMP), lymphoid-primed multipotent progenitors (LMPP) and monocytes (MONO) were found to be closely associated with AML compared to the healthy samples. Moreover, the heterogeneity of cellular fractions in AML patients divided these patients into two subgroups, differing in both prognosis and mutation patterns. The difference in GMP fraction was the most pronounced between these two subgroups, particularly in SubgroupA, which was strongly associated with better AML prognosis and a younger population. Overall, the identification of marker genes by LinDeconSeq represents an improved feature for deconvolution. The data processing strategy with regard to the cellular fractions used in this study also showed potential for the diagnosis and prognosis of diseases. CONCLUSIONS Taken together, we developed a freely available and open-source tool, LinDeconSeq (https://github.com/lihuamei/LinDeconSeq), which includes marker identification and deconvolution procedures. LinDeconSeq is comparable to other current methods in terms of accuracy when applied to benchmark datasets and has broad application in clinical outcome and disease-specific molecular mechanisms.
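The second stage described above, predicting cellular fractions from marker genes, can be illustrated with a much simpler stand-in. The sketch below is not LinDeconSeq and does not use its weighted robust linear regression; it assumes a plain non-negative least-squares fit solved by projected gradient descent, and the function name `deconvolve` and the toy numbers are hypothetical.

```python
def deconvolve(signature, bulk, steps=2000):
    """Estimate non-negative cell-type fractions f so that signature * f ~ bulk.

    signature: list of rows, one per marker gene, one column per cell type.
    bulk:      list of bulk expression values, one per marker gene.
    Illustrative stand-in only: ordinary non-negative least squares,
    not the weighted robust regression used by LinDeconSeq.
    """
    n_genes = len(signature)
    n_types = len(signature[0])
    # Safe step size: the squared Frobenius norm bounds the largest
    # eigenvalue of S^T S, so 1 / ||S||_F^2 cannot overshoot.
    lr = 1.0 / sum(v * v for row in signature for v in row)
    f = [1.0 / n_types] * n_types
    for _ in range(steps):
        # Residual of the current fit, one value per gene.
        residual = [
            sum(row[k] * f[k] for k in range(n_types)) - b
            for row, b in zip(signature, bulk)
        ]
        # Gradient of 0.5 * ||S f - b||^2 is S^T (S f - b).
        grad = [
            sum(signature[g][k] * residual[g] for g in range(n_genes))
            for k in range(n_types)
        ]
        # Gradient step followed by projection onto f >= 0.
        f = [max(0.0, f[k] - lr * grad[k]) for k in range(n_types)]
    total = sum(f)
    # Rescale so the estimates read as fractions that sum to one.
    return [v / total for v in f] if total > 0 else f


# Toy example: 4 marker genes, 3 cell types, bulk mixed at 50/30/20.
S = [[10, 1, 0], [0, 8, 1], [1, 0, 9], [5, 5, 5]]
b = [5.3, 2.6, 2.3, 5.0]
fractions = deconvolve(S, b)
```

A robust variant would reweight the per-gene residuals at each iteration so that outlier genes contribute less, which is the general idea behind robust regression.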
Affiliation(s)
- Huamei Li
- State Key Laboratory of Bioelectronics, School of Biological Science & Medical Engineering, Southeast University, Nanjing, 210096, China
| | - Amit Sharma
- Department of Ophthalmology, University Hospital Bonn, 53127, Bonn, Germany
| | - Wenglong Ming
- State Key Laboratory of Bioelectronics, School of Biological Science & Medical Engineering, Southeast University, Nanjing, 210096, China
| | - Xiao Sun
- State Key Laboratory of Bioelectronics, School of Biological Science & Medical Engineering, Southeast University, Nanjing, 210096, China.
| | - Hongde Liu
- State Key Laboratory of Bioelectronics, School of Biological Science & Medical Engineering, Southeast University, Nanjing, 210096, China.
| |
|
117
|
Chen Z, Ji C, Shen Q, Liu W, Qin FXF, Wu A. Tissue-specific deconvolution of immune cell composition by integrating bulk and single-cell transcriptomes. Bioinformatics 2020; 36:819-827. [PMID: 31504185 DOI: 10.1093/bioinformatics/btz672] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 03/05/2019] [Revised: 08/13/2019] [Accepted: 08/22/2019] [Indexed: 01/05/2023] Open
Abstract
MOTIVATION Many methods have been developed to estimate immune cell composition from tissue transcriptomes. One common characteristic of these methods is that they are trained using a set of general immune cell transcriptomes that ignores tissue specificities. However, as immune cells are localized in different tissues, they may have distinct expression profiles. Hence, calculations that use general signature matrices may hinder the deconvolution accuracy. RESULTS This study used single cell RNA-sequencing (scRNA-Seq) data from different mouse tissues instead of general signature expression values to generate tissue-specific signature gene matrices that are used as the input of the deconvolution model. First, the transcriptome of immune cells in each tissue was extracted from scRNA-Seq data and used to construct the entire expression matrix of tissue immune cells. Then, after comparing different gene selection strategies, the expression values of 162 seq-ImmuCC derived signature genes in tissue immune cell scRNA-Seq data were taken as the tissue-specific signature matrices. Finally, a modest improvement in performance was observed in multiple tissues compared with a traditional general signature matrix in the deconvolution model. With the fast accumulation of scRNA-Seq data, the introduction of these data into the estimation of immune cell compositions for different tissues will open a new window for avoiding tissue bias in immune cell expression. AVAILABILITY AND IMPLEMENTATION The signature matrices are available at https://github.com/wuaipinglab/ImmuCC/tree/master/tissue_immucc/SignatureMatrix. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
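As a rough illustration of how a tissue-specific signature matrix might be assembled from labelled single-cell data, the sketch below averages the expression of chosen signature genes within each cell type. The averaging rule, the function name `build_signature` and the toy values are assumptions for illustration; the abstract does not state how the published matrices were aggregated.

```python
from collections import defaultdict


def build_signature(cells, labels, genes):
    """Average expression of signature genes within each cell type.

    cells:  list of dicts mapping gene name -> expression for one cell.
    labels: cell-type label for each cell, in the same order as cells.
    genes:  signature genes to keep (rows of the returned matrix).
    Returns (cell_types, matrix) where matrix[g][t] is the mean
    expression of genes[g] across cells labelled cell_types[t].
    Hypothetical helper, not the seq-ImmuCC implementation.
    """
    grouped = defaultdict(list)
    for cell, label in zip(cells, labels):
        grouped[label].append(cell)
    cell_types = sorted(grouped)
    matrix = [
        [
            # Genes missing from a cell are treated as zero expression.
            sum(c.get(gene, 0.0) for c in grouped[t]) / len(grouped[t])
            for t in cell_types
        ]
        for gene in genes
    ]
    return cell_types, matrix


# Toy data: two T cells and one B cell from a single tissue.
cells = [{"Cd3e": 4.0, "Cd19": 0.0}, {"Cd3e": 6.0}, {"Cd19": 8.0, "Cd3e": 0.0}]
types, sig = build_signature(cells, ["T", "T", "B"], ["Cd3e", "Cd19"])
```

Running this once per tissue would give one matrix per tissue, each usable as the reference input of a deconvolution model.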
Affiliation(s)
- Ziyi Chen
- Center for Systems Medicine, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100005, China
- Suzhou Institute of Systems Medicine, Suzhou 215123, China
| | - Chengyang Ji
- Center for Systems Medicine, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100005, China
- Suzhou Institute of Systems Medicine, Suzhou 215123, China
| | - Qin Shen
- Center for Systems Medicine, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100005, China
- Suzhou Institute of Systems Medicine, Suzhou 215123, China
| | - Wei Liu
- Center for Systems Medicine, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100005, China
- Suzhou Institute of Systems Medicine, Suzhou 215123, China
| | - F Xiao-Feng Qin
- Center for Systems Medicine, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100005, China
- Suzhou Institute of Systems Medicine, Suzhou 215123, China
| | - Aiping Wu
- Center for Systems Medicine, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100005, China
- Suzhou Institute of Systems Medicine, Suzhou 215123, China
| |
|
118
|
Kim-Hellmuth S, Aguet F, Oliva M, Muñoz-Aguirre M, Kasela S, Wucher V, Castel SE, Hamel AR, Viñuela A, Roberts AL, Mangul S, Wen X, Wang G, Barbeira AN, Garrido-Martín D, Nadel BB, Zou Y, Bonazzola R, Quan J, Brown A, Martinez-Perez A, Soria JM, Getz G, Dermitzakis ET, Small KS, Stephens M, Xi HS, Im HK, Guigó R, Segrè AV, Stranger BE, Ardlie KG, Lappalainen T. Cell type-specific genetic regulation of gene expression across human tissues. Science 2020; 369:eaaz8528. [PMID: 32913075 PMCID: PMC8051643 DOI: 10.1126/science.aaz8528] [Citation(s) in RCA: 201] [Impact Index Per Article: 40.2] [Received: 10/14/2019] [Accepted: 07/31/2020] [Indexed: 12/15/2022]
Abstract
The Genotype-Tissue Expression (GTEx) project has identified expression and splicing quantitative trait loci in cis (QTLs) for the majority of genes across a wide range of human tissues. However, the functional characterization of these QTLs has been limited by the heterogeneous cellular composition of GTEx tissue samples. We mapped interactions between computational estimates of cell type abundance and genotype to identify cell type-interaction QTLs for seven cell types and show that cell type-interaction expression QTLs (eQTLs) provide finer resolution to tissue specificity than bulk tissue cis-eQTLs. Analyses of genetic associations with 87 complex traits show a contribution from cell type-interaction QTLs and enable the discovery of hundreds of previously unidentified colocalized loci that are masked in bulk tissue.
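An interaction between genotype and estimated cell type abundance is commonly tested with a linear model that includes a product term. The form below is a generic sketch of that idea, not the exact specification used in the study; the symbols and the covariate term are assumptions introduced for illustration.

```latex
y_i = \beta_0 + \beta_g\, g_i + \beta_c\, c_i + \beta_{gc}\,(g_i \times c_i)
      + \boldsymbol{\gamma}^{\top}\mathbf{x}_i + \varepsilon_i
```

Here $y_i$ is the expression (or splicing) phenotype of a gene in sample $i$, $g_i$ is the genotype dosage at the variant, $c_i$ is the computational estimate of cell type abundance, $\mathbf{x}_i$ collects any covariates, and $\varepsilon_i$ is the residual. Under this form, a cell type-interaction QTL corresponds to rejecting $\beta_{gc} = 0$, meaning the genetic effect on expression changes with the abundance of that cell type.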
Affiliation(s)
- Sarah Kim-Hellmuth
- Statistical Genetics, Max Planck Institute of Psychiatry, Munich, Germany.
- New York Genome Center, New York, NY, USA
- Department of Systems Biology, Columbia University, New York, NY, USA
| | - François Aguet
- The Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Meritxell Oliva
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL, USA
- Department of Public Health Sciences, University of Chicago, Chicago, IL, USA
| | - Manuel Muñoz-Aguirre
- Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Barcelona, Catalonia, Spain
- Department of Statistics and Operations Research, Universitat Politècnica de Catalunya (UPC), Barcelona, Catalonia, Spain
| | - Silva Kasela
- New York Genome Center, New York, NY, USA
- Department of Systems Biology, Columbia University, New York, NY, USA
| | - Valentin Wucher
- Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Barcelona, Catalonia, Spain
| | - Stephane E Castel
- New York Genome Center, New York, NY, USA
- Department of Systems Biology, Columbia University, New York, NY, USA
| | - Andrew R Hamel
- The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Ocular Genomics Institute, Massachusetts Eye and Ear, Harvard Medical School, Boston, MA, USA
| | - Ana Viñuela
- Department of Twin Research and Genetic Epidemiology, King's College London, London, UK
- Department of Genetic Medicine and Development, University of Geneva Medical School, Geneva, Switzerland
- Institute for Genetics and Genomics in Geneva (iGE3), University of Geneva, Geneva, Switzerland
- Swiss Institute of Bioinformatics, Geneva, Switzerland
| | - Amy L Roberts
- Department of Twin Research and Genetic Epidemiology, King's College London, London, UK
| | - Serghei Mangul
- Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, Los Angeles, Los Angeles, CA, USA
| | - Xiaoquan Wen
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
| | - Gao Wang
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
| | - Alvaro N Barbeira
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL, USA
| | - Diego Garrido-Martín
- Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Barcelona, Catalonia, Spain
| | - Brian B Nadel
- Department of Molecular, Cellular, and Developmental Biology, University of California, Los Angeles, Los Angeles, CA, USA
| | - Yuxin Zou
- Department of Statistics, University of Chicago, Chicago, IL, USA
| | - Rodrigo Bonazzola
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL, USA
| | - Jie Quan
- Inflammation & Immunology, Pfizer, Cambridge, MA, USA
| | - Andrew Brown
- Department of Genetic Medicine and Development, University of Geneva Medical School, Geneva, Switzerland
- Population Health and Genomics, University of Dundee, Dundee, Scotland, UK
| | - Angel Martinez-Perez
- Unit of Genomic of Complex Diseases, Institut d'Investigació Biomèdica Sant Pau (IIB-Sant Pau), Barcelona, Spain
| | - José Manuel Soria
- Unit of Genomic of Complex Diseases, Institut d'Investigació Biomèdica Sant Pau (IIB-Sant Pau), Barcelona, Spain
| | - Gad Getz
- The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Cancer Center and Department of Pathology, Massachusetts General Hospital, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
| | - Emmanouil T Dermitzakis
- Department of Genetic Medicine and Development, University of Geneva Medical School, Geneva, Switzerland
- Institute for Genetics and Genomics in Geneva (iGE3), University of Geneva, Geneva, Switzerland
- Swiss Institute of Bioinformatics, Geneva, Switzerland
| | - Kerrin S Small
- Department of Twin Research and Genetic Epidemiology, King's College London, London, UK
| | - Matthew Stephens
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
| | - Hualin S Xi
- Foundational Neuroscience Center, AbbVie, Cambridge, MA, USA
| | - Hae Kyung Im
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL, USA
| | - Roderic Guigó
- Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Barcelona, Catalonia, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Catalonia, Spain
| | - Ayellet V Segrè
- The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Ocular Genomics Institute, Massachusetts Eye and Ear, Harvard Medical School, Boston, MA, USA
| | - Barbara E Stranger
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL, USA
- Center for Genetic Medicine, Department of Pharmacology, Northwestern University, Feinberg School of Medicine, Chicago, IL, USA
| | | | - Tuuli Lappalainen
- New York Genome Center, New York, NY, USA.
- Department of Systems Biology, Columbia University, New York, NY, USA
| |
|
119
|
Clarke R, Kraikivski P, Jones BC, Sevigny CM, Sengupta S, Wang Y. A systems biology approach to discovering pathway signaling dysregulation in metastasis. Cancer Metastasis Rev 2020; 39:903-918. [PMID: 32776157 PMCID: PMC7487029 DOI: 10.1007/s10555-020-09921-7] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 07/11/2020] [Accepted: 07/13/2020] [Indexed: 02/07/2023]
Abstract
Total metastatic burden is the primary cause of death for many cancer patients. While the process of metastasis has been studied widely, much remains to be understood. Moreover, few agents have been developed that specifically target the major steps of the metastatic cascade. Many individual genes and pathways have been implicated in metastasis but a holistic view of how these interact and cooperate to regulate and execute the process remains somewhat rudimentary. It is unclear whether all of the signaling features that regulate and execute metastasis are yet fully understood. Novel features of a complex system such as metastasis can often be discovered by taking a systems-based approach. We introduce the concepts of systems modeling and define some of the central challenges facing the application of a multidisciplinary systems-based approach to understanding metastasis and finding actionable targets therein. These challenges include appreciating the unique properties of the high-dimensional omics data often used for modeling, limitations in knowledge of the system (metastasis), tumor heterogeneity and sampling bias, and some of the issues key to understanding critical features of molecular signaling in the context of metastasis. We also provide a brief introduction to integrative modeling that focuses on both the nodes and edges of molecular signaling networks. Finally, we offer some observations on future directions as they relate to developing a systems-based model of the metastatic cascade.
Affiliation(s)
- Robert Clarke
- Department of Oncology, Georgetown University Medical Center, 3970 Reservoir Rd NW, Washington, DC, 20057, USA.
- Hormel Institute and Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Austin, MN, 55912, USA.
| | - Pavel Kraikivski
- Academy of Integrated Science, Division of Systems Biology, Virginia Polytechnic and State University, Blacksburg, VA, 24061, USA
| | - Brandon C Jones
- Department of Oncology, Georgetown University Medical Center, 3970 Reservoir Rd NW, Washington, DC, 20057, USA
| | - Catherine M Sevigny
- Department of Oncology, Georgetown University Medical Center, 3970 Reservoir Rd NW, Washington, DC, 20057, USA
| | - Surojeet Sengupta
- Department of Oncology, Georgetown University Medical Center, 3970 Reservoir Rd NW, Washington, DC, 20057, USA
| | - Yue Wang
- Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA, 22203, USA
| |
|
120
|
Fan J, Slowikowski K, Zhang F. Single-cell transcriptomics in cancer: computational challenges and opportunities. Exp Mol Med 2020; 52:1452-1465. [PMID: 32929226 PMCID: PMC8080633 DOI: 10.1038/s12276-020-0422-0] [Citation(s) in RCA: 116] [Impact Index Per Article: 23.2] [Received: 11/30/2019] [Revised: 02/26/2020] [Accepted: 03/10/2020] [Indexed: 02/07/2023] Open
Abstract
Intratumor heterogeneity is a common characteristic across diverse cancer types and presents challenges to current standards of treatment. Advancements in high-throughput sequencing and imaging technologies provide opportunities to identify and characterize these aspects of heterogeneity. Notably, transcriptomic profiling at a single-cell resolution enables quantitative measurements of the molecular activity that underlies the phenotypic diversity of cells within a tumor. Such high-dimensional data require computational analysis to extract relevant biological insights about the cell types and states that drive cancer development, pathogenesis, and clinical outcomes. In this review, we highlight emerging themes in the computational analysis of single-cell transcriptomics data and their applications to cancer research. We focus on downstream analytical challenges relevant to cancer research, including how to computationally perform unified analysis across many patients and disease states, distinguish neoplastic from nonneoplastic cells, infer communication with the tumor microenvironment, and delineate tumoral and microenvironmental evolution with trajectory and RNA velocity analysis. We include discussions of challenges and opportunities for future computational methodological advancements necessary to realize the translational potential of single-cell transcriptomic profiling in cancer.
Affiliation(s)
- Jean Fan
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland, USA.
| | - Kamil Slowikowski
- Center for Immunology and Inflammatory Diseases, Massachusetts General Hospital, Charlestown, MA, USA
| | - Fan Zhang
- Center for Data Sciences, Brigham and Women's Hospital, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| |
|
121
|
Lee D, Park Y, Kim S. Towards multi-omics characterization of tumor heterogeneity: a comprehensive review of statistical and machine learning approaches. Brief Bioinform 2020; 22:5896573. [PMID: 34020548 DOI: 10.1093/bib/bbaa188] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 06/02/2020] [Revised: 06/29/2020] [Accepted: 07/21/2020] [Indexed: 12/19/2022] Open
Abstract
The multi-omics molecular characterization of cancer opened a new horizon for our understanding of cancer biology and therapeutic strategies. However, a tumor biopsy comprises diverse types of cells, including not only cancerous cells but also tumor microenvironmental cells and adjacent normal cells. This heterogeneity is a major confounding factor that hampers robust and reproducible bioinformatic analysis for biomarker identification using multi-omics profiles. Moreover, the heterogeneity itself has been recognized over the years for its significant prognostic value in some cancer types, thus offering another promising avenue for therapeutic intervention. A number of computational approaches to unravel such heterogeneity from high-throughput molecular profiles of a tumor sample have been proposed, but most of them rely on the data from an individual omics layer. Since the heterogeneity of cells is widely distributed across multi-omics layers, methods based on an individual layer can only partially characterize the heterogeneous admixture of cells. To help facilitate further development of the methodologies that synchronously account for several multi-omics profiles, we wrote a comprehensive review of diverse approaches to characterize tumor heterogeneity based on three different omics layers: genome, epigenome and transcriptome. As a result, this review can be useful for the analysis of multi-omics profiles produced by many large-scale consortia. Contact: sunkim.bioinfo@snu.ac.kr.
Affiliation(s)
- Dohoon Lee
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
| | - Youngjune Park
- Department of Computer Science and Engineering, Institute of Engineering Research, Seoul National University, Seoul 08826, Korea
| | - Sun Kim
- Bioinformatics Institute, Seoul National University, Seoul 08826, Korea
| |
|
122
|
Integrating CRISPR Engineering and hiPSC-Derived 2D Disease Modeling Systems. J Neurosci 2020; 40:1176-1185. [PMID: 32024766 DOI: 10.1523/jneurosci.0518-19.2019] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 03/06/2019] [Revised: 10/23/2019] [Accepted: 10/23/2019] [Indexed: 12/20/2022] Open
Abstract
Human induced pluripotent stem cells (hiPSCs) have revolutionized research on human diseases, particularly neurodegenerative and psychiatric disorders, making it possible to study mechanisms of disease risk and initiation in otherwise inaccessible patient-specific cells. Today, the integration of CRISPR engineering approaches with hiPSC-based models permits precise isogenic comparisons of human neurons and glia. This review is intended as a guideline for neuroscientists and clinicians interested in translating their research to hiPSC-based studies. It offers state-of-the-art approaches to tackling the challenges that are unique to human in vitro disease models, particularly interdonor and intradonor variability, and limitations in neuronal maturity and circuit complexity. Finally, we provide a detailed overview of the immense possibilities the field has to offer, highlighting efficient neural differentiation and induction strategies for the major brain cell types and providing perspective into integrating CRISPR-based methods into study design. The combination of hiPSC-based disease modeling, CRISPR technology, and high-throughput approaches promises to advance our scientific knowledge and accelerate progress in drug discovery. Dual Perspectives Companion Paper: Studying Human Neurodevelopment and Diseases Using 3D Brain Organoids, by Ai Tian, Julien Muffat, and Yun Li.
|
123
|
Montaldo P, Cunnington A, Oliveira V, Swamy R, Bandya P, Pant S, Lally PJ, Ivain P, Mendoza J, Atreja G, Padmesh V, Baburaj M, Sebastian M, Yasashwi I, Kamalarathnam C, Chandramohan R, Mangalabharathi S, Kumaraswami K, Kumar S, Benakappa N, Manerkar S, Mondhkar J, Prakash V, Sajjid M, Seeralar A, Jahan I, Moni SC, Shahidullah M, Sujatha R, Chandrasekaran M, Ramji S, Shankaran S, Kaforou M, Herberg J, Thayyil S. Transcriptomic profile of adverse neurodevelopmental outcomes after neonatal encephalopathy. Sci Rep 2020; 10:13100. [PMID: 32753750 PMCID: PMC7403382 DOI: 10.1038/s41598-020-70131-w] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 12/13/2019] [Accepted: 06/16/2020] [Indexed: 12/20/2022] Open
Abstract
A rapid and early diagnostic test to identify the encephalopathic babies at risk of adverse outcome may accelerate the development of neuroprotectants. We examined whether a whole blood transcriptomic signature measured soon after birth predicts adverse neurodevelopmental outcome eighteen months after neonatal encephalopathy. We performed next generation sequencing on whole blood ribonucleic acid obtained within six hours of birth from the first 47 encephalopathic babies recruited to the Hypothermia for Encephalopathy in Low and middle-income countries (HELIX) trial. Two infants with blood culture positive sepsis were excluded, and the data from the remaining 45 were analysed. A total of 855 genes were significantly differentially expressed between the good and adverse outcome groups, of which RGS1 and SMC4 were the most significant. Biological pathway analysis adjusted for gender, trial randomisation allocation (cooling therapy versus usual care) and estimated blood leukocyte proportions revealed over-representation of genes from pathways related to melatonin and polo-like kinase in babies with adverse outcome. These preliminary data suggest that transcriptomic profiling may be a promising tool for rapid risk stratification in neonatal encephalopathy. It may provide insights into biological mechanisms and identify novel therapeutic targets for neuroprotection.
Affiliation(s)
- Paolo Montaldo
- Department of Brain Sciences, Centre for Perinatal Neuroscience, Imperial College London, London, UK; Neonatal Unit, Università Degli Studi Della Campania "Luigi Vanvitelli", Naples, Italy.
| | - Aubrey Cunnington
- Paediatric Infectious Diseases, Department of Infectious Diseases, Imperial College London, London, UK
| | - Vania Oliveira
- Department of Brain Sciences, Centre for Perinatal Neuroscience, Imperial College London, London, UK
| | - Ravi Swamy
- Department of Brain Sciences, Centre for Perinatal Neuroscience, Imperial College London, London, UK
| | - Prathik Bandya
- Neonatal Medicine, Indira Gandhi Institute of Child Health, Bangalore, Karnataka, India
| | - Stuti Pant
- Department of Brain Sciences, Centre for Perinatal Neuroscience, Imperial College London, London, UK
| | - Peter J Lally
- Department of Brain Sciences, Centre for Perinatal Neuroscience, Imperial College London, London, UK
| | - Phoebe Ivain
- Department of Brain Sciences, Centre for Perinatal Neuroscience, Imperial College London, London, UK
| | - Josephine Mendoza
- Department of Brain Sciences, Centre for Perinatal Neuroscience, Imperial College London, London, UK
| | - Gaurav Atreja
- Department of Brain Sciences, Centre for Perinatal Neuroscience, Imperial College London, London, UK
| | - Vadakepat Padmesh
- Neonatal Medicine, Institute of Obstetrics and Gynaecology, Madras Medical College, Chennai, Tamil Nadu, India
| | - Mythili Baburaj
- Neonatal Medicine, Institute of Obstetrics and Gynaecology, Madras Medical College, Chennai, Tamil Nadu, India
| | - Monica Sebastian
- Neonatal Medicine, Institute of Child Health, Madras Medical College, Chennai, Tamil Nadu, India
| | - Indiramma Yasashwi
- Neonatal Medicine, Indira Gandhi Institute of Child Health, Bangalore, Karnataka, India
| | - Chinnathambi Kamalarathnam
- Neonatal Medicine, Institute of Obstetrics and Gynaecology, Madras Medical College, Chennai, Tamil Nadu, India
| | - Rema Chandramohan
- Neonatal Medicine, Institute of Obstetrics and Gynaecology, Madras Medical College, Chennai, Tamil Nadu, India
| | - Sundaram Mangalabharathi
- Neonatal Medicine, Institute of Obstetrics and Gynaecology, Madras Medical College, Chennai, Tamil Nadu, India
| | - Kumutha Kumaraswami
- Neonatal Medicine, Institute of Obstetrics and Gynaecology, Madras Medical College, Chennai, Tamil Nadu, India
| | - Shobha Kumar
- Neonatal Medicine, Institute of Obstetrics and Gynaecology, Madras Medical College, Chennai, Tamil Nadu, India
| | - Naveen Benakappa
- Neonatal Medicine, Indira Gandhi Institute of Child Health, Bangalore, Karnataka, India
| | | | | | - Vinayagam Prakash
- Neonatal Medicine, Institute of Obstetrics and Gynaecology, Madras Medical College, Chennai, Tamil Nadu, India
| | - Mohammed Sajjid
- Neonatal Medicine, Institute of Obstetrics and Gynaecology, Madras Medical College, Chennai, Tamil Nadu, India
| | - Arasar Seeralar
- Neonatal Medicine, Institute of Child Health, Madras Medical College, Chennai, Tamil Nadu, India
| | - Ismat Jahan
- Neonatal Medicine, Bangabandhu Sheikh Mujib Medical University, Dhaka, Bangladesh
| | | | - Mohammod Shahidullah
- Neonatal Medicine, Bangabandhu Sheikh Mujib Medical University, Dhaka, Bangladesh
| | - Radhika Sujatha
- Neonatal Medicine, Government Medical College, Thiruvananthapuram, Kerala, India
| | - Manigandan Chandrasekaran
- Department of Brain Sciences, Centre for Perinatal Neuroscience, Imperial College London, London, UK
| | - Siddarth Ramji
- Neonatal Medicine, Maulana Azad Medical College, New Delhi, Delhi, India
| | - Seetha Shankaran
- Neonatal-Perinatal Medicine, Wayne State University, Detroit, MI, USA
| | - Myrsini Kaforou
- Paediatric Infectious Diseases, Department of Infectious Diseases, Imperial College London, London, UK
| | - Jethro Herberg
- Paediatric Infectious Diseases, Department of Infectious Diseases, Imperial College London, London, UK
| | - Sudhin Thayyil
- Department of Brain Sciences, Centre for Perinatal Neuroscience, Imperial College London, London, UK
| |
|
124
|
Palomero L, Galván-Femenía I, de Cid R, Espín R, Barnes DR, Cimba, Blommaert E, Gil-Gil M, Falo C, Stradella A, Ouchi D, Roso-Llorach A, Violan C, Peña-Chilet M, Dopazo J, Extremera AI, García-Valero M, Herranz C, Mateo F, Mereu E, Beesley J, Chenevix-Trench G, Roux C, Mak T, Brunet J, Hakem R, Gorrini C, Antoniou AC, Lázaro C, Pujana MA. Immune Cell Associations with Cancer Risk. iScience 2020; 23:101296. [PMID: 32622267 PMCID: PMC7334419 DOI: 10.1016/j.isci.2020.101296] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 03/18/2020] [Revised: 05/23/2020] [Accepted: 06/15/2020] [Indexed: 01/21/2023] Open
Abstract
Proper immune system function hinders cancer development, but little is known about whether genetic variants linked to cancer risk alter immune cells. Here, we report 57 cancer risk loci associated with differences in immune and/or stromal cell contents in the corresponding tissue. Predicted target genes show expression and regulatory associations with immune features. Polygenic risk scores also reveal associations with immune and/or stromal cell contents, and breast cancer scores show consistent results in normal and tumor tissue. SH2B3 links peripheral alterations of several immune cell types to the risk of this malignancy. Pleiotropic SH2B3 variants are associated with breast cancer risk in BRCA1/2 mutation carriers. A retrospective case-cohort study indicates a positive association between blood counts of basophils, leukocytes, and monocytes and age at breast cancer diagnosis. These findings broaden our knowledge of the role of the immune system in cancer and highlight promising prevention strategies for individuals at high risk.
Affiliation(s)
- Luis Palomero
- ProCURE, Catalan Institute of Oncology, Bellvitge Institute for Biomedical Research (IDIBELL), L'Hospitalet del Llobregat, Barcelona, Catalonia 08908, Spain
| | - Ivan Galván-Femenía
- GCAT-Genomes for Life, Germans Trias i Pujol Health Sciences Research Institute (IGTP), Program for Predictive and Personalized Medicine of Cancer (IMPPC), Badalona, Catalonia 08916, Spain
| | - Rafael de Cid
- GCAT-Genomes for Life, Germans Trias i Pujol Health Sciences Research Institute (IGTP), Program for Predictive and Personalized Medicine of Cancer (IMPPC), Badalona, Catalonia 08916, Spain
| | - Roderic Espín
- ProCURE, Catalan Institute of Oncology, Bellvitge Institute for Biomedical Research (IDIBELL), L'Hospitalet del Llobregat, Barcelona, Catalonia 08908, Spain
| | - Daniel R Barnes
- Centre for Cancer Genetic Epidemiology, Department of Public Health and Primary Care, University of Cambridge, Cambridge CB1 8RN, UK
| | - Cimba
- Centre for Cancer Genetic Epidemiology, Department of Public Health and Primary Care, University of Cambridge, Cambridge CB1 8RN, UK
| | - Eline Blommaert
- ProCURE, Catalan Institute of Oncology, Bellvitge Institute for Biomedical Research (IDIBELL), L'Hospitalet del Llobregat, Barcelona, Catalonia 08908, Spain
| | - Miguel Gil-Gil
- Department of Medical Oncology, Catalan Institute of Oncology, Bellvitge Institute for Biomedical Research (IDIBELL), L'Hospitalet del Llobregat, Barcelona, Catalonia 08908, Spain
| | - Catalina Falo
- Department of Medical Oncology, Catalan Institute of Oncology, Bellvitge Institute for Biomedical Research (IDIBELL), L'Hospitalet del Llobregat, Barcelona, Catalonia 08908, Spain
| | - Agostina Stradella
- Department of Medical Oncology, Catalan Institute of Oncology, Bellvitge Institute for Biomedical Research (IDIBELL), L'Hospitalet del Llobregat, Barcelona, Catalonia 08908, Spain
| | - Dan Ouchi
- Jordi Gol University Institute for Research Primary Healthcare (IDIAP Jordi Gol), Barcelona, Catalonia 08007, Spain; Autonomous University of Barcelona, Bellaterra, Catalonia 08913, Spain
| | - Albert Roso-Llorach
- Jordi Gol University Institute for Research Primary Healthcare (IDIAP Jordi Gol), Barcelona, Catalonia 08007, Spain; Autonomous University of Barcelona, Bellaterra, Catalonia 08913, Spain
| | - Concepció Violan
- Jordi Gol University Institute for Research Primary Healthcare (IDIAP Jordi Gol), Barcelona, Catalonia 08007, Spain; Autonomous University of Barcelona, Bellaterra, Catalonia 08913, Spain
| | - María Peña-Chilet
- Clinical Bioinformatics Area, Fundación Progreso y Salud (FPS), Bioinformatics in Rare Diseases (BiER), CIBERER, INB-ELIXIR-es, Hospital Virgen del Rocío, Seville 41013, Spain
| | - Joaquín Dopazo
- Clinical Bioinformatics Area, Fundación Progreso y Salud (FPS), Bioinformatics in Rare Diseases (BiER), CIBERER, INB-ELIXIR-es, Hospital Virgen del Rocío, Seville 41013, Spain
| | - Ana Isabel Extremera
- ProCURE, Catalan Institute of Oncology, Bellvitge Institute for Biomedical Research (IDIBELL), L'Hospitalet del Llobregat, Barcelona, Catalonia 08908, Spain
| | - Mar García-Valero
- ProCURE, Catalan Institute of Oncology, Bellvitge Institute for Biomedical Research (IDIBELL), L'Hospitalet del Llobregat, Barcelona, Catalonia 08908, Spain
| | - Carmen Herranz
- ProCURE, Catalan Institute of Oncology, Bellvitge Institute for Biomedical Research (IDIBELL), L'Hospitalet del Llobregat, Barcelona, Catalonia 08908, Spain
| | - Francesca Mateo
- ProCURE, Catalan Institute of Oncology, Bellvitge Institute for Biomedical Research (IDIBELL), L'Hospitalet del Llobregat, Barcelona, Catalonia 08908, Spain
| | - Elisabetta Mereu
- CNAG-CRG, Centre for Genomic Regulation, Barcelona Institute of Science and Technology, Barcelona, Catalonia 08003, Spain
| | - Jonathan Beesley
- Cancer Division, QIMR Berghofer Medical Research Institute, Brisbane, QLD 4006, Australia
| | | | - Cecilia Roux
- Princess Margaret Cancer Centre, The Campbell Family Institute for Breast Cancer Research, Ontario Cancer Institute, University Health Network, Toronto, ON M5G 2M9, Canada
| | - Tak Mak
- Princess Margaret Cancer Centre, The Campbell Family Institute for Breast Cancer Research, Ontario Cancer Institute, University Health Network, Toronto, ON M5G 2M9, Canada
| | - Joan Brunet
- Hereditary Cancer Program, Catalan Institute of Oncology, Biomedical Research Institute of Girona (IDIBGI), Girona, Catalonia 17190, Spain
| | - Razq Hakem
- Princess Margaret Cancer Centre, Department of Medical Biophysics, University Health Network and University of Toronto, Toronto, ON M5G 2C1, Canada
| | - Chiara Gorrini
- Princess Margaret Cancer Centre, The Campbell Family Institute for Breast Cancer Research, Ontario Cancer Institute, University Health Network, Toronto, ON M5G 2M9, Canada
| | - Antonis C Antoniou
- Centre for Cancer Genetic Epidemiology, Department of Public Health and Primary Care, University of Cambridge, Cambridge CB1 8RN, UK.
| | - Conxi Lázaro
- Hereditary Cancer Program, Catalan Institute of Oncology, Oncobell, Bellvitge Institute for Biomedical Research (IDIBELL), L'Hospitalet del Llobregat, Barcelona, Catalonia 08908, and Spanish Biomedical Research Network Centre in Oncology (CIBERONC), Instituto de Salud Carlos III, Madrid 28029, Spain.
| | - Miquel Angel Pujana
- ProCURE, Catalan Institute of Oncology, Bellvitge Institute for Biomedical Research (IDIBELL), L'Hospitalet del Llobregat, Barcelona, Catalonia 08908, Spain.
| |
|
125
|
Menden K, Marouf M, Oller S, Dalmia A, Magruder DS, Kloiber K, Heutink P, Bonn S. Deep learning-based cell composition analysis from tissue expression profiles. Science Advances 2020; 6:eaba2619. [PMID: 32832661 PMCID: PMC7439569 DOI: 10.1126/sciadv.aba2619] [Citation(s) in RCA: 114] [Impact Index Per Article: 22.8] [Received: 11/18/2019] [Accepted: 06/05/2020] [Indexed: 05/28/2023]
Abstract
We present Scaden, a deep neural network for cell deconvolution that uses gene expression information to infer the cellular composition of tissues. Scaden is trained on single-cell RNA sequencing (RNA-seq) data to engineer discriminative features that confer robustness to bias and noise, making complex data preprocessing and feature selection unnecessary. We demonstrate that Scaden outperforms existing deconvolution algorithms in both precision and robustness. A single trained network reliably deconvolves bulk RNA-seq and microarray, human and mouse tissue expression data and leverages the combined information of multiple datasets. Because of this stability and flexibility, we surmise that deep learning will become an algorithmic mainstay for cell deconvolution of various data types. Scaden's software package and web application are easy to use on new as well as diverse existing expression datasets available in public resources, deepening the molecular and cellular understanding of developmental and disease processes.
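The abstract states that the network is trained on single-cell RNA-seq data but does not describe how training examples are built. A minimal sketch of one common approach, assuming training samples are pseudo-bulk mixtures with known composition, is given below. The function name `simulate_pseudobulk` and the toy data layout are hypothetical and are not taken from the Scaden package.

```python
import random


def simulate_pseudobulk(cells_by_type, n_cells, rng):
    """Sum randomly drawn single-cell profiles into one pseudo-bulk sample.

    cells_by_type: dict mapping a cell-type label to a list of expression
                   vectors (one list of floats per cell, same gene order).
    n_cells:       approximate number of cells to mix.
    rng:           a random.Random instance, for reproducibility.
    Returns (bulk_profile, fractions), where fractions holds the realised
    share of each cell type among the cells actually drawn.
    """
    types = sorted(cells_by_type)
    # Draw random mixing weights and normalise them so they sum to 1.
    weights = [rng.random() for _ in types]
    total = sum(weights)
    target = {t: w / total for t, w in zip(types, weights)}

    n_genes = len(cells_by_type[types[0]][0])
    bulk = [0.0] * n_genes
    counts = {}
    for t in types:
        counts[t] = round(target[t] * n_cells)
        for _ in range(counts[t]):
            cell = rng.choice(cells_by_type[t])  # sample with replacement
            for g in range(n_genes):
                bulk[g] += cell[g]

    drawn = sum(counts.values())
    if drawn == 0:
        raise ValueError("n_cells is too small to draw any cell")
    fractions = {t: counts[t] / drawn for t in types}
    return bulk, fractions
```

Each call yields one (expression profile, composition) pair; a supervised model of any kind could then be fitted to many such pairs, with the fractions serving as labels.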
Affiliation(s)
- Kevin Menden
- German Center for Neurodegenerative Diseases, Tuebingen, Germany
| | - Mohamed Marouf
- Institute of Medical Systems Biology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
| | - Sergio Oller
- Institute of Medical Systems Biology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
| | - Anupriya Dalmia
- German Center for Neurodegenerative Diseases, Tuebingen, Germany
| | - Daniel Sumner Magruder
- Institute of Medical Systems Biology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Genevention GmbH, Goettingen, Germany
| | - Karin Kloiber
- Institute of Medical Systems Biology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
| | - Peter Heutink
- German Center for Neurodegenerative Diseases, Tuebingen, Germany
| | - Stefan Bonn
- German Center for Neurodegenerative Diseases, Tuebingen, Germany
- Institute of Medical Systems Biology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
| |
|
126
|
Abstract
The term axial spondyloarthritis (axSpA) encompasses a heterogeneous group of diseases that have variable presentations, extra-articular manifestations and clinical outcomes, and that will respond differently to treatments. The prototypical type of axSpA, ankylosing spondylitis, is thought to be caused by interaction between the genetically primed host immune system and gut microbiota. Currently used biomarkers such as HLA-B27 status, C-reactive protein and erythrocyte sedimentation rate have, at best, moderate diagnostic and predictive value. Improved biomarkers are needed for axSpA to assist with early diagnosis and to better predict treatment responses and long-term outcomes. Advances in a range of 'omics' technologies and statistical approaches, including genomics approaches (such as polygenic risk scores), microbiome profiling and, potentially, transcriptomic, proteomic and metabolomic profiling, are making it possible for more informative biomarker sets to be developed for use in such clinical applications. Future developments in this field will probably involve combinations of biomarkers that require novel statistical approaches to analyse and to produce easy to interpret metrics for clinical application. Large publicly available datasets from well-characterized case-cohort studies that use extensive biological sampling, particularly focusing on early disease and responses to medications, are required to establish successful biomarker discovery and validation programmes.
|
127
|
Cantini L, Kairov U, de Reyniès A, Barillot E, Radvanyi F, Zinovyev A. Assessing reproducibility of matrix factorization methods in independent transcriptomes. Bioinformatics 2020; 35:4307-4313. [PMID: 30938767 PMCID: PMC6821374 DOI: 10.1093/bioinformatics/btz225] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 06/14/2018] [Revised: 03/20/2019] [Accepted: 04/01/2019] [Indexed: 12/26/2022] Open
Abstract
Motivation: Matrix factorization (MF) methods are widely used in order to reduce dimensionality of transcriptomic datasets to the action of a few hidden factors (metagenes). MF algorithms have never been compared based on the between-datasets reproducibility of their outputs in similar independent datasets. Lack of this knowledge might have a crucial impact when generalizing the predictions made in a study to others. Results: We systematically test widely used MF methods on several transcriptomic datasets collected from the same cancer type (14 colorectal, 8 breast and 4 ovarian cancer transcriptomic datasets). Inspired by concepts of evolutionary bioinformatics, we design a novel framework based on Reciprocally Best Hit (RBH) graphs in order to benchmark the MF methods for their ability to produce generalizable components. We show that a particular protocol of application of independent component analysis (ICA), accompanied by a stabilization procedure, leads to a significant increase in the between-datasets reproducibility. Moreover, we show that the signals detected through this method are systematically more interpretable than those of other standard methods. We developed a user-friendly tool for performing the Stabilized ICA-based RBH meta-analysis. We apply this methodology to the study of colorectal cancer (CRC) for which 14 independent transcriptomic datasets can be collected. The resulting RBH graph maps the landscape of interconnected factors associated with biological processes or with technological artifacts. These factors can be used as clinical biomarkers or robust and tumor-type specific transcriptomic signatures of tumoral cells or tumoral microenvironment. Their intensities in different samples shed light on the mechanistic basis of CRC molecular subtyping. Availability and implementation: The RBH construction tool is available from http://goo.gl/DzpwYp Supplementary information: Supplementary data are available at Bioinformatics online.
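The reciprocal-best-hit idea can be illustrated with a short sketch. It assumes that each component (metagene) is a vector of gene weights over a shared gene order and that similarity is the absolute Pearson correlation, since the sign of an ICA component is arbitrary. The function names are hypothetical and this is not the authors' tool.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def best_hits(src, dst):
    """For each component in src, the index of its most similar dst component."""
    return [
        max(range(len(dst)), key=lambda j: abs(pearson(comp, dst[j])))
        for comp in src
    ]


def reciprocal_best_hits(comps_a, comps_b):
    """Pairs (i, j) where a_i's best hit is b_j and b_j's best hit is a_i."""
    a_to_b = best_hits(comps_a, comps_b)
    b_to_a = best_hits(comps_b, comps_a)
    return [(i, j) for i, j in enumerate(a_to_b) if b_to_a[j] == i]
```

Applying `reciprocal_best_hits` to every pair of datasets gives the edges of an RBH graph; components that are reproducible across many datasets then show up as densely connected groups of nodes.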
Affiliation(s)
- Laura Cantini
- Institut Curie, PSL Research University, F-75005 Paris, France; INSERM U900, F-75005 Paris, France; CBIO-Centre for Computational Biology, Mines ParisTech, PSL Research University, F-75006 Paris, France; Computational Systems Biology Team, Institut de Biologie de l'École Normale Supérieure, CNRS UMR8197, INSERM U1024, École Normale Supérieure, PSL Research University, Paris, France
| | - Ulykbek Kairov
- Laboratory of Bioinformatics and Systems Biology, Center for Life Sciences, National Laboratory Astana, Nazarbayev University, Astana, Kazakhstan
| | - Aurélien de Reyniès
- Programme Cartes d'Identité des Tumeurs (CIT), Ligue Nationale Contre le Cancer, Paris, France
| | - Emmanuel Barillot
- Institut Curie, PSL Research University, F-75005 Paris, France; INSERM U900, F-75005 Paris, France; CBIO-Centre for Computational Biology, Mines ParisTech, PSL Research University, F-75006 Paris, France
| | - François Radvanyi
- Institut Curie, PSL Research University, CNRS, UMR144, Equipe Labellisée Ligue Contre le Cancer, Paris, France; Sorbonne Universités, UPMC Université Paris 06, CNRS, UMR144, Paris
| | - Andrei Zinovyev
- Institut Curie, PSL Research University, F-75005 Paris, France; INSERM U900, F-75005 Paris, France; CBIO-Centre for Computational Biology, Mines ParisTech, PSL Research University, F-75006 Paris, France; Lobachevsky University, Nizhny Novgorod, Russia
| |
|
128
|
Sturm G, Finotello F, Petitprez F, Zhang JD, Baumbach J, Fridman WH, List M, Aneichyk T. Comprehensive evaluation of transcriptome-based cell-type quantification methods for immuno-oncology. Bioinformatics 2020; 35:i436-i445. [PMID: 31510660 PMCID: PMC6612828 DOI: 10.1093/bioinformatics/btz363] [Citation(s) in RCA: 597] [Impact Index Per Article: 119.4] [Indexed: 12/12/2022] Open
Abstract
MOTIVATION: The composition and density of immune cells in the tumor microenvironment (TME) profoundly influence tumor progression and success of anti-cancer therapies. Flow cytometry, immunohistochemistry staining or single-cell sequencing are often unavailable such that we rely on computational methods to estimate the immune-cell composition from bulk RNA-sequencing (RNA-seq) data. Various methods have been proposed recently, yet their capabilities and limitations have not been evaluated systematically. A general guideline leading the research community through cell type deconvolution is missing. RESULTS: We developed a systematic approach for benchmarking such computational methods and assessed the accuracy of tools at estimating nine different immune- and stromal cells from bulk RNA-seq samples. We used a single-cell RNA-seq dataset of ∼11 000 cells from the TME to simulate bulk samples of known cell type proportions, and validated the results using independent, publicly available gold-standard estimates. This allowed us to analyze and condense the results of more than a hundred thousand predictions to provide an exhaustive evaluation across seven computational methods over nine cell types and ∼1800 samples from five simulated and real-world datasets. We demonstrate that computational deconvolution performs at high accuracy for well-defined cell-type signatures and propose how fuzzy cell-type signatures can be improved. We suggest that future efforts should be dedicated to refining cell population definitions and finding reliable signatures. AVAILABILITY AND IMPLEMENTATION: A snakemake pipeline to reproduce the benchmark is available at https://github.com/grst/immune_deconvolution_benchmark. An R package allows the community to perform integrated deconvolution using different methods (https://grst.github.io/immunedeconv). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
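Once bulk samples with known proportions have been simulated, each method's estimates can be scored against the ground truth. The abstract does not name the accuracy metric, so the sketch below assumes a per-cell-type root-mean-square error purely for illustration. The function name and the data layout (one dict of fractions per sample) are hypothetical.

```python
from math import sqrt


def per_cell_type_rmse(true_fracs, est_fracs):
    """Root-mean-square error between known and estimated fractions.

    true_fracs, est_fracs: lists with one dict per sample, each mapping a
    cell-type label to a fraction. Both lists must be in the same sample
    order and use the same cell-type labels.
    Returns a dict mapping each cell type to its RMSE across samples.
    """
    scores = {}
    for cell_type in sorted(true_fracs[0]):
        squared_errors = [
            (truth[cell_type] - est[cell_type]) ** 2
            for truth, est in zip(true_fracs, est_fracs)
        ]
        scores[cell_type] = sqrt(sum(squared_errors) / len(squared_errors))
    return scores
```

Reporting the score separately for every cell type, rather than one pooled number, makes it visible when a method does well on clearly defined populations but poorly on closely related ones.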
Affiliation(s)
- Gregor Sturm
- Chair of Experimental Bioinformatics, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany; Pieris Pharmaceuticals GmbH, Freising, Germany
| | - Francesca Finotello
- Biocenter, Division of Bioinformatics, Medical University of Innsbruck, Innsbruck, Austria
| | - Florent Petitprez
- Cordeliers Research Centre, UMRS_1138, INSERM, University Paris-Descartes, Sorbonne University, Paris, France; Programme Cartes d'Identité des Tumeurs, Ligue Nationale Contre le Cancer, Paris, France
| | - Jitao David Zhang
- Roche Pharma Research and Early Development, Pharmaceutical Sciences, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd, Basel, Switzerland
| | - Jan Baumbach
- Chair of Experimental Bioinformatics, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
| | - Wolf H Fridman
- Cordeliers Research Centre, UMRS_1138, INSERM, University Paris-Descartes, Sorbonne University, Paris, France
| | - Markus List
- Big Data in BioMedicine Group, Chair of Experimental Bioinformatics, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
| | - Tatsiana Aneichyk
- Pieris Pharmaceuticals GmbH, Freising, Germany; Independent Data Lab UG, Munich, Germany
| |
|
129
|
Cell Type-Specific In Vitro Gene Expression Profiling of Stem Cell-Derived Neural Models. Cells 2020; 9:cells9061406. [PMID: 32516938 PMCID: PMC7349756 DOI: 10.3390/cells9061406] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 05/11/2020] [Revised: 05/29/2020] [Accepted: 06/02/2020] [Indexed: 12/13/2022] Open
Abstract
Genetic and genomic studies of brain disease increasingly demonstrate disease-associated interactions between the cell types of the brain. Increasingly complex and more physiologically relevant human-induced pluripotent stem cell (hiPSC)-based models better explore the molecular mechanisms underlying disease but also challenge our ability to resolve cell type-specific perturbations. Here, we report an extension of the RiboTag system, first developed to achieve cell type-restricted expression of epitope-tagged ribosomal protein (RPL22) in mouse tissue, to a variety of in vitro applications, including immortalized cell lines, primary mouse astrocytes, and hiPSC-derived neurons. RiboTag expression enables depletion of up to 87 percent of off-target RNA in mixed species co-cultures. Nonetheless, depletion efficiency varies across independent experimental replicates, particularly for hiPSC-derived motor neurons. The challenges and potential of implementing RiboTags in complex in vitro cultures are discussed.
|
130
|
Bortolomeazzi M, Keddar MR, Ciccarelli FD, Benedetti L. Identification of non-cancer cells from cancer transcriptomic data. Biochimica et Biophysica Acta. Gene Regulatory Mechanisms 2020; 1863:194445. [PMID: 31654804 PMCID: PMC7346884 DOI: 10.1016/j.bbagrm.2019.194445] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 06/21/2019] [Revised: 09/20/2019] [Accepted: 10/07/2019] [Indexed: 02/07/2023]
Abstract
Interactions between cancer cells and non-cancer cells composing the tumour microenvironment play a primary role in determining cancer progression and shaping the response to therapy. The qualitative and quantitative characterisation of the different cell populations in the tumour microenvironment is therefore crucial to understand its role in cancer. In recent years, many experimental and computational approaches have been developed to identify the cell populations composing heterogeneous tissue samples, such as cancer. In this review, we describe the state-of-the-art approaches for the quantification of non-cancer cells from bulk and single-cell cancer transcriptomic data, with a focus on immune cells. We illustrate the main features of these approaches and highlight their applications for the analysis of the tumour microenvironment in solid cancers. We also discuss techniques that are complementary and alternative to RNA sequencing, particularly focusing on approaches that can provide spatial information on the distribution of the cells within the tumour in addition to their qualitative and quantitative measurements. This article is part of a Special Issue entitled: Transcriptional Profiles and Regulatory Gene Networks edited by Dr. Federico Manuel Giorgi and Dr. Shaun Mahony.
Affiliation(s)
- Michele Bortolomeazzi
- Cancer Systems Biology Laboratory, The Francis Crick Institute, London NW1 1AT, UK; School of Cancer and Pharmaceutical Sciences, King's College London, London SE1 1UL, UK
| | - Mohamed Reda Keddar
- Cancer Systems Biology Laboratory, The Francis Crick Institute, London NW1 1AT, UK; School of Cancer and Pharmaceutical Sciences, King's College London, London SE1 1UL, UK
| | - Francesca D Ciccarelli
- Cancer Systems Biology Laboratory, The Francis Crick Institute, London NW1 1AT, UK; School of Cancer and Pharmaceutical Sciences, King's College London, London SE1 1UL, UK.
| | - Lorena Benedetti
- Cancer Systems Biology Laboratory, The Francis Crick Institute, London NW1 1AT, UK; School of Cancer and Pharmaceutical Sciences, King's College London, London SE1 1UL, UK.
| |
|
131
|
Abstract
The remarkable success of cancer immunotherapies, especially the checkpoint blocking antibodies, in a subset of patients has reinvigorated the study of tumor-immune crosstalk and its role in heterogeneity of response. High-throughput sequencing and imaging technologies can help recapitulate various aspects of the tumor ecosystem. Computational approaches provide an arsenal of tools to efficiently analyze, quantify and integrate multiple parameters of tumor immunity mined from these diverse but complementary high-throughput datasets. This chapter describes numerous such computational approaches in tumor immunology that leverage high-throughput data from diverse sources (genomic, transcriptomics, epigenomics and digitized histopathology images) to systematically interrogate tumor immunity in context of its microenvironment, and to identify mechanisms that confer resistance or sensitivity to cancer therapies, in particular immunotherapy.
|
132
|
Schön M, Simeth J, Heinrich P, Görtler F, Solbrig S, Wettig T, Oefner PJ, Altenbuchinger M, Spang R. DTD: An R Package for Digital Tissue Deconvolution. J Comput Biol 2020; 27:386-389. [PMID: 31995409 PMCID: PMC7074920 DOI: 10.1089/cmb.2019.0469] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 12/20/2022] Open
Abstract
Digital tissue deconvolution (DTD) estimates the cellular composition of a tissue from its bulk gene-expression profile. For this, DTD approximates the bulk as a mixture of cell-specific expression profiles. Different tissues have different cellular compositions, with cells in different activation states, and embedded in different environments. Consequently, DTD can profit from tailoring the deconvolution model to a specific tissue context. Loss-function learning adapts DTD to a specific tissue context, such as the deconvolution of blood, or a specific type of tumor tissue. We provide software for loss-function learning, for its validation and visualization, and for applying the DTD models to new data.
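The mixture model described above can be made concrete with a toy example for two cell types. The sketch assumes that the loss is a gene-weighted squared error, sum_i g_i (y_i - c_a a_i - c_b b_i)^2, where the weights g_i stand in for whatever a loss-function learning step would produce. The abstract does not spell out this form, and the code below is Python for illustration, not the DTD R package API.

```python
def deconvolve_two_types(bulk, ref_a, ref_b, gene_weights=None):
    """Estimate mixing coefficients (c_a, c_b) for two reference profiles.

    Minimises sum_i g_i * (bulk_i - c_a * ref_a_i - c_b * ref_b_i) ** 2 by
    solving the 2x2 weighted normal equations with Cramer's rule.
    gene_weights (g_i) defaults to 1 for every gene, i.e. ordinary least
    squares; learned weights would replace this default.
    """
    g = gene_weights or [1.0] * len(bulk)
    s_aa = sum(w * a * a for w, a in zip(g, ref_a))
    s_bb = sum(w * b * b for w, b in zip(g, ref_b))
    s_ab = sum(w * a * b for w, a, b in zip(g, ref_a, ref_b))
    s_ay = sum(w * a * y for w, a, y in zip(g, ref_a, bulk))
    s_by = sum(w * b * y for w, b, y in zip(g, ref_b, bulk))

    det = s_aa * s_bb - s_ab * s_ab
    if det == 0:
        raise ValueError("reference profiles are collinear under these weights")
    c_a = (s_ay * s_bb - s_by * s_ab) / det
    c_b = (s_by * s_aa - s_ay * s_ab) / det
    return c_a, c_b
```

For a noise-free mixture the coefficients are recovered whatever the weights are; the weights matter when some genes are noisy or uninformative in a given tissue, which is the situation that tailoring the loss to a tissue context is meant to address.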
Affiliation(s)
- Marian Schön
- Department of Statistical Bioinformatics, Institute of Functional Genomics, University of Regensburg, Regensburg, Germany
| | - Jakob Simeth
- Department of Statistical Bioinformatics, Institute of Functional Genomics, University of Regensburg, Regensburg, Germany
| | - Paul Heinrich
- Department of Statistical Bioinformatics, Institute of Functional Genomics, University of Regensburg, Regensburg, Germany
| | - Franziska Görtler
- Department of Statistical Bioinformatics, Institute of Functional Genomics, University of Regensburg, Regensburg, Germany
| | - Stefan Solbrig
- Department of Physics, University of Regensburg, Regensburg, Germany
| | - Tilo Wettig
- Department of Physics, University of Regensburg, Regensburg, Germany
| | - Peter J. Oefner
- Institute of Functional Genomics, University of Regensburg, Regensburg, Germany
| | - Michael Altenbuchinger
- Department of Statistical Bioinformatics, Institute of Functional Genomics, University of Regensburg, Regensburg, Germany
| | - Rainer Spang
- Department of Statistical Bioinformatics, Institute of Functional Genomics, University of Regensburg, Regensburg, Germany
| |
Collapse
|
133
|
Sturm G, Finotello F, List M. In Silico Cell-Type Deconvolution Methods in Cancer Immunotherapy. Methods Mol Biol 2020; 2120:213-222. [PMID: 32124322 DOI: 10.1007/978-1-0716-0327-7_15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2024]
Abstract
Several computational methods have been proposed to infer the cellular composition from bulk RNA-seq data of a tumor biopsy sample. Elucidating interactions in the tumor microenvironment can yield unique insights into the status of the immune system. In immuno-oncology, this information can be crucial for deciding whether the immune system of a patient can be stimulated to target the tumor. Here, we shed light on the working principles, capabilities, and limitations of the most commonly used methods for cell-type deconvolution in immuno-oncology and offer guidelines for method selection.
Affiliation(s)
- Gregor Sturm
  - Biocenter, Institute of Bioinformatics, Medical University of Innsbruck, Innsbruck, Austria
- Francesca Finotello
  - Biocenter, Institute of Bioinformatics, Medical University of Innsbruck, Innsbruck, Austria
- Markus List
  - Big Data in BioMedicine Group, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
134
Kang K, Meng Q, Shats I, Umbach DM, Li M, Li Y, Li X, Li L. CDSeq: A novel complete deconvolution method for dissecting heterogeneous samples using gene expression data. PLoS Comput Biol 2019; 15:e1007510. [PMID: 31790389 PMCID: PMC6907860 DOI: 10.1371/journal.pcbi.1007510] [Citation(s) in RCA: 38] [Impact Index Per Article: 6.3] [Received: 07/03/2019] [Revised: 12/12/2019] [Accepted: 10/25/2019] [Indexed: 11/18/2022] Open
Abstract
Quantifying cell-type proportions and their corresponding gene expression profiles in tissue samples would enhance understanding of the contributions of individual cell types to the physiological states of the tissue. Current approaches that address tissue heterogeneity have drawbacks. Experimental techniques, such as fluorescence-activated cell sorting and single-cell RNA sequencing, are expensive. Computational approaches that use expression data from heterogeneous samples are promising, but most of the current methods estimate either cell-type proportions or cell-type-specific expression profiles by requiring the other as input. Although such partial deconvolution methods have been successfully applied to tumor samples, the additional input required may be unavailable. We introduce a novel complete deconvolution method, CDSeq, that uses only RNA-Seq data from bulk tissue samples to simultaneously estimate both cell-type proportions and cell-type-specific expression profiles. Using several synthetic and real experimental datasets with known cell-type composition and cell-type-specific expression profiles, we compared CDSeq’s complete deconvolution performance with seven other established deconvolution methods. Complete deconvolution using CDSeq represents a substantial technical advance over partial deconvolution approaches and will be useful for studying cell mixtures in tissue samples. CDSeq is available in a GitHub repository (MATLAB and Octave code): https://github.com/kkang7/CDSeq.
Understanding the cellular composition of bulk tissues is critical to investigating the underlying mechanisms of many biological processes. Single-cell sequencing is a promising technique; however, it is expensive and the analysis of single-cell data is non-trivial. Therefore, tissue samples are still routinely processed in bulk. To estimate cell-type composition using bulk gene expression data, computational deconvolution methods are needed. Many deconvolution methods have been proposed; however, they often estimate only cell-type proportions using a reference cell-type gene expression profile, which in many cases may not be available. We present a novel complete deconvolution method that uses only bulk gene expression data to simultaneously estimate cell-type-specific gene expression profiles and sample-specific cell-type proportions. We showed that, using multiple RNA-Seq and microarray datasets where the cell-type composition was previously known, our method could accurately determine the cell-type composition. By providing a method that requires a single input to determine both cell-type proportions and cell-type-specific expression profiles, we expect that our method will be beneficial to biologists and facilitate the research and identification of mechanisms underlying many biological processes.
Affiliation(s)
- Kai Kang
  - Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, North Carolina, United States of America
  - * E-mail: (KK); (LL)
- Qian Meng
  - Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, North Carolina, United States of America
- Igor Shats
  - Signal Transduction Laboratory, National Institute of Environmental Health Sciences, Durham, North Carolina, United States of America
- David M. Umbach
  - Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, North Carolina, United States of America
- Melissa Li
  - Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, North Carolina, United States of America
- Yuanyuan Li
  - Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, North Carolina, United States of America
- Xiaoling Li
  - Signal Transduction Laboratory, National Institute of Environmental Health Sciences, Durham, North Carolina, United States of America
- Leping Li
  - Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, North Carolina, United States of America
  - * E-mail: (KK); (LL)
135
Jo SY, Kim E, Kim S. Impact of mouse contamination in genomic profiling of patient-derived models and best practice for robust analysis. Genome Biol 2019; 20:231. [PMID: 31707992 PMCID: PMC6844030 DOI: 10.1186/s13059-019-1849-2] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 05/18/2019] [Accepted: 10/02/2019] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND: Patient-derived xenograft and cell line models are popular models for clinical cancer research. However, the inevitable inclusion of a mouse genome in a patient-derived model is a remaining concern in the analysis. Although multiple tools and filtering strategies have been developed to account for this, research has yet to demonstrate the exact impact of the mouse genome and the optimal use of these tools and filtering strategies in an analysis pipeline.
RESULTS: We construct a benchmark dataset of 5 liver tissues from 3 mouse strains using a human whole-exome sequencing kit. Next-generation sequencing reads from mouse tissues are mappable to 49% of the human genome and 409 cancer genes. In total, 1,207,556 mouse-specific alleles are aligned to the human genome reference, including 467,232 (38.7%) alleles with high sensitivity to contamination, which are pervasive causes of false cancer mutations in public databases and are signatures for predicting global contamination. Next, we assess the performance of 8 filtering methods in terms of mouse read filtration and reduction of mouse-specific alleles. All filtering tools generally perform well, although differences in algorithm strictness and efficiency of mouse allele removal are observed. Therefore, we develop a best practice pipeline that contains the estimation of contamination level, mouse read filtration, and variant filtration.
CONCLUSIONS: The inclusion of mouse cells in patient-derived models hinders genomic analysis and should be addressed carefully. Our suggested guidelines improve the robustness and maximize the utility of genomic analysis of these models.
Affiliation(s)
- Se-Young Jo
  - Department of Biomedical Systems Informatics and Brain Korea 21 PLUS Project for Medical Science, Yonsei University College of Medicine, Seoul, 03722, South Korea
- Eunyoung Kim
  - Department of Biomedical Systems Informatics and Brain Korea 21 PLUS Project for Medical Science, Yonsei University College of Medicine, Seoul, 03722, South Korea
- Sangwoo Kim
  - Department of Biomedical Systems Informatics and Brain Korea 21 PLUS Project for Medical Science, Yonsei University College of Medicine, Seoul, 03722, South Korea
136
Chevrier N. Decoding the Body Language of Immunity: Tackling the Immune System at the Organism Level. Curr Opin Syst Biol 2019; 18:19-26. [PMID: 32490290 DOI: 10.1016/j.coisb.2019.10.010] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/15/2022]
Abstract
The immune system is a dynamic mesh of molecules, cells and tissues spanning the entire organism. Despite a wealth of knowledge about the components of the immune system, little is known about the general rules governing the organismal circuitry of immunity. Deciphering the immune system at the scale of the whole organism is crucial to understanding fundamental problems in immunobiology and physiology, and to manipulate immunity for maintaining health and preventing disease. Here I discuss the emerging principles of inter-organ communications during immune responses by focusing on three common themes that are the regulation of the (i) composition, (ii) condition and (iii) coordination of communicating organs by molecular and cellular factors. Based on these common principles, I emphasize fundamental gaps in our knowledge of organismal immune processes and the outlook to tackle immunity at the scale of the whole organism.
Affiliation(s)
- Nicolas Chevrier
  - Pritzker School of Molecular Engineering, The University of Chicago, Chicago, IL 60637, USA
137
Sompairac N, Nazarov PV, Czerwinska U, Cantini L, Biton A, Molkenov A, Zhumadilov Z, Barillot E, Radvanyi F, Gorban A, Kairov U, Zinovyev A. Independent Component Analysis for Unraveling the Complexity of Cancer Omics Datasets. Int J Mol Sci 2019; 20:E4414. [PMID: 31500324 PMCID: PMC6771121 DOI: 10.3390/ijms20184414] [Citation(s) in RCA: 39] [Impact Index Per Article: 6.5] [Received: 08/03/2019] [Revised: 09/02/2019] [Accepted: 09/04/2019] [Indexed: 12/13/2022] Open
Abstract
Independent component analysis (ICA) is a matrix factorization approach where the signals captured by each individual matrix factor are optimized to become as mutually independent as possible. Initially suggested for solving blind source separation problems in various fields, ICA was shown to be successful in analyzing functional magnetic resonance imaging (fMRI) and other types of biomedical data. In the last twenty years, ICA became a part of the standard machine learning toolbox, together with other matrix factorization methods such as principal component analysis (PCA) and non-negative matrix factorization (NMF). Here, we review a number of recent works where ICA was shown to be a useful tool for unraveling the complexity of cancer biology from the analysis of different types of omics data, mainly collected for tumoral samples. Such works highlight the use of ICA in dimensionality reduction, deconvolution, data pre-processing, meta-analysis, and others applied to different data types (transcriptome, methylome, proteome, single-cell data). We particularly focus on the technical aspects of ICA application in omics studies such as using different protocols, determining the optimal number of components, assessing and improving reproducibility of the ICA results, and comparison with other popular matrix factorization techniques. We discuss the emerging ICA applications to the integrative analysis of multi-level omics datasets and introduce a conceptual view on ICA as a tool for defining functional subsystems of a complex biological system and their interactions under various conditions. Our review is accompanied by a Jupyter notebook which illustrates the discussed concepts and provides a practical tool for applying ICA to the analysis of cancer omics datasets.
Affiliation(s)
- Nicolas Sompairac
  - Institut Curie, PSL Research University, 75005 Paris, France.
  - INSERM U900, 75248 Paris, France.
  - CBIO-Centre for Computational Biology, Mines ParisTech, PSL Research University, 75006 Paris, France.
  - Centre de Recherches Interdisciplinaires, Université Paris Descartes, 75004 Paris, France.
- Petr V Nazarov
  - Multiomics Data Science Research Group, Quantitative Biology Unit, Luxembourg Institute of Health (LIH), L-1445 Strassen, Luxembourg.
- Urszula Czerwinska
  - Institut Curie, PSL Research University, 75005 Paris, France.
  - INSERM U900, 75248 Paris, France.
  - CBIO-Centre for Computational Biology, Mines ParisTech, PSL Research University, 75006 Paris, France.
- Laura Cantini
  - Computational Systems Biology Team, Institut de Biologie de l'Ecole Normale Supérieure, CNRS UMR8197, INSERM U1024, Ecole Normale Supérieure, PSL Research University, 75005 Paris, France.
- Anne Biton
  - Centre de Bioinformatique, Biostatistique et Biologie Intégrative (C3BI, USR 3756 Institut Pasteur et CNRS), 75015 Paris, France.
- Askhat Molkenov
  - Laboratory of Bioinformatics and Systems Biology, Center for Life Sciences, National Laboratory Astana, Nazarbayev University, 010000 Nur-Sultan, Kazakhstan.
- Zhaxybay Zhumadilov
  - Laboratory of Bioinformatics and Systems Biology, Center for Life Sciences, National Laboratory Astana, Nazarbayev University, 010000 Nur-Sultan, Kazakhstan.
  - University Medical Center, Nazarbayev University, 010000 Nur-Sultan, Kazakhstan.
- Emmanuel Barillot
  - Institut Curie, PSL Research University, 75005 Paris, France.
  - INSERM U900, 75248 Paris, France.
  - CBIO-Centre for Computational Biology, Mines ParisTech, PSL Research University, 75006 Paris, France.
- Francois Radvanyi
  - Institut Curie, PSL Research University, 75005 Paris, France.
  - CNRS, UMR 144, 75248 Paris, France.
- Alexander Gorban
  - Center for Mathematical Modeling, University of Leicester, Leicester LE1 7RH, UK.
  - Lobachevsky University, 603022 Nizhny Novgorod, Russia.
- Ulykbek Kairov
  - Laboratory of Bioinformatics and Systems Biology, Center for Life Sciences, National Laboratory Astana, Nazarbayev University, 010000 Nur-Sultan, Kazakhstan.
- Andrei Zinovyev
  - Institut Curie, PSL Research University, 75005 Paris, France.
  - INSERM U900, 75248 Paris, France.
  - CBIO-Centre for Computational Biology, Mines ParisTech, PSL Research University, 75006 Paris, France.
138
Accurate estimation of cell-type composition from gene expression data. Nat Commun 2019; 10:2975. [PMID: 31278265 PMCID: PMC6611906 DOI: 10.1038/s41467-019-10802-z] [Citation(s) in RCA: 134] [Impact Index Per Article: 22.3] [Received: 08/03/2018] [Accepted: 05/24/2019] [Indexed: 01/20/2023] Open
Abstract
The rapid development of single-cell transcriptomic technologies has helped uncover the cellular heterogeneity within cell populations. However, bulk RNA-seq continues to be the main workhorse for quantifying gene expression levels due to technical simplicity and low cost. To most effectively extract information from bulk data given the new knowledge gained from single-cell methods, we have developed a novel algorithm to estimate the cell-type composition of bulk data from a single-cell RNA-seq-derived cell-type signature. Comparison with existing methods using various real RNA-seq data sets indicates that our new approach is more accurate and comprehensive than previous methods, especially for the estimation of rare cell types. More importantly, our method can detect cell-type composition changes in response to external perturbations, thereby providing a valuable, cost-effective method for dissecting the cell-type-specific effects of drug treatments or condition changes. As such, our method is applicable to a wide range of biological and clinical investigations.
Bulk RNA-seq data harbors valuable information about gene expression levels from different cell types in tissue samples. Here, the authors develop DWLS, a computational method for estimating cell-type composition of bulk data by leveraging single-cell RNA-seq-derived cell-type signatures.
139
Woo J, Winterhoff BJ, Starr TK, Aliferis C, Wang J. De novo prediction of cell-type complexity in single-cell RNA-seq and tumor microenvironments. Life Sci Alliance 2019; 2:2/4/e201900443. [PMID: 31266885 PMCID: PMC6607449 DOI: 10.26508/lsa.201900443] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 06/24/2019] [Accepted: 06/24/2019] [Indexed: 12/30/2022] Open
Abstract
This study describes a computational method for determining statistical support to varying levels of heterogeneity provided by single-cell RNA-sequencing data with applications to tumor samples.
Recent single-cell transcriptomic studies revealed new insights into cell-type heterogeneities in cellular microenvironments unavailable from bulk studies. A significant drawback of currently available algorithms is the need to use empirical parameters or rely on indirect quality measures to estimate the degree of complexity, i.e., the number of subgroups present in the sample. We fill this gap with a single-cell data analysis procedure allowing for unambiguous assessments of the depth of heterogeneity in subclonal compositions supported by data. Our approach combines nonnegative matrix factorization, which takes advantage of the sparse and nonnegative nature of single-cell RNA count data, with Bayesian model comparison enabling de novo prediction of the depth of heterogeneity. We show that the method predicts the correct number of subgroups using simulated data, primary blood mononuclear cell, and pancreatic cell data. We applied our approach to a collection of single-cell tumor samples and found two qualitatively distinct classes of cell-type heterogeneity in cancer microenvironments.
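The nonnegative matrix factorization step mentioned in this abstract can be sketched as follows. This is not the authors' procedure: it omits the Bayesian model comparison used to choose the number of subgroups and simply factorizes a small count-like matrix at a fixed rank with the standard multiplicative updates for the Frobenius-norm objective. The function names, the toy matrix, the seed, and the `eps` guard are hypothetical illustration choices.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Frobenius norm of the residual V - WH."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rw in zip(v, wh) for x, y in zip(rv, rw)) ** 0.5


def nmf(v, rank, n_iter=200, seed=0, eps=1e-9):
    """Factorize a non-negative matrix V (genes x cells) as W (genes x rank)
    times H (rank x cells) using multiplicative updates."""
    rng = random.Random(seed)
    n_rows, n_cols = len(v), len(v[0])
    # Strictly positive random start so that no entry is stuck at zero initially.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n_rows)]
    h = [[rng.random() + 0.1 for _ in range(n_cols)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H); eps guards against division by zero.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_cols)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n_rows)]
    return w, h


# Toy matrix with two visible groups of cells (columns 0-1 and columns 2-3).
toy_counts = [
    [5.0, 4.0, 0.0, 0.0],
    [4.0, 5.0, 0.0, 1.0],
    [0.0, 0.0, 6.0, 5.0],
    [0.0, 1.0, 5.0, 6.0],
]
w, h = nmf(toy_counts, rank=2)
print(round(frobenius_error(toy_counts, w, h), 3))
```

Choosing the rank is the hard part that the abstract addresses; in this sketch the rank is simply passed in by the caller.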
Affiliation(s)
- Jun Woo
  - Institute for Health Informatics, University of Minnesota, Minneapolis, MN, USA
  - Masonic Cancer Center, University of Minnesota, Minneapolis, MN, USA
- Boris J Winterhoff
  - Masonic Cancer Center, University of Minnesota, Minneapolis, MN, USA
  - Department of Obstetrics, Gynecology and Women's Health, University of Minnesota, Minneapolis, MN, USA
- Timothy K Starr
  - Masonic Cancer Center, University of Minnesota, Minneapolis, MN, USA
  - Department of Obstetrics, Gynecology and Women's Health, University of Minnesota, Minneapolis, MN, USA
- Constantin Aliferis
  - Institute for Health Informatics, University of Minnesota, Minneapolis, MN, USA
- Jinhua Wang
  - Institute for Health Informatics, University of Minnesota, Minneapolis, MN, USA
  - Masonic Cancer Center, University of Minnesota, Minneapolis, MN, USA
140
Wang W, Wang L, Gulko PS, Zhu J. Computational deconvolution of synovial tissue cellular composition: presence of adipocytes in synovial tissue decreased during arthritis pathogenesis and progression. Physiol Genomics 2019; 51:241-253. [PMID: 31100034 PMCID: PMC6620645 DOI: 10.1152/physiolgenomics.00009.2019] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 02/05/2019] [Revised: 04/18/2019] [Accepted: 05/13/2019] [Indexed: 01/15/2023] Open
Abstract
Osteoarthritis (OA) and rheumatoid arthritis (RA) are the most common forms of arthritis. The synovial tissue is the major site of inflammation in OA and RA and consists of diverse cells. Changes in synovial tissue cell composition during arthritis pathogenesis and progression have not been systematically characterized and may provide critical insights into disease processes. In this study we aimed to systematically examine cellular changes in synovial tissue. Publicly available synovial tissue transcriptomic data sets were used. We computationally estimated cell compositions in synovial tissue based on transcriptomic data and compared cell compositions in different diseases or at different disease stages. Synovial fibroblasts, macrophages, adipocytes, and immune cells were the major cell types in all synovial tissue. Both OA and RA patients had a significantly lower adipocyte fraction compared with healthy controls. The decreasing trend was also observed during OA and RA progression. The fraction of monocytes was also increased in both OA and RA patients, consistent with the observation that inflammation is involved in both OA and RA. However, the monocyte fraction in RA was much higher than in healthy controls and OA. The M2 macrophage fraction was reduced in RA compared with OA, and the reduction continued during RA progression from the early to the late stage. There were consistent cell composition differences between different types or stages of arthritis. In both RA and OA, the newly discovered changes in the adipocyte and M2 macrophage fractions have the potential to lead to novel therapeutic development.
Affiliation(s)
- Wenhui Wang
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York
  - Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, New York
- Li Wang
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York
  - Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, New York
  - Sema4, a Mount Sinai venture, Stamford, Connecticut
- Percio S Gulko
  - Division of Rheumatology, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York
- Jun Zhu
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York
  - Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, New York
  - Sema4, a Mount Sinai venture, Stamford, Connecticut
141
Kong Y, Rastogi D, Seoighe C, Greally JM, Suzuki M. Insights from deconvolution of cell subtype proportions enhance the interpretation of functional genomic data. PLoS One 2019; 14:e0215987. [PMID: 31022271 PMCID: PMC6483354 DOI: 10.1371/journal.pone.0215987] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 12/11/2018] [Accepted: 04/11/2019] [Indexed: 02/07/2023] Open
Abstract
Cell subtype proportion variability between samples contributes significantly to the variation of functional genomic properties such as gene expression or DNA methylation. Although the impact of the variation of cell subtype composition on measured genomic quantities is recognized, and some innovative tools have been developed for the analysis of heterogeneous samples, most functional genomics studies using samples with mixed cell types still ignore the influence of cell subtype proportion variation, or just deal with it as a nuisance variable to be eliminated. Here we demonstrate how harvesting information about cell subtype proportions from functional genomics data can provide insights into cellular changes associated with phenotypes. We focused on two types of mixed cell populations, human blood and mouse kidney. Cell type prediction is well developed in the former, but not currently in the latter. Estimating the cellular repertoire is easier when a reference dataset from purified samples of all cell types in the tissue is available, as is the case for blood. However, reference datasets are not available for most other tissues, such as the kidney. In this study, we showed that the proportion of alterations attributable to changes in the cellular composition varies strikingly in the two disorders (asthma and systemic lupus erythematosus), suggesting that the contribution of cell subtype proportion changes to functional genomic properties can be disease-specific. We also showed that a reference dataset from a single-cell RNA-seq study successfully estimated the cell subtype proportions in mouse kidney and allowed us to distinguish altered cell subtype differences between two different knock-out mouse models, both of which had reported a reduced number of glomeruli compared to their wild-type counterparts. These findings demonstrate that testing for changes in cell subtype proportions between conditions can yield important insights in functional genomics studies.
Affiliation(s)
- Yu Kong
  - Department of Genetics and Center for Epigenomics, Albert Einstein College of Medicine, Bronx, New York, United States of America
- Deepa Rastogi
  - Department of Pediatrics, Albert Einstein College of Medicine, Bronx, New York, United States of America
- Cathal Seoighe
  - School of Mathematics, Statistics and Applied Mathematics, National University of Ireland Galway, University Road, Galway, Ireland
- John M. Greally
  - Department of Genetics and Center for Epigenomics, Albert Einstein College of Medicine, Bronx, New York, United States of America
- Masako Suzuki
  - Department of Genetics and Center for Epigenomics, Albert Einstein College of Medicine, Bronx, New York, United States of America
142
Frishberg A, Peshes-Yaloz N, Cohn O, Rosentul D, Steuerman Y, Valadarsky L, Yankovitz G, Mandelboim M, Iraqi FA, Amit I, Mayo L, Bacharach E, Gat-Viks I. Cell composition analysis of bulk genomics using single-cell data. Nat Methods 2019; 16:327-332. [PMID: 30886410 PMCID: PMC6443043 DOI: 10.1038/s41592-019-0355-5] [Citation(s) in RCA: 78] [Impact Index Per Article: 13.0] [Received: 06/10/2018] [Accepted: 02/12/2019] [Indexed: 12/21/2022]
Abstract
Single-cell expression profiling (scRNA-seq) is a rich resource of cellular heterogeneity. While profiling every sample under study would be advantageous, it is time-consuming and costly. Here we introduce Cell Population Mapping (CPM), a deconvolution algorithm in which the composition of cell types and states is inferred from the bulk transcriptome using reference scRNA-seq profiles ('scBio' CRAN R-package). Analysis of individual variations in lungs of influenza virus-infected mice, using CPM, revealed that the relationship between cell abundance and clinical symptoms is a cell-state-specific property that varies gradually along the continuum of cell-activation states. The gradual change was confirmed in subsequent experiments and was further explained by a mathematical model in which clinical outcomes relate to cell-state dynamics along the activation process. Our results demonstrate the power of CPM in reconstructing the continuous spectrum of cell states within heterogeneous tissues.
Affiliation(s)
- Amit Frishberg
  - School of Molecular Cell Biology and Biotechnology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Naama Peshes-Yaloz
  - School of Molecular Cell Biology and Biotechnology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Ofir Cohn
  - School of Molecular Cell Biology and Biotechnology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Diana Rosentul
  - School of Molecular Cell Biology and Biotechnology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Yael Steuerman
  - School of Molecular Cell Biology and Biotechnology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Liran Valadarsky
  - Department of Immunology, The Weizmann Institute of Science, Rehovot, Israel
- Gal Yankovitz
  - School of Molecular Cell Biology and Biotechnology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Michal Mandelboim
  - National Center for Influenza and Respiratory Viruses, Central Virology Laboratory, Sheba Medical Center at Tel HaShomer, Ramat-Gan, Israel
  - Department of Epidemiology and Preventive Medicine, School of Public Health, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Fuad A Iraqi
  - Department of Clinical Microbiology and Immunology, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Ido Amit
  - Department of Immunology, The Weizmann Institute of Science, Rehovot, Israel
- Lior Mayo
  - School of Molecular Cell Biology and Biotechnology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
  - Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel
- Eran Bacharach
  - School of Molecular Cell Biology and Biotechnology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Irit Gat-Viks
  - School of Molecular Cell Biology and Biotechnology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
143
Wang X, Park J, Susztak K, Zhang NR, Li M. Bulk tissue cell type deconvolution with multi-subject single-cell expression reference. Nat Commun 2019; 10:380. [PMID: 30670690 PMCID: PMC6342984 DOI: 10.1038/s41467-018-08023-x] [Citation(s) in RCA: 490] [Impact Index Per Article: 81.7] [Received: 07/12/2018] [Accepted: 12/02/2018] [Indexed: 12/16/2022] Open
Abstract
Knowledge of cell type composition in disease-relevant tissues is an important step towards the identification of cellular targets of disease. We present MuSiC, a method that utilizes cell-type-specific gene expression from single-cell RNA sequencing (RNA-seq) data to characterize cell type compositions from bulk RNA-seq data in complex tissues. By appropriate weighting of genes showing cross-subject and cross-cell consistency, MuSiC enables the transfer of cell-type-specific gene expression information from one dataset to another. When applied to pancreatic islet and whole kidney expression data in humans, mice, and rats, MuSiC outperformed existing methods, especially for tissues with closely related cell types. MuSiC enables the characterization of cellular heterogeneity of complex tissues for understanding of disease mechanisms. As bulk tissue data are more easily accessible than single-cell RNA-seq, MuSiC allows the utilization of the vast amounts of disease-relevant bulk tissue RNA-seq data for elucidating cell type contributions in disease.
Bulk tissue RNA-seq data reveals transcriptomic profiles but masks the contributions of different cell types. Here, the authors develop a new method for estimating cell type proportions from bulk tissue RNA-seq data guided by multi-subject single-cell expression reference.
Affiliation(s)
- Xuran Wang
- Graduate Group in Applied Mathematics and Computational Science, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Jihwan Park
- Departments of Medicine and Genetics, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Katalin Susztak
- Departments of Medicine and Genetics, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Nancy R Zhang
- Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Mingyao Li
- Department of Biostatistics, Epidemiology & Informatics, University of Pennsylvania, Philadelphia, PA, 19104, USA
|
144
|
Manoharan M, Mandloi N, Priyadarshini S, Patil A, Gupta R, Iyer L, Gupta R, Chaudhuri A. A Computational Approach Identifies Immunogenic Features of Prognosis in Human Cancers. Front Immunol 2018; 9:3017. [PMID: 30622534 PMCID: PMC6308325 DOI: 10.3389/fimmu.2018.03017] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 08/12/2018] [Accepted: 12/06/2018] [Indexed: 12/25/2022] Open
Abstract
A large number of tumor intrinsic and extrinsic factors determine long-term survival in human cancers. In this study, we stratified 9120 tumors from 33 cancers with respect to their immune cell content and identified immunogenomic features associated with long-term survival. Our analysis demonstrates that tumors infiltrated by CD8+ T cells expressing higher levels of activation marker (PD1hi) along with TCR signaling genes and cytolytic T cell markers (IL2hi/TNF-αhi/IFN-γhi/GZMA-Bhi) extend survival, whereas survival benefit was absent for tumors infiltrated by anergic and hyperexhausted CD8+ T cells characterized by high expression of CTLA-4, TIM3, LAG3, and genes linked to PI3K signaling pathway. The computational approach of using robust and highly specific gene expression signatures to deconvolute the tumor microenvironment has important clinical applications, such as selecting patients who will benefit from checkpoint inhibitor treatment.
|
145
|
Dimitrakopoulou K, Wik E, Akslen LA, Jonassen I. Deblender: a semi-/unsupervised multi-operational computational method for complete deconvolution of expression data from heterogeneous samples. BMC Bioinformatics 2018; 19:408. [PMID: 30404611 PMCID: PMC6223087 DOI: 10.1186/s12859-018-2442-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 02/16/2018] [Accepted: 10/22/2018] [Indexed: 12/24/2022] Open
Abstract
BACKGROUND: Towards discovering robust cancer biomarkers, it is imperative to unravel the cellular heterogeneity of patient samples and comprehend the interactions between cancer cells and the various cell types in the tumor microenvironment. The first generation of 'partial' computational deconvolution methods required prior information either on the cell/tissue type proportions or the cell/tissue type-specific expression signatures and the number of involved cell/tissue types. The second generation of 'complete' approaches allowed estimating both of the cell/tissue type proportions and cell/tissue type-specific expression profiles directly from the mixed gene expression data, based on known (or automatically identified) cell/tissue type-specific marker genes.
RESULTS: We present Deblender, a flexible complete deconvolution tool operating in semi-/unsupervised mode based on the user's access to known marker gene lists and information about cell/tissue composition. In case of no prior knowledge, global gene expression variability is used in clustering the mixed data to substitute marker sets with cluster sets. In addition, we integrate a model selection criterion to predict the number of constituent cell/tissue types. Moreover, we provide a tailored algorithmic scheme to estimate mixture proportions for realistic experimental cases where the number of involved cell/tissue types exceeds the number of mixed samples. We assess the performance of Deblender and a set of state-of-the-art existing tools on a comprehensive set of benchmark and patient cancer mixture expression datasets (including TCGA).
CONCLUSION: Our results corroborate that Deblender can be a valuable tool to improve understanding of gene expression datasets with implications for prediction and clinical utilization. Deblender is implemented in MATLAB and is available from https://github.com/kondim1983/Deblender/.
Affiliation(s)
- Konstantina Dimitrakopoulou
- Centre for Cancer Biomarkers CCBIO, Department of Informatics, University of Bergen, Bergen, Norway; Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
- Elisabeth Wik
- Centre for Cancer Biomarkers CCBIO, Department of Clinical Medicine, Section for Pathology, University of Bergen, Bergen, Norway; Department of Pathology, Haukeland University Hospital, Bergen, Norway
- Lars A Akslen
- Centre for Cancer Biomarkers CCBIO, Department of Clinical Medicine, Section for Pathology, University of Bergen, Bergen, Norway; Department of Pathology, Haukeland University Hospital, Bergen, Norway
- Inge Jonassen
- Centre for Cancer Biomarkers CCBIO, Department of Informatics, University of Bergen, Bergen, Norway; Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
|
146
|
xMD-miRNA-seq to generate near in vivo miRNA expression estimates in colon epithelial cells. Sci Rep 2018; 8:9783. [PMID: 29955168 PMCID: PMC6023933 DOI: 10.1038/s41598-018-28198-z] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 04/09/2018] [Accepted: 06/19/2018] [Indexed: 01/09/2023] Open
Abstract
Accurate, RNA-seq based, microRNA (miRNA) expression estimates from primary cells have recently been described. However, this in vitro data is mainly obtained from cell culture, which is known to alter cell maturity/differentiation status, significantly changing miRNA levels. What is needed is a robust method to obtain in vivo miRNA expression values directly from cells. We introduce expression microdissection miRNA small RNA sequencing (xMD-miRNA-seq), a method to isolate cells directly from formalin-fixed paraffin-embedded (FFPE) tissues. xMD-miRNA-seq is a low-cost, high-throughput, immunohistochemistry-based method to capture any cell type of interest. As a proof-of-concept, we isolated colon epithelial cells from two specimens and performed low-input small RNA-seq. We generated up to 600,000 miRNA reads from the samples. Isolated epithelial cells had abundant epithelial-enriched miRNA expression (miR-192; miR-194; miR-200b; miR-200c; miR-215; miR-375) and overall similar miRNA expression patterns to other epithelial cell populations (colonic enteroids and flow-isolated colon epithelium). xMD-derived epithelial cells were generally not contaminated by other adjacent cells of the colon as noted by t-SNE analysis. xMD-miRNA-seq allows for simple, economical, and efficient identification of cell-specific miRNA expression estimates. Further development will enhance rapid identification of cell-specific miRNA expression estimates in health and disease for nearly any cell type using archival FFPE material.
|