1
Huuki-Myers LA, Montgomery KD, Kwon SH, Cinquemani S, Eagles NJ, Gonzalez-Padilla D, Maden SK, Kleinman JE, Hyde TM, Hicks SC, Maynard KR, Collado-Torres L. Benchmark of cellular deconvolution methods using a multi-assay dataset from postmortem human prefrontal cortex. Genome Biol 2025; 26:88. [PMID: 40197307 PMCID: PMC11978107 DOI: 10.1186/s13059-025-03552-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2024] [Accepted: 03/21/2025] [Indexed: 04/10/2025]
Abstract
Cellular deconvolution of bulk RNA-sequencing data using single cell/nuclei RNA-seq reference data is an important strategy for estimating cell type composition in heterogeneous tissues, such as the human brain. Here, we generate a multi-assay dataset in postmortem human dorsolateral prefrontal cortex from 22 tissue blocks, including bulk RNA-seq, reference snRNA-seq, and orthogonal measurement of cell type proportions with RNAScope/ImmunoFluorescence. We use this dataset to evaluate six deconvolution algorithms. Bisque and hspe were the most accurate methods. The dataset, as well as the Mean Ratio gene marker finding method, is made available in the DeconvoBuddies R/Bioconductor package.
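Reference-based deconvolution, as summarized above, models each bulk sample as a mixture of cell-type expression profiles taken from snRNA-seq. The sketch below illustrates only that general idea and assumes a genes × cell-types signature matrix is already available. It estimates proportions with non-negative least squares, solved by projected gradient descent, and then rescales them to sum to 1. It is not Bisque, hspe, or any other method benchmarked in the paper. The function name `deconvolve_nnls` and the toy matrix are hypothetical.

```python
import numpy as np


def deconvolve_nnls(signature, bulk, n_iter=5000):
    """Estimate cell-type proportions for one bulk sample (illustrative sketch).

    signature: genes x cell_types matrix of reference expression profiles
    bulk:      length-genes vector of bulk expression for one sample
    Returns non-negative proportions rescaled to sum to 1.
    """
    S = np.asarray(signature, dtype=float)
    b = np.asarray(bulk, dtype=float)
    # Step size 1/L, where L is the largest eigenvalue of S^T S
    # (the Lipschitz constant of the least-squares gradient).
    step = 1.0 / np.linalg.norm(S, ord=2) ** 2
    w = np.zeros(S.shape[1])
    for _ in range(n_iter):
        grad = S.T @ (S @ w - b)              # gradient of 0.5 * ||S w - b||^2
        w = np.maximum(w - step * grad, 0.0)  # project back onto w >= 0
    total = w.sum()
    return w / total if total > 0 else w


# Toy example (hypothetical data): 4 genes, 3 cell types, known proportions.
signature = np.array([[10.0, 0.0, 1.0],
                      [0.0, 8.0, 1.0],
                      [1.0, 1.0, 12.0],
                      [5.0, 5.0, 0.0]])
true_props = np.array([0.5, 0.3, 0.2])
bulk = signature @ true_props
print(deconvolve_nnls(signature, bulk))
```

Real methods also need bulk and reference data on comparable scales. They add corrections that this sketch omits, for example for cell size or assay-specific bias.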
Affiliation(s)
- Louise A Huuki-Myers
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, 21205, USA
- UK Dementia Research Institute at the University of Cambridge, Cambridge, UK
- Department of Clinical Neurosciences, School of Clinical Medicine, The University of Cambridge, Cambridge, UK
- Kelsey D Montgomery
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, 21205, USA
- Sang Ho Kwon
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, 21205, USA
- The Solomon H. Snyder Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, MD, 21205, USA
- Sophia Cinquemani
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, 21205, USA
- Nicholas J Eagles
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, 21205, USA
- Sean K Maden
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, 21205, USA
- Joel E Kleinman
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, 21205, USA
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins School of Medicine, Baltimore, MD, 21205, USA
- Thomas M Hyde
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, 21205, USA
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins School of Medicine, Baltimore, MD, 21205, USA
- Department of Neurology, Johns Hopkins School of Medicine, Baltimore, MD, 21205, USA
- Stephanie C Hicks
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, 21205, USA
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins School of Medicine, Baltimore, MD, 21205, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, 21205, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, 21205, USA
- Malone Center for Engineering in Healthcare, Johns Hopkins University, Baltimore, MD, 21218, USA
- Kristen R Maynard
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, 21205, USA
- The Solomon H. Snyder Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, MD, 21205, USA
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins School of Medicine, Baltimore, MD, 21205, USA
- Leonardo Collado-Torres
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, 21205, USA
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, 21205, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, 21205, USA
2
Lin WY, Kartawinata M, Jebson BR, Restuadi R, Peckham H, Radziszewska A, Deakin CT, Ciurtin C, CLUSTER Consortium, Wedderburn LR, Wallace C. Penalised regression improves imputation of cell-type specific expression using RNA-seq data from mixed cell populations compared to domain-specific methods. PLoS Comput Biol 2025; 21:e1012859. [PMID: 40053530 PMCID: PMC11957391 DOI: 10.1371/journal.pcbi.1012859] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2024] [Revised: 03/31/2025] [Accepted: 02/07/2025] [Indexed: 03/09/2025]
Abstract
Gene expression studies often use bulk RNA sequencing of mixed cell populations because single-cell or sorted-cell sequencing may be prohibitively expensive. However, mixed-cell studies may miss expression patterns that are restricted to specific cell populations. Computational deconvolution can be used to estimate cell fractions from bulk expression data and to infer average cell-type expression in a set of samples (e.g., cases or controls). Imputing sample-level cell-type expression, which is required for more detailed analyses such as relating expression to quantitative traits, is less commonly addressed. Here, we assessed the accuracy of imputing sample-level cell-type expression using two sources of data. The first is a real dataset in which mixed peripheral blood mononuclear cell (PBMC) and sorted (CD4, CD8, CD14, CD19) RNA sequencing data were generated from the same subjects (N=158). The second is a set of pseudobulk datasets synthesised from eQTLgen single-cell RNA-seq data. We compared three domain-specific methods (CIBERSORTx, bMIND and debCAM/swCAM) and two cross-domain machine learning methods (multiple-response LASSO and ridge) that had not previously been used for this task. We also assessed the methods according to their ability to recover differential gene expression (DGE) results. Compared with the deconvolution methods, LASSO/ridge showed higher sensitivity but lower specificity for recovering DGE signals seen in observed data, although LASSO/ridge had a higher area under the curve. Machine learning methods have the potential to outperform domain-specific methods when suitable training data are available.
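The abstract contrasts deconvolution with penalised regression, which learns a direct mapping from mixed-cell (PBMC) expression to sorted cell-type expression. Below is a minimal closed-form sketch of multi-response ridge regression. It assumes training samples that have both mixed and sorted profiles, and centered data so that no intercept is fitted. The function names and penalty values are hypothetical, and the paper's feature handling, penalty tuning, and LASSO variant are not reproduced.

```python
import numpy as np


def fit_multiresponse_ridge(X, Y, lam):
    """Closed-form ridge fit for several responses at once (illustrative sketch).

    X:   n_samples x n_features matrix (e.g., mixed-cell expression), assumed centered
    Y:   n_samples x n_outputs matrix (e.g., sorted cell-type expression), assumed centered
    lam: ridge penalty (> 0)
    Returns B (n_features x n_outputs) minimizing ||Y - X B||_F^2 + lam * ||B||_F^2.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    p = X.shape[1]
    # Solve (X^T X + lam I) B = X^T Y rather than forming an explicit inverse.
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ Y)


def impute(X_new, B):
    """Predict sample-level cell-type expression for new mixed-cell samples."""
    return np.asarray(X_new, dtype=float) @ B


rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 5))   # 50 training samples, 5 mixed-cell features
B_true = rng.normal(size=(5, 3))     # 3 outputs to impute
Y_train = X_train @ B_true
B_hat = fit_multiresponse_ridge(X_train, Y_train, lam=1e-6)
Y_pred = impute(rng.normal(size=(4, 5)), B_hat)
print(Y_pred.shape)
```

With one shared penalty, multi-response ridge is equivalent to fitting a separate ridge model for each output column. The multi-response formulation matters more for LASSO. There, a grouped penalty (as in glmnet's `mgaussian` family) can select the same predictors for all outputs.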
Affiliation(s)
- Wei-Yu Lin
- MRC Biostatistics Unit, Cambridge Biomedical Campus, Cambridge, United Kingdom
- Melissa Kartawinata
- Infection, Immunity and Inflammation Research and Teaching Department, UCL Great Ormond Street Institute of Child Health, University College London (UCL), London, United Kingdom
- Centre for Adolescent Rheumatology Versus Arthritis at University College London (UCL), University College London Hospital (UCLH) and Great Ormond Street Hospital (GOSH), London, United Kingdom
- Bethany R. Jebson
- Infection, Immunity and Inflammation Research and Teaching Department, UCL Great Ormond Street Institute of Child Health, University College London (UCL), London, United Kingdom
- Centre for Adolescent Rheumatology Versus Arthritis at University College London (UCL), University College London Hospital (UCLH) and Great Ormond Street Hospital (GOSH), London, United Kingdom
- Restuadi Restuadi
- Infection, Immunity and Inflammation Research and Teaching Department, UCL Great Ormond Street Institute of Child Health, University College London (UCL), London, United Kingdom
- Centre for Adolescent Rheumatology Versus Arthritis at University College London (UCL), University College London Hospital (UCLH) and Great Ormond Street Hospital (GOSH), London, United Kingdom
- Hannah Peckham
- Centre for Adolescent Rheumatology Versus Arthritis at University College London (UCL), University College London Hospital (UCLH) and Great Ormond Street Hospital (GOSH), London, United Kingdom
- Division of Medicine, Department of Ageing, Rheumatology & Regenerative Medicine, UCL, London, United Kingdom
- Anna Radziszewska
- Centre for Adolescent Rheumatology Versus Arthritis at University College London (UCL), University College London Hospital (UCLH) and Great Ormond Street Hospital (GOSH), London, United Kingdom
- Division of Medicine, Department of Ageing, Rheumatology & Regenerative Medicine, UCL, London, United Kingdom
- Claire T. Deakin
- Infection, Immunity and Inflammation Research and Teaching Department, UCL Great Ormond Street Institute of Child Health, University College London (UCL), London, United Kingdom
- Centre for Adolescent Rheumatology Versus Arthritis at University College London (UCL), University College London Hospital (UCLH) and Great Ormond Street Hospital (GOSH), London, United Kingdom
- National Institute for Health Research (NIHR) GOSH Biomedical Research Centre, London, United Kingdom
- Coziana Ciurtin
- Centre for Adolescent Rheumatology Versus Arthritis at University College London (UCL), University College London Hospital (UCLH) and Great Ormond Street Hospital (GOSH), London, United Kingdom
- Division of Medicine, Department of Ageing, Rheumatology & Regenerative Medicine, UCL, London, United Kingdom
- Lucy R. Wedderburn
- Infection, Immunity and Inflammation Research and Teaching Department, UCL Great Ormond Street Institute of Child Health, University College London (UCL), London, United Kingdom
- Centre for Adolescent Rheumatology Versus Arthritis at University College London (UCL), University College London Hospital (UCLH) and Great Ormond Street Hospital (GOSH), London, United Kingdom
- National Institute for Health Research (NIHR) GOSH Biomedical Research Centre, London, United Kingdom
- Chris Wallace
- MRC Biostatistics Unit, Cambridge Biomedical Campus, Cambridge, United Kingdom
- Cambridge Institute of Therapeutic Immunology and Infectious Disease (CITIID), Jeffrey Cheah Biomedical Centre, Cambridge Biomedical Campus, University of Cambridge, Cambridge, United Kingdom
3
Dai R, Chu T, Zhang M, Wang X, Jourdon A, Wu F, Mariani J, Vaccarino FM, Lee D, Fullard JF, Hoffman GE, Roussos P, Wang Y, Wang X, Pinto D, Wang SH, Zhang C, PsychENCODE consortium, Chen C, Liu C. Evaluating performance and applications of sample-wise cell deconvolution methods on human brain transcriptomic data. Sci Adv 2024; 10:eadh2588. [PMID: 38781336 PMCID: PMC11114236 DOI: 10.1126/sciadv.adh2588] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2023] [Accepted: 01/05/2024] [Indexed: 05/25/2024]
Abstract
Sample-wise deconvolution methods estimate cell-type proportions and gene expression in bulk tissue samples, yet their performance and biological applications remain unexplored, particularly in human brain transcriptomic data. Here, nine deconvolution methods were evaluated with sample-matched data from bulk tissue RNA sequencing (RNA-seq), single-cell/nuclei (sc/sn) RNA-seq, and immunohistochemistry. A total of 1,130,767 nuclei/cells from 149 adult postmortem brains and 72 organoid samples were used. dtangle performed best for estimating cell proportions, and bMIND performed best for estimating sample-wise cell-type gene expression. For eight brain cell types, 25,273 cell-type eQTLs were identified with deconvoluted expression (decon-eQTLs). Decon-eQTLs explained more schizophrenia GWAS heritability than bulk tissue or single-cell eQTLs alone. Differential gene expression associated with Alzheimer's disease, schizophrenia, and brain development was also examined using the deconvoluted data. Our findings, which were replicated in bulk tissue and single-cell data, provided insights into the biological applications of deconvoluted data in multiple brain disorders.
Affiliation(s)
- Rujia Dai
- Department of Psychiatry, SUNY Upstate Medical University, Syracuse, NY, USA
- Tianyao Chu
- MOE Key Laboratory of Rare Pediatric Diseases & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, and Department of Psychiatry, The Second Xiangya Hospital, Central South University, Changsha, China
- Ming Zhang
- MOE Key Laboratory of Rare Pediatric Diseases & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, and Department of Psychiatry, The Second Xiangya Hospital, Central South University, Changsha, China
- Xuan Wang
- MOE Key Laboratory of Rare Pediatric Diseases & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, and Department of Psychiatry, The Second Xiangya Hospital, Central South University, Changsha, China
- Feinan Wu
- Child Study Center, Yale University, New Haven, CT, USA
- Flora M. Vaccarino
- Child Study Center, Yale University, New Haven, CT, USA
- Department of Neuroscience, Yale University, New Haven, CT, USA
- Donghoon Lee
- Center for Disease Neurogenomics, Departments of Psychiatry and Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- John F. Fullard
- Center for Disease Neurogenomics, Departments of Psychiatry and Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Gabriel E. Hoffman
- Center for Disease Neurogenomics, Departments of Psychiatry and Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Panos Roussos
- Center for Disease Neurogenomics, Departments of Psychiatry and Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Yue Wang
- Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Blacksburg, VA, USA
- Xusheng Wang
- Department of Biology, University of North Dakota, Grand Forks, ND, USA
- Dalila Pinto
- Departments of Psychiatry and Genetics and Genomic Sciences, Mindich Child Health and Development Institute, and Icahn Genomics Institute for Data Science and Genomic Technology, Seaver Autism Center, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Sidney H. Wang
- Center for Human Genetics, The Brown Foundation Institute of Molecular Medicine, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Chunling Zhang
- Department of Neuroscience & Physiology, SUNY Upstate Medical University, Syracuse, NY, USA
- Chao Chen
- MOE Key Laboratory of Rare Pediatric Diseases & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, and Department of Psychiatry, The Second Xiangya Hospital, Central South University, Changsha, China
- Chunyu Liu
- Department of Psychiatry, SUNY Upstate Medical University, Syracuse, NY, USA
- MOE Key Laboratory of Rare Pediatric Diseases & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, and Department of Psychiatry, The Second Xiangya Hospital, Central South University, Changsha, China
- Department of Neuroscience & Physiology, SUNY Upstate Medical University, Syracuse, NY, USA
4
Huuki-Myers LA, Montgomery KD, Kwon SH, Cinquemani S, Eagles NJ, Gonzalez-Padilla D, Maden SK, Kleinman JE, Hyde TM, Hicks SC, Maynard KR, Collado-Torres L. Benchmark of cellular deconvolution methods using a multi-assay reference dataset from postmortem human prefrontal cortex. bioRxiv 2024:2024.02.09.579665. [PMID: 38405805 PMCID: PMC10888823 DOI: 10.1101/2024.02.09.579665] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/27/2024]
Abstract
Background Cellular deconvolution of bulk RNA-sequencing (RNA-seq) data using single cell or nuclei RNA-seq (sc/snRNA-seq) reference data is an important strategy for estimating cell type composition in heterogeneous tissues, such as the human brain. Computational methods for deconvolution have been developed and benchmarked against simulated data, pseudobulked sc/snRNA-seq data, or immunohistochemistry reference data. A major limitation in developing improved deconvolution algorithms has been the lack of integrated datasets with orthogonal measurements of gene expression and estimates of cell type proportions on the same tissue sample. Deconvolution algorithm performance has not yet been evaluated across different RNA extraction methods (cytosolic, nuclear, or whole cell RNA), across different library preparation types (mRNA enrichment vs. ribosomal RNA depletion), or with matched single cell reference datasets. Results A rich multi-assay dataset was generated in postmortem human dorsolateral prefrontal cortex (DLPFC) from 22 tissue blocks. Assays included spatially resolved transcriptomics, snRNA-seq, bulk RNA-seq (across six library/extraction RNA-seq combinations), and RNAScope/Immunofluorescence (RNAScope/IF) for six broad cell types. The Mean Ratio method, implemented in the DeconvoBuddies R package, was developed for selecting cell type marker genes. Six computational deconvolution algorithms were evaluated in DLPFC, and predicted cell type proportions were compared to orthogonal RNAScope/IF measurements. Conclusions Bisque and hspe were the most accurate methods and were robust to differences in RNA library types and extractions. This multi-assay dataset showed that cell size differences, marker genes differentially quantified across RNA libraries, and cell composition variability in reference snRNA-seq impact the accuracy of current deconvolution methods.
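This abstract names the Mean Ratio method but does not define it. We assume the definition used in the DeconvoBuddies materials: a gene's score for a target cell type is its mean expression in that type divided by its mean expression in the highest-expressing non-target type, and genes are ranked by this ratio. The sketch below implements only that assumed definition. The packaged implementation may differ in details such as the expression scale (for example, log-normalized counts), filtering, and tie handling. The function name `mean_ratio` and the toy data are hypothetical.

```python
import numpy as np


def mean_ratio(expr, labels, target):
    """Assumed Mean Ratio marker score per gene for one target cell type (sketch).

    expr:   genes x cells expression matrix
    labels: length-cells sequence of cell-type labels
    target: the cell type to find markers for
    Returns one ratio per gene (inf if no non-target type expresses the gene,
    nan if the gene is zero in every cell type).
    """
    expr = np.asarray(expr, dtype=float)
    labels = np.asarray(labels)
    types = np.unique(labels)
    # genes x cell_types matrix of mean expression per cell type
    means = np.column_stack([expr[:, labels == t].mean(axis=1) for t in types])
    is_target = types == target
    target_mean = means[:, is_target][:, 0]
    highest_other = means[:, ~is_target].max(axis=1)
    with np.errstate(divide="ignore", invalid="ignore"):
        return target_mean / highest_other


expr = np.array([[5, 6, 1, 1, 0, 0],   # gene A: high in Excit
                 [1, 1, 4, 4, 1, 1],   # gene B: high in Inhib
                 [2, 2, 2, 2, 2, 2]])  # gene C: flat
labels = ["Excit", "Excit", "Inhib", "Inhib", "Oligo", "Oligo"]
ratios = mean_ratio(expr, labels, "Excit")
ranking = np.argsort(-ratios)  # best candidate markers first
print(ratios, ranking)
```

The denominator is the single strongest competing cell type, not an average over all non-target types. As a result, a gene scores highly only if it separates the target from every other type.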
Affiliation(s)
- Louise A. Huuki-Myers
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, 21205, USA
- Kelsey D. Montgomery
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, 21205, USA
- Sang Ho Kwon
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, 21205, USA
- The Solomon H. Snyder Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, MD, 21205, USA
- Sophia Cinquemani
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, 21205, USA
- Nicholas J. Eagles
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, 21205, USA
- Sean K. Maden
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, 21205, USA
- Joel E. Kleinman
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, 21205, USA
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins School of Medicine, Baltimore, MD, 21205, USA
- Thomas M. Hyde
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, 21205, USA
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins School of Medicine, Baltimore, MD, 21205, USA
- Department of Neurology, Johns Hopkins School of Medicine, Baltimore, MD, 21205, USA
- Stephanie C. Hicks
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, 21205, USA
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins School of Medicine, Baltimore, MD, 21205, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, 21205, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, 21205, USA
- Malone Center for Engineering in Healthcare, Johns Hopkins University, Baltimore, MD, 21218, USA
- Kristen R. Maynard
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, 21205, USA
- The Solomon H. Snyder Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, MD, 21205, USA
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins School of Medicine, Baltimore, MD, 21205, USA
- Leonardo Collado-Torres
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, 21205, USA
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, 21205, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, 21205, USA
5
Uzuner D, İlgün A, Düz E, Bozkurt FB, Çakır T. Multilayer Analysis of RNA Sequencing Data in Alzheimer's Disease to Unravel Molecular Mysteries. Adv Neurobiol 2024; 41:219-246. [PMID: 39589716 DOI: 10.1007/978-3-031-69188-1_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2024]
Abstract
Alzheimer's disease (AD) is a complex disease, and numerous cellular events may be involved in its etiology. RNAseq-based transcriptome data hold multilayer information content, which could be crucial in unraveling the molecular mysteries of AD. These data enable quantification of gene expression levels, identification of genomic variants, and detection of splicing anomalies such as exon skipping and intron retention. Further integration of this information into protein-protein interaction networks and genome-scale metabolic models from the literature has the potential to decipher functional modules and affected mechanisms in complex scenarios such as AD. In this chapter, we review the application areas of the multilayer content of RNAseq and the associated integrative approaches, with a special focus on AD.
Affiliation(s)
- Dilara Uzuner
- Department of Bioengineering, Gebze Technical University, Gebze, Kocaeli, Turkey
- Atılay İlgün
- Department of Bioengineering, Gebze Technical University, Gebze, Kocaeli, Turkey
- Elif Düz
- Department of Bioengineering, Gebze Technical University, Gebze, Kocaeli, Turkey
- Fatma Betül Bozkurt
- Department of Bioengineering, Gebze Technical University, Gebze, Kocaeli, Turkey
- Tunahan Çakır
- Department of Bioengineering, Gebze Technical University, Gebze, Kocaeli, Turkey
6
Dai R, Chu T, Zhang M, Wang X, Jourdon A, Wu F, Mariani J, Vaccarino FM, Lee D, Fullard JF, Hoffman GE, Roussos P, Wang Y, Wang X, Pinto D, Wang SH, Zhang C, Chen C, Liu C. Evaluating performance and applications of sample-wise cell deconvolution methods on human brain transcriptomic data. bioRxiv 2023:2023.03.13.532468. [PMID: 36993743 PMCID: PMC10054947 DOI: 10.1101/2023.03.13.532468] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 05/05/2023]
Abstract
Sample-wise deconvolution methods have been developed to estimate cell-type proportions and gene expression in bulk-tissue samples. However, the performance of these methods and their biological applications have not been evaluated, particularly on human brain transcriptomic data. Here, nine deconvolution methods were evaluated with sample-matched data from bulk-tissue RNAseq, single-cell/nuclei (sc/sn) RNAseq, and immunohistochemistry. A total of 1,130,767 nuclei/cells from 149 adult postmortem brains and 72 organoid samples were used. dtangle performed best for estimating cell proportions, and bMIND performed best for estimating sample-wise cell-type gene expression. For eight brain cell types, 25,273 cell-type eQTLs were identified with deconvoluted expression (decon-eQTLs). Decon-eQTLs explained more schizophrenia GWAS heritability than bulk-tissue or single-cell eQTLs alone. Differential gene expression associated with multiple phenotypes was also examined using the deconvoluted data. Our findings, which were replicated in bulk-tissue RNAseq and sc/snRNAseq data, provided new insights into the biological applications of deconvoluted data.
Affiliation(s)
- Rujia Dai
- Department of Psychiatry, SUNY Upstate Medical University, Syracuse, NY, USA
- Tianyao Chu
- Center for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, China
- Ming Zhang
- Center for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, China
- Xuan Wang
- Center for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, China
- Feinan Wu
- Child Study Center, Yale University, New Haven, CT, USA
- Flora M Vaccarino
- Child Study Center, Yale University, New Haven, CT, USA
- Department of Neuroscience, Yale University, New Haven, CT, USA
- Donghoon Lee
- Center for Disease Neurogenomics, Departments of Psychiatry and Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- John F Fullard
- Center for Disease Neurogenomics, Departments of Psychiatry and Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Gabriel E Hoffman
- Center for Disease Neurogenomics, Departments of Psychiatry and Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Panos Roussos
- Center for Disease Neurogenomics, Departments of Psychiatry and Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Yue Wang
- Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, VA, USA
- Xusheng Wang
- Department of Biology, University of North Dakota, Grand Forks, ND, USA
- Dalila Pinto
- Department of Psychiatry, Department of Genetics and Genomic Sciences, Mindich Child Health and Development Institute, and Icahn Genomics Institute for Data Science and Genomic Technology, Seaver Autism Center, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Sidney H Wang
- Center for Human Genetics, The Brown Foundation Institute of Molecular Medicine, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Chunling Zhang
- Department of Neuroscience & Physiology, SUNY Upstate Medical University, Syracuse, NY, USA
- Chao Chen
- Center for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, China
- Chunyu Liu
- Department of Psychiatry, SUNY Upstate Medical University, Syracuse, NY, USA
- Center for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, China
- Department of Neuroscience & Physiology, SUNY Upstate Medical University, Syracuse, NY, USA
7
Herrington D, Wang Y. Clinical heterogeneity in the age of big data, advanced analytics, and complexity theory. Trans Am Clin Climatol Assoc 2023; 133:56-68. [PMID: 37701617 PMCID: PMC10493739] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/14/2023]
Abstract
Clinical heterogeneity remains a challenge in the practice of medicine and is an underlying motivation for much of biomedical research. Unfortunately, despite an abundance of technologies capable of producing millions of discrete data elements with information about a patient's health status or disease prognosis, our ability to translate those data into meaningful improvements in understanding of clinical heterogeneity is limited. To address this gap, we have applied newer approaches to manifold learning and developed additional and complementary techniques to interrogate and interpret complex, high dimensional omics data. The central premise is that there exist manifolds embedded in high dimensional data that represent fundamental biologic processes that may help address the challenges of clinical heterogeneity. Preliminary evidence from several real-world data sets suggests that these techniques can identify coherent and reproducible manifolds embedded in higher dimensional omics data. Work is currently ongoing to determine the clinical informativeness of these novel data structures.
8
Yan L, Sun X. Benchmarking and integration of methods for deconvoluting spatial transcriptomic data. Bioinformatics 2022; 39:6900924. [PMID: 36515467 PMCID: PMC9825747 DOI: 10.1093/bioinformatics/btac805] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 07/13/2022] [Revised: 11/11/2022] [Accepted: 12/13/2022] [Indexed: 12/15/2022]
Abstract
MOTIVATION The rapid development of spatial transcriptomics (ST) approaches has provided new insights into tissue architecture and function. However, the gene expression measured at a spot may contain contributions from multiple cells because of the low resolution of current ST technologies. Although many computational methods have been developed to disentangle discrete cell types from spatial mixtures, the community lacks a thorough evaluation of the performance of these deconvolution methods. RESULTS Here, we present a comprehensive benchmarking of 14 deconvolution methods on four datasets. Furthermore, we investigate the robustness of different methods to sequencing depth, spot size and the choice of normalization. Moreover, we propose a new ensemble learning-based deconvolution method (EnDecon) that integrates multiple individual methods for more accurate deconvolution. The major new findings include: (i) cell2location, RCTD and spatialDWLS are more accurate than other ST deconvolution methods, based on the evaluation of three metrics: root mean square error (RMSE), Pearson correlation coefficient (PCC) and Jensen-Shannon divergence (JSD); (ii) cell2location and spatialDWLS are more robust to variation in sequencing depth than RCTD; (iii) the accuracy of the existing methods tends to decrease as the spot size becomes smaller; (iv) most deconvolution methods perform best when they normalize ST data using the method described in their original papers; and (v) the integrative method, EnDecon, could achieve more accurate ST deconvolution. Our study provides valuable information and guidelines for practically applying ST deconvolution tools and for developing new and more effective methods. AVAILABILITY AND IMPLEMENTATION The benchmarking pipeline is available at https://github.com/SunXQlab/ST-deconvoulution. An R package for EnDecon is available at https://github.com/SunXQlab/EnDecon. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
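The three metrics named in the abstract (RMSE, PCC and JSD) compare predicted cell-type proportions with true proportions. Below is a minimal pure-Python sketch for a single spot. It assumes the proportions are equal-length lists that each sum to 1, and it uses a base-2 logarithm for JSD so that the divergence lies in [0, 1]. The abstract does not state the benchmark's exact conventions: the log base, whether JSD or its square root is reported, and how values are aggregated across spots and datasets. Treat those choices here as assumptions.

```python
import math


def rmse(true, pred):
    """Root mean square error between two proportion vectors."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(true, pred)) / len(true))


def pcc(true, pred):
    """Pearson correlation coefficient; raises ZeroDivisionError if either vector is constant."""
    n = len(true)
    mt, mp = sum(true) / n, sum(pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(true, pred))
    st = math.sqrt(sum((t - mt) ** 2 for t in true))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    return cov / (st * sp)


def _kl(a, b):
    """Kullback-Leibler divergence in bits; terms with a_i == 0 contribute 0."""
    return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)


def jsd(true, pred):
    """Jensen-Shannon divergence (base 2) between two probability vectors."""
    m = [(t + p) / 2 for t, p in zip(true, pred)]
    return 0.5 * _kl(true, m) + 0.5 * _kl(pred, m)


true = [0.6, 0.3, 0.1]
pred = [0.5, 0.3, 0.2]
print(rmse(true, pred), pcc(true, pred), jsd(true, pred))
```

RMSE and JSD are lower-is-better, while PCC is higher-is-better. A method can therefore rank differently on each metric, which is one reason to report all three.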
Affiliation(s)
- Lulu Yan
- School of Mathematics, Sun Yat-sen University, Guangzhou 510275, China
9
Yates J, Boeva V. Deciphering the etiology and role in oncogenic transformation of the CpG island methylator phenotype: a pan-cancer analysis. Brief Bioinform 2022; 23:6520307. [PMID: 35134107 PMCID: PMC8921629 DOI: 10.1093/bib/bbab610] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 10/13/2021] [Revised: 12/06/2021] [Accepted: 12/30/2021] [Indexed: 12/25/2022]
Abstract
Numerous cancer types have been shown to present hypermethylation of CpG islands, known as the CpG island methylator phenotype (CIMP), which is often associated with variation in survival. Despite extensive research on CIMP, the etiology of this variability remains elusive, possibly because of a lack of consistency in how CIMP is defined. In this work, we use a pan-cancer approach to further explore CIMP, focusing on 26 cancer types profiled in The Cancer Genome Atlas (TCGA). We defined CIMP systematically and agnostically, discarding any effects associated with age, gender or tumor purity. We then clustered samples based on their most variable DNA methylation values and analyzed the resulting patient groups. Our results confirmed the existence of CIMP in 19 cancers, including gliomas and colorectal cancer. We further showed that CIMP was associated with survival differences in eight cancer types and, in five, represented a prognostic biomarker independent of clinical factors. By analyzing genetic and transcriptomic data, we further uncovered potential drivers of CIMP and classified them into four categories: mutations in genes directly involved in DNA demethylation; mutations in histone methyltransferases; mutations in genes not involved in methylation turnover, such as KRAS and BRAF; and microsatellite instability. Among the 19 CIMP-positive cancers, very few shared potential driver events, and the only shared drivers were IDH1 and SETD2 mutations. Finally, we found that CIMP was strongly correlated with tumor microenvironment characteristics, such as lymphocyte infiltration. Overall, our results indicate that CIMP does not have a pan-cancer manifestation; rather, general dysregulation of CpG DNA methylation is caused by heterogeneous mechanisms.
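The abstract describes clustering samples on their most variable DNA methylation values; the sketch below illustrates that step. It assumes a samples × CpGs matrix of beta values that has already been adjusted for age, gender and tumor purity (that adjustment is not shown). It keeps the k CpGs with the highest variance and groups samples with plain k-means, using a deterministic farthest-point initialization. k-means is a swapped-in choice for illustration only, since the abstract does not name the clustering algorithm. The function names and toy data are hypothetical.

```python
import numpy as np


def top_variable_cpgs(beta, k):
    """Indices of the k CpGs (columns) with the highest variance across samples."""
    beta = np.asarray(beta, dtype=float)
    return np.argsort(beta.var(axis=0))[::-1][:k]


def kmeans(X, n_clusters=2, n_iter=100):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [X[0]]
    while len(centers) < n_clusters:
        # Next center: the sample farthest from all centers chosen so far.
        dist_to_nearest = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist_to_nearest)])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assign each sample to its nearest center (squared Euclidean distance).
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Recompute centers, keeping the old center if a cluster becomes empty.
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                                for j in range(n_clusters)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Toy beta values: CpGs 0 and 1 separate two sample groups; CpGs 2 and 3 barely vary.
beta = np.array([[0.90, 0.80, 0.5, 0.50],
                 [0.85, 0.90, 0.5, 0.51],
                 [0.95, 0.85, 0.5, 0.49],
                 [0.10, 0.20, 0.5, 0.50],
                 [0.15, 0.10, 0.5, 0.51],
                 [0.05, 0.15, 0.5, 0.49]])
cpgs = top_variable_cpgs(beta, k=2)
clusters = kmeans(beta[:, cpgs], n_clusters=2)
print(sorted(cpgs.tolist()), clusters.tolist())
```

Restricting to the most variable CpGs discards sites that are uniformly methylated or unmethylated across tumors. Such sites carry little information about which samples form a hypermethylated group.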
Affiliation(s)
- Josephine Yates
- Institute for Machine Learning, Department of Computer Science, ETH Zürich, Zurich 8092, Switzerland
- Valentina Boeva
- Institute for Machine Learning, Department of Computer Science, ETH Zürich, Zurich 8092, Switzerland
- Swiss Institute for Bioinformatics (SIB), Zürich, Switzerland
- Cochin Institute, Inserm U1016, CNRS UMR 8104, Paris Descartes University UMR-S1016, Paris 75014, France