1. Yu L, Lin Y, Xu X, Yang P, Yang JYH. Interpretable Differential Abundance Signature (iDAS). Small Methods 2025:e2500572. [PMID: 40420636 DOI: 10.1002/smtd.202500572] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2025] [Revised: 04/28/2025] [Indexed: 05/28/2025]
Abstract
Single-cell technologies have revolutionized the understanding of cellular dynamics by allowing researchers to investigate individual cell responses under various conditions, such as diseased versus healthy states. Many differential abundance methods have been developed in this field; however, the gene signatures obtained from those methods are often incompletely understood, and cell type information and other biological factors must be integrated to yield interpretable and meaningful results. To better interpret the gene signatures generated in differential abundance analysis, iDAS was developed to classify them into multiple categories. When applied to melanoma single-cell data with multiple cell states and treatment phenotypes, iDAS identified cell state- and treatment phenotype-specific gene signatures, as well as interaction effect-related gene signatures with meaningful biological interpretations. The iDAS model was further applied to a longitudinal study and to spatially resolved omics data to demonstrate its versatility in different analytical contexts. These results demonstrate that the iDAS framework can effectively identify robust, cell-state-specific gene signatures and is versatile enough to accommodate various study designs, including multi-factor longitudinal and spatially resolved data.
Affiliation(s)
- Lijia Yu
  - School of Mathematics and Statistics, The University of Sydney, Camperdown, NSW, 2006, Australia
  - Sydney Precision Data Science Centre, University of Sydney, Camperdown, NSW, 2006, Australia
  - Charles Perkins Centre, The University of Sydney, Camperdown, NSW, 2006, Australia
  - Computational Systems Biology Unit, Children's Medical Research Institute, Faculty of Medicine and Health, University of Sydney, Westmead, NSW, 2145, Australia
- Yingxin Lin
  - Department of Biostatistics, Yale University, New Haven, CT, 208034, USA
- Xiangnan Xu
  - School of Business and Economics, Humboldt-Universität zu Berlin, 10099, Berlin, Germany
- Pengyi Yang
  - School of Mathematics and Statistics, The University of Sydney, Camperdown, NSW, 2006, Australia
  - Sydney Precision Data Science Centre, University of Sydney, Camperdown, NSW, 2006, Australia
  - Charles Perkins Centre, The University of Sydney, Camperdown, NSW, 2006, Australia
  - Laboratory of Data Discovery for Health Limited (D24H), Science Park, Hong Kong SAR, China
  - Computational Systems Biology Unit, Children's Medical Research Institute, Faculty of Medicine and Health, University of Sydney, Westmead, NSW, 2145, Australia
- Jean Y H Yang
  - School of Mathematics and Statistics, The University of Sydney, Camperdown, NSW, 2006, Australia
  - Sydney Precision Data Science Centre, University of Sydney, Camperdown, NSW, 2006, Australia
  - Charles Perkins Centre, The University of Sydney, Camperdown, NSW, 2006, Australia
  - Laboratory of Data Discovery for Health Limited (D24H), Science Park, Hong Kong SAR, China
2. Maden SK, Huuki-Myers LA, Kwon SH, Collado-Torres L, Maynard KR, Hicks SC. lute: estimating the cell composition of heterogeneous tissue with varying cell sizes using gene expression. BMC Genomics 2025; 26:433. [PMID: 40312738 PMCID: PMC12045009 DOI: 10.1186/s12864-025-11508-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/10/2025] [Accepted: 03/19/2025] [Indexed: 05/03/2025] Open
Abstract
BACKGROUND: Relative cell type fraction estimates in bulk RNA-sequencing data are important for controlling for cell composition differences across heterogeneous tissue samples. While algorithms exist to estimate cell type proportions in tissues, a major challenge is that they can show reduced performance on tissues whose cell types vary in size, such as brain tissue. Without adjusting for differences in cell size, computational algorithms estimate the relative fraction of RNA attributable to each cell type rather than the relative fraction of cells of each type, leading to potentially biased estimates of cellular composition. Furthermore, these tools were built on different frameworks with non-uniform input data formats while addressing different types of systematic errors or unwanted bias. RESULTS: We present lute, a software tool to accurately deconvolute cell types with varying sizes. The lute package wraps existing deconvolution algorithms in a flexible and extensible framework to enable easy benchmarking and comparison of these algorithms. Using simulated and real datasets, we demonstrate how lute adjusts for differences in cell sizes to improve the accuracy of cell composition estimates. CONCLUSIONS: Our software (https://bioconductor.org/packages/lute) can be used to enhance existing deconvolution algorithms and can be applied broadly to any tissue containing cell types with varying cell sizes.
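The distinction the abstract draws between RNA fractions and cell fractions can be illustrated with a small noise-free toy example. This is only a sketch under stated assumptions: it does not use the lute package or its API, the reference profiles, size factors, and proportions are invented, and ordinary least squares stands in for the constrained solvers that real deconvolution tools use.

```python
import numpy as np

rng = np.random.default_rng(0)

n_genes, n_types = 50, 2
# Hypothetical reference profiles: expression per unit of RNA for each cell type.
Z = rng.uniform(0.0, 10.0, size=(n_genes, n_types))
# Hypothetical cell size factors: type 0 carries three times the RNA of type 1.
sizes = np.array([3.0, 1.0])
# True cell-count proportions in the simulated bulk sample.
p_true = np.array([0.4, 0.6])

# Each cell type contributes to the bulk signal in proportion to size * count.
y = Z @ (sizes * p_true)


def to_fractions(weights):
    """Clip negative weights to zero and rescale so the weights sum to one."""
    weights = np.clip(weights, 0.0, None)
    return weights / weights.sum()


# Least squares fit of the bulk profile on the reference profiles
# (an assumption; a stand-in for a real deconvolution solver).
w, *_ = np.linalg.lstsq(Z, y, rcond=None)

rna_fraction = to_fractions(w)           # unadjusted: share of RNA per cell type
cell_fraction = to_fractions(w / sizes)  # size-adjusted: share of cells per type

print("RNA fractions: ", rna_fraction.round(3))
print("Cell fractions:", cell_fraction.round(3))
```

In this toy setting the unadjusted estimate overstates the larger cell type, while dividing by the size factors before renormalizing recovers the simulated cell proportions.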
Affiliation(s)
- Sean K Maden
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Louise A Huuki-Myers
  - Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, USA
  - Department of Clinical Neurosciences, School of Clinical Medicine, The University of Cambridge, Cambridge, UK
  - UK Dementia Research Institute at The University of Cambridge, Cambridge, UK
- Sang Ho Kwon
  - Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, USA
  - The Solomon H. Snyder Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Leonardo Collado-Torres
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
  - Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, USA
  - Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Kristen R Maynard
  - Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, USA
  - The Solomon H. Snyder Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, MD, USA
  - Department of Psychiatry and Behavioral Sciences, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Stephanie C Hicks
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
  - Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
  - Malone Center for Engineering in Healthcare, Johns Hopkins University, Baltimore, MD, USA
3. Li Y, Xu S, Wang X, Ertekin-Taner N, Chen D. An augmented GSNMF model for complete deconvolution of bulk RNA-seq data. Mathematical Biosciences and Engineering : MBE 2025; 22:988-1018. [PMID: 40296800 PMCID: PMC12043048 DOI: 10.3934/mbe.2025036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/30/2025]
Abstract
Performing complete deconvolution analysis for bulk RNA-seq data to obtain both cell type-specific gene expression profiles (GEPs) and relative cell abundances is a challenging task. One of the fundamental models used, nonnegative matrix factorization (NMF), is mathematically ill-posed. Although several complete deconvolution methods have been developed, and their estimates appear promising when compared with ground truth for some datasets, a comprehensive understanding of how to circumvent the ill-posedness and improve solution accuracy is lacking. In this paper, we first investigated the requirements a dataset must meet to satisfy the solvability conditions in NMF theory. Even when the solvability conditions hold, the "unique" solutions of NMF are determined only up to a rescaling matrix. Therefore, we provide estimates of the converged local minima and the possible rescaling matrix, based on informative initial conditions. Using these strategies, we developed a new pipeline: a geometric structure guided NMF model augmented with pseudo-bulk tissue data (GSNMF+). In our approach, pseudo-bulk tissue data were generated from single-cell RNA-seq (scRNA-seq) data and pseudo cellular compositions simulated from statistical distributions, and then mixed with the original dataset. The constituent matrices of the hybrid dataset then satisfy the weak solvability conditions of NMF. Furthermore, an estimated rescaling matrix was used to adjust the minimizer of the NMF, which was expected to reduce the root mean square errors of the solutions. Our algorithms were tested on several realistic bulk-tissue datasets and showed significant improvements in scenarios with singular cellular compositions.
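The rescaling ambiguity mentioned in the abstract can be seen in a few lines of NumPy. This is a generic illustration of a well-known property of NMF, not the GSNMF+ pipeline: the matrices, their dimensions, and the diagonal rescaling matrix `D` are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

n_genes, n_types, n_samples = 6, 3, 4
# Stand-in nonnegative factors: cell type profiles (W) and abundances (H).
W = rng.uniform(0.1, 1.0, size=(n_genes, n_types))
H = rng.uniform(0.1, 1.0, size=(n_types, n_samples))
V = W @ H  # simulated bulk expression matrix

# Any positive diagonal matrix D yields another valid nonnegative factorization.
D = np.diag([2.0, 0.5, 4.0])
W_rescaled = W @ D
H_rescaled = np.linalg.inv(D) @ H

same_bulk = np.allclose(V, W_rescaled @ H_rescaled)
still_nonnegative = bool((W_rescaled >= 0).all() and (H_rescaled >= 0).all())
different_factors = not np.allclose(H, H_rescaled)

print(same_bulk, still_nonnegative, different_factors)
```

Both factor pairs reproduce the same bulk matrix, so the data alone cannot distinguish them; this is why an estimate of the rescaling matrix is needed before the abundances can be interpreted.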
Affiliation(s)
- Yujie Li
  - Department of Mathematics and Statistics, University of North Carolina at Charlotte, USA
  - School of Data Science, University of North Carolina at Charlotte, USA
- Su Xu
  - Department of Mathematics and Statistics, University of North Carolina at Charlotte, USA
- Xue Wang
  - Department of Quantitative Health Sciences, Mayo Clinic, Florida, USA
- Nilüfer Ertekin-Taner
  - Department of Neurosciences, Mayo Clinic, Florida, USA
  - Department of Neurology, Mayo Clinic, Florida, USA
- Duan Chen
  - Department of Mathematics and Statistics, University of North Carolina at Charlotte, USA
4. Wiklund L, Wincent E, Beronius A. Using transcriptomics data and Adverse Outcome Pathway networks to explore endocrine disrupting properties of Cadmium and PCB-126. Environment International 2025; 197:109352. [PMID: 40054344 DOI: 10.1016/j.envint.2025.109352] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2024] [Revised: 02/10/2025] [Accepted: 02/24/2025] [Indexed: 03/25/2025]
Abstract
Omics-technologies such as transcriptomics offer valuable insights into toxicity mechanisms. However, integrating this type of data into regulatory frameworks remains challenging due to uncertainties regarding toxicological relevance and links to adverse outcomes. Furthermore, current assessments of endocrine disruptors (EDs) relevant to human health require substantial amounts of data, and primarily rely on standardized animal studies. Identifying EDs is a high priority in the EU, but so are efforts to replace and reduce animal testing. Alternative methods to investigate EDs are needed, and so are health risk assessment methods that support uptake of novel mechanistic information. This study aims to utilize Adverse Outcome Pathways (AOPs) to integrate transcriptomics data for identifying EDs, by establishing a link between molecular data and adverse outcomes. Cadmium (Cd) and 3,3',4,4',5-pentachlorobiphenyl (PCB126) were used as model compounds due to their observed effects on the endocrine system. An AOP network for the estrogen, androgen, thyroid, and steroidogenesis (EATS)-modalities was constructed. RNA sequencing (RNA-Seq) was conducted on zebrafish (Danio rerio) embryos exposed to Cd or PCB126 for 4 days. RNA-Seq data were then linked to the AOP network via Gene Ontology (GO) terms. Enrichment Maps in Cytoscape and the QIAGEN Ingenuity Pathway Analysis (IPA) software were also used to identify potential ED properties and to support the assessment. Potentially EATS-related GO Biological Process (BP) terms were identified for both compounds. A lack of accurate standardized terms in KEs of the AOP network hindered a data-driven mapping approach. Instead, manual mapping of GO BP terms onto the AOP network revealed more connections, underscoring the need for harmonizing AOP development for regulatory use. Both the Enrichment Maps and the IPA results further supported potentially EATS-related effects of both compounds. While AOP networks show promise in integrating RNA-Seq data, several challenges remain.
Affiliation(s)
- Linus Wiklund
  - Institute of Environmental Medicine, Karolinska Institutet, Stockholm, Sweden
- Emma Wincent
  - Institute of Environmental Medicine, Karolinska Institutet, Stockholm, Sweden
- Anna Beronius
  - Institute of Environmental Medicine, Karolinska Institutet, Stockholm, Sweden
5. Lin WY, Kartawinata M, Jebson BR, Restuadi R, Peckham H, Radziszewska A, Deakin CT, Ciurtin C, CLUSTER Consortium, Wedderburn LR, Wallace C. Penalised regression improves imputation of cell-type specific expression using RNA-seq data from mixed cell populations compared to domain-specific methods. PLoS Comput Biol 2025; 21:e1012859. [PMID: 40053530 PMCID: PMC11957391 DOI: 10.1371/journal.pcbi.1012859] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2024] [Revised: 03/31/2025] [Accepted: 02/07/2025] [Indexed: 03/09/2025] Open
Abstract
Gene expression studies often use bulk RNA sequencing of mixed cell populations because single cell or sorted cell sequencing may be prohibitively expensive. However, mixed cell studies may miss expression patterns that are restricted to specific cell populations. Computational deconvolution can be used to estimate cell fractions from bulk expression data and infer average cell-type expression in a set of samples (e.g., cases or controls), but imputing sample-level cell-type expression is required for more detailed analyses, such as relating expression to quantitative traits, and is less commonly addressed. Here, we assessed the accuracy of imputing sample-level cell-type expression using a real dataset where mixed peripheral blood mononuclear cells (PBMC) and sorted (CD4, CD8, CD14, CD19) RNA sequencing data were generated from the same subjects (N=158), and pseudobulk datasets synthesised from eQTLgen single cell RNA-seq data. We compared three domain-specific methods, CIBERSORTx, bMIND and debCAM/swCAM, and two cross-domain machine learning methods, multiple response LASSO and ridge, that had not been used for this task before. We also assessed the methods according to their ability to recover differential gene expression (DGE) results. LASSO/ridge showed higher sensitivity but lower specificity for recovering DGE signals seen in observed data compared to deconvolution methods, although LASSO/ridge had higher area under curves than deconvolution methods. Machine learning methods have the potential to outperform domain-specific methods when suitable training data are available.
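As a rough illustration of what multiple response ridge regression means in this setting, the sketch below fits a closed-form ridge model that maps mixed-cell expression to sorted-cell expression for several genes at once, then imputes values for held-out samples. It is not the authors' pipeline: the data are simulated, the dimensions and penalty `lam` are arbitrary, and the helper `fit_ridge` is a hypothetical name introduced for the example.

```python
import numpy as np


def fit_ridge(X, Y, lam):
    """Closed-form multi-response ridge: solve (X'X + lam * I) B = X'Y for B."""
    n_features = X.shape[1]
    gram = X.T @ X + lam * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ Y)


rng = np.random.default_rng(2)
n_train, n_test = 40, 5
n_mixed_genes, n_sorted_genes = 8, 3

# Simulated training data: mixed-cell expression (X) and matched
# sorted-cell expression (Y) measured in the same subjects.
B_true = rng.normal(size=(n_mixed_genes, n_sorted_genes))
X_train = rng.normal(size=(n_train, n_mixed_genes))
Y_train = X_train @ B_true + 0.01 * rng.normal(size=(n_train, n_sorted_genes))

# One coefficient matrix serves all response genes simultaneously.
B_hat = fit_ridge(X_train, Y_train, lam=1.0)

# Impute sample-level cell-type expression for subjects with mixed data only.
X_test = rng.normal(size=(n_test, n_mixed_genes))
Y_imputed = X_test @ B_hat

print(Y_imputed.shape)
```

The key requirement, as the abstract notes, is suitable training data: matched mixed and sorted measurements from the same subjects are what allow the coefficient matrix to be learned.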
Affiliation(s)
- Wei-Yu Lin
  - MRC Biostatistics Unit, Cambridge Biomedical Campus, Cambridge, United Kingdom
- Melissa Kartawinata
  - Infection, Immunity and Inflammation Research and Teaching Department, UCL Great Ormond Street Institute of Child Health, University College London (UCL), London, United Kingdom
  - Centre for Adolescent Rheumatology Versus Arthritis at University College London (UCL), University College London Hospital (UCLH) and Great Ormond Street Hospital (GOSH), London, United Kingdom
- Bethany R. Jebson
  - Infection, Immunity and Inflammation Research and Teaching Department, UCL Great Ormond Street Institute of Child Health, University College London (UCL), London, United Kingdom
  - Centre for Adolescent Rheumatology Versus Arthritis at University College London (UCL), University College London Hospital (UCLH) and Great Ormond Street Hospital (GOSH), London, United Kingdom
- Restuadi Restuadi
  - Infection, Immunity and Inflammation Research and Teaching Department, UCL Great Ormond Street Institute of Child Health, University College London (UCL), London, United Kingdom
  - Centre for Adolescent Rheumatology Versus Arthritis at University College London (UCL), University College London Hospital (UCLH) and Great Ormond Street Hospital (GOSH), London, United Kingdom
- Hannah Peckham
  - Centre for Adolescent Rheumatology Versus Arthritis at University College London (UCL), University College London Hospital (UCLH) and Great Ormond Street Hospital (GOSH), London, United Kingdom
  - Division of Medicine, Department of Ageing, Rheumatology & Regenerative Medicine, UCL, London, United Kingdom
- Anna Radziszewska
  - Centre for Adolescent Rheumatology Versus Arthritis at University College London (UCL), University College London Hospital (UCLH) and Great Ormond Street Hospital (GOSH), London, United Kingdom
  - Division of Medicine, Department of Ageing, Rheumatology & Regenerative Medicine, UCL, London, United Kingdom
- Claire T. Deakin
  - Infection, Immunity and Inflammation Research and Teaching Department, UCL Great Ormond Street Institute of Child Health, University College London (UCL), London, United Kingdom
  - Centre for Adolescent Rheumatology Versus Arthritis at University College London (UCL), University College London Hospital (UCLH) and Great Ormond Street Hospital (GOSH), London, United Kingdom
  - National Institute for Health Research (NIHR) GOSH Biomedical Research Centre, London, United Kingdom
- Coziana Ciurtin
  - Centre for Adolescent Rheumatology Versus Arthritis at University College London (UCL), University College London Hospital (UCLH) and Great Ormond Street Hospital (GOSH), London, United Kingdom
  - Division of Medicine, Department of Ageing, Rheumatology & Regenerative Medicine, UCL, London, United Kingdom
- Lucy R. Wedderburn
  - Infection, Immunity and Inflammation Research and Teaching Department, UCL Great Ormond Street Institute of Child Health, University College London (UCL), London, United Kingdom
  - Centre for Adolescent Rheumatology Versus Arthritis at University College London (UCL), University College London Hospital (UCLH) and Great Ormond Street Hospital (GOSH), London, United Kingdom
  - National Institute for Health Research (NIHR) GOSH Biomedical Research Centre, London, United Kingdom
- Chris Wallace
  - MRC Biostatistics Unit, Cambridge Biomedical Campus, Cambridge, United Kingdom
  - Cambridge Institute of Therapeutic Immunology and Infectious Disease (CITIID), Jeffrey Cheah Biomedical Centre, Cambridge Biomedical Campus, University of Cambridge, Cambridge, United Kingdom
6. Conning-Rowland M, Cheng CW, Brown O, Giannoudi M, Levelt E, Roberts LD, Griffin KJ, Cubbon RM. Application of CIBERSORTx and BayesPrism to deconvolution of bulk RNA-seq data from human myocardium and skeletal muscle. Heliyon 2025; 11:e42499. [PMID: 40034311 PMCID: PMC11872574 DOI: 10.1016/j.heliyon.2025.e42499] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2024] [Revised: 02/04/2025] [Accepted: 02/05/2025] [Indexed: 03/05/2025] Open
Abstract
RNA-sequencing (RNA-seq) is an important tool to explore molecular mechanisms of disease. Technological advances mean this can be performed at the single-cell level, but the large sample sizes needed in clinical studies are currently prohibitively expensive and complex. Deconvolution of bulk RNA-seq offers an opportunity to bridge this gap by defining the cell lineage composition of samples. This approach is widely used in immunology studies, but currently there are no validated pipelines for researchers analysing human myocardium or skeletal muscle. Here, we describe the application and in silico validation of two pipelines to deconvolute human right atrium, left ventricle and skeletal muscle bulk RNA-seq data. Specifically, we have defined the major cell lineages of these tissues using single cell/nucleus RNA-seq data from the Heart Cell Atlas, which are then applied during deconvolution using the CIBERSORTx or BayesPrism deconvolution packages. Both pipelines gave robust estimates of the proportion of all major cell lineages in these tissues. We demonstrate their value in defining age- and sex-differences in tissue composition using bulk RNA-seq data from the GTEx consortium. Our validated pipelines can be rapidly applied by researchers working with existing or novel bulk RNA-seq of myocardium or skeletal muscle to gain novel insights.
Affiliation(s)
- Marcella Conning-Rowland
  - Leeds Institute of Cardiovascular and Metabolic Medicine, The University of Leeds, Leeds, United Kingdom
- Chew W. Cheng
  - Leeds Institute of Cardiovascular and Metabolic Medicine, The University of Leeds, Leeds, United Kingdom
- Oliver Brown
  - Leeds Institute of Cardiovascular and Metabolic Medicine, The University of Leeds, Leeds, United Kingdom
- Marilena Giannoudi
  - Leeds Institute of Cardiovascular and Metabolic Medicine, The University of Leeds, Leeds, United Kingdom
- Eylem Levelt
  - Leeds Institute of Cardiovascular and Metabolic Medicine, The University of Leeds, Leeds, United Kingdom
- Lee D. Roberts
  - Leeds Institute of Cardiovascular and Metabolic Medicine, The University of Leeds, Leeds, United Kingdom
- Kathryn J. Griffin
  - Leeds Institute of Cardiovascular and Metabolic Medicine, The University of Leeds, Leeds, United Kingdom
- Richard M. Cubbon
  - Leeds Institute of Cardiovascular and Metabolic Medicine, The University of Leeds, Leeds, United Kingdom
7. Narayanan S, Vuckovic S, Bergman O, Wirka R, Verdezoto Mosquera J, Chen QS, Baldassarre D, Tremoli E, Veglia F, Lengquist M, Aherrahrou R, Razuvaev A, Gigante B, Björck HM, Miller CL, Quertermous T, Hedin U, Matic L. Atheroma transcriptomics identifies ARNTL as a smooth muscle cell regulator and with clinical and genetic data improves risk stratification. Eur Heart J 2025; 46:308-322. [PMID: 39552248 PMCID: PMC11735083 DOI: 10.1093/eurheartj/ehae768] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Revised: 02/10/2024] [Accepted: 10/23/2024] [Indexed: 11/19/2024] Open
Abstract
BACKGROUND AND AIMS: The role of vascular smooth muscle cells (SMCs) in atherosclerosis has evolved to indicate causal genetic links with the disease. Single cell RNA sequencing (scRNAseq) studies have identified multiple cell populations of mesenchymal origin within atherosclerotic lesions, including various SMC sub-phenotypes, but it is unknown how they relate to patient clinical parameters and genetics. Here, mesenchymal cell populations in atherosclerotic plaques were correlated with major coronary artery disease (CAD) genetic variants, and functional analyses were performed to identify SMC markers involved in the disease. METHODS: Bioinformatic deconvolution was done on bulk microarrays from carotid plaques in the Biobank of Karolinska Endarterectomies (BiKE, n = 125) using public plaque scRNAseq data, and the results were associated with patient clinical data and follow-up information. BiKE patients were clustered based on the deconvoluted cell fractions. Quantitative trait loci (QTLs) analyses were performed to predict the effect of CAD-associated genetic variants on mesenchymal cell fractions (cfQTLs) and gene expression (eQTLs) in plaques. RESULTS: Lesions from symptomatic patients had higher fractions of Type 1 macrophages and pericytes, but lower fractions of classical and modulated SMCs, compared with asymptomatic ones, particularly in females. Presence of diabetes or statin treatment did not affect the cell fraction distribution. Clustering based on plaque cell fractions revealed three patient groups, with relative differences in their stability profiles and associations to stroke, even during long-term follow-up. Several single nucleotide polymorphisms associated with plaque mesenchymal cell fractions were identified upstream of the circadian rhythm gene ARNTL. In vitro silencing of ARNTL in human carotid SMCs increased the expression of contractile markers and attenuated cell proliferation. CONCLUSIONS: This study shows the potential of combining scRNAseq data with vertically integrated clinical, genetic, and transcriptomic data from a large biobank of human plaques for refinement of patient vulnerability and risk prediction stratification. The study revealed novel CAD-associated variants that may be functionally linked to SMCs in atherosclerotic plaques. Specifically, variants in the ARNTL gene may influence SMC ratios and function, and its role as a regulator of SMC proliferation should be further investigated.
Affiliation(s)
- Sampath Narayanan
  - Vascular Surgery, Department of Molecular Medicine and Surgery, Karolinska University Hospital and Karolinska Institutet, BioClinicum J8:20, Visionsgatan 4, SE-171 76 Stockholm, Sweden
- Sofija Vuckovic
  - Vascular Surgery, Department of Molecular Medicine and Surgery, Karolinska University Hospital and Karolinska Institutet, BioClinicum J8:20, Visionsgatan 4, SE-171 76 Stockholm, Sweden
- Otto Bergman
  - Division of Cardiovascular Medicine, Center for Molecular Medicine, Department of Medicine, Karolinska Institutet, Stockholm, Karolinska University Hospital, Solna, Sweden
- Robert Wirka
  - Department of Medicine and Cell Biology and Physiology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Qiao Sen Chen
  - Division of Cardiovascular Medicine, Center for Molecular Medicine, Department of Medicine, Karolinska Institutet, Stockholm, Karolinska University Hospital, Solna, Sweden
- Damiano Baldassarre
  - Centro Cardiologico Monzino, IRCCS, Milan, Italy
  - Department of Medical Biotechnology and Translational Medicine, Università di Milano, Milan, Italy
- Elena Tremoli
  - Maria Cecilia Hospital, GVM Care & Research, Cotignola, Italy
- Fabrizio Veglia
  - Maria Cecilia Hospital, GVM Care & Research, Cotignola, Italy
- Mariette Lengquist
  - Vascular Surgery, Department of Molecular Medicine and Surgery, Karolinska University Hospital and Karolinska Institutet, BioClinicum J8:20, Visionsgatan 4, SE-171 76 Stockholm, Sweden
- Redouane Aherrahrou
  - A.I. Virtanen Institute for Molecular Sciences, University of Eastern Finland, Kuopio, Finland
  - Institute for Cardiogenetics, Universität zu Lübeck; DZHK (German Centre for Cardiovascular Research), Partner Site Hamburg/Kiel/Lübeck, Germany
  - University Heart Centre Lübeck, Lübeck, Germany
- Anton Razuvaev
  - Vascular Surgery, Department of Molecular Medicine and Surgery, Karolinska University Hospital and Karolinska Institutet, BioClinicum J8:20, Visionsgatan 4, SE-171 76 Stockholm, Sweden
- Bruna Gigante
  - Division of Cardiovascular Medicine, Center for Molecular Medicine, Department of Medicine, Karolinska Institutet, Stockholm, Karolinska University Hospital, Solna, Sweden
- Hanna M Björck
  - Division of Cardiovascular Medicine, Center for Molecular Medicine, Department of Medicine, Karolinska Institutet, Stockholm, Karolinska University Hospital, Solna, Sweden
- Clint L Miller
  - Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
- Thomas Quertermous
  - Division of Cardiovascular Medicine, Stanford University, Stanford, CA, USA
- Ulf Hedin
  - Vascular Surgery, Department of Molecular Medicine and Surgery, Karolinska University Hospital and Karolinska Institutet, BioClinicum J8:20, Visionsgatan 4, SE-171 76 Stockholm, Sweden
- Ljubica Matic
  - Vascular Surgery, Department of Molecular Medicine and Surgery, Karolinska University Hospital and Karolinska Institutet, BioClinicum J8:20, Visionsgatan 4, SE-171 76 Stockholm, Sweden
8. Lavie O, Williams LE. Using Callus as an Ex Vivo System for Chromatin Analysis. Methods Mol Biol 2025; 2873:333-347. [PMID: 39576610 DOI: 10.1007/978-1-0716-4228-3_18] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2024]
Abstract
Next-generation sequencing has revolutionized epigenetics research, enabling a comprehensive analysis of DNA methylation and histone modification profiles to explore complex biological systems at unprecedented depth. Deciphering the intricate epigenetic mechanisms that regulate gene activity presents significant challenges, including the issue of analyzing heterogeneous cell populations in bulk. Bulk analysis introduces bias and can obscure crucial information by averaging readouts from distinct cells. Various approaches have been developed to address this issue, such as cell-type-specific enrichment or single-cell sequencing techniques. However, the need for transgenic lines with fluorescent markers, along with technical challenges such as efficient protoplast isolation and low yield, limits their widespread adoption and use in multi-omic studies. This review discusses the pros and cons of these approaches, providing a valuable basis for selecting the most suitable strategy to minimize heterogeneity. We will also highlight the use of cotyledon-derived callus as an ex vivo system as a simple, accessible, and robust platform for enabling high-throughput multi-omic analyses.
Affiliation(s)
- Orly Lavie
  - The Robert H. Smith Institute of Plant Sciences and Genetics in Agriculture, The Hebrew University of Jerusalem, Rehovot, Israel
- Leor Eshed Williams
  - The Robert H. Smith Institute of Plant Sciences and Genetics in Agriculture, The Hebrew University of Jerusalem, Rehovot, Israel
9. Rumbaugh KP, Whiteley M. Towards improved biofilm models. Nat Rev Microbiol 2025; 23:57-66. [PMID: 39112554 DOI: 10.1038/s41579-024-01086-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 07/10/2024] [Indexed: 12/13/2024]
Abstract
Biofilms are complex microbial communities that have a critical function in many natural ecosystems, industrial settings as well as in recurrent and chronic infections. Biofilms are highly heterogeneous and dynamic assemblages that display complex responses to varying environmental factors, and those properties present substantial challenges for their study and control. In recent years, there has been a growing interest in developing improved biofilm models to offer more precise and comprehensive representations of these intricate systems. However, an objective assessment for ascertaining the ability of biofilms in model systems to recapitulate those in natural environments has been lacking. In this Perspective, we focus on medical biofilms to delve into the current state-of-the-art in biofilm modelling, emphasizing the advantages and limitations of different approaches and addressing the key challenges and opportunities for future research. We outline a framework for quantitatively assessing model accuracy. Ultimately, this Perspective aims to provide a comprehensive and critical overview of medically focused biofilm models, with the intent of inspiring future research aimed at enhancing the biological relevance of biofilm models.
Affiliation(s)
- Kendra P Rumbaugh
  - Department of Surgery, Texas Tech University Health Sciences Center and Burn Center of Research Excellence, Lubbock, TX, USA
- Marvin Whiteley
  - School of Biological Sciences, Georgia Institute of Technology, Emory Children's Cystic Fibrosis Center, and Center for Microbial Dynamics and Infection, Georgia Institute of Technology, Atlanta, GA, USA
10
Shahrouzi P, Azimzade Y, Brankiewicz-Kopcinska W, Bhatia S, Kunke D, Richard D, Tekpli X, Kristensen VN, Duijf PHG. Loss of chromosome cytoband 13q14.2 orchestrates breast cancer pathogenesis and drug response. Breast Cancer Res 2024; 26:170. [PMID: 39605038 PMCID: PMC11600738 DOI: 10.1186/s13058-024-01924-4] [Received: 06/17/2024] [Accepted: 11/18/2024] [Indexed: 11/29/2024] Open
Abstract
Breast cancer (BCa) is a major global health challenge. The BCa genome often carries extensive somatic copy number alterations (CNAs), including gains/amplifications and losses/deletions. These CNAs significantly affect tumor development, drug response and patient survival. However, how individual CNAs contribute is mostly elusive. We identified loss of chromosome 13q14.2 as a key CNA in BCa, occurring in up to 63% of patients, depending on the subtype, and correlating with poor survival. Through multi-omics and in vitro analyses, we uncover a paradoxical role of 13q14.2 loss, promoting both cell cycle and pro-apoptotic pathways in cancer cells, while also associating with increased NK cell and macrophage populations in the tumor microenvironment. Notably, 13q14.2 loss increases BCa susceptibility to BCL2 inhibitors, both in vitro and in patient-derived xenografts. Thus, 13q14.2 loss could serve as a biomarker for BCa prognosis and treatment, potentially improving outcomes for BCa patients.
Affiliation(s)
- Parastoo Shahrouzi
- Department of Medical Genetics, Institute of Clinical Medicine, Faculty of Medicine, University of Oslo and Oslo University Hospital, Oslo, Norway.
- Youness Azimzade
- Oslo Center for Biostatistics and Epidemiology, University of Oslo, Oslo, Norway
- Wioletta Brankiewicz-Kopcinska
- Department of Medical Genetics, Institute of Clinical Medicine, Faculty of Medicine, University of Oslo and Oslo University Hospital, Oslo, Norway
- Sugandha Bhatia
- School of Biomedical Sciences, Centre for Genomics and Personalised Health at the Translational Research Institute, Queensland University of Technology (QUT), Woolloongabba, QLD, 4102, Australia
- David Kunke
- Department of Medical Genetics, Institute of Clinical Medicine, Faculty of Medicine, University of Oslo and Oslo University Hospital, Oslo, Norway
- Derek Richard
- Faculty of Health, School of Biomedical Sciences, Queensland University of Technology (QUT), Woolloongabba, QLD, 4102, Australia
- Xavier Tekpli
- Department of Medical Genetics, Institute of Clinical Medicine, Faculty of Medicine, University of Oslo and Oslo University Hospital, Oslo, Norway
- Vessela N Kristensen
- Department of Medical Genetics, Institute of Clinical Medicine, Faculty of Medicine, University of Oslo and Oslo University Hospital, Oslo, Norway.
- Pascal H G Duijf
- Department of Medical Genetics, Institute of Clinical Medicine, Faculty of Medicine, University of Oslo and Oslo University Hospital, Oslo, Norway.
- Centre for Cancer Biology, Clinical and Health Sciences, University of South Australia, Adelaide, SA, Australia.
11
Xu X, Li R, Mo O, Liu K, Li J, Hao P. Cell-type deconvolution for bulk RNA-seq data using single-cell reference: a comparative analysis and recommendation guideline. Brief Bioinform 2024; 26:bbaf031. [PMID: 39899596 PMCID: PMC11789683 DOI: 10.1093/bib/bbaf031] [Received: 09/23/2024] [Revised: 12/06/2024] [Accepted: 01/20/2025] [Indexed: 02/05/2025] Open
Abstract
The accurate estimation of cell type proportions in tissues is crucial for various downstream analyses. With the increasing availability of single-cell sequencing data, numerous deconvolution methods that use single-cell RNA sequencing data as a reference have been developed. However, a unified understanding of how these deconvolution approaches perform in practical applications is still lacking. To address this, we systematically assessed the accuracy and robustness of nine deconvolution methods that use single-cell RNA sequencing data as a reference, evaluating them on real bulk data with cell proportions verified through flow cytometry, as well as simulated bulk data generated from five single-cell RNA sequencing datasets. Our study highlights the importance of several factors, including reference dataset construction strategies, dataset size, cell type subdivision, and cell type inconsistency, for the accuracy and robustness of deconvolution results. We also propose a set of recommended guidelines for software users in diverse scenarios.
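As a rough illustration of what assessing accuracy against verified proportions can involve, the sketch below scores one sample's estimated cell type proportions against a ground truth. The abstract does not name its metrics; root-mean-square error and Pearson correlation are assumed here as common choices, and the function name `score_proportions` and the toy numbers are hypothetical.

```python
import numpy as np

def score_proportions(estimated, truth):
    """Return (RMSE, Pearson r) between estimated and true proportions.

    Both metrics are assumptions for illustration, not the study's stated protocol.
    """
    est = np.asarray(estimated, dtype=float)
    ref = np.asarray(truth, dtype=float)
    rmse = float(np.sqrt(np.mean((est - ref) ** 2)))
    pearson = float(np.corrcoef(est, ref)[0, 1])
    return rmse, pearson

# Toy example: three hypothetical cell types in one sample.
truth = [0.5, 0.3, 0.2]        # e.g. proportions from flow cytometry
estimated = [0.4, 0.4, 0.2]    # e.g. output of a deconvolution method
rmse, pearson = score_proportions(estimated, truth)
print(f"RMSE={rmse:.3f}, Pearson r={pearson:.3f}")
```

RMSE penalizes absolute deviations, while the correlation captures whether the ranking of cell types is preserved; reporting both is one way to separate calibration errors from ordering errors.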
Affiliation(s)
- Xintian Xu
- Key Laboratory of Molecular Virology and Immunology, Shanghai Institute of Immunity and Infection, Chinese Academy of Sciences, 320 Yueyang Road, Xuhui District, Shanghai 200031, China
- University of Chinese Academy of Sciences, 1 Yanqihu East Road, Huairou District, Beijing 100039, China
- Rui Li
- Key Laboratory of Molecular Virology and Immunology, Shanghai Institute of Immunity and Infection, Chinese Academy of Sciences, 320 Yueyang Road, Xuhui District, Shanghai 200031, China
- University of Chinese Academy of Sciences, 1 Yanqihu East Road, Huairou District, Beijing 100039, China
- Ouyang Mo
- Key Laboratory of Molecular Virology and Immunology, Shanghai Institute of Immunity and Infection, Chinese Academy of Sciences, 320 Yueyang Road, Xuhui District, Shanghai 200031, China
- University of Chinese Academy of Sciences, 1 Yanqihu East Road, Huairou District, Beijing 100039, China
- Kai Liu
- Key Laboratory of Molecular Virology and Immunology, Shanghai Institute of Immunity and Infection, Chinese Academy of Sciences, 320 Yueyang Road, Xuhui District, Shanghai 200031, China
- Department of Colorectal Surgery, Fudan University Shanghai Cancer Center, 270 Dong'an Road, Xuhui District, Shanghai 200032, China
- Justin Li
- Department of Mathematics, University of Connecticut, 352 Mansfield Road, Storrs, CT 06269, USA
- Pei Hao
- Key Laboratory of Molecular Virology and Immunology, Shanghai Institute of Immunity and Infection, Chinese Academy of Sciences, 320 Yueyang Road, Xuhui District, Shanghai 200031, China
- University of Chinese Academy of Sciences, 1 Yanqihu East Road, Huairou District, Beijing 100039, China
12
Luo S, Zhu M, Lin L, Xie J, Lin S, Chen Y, Zhu J, Huang J. DECA: harnessing interpretable transformer model for cellular deconvolution of chromatin accessibility profile. Brief Bioinform 2024; 26:bbaf069. [PMID: 39987573 PMCID: PMC11847511 DOI: 10.1093/bib/bbaf069] [Received: 09/17/2024] [Revised: 01/09/2025] [Accepted: 02/06/2025] [Indexed: 02/25/2025] Open
Abstract
The assay for transposase-accessible chromatin with sequencing (ATAC-seq) identifies chromatin accessibility across the genome, which is crucial for regulating gene expression. However, bulk ATAC-seq obscures cellular heterogeneity, while single-cell ATAC-seq suffers from issues such as sparsity and cost. To this end, we introduce DECA, a deep learning model based on a vision transformer that deconvolves cell type information from bulk chromatin accessibility profiles, using single-cell ATAC-seq datasets as a reference for enhanced precision and resolution. Notably, patch attention generated by DECA's multi-head attention mechanism aligns with chromatin interactions detected by Hi-C. Additionally, DECA predicted lineage-specific cell composition changes due to genetic perturbation. The chromatin accessibility signatures predicted by DECA are enriched with cell-type specific genetic variations. Finally, we applied DECA to pan-cancer ATAC-seq datasets and demonstrated its capability to deconvolve cell type proportions with clinical significance. Taken together, DECA deconvolves cellular proportions and predicts their chromatin accessibility profiles from bulk chromatin accessibility data, which enables the exploration of gene regulatory programs in development and disease.
Affiliation(s)
- Shijie Luo
- State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Faculty of Medicine and Life Sciences, Xiamen University, No. 4221, Xiang'an South Road, Xiamen, Fujian 361102, China
- National Institute for Data Science in Health and Medicine, Xiamen University, No. 4221, Xiang'an South Road, Xiamen, Fujian 361102, China
- Ming Zhu
- State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Faculty of Medicine and Life Sciences, Xiamen University, No. 4221, Xiang'an South Road, Xiamen, Fujian 361102, China
- Liquan Lin
- State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Faculty of Medicine and Life Sciences, Xiamen University, No. 4221, Xiang'an South Road, Xiamen, Fujian 361102, China
- Jiajing Xie
- National Institute for Data Science in Health and Medicine, Xiamen University, No. 4221, Xiang'an South Road, Xiamen, Fujian 361102, China
- Shihao Lin
- State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Faculty of Medicine and Life Sciences, Xiamen University, No. 4221, Xiang'an South Road, Xiamen, Fujian 361102, China
- Ying Chen
- School of Informatics, Xiamen University, No. 4221, Xiang'an South Road, Fujian 361000, China
- Jiali Zhu
- State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Faculty of Medicine and Life Sciences, Xiamen University, No. 4221, Xiang'an South Road, Xiamen, Fujian 361102, China
- Jialiang Huang
- State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Faculty of Medicine and Life Sciences, Xiamen University, No. 4221, Xiang'an South Road, Xiamen, Fujian 361102, China
- National Institute for Data Science in Health and Medicine, Xiamen University, No. 4221, Xiang'an South Road, Xiamen, Fujian 361102, China
13
Aubin RG, Montelongo J, Hu R, Gunther E, Nicodemus P, Camara PG. Clustering-independent estimation of cell abundances in bulk tissues using single-cell RNA-seq data. CELL REPORTS METHODS 2024; 4:100905. [PMID: 39561717 PMCID: PMC11705773 DOI: 10.1016/j.crmeth.2024.100905] [Received: 04/03/2024] [Revised: 06/03/2024] [Accepted: 10/22/2024] [Indexed: 11/21/2024]
Abstract
Single-cell RNA sequencing has transformed the study of biological tissues by enabling transcriptomic characterizations of their constituent cell states. Computational methods for gene expression deconvolution use this information to infer the cell composition of related tissues profiled at the bulk level. However, current deconvolution methods are restricted to discrete cell types and have limited power to make inferences about continuous cellular processes such as cell differentiation or immune cell activation. We present ConDecon, a clustering-independent method for inferring the likelihood for each cell in a single-cell dataset to be present in a bulk tissue. ConDecon represents an improvement in phenotypic resolution and functionality with respect to regression-based methods. Using ConDecon, we discover the implication of neurodegenerative microglia inflammatory pathways in the mesenchymal transformation of pediatric ependymoma and characterize their spatial trajectories of activation. The generality of this approach enables the deconvolution of other data modalities, such as bulk ATAC-seq data.
Affiliation(s)
- Rachael G Aubin
- Department of Genetics and Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, PA 19104, USA
- Javier Montelongo
- Department of Genetics and Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, PA 19104, USA
- Robert Hu
- Department of Genetics and Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, PA 19104, USA
- Elijah Gunther
- Department of Genetics and Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, PA 19104, USA
- Patrick Nicodemus
- Department of Genetics and Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, PA 19104, USA
- Pablo G Camara
- Department of Genetics and Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, PA 19104, USA.
14
Gabriel AAG, Racle J, Falquet M, Jandus C, Gfeller D. Robust estimation of cancer and immune cell-type proportions from bulk tumor ATAC-Seq data. eLife 2024; 13:RP94833. [PMID: 39383060 PMCID: PMC11464006 DOI: 10.7554/elife.94833] [Indexed: 10/11/2024] Open
Abstract
Assay for Transposase-Accessible Chromatin sequencing (ATAC-Seq) is a widely used technique to explore gene regulatory mechanisms. For most ATAC-Seq data from healthy and diseased tissues such as tumors, chromatin accessibility measurement represents a mixed signal from multiple cell types. In this work, we derive reliable chromatin accessibility marker peaks and reference profiles for most non-malignant cell types frequently observed in the microenvironment of human tumors. We then integrate these data into the EPIC deconvolution framework (Racle et al., 2017) to quantify cell-type heterogeneity in bulk ATAC-Seq data. Our EPIC-ATAC tool accurately predicts non-malignant and malignant cell fractions in tumor samples. When applied to a human breast cancer cohort, EPIC-ATAC accurately infers the immune contexture of the main breast cancer subtypes.
Affiliation(s)
- Aurélie Anne-Gaëlle Gabriel
- Department of Oncology, Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, Switzerland
- Agora Cancer Research Center, Lausanne, Switzerland
- Swiss Cancer Center Leman (SCCL), Geneva, Switzerland
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Julien Racle
- Department of Oncology, Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, Switzerland
- Agora Cancer Research Center, Lausanne, Switzerland
- Swiss Cancer Center Leman (SCCL), Geneva, Switzerland
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Maryline Falquet
- Swiss Cancer Center Leman (SCCL), Geneva, Switzerland
- Ludwig Institute for Cancer Research, Lausanne Branch, Lausanne, Switzerland
- Department of Pathology and Immunology, Faculty of Medicine, University of Geneva, Geneva, Switzerland
- Geneva Center for Inflammation Research, Geneva, Switzerland
- Camilla Jandus
- Swiss Cancer Center Leman (SCCL), Geneva, Switzerland
- Ludwig Institute for Cancer Research, Lausanne Branch, Lausanne, Switzerland
- Department of Pathology and Immunology, Faculty of Medicine, University of Geneva, Geneva, Switzerland
- Geneva Center for Inflammation Research, Geneva, Switzerland
- David Gfeller
- Department of Oncology, Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, Switzerland
- Agora Cancer Research Center, Lausanne, Switzerland
- Swiss Cancer Center Leman (SCCL), Geneva, Switzerland
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
15
Aubin RG, Montelongo J, Hu R, Gunther E, Nicodemus P, Camara PG. Clustering-independent estimation of cell abundances in bulk tissues using single-cell RNA-seq data. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.02.06.527318. [PMID: 36798206 PMCID: PMC9934539 DOI: 10.1101/2023.02.06.527318] [Indexed: 02/10/2023]
Abstract
Single-cell RNA-sequencing has transformed the study of biological tissues by enabling transcriptomic characterizations of their constituent cell states. Computational methods for gene expression deconvolution use this information to infer the cell composition of related tissues profiled at the bulk level. However, current deconvolution methods are restricted to discrete cell types and have limited power to make inferences about continuous cellular processes like cell differentiation or immune cell activation. We present ConDecon, a clustering-independent method for inferring the likelihood for each cell in a single-cell dataset to be present in a bulk tissue. ConDecon represents an improvement in phenotypic resolution and functionality with respect to regression-based methods. Using ConDecon, we discover the implication of neurodegenerative microglia inflammatory pathways in the mesenchymal transformation of pediatric ependymoma and characterize their spatial trajectories of activation. The generality of this approach enables the deconvolution of other data modalities such as bulk ATAC-seq data.
Affiliation(s)
- Rachael G Aubin
- Department of Genetics and Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, PA 19104
- Javier Montelongo
- Department of Genetics and Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, PA 19104
- Robert Hu
- Department of Genetics and Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, PA 19104
- Elijah Gunther
- Department of Genetics and Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, PA 19104
- Patrick Nicodemus
- Department of Genetics and Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, PA 19104
- Pablo G Camara
- Department of Genetics and Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, PA 19104
16
Wang C, Lin Y, Li S, Guan J. Deconvolution from bulk gene expression by leveraging sample-wise and gene-wise similarities and single-cell RNA-Seq data. BMC Genomics 2024; 25:875. [PMID: 39294558 PMCID: PMC11409548 DOI: 10.1186/s12864-024-10728-x] [Received: 01/30/2024] [Accepted: 08/20/2024] [Indexed: 09/20/2024] Open
Abstract
BACKGROUND The widely adopted bulk RNA-seq measures the gene expression average of cells, masking cell type heterogeneity, which confounds downstream analyses. Therefore, identifying the cellular composition and cell type-specific gene expression profiles (GEPs) facilitates the study of the underlying mechanisms of various biological processes. Although single-cell RNA-seq focuses on cell type heterogeneity in gene expression, it requires specialized and expensive resources and currently is not practical for a large number of samples or a routine clinical setting. Recently, computational deconvolution methodologies have been developed, while many of them only estimate cell type composition or cell type-specific GEPs by requiring the other as input. The development of more accurate deconvolution methods to infer cell type abundance and cell type-specific GEPs is still essential. RESULTS We propose a new deconvolution algorithm, DSSC, which infers cell type-specific gene expression and cell type proportions of heterogeneous samples simultaneously by leveraging gene-gene and sample-sample similarities in bulk expression and single-cell RNA-seq data. Through comparisons with the other existing methods, we demonstrate that DSSC is effective in inferring both cell type proportions and cell type-specific GEPs across simulated pseudo-bulk data (including intra-dataset and inter-dataset simulations) and experimental bulk data (including mixture data and real experimental data). DSSC shows robustness to the change of marker gene number and sample size and also has cost and time efficiencies. CONCLUSIONS DSSC provides a practical and promising alternative to the experimental techniques to characterize cellular composition and heterogeneity in the gene expression of heterogeneous samples.
Affiliation(s)
- Chenqi Wang
- Department of Automation, Xiamen University, Xiamen, China
- Yifan Lin
- Department of Automation, Xiamen University, Xiamen, China
- Shuchao Li
- Department of Automation, Xiamen University, Xiamen, China
- Jinting Guan
- Department of Automation, Xiamen University, Xiamen, China.
- Key Laboratory of System Control and Information Processing, Ministry of Education, Shanghai, China.
- National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, China.
17
Li M, Su Y, Gao Y, Tian W. ReCIDE: robust estimation of cell type proportions by integrating single-reference-based deconvolutions. Brief Bioinform 2024; 25:bbae422. [PMID: 39177263 PMCID: PMC11342246 DOI: 10.1093/bib/bbae422] [Received: 04/02/2024] [Revised: 07/16/2024] [Accepted: 08/12/2024] [Indexed: 08/24/2024] Open
Abstract
In this study, we introduce Robust estimation of Cell type proportions by Integrating single-reference-based DEconvolutions (ReCIDE), an innovative framework for robust estimation of cell type proportions by integrating single-reference-based deconvolutions. ReCIDE outperforms existing approaches in benchmark and real datasets, particularly excelling in estimating rare cell type proportions. Through exploratory analysis on public bulk data of triple-negative breast cancer (TNBC) patients using ReCIDE, we demonstrate a significant correlation between the prognosis of TNBC patients and the proportions of both T cell and perivascular-like cell subtypes. Built upon this discovery, we develop a prognostic assessment model for TNBC patients. Our contribution presents a novel framework for enhancing deconvolution accuracy, showcasing its effectiveness in medical research.
Affiliation(s)
- Minghan Li
- State Key Laboratory of Genetic Engineering, Department of Computational Biology, School of Life Sciences, Fudan University, 2005 Songhu Road, Yangpu District, Shanghai 200438, China
- Yuqing Su
- State Key Laboratory of Genetic Engineering, Department of Computational Biology, School of Life Sciences, Fudan University, 2005 Songhu Road, Yangpu District, Shanghai 200438, China
- Yanbo Gao
- Shanghai SPH Jiaolian Pharmaceutical Technology Company, Limited, Building 4, 998 Ha Lei Road, Pudong District, Shanghai 201203, China
- Weidong Tian
- State Key Laboratory of Genetic Engineering, Department of Computational Biology, School of Life Sciences, Fudan University, 2005 Songhu Road, Yangpu District, Shanghai 200438, China
- Children’s Hospital of Fudan University, 399 Wanyuan Road, Minhang District, Shanghai 201102, China
- Children’s Hospital of Shandong University, 23976 Jingshi Road, Huaiyin District, Jinan, Shandong 250022, China
18
Hu M, Chikina M. Heterogeneous pseudobulk simulation enables realistic benchmarking of cell-type deconvolution methods. Genome Biol 2024; 25:169. [PMID: 38956606 PMCID: PMC11218230 DOI: 10.1186/s13059-024-03292-w] [Received: 01/20/2023] [Accepted: 05/29/2024] [Indexed: 07/04/2024] Open
Abstract
BACKGROUND Computational cell type deconvolution enables the estimation of cell type abundance from bulk tissues and is important for understanding the tissue microenvironment, especially in tumor tissues. With the rapid development of deconvolution methods, many benchmarking studies have been published aiming for a comprehensive evaluation of these methods. Benchmarking studies rely on cell-type resolved single-cell RNA-seq data to create simulated pseudobulk datasets by adding individual cell types in controlled proportions. RESULTS In our work, we show that the standard application of this approach, which uses randomly selected single cells, regardless of the intrinsic difference between them, generates synthetic bulk expression values that lack appropriate biological variance. We demonstrate why and how the current bulk simulation pipeline with random cells is unrealistic and propose a heterogeneous simulation strategy as a solution. The heterogeneously simulated bulk samples match the variance observed in real bulk datasets and therefore provide concrete benefits for benchmarking in several ways. We demonstrate that conceptual classes of deconvolution methods differ dramatically in their robustness to heterogeneity, with reference-free methods performing particularly poorly. For regression-based methods, the heterogeneous simulation provides an explicit framework to disentangle the contributions of reference construction and regression methods to performance. Finally, we perform an extensive benchmark of diverse methods across eight different datasets and find BayesPrism and a hybrid MuSiC/CIBERSORTx approach to be the top performers. CONCLUSIONS Our heterogeneous bulk simulation method and the entire benchmarking framework are implemented in a user-friendly package at https://github.com/humengying0907/deconvBenchmarking and https://doi.org/10.5281/zenodo.8206516, enabling further developments in deconvolution methods.
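The standard pseudobulk construction that this abstract critiques (summing randomly selected cells of each type in controlled proportions) can be sketched roughly as follows. This is a minimal illustration, not the deconvBenchmarking package's API: the function name `simulate_pseudobulk`, its parameters, and the toy expression matrix are all hypothetical, and the authors' heterogeneous strategy is not reproduced here.

```python
import numpy as np

def simulate_pseudobulk(expr, labels, proportions, n_cells=100, seed=0):
    """Sum randomly drawn cells so each cell type contributes a set fraction.

    expr: cells x genes array; labels: cell type per cell;
    proportions: dict mapping cell type -> target fraction of n_cells.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    bulk = np.zeros(expr.shape[1], dtype=float)
    for cell_type, frac in proportions.items():
        idx = np.flatnonzero(labels == cell_type)
        k = int(round(frac * n_cells))
        # Sample with replacement so small populations can still fill their quota.
        chosen = rng.choice(idx, size=k, replace=True)
        bulk += expr[chosen].sum(axis=0)
    return bulk

# Toy reference: 4 cells x 3 genes, two hypothetical cell types whose cells
# are identical within a type, so the result does not depend on the draw.
expr = np.array([[10, 0, 1], [10, 0, 1], [0, 5, 1], [0, 5, 1]], dtype=float)
labels = ["T", "T", "B", "B"]
bulk = simulate_pseudobulk(expr, labels, {"T": 0.7, "B": 0.3}, n_cells=10)
print(bulk)
```

Because every draw here ignores which donor or state a cell came from, repeated simulations vary only through sampling noise, which is the lack of biological variance the abstract describes.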
Affiliation(s)
- Mengying Hu
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, USA
- Joint Carnegie Mellon - University of Pittsburgh Computational Biology PhD Program, University of Pittsburgh, Pittsburgh, USA
- Maria Chikina
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, USA.
- Joint Carnegie Mellon - University of Pittsburgh Computational Biology PhD Program, University of Pittsburgh, Pittsburgh, USA.
19
Tiong KL, Luzhbin D, Yeang CH. Assessing transcriptomic heterogeneity of single-cell RNASeq data by bulk-level gene expression data. BMC Bioinformatics 2024; 25:209. [PMID: 38867193 PMCID: PMC11167951 DOI: 10.1186/s12859-024-05825-3] [Received: 01/15/2024] [Accepted: 06/03/2024] [Indexed: 06/14/2024] Open
Abstract
BACKGROUND Single-cell RNA sequencing (sc-RNASeq) data illuminate transcriptomic heterogeneity but also possess a high level of noise, abundant missing entries and sometimes inadequate or no cell type annotations at all. Bulk-level gene expression data lack direct information of cell population composition but are more robust and complete and often better annotated. We propose a modeling framework to integrate bulk-level and single-cell RNASeq data to address the deficiencies and leverage the mutual strengths of each type of data and enable a more comprehensive inference of their transcriptomic heterogeneity. Contrary to the standard approaches of factorizing the bulk-level data with one algorithm and (for some methods) treating single-cell RNASeq data as references to decompose bulk-level data, we employed multiple deconvolution algorithms to factorize the bulk-level data, constructed the probabilistic graphical models of cell-level gene expressions from the decomposition outcomes, and compared the log-likelihood scores of these models in single-cell data. We term this framework backward deconvolution as inference operates from coarse-grained bulk-level data to fine-grained single-cell data. As the abundant missing entries in sc-RNASeq data have a significant effect on log-likelihood scores, we also developed a criterion for inclusion or exclusion of zero entries in log-likelihood score computation. RESULTS We selected nine deconvolution algorithms and validated backward deconvolution in five datasets. In the in-silico mixtures of mouse sc-RNASeq data, the log-likelihood scores of the deconvolution algorithms were strongly anticorrelated with their errors of mixture coefficients and cell type specific gene expression signatures. In the true bulk-level mouse data, the sample mixture coefficients were unknown but the log-likelihood scores were strongly correlated with accuracy rates of inferred cell types. 
In the data of autism spectrum disorder (ASD) and normal controls, we found that ASD brains possessed higher fractions of astrocytes and lower fractions of NRGN-expressing neurons than normal controls. In datasets of breast cancer and low-grade gliomas (LGG), we compared the log-likelihood scores of three simple hypotheses about the gene expression patterns of the cell types underlying the tumor subtypes. The model that tumors of each subtype were dominated by one cell type persistently outperformed an alternative model that each cell type had elevated expression in one gene group and tumors were mixtures of those cell types. Superiority of the former model is also supported by comparing the real breast cancer sc-RNASeq clusters with those generated by simulated sc-RNASeq data. CONCLUSIONS The results indicate that backward deconvolution serves as a sensible model selection tool for deconvolution algorithms and facilitates discerning hypotheses about cell type compositions underlying heterogeneous specimens such as tumors.
Affiliation(s)
- Khong-Loon Tiong
- Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
- Dmytro Luzhbin
- Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
20
Hoedjes KM, Grath S, Posnien N, Ritchie MG, Schlötterer C, Abbott JK, Almudi I, Coronado-Zamora M, Durmaz Mitchell E, Flatt T, Fricke C, Glaser-Schmitt A, González J, Holman L, Kankare M, Lenhart B, Orengo DJ, Snook RR, Yılmaz VM, Yusuf L. From whole bodies to single cells: A guide to transcriptomic approaches for ecology and evolutionary biology. Mol Ecol 2024:e17382. [PMID: 38856653 DOI: 10.1111/mec.17382] [Received: 12/07/2023] [Revised: 04/09/2024] [Accepted: 04/29/2024] [Indexed: 06/11/2024]
Abstract
RNA sequencing (RNAseq) methodology has experienced a burst of technological developments in the last decade, which has opened up opportunities for studying the mechanisms of adaptation to environmental factors at both the organismal and cellular level. Selecting the most suitable experimental approach for specific research questions and model systems can, however, be a challenge and researchers in ecology and evolution are commonly faced with the choice of whether to study gene expression variation in whole bodies, specific tissues, and/or single cells. A wide range of sometimes polarised opinions exists over which approach is best. Here, we highlight the advantages and disadvantages of each of these approaches to provide a guide to help researchers make informed decisions and maximise the power of their study. Using illustrative examples of various ecological and evolutionary research questions, we guide the readers through the different RNAseq approaches and help them identify the most suitable design for their own projects.
Affiliation(s)
- Katja M Hoedjes
- Amsterdam Institute for Life and Environment, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Sonja Grath
- Division of Evolutionary Biology, LMU Munich, Planegg-Martinsried, Germany
- Nico Posnien
- Department of Developmental Biology, Göttingen Center for Molecular Biosciences (GZMB), University of Göttingen, Göttingen, Germany
- Michael G Ritchie
- Centre for Biological Diversity, University of St Andrews, St Andrews, UK
- Isabel Almudi
- Departament de Genètica, Microbiologia i Estadística, Universitat de Barcelona, Barcelona, Spain
- Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona, Barcelona, Spain
- Esra Durmaz Mitchell
- Department of Biology, University of Fribourg, Fribourg, Switzerland
- Functional Genomics and Metabolism Research Unit, Department of Biochemistry and Molecular Biology, University of Southern Denmark, Odense, Denmark
- Thomas Flatt
- Department of Biology, University of Fribourg, Fribourg, Switzerland
- Claudia Fricke
- Institute for Zoology/Animal Ecology, Martin-Luther-University Halle-Wittenberg, Halle (Saale), Germany
- Josefa González
- Institute of Evolutionary Biology, CSIC, UPF, Barcelona, Spain
- Luke Holman
- School of Applied Sciences, Edinburgh Napier University, Edinburgh, UK
- Maaria Kankare
- Department of Biological and Environmental Science, University of Jyväskylä, Jyväskylä, Finland
- Benedict Lenhart
- Department of Biology, University of Virginia, Charlottesville, Virginia, USA
- Dorcas J Orengo
- Departament de Genètica, Microbiologia i Estadística, Universitat de Barcelona, Barcelona, Spain
- Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona, Barcelona, Spain
- Rhonda R Snook
- Department of Zoology, Stockholm University, Stockholm, Sweden
- Vera M Yılmaz
- Division of Evolutionary Biology, LMU Munich, Planegg-Martinsried, Germany
- Leeban Yusuf
- Centre for Biological Diversity, University of St Andrews, St Andrews, UK

21
Huang J, Du Y, Stucky A, Kelly KR, Zhong JF, Sun F. DeepDecon accurately estimates cancer cell fractions in bulk RNA-seq data. PATTERNS (NEW YORK, N.Y.) 2024; 5:100969. [PMID: 38800361 PMCID: PMC11117059 DOI: 10.1016/j.patter.2024.100969] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2023] [Revised: 01/15/2024] [Accepted: 03/21/2024] [Indexed: 05/29/2024]
Abstract
Understanding the cellular composition of a disease-related tissue is important in disease diagnosis, prognosis, and downstream treatment. Recent advances in single-cell RNA-sequencing (scRNA-seq) technique have allowed the measurement of gene expression profiles for individual cells. However, scRNA-seq is still too expensive to be used for large-scale population studies, and bulk RNA-seq is still widely used in such situations. An essential challenge is to deconvolve cellular composition for bulk RNA-seq data based on scRNA-seq data. Here, we present DeepDecon, a deep neural network model that leverages single-cell gene expression information to accurately predict the fraction of cancer cells in bulk tissues. It provides a refining strategy in which the cancer cell fraction is iteratively estimated by a set of trained models. When applied to simulated and real cancer data, DeepDecon exhibits superior performance compared to existing decomposition methods in terms of accuracy.
Affiliation(s)
- Jiawei Huang
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Yuxuan Du
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Andres Stucky
- Department of Basic Sciences, School of Medicine, Loma Linda University, Loma Linda, CA 92350, USA
- Kevin R. Kelly
- Division of Hematology, University of Southern California, Los Angeles, CA 90089, USA
- Jiang F. Zhong
- Department of Basic Sciences, School of Medicine, Loma Linda University, Loma Linda, CA 92350, USA
- Fengzhu Sun
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA

22
O'Connell GC. Dataset including whole blood gene expression profiles and matched leukocyte counts with utility for benchmarking cellular deconvolution pipelines. BMC Genom Data 2024; 25:45. [PMID: 38714942 PMCID: PMC11077736 DOI: 10.1186/s12863-024-01223-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2024] [Accepted: 04/08/2024] [Indexed: 05/12/2024]
Abstract
OBJECTIVES Cellular deconvolution is a valuable computational process that can infer the cellular composition of heterogeneous tissue samples from bulk RNA-sequencing data. Benchmark testing is a crucial step in the development and evaluation of new cellular deconvolution algorithms, and also plays a key role in the process of building and optimizing deconvolution pipelines for specific experimental applications. However, few in vivo benchmarking datasets exist, particularly for whole blood, which is the single most profiled human tissue. Here, we describe a unique dataset containing whole blood gene expression profiles and matched circulating leukocyte counts from a large cohort of human donors with utility for benchmarking cellular deconvolution pipelines. DATA DESCRIPTION To produce this dataset, venous whole blood was sampled from 138 total donors recruited at an academic medical center. Genome-wide expression profiling was subsequently performed via next-generation RNA sequencing, and white blood cell differentials were collected in parallel using flow cytometry. The resultant final dataset contains donor-level expression data for over 45,000 protein coding and non-protein coding genes, as well as matched neutrophil, lymphocyte, monocyte, and eosinophil counts.
Affiliation(s)
- Grant C O'Connell
- Molecular Biomarker Core, Case Western Reserve University, Cleveland, OH, USA.
- School of Nursing, Case Western Reserve University, 10900 Euclid Avenue, Cleveland, OH, 44106-4904, USA.

23
Meng G, Pan Y, Tang W, Zhang L, Cui Y, Schumacher FR, Wang M, Wang R, He S, Krischer J, Li Q, Feng H. imply: improving cell-type deconvolution accuracy using personalized reference profiles. Genome Med 2024; 16:65. [PMID: 38685057 PMCID: PMC11057104 DOI: 10.1186/s13073-024-01338-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2023] [Accepted: 04/18/2024] [Indexed: 05/02/2024]
Abstract
Using computational tools, bulk transcriptomics can be deconvoluted to estimate the abundance of constituent cell types. However, existing deconvolution methods are conditioned on the assumption that the whole study population is served by a single reference panel, ignoring person-to-person heterogeneity. Here, we present imply, a novel algorithm to deconvolute cell type proportions using personalized reference panels. Simulation studies demonstrate reduced bias compared with existing methods. Real data analyses on longitudinal consortia show disparities in cell type proportions are associated with several disease phenotypes in Type 1 diabetes and Parkinson's disease. imply is available through the R/Bioconductor package ISLET at https://bioconductor.org/packages/ISLET/ .
Affiliation(s)
- Guanqun Meng
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, 44106, OH, USA
- Yue Pan
- Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, 38105, TN, USA
- Wen Tang
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, 44106, OH, USA
- Lijun Zhang
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, 44106, OH, USA
- Ying Cui
- Department of Biomedical Data Science, Stanford University, Stanford, 94305, CA, USA
- Fredrick R Schumacher
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, 44106, OH, USA
- Ming Wang
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, 44106, OH, USA
- Rui Wang
- Department of Surgery, Division of Surgical Oncology, University Hospitals Cleveland Medical Center, Cleveland, 44106, OH, USA
- Sijia He
- Department of Biostatistics, University of Michigan, Ann Arbor, 48109, MI, USA
- Jeffrey Krischer
- Health Informatics Institute, University of South Florida, Tampa, 38105, FL, USA
- Qian Li
- Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, 38105, TN, USA.
- Hao Feng
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, 44106, OH, USA.

24
Hsu YC, Chiu YC, Lu TP, Hsiao TH, Chen Y. Predicting drug response through tumor deconvolution by cancer cell lines. PATTERNS (NEW YORK, N.Y.) 2024; 5:100949. [PMID: 38645769 PMCID: PMC11026976 DOI: 10.1016/j.patter.2024.100949] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2023] [Revised: 02/07/2024] [Accepted: 02/12/2024] [Indexed: 04/23/2024]
Abstract
Large-scale cancer drug sensitivity data have become available for a collection of cancer cell lines, but only limited drug response data from patients are available. Bridging the gap in pharmacogenomics knowledge between in vitro and in vivo datasets remains challenging. In this study, we trained a deep learning model, Scaden-CA, for deconvoluting tumor data into proportions of cancer-type-specific cell lines. Then, we developed a drug response prediction method using the deconvoluted proportions and the drug sensitivity data from cell lines. The Scaden-CA model showed excellent performance in terms of concordance correlation coefficients (>0.9 for model testing) and the correctly deconvoluted rate (>70% across most cancers) for model validation using Cancer Cell Line Encyclopedia (CCLE) bulk RNA data. We applied the model to tumors in The Cancer Genome Atlas (TCGA) dataset and examined associations between predicted cell viability and mutation status or gene expression levels to understand underlying mechanisms of potential value for drug repurposing.
Affiliation(s)
- Yu-Ching Hsu
- Bioinformatics Program, Taiwan International Graduate Program, National Taiwan University, Taipei 115, Taiwan
- Bioinformatics Program, Institute of Statistical Science, Taiwan International Graduate Program, Academia Sinica, Taipei 115, Taiwan
- Institute of Health Data Analytics and Statistics, Department of Public Health, College of Public Health, National Taiwan University, Taipei 100, Taiwan
- Greehey Children’s Cancer Research Institute, University of Texas Health San Antonio, San Antonio, TX 78229, USA
- Yu-Chiao Chiu
- Department of Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15261, USA
- UPMC Hillman Cancer Center, University of Pittsburgh, Pittsburgh, PA 15232, USA
- Tzu-Pin Lu
- Institute of Health Data Analytics and Statistics, Department of Public Health, College of Public Health, National Taiwan University, Taipei 100, Taiwan
- Tzu-Hung Hsiao
- Department of Medical Research, Taichung Veterans General Hospital, Taichung 40705, Taiwan
- Yidong Chen
- Greehey Children’s Cancer Research Institute, University of Texas Health San Antonio, San Antonio, TX 78229, USA
- Department of Population Health Sciences, University of Texas Health San Antonio, San Antonio, TX 78229, USA

25
Ferro dos Santos MR, Giuili E, De Koker A, Everaert C, De Preter K. Computational deconvolution of DNA methylation data from mixed DNA samples. Brief Bioinform 2024; 25:bbae234. [PMID: 38762790 PMCID: PMC11102637 DOI: 10.1093/bib/bbae234] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2024] [Revised: 03/30/2024] [Accepted: 04/30/2024] [Indexed: 05/20/2024]
Abstract
In this review, we provide a comprehensive overview of the different computational tools that have been published for the deconvolution of bulk DNA methylation (DNAm) data. Here, deconvolution refers to the estimation of cell-type proportions that constitute a mixed sample. The paper reviews and compares 25 deconvolution methods (supervised, unsupervised or hybrid) developed between 2012 and 2023 and compares the strengths and limitations of each approach. Moreover, in this study, we describe the impact of the platform used for the generation of methylation data (including microarrays and sequencing), the applied data pre-processing steps and the used reference dataset on the deconvolution performance. Next to reference-based methods, we also examine methods that require only partial reference datasets or require no reference set at all. In this review, we provide guidelines for the use of specific methods dependent on the DNA methylation data type and data availability.
Affiliation(s)
- Maísa R Ferro dos Santos
- VIB-UGent Center for Medical Biotechnology (CMB), Technologiepark-Zwijnaarde 75, 9052 Zwijnaarde, Belgium
- Cancer Research Institute Ghent (CRIG), 9000 Ghent, Belgium
- Edoardo Giuili
- VIB-UGent Center for Medical Biotechnology (CMB), Technologiepark-Zwijnaarde 75, 9052 Zwijnaarde, Belgium
- Cancer Research Institute Ghent (CRIG), 9000 Ghent, Belgium
- Andries De Koker
- VIB-UGent Center for Medical Biotechnology (CMB), Technologiepark-Zwijnaarde 75, 9052 Zwijnaarde, Belgium
- Cancer Research Institute Ghent (CRIG), 9000 Ghent, Belgium
- Celine Everaert
- VIB-UGent Center for Medical Biotechnology (CMB), Technologiepark-Zwijnaarde 75, 9052 Zwijnaarde, Belgium
- Cancer Research Institute Ghent (CRIG), 9000 Ghent, Belgium
- Katleen De Preter
- VIB-UGent Center for Medical Biotechnology (CMB), Technologiepark-Zwijnaarde 75, 9052 Zwijnaarde, Belgium
- Cancer Research Institute Ghent (CRIG), 9000 Ghent, Belgium

26
Wu CT, Du D, Chen L, Dai R, Liu C, Yu G, Bhardwaj S, Parker SJ, Zhang Z, Clarke R, Herrington DM, Wang Y. CAM3.0: determining cell type composition and expression from bulk tissues with fully unsupervised deconvolution. Bioinformatics 2024; 40:btae107. [PMID: 38407991 PMCID: PMC10924278 DOI: 10.1093/bioinformatics/btae107] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2023] [Revised: 01/13/2024] [Accepted: 02/25/2024] [Indexed: 02/28/2024]
Abstract
MOTIVATION Complex tissues are dynamic ecosystems consisting of molecularly distinct yet interacting cell types. Computational deconvolution aims to dissect bulk tissue data into cell type compositions and cell-specific expressions. With few exceptions, most existing deconvolution tools exploit supervised approaches requiring various types of references that may be unreliable or even unavailable for specific tissue microenvironments. RESULTS We previously developed a fully unsupervised deconvolution method, Convex Analysis of Mixtures (CAM), that enables estimation of cell type composition and expression from bulk tissues. We now introduce the CAM3.0 tool, which improves this framework with three new and highly efficient algorithms, namely, radius-fixed clustering to identify reliable markers, linear programming to detect an initial scatter simplex, and a smart floating search for the optimum latent variable model. The comparative experimental results obtained from both realistic simulations and case studies show that the CAM3.0 tool can help biologists more accurately identify known or novel cell markers, determine cell proportions, and estimate cell-specific expressions, complementing the existing tools particularly when study- or datatype-specific references are unreliable or unavailable. AVAILABILITY AND IMPLEMENTATION The open-source R scripts of CAM3.0 are freely available at https://github.com/ChiungTingWu/CAM3/ (https://github.com/Bioconductor/Contributions/issues/3205). A user's guide and a vignette are provided.
Affiliation(s)
- Chiung-Ting Wu
- Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, United States
- Dongping Du
- Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, United States
- Lulu Chen
- Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, United States
- Rujia Dai
- Department of Psychiatry, SUNY Upstate Medical University, Syracuse, NY 13210, United States
- Chunyu Liu
- Department of Psychiatry, SUNY Upstate Medical University, Syracuse, NY 13210, United States
- Guoqiang Yu
- Department of Automation, Tsinghua University, Beijing 100084, P. R. China
- Saurabh Bhardwaj
- Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, United States
- Department of Electrical and Instrumentation Engineering, Thapar Institute of Engineering & Technology, Punjab 147004, India
- Sarah J Parker
- Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, CA 90048, United States
- Zhen Zhang
- Department of Pathology, Johns Hopkins University, Baltimore, MD 21231, United States
- Robert Clarke
- The Hormel Institute, University of Minnesota, Austin, MN 55912, United States
- David M Herrington
- Department of Internal Medicine, Wake Forest University, Winston-Salem, NC 27157, United States
- Yue Wang
- Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, United States

27
Zhang H, Lu X, Lu B, Gullo G, Chen L. Measuring the composition of the tumor microenvironment with transcriptome analysis: past, present and future. Future Oncol 2024; 20:1207-1220. [PMID: 38362731 PMCID: PMC11318690 DOI: 10.2217/fon-2023-0658] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Accepted: 01/24/2024] [Indexed: 02/17/2024]
Abstract
Interactions between tumor cells and immune cells in the tumor microenvironment (TME) play a vital role in the mechanisms of immune evasion, by which cancer cells escape immune elimination. Thus, the characterization and quantification of different components in the TME is a hot topic in molecular biology and drug discovery. Since the development of transcriptome sequencing in bulk tissue, single cells and spatial dimensions, an increasing number of methods have emerged to deconvolute and subtype the TME. This review discusses and compares such computational strategies and downstream subtyping analyses. Integrative analyses of the transcriptome with other data, such as epigenetics and T-cell receptor sequencing, are needed to obtain comprehensive knowledge of the dynamic TME.
Affiliation(s)
- Han Zhang
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA 15206, USA
- Xinghua Lu
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA 15206, USA
- UPMC Hillman Cancer Center, Pittsburgh, PA 15232, USA
- Binfeng Lu
- Center for Discovery & Innovation, Hackensack Meridian Health, Nutley, NJ 07110, USA
- Giuseppe Gullo
- Department of Obstetrics & Gynecology, Villa Sofia Cervello Hospital, University of Palermo, 90146, Palermo, Italy
- Lujia Chen
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA 15206, USA

28
Mañanes D, Rivero-García I, Relaño C, Torres M, Sancho D, Jimenez-Carretero D, Torroja C, Sánchez-Cabo F. SpatialDDLS: an R package to deconvolute spatial transcriptomics data using neural networks. Bioinformatics 2024; 40:btae072. [PMID: 38366652 PMCID: PMC10881086 DOI: 10.1093/bioinformatics/btae072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Revised: 01/10/2024] [Accepted: 02/13/2024] [Indexed: 02/18/2024]
Abstract
SUMMARY Spatial transcriptomics has changed the way we study tissue structure and cellular organization. However, there are still limitations in its resolution, and most available platforms do not reach single-cell resolution. To address this issue, we introduce SpatialDDLS, a fast neural network-based algorithm for cell type deconvolution of spatial transcriptomics data. SpatialDDLS leverages single-cell RNA sequencing data to simulate mixed transcriptional profiles with predefined cellular composition, which are subsequently used to train a fully connected neural network to uncover cell type diversity within each spot. By comparing it with two state-of-the-art spatial deconvolution methods, we demonstrate that SpatialDDLS is an accurate and fast alternative to the available state-of-the-art tools. AVAILABILITY AND IMPLEMENTATION The R package SpatialDDLS is available via CRAN (The Comprehensive R Archive Network): https://CRAN.R-project.org/package=SpatialDDLS. A detailed manual of the main functionalities implemented in the package can be found at https://diegommcc.github.io/SpatialDDLS.
Affiliation(s)
- Diego Mañanes
- Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), 28029 Madrid, Spain
- Inés Rivero-García
- Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), 28029 Madrid, Spain
- Departamento de Ingeniería Biomédica, ETSI de Telecomunicaciones, Universidad Politécnica de Madrid, 28040 Madrid, Spain
- Carlos Relaño
- Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), 28029 Madrid, Spain
- Miguel Torres
- Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), 28029 Madrid, Spain
- David Sancho
- Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), 28029 Madrid, Spain
- Carlos Torroja
- Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), 28029 Madrid, Spain
- Fátima Sánchez-Cabo
- Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), 28029 Madrid, Spain

29
Guo X, Huang Z, Ju F, Zhao C, Yu L. Highly Accurate Estimation of Cell Type Abundance in Bulk Tissues Based on Single-Cell Reference and Domain Adaptive Matching. ADVANCED SCIENCE (WEINHEIM, BADEN-WURTTEMBERG, GERMANY) 2024; 11:e2306329. [PMID: 38072669 PMCID: PMC10870031 DOI: 10.1002/advs.202306329] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2023] [Revised: 10/27/2023] [Indexed: 02/17/2024]
Abstract
Accurately identifying the cellular composition of complex tissues is critical for understanding disease pathogenesis, early diagnosis, and prevention. However, current methods for deconvoluting bulk RNA sequencing (RNA-seq) typically rely on matched single-cell RNA sequencing (scRNA-seq) as a reference, which can be limiting due to differences in sequencing distribution and the potential for invalid information from single-cell references. Hence, a novel computational method named SCROAM is introduced to address these challenges. SCROAM transforms scRNA-seq and bulk RNA-seq into a shared feature space, effectively eliminating distributional differences in the latent space. Subsequently, cell-type-specific expression matrices are generated from the scRNA-seq data, facilitating the precise identification of cell types within bulk tissues. The performance of SCROAM is assessed through benchmarking against simulated and real datasets, demonstrating its accuracy and robustness. To further validate SCROAM's performance, single-cell and bulk RNA-seq experiments are conducted on mouse spinal cord tissue, with SCROAM applied to identify cell types in bulk tissue. Results indicate that SCROAM is a highly effective tool for identifying similar cell types. An integrated analysis of liver cancer and primary glioblastoma is then performed. Overall, this research offers a novel perspective for delivering precise insights into disease pathogenesis and potential therapeutic strategies.
Affiliation(s)
- Xinyang Guo
- School of Computer Science and Technology, Xidian University, Xi'an 710071, China
- Zhaoyang Huang
- School of Computer Science and Technology, Xidian University, Xi'an 710071, China
- Fen Ju
- Department of Rehabilitation Medicine, Xijing Hospital, Fourth Military Medical University, Xi'an 710032, China
- Chenguang Zhao
- Department of Rehabilitation Medicine, Xijing Hospital, Fourth Military Medical University, Xi'an 710032, China
- Liang Yu
- School of Computer Science and Technology, Xidian University, Xi'an 710071, China

30
Wang L, Hu Y, Gao L. Adjustment of scRNA-seq data to improve cell-type decomposition of spatial transcriptomics. Brief Bioinform 2024; 25:bbae063. [PMID: 38426323 PMCID: PMC10939420 DOI: 10.1093/bib/bbae063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Revised: 01/10/2024] [Accepted: 02/01/2024] [Indexed: 03/02/2024]
Abstract
Most sequencing-based spatial transcriptomics (ST) technologies do not achieve single-cell resolution: each captured location (spot) may contain a mixture of cells from heterogeneous cell types. Several cell-type decomposition methods have been proposed to estimate the cell type proportions of each spot by integrating single-cell RNA sequencing (scRNA-seq) data. However, these existing methods did not fully consider the effect of the distribution difference between scRNA-seq and ST data on decomposition, leading to biased cell-type-specific genes derived from scRNA-seq for ST data. To address this issue, we develop an instance-based transfer learning framework to adjust scRNA-seq data by ST data to correctly match cell-type-specific gene expression. We evaluate the effect of raw and adjusted scRNA-seq data on cell-type decomposition by eight leading decomposition methods using both simulated and real datasets. Experimental results show that data adjustment can effectively reduce the distribution difference and improve decomposition, thus enabling a more precise depiction of the spatial organization of cell types. We highlight the importance of data adjustment in integrative analysis of scRNA-seq with ST data and provide guidance for improved cell-type decomposition.
Affiliation(s)
- Lanying Wang
- School of Computer Science and Technology, Xidian University, Xi’an 710100, China
- Yuxuan Hu
- School of Computer Science and Technology, Xidian University, Xi’an 710100, China
- Lin Gao
- School of Computer Science and Technology, Xidian University, Xi’an 710100, China

31
Ricker CA, Meli K, Van Allen EM. Historical perspective and future directions: computational science in immuno-oncology. J Immunother Cancer 2024; 12:e008306. [PMID: 38191244 PMCID: PMC10826578 DOI: 10.1136/jitc-2023-008306] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 12/07/2023] [Indexed: 01/10/2024]
Abstract
Immuno-oncology holds promise for transforming patient care having achieved durable clinical response rates across a variety of advanced and metastatic cancers. Despite these achievements, only a minority of patients respond to immunotherapy, underscoring the importance of elucidating molecular mechanisms responsible for response and resistance to inform the development and selection of treatments. Breakthroughs in molecular sequencing technologies have led to the generation of an immense amount of genomic and transcriptomic sequencing data that can be mined to uncover complex tumor-immune interactions using computational tools. In this review, we discuss existing and emerging computational methods that contextualize the composition and functional state of the tumor microenvironment, infer the reactivity and clonal dynamics from reconstructed immune cell receptor repertoires, and predict the antigenic landscape for immune cell recognition. We further describe the advantage of multi-omics analyses for capturing multidimensional relationships and artificial intelligence techniques for integrating omics data with histopathological and radiological images to encapsulate patterns of treatment response and tumor-immune biology. Finally, we discuss key challenges impeding their widespread use and clinical application and conclude with future perspectives. We are hopeful that this review will both serve as a guide for prospective researchers seeking to use existing tools for scientific discoveries and inspire the optimization or development of novel tools to enhance precision, ultimately expediting advancements in immunotherapy that improve patient survival and quality of life.
Affiliation(s)
- Cora A Ricker
- Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts, USA
- Kevin Meli
- Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts, USA
- Harvard Medical School, Boston, Massachusetts, USA

32
Maden SK, Kwon SH, Huuki-Myers LA, Collado-Torres L, Hicks SC, Maynard KR. Challenges and opportunities to computationally deconvolve heterogeneous tissue with varying cell sizes using single-cell RNA-sequencing datasets. Genome Biol 2023; 24:288. [PMID: 38098055 PMCID: PMC10722720 DOI: 10.1186/s13059-023-03123-4] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 05/11/2023] [Accepted: 11/24/2023] [Indexed: 12/17/2023]
Abstract
Deconvolution of cell mixtures in "bulk" transcriptomic samples from homogenate human tissue is important for understanding disease pathologies. However, several experimental and computational challenges impede transcriptomics-based deconvolution approaches using single-cell/nucleus RNA-seq reference atlases. Cells from the brain and blood have substantially different sizes, total mRNA, and transcriptional activities, and existing approaches may quantify total mRNA instead of cell type proportions. Further, standards are lacking for the use of cell reference atlases and integrative analyses of single-cell and spatial transcriptomics data. We discuss how to approach these key challenges with orthogonal "gold standard" datasets for evaluating deconvolution methods.
Affiliation(s)
- Sean K Maden
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
| | - Sang Ho Kwon
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, USA
- The Solomon H. Snyder Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, MD, USA
| | - Louise A Huuki-Myers
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, USA
| | - Leonardo Collado-Torres
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, USA
| | - Stephanie C Hicks
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Malone Center for Engineering in Healthcare, Johns Hopkins University, Baltimore, MD, USA
| | - Kristen R Maynard
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, USA
- The Solomon H. Snyder Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins School of Medicine, Baltimore, MD, USA
|
33
|
Brotman SM, Oravilahti A, Rosen JD, Alvarez M, Heinonen S, van der Kolk BW, Fernandes Silva L, Perrin HJ, Vadlamudi S, Pylant C, Deochand S, Basta PV, Valone JM, Narain MN, Stringham HM, Boehnke M, Kuusisto J, Love MI, Pietiläinen KH, Pajukanta P, Laakso M, Mohlke KL. Cell-Type Composition Affects Adipose Gene Expression Associations With Cardiometabolic Traits. Diabetes 2023; 72:1707-1718. [PMID: 37647564 PMCID: PMC10588284 DOI: 10.2337/db23-0365] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2023] [Accepted: 08/16/2023] [Indexed: 09/01/2023]
Abstract
Understanding differences in adipose gene expression between individuals with different levels of clinical traits may reveal the genes and mechanisms leading to cardiometabolic diseases. However, adipose is a heterogeneous tissue. To account for cell-type heterogeneity, we estimated cell-type proportions in 859 subcutaneous adipose tissue samples with bulk RNA sequencing (RNA-seq) using a reference single-nuclear RNA-seq data set. Cell-type proportions were associated with cardiometabolic traits; for example, higher macrophage and adipocyte proportions were associated with higher and lower BMI, respectively. We evaluated cell-type proportions and BMI as covariates in tests of association between >25,000 gene expression levels and 22 cardiometabolic traits. For >95% of genes, the optimal, or best-fit, models included BMI as a covariate, and for 79% of associations, the optimal models also included cell type. After adjusting for the optimal covariates, we identified 2,664 significant associations (P ≤ 2e-6) for 1,252 genes and 14 traits. Among genes proposed to affect cardiometabolic traits based on colocalized genome-wide association study and adipose expression quantitative trait locus signals, 25 showed a corresponding association between trait and gene expression levels. Overall, these results suggest the importance of modeling cell-type proportion when identifying gene expression associations with cardiometabolic traits.
Affiliation(s)
- Sarah M. Brotman
- Department of Genetics, The University of North Carolina, Chapel Hill, NC
| | - Anniina Oravilahti
- Institute of Clinical Medicine, Internal Medicine, University of Eastern Finland, Kuopio, Finland
| | - Jonathan D. Rosen
- Department of Genetics, The University of North Carolina, Chapel Hill, NC
| | - Marcus Alvarez
- Department of Human Genetics, David Geffen School of Medicine at the University of California, Los Angeles, Los Angeles, CA
| | - Sini Heinonen
- Obesity Research Unit, Research Program for Clinical and Molecular Metabolism, Faculty of Medicine, University of Helsinki, Helsinki, Finland
| | - Birgitta W. van der Kolk
- Obesity Research Unit, Research Program for Clinical and Molecular Metabolism, Faculty of Medicine, University of Helsinki, Helsinki, Finland
| | - Lilian Fernandes Silva
- Institute of Clinical Medicine, Internal Medicine, University of Eastern Finland, Kuopio, Finland
| | - Hannah J. Perrin
- Department of Genetics, The University of North Carolina, Chapel Hill, NC
| | | | - Cortney Pylant
- Department of Epidemiology, The University of North Carolina, Chapel Hill, NC
| | - Sonia Deochand
- Department of Epidemiology, The University of North Carolina, Chapel Hill, NC
| | - Patricia V. Basta
- Department of Epidemiology, The University of North Carolina, Chapel Hill, NC
| | - Jordan M. Valone
- Department of Genetics, The University of North Carolina, Chapel Hill, NC
- UNC Neuroscience Center, The University of North Carolina, Chapel Hill, NC
| | - Morgan N. Narain
- Department of Genetics, The University of North Carolina, Chapel Hill, NC
- Curriculum of Toxicology and Environmental Medicine, The University of North Carolina, Chapel Hill, NC
| | - Heather M. Stringham
- Department of Biostatistics and Center for Statistical Genetics, School of Public Health, University of Michigan, Ann Arbor, MI
| | - Michael Boehnke
- Department of Biostatistics and Center for Statistical Genetics, School of Public Health, University of Michigan, Ann Arbor, MI
| | - Johanna Kuusisto
- Institute of Clinical Medicine, Internal Medicine, University of Eastern Finland, Kuopio, Finland
- Department of Medicine, Kuopio University Hospital, Kuopio, Finland
| | - Michael I. Love
- Department of Genetics, The University of North Carolina, Chapel Hill, NC
- Department of Biostatistics, The University of North Carolina, Chapel Hill, NC
| | - Kirsi H. Pietiläinen
- Obesity Research Unit, Research Program for Clinical and Molecular Metabolism, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- HealthyWeightHub, Endocrinology, Abdominal Center, Helsinki University Hospital and University of Helsinki, Helsinki, Finland
| | - Päivi Pajukanta
- Department of Human Genetics, David Geffen School of Medicine at the University of California, Los Angeles, Los Angeles, CA
- Institute for Precision Health, David Geffen School of Medicine at University of California, Los Angeles, Los Angeles, CA
| | - Markku Laakso
- Institute of Clinical Medicine, Internal Medicine, University of Eastern Finland, Kuopio, Finland
- Department of Medicine, Kuopio University Hospital, Kuopio, Finland
| | - Karen L. Mohlke
- Department of Genetics, The University of North Carolina, Chapel Hill, NC
|
34
|
Meng G, Pan Y, Tang W, Zhang L, Cui Y, Schumacher FR, Wang M, Wang R, He S, Krischer J, Li Q, Feng H. imply: improving cell-type deconvolution accuracy using personalized reference profiles. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.09.27.559579. [PMID: 37808714 PMCID: PMC10557724 DOI: 10.1101/2023.09.27.559579] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/10/2023]
Abstract
Real-world clinical samples are often admixtures of signal mosaics from multiple pure cell types. Using computational tools, bulk transcriptomics can be deconvoluted to solve for the abundance of constituent cell types. However, existing deconvolution methods are conditioned on the assumption that the whole study population is served by a single reference panel, which ignores person-to-person heterogeneity. Here we present imply, a novel algorithm to deconvolute cell type proportions using personalized reference panels. imply can borrow information across repeatedly measured samples for each subject, and obtain precise cell type proportion estimations. Simulation studies demonstrate reduced bias in cell type abundance estimation compared with existing methods. Real data analyses on large longitudinal consortia show more realistic deconvolution results that align with biological facts. Our results suggest that disparities in cell type proportions are associated with several disease phenotypes in type 1 diabetes and Parkinson's disease. Our proposed tool imply is available through the R/Bioconductor package ISLET at https://bioconductor.org/packages/ISLET/.
Affiliation(s)
- Guanqun Meng
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, 44106, OH, USA
| | - Yue Pan
- Department of Biostatistics, St. Jude Children’s Research Hospital, Memphis, 38105, TN, USA
| | - Wen Tang
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, 44106, OH, USA
| | - Lijun Zhang
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, 44106, OH, USA
| | - Ying Cui
- Department of Biomedical Data Science, Stanford University, Stanford, 94305, CA, USA
| | - Fredrick R. Schumacher
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, 44106, OH, USA
| | - Ming Wang
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, 44106, OH, USA
| | - Rui Wang
- Department of Surgery, Division of Surgical Oncology, University Hospitals Cleveland Medical Center, Cleveland, 44106, OH, USA
| | - Sijia He
- Department of Biostatistics, University of Michigan, Ann Arbor, 48109, MI, USA
| | - Jeffrey Krischer
- Health Informatics Institute, University of South Florida, Tampa, 38105, FL, USA
| | - Qian Li
- Department of Biostatistics, St. Jude Children’s Research Hospital, Memphis, 38105, TN, USA
| | - Hao Feng
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, 44106, OH, USA
|
35
|
Yi H, Lin Y, Chang Q, Jin W. A fast and globally optimal solution for RNA-seq quantification. Brief Bioinform 2023; 24:bbad298. [PMID: 37595963 DOI: 10.1093/bib/bbad298] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2023] [Revised: 07/25/2023] [Accepted: 07/31/2023] [Indexed: 08/20/2023]
Abstract
Alignment-based RNA-seq quantification methods typically involve a time-consuming alignment process prior to estimating transcript abundances. In contrast, alignment-free RNA-seq quantification methods bypass this step, resulting in significant speed improvements. Existing alignment-free methods rely on the Expectation-Maximization (EM) algorithm for estimating transcript abundances. However, EM algorithms only guarantee locally optimal solutions, leaving room for further accuracy improvement by finding a globally optimal solution. In this study, we present TQSLE, the first alignment-free RNA-seq quantification method that provides a globally optimal solution for transcript abundance estimation. TQSLE adopts a two-step approach: first, it constructs a k-mer frequency matrix A for the reference transcriptome and a k-mer frequency vector b for the RNA-seq reads; then, it directly estimates transcript abundances by solving the linear equation A^T A x = A^T b. We evaluated the performance of TQSLE using simulated and real RNA-seq data sets and observed that, despite comparable speed to other alignment-free methods, TQSLE outperforms them in terms of accuracy. TQSLE is freely available at https://github.com/yhg926/TQSLE.
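The two-step idea described in this abstract (build a k-mer frequency matrix A and vector b, then solve the normal equations, i.e. A-transpose times A times x equals A-transpose times b) can be sketched in a few lines. This is a toy illustration under stated assumptions, not the TQSLE implementation: the function names are hypothetical, the sequences are made up, reads are treated as error-free copies of whole transcripts, and a plain Gaussian elimination stands in for whatever solver the tool actually uses.

```python
from collections import Counter
from itertools import product


def kmer_counts(seq, k):
    """Count every overlapping k-mer in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def solve_linear_system(m, rhs):
    """Solve m x = rhs by Gaussian elimination with partial pivoting."""
    n = len(m)
    aug = [list(map(float, row)) + [float(r)] for row, r in zip(m, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: transcripts are not identifiable")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[row][j] -= factor * aug[col][j]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][j] * x[j] for j in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def estimate_abundances(transcripts, reads, k):
    """Least-squares abundances from the normal equations (A^T A) x = A^T b."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    cols = [kmer_counts(t, k) for t in transcripts]
    b_counts = Counter()
    for read in reads:
        b_counts.update(kmer_counts(read, k))
    # Rows of A are k-mers, columns are transcripts.
    a = [[col[km] for col in cols] for km in kmers]
    b = [b_counts[km] for km in kmers]
    n = len(transcripts)
    ata = [[sum(row[i] * row[j] for row in a) for j in range(n)] for i in range(n)]
    atb = [sum(row[i] * bi for row, bi in zip(a, b)) for i in range(n)]
    return solve_linear_system(ata, atb)


# Made-up transcripts; the "reads" are 3 copies of the first and 1 copy of the second.
transcripts = ["ACGTACGTAA", "TTGCATGCAT"]
reads = [transcripts[0]] * 3 + [transcripts[1]]
abundances = estimate_abundances(transcripts, reads, k=3)
print(abundances)  # approximately [3.0, 1.0]
```

Because the least-squares objective is convex, any solution of the normal equations is a global minimizer, which is the property the abstract contrasts with EM. The sketch does not enforce non-negativity and fails when A lacks full column rank, both of which a practical tool would need to handle.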
Affiliation(s)
- Huiguang Yi
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, 97 Buxin Rd, Shenzhen, 518000, Guangdong, China
- School of Life Sciences, Southern University of Science and Technology, 1088 Xueyuan Blvd, Shenzhen 518055, Guangdong, China
| | - Yanling Lin
- School of Life Sciences, Southern University of Science and Technology, 1088 Xueyuan Blvd, Shenzhen 518055, Guangdong, China
| | - Qing Chang
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, 97 Buxin Rd, Shenzhen, 518000, Guangdong, China
| | - Wenfei Jin
- School of Life Sciences, Southern University of Science and Technology, 1088 Xueyuan Blvd, Shenzhen 518055, Guangdong, China
|
36
|
O'Neill NK, Stein TD, Hu J, Rehman H, Campbell JD, Yajima M, Zhang X, Farrer LA. Bulk brain tissue cell-type deconvolution with bias correction for single-nuclei RNA sequencing data using DeTREM. BMC Bioinformatics 2023; 24:349. [PMID: 37726653 PMCID: PMC10507917 DOI: 10.1186/s12859-023-05476-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2022] [Accepted: 09/12/2023] [Indexed: 09/21/2023]
Abstract
BACKGROUND: Quantifying cell-type abundance in bulk tissue RNA-sequencing enables researchers to better understand complex systems. Newer deconvolution methodologies, such as MuSiC, use cell-type signatures derived from single-cell RNA-sequencing (scRNA-seq) data to make these calculations. Single-nuclei RNA-sequencing (snRNA-seq) reference data can be used instead of scRNA-seq data for tissues such as human brain where single-cell data are difficult to obtain, but accuracy suffers due to sequencing differences between the technologies. RESULTS: We propose a modification to MuSiC entitled 'DeTREM' which compensates for sequencing differences between the cell-type signature and bulk RNA-seq datasets in order to better predict cell-type fractions. We show DeTREM to be more accurate than MuSiC in simulated and real human brain bulk RNA-sequencing datasets with various cell-type abundance estimates. We also compare DeTREM to SCDC and CIBERSORTx, two recent deconvolution methods that use scRNA-seq cell-type signatures. We find that they perform well in simulated data but produce less accurate results than DeTREM when used to deconvolute human brain data. CONCLUSION: DeTREM improves the deconvolution accuracy of MuSiC and outperforms other deconvolution methods when applied to snRNA-seq data. DeTREM enables accurate cell-type deconvolution in situations where scRNA-seq data are not available. This modification improves characterization of cell-type specific effects in brain tissue and identification of cell-type abundance differences under various conditions.
Affiliation(s)
- Nicholas K O'Neill
- Bioinformatics Program, Boston University, Boston, MA, USA
- Department of Medicine (Biomedical Genetics), Boston University, Chobanian & Avedisian School of Medicine, Boston, MA, USA
| | - Thor D Stein
- Department of Pathology and Laboratory Medicine, Boston University Chobanian & Avedisian School of Medicine, Boston, MA, USA
- Veterans Administration Medical Center, Bedford, MA, USA
| | - Junming Hu
- Bioinformatics Program, Boston University, Boston, MA, USA
- Department of Medicine (Biomedical Genetics), Boston University, Chobanian & Avedisian School of Medicine, Boston, MA, USA
| | - Habbiburr Rehman
- Department of Medicine (Biomedical Genetics), Boston University, Chobanian & Avedisian School of Medicine, Boston, MA, USA
| | - Joshua D Campbell
- Bioinformatics Program, Boston University, Boston, MA, USA
- Department of Medicine (Computational Biomedicine), Boston University Chobanian & Avedisian School of Medicine, Boston, MA, USA
| | - Masanao Yajima
- Department of Mathematics and Statistics, Boston University, Boston, MA, USA
| | - Xiaoling Zhang
- Bioinformatics Program, Boston University, Boston, MA, USA
- Department of Medicine (Biomedical Genetics), Boston University, Chobanian & Avedisian School of Medicine, Boston, MA, USA
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
| | - Lindsay A Farrer
- Bioinformatics Program, Boston University, Boston, MA, USA
- Department of Medicine (Biomedical Genetics), Boston University, Chobanian & Avedisian School of Medicine, Boston, MA, USA
- Department of Neurology, Boston University Chobanian & Avedisian School of Medicine, Boston, MA, USA
- Department of Ophthalmology, Boston University Chobanian & Avedisian School of Medicine, Boston, MA, USA
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
- Department of Epidemiology, Boston University School of Public Health, Boston, MA, USA
|
37
|
Wang J, Lu L, Zheng S, Wang D, Jin L, Zhang Q, Li M, Zhang Z. DeCOOC Deconvoluted Hi-C Map Characterizes the Chromatin Architecture of Cells in Physiologically Distinctive Tissues. ADVANCED SCIENCE (WEINHEIM, BADEN-WURTTEMBERG, GERMANY) 2023; 10:e2301058. [PMID: 37515382 PMCID: PMC10520690 DOI: 10.1002/advs.202301058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2023] [Revised: 07/06/2023] [Indexed: 07/30/2023]
Abstract
Deciphering variations in chromosome conformations based on bulk three-dimensional (3D) genomic data from heterogeneous tissues is key to understanding cell-type specific genome architecture and dynamics. Surprisingly, computational deconvolution methods for high-throughput chromosome conformation capture (Hi-C) data remain very rare in the literature. Here, deCOOC (deconvolve bulk Hi-C data), a deep convolutional neural network (CNN) that remarkably outperformed all the state-of-the-art tools in the deconvolution task, is developed. Interestingly, chromatin accessibility or Hi-C contact frequency alone is insufficient to explain the power of deCOOC, suggesting the existence of a latent embedded layer of information pertaining to the cell type specific 3D genome architecture. By applying deCOOC to in-house-generated bulk Hi-C data from visceral and subcutaneous adipose tissues, it is found that the characteristic chromatin features of M2 cells in the two anatomical loci are distinctively bound to different physiological functionalities. Taken together, deCOOC is both a reliable Hi-C data deconvolution method and a powerful tool for functional extraction of 3D genome architecture.
Affiliation(s)
- Junmei Wang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, 100101, China
- School of Life Science, University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Lu Lu
- Livestock and Poultry Multiomics Key Laboratory of Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, 611130, China
- Animal Breeding and Genetics Key Laboratory of Sichuan Province, Institute of Animal Genetics and Breeding, Sichuan Agricultural University, Chengdu, 611130, China
| | - Shiqi Zheng
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, 100101, China
- School of Life Science, University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Danyang Wang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, 100101, China
- School of Life Science, University of Chinese Academy of Sciences, Beijing, 100049, China
- Sars‐Fang Centre & MOE Key Laboratory of Marine Genetics and Breeding, College of Marine Life Sciences, Ocean University of China, Qingdao, 266100, China
| | - Long Jin
- Livestock and Poultry Multiomics Key Laboratory of Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, 611130, China
- Animal Breeding and Genetics Key Laboratory of Sichuan Province, Institute of Animal Genetics and Breeding, Sichuan Agricultural University, Chengdu, 611130, China
| | - Qing Zhang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, 100101, China
| | - Mingzhou Li
- Livestock and Poultry Multiomics Key Laboratory of Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, 611130, China
- Animal Breeding and Genetics Key Laboratory of Sichuan Province, Institute of Animal Genetics and Breeding, Sichuan Agricultural University, Chengdu, 611130, China
| | - Zhihua Zhang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, 100101, China
- School of Life Science, University of Chinese Academy of Sciences, Beijing, 100049, China
|
38
|
Sinning J, Funk ND, Soerensen-Zender I, Wulfmeyer VC, Liao CM, Haller H, Hinze C, Schmidt-Ott KM, Melk A, Schmitt R. The aging kidney is characterized by tubuloinflammaging, a phenotype associated with MHC-II gene expression. Front Immunol 2023; 14:1222339. [PMID: 37675124 PMCID: PMC10477980 DOI: 10.3389/fimmu.2023.1222339] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 05/14/2023] [Accepted: 08/01/2023] [Indexed: 09/08/2023]
Abstract
Introduction: Even during physiologic aging, the kidney experiences a loss of mass and a progressive functional decline. This is clinically relevant as it leads to an increased risk of acute and chronic kidney disease. The kidney tubular system plays an important role in the underlying aging process, but the involved cellular mechanisms remain largely elusive. Methods: Kidneys of 3-, 12- and 24-month-old male C57BL/6J mice were used for RNA sequencing, histological examination, immunostaining and RNA-in-situ-hybridization. Single cell RNA sequencing data of differentially aged murine and human kidneys were analyzed to identify age-dependent expression patterns in tubular epithelial cells. Senescent and non-senescent primary tubular epithelial cells from mouse kidney were used for in vitro experiments. Results: During normal kidney aging, tubular cells adopt an inflammatory phenotype, characterized by the expression of MHC class II related genes. In our analysis of bulk and single cell transcriptional data we found that subsets of tubular cells show an age-related expression of Cd74, H2-Eb1 and H2-Ab1 in mice and CD74, HLA-DQB1 and HLA-DRB1 in humans. Expression of MHC class II related genes was associated with a phenotype of tubular cell senescence, and the selective elimination of senescent cells reversed the phenotype. Exposure to the Cd74 ligand MIF promoted a prosenescent phenotype in tubular cell cultures. Discussion: Together, these data suggest that during normal renal aging tubular cells activate a program of 'tubuloinflammaging', which might contribute to age-related phenotypical changes and to increased disease susceptibility.
Affiliation(s)
- Julius Sinning
- Department of Nephrology and Hypertension, Hannover Medical School, Hannover, Germany
| | - Nils David Funk
- Department of Nephrology and Hypertension, Hannover Medical School, Hannover, Germany
| | - Inga Soerensen-Zender
- Department of Nephrology and Hypertension, Hannover Medical School, Hannover, Germany
| | | | - Chieh Ming Liao
- Department of Nephrology and Hypertension, Hannover Medical School, Hannover, Germany
| | - Hermann Haller
- Department of Nephrology and Hypertension, Hannover Medical School, Hannover, Germany
| | - Christian Hinze
- Department of Nephrology and Hypertension, Hannover Medical School, Hannover, Germany
| | | | - Anette Melk
- Department of Pediatric Kidney, Liver and Metabolic Diseases, Hannover Medical School, Hannover, Germany
| | - Roland Schmitt
- Department of Nephrology and Hypertension, Hannover Medical School, Hannover, Germany
|
39
|
Berson E, Sreenivas A, Phongpreecha T, Perna A, Grandi FC, Xue L, Ravindra NG, Payrovnaziri N, Mataraso S, Kim Y, Espinosa C, Chang AL, Becker M, Montine KS, Fox EJ, Chang HY, Corces MR, Aghaeepour N, Montine TJ. Whole genome deconvolution unveils Alzheimer's resilient epigenetic signature. Nat Commun 2023; 14:4947. [PMID: 37587197 PMCID: PMC10432546 DOI: 10.1038/s41467-023-40611-4] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 03/02/2023] [Accepted: 08/03/2023] [Indexed: 08/18/2023]
Abstract
Assay for Transposase Accessible Chromatin by sequencing (ATAC-seq) accurately depicts the chromatin regulatory state and altered mechanisms guiding gene expression in disease. However, bulk sequencing entangles information from different cell types and obscures cellular heterogeneity. To address this, we developed Cellformer, a deep learning method that deconvolutes bulk ATAC-seq into cell type-specific expression across the whole genome. Cellformer enables cost-effective cell type-specific open chromatin profiling in large cohorts. Applied to 191 bulk samples from 3 brain regions, Cellformer identifies cell type-specific gene regulatory mechanisms involved in resilience to Alzheimer's disease, a state seen in an uncommon group of cognitively healthy individuals who harbor a high pathological load of Alzheimer's disease. Cell type-resolved chromatin profiling unveils cell type-specific pathways and nominates potential epigenetic mediators underlying resilience that may illuminate therapeutic opportunities to limit the cognitive impact of the disease. Cellformer is freely available to facilitate future investigations using high-throughput bulk ATAC-seq data.
Affiliation(s)
- Eloise Berson
- Department of Pathology, Stanford University, Stanford, CA, USA
- Department of Anesthesiology, Perioperative, and Pain Medicine, Stanford University, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
| | - Anjali Sreenivas
- Department of Pathology, Stanford University, Stanford, CA, USA
- Department of Anesthesiology, Perioperative, and Pain Medicine, Stanford University, Stanford, CA, USA
| | - Thanaphong Phongpreecha
- Department of Pathology, Stanford University, Stanford, CA, USA
- Department of Anesthesiology, Perioperative, and Pain Medicine, Stanford University, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
| | - Amalia Perna
- Department of Pathology, Stanford University, Stanford, CA, USA
| | - Fiorella C Grandi
- Gladstone Institute of Neurological Disease, San Francisco, CA, USA
- Gladstone Institute of Data Science and Biotechnology, San Francisco, CA, USA
- Department of Neurology, University of California San Francisco, San Francisco, CA, USA
| | - Lei Xue
- Department of Anesthesiology, Perioperative, and Pain Medicine, Stanford University, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Department of Pediatrics, Stanford University, Stanford, CA, USA
| | - Neal G Ravindra
- Department of Pathology, Stanford University, Stanford, CA, USA
- Department of Anesthesiology, Perioperative, and Pain Medicine, Stanford University, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
| | - Neelufar Payrovnaziri
- Department of Anesthesiology, Perioperative, and Pain Medicine, Stanford University, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Department of Pediatrics, Stanford University, Stanford, CA, USA
| | - Samson Mataraso
- Department of Anesthesiology, Perioperative, and Pain Medicine, Stanford University, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Department of Pediatrics, Stanford University, Stanford, CA, USA
| | - Yeasul Kim
- Department of Anesthesiology, Perioperative, and Pain Medicine, Stanford University, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Department of Pediatrics, Stanford University, Stanford, CA, USA
| | - Camilo Espinosa
- Department of Anesthesiology, Perioperative, and Pain Medicine, Stanford University, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Department of Pediatrics, Stanford University, Stanford, CA, USA
| | - Alan L Chang
- Department of Anesthesiology, Perioperative, and Pain Medicine, Stanford University, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Department of Pediatrics, Stanford University, Stanford, CA, USA
| | - Martin Becker
- Department of Anesthesiology, Perioperative, and Pain Medicine, Stanford University, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Department of Pediatrics, Stanford University, Stanford, CA, USA
| | | | - Edward J Fox
- Department of Pathology, Stanford University, Stanford, CA, USA
| | - Howard Y Chang
- Center for Personal Dynamic Regulomes, Stanford University School of Medicine, Stanford, CA, USA
- Howard Hughes Medical Institute, Stanford University School of Medicine, Stanford, CA, USA
| | - M Ryan Corces
- Gladstone Institute of Neurological Disease, San Francisco, CA, USA
- Gladstone Institute of Data Science and Biotechnology, San Francisco, CA, USA
- Department of Neurology, University of California San Francisco, San Francisco, CA, USA
| | - Nima Aghaeepour
- Department of Anesthesiology, Perioperative, and Pain Medicine, Stanford University, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Department of Pediatrics, Stanford University, Stanford, CA, USA
|
40
|
Ghaffari S, Bouchonville KJ, Saleh E, Schmidt RE, Offer SM, Sinha S. BEDwARS: a robust Bayesian approach to bulk gene expression deconvolution with noisy reference signatures. Genome Biol 2023; 24:178. [PMID: 37537644 PMCID: PMC10399072 DOI: 10.1186/s13059-023-03007-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2022] [Accepted: 07/05/2023] [Indexed: 08/05/2023]
Abstract
Differential gene expression in bulk transcriptomics data can reflect a change in transcript abundance within a cell type and/or a change in the proportions of cell types. Expression deconvolution methods can help differentiate these scenarios. BEDwARS is a Bayesian deconvolution method designed to address differences between reference signatures of cell types and corresponding true signatures underlying bulk transcriptomic profiles. BEDwARS is more robust to noisy reference signatures and outperforms leading in-class methods for estimating cell type proportions and signatures. Application of BEDwARS to dihydropyrimidine dehydrogenase deficiency identified the possible involvement of ciliopathy and impaired translational control in the etiology of the disorder.
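The ambiguity described in the first sentence of this abstract can be made concrete with a minimal sketch. It assumes the usual linear mixing model, in which the bulk level of a gene is the proportion-weighted sum of its per-cell-type levels; the function name and all numbers are hypothetical and are not drawn from the article.

```python
def bulk_expression(proportions, signatures):
    """Bulk level of one gene: proportion-weighted sum of per-cell-type levels."""
    return sum(p * s for p, s in zip(proportions, signatures))


# Illustrative values for a single gene in a two-cell-type tissue.
baseline = bulk_expression([0.5, 0.5], [10.0, 2.0])

# Scenario 1: the first cell type becomes more common; per-cell expression is unchanged.
shift_in_proportion = bulk_expression([0.75, 0.25], [10.0, 2.0])

# Scenario 2: proportions are unchanged; the first cell type expresses more of the gene.
shift_in_expression = bulk_expression([0.5, 0.5], [14.0, 2.0])

print(baseline, shift_in_proportion, shift_in_expression)
```

Both scenarios raise the bulk value by the same amount, so the bulk measurement alone cannot tell them apart. Separating the two requires estimating proportions and signatures together, which is the task deconvolution methods address.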
Affiliation(s)
- Saba Ghaffari
- Department of Computer Science, University of Illinois at Urbana-Champaign, Thomas M. Siebel Center, 201 N. Goodwin Ave., Urbana, IL, USA
- Kelly J Bouchonville
- Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Gonda 19-476, 200 First St. SW, Rochester, MN, 55905, USA
- Ehsan Saleh
- Department of Computer Science, University of Illinois at Urbana-Champaign, Thomas M. Siebel Center, 201 N. Goodwin Ave., Urbana, IL, USA
- Remington E Schmidt
- Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Gonda 19-476, 200 First St. SW, Rochester, MN, 55905, USA
- Steven M Offer
- Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Gonda 19-476, 200 First St. SW, Rochester, MN, 55905, USA.
- Saurabh Sinha
- Wallace H. Coulter Department of Biomedical Engineering at Georgia Tech and Emory University, Georgia Institute of Technology, 3108 U.A. Whitaker Bldg., 313 Ferst Drive, Atlanta, GA, 30332, USA.
41
Hedman ÅK, Winter E, Yoosuf N, Benita Y, Berg L, Brynedal B, Folkersen L, Klareskog L, Maciejewski M, Sirota-Madi A, Spector Y, Ziemek D, Padyukov L, Shen-Orr SS, Jelinsky SA. Peripheral blood cellular dynamics of rheumatoid arthritis treatment informs about efficacy of response to disease modifying drugs. Sci Rep 2023; 13:10058. [PMID: 37344505 DOI: 10.1038/s41598-023-36999-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 12/01/2022] [Accepted: 06/14/2023] [Indexed: 06/23/2023]
Abstract
Rheumatoid arthritis (RA) is an autoimmune disease characterized by systemic inflammation and is mediated by multiple immune cell types. In this work, we aimed to determine the relevance of changes in cell proportions in peripheral blood mononuclear cells (PBMCs) during the development of disease and following treatment. Samples from healthy blood donors, newly diagnosed RA patients, and established RA patients that had an inadequate response to MTX and were about to start tumor necrosis factor inhibitors (TNFi) treatment were collected before and after 3 months of treatment. We used in parallel a computational deconvolution approach based on RNA expression and flow cytometry to determine the relative cell-type frequencies. Cell-type frequencies from deconvolution of gene expression indicate that monocytes (both classical and non-classical) and CD4+ cells (Th1 and Th2) were increased in RA patients compared to controls, while NK cells and B cells (naïve and mature) were significantly decreased in RA patients. Treatment with MTX caused a decrease in B cells (memory and plasma cell), and a decrease in CD4 Th cells (Th1 and Th17), while treatment with TNFi resulted in a significant increase in the population of B cells. Characterization of the RNA expression patterns found that most of the differentially expressed genes in RA subjects after treatment can be explained by changes in cell frequencies (98% and 74% respectively for MTX and TNFi).
Affiliation(s)
- Åsa K Hedman
- Division of Rheumatology, Department of Medicine Solna, Karolinska Institutet and Karolinska University Hospital, Stockholm, Sweden
- Department of Inflammation and Immunology, Pfizer, 1 Portland Street, Cambridge, MA, 02139, USA
- Niyaz Yoosuf
- Division of Rheumatology, Department of Medicine Solna, Karolinska Institutet and Karolinska University Hospital, Stockholm, Sweden
- Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Institute of Environmental Medicine, Karolinska Institute, Stockholm, Sweden
- Louise Berg
- Division of Rheumatology, Department of Medicine Solna, Karolinska Institutet and Karolinska University Hospital, Stockholm, Sweden
- Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Boel Brynedal
- Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Lasse Folkersen
- Division of Rheumatology, Department of Medicine Solna, Karolinska Institutet and Karolinska University Hospital, Stockholm, Sweden
- Lars Klareskog
- Division of Rheumatology, Department of Medicine Solna, Karolinska Institutet and Karolinska University Hospital, Stockholm, Sweden
- Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Mateusz Maciejewski
- Department of Inflammation and Immunology, Pfizer, 1 Portland Street, Cambridge, MA, 02139, USA
- Daniel Ziemek
- Department of Inflammation and Immunology, Pfizer, Berlin, Germany
- Leonid Padyukov
- Division of Rheumatology, Department of Medicine Solna, Karolinska Institutet and Karolinska University Hospital, Stockholm, Sweden
- Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Shai S Shen-Orr
- CytoReason, Tel-Aviv, Israel
- Technion-Israel Institute of Technology, Haifa, Israel
- Scott A Jelinsky
- Department of Inflammation and Immunology, Pfizer, 1 Portland Street, Cambridge, MA, 02139, USA.
42
Wu H, Eckhardt CM, Baccarelli AA. Molecular mechanisms of environmental exposures and human disease. Nat Rev Genet 2023; 24:332-344. [PMID: 36717624 PMCID: PMC10562207 DOI: 10.1038/s41576-022-00569-3] [Citation(s) in RCA: 97] [Impact Index Per Article: 48.5] [Accepted: 12/14/2022] [Indexed: 02/01/2023]
Abstract
A substantial proportion of disease risk for common complex disorders is attributable to environmental exposures and pollutants. An appreciation of how environmental pollutants act on our cells to produce deleterious health effects has led to advances in our understanding of the molecular mechanisms underlying the pathogenesis of chronic diseases, including cancer and cardiovascular, neurodegenerative and respiratory diseases. Here, we discuss emerging research on the interplay of environmental pollutants with the human genome and epigenome. We review evidence showing the environmental impact on gene expression through epigenetic modifications, including DNA methylation, histone modification and non-coding RNAs. We also highlight recent studies that evaluate recently discovered molecular processes through which the environment can exert its effects, including extracellular vesicles, the epitranscriptome and the mitochondrial genome. Finally, we discuss current challenges when studying the exposome - the cumulative measure of environmental influences over the lifespan - and its integration into future environmental health research.
Affiliation(s)
- Haotian Wu
- Department of Environmental Health Sciences, Columbia University Mailman School of Public Health, New York, NY, USA
- Christina M Eckhardt
- Department of Pulmonary, Allergy and Critical Care Medicine, Columbia University College of Physicians and Surgeons, New York, NY, USA
- Andrea A Baccarelli
- Department of Environmental Health Sciences, Columbia University Mailman School of Public Health, New York, NY, USA.
43
Hicks EM, Seah C, Cote A, Marchese S, Brennand KJ, Nestler EJ, Girgenti MJ, Huckins LM. Integrating genetics and transcriptomics to study major depressive disorder: a conceptual framework, bioinformatic approaches, and recent findings. Transl Psychiatry 2023; 13:129. [PMID: 37076454 PMCID: PMC10115809 DOI: 10.1038/s41398-023-02412-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 08/09/2022] [Revised: 03/17/2023] [Accepted: 03/24/2023] [Indexed: 04/21/2023]
Abstract
Major depressive disorder (MDD) is a complex and heterogeneous psychiatric syndrome with genetic and environmental influences. In addition to neuroanatomical and circuit-level disturbances, dysregulation of the brain transcriptome is a key phenotypic signature of MDD. Postmortem brain gene expression data are uniquely valuable resources for identifying this signature and key genomic drivers in human depression; however, the scarcity of brain tissue limits our capacity to observe the dynamic transcriptional landscape of MDD. It is therefore crucial to explore and integrate depression and stress transcriptomic data from numerous, complementary perspectives to construct a richer understanding of the pathophysiology of depression. In this review, we discuss multiple approaches for exploring the brain transcriptome reflecting dynamic stages of MDD: predisposition, onset, and illness. We next highlight bioinformatic approaches for hypothesis-free, genome-wide analyses of genomic and transcriptomic data and their integration. Last, we summarize the findings of recent genetic and transcriptomic studies within this conceptual framework.
Affiliation(s)
- Emily M Hicks
- Pamela Sklar Division of Psychiatric Genomics, Departments of Psychiatry and of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, 10029, USA
- Nash Family Department of Neuroscience, Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, New York, 10029, USA
- Carina Seah
- Pamela Sklar Division of Psychiatric Genomics, Departments of Psychiatry and of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, 10029, USA
- Nash Family Department of Neuroscience, Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, New York, 10029, USA
- Alanna Cote
- Pamela Sklar Division of Psychiatric Genomics, Departments of Psychiatry and of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, 10029, USA
- Shelby Marchese
- Pamela Sklar Division of Psychiatric Genomics, Departments of Psychiatry and of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, 10029, USA
- Kristen J Brennand
- Pamela Sklar Division of Psychiatric Genomics, Departments of Psychiatry and of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, 10029, USA
- Nash Family Department of Neuroscience, Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, New York, 10029, USA
- Department of Genetics, Yale University School of Medicine, New Haven, CT, 06511, USA
- Department of Psychiatry, Yale University School of Medicine, New Haven, CT, 06511, USA
- Eric J Nestler
- Nash Family Department of Neuroscience, Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, New York, 10029, USA
- Matthew J Girgenti
- Department of Psychiatry, Yale University School of Medicine, New Haven, CT, 06511, USA.
- Laura M Huckins
- Pamela Sklar Division of Psychiatric Genomics, Departments of Psychiatry and of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, 10029, USA.
- Department of Psychiatry, Yale University School of Medicine, New Haven, CT, 06511, USA.
44
Revkov E, Kulshrestha T, Sung KWK, Skanderup AJ. PUREE: accurate pan-cancer tumor purity estimation from gene expression data. Commun Biol 2023; 6:394. [PMID: 37041233 PMCID: PMC10090153 DOI: 10.1038/s42003-023-04764-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/26/2022] [Accepted: 03/27/2023] [Indexed: 04/13/2023]
Abstract
Tumors are complex masses composed of malignant and non-malignant cells. Variation in tumor purity (proportion of cancer cells in a sample) can both confound integrative analysis and enable studies of tumor heterogeneity. Here we developed PUREE, which uses a weakly supervised learning approach to infer tumor purity from a tumor gene expression profile. PUREE was trained on gene expression data and genomic consensus purity estimates from 7864 solid tumor samples. PUREE predicted purity with high accuracy across distinct solid tumor types and generalized to tumor samples from unseen tumor types and cohorts. Gene features of PUREE were further validated using single-cell RNA-seq data from distinct tumor types. In a comprehensive benchmark, PUREE outperformed existing transcriptome-based purity estimation approaches. Overall, PUREE is a highly accurate and versatile method for estimating tumor purity and interrogating tumor heterogeneity from bulk tumor gene expression data, which can complement genomics-based approaches or be used in settings where genomic data is unavailable.
Affiliation(s)
- Egor Revkov
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Singapore, 138672, Republic of Singapore
- School of Computing, National University of Singapore, Computing 1, 13 Computing Drive, Singapore, 117417, Republic of Singapore
- Tanmay Kulshrestha
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Singapore, 138672, Republic of Singapore
- Ken Wing-Kin Sung
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Singapore, 138672, Republic of Singapore
- School of Computing, National University of Singapore, Computing 1, 13 Computing Drive, Singapore, 117417, Republic of Singapore
- Anders Jacobsen Skanderup
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Singapore, 138672, Republic of Singapore.
- School of Computing, National University of Singapore, Computing 1, 13 Computing Drive, Singapore, 117417, Republic of Singapore.
- National Cancer Centre Singapore, Division of Medical Oncology, 30 Hospital Boulevard, Singapore, 168583, Republic of Singapore.
45
Wang S, Pan W, Mi WX, Wang SH. Sex-specific gene expression patterns in head and neck squamous cell carcinomas. Heliyon 2023; 9:e14890. [PMID: 37064442 PMCID: PMC10102211 DOI: 10.1016/j.heliyon.2023.e14890] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2022] [Revised: 03/15/2023] [Accepted: 03/21/2023] [Indexed: 03/30/2023]
Abstract
Objective: Head and neck squamous cell carcinomas (HNSCCs) have higher incidence rates in men, but the reasons are still obscure. This study aimed to investigate the sex-specific gene expression patterns and predict the regulatory mechanisms. Design: Data including clinical, survival, RNA-seq, miRNA, and methylation information were derived from The Cancer Genome Atlas (TCGA). A total of 131 paired male and female cases were included based on propensity score matching. We concentrated on the prognostic values of the sex-specific pathways enriched by differentially expressed genes (DEGs) and predicted the potential regulatory mechanisms from immune cell infiltration, ceRNA regulatory network, methylation, and differential coexpression analysis. Results: Compared with females, males exhibited a lower activity of immune-related functions and higher activities of mitochondrial and ubiquitination functions. The pathway activities were associated with the prognosis of males but less relevant to females. We extracted eight pathways with sex-biased survival patterns, of which five were down-regulated immune functions and three were up-regulated pathways (GTP biosynthetic, DNA polymerase, and spliceosomal complex assembly). The five immune pathways were moderately or strongly correlated with the proportion of macrophages. We identified six over-expressed lncRNAs that might be involved in the regulation of the three up-regulated pathways. These lncRNAs exhibited a lower methylation density in males, which might account for their over-expression. Conclusions: For HNSCCs, males were characterized by immunosuppression. It was a sign of unfavorable prognosis and might be associated with the proportion of macrophages. LncRNAs and methylation might be involved in the regulation of these pathways.
Affiliation(s)
- Shuo Wang
- Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Germany
- Wei Pan
- Institute for Macromolecular Chemistry, University of Freiburg, Stefan-Meier-Str. 31, 79104 Freiburg, Germany
- Wen-xiang Mi
- Department of Stomatology, Shanghai East Hospital, Tongji University School of Medicine, Shanghai, 200120, China
- Shao-hai Wang
- Department of Stomatology, Shanghai East Hospital, Tongji University School of Medicine, Shanghai, 200120, China
- Corresponding author. Department of Stomatology, Shanghai East Hospital, Tongji University School of Medicine, 150 Jimo Road, Shanghai 200120, China.
46
Zheng Y, Yang X. Spatial RNA sequencing methods show high resolution of single cell in cancer metastasis and the formation of tumor microenvironment. Biosci Rep 2023; 43:BSR20221680. [PMID: 36459212 PMCID: PMC9950536 DOI: 10.1042/bsr20221680] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2022] [Revised: 11/30/2022] [Accepted: 12/02/2022] [Indexed: 12/03/2022]
Abstract
Cancer metastasis often leads to death and therapeutic resistance. This process involves the participation of a variety of cell components, especially cellular and intercellular communications in the tumor microenvironment (TME). Using genetic sequencing technology to comprehensively characterize the tumor and TME is therefore key to understanding metastasis and therapeutic resistance. The use of spatial transcriptome sequencing enables the localization of gene expressions and cell activities in tissue sections. By examining the localization change as well as gene expression of these cells, it is possible to characterize the progress of tumor metastasis and TME formation. With improvements of this technology, spatial transcriptome sequencing technology has been extended from local regions to whole tissues, and from single sequencing technology to multimodal analysis combined with a variety of datasets. This has enabled the detection of every single cell in tissue slides, with high resolution, to provide more accurate predictive information for tumor treatments. In this review, we summarize the results of recent studies dealing with new multimodal methods and spatial transcriptome sequencing methods in tumors to illustrate recent developments in the imaging resolution of micro-tissues.
Affiliation(s)
- Yue Zheng
- Department of Biochemistry and Molecular Biology, Basic Medical College, Shanxi Medical University, No. 56, Xinjiang South Road, Yingze street, Yingze District, Taiyuan City, Shanxi Province 030000, China
- Xiaofeng Yang
- Department of Urology, First Hospital of Shanxi Medical University, No. 85, Jiefang South Road, Yingze street, Yingze District, Taiyuan City, Shanxi Province 030000, China
47
Riggins TE, Whitsitt QA, Saxena A, Hunter E, Hunt B, Thompson CH, Moore MG, Purcell EK. Gene Expression Changes in Cultured Reactive Rat Astrocyte Models and Comparison to Device-Associated Effects in the Brain. bioRxiv 2023:2023.01.06.522870. [PMID: 36712012 PMCID: PMC9881929 DOI: 10.1101/2023.01.06.522870] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/11/2023]
Abstract
Implanted microelectrode arrays hold immense therapeutic potential for many neurodegenerative diseases. However, a foreign body response limits long-term device performance. Recent literature supports the role of astrocytes in the response to damage to the central nervous system (CNS) and suggests that reactive astrocytes exist on a spectrum of phenotypes, from beneficial to neurotoxic. The goal of our study was to gain insight into the subtypes of reactive astrocytes responding to electrodes implanted in the brain. In this study, we tested the transcriptomic profile of two reactive astrocyte culture models (cytokine cocktail or lipopolysaccharide, LPS) utilizing RNA sequencing, which we then compared to differential gene expression surrounding devices inserted into rat motor cortex via spatial transcriptomics. We compared changes in the gene expression of the culture models to those of 24 hour, 1 week and 6 week rat tissue samples at multiple distances radiating from the injury site. We found overlapping expression of up to ∼250 genes between in vitro models and in vivo effects, depending on duration of implantation. Cytokine-induced cells shared more genes in common with chronically implanted tissue (≥1 week) in comparison to LPS-exposed cells. We revealed localized expression of a subset of these intersecting genes (e.g., Serping1, Chi3l1, and Cyp7b1) in regions of device-encapsulating, glial fibrillary acidic protein (GFAP)-expressing astrocytes identified with immunohistochemistry. We applied a factorization approach to assess the strength of the relationship between reactivity markers and the spatial distribution of GFAP-expressing astrocytes in vivo. We also provide lists of hundreds of differentially expressed genes between reactive culture models and untreated controls, and we observed 311 shared genes between the cytokine induced model and the LPS-reaction induced control model.
Our results show that comparisons of reactive astrocyte culture models with spatial transcriptomics data can reveal new biomarkers of the foreign body response to implantable neurotechnology. These comparisons also provide a strategy to assess the development of in vitro models of the tissue response to implanted electrodes.
48
Teh RQ, Liu GS, Wang JH. Bioinformatics Tools for Bulk Gene Expression Deconvolution in Diabetic Retinopathy. Methods Mol Biol 2023; 2678:107-115. [PMID: 37326707 DOI: 10.1007/978-1-0716-3255-0_7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/17/2023]
Abstract
Retinal neovascularization is one of the leading causes of vision loss and a hallmark of proliferative diabetic retinopathy (PDR). The immune system is observed to be involved in the pathogenesis of diabetic retinopathy (DR). The specific immune cell type that contributes to retinal neovascularization can be identified via a bioinformatics analysis of RNA sequencing (RNA-seq) data, known as deconvolution analysis. A previous study identified the infiltration of macrophages in the retina of rats with hypoxia-induced retinal neovascularization and of patients with PDR using a deconvolution algorithm known as CIBERSORTx. Here, we describe the protocols for using CIBERSORTx to perform the deconvolution analysis and downstream analysis of RNA-seq data.
Affiliation(s)
- Ru Qi Teh
- Centre for Eye Research Australia, Royal Victorian Eye and Ear Hospital, East Melbourne, VIC, Australia
- Guei-Sheung Liu
- Centre for Eye Research Australia, Royal Victorian Eye and Ear Hospital, East Melbourne, VIC, Australia.
- Menzies Institute for Medical Research, University of Tasmania, Hobart, TAS, Australia.
- Ophthalmology, Department of Surgery, University of Melbourne, East Melbourne, VIC, Australia.
- Jiang-Hui Wang
- Centre for Eye Research Australia, Royal Victorian Eye and Ear Hospital, East Melbourne, VIC, Australia.
49
Luo Y, Fan R. Deconvolution analysis of cell-type expression from bulk tissues by integrating with single-cell expression reference. Genet Epidemiol 2022; 46:615-628. [PMID: 35788983 PMCID: PMC9669104 DOI: 10.1002/gepi.22494] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2022] [Revised: 04/22/2022] [Accepted: 05/16/2022] [Indexed: 11/06/2022]
Abstract
To understand phenotypic variations and key factors which affect disease susceptibility of complex traits, it is important to decipher cell-type tissue compositions. To study cellular compositions of bulk tissue samples, one can evaluate cellular abundances and cell-type-specific gene expression patterns from the tissue transcriptome profiles. We develop both fixed and mixed models to reconstruct cellular expression fractions for bulk-profiled samples by using single-cell (sc) RNA-sequencing (RNA-seq) reference data. In benchmark evaluations of estimating cellular expression fractions, the mixed-effect models provide results similar to those of an elegant machine learning algorithm named cell-type identification by estimating relative subsets of RNA transcripts (CIBERSORTx), which is a well-known and reliable procedure to reconstruct cell-type abundances and cell-type-specific gene expression profiles. In real data analysis, the mixed-effect models outperform or perform similarly to CIBERSORTx. The mixed models perform better than the fixed models in both benchmark evaluations and data analysis. In simulation studies, we show that if heterogeneity exists in scRNA-seq data, it is better to use mixed models with heterogeneous mean and variance-covariance. As a byproduct, the mixed models provide fractions of covariance between subject-specific gene expression and cell types to measure their correlations. The proposed mixed models provide a complementary tool to dissect bulk tissues using scRNA-seq data.
Affiliation(s)
- Yutong Luo
- Department of Biostatistics, Bioinformatics, and Biomathematics, Georgetown University Medical Center, Washington, DC 20057
- Ruzong Fan
- Department of Biostatistics, Bioinformatics, and Biomathematics, Georgetown University Medical Center, Washington, DC 20057
- Computational and Statistical Genomics Branch, National Human Genome Research Institute (NHGRI), National Institutes of Health (NIH), Baltimore, MD 21224
50
Okereke LC, Bello AU, Onwukwe EA. Toward Precision Radiotherapy: A Nonlinear Optimization Framework and an Accelerated Machine Learning Algorithm for the Deconvolution of Tumor-Infiltrating Immune Cells. Cells 2022; 11:cells11223604. [PMID: 36429031 PMCID: PMC9688486 DOI: 10.3390/cells11223604] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2022] [Revised: 11/03/2022] [Accepted: 11/09/2022] [Indexed: 11/16/2022]
Abstract
Tumor-infiltrating immune cells (TIICs) form a critical part of the ecosystem surrounding a cancerous tumor. Recent advances in radiobiology have shown that, in addition to damaging cancerous cells, radiotherapy drives the upregulation of immunosuppressive and immunostimulatory TIICs, which in turn impacts treatment response. Quantifying TIICs in tumor samples could form an important predictive biomarker guiding patient stratification and the design of radiotherapy regimens and combined immune-radiation treatments. As a result of several limitations associated with experimental methods for quantifying TIICs and the availability of extensive gene sequencing data, deconvolution-based computational methods have appeared as a suitable alternative for quantifying TIICs. Accordingly, we introduce and discuss a nonlinear regression approach (remarkably different from the traditional linear modeling approach of current deconvolution-based methods) and a machine learning algorithm for approximating the solution of the resulting constrained optimization problem. This way, the deconvolution problem is treated naturally, given that the gene expression levels of pure and heterogeneous samples do not have a strictly linear relationship. When applied across transcriptomics datasets, our approach, which also allows the coupling of different loss functions, yields results that closely match ground-truth values from experimental methods and exhibits superior performance over popular deconvolution-based methods.
Affiliation(s)
- Lois Chinwendu Okereke
- Department of Pure and Applied Mathematics, Mathematics Institute (Emerging Regional Centre of Excellence (ERCE) of the European Mathematical Society (EMS)), African University of Science and Technology, Abuja 900107, Nigeria
- Abdulmalik Usman Bello
- Department of Pure and Applied Mathematics, Mathematics Institute (Emerging Regional Centre of Excellence (ERCE) of the European Mathematical Society (EMS)), African University of Science and Technology, Abuja 900107, Nigeria
- Department of Mathematics, Federal University Dutsin-Ma, Dutsin-Ma 821101, Nigeria
- Emmanuel Akwari Onwukwe
- Department of Theoretical and Applied Physics, African University of Science and Technology, Abuja 900107, Nigeria
- Inspired Innovative Sustainable (IIS) Projects & Solutions Limited, Abuja 900107, Nigeria