1
Hoffmann M, Willruth LL, Dietrich A, Lee HK, Knabl L, Trummer N, Baumbach J, Furth PA, Hennighausen L, List M. Blood transcriptomics analysis offers insights into variant-specific immune response to SARS-CoV-2. Sci Rep 2024; 14:2808. [PMID: 38307916 PMCID: PMC10837437 DOI: 10.1038/s41598-024-53117-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2023] [Accepted: 01/28/2024] [Indexed: 02/04/2024]
Abstract
Bulk RNA sequencing (RNA-seq) of blood is typically used for gene expression analysis in biomedical research but is still rarely used in clinical practice. In this study, we propose that RNA-seq should be considered a diagnostic tool, as it offers not only insights into aberrant gene expression and splicing but also delivers additional readouts on immune cell type composition as well as B-cell and T-cell receptor (BCR/TCR) repertoires. We demonstrate that RNA-seq offers insights into a patient's immune status via integrative analysis of RNA-seq data from patients infected with various SARS-CoV-2 variants (in total 196 samples with up to 200 million reads sequencing depth). We compare the results of computational cell-type deconvolution methods (e.g., MCP-counter, xCell, EPIC, quanTIseq) to complete blood count data, the current gold standard in clinical practice. We observe varying levels of lymphocyte depletion and significant differences in neutrophil levels between SARS-CoV-2 variants. Additionally, we identify B and T cell receptor (BCR/TCR) sequences using the tools MiXCR and TRUST4 to show that, combined with sequence alignments and BLASTp, they could be used to classify a patient's disease. Finally, we investigated the sequencing depth required for such analyses and concluded that 10 million reads per sample is sufficient. In conclusion, our study reveals that computational cell-type deconvolution and BCR/TCR methods using bulk RNA-seq analyses can supplement missing CBC data and offer insights into immune responses, disease severity, and pathogen-specific immunity, all achievable with a sequencing depth of 10 million reads per sample.
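The comparison between deconvolution estimates and complete blood count (CBC) values described in this abstract can be illustrated with a simple correlation check. The following is a minimal sketch, not the authors' pipeline: the function name `pearson_r` and all numeric values are hypothetical placeholders, and a real analysis would use the output of a deconvolution tool together with measured CBC percentages.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values for five samples (illustration only, not study data).
cbc_neutrophil_pct = [55.0, 62.0, 70.0, 48.0, 80.0]       # CBC neutrophil %
deconv_neutrophil_score = [0.50, 0.58, 0.66, 0.45, 0.79]  # deconvolution estimate

r = pearson_r(cbc_neutrophil_pct, deconv_neutrophil_score)
print(f"Pearson r between CBC and deconvolution estimate: {r:.3f}")
```

Correlation across samples is used here rather than absolute agreement because score-based methods such as MCP-counter report abundance in arbitrary units, so their values are not directly comparable to CBC percentages.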
Affiliation(s)
- Markus Hoffmann
- Data Science in Systems Biomedicine, TUM School of Life Sciences, Technical University of Munich, Freising, Germany.
- Institute for Advanced Study, Technical University of Munich, Lichtenbergstrasse 2 a, 85748, Garching, Germany.
- National Institute of Diabetes, Digestive, and Kidney Diseases, Bethesda, MD, 20892, USA.
- Lina-Liv Willruth
- Data Science in Systems Biomedicine, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Alexander Dietrich
- Data Science in Systems Biomedicine, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Hye Kyung Lee
- National Institute of Diabetes, Digestive, and Kidney Diseases, Bethesda, MD, 20892, USA
- Nico Trummer
- Data Science in Systems Biomedicine, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Jan Baumbach
- Chair of Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Computational BioMedicine Lab, University of Southern Denmark, Odense, Denmark
- Priscilla A Furth
- Institute for Advanced Study, Technical University of Munich, Lichtenbergstrasse 2 a, 85748, Garching, Germany
- National Institute of Diabetes, Digestive, and Kidney Diseases, Bethesda, MD, 20892, USA
- Departments of Oncology & Medicine, Georgetown University, Washington, DC, USA
- Lothar Hennighausen
- Institute for Advanced Study, Technical University of Munich, Lichtenbergstrasse 2 a, 85748, Garching, Germany
- National Institute of Diabetes, Digestive, and Kidney Diseases, Bethesda, MD, 20892, USA
- Markus List
- Data Science in Systems Biomedicine, TUM School of Life Sciences, Technical University of Munich, Freising, Germany.
2
Du ZH, Hu WL, Li JQ, Shang X, You ZH, Chen ZZ, Huang YA. scPML: pathway-based multi-view learning for cell type annotation from single-cell RNA-seq data. Commun Biol 2023; 6:1268. [PMID: 38097699 PMCID: PMC10721875 DOI: 10.1038/s42003-023-05634-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/30/2023] [Accepted: 11/24/2023] [Indexed: 12/17/2023]
Abstract
Recent developments in single-cell technology have enabled the exploration of cellular heterogeneity at an unprecedented level, providing invaluable insights into various fields, including medicine and disease research. Cell type annotation is an essential step in single-cell omics research. The mainstream approach is to use well-annotated single-cell data for supervised learning in order to annotate cell types in new single-cell data. However, existing methods lack good generalization and robustness in cell annotation tasks, partly because of difficulties in dealing with technical differences between datasets and partly because they do not consider the heterogeneous associations of genes at the level of regulatory mechanisms. Here, we propose the scPML model, which utilizes various gene signaling pathway data to partition the genetic features of cells, thus characterizing different interaction maps between cells. Extensive experiments demonstrate that scPML performs better in cell type annotation and detection of unknown cell types from different species, platforms, and tissues.
Affiliation(s)
- Zhi-Hua Du
- College of Computer Science and Software Engineering, ShenZhen University, 3688 Nanhai Avenue, Shenzhen, China
- Wei-Lin Hu
- College of Computer Science and Software Engineering, ShenZhen University, 3688 Nanhai Avenue, Shenzhen, China
- Jian-Qiang Li
- College of Computer Science and Software Engineering, ShenZhen University, 3688 Nanhai Avenue, Shenzhen, China
- Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
- Zhu-Hong You
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
- Zhuang-Zhuang Chen
- College of Computer Science and Software Engineering, ShenZhen University, 3688 Nanhai Avenue, Shenzhen, China
- Yu-An Huang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China.
3
Hoffmann M, Willruth LL, Dietrich A, Lee HK, Knabl L, Trummer N, Baumbach J, Furth PA, Hennighausen L, List M. Blood transcriptomics analysis offers insights into variant-specific immune response to SARS-CoV-2. bioRxiv 2023:2023.11.03.564190. [PMID: 38076885 PMCID: PMC10705570 DOI: 10.1101/2023.11.03.564190] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/19/2023]
Abstract
Bulk RNA sequencing (RNA-seq) of blood is typically used for gene expression analysis in biomedical research but is still rarely used in clinical practice. In this study, we argue that RNA-seq should be considered a routine diagnostic tool, as it offers not only insights into aberrant gene expression and splicing but also delivers additional readouts on immune cell type composition as well as B-cell and T-cell receptor (BCR/TCR) repertoires. We demonstrate that RNA-seq offers vital insights into a patient's immune status via integrative analysis of RNA-seq data from patients infected with various SARS-CoV-2 variants (in total 240 samples with up to 200 million reads sequencing depth). We compare the results of computational cell-type deconvolution methods (e.g., MCP-counter, xCell, EPIC, quanTIseq) to complete blood count data, the current gold standard in clinical practice. We observe varying levels of lymphocyte depletion and significant differences in neutrophil levels between SARS-CoV-2 variants. Additionally, we identify B and T cell receptor (BCR/TCR) sequences using the tools MiXCR and TRUST4 to show that, combined with sequence alignments and BLASTp, they could be used to classify a patient's disease. Finally, we investigated the sequencing depth required for such analyses and concluded that 10 million reads per sample is sufficient. In conclusion, our study reveals that computational cell-type deconvolution and BCR/TCR methods using bulk RNA-seq analyses can supplement missing CBC data and offer insights into immune responses, disease severity, and pathogen-specific immunity, all achievable with a sequencing depth of 10 million reads per sample.
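A sequencing-depth question like the one raised in this abstract is commonly explored by randomly downsampling reads and repeating the analysis at each depth. The sketch below draws a uniform random subset of reads with reservoir sampling. It is an assumption about how such an experiment could be set up, not the procedure used in the study; `subsample_reads` and the toy read names are hypothetical.

```python
import random

def subsample_reads(records, k, seed=0):
    """Return a uniform random sample of up to k items from an iterable.

    Uses reservoir sampling (Algorithm R), so the input can be streamed
    once without knowing its length in advance.
    """
    rng = random.Random(seed)
    reservoir = []
    for i, record in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            reservoir.append(record)
        else:
            # Keep the new record with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = record
    return reservoir

# Toy stand-in for sequencing reads (hypothetical identifiers).
reads = [f"read_{i}" for i in range(1000)]
subset = subsample_reads(reads, 100, seed=1)
print(len(subset), "reads kept out of", len(reads))
```

Fixing the seed makes a given subsample reproducible, which helps when the same reduced dataset has to be passed to several downstream tools.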
Affiliation(s)
- Markus Hoffmann
- Big Data in BioMedicine Group, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Munich, Germany
- Institute for Advanced Study (Lichtenbergstrasse 2 a, D-85748 Garching, Germany), Technical University of Munich, Germany
- National Institute of Diabetes, Digestive, and Kidney Diseases, Bethesda, MD 20892, United States of America
- Lina-Liv Willruth
- Big Data in BioMedicine Group, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Munich, Germany
- Alexander Dietrich
- Big Data in BioMedicine Group, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Munich, Germany
- Hye Kyung Lee
- National Institute of Diabetes, Digestive, and Kidney Diseases, Bethesda, MD 20892, United States of America
- Nico Trummer
- Big Data in BioMedicine Group, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Munich, Germany
- Jan Baumbach
- Chair of Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Computational BioMedicine Lab, University of Southern Denmark, Odense, Denmark
- Priscilla A. Furth
- Institute for Advanced Study (Lichtenbergstrasse 2 a, D-85748 Garching, Germany), Technical University of Munich, Germany
- National Institute of Diabetes, Digestive, and Kidney Diseases, Bethesda, MD 20892, United States of America
- Departments of Oncology & Medicine, Georgetown University, Washington, DC, United States of America
- Lothar Hennighausen
- Institute for Advanced Study (Lichtenbergstrasse 2 a, D-85748 Garching, Germany), Technical University of Munich, Germany
- National Institute of Diabetes, Digestive, and Kidney Diseases, Bethesda, MD 20892, United States of America
- Markus List
- Big Data in BioMedicine Group, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Munich, Germany
4
Huang Y, Mohanty V, Dede M, Tsai K, Daher M, Li L, Rezvani K, Chen K. Characterizing cancer metabolism from bulk and single-cell RNA-seq data using METAFlux. Nat Commun 2023; 14:4883. [PMID: 37573313 PMCID: PMC10423258 DOI: 10.1038/s41467-023-40457-w] [Citation(s) in RCA: 32] [Impact Index Per Article: 16.0] [Received: 11/17/2022] [Accepted: 07/26/2023] [Indexed: 08/14/2023]
Abstract
Cells often alter metabolic strategies under nutrient-deprived conditions to support their survival and growth. Characterizing metabolic reprogramming in the tumor microenvironment (TME) is of emerging importance in cancer research and patient care. However, recent technologies only measure a subset of metabolites and cannot provide in situ measurements. Computational methods such as flux balance analysis (FBA) have been developed to estimate metabolic flux from bulk RNA-seq data and can potentially be extended to single-cell RNA-seq (scRNA-seq) data. However, it is unclear how reliable current methods are, particularly in TME characterization. Here, we present a computational framework METAFlux (METAbolic Flux balance analysis) to infer metabolic fluxes from bulk or single-cell transcriptomic data. Large-scale experiments using cell-lines, the cancer genome atlas (TCGA), and scRNA-seq data obtained from diverse cancer and immunotherapeutic contexts, including CAR-NK cell therapy, have validated METAFlux's capability to characterize metabolic heterogeneity and metabolic interaction amongst cell types.
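For orientation, flux balance analysis in its generic textbook form is a linear program over the vector of reaction fluxes. The formulation below is that standard form, not METAFlux's specific objective or constraints, which the abstract does not state; the symbols $S$, $v$, $c$, $v^{\min}$ and $v^{\max}$ are introduced here for illustration.

```latex
\begin{aligned}
\max_{v \in \mathbb{R}^{n}} \quad & c^{\top} v \\
\text{subject to} \quad & S v = 0, \\
& v^{\min} \le v \le v^{\max}
\end{aligned}
```

Here $S$ is the $m \times n$ stoichiometric matrix (metabolites by reactions), $v$ holds the $n$ reaction fluxes, $c$ weights the objective (for example a biomass reaction), and $S v = 0$ encodes the steady-state assumption. Expression-based methods typically bring transcriptomic data in by adjusting the bounds or the objective.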
Affiliation(s)
- Yuefan Huang
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
- Department of Biostatistics & Data Science, School of Public Health, The University of Texas Health Science Center at Houston (UTHealth), Houston, TX, 77030, USA
- Vakul Mohanty
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
- Merve Dede
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
- Kyle Tsai
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
- May Daher
- Department of Stem Cell Transplantation and Cellular Therapy, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
- Li Li
- Department of Stem Cell Transplantation and Cellular Therapy, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
- Katayoun Rezvani
- Department of Stem Cell Transplantation and Cellular Therapy, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
- Ken Chen
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA.
5
Liu W, Deng W, Chen M, Dong Z, Zhu B, Yu Z, Tang D, Sauler M, Lin C, Wain LV, Cho MH, Kaminski N, Zhao H. A statistical framework to identify cell types whose genetically regulated proportions are associated with complex diseases. PLoS Genet 2023; 19:e1010825. [PMID: 37523391 PMCID: PMC10414598 DOI: 10.1371/journal.pgen.1010825] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2022] [Revised: 08/10/2023] [Accepted: 06/12/2023] [Indexed: 08/02/2023]
Abstract
Finding disease-relevant tissues and cell types can facilitate the identification and investigation of functional genes and variants. In particular, cell type proportions can serve as potential disease predictive biomarkers. In this manuscript, we introduce a novel statistical framework, cell-type Wide Association Study (cWAS), that integrates genetic data with transcriptomics data to identify cell types whose genetically regulated proportions (GRPs) are disease/trait-associated. On simulated and real GWAS data, cWAS showed good statistical power with newly identified significant GRP associations in disease-associated tissues. More specifically, GRPs of endothelial and myofibroblasts in lung tissue were associated with Idiopathic Pulmonary Fibrosis and Chronic Obstructive Pulmonary Disease, respectively. For breast cancer, the GRP of blood CD8+ T cells was negatively associated with breast cancer (BC) risk as well as survival. Overall, cWAS is a powerful tool to reveal cell types associated with complex diseases mediated by GRPs.
Affiliation(s)
- Wei Liu
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America
- Wenxuan Deng
- Department of Biostatistics, Yale School of Public Health, Yale University, New Haven, Connecticut, United States of America
- Ming Chen
- Department of Biostatistics, Yale School of Public Health, Yale University, New Haven, Connecticut, United States of America
- Zihan Dong
- Department of Biostatistics, Yale School of Public Health, Yale University, New Haven, Connecticut, United States of America
- Biqing Zhu
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America
- Zhaolong Yu
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America
- Daiwei Tang
- Department of Biostatistics, Yale School of Public Health, Yale University, New Haven, Connecticut, United States of America
- Maor Sauler
- Pulmonary, Critical Care and Sleep Medicine, Yale School of Medicine, Yale University, New Haven, Connecticut, United States of America
- Chen Lin
- Department of Biostatistics, Yale School of Public Health, Yale University, New Haven, Connecticut, United States of America
- Louise V. Wain
- Department of Health Sciences, University of Leicester, Leicester, United Kingdom
- National Institute for Health Research, Leicester Respiratory Biomedical Research Centre, Glenfield Hospital, Leicester, United Kingdom
- Michael H. Cho
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
- Pulmonary and Critical Care Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
- Naftali Kaminski
- Pulmonary, Critical Care and Sleep Medicine, Yale School of Medicine, Yale University, New Haven, Connecticut, United States of America
- Hongyu Zhao
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America
- Department of Biostatistics, Yale School of Public Health, Yale University, New Haven, Connecticut, United States of America
6
Khalili N, Kazerooni AF, Familiar A, Haldar D, Kraya A, Foster J, Koptyra M, Storm PB, Resnick AC, Nabavizadeh A. Radiomics for characterization of the glioma immune microenvironment. NPJ Precis Oncol 2023; 7:59. [PMID: 37337080 DOI: 10.1038/s41698-023-00413-9] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 03/12/2023] [Accepted: 06/02/2023] [Indexed: 06/21/2023]
Abstract
Increasing evidence suggests that besides mutational and molecular alterations, the immune component of the tumor microenvironment also substantially impacts tumor behavior and complicates treatment response, particularly to immunotherapies. Although the standard method for characterizing tumor immune profile is through performing integrated genomic analysis on tissue biopsies, the dynamic change in the immune composition of the tumor microenvironment makes this approach not feasible, especially for brain tumors. Radiomics is a rapidly growing field that uses advanced imaging techniques and computational algorithms to extract numerous quantitative features from medical images. Recent advances in machine learning methods are facilitating biological validation of radiomic signatures and allowing them to "mine" for a variety of significant correlates, including genetic, immunologic, and histologic data. Radiomics has the potential to be used as a non-invasive approach to predict the presence and density of immune cells within the microenvironment, as well as to assess the expression of immune-related genes and pathways. This information can be essential for patient stratification, informing treatment decisions and predicting patients' response to immunotherapies. This is particularly important for tumors with difficult surgical access such as gliomas. In this review, we provide an overview of the glioma microenvironment, describe novel approaches for clustering patients based on their tumor immune profile, and discuss the latest progress on utilization of radiomics for immune profiling of glioma based on current literature.
Affiliation(s)
- Nastaran Khalili
- Center for Data-Driven Discovery in Biomedicine (D3b), Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Anahita Fathi Kazerooni
- Center for Data-Driven Discovery in Biomedicine (D3b), Children's Hospital of Philadelphia, Philadelphia, PA, USA
- AI2D Center for AI and Data Science for Integrated Diagnostics, University of Pennsylvania, Philadelphia, PA, USA
- Department of Neurosurgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Department of Neurosurgery, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Ariana Familiar
- Center for Data-Driven Discovery in Biomedicine (D3b), Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Debanjan Haldar
- Center for Data-Driven Discovery in Biomedicine (D3b), Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Institute of Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Adam Kraya
- Center for Data-Driven Discovery in Biomedicine (D3b), Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Jessica Foster
- Division of Oncology, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Pediatrics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Mateusz Koptyra
- Center for Data-Driven Discovery in Biomedicine (D3b), Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Phillip B Storm
- Center for Data-Driven Discovery in Biomedicine (D3b), Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Adam C Resnick
- Center for Data-Driven Discovery in Biomedicine (D3b), Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Ali Nabavizadeh
- Center for Data-Driven Discovery in Biomedicine (D3b), Children's Hospital of Philadelphia, Philadelphia, PA, USA.
- Department of Radiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
7
Vasaikar SV, Savage AK, Gong Q, Swanson E, Talla A, Lord C, Heubeck AT, Reading J, Graybuck LT, Meijer P, Torgerson TR, Skene PJ, Bumol TF, Li XJ. A comprehensive platform for analyzing longitudinal multi-omics data. Nat Commun 2023; 14:1684. [PMID: 36973282 PMCID: PMC10041512 DOI: 10.1038/s41467-023-37432-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/21/2021] [Accepted: 03/17/2023] [Indexed: 03/29/2023]
Abstract
Longitudinal bulk and single-cell omics data is increasingly generated for biological and clinical research but is challenging to analyze due to its many intrinsic types of variations. We present PALMO (https://github.com/aifimmunology/PALMO), a platform that contains five analytical modules to examine longitudinal bulk and single-cell multi-omics data from multiple perspectives, including decomposition of sources of variations within the data, collection of stable or variable features across timepoints and participants, identification of up- or down-regulated markers across timepoints of individual participants, and investigation on samples of same participants for possible outlier events. We have tested PALMO performance on a complex longitudinal multi-omics dataset of five data modalities on the same samples and six external datasets of diverse background. Both PALMO and our longitudinal multi-omics dataset can be valuable resources to the scientific community.
Affiliation(s)
- Adam K Savage
- Allen Institute for Immunology, Seattle, WA, 98109, USA
- Qiuyu Gong
- Allen Institute for Immunology, Seattle, WA, 98109, USA
- Elliott Swanson
- Allen Institute for Immunology, Seattle, WA, 98109, USA
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Aarthi Talla
- Allen Institute for Immunology, Seattle, WA, 98109, USA
- Cara Lord
- Allen Institute for Immunology, Seattle, WA, 98109, USA
- GlaxoSmithKline, Collegeville, PA, 19426, USA
- Paul Meijer
- Allen Institute for Immunology, Seattle, WA, 98109, USA
- Peter J Skene
- Allen Institute for Immunology, Seattle, WA, 98109, USA
- Xiao-Jun Li
- Allen Institute for Immunology, Seattle, WA, 98109, USA.
8
Generation and analysis of context-specific genome-scale metabolic models derived from single-cell RNA-Seq data. Proc Natl Acad Sci U S A 2023; 120:e2217868120. [PMID: 36719923 PMCID: PMC9963017 DOI: 10.1073/pnas.2217868120] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Indexed: 02/01/2023]
Abstract
Single-cell RNA sequencing combined with genome-scale metabolic models (GEMs) has the potential to unravel the differences in metabolism across both cell types and cell states but requires new computational methods. Here, we present a method for generating cell-type-specific genome-scale models from clusters of single-cell RNA-Seq profiles. Specifically, we developed a method to estimate the minimum number of cells required to pool to obtain stable models, a bootstrapping strategy for estimating statistical inference, and a faster version of the task-driven integrative network inference for tissues algorithm for generating context-specific GEMs. In addition, we evaluated the effect of different RNA-Seq normalization methods on model topology and differences in models generated from single-cell and bulk RNA-Seq data. We applied our methods on data from mouse cortex neurons and cells from the tumor microenvironment of lung cancer and in both cases found that almost every cell subtype had a unique metabolic profile. In addition, our approach was able to detect cancer-associated metabolic differences between cancer cells and healthy cells, showcasing its utility. We also contextualized models from 202 single-cell clusters across 19 human organs using data from Human Protein Atlas and made these available in the web portal Metabolic Atlas, thereby providing a valuable resource to the scientific community. With the ever-increasing availability of single-cell RNA-Seq datasets and continuously improved GEMs, their combination holds promise to become an important approach in the study of human metabolism.
9
Collaboration between Antagonistic Cell Type Regulators Governs Natural Variation in the Candida albicans Biofilm and Hyphal Gene Expression Network. mBio 2022; 13:e0193722. [PMID: 35993746 PMCID: PMC9600859 DOI: 10.1128/mbio.01937-22] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Indexed: 11/20/2022]
Abstract
Candida albicans is among the most significant human fungal pathogens. However, the vast majority of C. albicans studies have focused on a single clinical isolate and its marked derivatives. We investigated natural variation among clinical C. albicans isolates in gene regulatory control of biofilm formation, a process crucial to virulence. The transcription factor Efg1 is required for biofilm-associated gene expression and biofilm formation. Previously, we found extensive variation in Efg1-responsive gene expression among 5 diverse clinical isolates. However, chromatin immunoprecipitation sequencing analysis showed that Efg1 binding to genomic loci was uniform among the isolates. Functional dissection of strain differences identified three transcription factors, Brg1, Tec1, and Wor1, for which small changes in expression levels reshaped the Efg1 regulatory network. Brg1 and Tec1 are known biofilm activators, and their role in Efg1 network variation may be expected. However, Wor1 is a known repressor of EFG1 expression and an inhibitor of biofilm formation. In contrast, we found that a modest increase in WOR1 RNA levels, reflecting the expression differences between C. albicans strains, could augment biofilm formation and expression of biofilm-related genes. The analysis of natural variation here reveals a novel function for a well-characterized gene and illustrates that strain diversity offers a unique resource for elucidation of network interactions.
10
Bentzur A, Alon S, Shohat-Ophir G. Behavioral Neuroscience in the Era of Genomics: Tools and Lessons for Analyzing High-Dimensional Datasets. Int J Mol Sci 2022; 23:3811. [PMID: 35409169 PMCID: PMC8998543 DOI: 10.3390/ijms23073811] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/10/2022] [Revised: 03/26/2022] [Accepted: 03/29/2022] [Indexed: 12/10/2022]
Abstract
Behavioral neuroscience underwent a technology-driven revolution with the emergence of machine-vision and machine-learning technologies. These technological advances facilitated the generation of high-resolution, high-throughput capture and analysis of complex behaviors. Therefore, behavioral neuroscience is becoming a data-rich field. While behavioral researchers use advanced computational tools to analyze the resulting datasets, the search for robust and standardized analysis tools is still ongoing. At the same time, the field of genomics exploded with a plethora of technologies which enabled the generation of massive datasets. This growth of genomics data drove the emergence of powerful computational approaches to analyze these data. Here, we discuss the composition of a large behavioral dataset, and the differences and similarities between behavioral and genomics data. We then give examples of genomics-related tools that might be of use for behavioral analysis and discuss concepts that might emerge when considering the two fields together.
Affiliation(s)
- Assa Bentzur
- The Mina & Everard Goodman Faculty of Life Sciences, Gonda Multidisciplinary Brain Research Center, Institute of Nanotechnology, Bar-Ilan University, Ramat Gan 5290002, Israel
- The Alexander Kofkin Faculty of Engineering, Gonda Multidisciplinary Brain Research Center, Institute of Nanotechnology and Advanced Materials, Bar-Ilan University, Ramat Gan 5290002, Israel
- Shahar Alon
- The Alexander Kofkin Faculty of Engineering, Gonda Multidisciplinary Brain Research Center, Institute of Nanotechnology and Advanced Materials, Bar-Ilan University, Ramat Gan 5290002, Israel
- Galit Shohat-Ophir
- The Mina & Everard Goodman Faculty of Life Sciences, Gonda Multidisciplinary Brain Research Center, Institute of Nanotechnology, Bar-Ilan University, Ramat Gan 5290002, Israel
11
Mortlock S, McKinnon B, Montgomery GW. Genetic Regulation of Transcription in the Endometrium in Health and Disease. Front Reprod Health 2022; 3:795464. [PMID: 36304015 PMCID: PMC9580733 DOI: 10.3389/frph.2021.795464] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/15/2021] [Accepted: 12/06/2021] [Indexed: 11/25/2023]
Abstract
The endometrium is a complex and dynamic tissue essential for fertility and implicated in many reproductive disorders. The tissue consists of glandular epithelium and vascularised stroma and is unique because it is constantly shed and regrown with each menstrual cycle, generating up to 10 mm of new mucosa. Consequently, there are marked changes in cell composition and gene expression across the menstrual cycle. Recent evidence shows expression of many genes is influenced by genetic variation between individuals. We and others have reported evidence for genetic effects on hundreds of genes in endometrium. The genetic factors influencing endometrial gene expression are highly correlated with the genetic effects on expression in other reproductive (e.g., in uterus and ovary) and digestive tissues (e.g., salivary gland and stomach), supporting a shared genetic regulation of gene expression in biologically similar tissues. There is also increasing evidence for cell-specific genetic effects for some genes. Sample sizes for studies in endometrium are modest, and results from the larger studies of gene expression in blood report genetic effects for a much higher proportion of genes than currently reported for endometrium. There is also emerging evidence for the importance of genetic variation on RNA splicing. Gene mapping studies for common disease, including diseases associated with endometrium, show most variation maps to intergenic regulatory regions. It is likely that genetic risk factors for disease function through modifying the program of cell-specific gene expression. The emerging evidence from our gene mapping studies coupled with tissue-specific studies, and the GTEx, eQTLGen and EpiMap projects, shows we need to expand our understanding of the complex regulation of gene expression. These data also help to link disease genetic risk factors to specific target genes. Combining our data on genetic regulation of gene expression in endometrium, and cell types within the endometrium with gene mapping data for endometriosis and related diseases is beginning to uncover the specific genes and pathways responsible for increased risk of these diseases.
Affiliation(s)
- Grant W. Montgomery
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia
|
12
|
Le T, Aronow RA, Kirshtein A, Shahriyari L. A review of digital cytometry methods: estimating the relative abundance of cell types in a bulk of cells. Brief Bioinform 2021; 22:bbaa219. [PMID: 33003193 PMCID: PMC8293826 DOI: 10.1093/bib/bbaa219] [Citation(s) in RCA: 60] [Impact Index Per Article: 15.0] [Received: 05/27/2020] [Revised: 08/15/2020] [Accepted: 08/17/2020] [Indexed: 01/20/2023] Open
Abstract
Due to the high cost of flow and mass cytometry, there has been a recent surge in the development of computational methods for estimating the relative distributions of cell types from the gene expression profile of a bulk of cells. Here, we review the five common 'digital cytometry' methods: deconvolution of RNA-Seq, cell-type identification by estimating relative subsets of RNA transcripts (CIBERSORT), CIBERSORTx, single sample gene set enrichment analysis and single-sample scoring of molecular phenotypes deconvolution method. The results show that CIBERSORTx B-mode, which uses batch correction to adjust the gene expression profile of the bulk of cells ('mixture data') to eliminate possible cross-platform variations between the mixture data and the gene expression data of single cells ('signature matrix'), outperforms other methods, especially when signature matrix and mixture data come from different platforms. However, in our tests, CIBERSORTx S-mode, which uses batch correction for adjusting the signature matrix instead of mixture data, did not perform better than the original CIBERSORT method, which does not use any batch correction method. This result suggests the need for further investigations into how to utilize batch correction in deconvolution methods.
Affiliation(s)
- Trang Le
- University of Massachusetts Amherst
- Rachel A Aronow
- Department of Mathematics and Statistics at the University of Massachusetts Amherst
- Arkadz Kirshtein
- Department of Mathematics and Statistics at the University of Massachusetts Amherst
|
13
|
Zrimec J, Buric F, Kokina M, Garcia V, Zelezniak A. Learning the Regulatory Code of Gene Expression. Front Mol Biosci 2021; 8:673363. [PMID: 34179082 PMCID: PMC8223075 DOI: 10.3389/fmolb.2021.673363] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 02/27/2021] [Accepted: 05/24/2021] [Indexed: 11/13/2022] Open
Abstract
Data-driven machine learning is the method of choice for predicting molecular phenotypes from nucleotide sequence, modeling gene expression events including protein-DNA binding, chromatin states as well as mRNA and protein levels. Deep neural networks automatically learn informative sequence representations and interpreting them enables us to improve our understanding of the regulatory code governing gene expression. Here, we review the latest developments that apply shallow or deep learning to quantify molecular phenotypes and decode the cis-regulatory grammar from prokaryotic and eukaryotic sequencing data. Our approach is to build from the ground up, first focusing on the initiating protein-DNA interactions, then specific coding and non-coding regions, and finally on advances that combine multiple parts of the gene and mRNA regulatory structures, achieving unprecedented performance. We thus provide a quantitative view of gene expression regulation from nucleotide sequence, concluding with an information-centric overview of the central dogma of molecular biology.
Affiliation(s)
- Jan Zrimec
- Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Filip Buric
- Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Mariia Kokina
- Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kongens Lyngby, Denmark
- Victor Garcia
- School of Life Sciences and Facility Management, Zurich University of Applied Sciences, Wädenswil, Switzerland
- Aleksej Zelezniak
- Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Science for Life Laboratory, Stockholm, Sweden
|
14
|
Chen Y, Wu T, Zhu Z, Huang H, Zhang L, Goel A, Yang M, Wang X. An integrated workflow for biomarker development using microRNAs in extracellular vesicles for cancer precision medicine. Semin Cancer Biol 2021; 74:134-155. [PMID: 33766650 DOI: 10.1016/j.semcancer.2021.03.011] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 12/07/2020] [Revised: 03/13/2021] [Accepted: 03/16/2021] [Indexed: 02/06/2023]
Abstract
EV-miRNAs are microRNA (miRNA) molecules encapsulated in extracellular vesicles (EVs), which play crucial roles in tumor pathogenesis, progression, and metastasis. Recent studies of EV-miRNAs have yielded novel insights into cancer biology and have demonstrated great potential for developing novel liquid biopsy assays for various applications. Notably, compared to conventional liquid biomarkers, EV-miRNAs are more advantageous in representing host-cell molecular architecture and exhibit higher stability and specificity. Despite various available techniques for EV-miRNA separation, concentration, profiling, and data analysis, a standardized approach for EV-miRNA biomarker development is still lacking. In this review, we performed an extensive literature survey and distilled an integrated workflow encompassing important steps for EV-miRNA biomarker development, including sample collection and EV isolation, EV-miRNA extraction and quantification, high-throughput data preprocessing, biomarker prioritization and model construction, functional analysis, and validation. With the rapid growth of "big data", we highlight the importance of efficient mining of high-throughput data for the discovery of EV-miRNA biomarkers and of integrating multiple independent datasets for in silico and experimental validation to increase robustness and reproducibility. Furthermore, as an efficient strategy in systems biology, network inference provides insights into the regulatory mechanisms and can be used to select functionally important EV-miRNAs to refine the biomarker candidates. Despite the encouraging development in the field, a number of challenges still hinder clinical translation. We finally summarize several common challenges in various biomarker studies and discuss potential opportunities emerging in the related fields.
Affiliation(s)
- Yu Chen
- Department of Biomedical Sciences, City University of Hong Kong, 31 To Yuen Street, Kowloon Tong, Hong Kong
- Tan Wu
- Department of Biomedical Sciences, City University of Hong Kong, 31 To Yuen Street, Kowloon Tong, Hong Kong
- Zhongxu Zhu
- Department of Biomedical Sciences, City University of Hong Kong, 31 To Yuen Street, Kowloon Tong, Hong Kong
- Hao Huang
- Department of Biomedical Sciences, City University of Hong Kong, 31 To Yuen Street, Kowloon Tong, Hong Kong
- Liang Zhang
- Department of Biomedical Sciences, City University of Hong Kong, 31 To Yuen Street, Kowloon Tong, Hong Kong; Tung Biomedical Sciences Centre, City University of Hong Kong, Hong Kong; Key Laboratory of Biochip Technology, Biotech and Health Centre, Shenzhen Research Institute, City University of Hong Kong, Shenzhen, Guangdong Province, China
- Ajay Goel
- Department of Molecular Diagnostics and Experimental Therapeutics, Beckman Research Institute of City of Hope Comprehensive Cancer Center, Duarte, CA, USA
- Mengsu Yang
- Department of Biomedical Sciences, City University of Hong Kong, 31 To Yuen Street, Kowloon Tong, Hong Kong; Tung Biomedical Sciences Centre, City University of Hong Kong, Hong Kong; Key Laboratory of Biochip Technology, Biotech and Health Centre, Shenzhen Research Institute, City University of Hong Kong, Shenzhen, Guangdong Province, China
- Xin Wang
- Department of Biomedical Sciences, City University of Hong Kong, 31 To Yuen Street, Kowloon Tong, Hong Kong; Tung Biomedical Sciences Centre, City University of Hong Kong, Hong Kong; Key Laboratory of Biochip Technology, Biotech and Health Centre, Shenzhen Research Institute, City University of Hong Kong, Shenzhen, Guangdong Province, China.
|
15
|
Gustafsson J, Robinson J, Inda-Díaz JS, Björnson E, Jörnsten R, Nielsen J. DSAVE: Detection of misclassified cells in single-cell RNA-Seq data. PLoS One 2020; 15:e0243360. [PMID: 33270740 PMCID: PMC7714356 DOI: 10.1371/journal.pone.0243360] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 07/03/2020] [Accepted: 11/19/2020] [Indexed: 11/19/2022] Open
Abstract
Single-cell RNA sequencing has become a valuable tool for investigating cell types in complex tissues, where clustering of cells enables the identification and comparison of cell populations. Although many studies have sought to develop and compare different clustering approaches, a deeper investigation into the properties of the resulting populations is lacking. Specifically, the presence of misclassified cells can influence downstream analyses, highlighting the need to assess subpopulation purity and to detect such cells. We developed DSAVE (Down-SAmpling based Variation Estimation), a method to evaluate the purity of single-cell transcriptome clusters and to identify misclassified cells. The method utilizes down-sampling to eliminate differences in sampling noise and uses a log-likelihood based metric to help identify misclassified cells. In addition, DSAVE estimates the number of cells needed in a population to achieve a stable average gene expression profile within a certain gene expression range. We show that DSAVE can be used to find potentially misclassified cells that are not detectable by similar tools and reveal the cause of their divergence from the other cells, such as differing cell state or cell type. With the growing use of single-cell RNA-seq, we foresee that DSAVE will be an increasingly useful tool for comparing and purifying subpopulations in single-cell RNA-Seq datasets.
Affiliation(s)
- Johan Gustafsson
- Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Wallenberg Center for Protein Research, Chalmers University of Technology, Gothenburg, Sweden
- Jonathan Robinson
- Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Wallenberg Center for Protein Research, Chalmers University of Technology, Gothenburg, Sweden
- Juan S. Inda-Díaz
- Mathematical Sciences, University of Gothenburg and Chalmers University of Technology, Gothenburg, Sweden
- Elias Björnson
- Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Department of Molecular and Clinical Medicine, Wallenberg Laboratory for Cardiovascular and Metabolic Research, University of Gothenburg, Gothenburg, Sweden
- Rebecka Jörnsten
- Mathematical Sciences, University of Gothenburg and Chalmers University of Technology, Gothenburg, Sweden
- Jens Nielsen
- Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Wallenberg Center for Protein Research, Chalmers University of Technology, Gothenburg, Sweden
- BioInnovation Institute, Copenhagen, Denmark
|