1
Guo X, Sulaiman M, Neumann A, Zheng SC, Cecil CAM, Teschendorff AE, Heijmans BT. Unified high-resolution immune cell fraction estimation in blood tissue from birth to old age. Genome Med 2025; 17:63. [PMID: 40426256 PMCID: PMC12108007 DOI: 10.1186/s13073-025-01489-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2025] [Accepted: 05/16/2025] [Indexed: 05/29/2025]
Abstract
Variations in immune-cell fractions can confound or hamper interpretation of DNAm-based biomarkers in blood. Although cell-type deconvolution can address this challenge for cord and adult blood, currently there is no method applicable to blood from other age groups, including infants and children. Here we construct and extensively validate a DNAm reference panel, called UniLIFE, for 19 immune cell-types, applicable to blood tissue of any age. We use UniLIFE to delineate the dynamics of immune-cell fractions from birth to old age, and to infer disease associated immune cell fraction variations in newborns, infants, children and adults. In a prospective longitudinal study of type-1 diabetes in infants and children, UniLIFE identifies differentially methylated positions that precede type-1 diabetes diagnosis and that map to diabetes related signaling pathways. In summary, UniLIFE will improve the identification and interpretation of blood-based DNAm biomarkers for any age group, but specially for longitudinal studies that include infants and children. The UniLIFE panel and algorithms to estimate cell-type fractions are available from our EpiDISH Bioconductor R-package: https://bioconductor.org/packages/release/bioc/html/EpiDISH.html.
Affiliation(s)
- Xiaolong Guo
  - Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, 320 Yue Yang Road, Shanghai, 200031, China
- Mahnoor Sulaiman
  - Department of Biomedical Data Sciences, Leiden University Medical Center, Einthovenweg 20, Leiden, 2333 ZC, The Netherlands
  - Department of Child and Adolescent Psychiatry/Psychology, Sophia's Children Centre, Erasmus MC, Rotterdam, The Netherlands
- Alexander Neumann
  - Department of Child and Adolescent Psychiatry/Psychology, Sophia's Children Centre, Erasmus MC, Rotterdam, The Netherlands
- Shijie C Zheng
  - Pfizer Research & Development, Pfizer Inc, Groton, CT, USA
- Charlotte A M Cecil
  - Department of Child and Adolescent Psychiatry/Psychology, Sophia's Children Centre, Erasmus MC, Rotterdam, The Netherlands
  - Department of Epidemiology, Erasmus MC, Rotterdam, The Netherlands
- Andrew E Teschendorff
  - Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, 320 Yue Yang Road, Shanghai, 200031, China
- Bastiaan T Heijmans
  - Department of Biomedical Data Sciences, Leiden University Medical Center, Einthovenweg 20, Leiden, 2333 ZC, The Netherlands
2
Benoit-Pilven C, Asteljoki JV, Leinonen JT, Karjalainen J, Daly MJ, Tukiainen T. Early establishment and life course stability of sex biases in the human brain transcriptome. Cell Genomics 2025:100890. [PMID: 40425010 DOI: 10.1016/j.xgen.2025.100890] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2024] [Revised: 02/07/2025] [Accepted: 04/30/2025] [Indexed: 05/29/2025]
Abstract
To elaborate on the origins of the established male-female differences in several brain-related phenotypes, we assessed the patterns of transcriptomic sex biases in the developing and adult human forebrain. We find an abundance of sex differences in expression (sex-DEs) in the prenatal brain, driven by both hormonal and sex-chromosomal factors, and considerable consistency in the sex effects between the developing and adult brain, with little sex-DE exclusive to the adult forebrain. Sex-DE was not enriched in genes associated with brain disorders, consistent with systematic differences in the characteristics of these genes (e.g., constraint). Yet, the genes with persistent sex-DE across the lifespan were overrepresented in disease gene co-regulation networks, pointing to their potential to mediate sex biases in brain phenotypes. Altogether, our work highlights prenatal development as a crucial time point for the establishment of brain sex differences.
Affiliation(s)
- Clara Benoit-Pilven
  - Institute for Molecular Medicine Finland FIMM, HiLIFE, University of Helsinki, Helsinki, Finland
- Juho V Asteljoki
  - Minerva Foundation Institute for Medical Research, Helsinki, Finland; Department of Internal Medicine, University of Helsinki, Helsinki, Finland; Abdominal Center, Helsinki University Hospital, Helsinki, Finland
- Jaakko T Leinonen
  - Institute for Molecular Medicine Finland FIMM, HiLIFE, University of Helsinki, Helsinki, Finland
- Juha Karjalainen
  - Institute for Molecular Medicine Finland FIMM, HiLIFE, University of Helsinki, Helsinki, Finland; Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Mark J Daly
  - Institute for Molecular Medicine Finland FIMM, HiLIFE, University of Helsinki, Helsinki, Finland; Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA; Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Taru Tukiainen
  - Institute for Molecular Medicine Finland FIMM, HiLIFE, University of Helsinki, Helsinki, Finland
3
Damigos S, Caliskan A, Wajant G, Giddins S, Moldovan A, Kuhn S, Putz E, Dandekar T, Rudel T, Westermann AJ, Zdzieblo D. A Multicellular In Vitro Model of the Human Intestine with Immunocompetent Features Highlights Host-Pathogen Interactions During Early Salmonella Typhimurium Infection. Advanced Science (Weinheim, Baden-Wurttemberg, Germany) 2025; 12:e2411233. [PMID: 39807570 PMCID: PMC11884561 DOI: 10.1002/advs.202411233] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2024] [Revised: 12/24/2024] [Indexed: 01/16/2025]
Abstract
Studying the molecular basis of intestinal infections caused by enteric pathogens at the tissue level is challenging, because most human intestinal infection models have limitations, and results obtained from animals may not reflect the human situation. Infections with Salmonella enterica serovar Typhimurium (STm) have different outcomes between organisms. 3D tissue modeling of primary human material provides alternatives to animal experimentation, but epithelial co-culture with immune cells remains difficult. Macrophages, for instance, contribute to the immunocompetence of native tissue, yet their incorporation into human epithelial tissue models is challenging. A 3D immunocompetent tissue model of the human small intestine based on decellularized submucosa enriched with monocyte-derived macrophages (MDM) is established. The multicellular model recapitulated in vivo-like cellular diversity, especially the induction of GP2 positive microfold (M) cells. Infection studies with STm reveal that the pathogen physically interacts with these M-like cells. MDMs show trans-epithelial migration and phagocytosed STm within the model and the levels of inflammatory cytokines are induced upon STm infection. Infected epithelial cells are shed into the supernatant, potentially reflecting an intracellular reservoir of invasion-primed STm. Together, the 3D model of the human intestinal epithelium bears potential as an alternative to animals to identify human-specific processes underlying enteric bacterial infections.
Affiliation(s)
- Spyridon Damigos
  - Department for Functional Materials in Medicine and Dentistry, University Hospital Würzburg, Würzburg, Germany
- Aylin Caliskan
  - Department of Bioinformatics, Biocenter, University of Würzburg, Würzburg, Germany
- Gisela Wajant
  - Department for Functional Materials in Medicine and Dentistry, University Hospital Würzburg, Würzburg, Germany
- Sara Giddins
  - Department for Functional Materials in Medicine and Dentistry, University Hospital Würzburg, Würzburg, Germany
- Adriana Moldovan
  - Department of Microbiology, Biocenter, University of Würzburg, Würzburg, Germany
- Sabine Kuhn
  - Institute of Clinical Transfusion Medicine and Hemotherapy, University of Wuerzburg, Wuerzburg, Germany
- Evelyn Putz
  - Institute of Clinical Transfusion Medicine and Hemotherapy, University of Wuerzburg, Wuerzburg, Germany
- Thomas Dandekar
  - Department of Bioinformatics, Biocenter, University of Würzburg, Würzburg, Germany
- Thomas Rudel
  - Department of Microbiology, Biocenter, University of Würzburg, Würzburg, Germany
- Alexander J. Westermann
  - Department of Microbiology, Biocenter, University of Würzburg, Würzburg, Germany
  - Helmholtz-Institute for RNA-based Infection Research (HIRI), Helmholtz Centre for Infection Research (HZI), Würzburg, Germany
- Daniela Zdzieblo
  - Department for Functional Materials in Medicine and Dentistry, University Hospital Würzburg, Würzburg, Germany
  - Translational Center for Regenerative Therapies (TLC-RT), Fraunhofer Institute for Silicate Research (ISC), 97070 Würzburg, Germany
4
Yu S, Meng G, Tang W, Ma W, Wang R, Zhu X, Sun X, Feng H. cypress: an R/Bioconductor package for cell-type-specific differential expression analysis power assessment. Bioinformatics 2024; 40:btae511. [PMID: 39153205 PMCID: PMC11357793 DOI: 10.1093/bioinformatics/btae511] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2024] [Revised: 07/24/2024] [Accepted: 08/13/2024] [Indexed: 08/19/2024]
Abstract
SUMMARY: Recent methodology advances in computational signal deconvolution have enabled bulk transcriptome data analysis at a finer cell-type level. Through deconvolution, identifying cell-type-specific differentially expressed (csDE) genes is drawing increasing attention in clinical applications. However, researchers still face a number of difficulties in adopting csDE genes detection methods in practice, especially in their experimental design. Here we present cypress, the first experimental design and statistical power analysis tool in csDE genes identification. This tool can reliably model purified cell-type-specific (CTS) profiles, cell-type compositions, biological and technical variations, offering a high-fidelity simulator for bulk RNA-seq convolution and deconvolution. cypress conducts simulation and evaluates the impact of multiple influencing factors, by various statistical metrics, to help researchers optimize experimental design and conduct power analysis.
AVAILABILITY AND IMPLEMENTATION: cypress is an open-source R/Bioconductor package at https://bioconductor.org/packages/cypress/.
Affiliation(s)
- Shilin Yu
  - Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic Foundation, Cleveland, OH 44106, United States
- Guanqun Meng
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH 44106, United States
- Wen Tang
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH 44106, United States
- Wenjing Ma
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, United States
- Rui Wang
  - Department of Surgery, Case Western Reserve University, Cleveland, OH 44106, United States
  - Division of Surgical Oncology, Department of Surgery, University Hospitals Cleveland Medical Center, Cleveland, OH 44106, United States
- Xiongwei Zhu
  - Department of Pathology, Case Western Reserve University, Cleveland, OH 44106, United States
- Xiaobo Sun
  - School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, Hubei 430073, China
- Hao Feng
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH 44106, United States
5
Meng G, Pan Y, Tang W, Zhang L, Cui Y, Schumacher FR, Wang M, Wang R, He S, Krischer J, Li Q, Feng H. imply: improving cell-type deconvolution accuracy using personalized reference profiles. Genome Med 2024; 16:65. [PMID: 38685057 PMCID: PMC11057104 DOI: 10.1186/s13073-024-01338-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2023] [Accepted: 04/18/2024] [Indexed: 05/02/2024]
Abstract
Using computational tools, bulk transcriptomics can be deconvoluted to estimate the abundance of constituent cell types. However, existing deconvolution methods are conditioned on the assumption that the whole study population is served by a single reference panel, ignoring person-to-person heterogeneity. Here, we present imply, a novel algorithm to deconvolute cell type proportions using personalized reference panels. Simulation studies demonstrate reduced bias compared with existing methods. Real data analyses on longitudinal consortia show disparities in cell type proportions are associated with several disease phenotypes in Type 1 diabetes and Parkinson's disease. imply is available through the R/Bioconductor package ISLET at https://bioconductor.org/packages/ISLET/ .
Affiliation(s)
- Guanqun Meng
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, 44106, OH, USA
- Yue Pan
  - Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, 38105, TN, USA
- Wen Tang
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, 44106, OH, USA
- Lijun Zhang
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, 44106, OH, USA
- Ying Cui
  - Department of Biomedical Data Science, Stanford University, Stanford, 94305, CA, USA
- Fredrick R Schumacher
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, 44106, OH, USA
- Ming Wang
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, 44106, OH, USA
- Rui Wang
  - Department of Surgery, Division of Surgical Oncology, University Hospitals Cleveland Medical Center, Cleveland, 44106, OH, USA
- Sijia He
  - Department of Biostatistics, University of Michigan, Ann Arbor, 48109, MI, USA
- Jeffrey Krischer
  - Health Informatics Institute, University of South Florida, Tampa, 38105, FL, USA
- Qian Li
  - Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, 38105, TN, USA
- Hao Feng
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, 44106, OH, USA
6
Pan Y, Wang X, Sun J, Liu C, Peng J, Li Q. Multimodal joint deconvolution and integrative signature selection in proteomics. Commun Biol 2024; 7:493. [PMID: 38658803 PMCID: PMC11043077 DOI: 10.1038/s42003-024-06155-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2023] [Accepted: 04/08/2024] [Indexed: 04/26/2024]
Abstract
Deconvolution is an efficient approach for detecting cell-type-specific (cs) transcriptomic signals without cellular segmentation. However, this type of methods may require a reference profile from the same molecular source and tissue type. Here, we present a method to dissect bulk proteome by leveraging tissue-matched transcriptome and proteome without using a proteomics reference panel. Our method also selects the proteins contributing to the cellular heterogeneity shared between bulk transcriptome and proteome. The deconvoluted result enables downstream analyses such as cs-protein Quantitative Trait Loci (cspQTL) mapping. We benchmarked the performance of this multimodal deconvolution approach through CITE-seq pseudo bulk data, a simulation study, and the bulk multi-omics data from human brain normal tissues and breast cancer tumors, individually, showing robust and accurate cell abundance quantification across different datasets. This algorithm is implemented in a tool MICSQTL that also provides cspQTL and multi-omics integrative visualization, available at https://bioconductor.org/packages/MICSQTL .
Affiliation(s)
- Yue Pan
  - Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Xusheng Wang
  - Center for Proteomics and Metabolomics, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
  - Department of Genetics, Genomics & Informatics, University of Tennessee Health Science Center, Memphis, TN, 38105, USA
- Jiao Sun
  - Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Chunyu Liu
  - Department of Psychiatry, SUNY Upstate Medical University, Syracuse, NY, 13210, USA
- Junmin Peng
  - Department of Structural Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
  - Department of Developmental Neurobiology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Qian Li
  - Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
7
Lyu Y, Wu C, Sun W, Li Z. Regional analysis to delineate intrasample heterogeneity with RegionalST. Bioinformatics 2024; 40:btae186. [PMID: 38579257 PMCID: PMC11026142 DOI: 10.1093/bioinformatics/btae186] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Revised: 03/06/2024] [Accepted: 04/03/2024] [Indexed: 04/07/2024]
Abstract
MOTIVATION: Spatial transcriptomics has greatly contributed to our understanding of spatial and intra-sample heterogeneity, which could be crucial for deciphering the molecular basis of human diseases. Intra-tumor heterogeneity, for example, may be associated with cancer treatment responses. However, the lack of computational tools for exploiting cross-regional information and the limited spatial resolution of current technologies present major obstacles to elucidating tissue heterogeneity.
RESULTS: To address these challenges, we introduce RegionalST, an efficient computational method that enables users to quantify cell type mixture and interactions, identify sub-regions of interest, and perform cross-region cell type-specific differential analysis for the first time. Our simulations and real data applications demonstrate that RegionalST is an efficient tool for visualizing and analyzing diverse spatial transcriptomics data, thereby enabling accurate and flexible exploration of tissue heterogeneity. Overall, RegionalST provides a one-stop destination for researchers seeking to delve deeper into the intricacies of spatial transcriptomics data.
AVAILABILITY AND IMPLEMENTATION: The implementation of our method is available as an open-source R/Bioconductor package with a user-friendly manual available at https://bioconductor.org/packages/release/bioc/html/RegionalST.html.
Affiliation(s)
- Yue Lyu
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, United States
  - Department of Biostatistics and Data Science, The University of Texas Health Science Center at Houston, Houston, TX 77030, United States
- Chong Wu
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, United States
- Wei Sun
  - Biostatistics Program, Public Health Science Division, Fred Hutchinson Cancer Center, Seattle, WA 98109, United States
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27516, United States
  - Department of Biostatistics, University of Washington, Seattle, WA 98195, United States
- Ziyi Li
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, United States
8
Yousef M, Allmer J. Deep learning in bioinformatics. Turk J Biol 2023; 47:366-382. [PMID: 38681776 PMCID: PMC11045206 DOI: 10.55730/1300-0152.2671] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Revised: 12/28/2023] [Accepted: 12/18/2023] [Indexed: 05/01/2024]
Abstract
Deep learning is a powerful machine learning technique that can learn from large amounts of data using multiple layers of artificial neural networks. This paper reviews some applications of deep learning in bioinformatics, a field that deals with analyzing and interpreting biological data. We first introduce the basic concepts of deep learning and then survey the recent advances and challenges of applying deep learning to various bioinformatics problems, such as genome sequencing, gene expression analysis, protein structure prediction, drug discovery, and disease diagnosis. We also discuss future directions and opportunities for deep learning in bioinformatics. We aim to provide an overview of deep learning so that bioinformaticians applying deep learning models can consider all critical technical and ethical aspects. Thus, our target audience is biomedical informatics researchers who use deep learning models for inference. This review will inspire more bioinformatics researchers to adopt deep-learning methods for their research questions while considering fairness, potential biases, explainability, and accountability.
Affiliation(s)
- Malik Yousef
  - Department of Information Systems, Zefat Academic College, Zefat, Israel
- Jens Allmer
  - Medical Informatics and Bioinformatics, Institute for Measurement Engineering and Sensor Technology, Hochschule Ruhr West, University of Applied Sciences, Mülheim an der Ruhr, Germany
9
Meng G, Pan Y, Tang W, Zhang L, Cui Y, Schumacher FR, Wang M, Wang R, He S, Krischer J, Li Q, Feng H. imply: improving cell-type deconvolution accuracy using personalized reference profiles. bioRxiv 2023:2023.09.27.559579. [PMID: 37808714 PMCID: PMC10557724 DOI: 10.1101/2023.09.27.559579] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/10/2023]
Abstract
Real-world clinical samples are often admixtures of signal mosaics from multiple pure cell types. Using computational tools, bulk transcriptomics can be deconvoluted to solve for the abundance of constituent cell types. However, existing deconvolution methods are conditioned on the assumption that the whole study population is served by a single reference panel, which ignores person-to-person heterogeneity. Here we present imply, a novel algorithm to deconvolute cell type proportions using personalized reference panels. imply can borrow information across repeatedly measured samples for each subject, and obtain precise cell type proportion estimations. Simulation studies demonstrate reduced bias in cell type abundance estimation compared with existing methods. Real data analyses on large longitudinal consortia show more realistic deconvolution results that align with biological facts. Our results suggest that disparities in cell type proportions are associated with several disease phenotypes in type 1 diabetes and Parkinson's disease. Our proposed tool imply is available through the R/Bioconductor package ISLET at https://bioconductor.org/packages/ISLET/.
Affiliation(s)
- Guanqun Meng
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, 44106, OH, USA
- Yue Pan
  - Department of Biostatistics, St. Jude Children’s Research Hospital, Memphis, 38105, TN, USA
- Wen Tang
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, 44106, OH, USA
- Lijun Zhang
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, 44106, OH, USA
- Ying Cui
  - Department of Biomedical Data Science, Stanford University, Stanford, 94305, CA, USA
- Fredrick R. Schumacher
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, 44106, OH, USA
- Ming Wang
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, 44106, OH, USA
- Rui Wang
  - Department of Surgery, Division of Surgical Oncology, University Hospitals Cleveland Medical Center, Cleveland, 44106, OH, USA
- Sijia He
  - Department of Biostatistics, University of Michigan, Ann Arbor, 48109, MI, USA
- Jeffrey Krischer
  - Health Informatics Institute, University of South Florida, Tampa, 38105, FL, USA
- Qian Li
  - Department of Biostatistics, St. Jude Children’s Research Hospital, Memphis, 38105, TN, USA
- Hao Feng
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, 44106, OH, USA
10
Feng H, Meng G, Lin T, Parikh H, Pan Y, Li Z, Krischer J, Li Q. ISLET: individual-specific reference panel recovery improves cell-type-specific inference. Genome Biol 2023; 24:174. [PMID: 37496087 PMCID: PMC10373385 DOI: 10.1186/s13059-023-03014-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 11/01/2022] [Accepted: 07/12/2023] [Indexed: 07/28/2023]
Abstract
We propose a statistical framework ISLET to infer individual-specific and cell-type-specific transcriptome reference panels. ISLET models the repeatedly measured bulk gene expression data, to optimize the usage of shared information within each subject. ISLET is the first available method to achieve individual-specific reference estimation in repeated samples. Using simulation studies, we show outstanding performance of ISLET in the reference estimation and downstream cell-type-specific differentially expressed genes testing. We apply ISLET to longitudinal transcriptomes profiled from blood samples in a large observational study of young children and confirm the cell-type-specific gene signatures for pancreatic islet autoantibody. ISLET is available at https://bioconductor.org/packages/ISLET .
Affiliation(s)
- Hao Feng
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, USA
- Guanqun Meng
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, USA
- Tong Lin
  - Department of Biostatistics, St. Jude Children's Research Hospital, 262 Danny Thomas Place, Memphis, TN, 38105, USA
- Hemang Parikh
  - Health Informatics Institute, University of South Florida, Tampa, FL, 33620, USA
- Yue Pan
  - Department of Biostatistics, St. Jude Children's Research Hospital, 262 Danny Thomas Place, Memphis, TN, 38105, USA
- Ziyi Li
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
- Jeffrey Krischer
  - Health Informatics Institute, University of South Florida, Tampa, FL, 33620, USA
- Qian Li
  - Department of Biostatistics, St. Jude Children's Research Hospital, 262 Danny Thomas Place, Memphis, TN, 38105, USA
11
Huang P, Cai M, Lu X, McKennan C, Wang J. Accurate estimation of rare cell type fractions from tissue omics data via hierarchical deconvolution. bioRxiv 2023:2023.03.15.532820. [PMID: 36993280 PMCID: PMC10055056 DOI: 10.1101/2023.03.15.532820] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Bulk transcriptomics in tissue samples reflects the average expression levels across different cell types and is highly influenced by cellular fractions. As such, it is critical to estimate cellular fractions to both deconfound differential expression analyses and infer cell type-specific differential expression. Since experimentally counting cells is infeasible in most tissues and studies, in silico cellular deconvolution methods have been developed as an alternative. However, existing methods are designed for tissues consisting of clearly distinguishable cell types and have difficulties estimating highly correlated or rare cell types. To address this challenge, we propose Hierarchical Deconvolution (HiDecon) that uses single-cell RNA sequencing references and a hierarchical cell type tree, which models the similarities among cell types and cell differentiation relationships, to estimate cellular fractions in bulk data. By coordinating cell fractions across layers of the hierarchical tree, cellular fraction information is passed up and down the tree, which helps correct estimation biases by pooling information across related cell types. The flexible hierarchical tree structure also enables estimating rare cell fractions by splitting the tree to higher resolutions. Through simulations and real data applications with the ground truth of measured cellular fractions, we demonstrate that HiDecon significantly outperforms existing methods and accurately estimates cellular fractions.
Affiliation(s)
- Penghui Huang
  - Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, USA
- Manqi Cai
  - Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, USA
- Xinghua Lu
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
- Chris McKennan
  - Department of Statistics, University of Pittsburgh, Pittsburgh, PA, USA
- Jiebiao Wang
  - Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, USA
12
Chen L, Li Z, Wu H. CeDAR: incorporating cell type hierarchy improves cell type-specific differential analyses in bulk omics data. Genome Biol 2023; 24:37. [PMID: 36855165 PMCID: PMC9972684 DOI: 10.1186/s13059-023-02857-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2022] [Accepted: 01/17/2023] [Indexed: 03/02/2023]
Abstract
Bulk high-throughput omics data contain signals from a mixture of cell types. Recent developments of deconvolution methods facilitate cell type-specific inferences from bulk data. Our real data exploration suggests that differential expression or methylation status is often correlated among cell types. Based on this observation, we develop a novel statistical method named CeDAR to incorporate the cell type hierarchy in cell type-specific differential analyses of bulk data. Extensive simulation and real data analyses demonstrate that this approach significantly improves the accuracy and power in detecting cell type-specific differential signals compared with existing methods, especially in low-abundance cell types.
Affiliation(s)
- Luxiao Chen
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, USA
- Ziyi Li
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Hao Wu
  - Faculty of Computer Science and Control Engineering, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, 1068 Xueyuan Avenue, Shenzhen University Town, Shenzhen, 518055, P.R. China
|
13
|
Haftorn KL, Denault WRP, Lee Y, Page CM, Romanowska J, Lyle R, Næss ØE, Kristjansson D, Magnus PM, Håberg SE, Bohlin J, Jugessur A. Nucleated red blood cells explain most of the association between DNA methylation and gestational age. Commun Biol 2023; 6:224. [PMID: 36849614 PMCID: PMC9971030 DOI: 10.1038/s42003-023-04584-w] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 07/03/2022] [Accepted: 02/13/2023] [Indexed: 03/01/2023]
Abstract
Determining if specific cell type(s) are responsible for an association between DNA methylation (DNAm) and a given phenotype is important for understanding the biological mechanisms underlying the association. Our EWAS of gestational age (GA) in 953 newborns from the Norwegian MoBa study identified 13,660 CpGs significantly associated with GA (pBonferroni<0.05) after adjustment for cell type composition. When the CellDMC algorithm was applied to explore cell-type specific effects, 2,330 CpGs were significantly associated with GA, mostly in nucleated red blood cells [nRBCs; n = 2,030 (87%)]. Similar patterns were found in another dataset based on a different array and when applying an alternative algorithm to CellDMC called Tensor Composition Analysis (TCA). Our findings point to nRBCs as the main cell type driving the DNAm-GA association, implicating an epigenetic signature of erythropoiesis as a likely mechanism. They also explain the poor correlation observed between epigenetic age clocks for newborns and those for adults.
Affiliation(s)
- Kristine L Haftorn
  - Centre for Fertility and Health, Norwegian Institute of Public Health, Oslo, Norway
  - Institute of Health and Society, University of Oslo, Oslo, Norway
- William R P Denault
  - Centre for Fertility and Health, Norwegian Institute of Public Health, Oslo, Norway
  - Department of Human Genetics, University of Chicago, Chicago, IL, 60637, USA
- Yunsung Lee
  - Centre for Fertility and Health, Norwegian Institute of Public Health, Oslo, Norway
- Christian M Page
  - Centre for Fertility and Health, Norwegian Institute of Public Health, Oslo, Norway
  - Department of Physical Health and Ageing, Division of Mental and Physical Health, Norwegian Institute of Public Health, Oslo, Norway
- Julia Romanowska
  - Centre for Fertility and Health, Norwegian Institute of Public Health, Oslo, Norway
  - Department of Global Public Health and Primary Care, University of Bergen, Bergen, Norway
- Robert Lyle
  - Centre for Fertility and Health, Norwegian Institute of Public Health, Oslo, Norway
  - Department of Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway
- Øyvind E Næss
  - Institute of Health and Society, University of Oslo, Oslo, Norway
  - Division of Mental and Physical Health, Norwegian Institute of Public Health, Oslo, Norway
- Dana Kristjansson
  - Centre for Fertility and Health, Norwegian Institute of Public Health, Oslo, Norway
  - Department of Genetics and Bioinformatics, Norwegian Institute of Public Health, Oslo, Norway
- Per M Magnus
  - Centre for Fertility and Health, Norwegian Institute of Public Health, Oslo, Norway
- Siri E Håberg
  - Centre for Fertility and Health, Norwegian Institute of Public Health, Oslo, Norway
- Jon Bohlin
  - Centre for Fertility and Health, Norwegian Institute of Public Health, Oslo, Norway
  - Division for Infection Control and Environmental Health, Department of Infectious Disease Epidemiology and Modelling, Norwegian Institute of Public Health, Oslo, Norway
- Astanand Jugessur
  - Centre for Fertility and Health, Norwegian Institute of Public Health, Oslo, Norway
  - Department of Global Public Health and Primary Care, University of Bergen, Bergen, Norway
|
14
|
Verma A, Kommaddi RP, Gnanabharathi B, Hirsch EC, Ravindranath V. Genes critical for development and differentiation of dopaminergic neurons are downregulated in Parkinson's disease. J Neural Transm (Vienna) 2023; 130:495-512. [PMID: 36820885 DOI: 10.1007/s00702-023-02604-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 12/15/2022] [Accepted: 02/13/2023] [Indexed: 02/24/2023]
Abstract
We performed transcriptome analysis using RNA sequencing on substantia nigra pars compacta (SNpc) from mice after acute and chronic 1-methyl-4-phenyl-1,2,3,6-tetrahydropyridine (MPTP) treatment and from Parkinson's disease (PD) patients. Acute and chronic exposure to MPTP resulted in decreased expression of genes involved in sodium channel regulation. However, upregulation of pro-inflammatory pathways was seen after a single dose but not after chronic MPTP treatment. Dopamine biosynthesis and synaptic vesicle recycling pathways were downregulated in PD patients and after chronic MPTP treatment in mice. Genes essential for midbrain development and determination of dopaminergic phenotype, such as LMX1B, FOXA1, RSPO2, KLHL1, EBF3, PITX3, RGS4, ALDH1A1, RET, FOXA2, EN1, DLK1, GFRA1, LMX1A, NR4A2, GAP43, SNCA, PBX1, and GRB10, were downregulated in human PD, and overexpression of GFP-tagged LMX1B rescued MPP+-induced death in SH-SY5Y neurons. Downregulation of the gene ensemble involved in development and differentiation of dopaminergic neurons indicates its potential involvement in the pathogenesis and progression of human PD.
Affiliation(s)
- Aditi Verma
  - Centre for Neuroscience, Indian Institute of Science, C.V. Raman Avenue, Bangalore, 560012, India
- Reddy Peera Kommaddi
  - Centre for Brain Research, Indian Institute of Science, Bangalore, 560012, India
- Etienne C Hirsch
  - Sorbonne Université, Institut du Cerveau - ICM, Inserm U 1127, CNRS UMR 7225, 75013, Paris, France
- Vijayalakshmi Ravindranath
  - Centre for Neuroscience, Indian Institute of Science, C.V. Raman Avenue, Bangalore, 560012, India
  - Centre for Brain Research, Indian Institute of Science, Bangalore, 560012, India
|
15
|
Meng G, Tang W, Huang E, Li Z, Feng H. A comprehensive assessment of cell type-specific differential expression methods in bulk data. Brief Bioinform 2023; 24:bbac516. [PMID: 36472568 PMCID: PMC9851321 DOI: 10.1093/bib/bbac516] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 08/16/2022] [Revised: 10/08/2022] [Accepted: 10/29/2022] [Indexed: 12/12/2022]
Abstract
Accounting for cell type compositions has been very successful in analyzing high-throughput data from heterogeneous tissues. Differential gene expression analysis at the cell type level is becoming increasingly popular, enabling biomarker discovery at a finer granularity within a particular cell type. Although several computational methods have been developed to identify cell type-specific differentially expressed genes (csDEG) from RNA-seq data, a systematic evaluation is yet to be performed. Here, we thoroughly benchmark six recently published methods: CellDMC, CARseq, TOAST, LRCDE, CeDAR and TCA, together with two classical methods, csSAM and DESeq2, for a comprehensive comparison. We aim to systematically evaluate the performance of popular csDEG detection methods and provide guidance to researchers. In simulation studies, we benchmark available methods under various scenarios of baseline expression levels, sample sizes, cell type compositions, expression level alterations, technical noise and biological dispersion. Real data analyses of three large datasets on inflammatory bowel disease, lung cancer and autism provide evaluation at both the gene level and the pathway level. We find that csDEG calling is strongly affected by effect size, baseline expression level and cell type compositions. Results imply that csDEG discovery is a challenging task in itself, with room for improvement in handling low signal-to-noise ratios and low-expression genes.
Affiliation(s)
- Guanqun Meng
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, 44106, Ohio, USA
- Wen Tang
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, 44106, Ohio, USA
- Emina Huang
  - Department of Surgery, The University of Texas Southwestern Medical Center, Dallas, 75390, Texas, USA
- Ziyi Li
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, 77030, Texas, USA
- Hao Feng
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, 44106, Ohio, USA
|
16
|
Abstract
DNA methylation data generated from bulk tissue represents a mixture of many different cell types. Variation in the cell-type composition of tissues is thus a major confounder when inferring differential DNA methylation. Due to the high cost of single-cell methylome sequencing, computational methods that can dissect the cell-type heterogeneity of bulk DNA methylomes offer an efficient and cost-effective solution, especially in the context of large-scale EWAS. In this chapter, we present a step-by-step tutorial of Epigenetic cell-type deconvolution using Single-Cell Omic References (EpiSCORE), a reference-based method that leverages the high-resolution nature of single-cell RNA-Seq datasets to facilitate microdissection of bulk-tissue DNA methylomes.
Affiliation(s)
- Tianyu Zhu
  - CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Andrew E Teschendorff
  - CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
  - UCL Cancer Institute, Paul O'Gorman Building, University College London, London, UK
|
17
|
Fan J, Lyu Y, Zhang Q, Wang X, Li M, Xiao R. MuSiC2: cell-type deconvolution for multi-condition bulk RNA-seq data. Brief Bioinform 2022; 23:bbac430. [PMID: 36208175 PMCID: PMC9677503 DOI: 10.1093/bib/bbac430] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 05/11/2022] [Revised: 08/19/2022] [Accepted: 09/03/2022] [Indexed: 12/14/2022]
Abstract
Cell-type composition of intact bulk tissues can vary across samples. Deciphering cell-type composition and its changes during disease progression is an important step toward understanding disease pathogenesis. To infer cell-type composition, existing cell-type deconvolution methods for bulk RNA sequencing (RNA-seq) data often require matched single-cell RNA-seq (scRNA-seq) data, generated from samples with similar clinical conditions, as reference. However, due to the difficulty of obtaining scRNA-seq data in diseased samples, only limited scRNA-seq data in matched disease conditions are available. Using scRNA-seq reference to deconvolve bulk RNA-seq data from samples with different disease conditions may lead to a biased estimation of cell-type proportions. To overcome this limitation, we propose an iterative estimation procedure, MuSiC2, which is an extension of MuSiC, to perform deconvolution analysis of bulk RNA-seq data generated from samples with multiple clinical conditions where at least one condition is different from that of the scRNA-seq reference. Extensive benchmark evaluations indicated that MuSiC2 improved the accuracy of cell-type proportion estimates of bulk RNA-seq samples under different conditions as compared with the traditional MuSiC deconvolution. MuSiC2 was applied to two bulk RNA-seq datasets for deconvolution analysis, including one from human pancreatic islets and the other from human retina. We show that MuSiC2 improves current deconvolution methods and provides more accurate cell-type proportion estimates when the bulk and single-cell reference differ in clinical conditions. We believe the condition-specific cell-type composition estimates from MuSiC2 will facilitate the downstream analysis and help identify cellular targets of human diseases.
Affiliation(s)
- Jiaxin Fan
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA
- Yafei Lyu
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA
- Qihuang Zhang
  - Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, QC, H3A 1G1, Canada
- Xuran Wang
  - Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Mingyao Li
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA
- Rui Xiao
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA
|
18
|
Guo Z, Shafik AM, Jin P, Wu H. Differential RNA methylation analysis for MeRIP-seq data under general experimental design. Bioinformatics 2022; 38:4705-4712. [PMID: 36063045 PMCID: PMC9563684 DOI: 10.1093/bioinformatics/btac601] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 12/15/2021] [Revised: 08/03/2022] [Accepted: 09/02/2022] [Indexed: 11/12/2022]
Abstract
MOTIVATION: RNA epigenetics is an emerging field to study the post-transcriptional gene regulation. The dynamics of RNA epigenetic modification have been reported to associate with many human diseases. Recently developed high-throughput technology named Methylated RNA Immunoprecipitation Sequencing (MeRIP-seq) enables the transcriptome-wide profiling of N6-methyladenosine (m6A) modification and comparison of RNA epigenetic modifications. There are a few computational methods for the comparison of mRNA modifications under different conditions but they all suffer from serious limitations. RESULTS: In this work, we develop a novel statistical method to detect differentially methylated mRNA regions from MeRIP-seq data. We model the sequence count data by a hierarchical negative binomial model that accounts for various sources of variations and derive parameter estimation and statistical testing procedures for flexible statistical inferences under general experimental designs. Extensive benchmark evaluations in simulation and real data analyses demonstrate that our method is more accurate, robust and flexible compared to existing methods. AVAILABILITY AND IMPLEMENTATION: Our method TRESS is implemented as an R/Bioconductor package and is available at https://bioconductor.org/packages/devel/TRESS. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Zhenxing Guo
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, USA
- Andrew M Shafik
  - Department of Human Genetics, Emory University, Atlanta, GA 30322, USA
- Peng Jin
  - Department of Human Genetics, Emory University, Atlanta, GA 30322, USA
- Hao Wu
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, USA
|
19
|
Tang D, Park S, Zhao H. SCADIE: simultaneous estimation of cell type proportions and cell type-specific gene expressions using SCAD-based iterative estimating procedure. Genome Biol 2022; 23:129. [PMID: 35706040 PMCID: PMC9199219 DOI: 10.1186/s13059-022-02688-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/02/2021] [Accepted: 05/11/2022] [Indexed: 12/13/2022]
Abstract
A challenge in bulk gene differential expression analysis is to differentiate changes due to cell type-specific gene expression and cell type proportions. SCADIE is an iterative algorithm that simultaneously estimates cell type-specific gene expression profiles and cell type proportions, and performs cell type-specific differential expression analysis at the group level. Through its unique penalty and objective function, SCADIE more accurately identifies cell type-specific differentially expressed genes than existing methods, including those that may be missed from single cell RNA-Seq data. SCADIE has robust performance with respect to the choice of deconvolution methods and the sources and quality of input data.
Affiliation(s)
- Daiwei Tang
  - Department of Biostatistics, Yale School of Public Health, 60 College Street, New Haven, USA
- Seyoung Park
  - Department of Statistics, Sungkyunkwan University, 25-2, Sungkyunkwan-ro, Jongno-gu, Seoul, South Korea
- Hongyu Zhao
  - Department of Biostatistics, Yale School of Public Health, 60 College Street, New Haven, USA
|
20
|
Qi L, Teschendorff AE. Cell-type heterogeneity: Why we should adjust for it in epigenome and biomarker studies. Clin Epigenetics 2022; 14:31. [PMID: 35227298 PMCID: PMC8887190 DOI: 10.1186/s13148-022-01253-3] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 10/19/2021] [Accepted: 02/21/2022] [Indexed: 12/18/2022]
Abstract
Most studies aiming to identify epigenetic biomarkers do so from complex tissues that are composed of many different cell-types. By definition, these cell-types vary substantially in terms of their epigenetic profiles. This cell-type specific variation among healthy cells is completely independent of the variation associated with disease, yet it dominates the epigenetic variability landscape. While cell-type composition of tissues can change in disease and this may provide accurate and reproducible biomarkers, not adjusting for the underlying cell-type heterogeneity may seriously limit the sensitivity and precision to detect disease-relevant biomarkers or hamper our understanding of such biomarkers. Given that computational and experimental tools for tackling cell-type heterogeneity are available, we here stress that future epigenetic biomarker studies should aim to provide estimates of underlying cell-type fractions for all samples in the study, and to identify biomarkers before and after adjustment for cell-type heterogeneity, in order to obtain a more complete and unbiased picture of the biomarker-landscape. This is critical, not only to improve reproducibility and for the eventual clinical application of such biomarkers, but importantly, to also improve our molecular understanding of disease itself.
Affiliation(s)
- Luo Qi
  - CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- Andrew E Teschendorff
  - CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
  - UCL Cancer Institute, University College London, London, WC1E 8BT, UK
|
21
|
Rahmani E, Jew B, Halperin E. The Effect of Model Directionality on Cell-Type-Specific Differential DNA Methylation Analysis. Front Bioinform 2022; 1:792605. [PMID: 36303752 PMCID: PMC9580934 DOI: 10.3389/fbinf.2021.792605] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2021] [Accepted: 12/21/2021] [Indexed: 11/29/2022]
Abstract
Calling differential methylation at a cell-type level from tissue-level bulk data is a fundamental challenge in genomics that has recently received more attention. These studies most often aim at identifying statistical associations rather than causal effects. However, existing methods typically make an implicit assumption about the direction of effects, and thus far, little to no attention has been given to the fact that this directionality assumption may not hold and can consequently affect statistical power and control for false positives. We demonstrate that misspecification of the model directionality can lead to a drastic decrease in performance and increase in risk of spurious findings in cell-type-specific differential methylation analysis, and we discuss the need to carefully consider model directionality before choosing a statistical method for analysis.
Affiliation(s)
- Elior Rahmani
  - Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA, United States
- Brandon Jew
  - Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, United States
- Eran Halperin
  - Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, United States
  - Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA, United States
  - Department of Anesthesiology and Perioperative Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA, United States
  - Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA, United States
  - *Correspondence: Eran Halperin
|
22
|
Jaakkola MK, Elo LL. Estimating cell type-specific differential expression using deconvolution. Brief Bioinform 2021; 23:6396788. [PMID: 34651640 PMCID: PMC8769698 DOI: 10.1093/bib/bbab433] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/03/2021] [Revised: 09/17/2021] [Accepted: 09/23/2021] [Indexed: 12/02/2022]
Affiliation(s)
- Maria K Jaakkola
  - Department of Mathematics and Statistics, University of Turku, Yliopistonmäki, 20014, Turku, Finland
- Laura L Elo
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, Tykistökatu 6, FI-20520, Turku, Finland
  - Institute of Biomedicine, University of Turku, Kiinamyllynkatu 10, FI-20520, Turku, Finland
|
23
|
Guo Z, Shafik AM, Jin P, Wu Z, Wu H. Detecting m6A methylation regions from Methylated RNA Immunoprecipitation Sequencing. Bioinformatics 2021; 37:2818-2824. [PMID: 33724304 PMCID: PMC9991887 DOI: 10.1093/bioinformatics/btab181] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 10/08/2020] [Revised: 02/16/2021] [Accepted: 03/12/2021] [Indexed: 02/02/2023]
Abstract
MOTIVATION: The post-transcriptional epigenetic modification on mRNA is an emerging field to study the gene regulatory mechanism and their association with diseases. Recently developed high-throughput sequencing technology named Methylated RNA Immunoprecipitation Sequencing (MeRIP-seq) enables one to profile mRNA epigenetic modification transcriptome wide. A few computational methods are available to identify transcriptome-wide mRNA modification, but they are either limited by over-simplified model ignoring the biological variance across replicates or suffer from low accuracy and efficiency. RESULTS: In this work, we develop a novel statistical method, based on an empirical Bayesian hierarchical model, to identify mRNA epigenetic modification regions from MeRIP-seq data. Our method accounts for various sources of variations in the data through rigorous modeling and applies shrinkage estimation by borrowing information from transcriptome-wide data to stabilize the parameter estimation. Simulation and real data analyses demonstrate that our method is more accurate, robust and efficient than the existing peak calling methods. AVAILABILITY AND IMPLEMENTATION: Our method TRES is implemented as an R package and is freely available on Github at https://github.com/ZhenxingGuo0015/TRES. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Zhenxing Guo
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, USA
- Andrew M Shafik
  - Department of Human Genetics, Emory University, Atlanta, GA 30322, USA
- Peng Jin
  - Department of Human Genetics, Emory University, Atlanta, GA 30322, USA
- Zhijin Wu
  - Department of Biostatistics, Brown University, Providence, RI 02806, USA
- Hao Wu
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, USA
|
24
|
Shelly KE, Candelaria NR, Li Z, Allen EG, Jin P, Nelson DL. Ectopic expression of CGG-repeats alters ovarian response to gonadotropins and leads to infertility in a murine FMR1 premutation model. Hum Mol Genet 2021; 30:923-938. [PMID: 33856019 PMCID: PMC8165648 DOI: 10.1093/hmg/ddab083] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/01/2020] [Revised: 02/18/2021] [Accepted: 03/30/2021] [Indexed: 01/03/2023]
Abstract
Women heterozygous for an expansion of CGG repeats in the 5'UTR of FMR1 risk developing fragile X-associated primary ovarian insufficiency (FXPOI) and/or tremor and ataxia syndrome (FXTAS). We show that expanded CGGs, independent of FMR1, are sufficient to drive ovarian insufficiency and that expression of CGG-containing mRNAs alone or in conjunction with a polyglycine-containing peptide translated from these RNAs contribute to dysfunction. Heterozygous females from two mouse lines expressing either CGG RNA-only (RNA-only) or CGG RNA and the polyglycine product FMRpolyG (FMRpolyG+RNA) were used to assess ovarian function in aging animals. The expression of FMRpolyG+RNA led to early cessation of breeding, ovulation and transcriptomic changes affecting cholesterol and steroid hormone biosynthesis. Females expressing CGG RNA-only did not exhibit decreased progeny during natural breeding, but their ovarian transcriptomes were enriched for alterations in cholesterol and lipid biosynthesis. The enrichment of CGG RNA-only ovaries for differentially expressed genes related to cholesterol processing provided a link to the ovarian cysts observed in both CGG-expressing lines. Early changes in transcriptome profiles led us to measure ovarian function in prepubertal females that revealed deficiencies in ovulatory responses to gonadotropins. These include impairments in cumulus expansion and resumption of oocyte meiosis, as well as reduced ovulated oocyte number. Cumulatively, we demonstrated the sufficiency of ectopically expressed CGG repeats to lead to ovarian insufficiency and that co-expression of CGG-RNA and FMRpolyG lead to premature cessation of breeding. However, the expression of CGG RNA-alone was sufficient to lead to ovarian dysfunction by impairing responses to hormonal stimulation.
Affiliation(s)
- Katharine E Shelly
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Nicholes R Candelaria
  - Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX 77030, USA
- Ziyi Li
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Emily G Allen
  - Department of Human Genetics, Emory University, Atlanta, GA 30322, USA
- Peng Jin
  - Department of Human Genetics, Emory University, Atlanta, GA 30322, USA
- David L Nelson
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
|
25
|
Bhattacharya A, Hamilton AM, Troester MA, Love MI. DeCompress: tissue compartment deconvolution of targeted mRNA expression panels using compressed sensing. Nucleic Acids Res 2021; 49:e48. [PMID: 33524140 PMCID: PMC8096278 DOI: 10.1093/nar/gkab031] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 08/14/2020] [Revised: 12/21/2020] [Accepted: 01/12/2021] [Indexed: 12/13/2022]
Abstract
Targeted mRNA expression panels, measuring up to 800 genes, are used in academic and clinical settings due to low cost and high sensitivity for archived samples. Most samples assayed on targeted panels originate from bulk tissue comprised of many cell types, and cell-type heterogeneity confounds biological signals. Reference-free methods are used when cell-type-specific expression references are unavailable, but limited feature spaces render implementation challenging in targeted panels. Here, we present DeCompress, a semi-reference-free deconvolution method for targeted panels. DeCompress leverages a reference RNA-seq or microarray dataset from similar tissue to expand the feature space of targeted panels using compressed sensing. Ensemble reference-free deconvolution is performed on this artificially expanded dataset to estimate cell-type proportions and gene signatures. In simulated mixtures, four public cell line mixtures, and a targeted panel (1199 samples; 406 genes) from the Carolina Breast Cancer Study, DeCompress recapitulates cell-type proportions with less error than reference-free methods and finds biologically relevant compartments. We integrate compartment estimates into cis-eQTL mapping in breast cancer, identifying a tumor-specific cis-eQTL for CCR3 (C-C Motif Chemokine Receptor 3) at a risk locus. DeCompress improves upon reference-free methods without requiring expression profiles from pure cell populations, with applications in genomic analyses and clinical settings.
Affiliation(s)
- Arjun Bhattacharya
  - Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California-Los Angeles, Los Angeles, CA 90095, USA
- Alina M Hamilton
  - Department of Pathology and Laboratory Medicine, University of North Carolina-Chapel Hill, Chapel Hill, NC 27516, USA
- Melissa A Troester
  - Department of Pathology and Laboratory Medicine, University of North Carolina-Chapel Hill, Chapel Hill, NC 27516, USA
  - Department of Epidemiology, University of North Carolina-Chapel Hill, Chapel Hill, NC 27516, USA
- Michael I Love
  - Department of Biostatistics, University of North Carolina-Chapel Hill, Chapel Hill, NC 27516, USA
  - Department of Genetics, University of North Carolina-Chapel Hill, Chapel Hill, NC 27516, USA
|
26
|
Jin C, Chen M, Lin D, Sun W. Cell type-aware analysis of RNA-seq data. Nat Comput Sci 2021; 1:253-261. [PMID: 34957416 PMCID: PMC8697413 DOI: 10.1038/s43588-021-00055-6] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 09/30/2020] [Accepted: 03/10/2021] [Indexed: 12/13/2022]
Abstract
Most tissue samples are composed of different cell types. Differential expression analysis that does not account for cell type composition cannot separate changes due to cell type composition from changes in cell type-specific expression. We propose a computational framework to address these limitations: Cell Type Aware analysis of RNA-seq (CARseq). CARseq employs a negative binomial distribution that appropriately models the count data from RNA-seq experiments. Simulation studies show that CARseq has substantially higher power than a linear model-based approach and that it also provides a more accurate estimate of the rankings of differentially expressed genes. We have applied CARseq to compare gene expression of schizophrenia/autism subjects versus controls, and identified the cell types underlying the differences and similarities of these two neurodevelopmental diseases. Our results are consistent with the results from differential expression analysis using single cell RNA-seq data.
Affiliation(s)
- Chong Jin
  - Department of Biostatistics, University of North Carolina at Chapel Hill
- Danyu Lin
  - Department of Biostatistics, University of North Carolina at Chapel Hill
- Wei Sun
  - Department of Biostatistics, University of North Carolina at Chapel Hill
  - Public Health Science Division, Fred Hutchinson Cancer Research Center
  - Department of Biostatistics, University of Washington
|
27
|
He L, Liu L, Li T, Zhuang D, Dai J, Wang B, Bi L. Exploring the Imbalance of Periodontitis Immune System From the Cellular to Molecular Level. Front Genet 2021; 12:653209. [PMID: 33841510 PMCID: PMC8033214 DOI: 10.3389/fgene.2021.653209] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 01/14/2021] [Accepted: 03/08/2021] [Indexed: 01/22/2023] Open
Abstract
Periodontitis is a common chronic inflammatory disease of periodontal tissue that mostly affects people over 30 years old. Statistics show that, compared with other countries, the prevalence of periodontitis in China is as high as 40% and that of periodontal disease exceeds 90%, which warrants serious attention. Diagnosis and treatment of periodontitis currently rely mainly on clinical criteria, and etiologic criteria remain relatively unexplored. We therefore explored the pathogenesis of periodontitis from the perspective of immune imbalance. By predicting the fractions of 22 immune cell types in periodontitis tissues and comparing them with normal tissues, we found that infiltration by multiple immune cell types was inhibited in periodontitis tissues and that this feature clearly distinguishes periodontitis from normal tissues. Further, a protein-protein interaction (PPI) network and a transcriptional regulation network were constructed from differentially expressed genes (DEGs) to explore interacting functional modules and regulatory pathways. Three functional modules were revealed, and top TFs such as EGR1 and ETS1 were shown to regulate the expression of periodontitis-related immune genes that play an important role in the formation of the immunosuppressive microenvironment. A classifier was also used to verify the reliability of the periodontitis features obtained at the cellular and molecular levels. In conclusion, we have revealed the immune microenvironment and molecular characteristics of periodontitis, which will help to better understand the mechanism of periodontitis and support its clinical diagnosis and treatment.
Affiliation(s)
- Longfei He
  - Department of Stomatology, The Fourth Affiliated Hospital of Harbin Medical University, Harbin, China
  - Department of Stomatology, Weifang People's Hospital, Weifang, China
- Lijuan Liu
  - Department of Stomatology, Weifang People's Hospital, Weifang, China
- Ti Li
  - Department of Stomatology, Weifang People's Hospital, Weifang, China
- Deshu Zhuang
  - Department of Stomatology, The Fourth Affiliated Hospital of Harbin Medical University, Harbin, China
  - Department of Oral Biological and Medical Sciences, Faculty of Dentistry, University of British Columbia, Vancouver, BC, Canada
- Jiayin Dai
  - Department of Stomatology, The Fourth Affiliated Hospital of Harbin Medical University, Harbin, China
- Bo Wang
  - Department of Stomatology, The Fourth Affiliated Hospital of Harbin Medical University, Harbin, China
- Liangjia Bi
  - Department of Stomatology, The Fourth Affiliated Hospital of Harbin Medical University, Harbin, China
|
28
|
Takeuchi F, Kato N. Nonlinear ridge regression improves cell-type-specific differential expression analysis. BMC Bioinformatics 2021; 22:141. [PMID: 33752591 PMCID: PMC7986289 DOI: 10.1186/s12859-021-03982-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/25/2020] [Accepted: 01/27/2021] [Indexed: 12/28/2022] Open
Abstract
BACKGROUND: Epigenome-wide association studies (EWAS) and differential gene expression analyses are generally performed on tissue samples, which consist of multiple cell types. Cell-type-specific effects of a trait, such as disease, on omics measurements are of interest but difficult or costly to measure experimentally. By measuring omics data for the bulk tissue, the cell type composition of a sample can be inferred statistically. Subsequently, cell-type-specific effects are estimated by linear regression that includes terms representing the interaction between the cell type proportions and the trait. This approach involves two issues, scaling and multicollinearity. RESULTS: First, although cell composition is analyzed in linear scale, differential methylation/expression is suitably analyzed in the logit/log scale. To analyze the two scales simultaneously, we applied nonlinear regression. Second, we show that the interaction terms are highly collinear, which obstructs ordinary regression. To cope with the multicollinearity, we applied ridge regularization. In simulated data, nonlinear ridge regression attained well-balanced sensitivity, specificity and precision. The marginal model attained the lowest precision and highest sensitivity and was the only algorithm to detect a weak signal in real data. CONCLUSION: Nonlinear ridge regression performed cell-type-specific association tests on bulk omics data with well-balanced performance. The omicwas package for R implements nonlinear ridge regression for cell-type-specific EWAS, differential gene expression and QTL analyses. The software is freely available from https://github.com/fumi-github/omicwas.
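The collinearity of proportion-by-trait interaction terms described in this abstract can be illustrated with a small sketch. The sample values, the two cell types and the helper names below are hypothetical, and the code uses plain linear ridge regression on two predictors rather than the nonlinear ridge model implemented in omicwas; it is only meant to show why the interaction columns are nearly collinear and how a ridge penalty shrinks the unstable coefficients.

```python
# Hypothetical toy data: 8 bulk samples, binary trait, two cell types (A and B).
trait = [0, 0, 0, 0, 1, 1, 1, 1]
prop_a = [0.55, 0.60, 0.65, 0.58, 0.57, 0.62, 0.66, 0.59]
prop_b = [1.0 - p for p in prop_a]  # proportions sum to one

# Interaction terms (cell type proportion x trait), one column per cell type.
interaction_a = [p * t for p, t in zip(prop_a, trait)]
interaction_b = [p * t for p, t in zip(prop_b, trait)]


def pearson(u, v):
    """Pearson correlation between two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    sd_u = sum((a - mean_u) ** 2 for a in u) ** 0.5
    sd_v = sum((b - mean_v) ** 2 for b in v) ** 0.5
    return cov / (sd_u * sd_v)


def ridge_two_predictors(x1, x2, y, lam):
    """Solve (X'X + lam*I) beta = X'y for a two-column design, no intercept."""
    a = sum(v * v for v in x1) + lam
    b = sum(p * q for p, q in zip(x1, x2))
    d = sum(v * v for v in x2) + lam
    r1 = sum(p * q for p, q in zip(x1, y))
    r2 = sum(p * q for p, q in zip(x2, y))
    det = a * d - b * b
    return ((d * r1 - b * r2) / det, (a * r2 - b * r1) / det)


# Both columns are zero in controls and positive in cases, so they track
# each other closely even though the proportions themselves are anti-correlated.
corr = pearson(interaction_a, interaction_b)

# Hypothetical outcome: the trait changes expression in cell type A only,
# plus a small fixed perturbation standing in for noise.
noise = [0.1, -0.1, 0.05, -0.05, 0.1, -0.1, 0.05, -0.05]
outcome = [2.0 * xa + e for xa, e in zip(interaction_a, noise)]

beta_ols = ridge_two_predictors(interaction_a, interaction_b, outcome, lam=0.0)
beta_ridge = ridge_two_predictors(interaction_a, interaction_b, outcome, lam=0.1)

print("correlation of interaction terms:", round(corr, 3))
print("unpenalized coefficients:", beta_ols)
print("ridge coefficients:", beta_ridge)
```

With a penalty of zero the near-singular cross-product matrix amplifies the small perturbation in the outcome; a positive penalty trades some bias for much more stable estimates.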
Affiliation(s)
- Fumihiko Takeuchi
  - Department of Gene Diagnostics and Therapeutics, Research Institute, National Center for Global Health and Medicine (NCGM), 1-21-1 Toyama, Shinjuku-ku, Tokyo, 162-8655, Japan
- Norihiro Kato
  - Department of Gene Diagnostics and Therapeutics, Research Institute, National Center for Global Health and Medicine (NCGM), 1-21-1 Toyama, Shinjuku-ku, Tokyo, 162-8655, Japan
|
29
|
Li Z, Guo Z, Cheng Y, Jin P, Wu H. Robust partial reference-free cell composition estimation from tissue expression. Bioinformatics 2020; 36:3431-3438. [PMID: 32167531 DOI: 10.1093/bioinformatics/btaa184] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 10/22/2019] [Revised: 03/05/2020] [Accepted: 03/10/2020] [Indexed: 12/13/2022] Open
Abstract
MOTIVATION: In the analysis of high-throughput omics data from tissue samples, estimating and accounting for cell composition have been recognized as important steps. High cost, intensive labor requirements and technical limitations hinder cell composition quantification using cell-sorting or single-cell technologies. Computational methods for cell composition estimation are available, but they are either limited by the availability of a reference panel or suffer from low accuracy. RESULTS: We introduce TOols for the Analysis of heterogeneouS Tissues TOAST/-P and TOAST/+P, two partial reference-free algorithms for estimating cell composition of heterogeneous tissues based on their gene expression profiles. TOAST/-P and TOAST/+P incorporate additional biological information, including cell-type-specific markers and prior knowledge of compositions, in the estimation procedure. Extensive simulation studies and real data analyses demonstrate that the proposed methods provide more accurate and robust cell composition estimation than existing methods. AVAILABILITY AND IMPLEMENTATION: The proposed methods TOAST/-P and TOAST/+P are implemented as part of the R/Bioconductor package TOAST at https://bioconductor.org/packages/TOAST. CONTACT: ziyi.li@emory.edu or hao.wu@emory.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ziyi Li
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, USA
- Zhenxing Guo
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, USA
- Ying Cheng
  - Institute of Biomedical Research, Yunnan University, Kunming, China
- Peng Jin
  - Department of Human Genetics, Emory University, Atlanta, GA 30322, USA
- Hao Wu
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, USA
|
30
|
You C, Wu S, Zheng SC, Zhu T, Jing H, Flagg K, Wang G, Jin L, Wang S, Teschendorff AE. A cell-type deconvolution meta-analysis of whole blood EWAS reveals lineage-specific smoking-associated DNA methylation changes. Nat Commun 2020; 11:4779. [PMID: 32963246 PMCID: PMC7508850 DOI: 10.1038/s41467-020-18618-y] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Received: 01/02/2020] [Accepted: 08/31/2020] [Indexed: 02/06/2023] Open
Abstract
Highly reproducible smoking-associated DNA methylation changes in whole blood have been reported by many Epigenome-Wide-Association Studies (EWAS). These epigenetic alterations could have important implications for understanding and predicting the risk of smoking-related diseases. To this end, it is important to establish if these DNA methylation changes happen in all blood cell subtypes or if they are cell-type specific. Here, we apply a cell-type deconvolution algorithm to identify cell-type specific DNA methylation signals in seven large EWAS. We find that most of the highly reproducible smoking-associated hypomethylation signatures are more prominent in the myeloid lineage. A meta-analysis further identifies a myeloid-specific smoking-associated hypermethylation signature enriched for DNase Hypersensitive Sites in acute myeloid leukemia. These results may guide the design of future smoking EWAS and have important implications for our understanding of how smoking affects immune-cell subtypes and how this may influence the risk of smoking related diseases. Smoking-associated DNA methylation changes in whole blood have been reported by many EWAS. Here, the authors use a cell-type deconvolution algorithm to identify cell-type specific DNA methylation signals in seven EWAS, identifying lineage-specific smoking-associated DNA methylation changes.
Affiliation(s)
- Chenglong You
  - CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Shanghai Institute for Biological Sciences, University of Chinese Academy of Sciences, Chinese Academy of Sciences, 320 Yue Yang Road, Shanghai, 200031, China
- Sijie Wu
  - CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Shanghai Institute for Biological Sciences, University of Chinese Academy of Sciences, Chinese Academy of Sciences, 320 Yue Yang Road, Shanghai, 200031, China
  - Human Phenome Institute, Fudan University, 825 Zhangheng Road, Shanghai, China
  - State Key Laboratory of Genetic Engineering and Ministry of Education Key Laboratory of Contemporary Anthropology, Collaborative Innovation Center for Genetics and Development, School of Life Sciences, Fudan University, Shanghai, China
- Shijie C Zheng
  - CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Shanghai Institute for Biological Sciences, University of Chinese Academy of Sciences, Chinese Academy of Sciences, 320 Yue Yang Road, Shanghai, 200031, China
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Tianyu Zhu
  - CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Shanghai Institute for Biological Sciences, University of Chinese Academy of Sciences, Chinese Academy of Sciences, 320 Yue Yang Road, Shanghai, 200031, China
- Han Jing
  - CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Shanghai Institute for Biological Sciences, University of Chinese Academy of Sciences, Chinese Academy of Sciences, 320 Yue Yang Road, Shanghai, 200031, China
- Ken Flagg
  - Guangzhou Regenerative Medicine Guangdong Laboratory, Guangzhou, China
- Guangyu Wang
  - Department of Computer Science and Technology, Tsinghua University, Beijing, China
- Li Jin
  - CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Shanghai Institute for Biological Sciences, University of Chinese Academy of Sciences, Chinese Academy of Sciences, 320 Yue Yang Road, Shanghai, 200031, China
  - Human Phenome Institute, Fudan University, 825 Zhangheng Road, Shanghai, China
  - State Key Laboratory of Genetic Engineering and Ministry of Education Key Laboratory of Contemporary Anthropology, Collaborative Innovation Center for Genetics and Development, School of Life Sciences, Fudan University, Shanghai, China
- Sijia Wang
  - CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Shanghai Institute for Biological Sciences, University of Chinese Academy of Sciences, Chinese Academy of Sciences, 320 Yue Yang Road, Shanghai, 200031, China
  - Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, 650223, China
- Andrew E Teschendorff
  - CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Shanghai Institute for Biological Sciences, University of Chinese Academy of Sciences, Chinese Academy of Sciences, 320 Yue Yang Road, Shanghai, 200031, China
  - UCL Cancer Institute, Paul O'Gorman Building, University College London, 72 Huntley Street, London, WC1E 6BT, UK
|
31
|
Zhang W, Li Z, Wei N, Wu HJ, Zheng X. Detection of differentially methylated CpG sites between tumor samples with uneven tumor purities. Bioinformatics 2020; 36:2017-2024. [PMID: 31769783 DOI: 10.1093/bioinformatics/btz885] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 08/06/2019] [Revised: 11/14/2019] [Accepted: 11/23/2019] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION: Inference of differentially methylated (DM) CpG sites between two groups of tumor samples with different genotypes or phenotypes is a critical step to uncover the epigenetic mechanisms of tumorigenesis and to identify biomarkers for cancer subtyping. However, tumor purity is a major confounding factor: uneven distributions of tumor purity between two groups of tumor samples will lead to biased discovery of DM sites if not properly accounted for. RESULTS: We here propose InfiniumDM, a generalized least squares model that adjusts for the effect of tumor purity in differential methylation analysis. Our method is applicable to a variety of experimental designs, including designs with or without normal controls and with different sources of normal tissue contamination. We compared our method with conventional methods including minfi, limma and limma corrected by tumor purity using simulated datasets. Our method shows significantly better performance across different differential methylation thresholds, sample sizes and mean purity deviations, among other settings. We also applied the proposed method to breast cancer samples from the TCGA database to further evaluate its performance. Overall, both simulation and real data analyses demonstrate favorable performance over existing methods serving a similar purpose. AVAILABILITY AND IMPLEMENTATION: InfiniumDM is a part of the R package InfiniumPurify, which is freely available from GitHub (https://github.com/Xiaoqizheng/InfiniumPurify). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
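The confounding effect of uneven tumor purity mentioned in this abstract can be shown with simple mixture arithmetic. The sketch below is not the InfiniumDM generalized least squares model; it assumes a two-component mixture (tumor cells plus normal contamination) with hypothetical methylation levels and purities, and the function names are illustrative only.

```python
def observed_beta(purity, tumor_beta, normal_beta):
    """Bulk methylation as a purity-weighted mix of tumor and normal signal."""
    return purity * tumor_beta + (1.0 - purity) * normal_beta


def purity_corrected_beta(observed, purity, normal_beta):
    """Invert the mixture to recover the methylation level of tumor cells."""
    return (observed - (1.0 - purity) * normal_beta) / purity


# Hypothetical CpG site: identical in the tumor cells of both groups.
TUMOR_BETA = 0.80
NORMAL_BETA = 0.20

# Group A samples are purer (90% tumor) than group B samples (50% tumor).
group_a = observed_beta(0.9, TUMOR_BETA, NORMAL_BETA)
group_b = observed_beta(0.5, TUMOR_BETA, NORMAL_BETA)

# The bulk difference is driven entirely by purity, not by tumor biology.
naive_difference = group_a - group_b

# After adjusting for purity, the apparent difference disappears.
corrected_a = purity_corrected_beta(group_a, 0.9, NORMAL_BETA)
corrected_b = purity_corrected_beta(group_b, 0.5, NORMAL_BETA)
corrected_difference = corrected_a - corrected_b

print("naive bulk difference:", round(naive_difference, 3))
print("purity-corrected difference:", round(corrected_difference, 3))
```

In real data the purity and the normal-tissue methylation level have to be estimated, which is where a regression model that carries purity as a covariate becomes necessary.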
Affiliation(s)
- Weiwei Zhang
  - Department of Mathematics, School of Science, East China University of Technology, Nanchang, Jiangxi 330013, China
- Ziyi Li
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, USA
- Nana Wei
  - Department of Mathematics, Shanghai Normal University, Shanghai 200234, China
- Hua-Jun Wu
  - Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Harvard School of Public Health, Boston, MA 02215, USA
- Xiaoqi Zheng
  - Department of Mathematics, Shanghai Normal University, Shanghai 200234, China
|
32
|
Kim GS, Smith AK, Xue F, Michopoulos V, Lori A, Armstrong DL, Aiello AE, Koenen KC, Galea S, Wildman DE, Uddin M. Methylomic profiles reveal sex-specific differences in leukocyte composition associated with post-traumatic stress disorder. Brain Behav Immun 2019; 81:280-291. [PMID: 31228611 PMCID: PMC6754791 DOI: 10.1016/j.bbi.2019.06.025] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 02/01/2019] [Revised: 06/18/2019] [Accepted: 06/18/2019] [Indexed: 02/07/2023] Open
Abstract
Post-traumatic stress disorder (PTSD) is a debilitating mental disorder precipitated by trauma exposure. However, only some persons exposed to trauma develop PTSD. There are sex differences in risk; twice as many women as men develop a lifetime diagnosis of PTSD. Methylomic profiles derived from peripheral blood are well-suited for investigating PTSD because DNA methylation (DNAm) encodes individual response to trauma and may play a key role in the immune dysregulation characteristic of PTSD pathophysiology. In the current study, we leveraged recent methodological advances to investigate sex-specific differences in DNAm-based leukocyte composition that are associated with lifetime PTSD. We estimated leukocyte composition on a combined methylation array dataset (483 participants, ~450k CpG sites) consisting of two civilian cohorts, the Detroit Neighborhood Health Study and Grady Trauma Project. Sex-stratified Mann-Whitney U tests and two-way ANCOVA revealed that lifetime PTSD was associated with significantly higher monocyte proportions in males, but not in females (Holm-adjusted p-value < 0.05). No difference in monocyte proportions was observed between current and remitted PTSD cases in males, suggesting that this sex-specific difference may reflect a long-standing trait of lifetime history of PTSD, rather than current state of PTSD. Associations with lifetime PTSD or PTSD status were not observed in any other leukocyte subtype, and our finding in monocytes was confirmed using cell estimates based on a different deconvolution algorithm, suggesting that our sex-specific findings are robust across cell estimation approaches. Overall, our main finding of elevated monocyte proportions in males, but not in females, with a lifetime history of PTSD provides evidence for a sex-specific difference in peripheral blood leukocyte composition that is detectable in methylomic profiles and that may reflect long-standing changes associated with PTSD diagnosis.
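The Holm adjustment referred to in this abstract is a step-down correction for testing several leukocyte subtypes at once. The sketch below implements the standard Holm procedure in plain Python; the p-values are made up for illustration and are not taken from the study, and `holm_adjust` is a hypothetical helper name.

```python
def holm_adjust(pvalues):
    """Return Holm step-down adjusted p-values in the original order."""
    m = len(pvalues)
    # Visit the tests from the smallest to the largest raw p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * pvalues[idx])
        # Enforce monotonicity: an adjusted value never falls below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values, one per leukocyte subtype tested.
raw_pvalues = [0.004, 0.30, 0.04, 0.20, 0.65]
adjusted_pvalues = holm_adjust(raw_pvalues)

# Subtypes whose adjusted p-value stays below the 0.05 threshold.
significant = [i for i, p in enumerate(adjusted_pvalues) if p < 0.05]

print("adjusted p-values:", adjusted_pvalues)
print("indices significant at 0.05:", significant)
```

In this toy input, a raw p-value of 0.04 no longer passes the 0.05 threshold after adjustment, which is the behavior that protects against false positives when several cell types are tested.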
Affiliation(s)
- Grace S Kim
  - Neuroscience Program, University of Illinois at Urbana-Champaign, Urbana, IL, USA
  - Medical Scholars Program, University of Illinois College of Medicine, Urbana, IL, USA
- Alicia K Smith
  - Department of Psychiatry & Behavioral Sciences, Emory University, Atlanta, GA, USA
  - Department of Gynecology and Obstetrics, Emory University, Atlanta, GA, USA
- Fei Xue
  - Department of Statistics, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Vasiliki Michopoulos
  - Department of Psychiatry & Behavioral Sciences, Emory University, Atlanta, GA, USA
- Adriana Lori
  - Department of Psychiatry & Behavioral Sciences, Emory University, Atlanta, GA, USA
- Don L Armstrong
  - Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Allison E Aiello
  - Gillings School of Global Public Health, University of North Carolina - Chapel Hill, Chapel Hill, NC, USA
- Karestan C Koenen
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Sandro Galea
  - Department of Epidemiology, Boston University School of Public Health, Boston, MA, USA
- Derek E Wildman
  - Genomics Program, College of Public Health, University of South Florida, Tampa, FL, USA
- Monica Uddin
  - Genomics Program, College of Public Health, University of South Florida, Tampa, FL, USA
|
33
|
Li Z, Wu H. TOAST: improving reference-free cell composition estimation by cross-cell type differential analysis. Genome Biol 2019; 20:190. [PMID: 31484546 PMCID: PMC6727351 DOI: 10.1186/s13059-019-1778-0] [Citation(s) in RCA: 51] [Impact Index Per Article: 8.5] [Received: 04/24/2019] [Accepted: 07/30/2019] [Indexed: 02/07/2023] Open
Abstract
In the analysis of high-throughput data from complex samples, cell composition is an important factor that needs to be accounted for. Except for a limited number of tissues with known pure cell type profiles, the majority of genomics and epigenetics data analyses rely on "reference-free deconvolution" methods to estimate cell composition. We develop a novel computational method to improve reference-free deconvolution, which iteratively searches for cell type-specific features and performs composition estimation. Simulation studies and applications to six real datasets, including both DNA methylation and gene expression data, demonstrate favorable performance of the proposed method. TOAST is available at https://bioconductor.org/packages/TOAST.
Affiliation(s)
- Ziyi Li
  - Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, 1518 Clifton Road NE, Atlanta, 30322, GA, USA
- Hao Wu
  - Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, 1518 Clifton Road NE, Atlanta, 30322, GA, USA
|