1. Balachandran S, Prada-Medina CA, Mensah MA, Kakar N, Nagel I, Pozojevic J, Audain E, Hitz MP, Kircher M, Sreenivasan VKA, Spielmann M. STIGMA: Single-cell tissue-specific gene prioritization using machine learning. Am J Hum Genet 2024; 111:338-349. [PMID: 38228144 PMCID: PMC10870135 DOI: 10.1016/j.ajhg.2023.12.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2023] [Revised: 12/01/2023] [Accepted: 12/07/2023] [Indexed: 01/18/2024]
Abstract
Clinical exome and genome sequencing have revolutionized the understanding of human disease genetics. Yet many genes remain functionally uncharacterized, complicating the establishment of causal disease links for genetic variants. While several scoring methods have been devised to prioritize these candidate genes, these methods fall short of capturing the expression heterogeneity across cell subpopulations within tissues. Here, we introduce single-cell tissue-specific gene prioritization using machine learning (STIGMA), an approach that leverages single-cell RNA-seq (scRNA-seq) data to prioritize candidate genes associated with rare congenital diseases. STIGMA prioritizes genes by learning the temporal dynamics of gene expression across cell types during healthy organogenesis. To assess the efficacy of our framework, we applied STIGMA to mouse limb and human fetal heart scRNA-seq datasets. In a cohort of individuals with congenital limb malformation, STIGMA prioritized 469 variants in 345 genes, with UBA2 as a notable example. For congenital heart defects, we detected 34 genes harboring nonsynonymous de novo variants (nsDNVs) in two or more individuals from a set of 7,958 individuals, including the ortholog of Prdm1, which is associated with hypoplastic left ventricle and hypoplastic aortic arch. Overall, our findings demonstrate that STIGMA effectively prioritizes tissue-specific candidate genes by utilizing single-cell transcriptome data. The ability to capture the heterogeneity of gene expression across cell populations makes STIGMA a powerful tool for the discovery of disease-associated genes and facilitates the identification of causal variants underlying human genetic disorders.
Affiliation(s)
- Saranya Balachandran
  - Institute of Human Genetics, University Hospital Schleswig-Holstein, University of Lübeck and Kiel University, Lübeck, Germany
- Cesar A Prada-Medina
  - Human Molecular Genetics Group, Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany
- Martin A Mensah
  - Institut für Medizinische Genetik und Humangenetik, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Augustenburger Platz 1, 13353 Berlin, Germany
  - BIH Charité Digital Clinician Scientist Program, BIH Biomedical Innovation Academy, Anna-Louisa-Karsch-Strasse 2, 10178 Berlin, Germany
  - RG Development & Disease, Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany
- Naseebullah Kakar
  - Institute of Human Genetics, University Hospital Schleswig-Holstein, University of Lübeck and Kiel University, Lübeck, Germany
  - Department of Biotechnology, BUITEMS, Quetta, Pakistan
- Inga Nagel
  - Institute of Human Genetics, University Hospital Schleswig-Holstein, University of Lübeck and Kiel University, Lübeck, Germany
- Jelena Pozojevic
  - Institute of Human Genetics, University Hospital Schleswig-Holstein, University of Lübeck and Kiel University, Lübeck, Germany
- Enrique Audain
  - Institute of Medical Genetics, Carl von Ossietzky University, 26129 Oldenburg, Germany
  - DZHK e.V. (German Center for Cardiovascular Research), Partner Site Hamburg/Kiel/Lübeck
  - Department of Congenital Heart Disease and Pediatric Cardiology, University Hospital of Schleswig-Holstein, 24105 Kiel, Germany
- Marc-Phillip Hitz
  - Institute of Medical Genetics, Carl von Ossietzky University, 26129 Oldenburg, Germany
  - DZHK e.V. (German Center for Cardiovascular Research), Partner Site Hamburg/Kiel/Lübeck
  - Department of Congenital Heart Disease and Pediatric Cardiology, University Hospital of Schleswig-Holstein, 24105 Kiel, Germany
- Martin Kircher
  - Institute of Human Genetics, University Hospital Schleswig-Holstein, University of Lübeck and Kiel University, Lübeck, Germany
- Varun K A Sreenivasan
  - Institute of Human Genetics, University Hospital Schleswig-Holstein, University of Lübeck and Kiel University, Lübeck, Germany
- Malte Spielmann
  - Institute of Human Genetics, University Hospital Schleswig-Holstein, University of Lübeck and Kiel University, Lübeck, Germany
  - Human Molecular Genetics Group, Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany
  - DZHK e.V. (German Center for Cardiovascular Research), Partner Site Hamburg/Kiel/Lübeck
2. Zhong G, Choi YA, Shen Y. VBASS enables integration of single cell gene expression data in Bayesian association analysis of rare variants. Commun Biol 2023; 6:774. [PMID: 37491581 PMCID: PMC10368729 DOI: 10.1038/s42003-023-05155-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/28/2022] [Accepted: 07/18/2023] [Indexed: 07/27/2023]
Abstract
Rare or de novo variants contribute substantially to human diseases, but the statistical power to identify risk genes from rare variants is generally low due to the rarity of genotype data. Previous studies have shown that risk genes usually have high expression in relevant cell types, although for many conditions the identity of these cell types is largely unknown. Recent single-cell atlas efforts in humans and model organisms have produced large amounts of gene expression data. Here we present VBASS, a Bayesian method that integrates single-cell expression and de novo variant (DNV) data to improve the power of disease risk gene discovery. VBASS models the disease risk prior as a function of expression profiles, approximated by deep neural networks. It learns the weights of the neural networks and the parameters of Gamma-Poisson likelihood models of DNV counts jointly from expression and genetics data. On simulated data, VBASS shows proper error rate control and better power than state-of-the-art methods. We applied VBASS to published datasets and identified more candidate risk genes with support from the literature or from data from independent cohorts. VBASS can be generalized to integrate other types of functional genomics data in statistical genetics analysis.
Affiliation(s)
- Guojie Zhong
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
  - Integrated Program in Cellular, Molecular, and Biomedical Studies, Columbia University Irving Medical Center, New York, NY, USA
- Yoolim A Choi
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, USA
- Yufeng Shen
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, USA
  - JP Sulzberger Columbia Genome Center, Columbia University Irving Medical Center, New York, NY, USA
3. Xie Y, Li M, Dong W, Jiang W, Zhao H. M-DATA: A statistical approach to jointly analyzing de novo mutations for multiple traits. PLoS Genet 2021; 17:e1009849. [PMID: 34735430 PMCID: PMC8568192 DOI: 10.1371/journal.pgen.1009849] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/16/2021] [Accepted: 09/29/2021] [Indexed: 11/22/2022]
Abstract
Recent studies have demonstrated that multiple early-onset diseases have shared risk genes, based on findings from de novo mutations (DNMs). Therefore, we may leverage information from one trait to improve statistical power to identify genes for another trait. However, there are few methods that can jointly analyze DNMs from multiple traits. In this study, we develop a framework called M-DATA (Multi-trait framework for De novo mutation Association Test with Annotations) to increase the statistical power of association analysis by integrating data from multiple correlated traits and their functional annotations. Using the number of DNMs from multiple diseases, we develop a method based on an Expectation-Maximization algorithm both to infer the degree of association between two diseases and to estimate the gene association probability for each disease. We apply our method to a case study of jointly analyzing data from congenital heart disease (CHD) and autism. Our method was able to identify 23 genes for CHD from joint analysis, including 12 novel genes, which is substantially more than single-trait analysis identified, leading to novel insights into CHD etiology.
Affiliation(s)
- Yuhan Xie
  - Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, United States of America
- Mo Li
  - Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, United States of America
- Weilai Dong
  - Department of Genetics, Yale School of Medicine, New Haven, Connecticut, United States of America
- Wei Jiang
  - Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, United States of America
- Hongyu Zhao
  - Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, United States of America
  - Department of Genetics, Yale School of Medicine, New Haven, Connecticut, United States of America
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America