1. Tejada-Lapuerta A, Bertin P, Bauer S, Aliee H, Bengio Y, Theis FJ. Causal machine learning for single-cell genomics. Nat Genet 2025; 57:797-808. [PMID: 40164735 DOI: 10.1038/s41588-025-02124-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2023] [Accepted: 02/10/2025] [Indexed: 04/02/2025]
Abstract
Advances in single-cell '-omics' allow unprecedented insights into the transcriptional profiles of individual cells and, when combined with large-scale perturbation screens, enable measuring of the effect of targeted perturbations on the whole transcriptome. These advances provide an opportunity to better understand the causative role of genes in complex biological processes. In this Perspective, we delineate the application of causal machine learning to single-cell genomics and its associated challenges. We first present the causal model that is most commonly applied to single-cell biology and then identify and discuss potential approaches to three open problems: the lack of generalization of models to novel experimental conditions, the complexity of interpreting learned models, and the difficulty of learning cell dynamics.
Affiliation(s)
- Alejandro Tejada-Lapuerta
- Institute of Computational Biology, Helmholtz Munich, Munich, Germany
- School of Computing, Information and Technology, Technical University of Munich, Munich, Germany
- Paul Bertin
- Mila, the Quebec AI Institute, Montreal, Quebec, Canada
- Université de Montréal, Montreal, Quebec, Canada
- Stefan Bauer
- School of Computing, Information and Technology, Technical University of Munich, Munich, Germany
- Helmholtz Munich, Munich, Germany
- Munich Center for Machine Learning (MCML), Munich, Germany
- Yoshua Bengio
- Mila, the Quebec AI Institute, Montreal, Quebec, Canada
- Université de Montréal, Montreal, Quebec, Canada
- Learning in Machines and Brains Program, Canadian Institute for Advanced Research, Toronto, Ontario, Canada
- Fabian J Theis
- Institute of Computational Biology, Helmholtz Munich, Munich, Germany
- School of Computing, Information and Technology, Technical University of Munich, Munich, Germany
- TUM School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany
2. Dibaeinia P, Ojha A, Sinha S. Interpretable AI for inference of causal molecular relationships from omics data. Science Advances 2025; 11:eadk0837. [PMID: 39951525 PMCID: PMC11827637 DOI: 10.1126/sciadv.adk0837] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2023] [Accepted: 01/14/2025] [Indexed: 02/16/2025]
Abstract
The discovery of molecular relationships from high-dimensional data is a major open problem in bioinformatics. Machine learning and feature attribution models have shown great promise in this context but lack causal interpretation. Here, we show that a popular feature attribution model, under certain assumptions, estimates an average of a causal quantity reflecting the direct influence of one variable on another. We leverage this insight to propose a precise definition of a gene regulatory relationship and implement a new tool, CIMLA (Counterfactual Inference by Machine Learning and Attribution Models), to identify differences in gene regulatory networks between biological conditions, a problem that has received great attention in recent years. Using extensive benchmarking on simulated data, we show that CIMLA is more robust to confounding variables and is more accurate than leading methods. Last, we use CIMLA to analyze a previously published single-cell RNA sequencing dataset from subjects with and without Alzheimer's disease (AD), discovering several potential regulators of AD.
Affiliation(s)
- Payam Dibaeinia
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Abhishek Ojha
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Saurabh Sinha
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA
- H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA
3. Tian L, Wan E, Celine Chui SL, Li S, Chan E, Luo H, Wong ICK, Zhang Q. Deciphering the molecular mechanism of post-acute sequelae of COVID-19 through comorbidity network analysis. Chaos (Woodbury, N.Y.) 2025; 35:021102. [PMID: 39977305 DOI: 10.1063/5.0250923] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2024] [Accepted: 01/11/2025] [Indexed: 02/22/2025]
Abstract
The post-acute sequelae of COVID-19 (PASC) poses a significant health challenge in the post-pandemic world. However, the underlying biological mechanisms of PASC remain intricate and elusive. Network-based methods can leverage electronic health record data and biological knowledge to investigate the impact of COVID-19 on PASC and uncover the underlying biological mechanisms. This study analyzed territory-wide longitudinal electronic health records (from January 1, 2020 to August 31, 2022) of 50 296 COVID-19 patients and a healthy non-exposed group of 100 592 individuals to determine the impact of COVID-19 on disease progression, provide molecular insights, and identify associated biomarkers. We constructed a comorbidity network and performed disease-protein mapping and protein-protein interaction network analysis to reveal the impact of COVID-19 on disease trajectories. Results showed disparities in prevalent disease comorbidity patterns, with certain patterns exhibiting a more pronounced influence by COVID-19. Overlapping proteins elucidate the biological mechanisms of COVID-19's impact on each comorbidity pattern, and essential proteins can be identified based on their weights. Our findings can help clarify the biological mechanisms of COVID-19, discover intervention methods, and decode the molecular basis of comorbidity associations, while also yielding potential biomarkers and corresponding treatments for specific disease progression patterns.
Affiliation(s)
- Lue Tian
- School of Data Science, City University of Hong Kong, Hong Kong, China
- Eric Wan
- Department of Pharmacology and Pharmacy, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Department of Family Medicine and Primary Care, School of Clinical Medicine, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Laboratory of Data Discovery for Health, Hong Kong, China
- Sze Ling Celine Chui
- Laboratory of Data Discovery for Health, Hong Kong, China
- School of Nursing, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- School of Public Health, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Shirely Li
- Department of Pharmacology and Pharmacy, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Laboratory of Data Discovery for Health, Hong Kong, China
- Department of Medicine, School of Clinical Medicine, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Esther Chan
- Department of Pharmacology and Pharmacy, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Laboratory of Data Discovery for Health, Hong Kong, China
- Hao Luo
- Department of Social Work and Social Administration, The University of Hong Kong, Hong Kong, China
- School of Public Health Sciences, The University of Waterloo, Waterloo, Ontario N2L3G1, Canada
- Ian C K Wong
- School of Data Science, City University of Hong Kong, Hong Kong, China
- Department of Pharmacology and Pharmacy, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Laboratory of Data Discovery for Health, Hong Kong, China
- School of Pharmacy, Aston University, Birmingham B4 7ET, United Kingdom
- Qingpeng Zhang
- School of Data Science, City University of Hong Kong, Hong Kong, China
- Department of Pharmacology and Pharmacy, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Laboratory of Data Discovery for Health, Hong Kong, China
4. Cingiz MÖ. k-Strong Inference Algorithm: A Hybrid Information Theory Based Gene Network Inference Algorithm. Mol Biotechnol 2024; 66:3213-3225. [PMID: 37950851 DOI: 10.1007/s12033-023-00929-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2023] [Accepted: 10/05/2023] [Indexed: 11/13/2023]
Abstract
Gene networks allow researchers to understand the underlying mechanisms between diseases and genes while reducing the need for wet lab experiments. Numerous gene network inference (GNI) algorithms have been presented in the literature to infer accurate gene networks. We proposed a hybrid GNI algorithm, k-Strong Inference Algorithm (ksia), to infer more reliable and robust gene networks from omics datasets. To increase reliability, ksia integrates Pearson correlation coefficient (PCC) and Spearman rank correlation coefficient (SCC) scores to determine mutual information scores between molecules, which increases the diversity of relation predictions. To infer a more robust gene network, ksia applies three different elimination steps to remove redundant and spurious relations between genes. The performance of ksia was evaluated on a microbe microarray database in an overlap analysis with other GNI algorithms, namely ARACNE, C3NET, CLR, and MRNET. Ksia inferred fewer relations because of its strict elimination steps. However, ksia generally performed better on Escherichia coli (E. coli) and Saccharomyces cerevisiae (yeast) gene expression datasets in terms of F-measure and precision. The integration of association estimator scores and three elimination stages slightly increases the performance of ksia-based gene networks. The ksia R package and its user manual are available at https://github.com/ozgurcingiz/ksia.
Affiliation(s)
- Mustafa Özgür Cingiz
- Computer Engineering Department, Faculty of Engineering and Natural Sciences, Bursa Technical University, Mimar Sinan Campus, Yildirim, 16310, Bursa, Turkey
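The ksia abstract above says that Pearson (PCC) and Spearman (SCC) scores are integrated to determine mutual information scores, but it does not say how. The following is a minimal sketch of one plausible combination, not the published ksia procedure: each correlation is converted to a mutual information estimate with the Gaussian formula MI = -0.5 ln(1 - r^2), and the larger of the two is kept. The function names `gaussian_mi` and `hybrid_score`, the Gaussian conversion, and the max rule are all assumptions made for illustration.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient of two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])

def rank(v):
    """Ordinal ranks (0..n-1). Ties are not averaged, which assumes
    continuous expression values without exact duplicates."""
    order = np.argsort(v)
    r = np.empty(len(v), dtype=float)
    r[order] = np.arange(len(v))
    return r

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))

def gaussian_mi(r):
    """Mutual information (in nats) implied by correlation r under a
    bivariate Gaussian assumption. r^2 is clipped to keep the log finite."""
    r2 = min(r * r, 1.0 - 1e-12)
    return -0.5 * np.log(1.0 - r2)

def hybrid_score(x, y):
    """Hypothetical hybrid association score: the larger of the
    PCC-based and SCC-based mutual information estimates."""
    return max(gaussian_mi(pearson(x, y)), gaussian_mi(spearman(x, y)))

# A monotone but nonlinear relation: SCC captures it fully, PCC only partly.
x = np.arange(1, 21, dtype=float)
y = x ** 3
print(pearson(x, y), spearman(x, y), hybrid_score(x, y))
```

Taking the maximum means a gene pair with a monotone but nonlinear relationship is not penalized for a weaker linear correlation. The elimination steps that the abstract describes are not sketched here, because the abstract gives no detail about them.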
5. Maizels RJ. A dynamical perspective: moving towards mechanism in single-cell transcriptomics. Philos Trans R Soc Lond B Biol Sci 2024; 379:20230049. [PMID: 38432314 PMCID: PMC10909508 DOI: 10.1098/rstb.2023.0049] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/10/2023] [Accepted: 10/31/2023] [Indexed: 03/05/2024] Open
Abstract
As the field of single-cell transcriptomics matures, research is shifting focus from phenomenological descriptions of cellular phenotypes to a mechanistic understanding of the gene regulation underneath. This perspective considers the value of capturing dynamical information at single-cell resolution for gaining mechanistic insight; reviews the available technologies for recording and inferring temporal information in single cells; and explores whether better dynamical resolution is sufficient to adequately capture the causal relationships driving complex biological systems. This article is part of a discussion meeting issue 'Causes and consequences of stochastic processes in development and disease'.
Affiliation(s)
- Rory J. Maizels
- The Francis Crick Institute, 1 Midland Road, London NW1 1AT, UK
- University College London, London WC1E 6BT, UK
6. Han Y, Zhou Q, Liu L, Li J, Zhou Y. DNI-MDCAP: improvement of causal MiRNA-disease association prediction based on deep network imputation. BMC Bioinformatics 2024; 25:22. [PMID: 38216907 PMCID: PMC10785389 DOI: 10.1186/s12859-024-05644-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2023] [Accepted: 01/08/2024] [Indexed: 01/14/2024] Open
Abstract
BACKGROUND: MiRNAs are involved in the occurrence and development of many diseases. Extensive literature studies have demonstrated that miRNA-disease associations are stratified and encompass ~20% causal associations. Computational models that predict causal miRNA-disease associations provide effective guidance in identifying novel interpretations of disease mechanisms and potential therapeutic targets. Although several predictive models for miRNA-disease associations exist, it is still challenging to discriminate causal miRNA-disease associations from non-causal ones. Hence, there is a pressing need to develop an efficient prediction model for causal miRNA-disease association prediction.
RESULTS: We developed DNI-MDCAP, an improved computational model that incorporated additional miRNA similarity metrics, deep graph embedding learning-based network imputation and semi-supervised learning framework. Through extensive predictive performance evaluation, including tenfold cross-validation and independent test, DNI-MDCAP showed excellent performance in identifying causal miRNA-disease associations, achieving an area under the receiver operating characteristic curve (AUROC) of 0.896 and 0.889, respectively. Regarding the challenge of discriminating causal miRNA-disease associations from non-causal ones, DNI-MDCAP exhibited superior predictive performance compared to existing models MDCAP and LE-MDCAP, reaching an AUROC of 0.870. Wilcoxon test also indicated significantly higher prediction scores for causal associations than for non-causal ones. Finally, the potential causal miRNA-disease associations predicted by DNI-MDCAP, exemplified by diabetic nephropathies and hsa-miR-193a, have been validated by recently published literature, further supporting the reliability of the prediction model.
CONCLUSIONS: DNI-MDCAP is a dedicated tool to specifically distinguish causal miRNA-disease associations with substantially improved accuracy. DNI-MDCAP is freely accessible at http://www.rnanut.net/DNIMDCAP/.
Affiliation(s)
- Yu Han
- Department of Biomedical Informatics, School of Basic Medical Sciences, Peking University, Beijing, China
- Qiong Zhou
- Department of Biomedical Informatics, School of Basic Medical Sciences, Peking University, Beijing, China
- Leibo Liu
- Department of Biomedical Informatics, School of Basic Medical Sciences, Peking University, Beijing, China
- Jianwei Li
- Institute of Computational Medicine, School of Artificial Intelligence, Hebei University of Technology, Tianjin, China
- Yuan Zhou
- Department of Biomedical Informatics, School of Basic Medical Sciences, Peking University, Beijing, China
- State Key Laboratory of Vascular Homeostasis and Remodeling, Peking University, Beijing, China
7. Feng K, Jiang H, Yin C, Sun H. Gene regulatory network inference based on causal discovery integrating with graph neural network. Quantitative Biology 2023; 11:434-450. [DOI: 10.1002/qub2.26] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/28/2023] [Accepted: 04/04/2023] [Indexed: 01/06/2025]
Abstract
Gene regulatory network (GRN) inference from gene expression data is a significant approach to understanding aspects of the biological system. Compared with generalized correlation-based methods, causality-inspired ones seem more rational to infer regulatory relationships. We propose GRINCD, a novel GRN inference framework empowered by graph representation learning and causal asymmetric learning, considering both linear and non-linear regulatory relationships. First, high-quality representation of each gene is generated using graph neural network. Then, we apply the additive noise model to predict the causal regulation of each regulator-target pair. Additionally, we design two channels and finally assemble them for robust prediction. Through comprehensive comparisons of our framework with state-of-the-art methods based on different principles on numerous datasets of diverse types and scales, the experimental results show that our framework achieves superior or comparable performance under various evaluation metrics. Our work provides a new clue for constructing GRNs, and our proposed framework GRINCD also shows potential in identifying key factors affecting cancer development.
Affiliation(s)
- Ke Feng
- School of Artificial Intelligence, Jilin University, Changchun, China
- Hongyang Jiang
- School of Artificial Intelligence, Jilin University, Changchun, China
- Chaoyi Yin
- School of Artificial Intelligence, Jilin University, Changchun, China
- Huiyan Sun
- School of Artificial Intelligence, Jilin University, Changchun, China
- International Center of Future Science, Jilin University, Changchun, China
- Engineering Research Center of Knowledge-Driven Human-Machine Intelligence, Ministry of Education, Changchun, China
8. Squires C, Uhler C. Causal Structure Learning: A Combinatorial Perspective. Foundations of Computational Mathematics (New York, N.Y.) 2022; 23:1-35. [PMID: 35935470 PMCID: PMC9342837 DOI: 10.1007/s10208-022-09581-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/28/2022] [Accepted: 06/08/2022] [Indexed: 05/29/2023]
Abstract
In this review, we discuss approaches for learning causal structure from data, also called causal discovery. In particular, we focus on approaches for learning directed acyclic graphs and various generalizations which allow for some variables to be unobserved in the available data. We devote special attention to two fundamental combinatorial aspects of causal structure learning. First, we discuss the structure of the search space over causal graphs. Second, we discuss the structure of equivalence classes over causal graphs, i.e., sets of graphs which represent what can be learned from observational data alone, and how these equivalence classes can be refined by adding interventional data.
Affiliation(s)
- Caroline Uhler
- Broad Institute and Massachusetts Institute of Technology, Cambridge, MA 02139 USA