1
Wu Y, Xie L. AI-driven multi-omics integration for multi-scale predictive modeling of genotype-environment-phenotype relationships. Comput Struct Biotechnol J 2025; 27:265-277. [PMID: 39886532 PMCID: PMC11779603 DOI: 10.1016/j.csbj.2024.12.030] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/31/2024] [Revised: 12/22/2024] [Accepted: 12/26/2024] [Indexed: 02/01/2025] [Open Access]
Abstract
Despite the wealth of single-cell multi-omics data, it remains challenging to predict the consequences of novel genetic and chemical perturbations in the human body. It requires knowledge of molecular interactions at all biological levels, encompassing disease models and humans. Current machine learning methods primarily establish statistical correlations between genotypes and phenotypes but struggle to identify physiologically significant causal factors, limiting their predictive power. Key challenges in predictive modeling include scarcity of labeled data, generalization across different domains, and disentangling causation from correlation. In light of recent advances in multi-omics data integration, we propose a new artificial intelligence (AI)-powered biology-inspired multi-scale modeling framework to tackle these issues. This framework will integrate multi-omics data across biological levels, organism hierarchies, and species to predict genotype-environment-phenotype relationships under various conditions. AI models inspired by biology may identify novel molecular targets, biomarkers, pharmaceutical agents, and personalized medicines for presently unmet medical needs.
Affiliation(s)
- You Wu
  - Ph.D. Program in Computer Science, The Graduate Center, The City University of New York, New York, NY, USA
- Lei Xie
  - Ph.D. Program in Computer Science, The Graduate Center, The City University of New York, New York, NY, USA
  - Ph.D. Program in Biology and Biochemistry, The Graduate Center, The City University of New York, New York, NY, USA
  - Department of Computer Science, Hunter College, The City University of New York, New York, NY, USA
  - Helen & Robert Appel Alzheimer's Disease Research Institute, Feil Family Brain & Mind Research Institute, Weill Cornell Medicine, Cornell University, New York, NY, USA
2
Van Norden M, Mangione W, Falls Z, Samudrala R. Strategies for robust, accurate, and generalizable benchmarking of drug discovery platforms. bioRxiv 2024:2024.12.10.627863. [PMID: 39764006 PMCID: PMC11702551 DOI: 10.1101/2024.12.10.627863] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/15/2025]
Abstract
Benchmarking is an important step in the improvement, assessment, and comparison of the performance of drug discovery platforms and technologies. We revised the existing benchmarking protocols in our Computational Analysis of Novel Drug Opportunities (CANDO) multiscale therapeutic discovery platform to improve utility and performance. We optimized multiple parameters used in drug candidate prediction and assessment with these updated benchmarking protocols. CANDO ranked 7.4% of known drugs in the top 10 compounds for their respective diseases/indications based on drug-indication associations/mappings obtained from the Comparative Toxicogenomics Database (CTD) using these optimized parameters. This increased to 12.1% when drug-indication mappings were obtained from the Therapeutic Targets Database. Performance on an indication was weakly correlated (Spearman correlation coefficient >0.3) with indication size (number of drugs associated with an indication) and moderately correlated (correlation coefficient >0.5) with compound chemical similarity. There was also moderate correlation between our new and original benchmarking protocols when assessing performance per indication using each protocol. Benchmarking results were also dependent on the source of the drug-indication mapping used: a higher proportion of indication-associated drugs were recalled in the top 100 compounds when using the Therapeutic Targets Database (TTD), which only includes FDA-approved drug-indication associations (in contrast to the CTD, which includes associations drawn from the literature). We also created compbench, a publicly available head-to-head benchmarking protocol that allows consistent assessment and comparison of different drug discovery platforms. 
Using this protocol, we compared two pipelines for drug repurposing within CANDO; our primary pipeline outperformed another similarity-based pipeline still in development that clusters signatures based on their associated Gene Ontology terms. Our study sets a precedent for the complete, comprehensive, and comparable benchmarking of drug discovery platforms, resulting in more accurate drug candidate predictions.
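The headline metric in this abstract (the share of known drugs ranked in the top 10 compounds for their indication) can be illustrated with a minimal sketch. This is not the CANDO or compbench implementation: the function name `top_k_recall`, the dictionary layout, and the toy drug and indication labels are all assumptions made for illustration, and the actual protocol may rank compounds and handle held-out drugs differently.

```python
from typing import Dict, List, Set


def top_k_recall(
    rankings: Dict[str, List[str]],
    known: Dict[str, Set[str]],
    k: int = 10,
) -> float:
    """Fraction of known drug-indication pairs whose drug appears
    in the top-k ranked compounds for that indication.

    rankings: hypothetical mapping indication -> compounds, best first.
    known:    hypothetical mapping indication -> drugs associated with it.
    """
    hits = 0
    total = 0
    for indication, drugs in known.items():
        # Only the first k ranked compounds count as "recalled".
        top = set(rankings.get(indication, [])[:k])
        for drug in drugs:
            total += 1
            if drug in top:
                hits += 1
    return hits / total if total else 0.0


# Toy data (entirely made up) with two indications and three known pairs.
rankings = {
    "ind_A": ["d1", "d2", "d3", "d4"],
    "ind_B": ["d5", "d1", "d6", "d2"],
}
known = {"ind_A": {"d2", "d4"}, "ind_B": {"d6"}}

recall_at_2 = top_k_recall(rankings, known, k=2)
recall_at_3 = top_k_recall(rankings, known, k=3)
```

Swapping the `known` mapping for one built from a different source changes only the denominator and the hit set, which is one way to see why the reported percentage depends on whether the mappings come from the CTD or the TTD.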
Affiliation(s)
- Melissa Van Norden
  - Department of Biomedical Informatics, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, State University of New York, Buffalo, NY, USA
- William Mangione
  - Department of Biomedical Informatics, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, State University of New York, Buffalo, NY, USA
- Zackary Falls
  - Department of Biomedical Informatics, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, State University of New York, Buffalo, NY, USA
- Ram Samudrala
  - Department of Biomedical Informatics, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, State University of New York, Buffalo, NY, USA
3
Fan Z, Zhao H, Zhou J, Li D, Fan Y, Bi Y, Ji S. A versatile attention-based neural network for chemical perturbation analysis and its potential to aid surgical treatment: an experimental study. Int J Surg 2024; 110:7671-7686. [PMID: 39017949 PMCID: PMC11634177 DOI: 10.1097/js9.0000000000001781] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2024] [Accepted: 05/30/2024] [Indexed: 07/18/2024]
Abstract
Deep learning models have emerged as rapid, accurate, and effective approaches for clinical decisions. Through a combination of drug screening and deep learning models, drugs that may benefit patients before and after surgery can be discovered to reduce the risk of complications or speed recovery. However, most existing drug prediction methods have high data requirements and lack interpretability, which limits their role in adjuvant surgical treatment. To address these limitations, the authors propose the attention-based convolution transpositional interfusion network (ACTIN) for flexible and efficient drug discovery. ACTIN leverages graph convolution and the transformer mechanism, utilizing drug and transcriptome data to assess the impact of chemical pharmacophores containing certain elements on gene expression. Remarkably, with only 393 training instances, one-tenth of what other models use, ACTIN achieves state-of-the-art performance, demonstrating its effectiveness even with limited data. By incorporating chemical element embedding disparity and attention mechanism-based parameter analysis, it identifies possible pharmacophores containing certain elements that could interfere with specific cell lines, which is particularly valuable for screening useful pharmacophores for new drugs tailored to adjuvant surgical treatment. To validate its reliability, the authors conducted comprehensive examinations: using transcriptome data from the lung tissue of fatal COVID-19 patients as additional input for ACTIN, they generated novel lead chemicals that align with clinical evidence. In summary, ACTIN offers insights into the perturbation biases of elements within pharmacophores on gene expression, which holds the potential for guiding the development of new drugs that benefit surgical treatment.
Affiliation(s)
- Zheqi Fan
  - Department of Orthopaedics, The First Medical Centre, Chinese PLA General Hospital, Beijing
- Houming Zhao
  - Department of Urology, The Third Medical Center, Chinese PLA General Hospital, Beijing
- Jingcheng Zhou
  - Senior Department of Otolaryngology-Head and Neck Surgery, The Sixth Medical Center, Chinese PLA General Hospital, Beijing
- Dingchang Li
  - Department of General Surgery, The First Medical Centre, Chinese PLA General Hospital, Beijing
- Yunlong Fan
  - Department of Dermatology, The Seventh Medical Center, Chinese PLA General Hospital, Beijing
- Yiming Bi
  - Graduate School of PLA Medical College, Chinese PLA General Hospital, Beijing, People’s Republic of China
- Shuaifei Ji
  - Graduate School of PLA Medical College, Chinese PLA General Hospital, Beijing, People’s Republic of China
4
Tong X, Qu N, Kong X, Ni S, Zhou J, Wang K, Zhang L, Wen Y, Shi J, Zhang S, Li X, Zheng M. Deep representation learning of chemical-induced transcriptional profile for phenotype-based drug discovery. Nat Commun 2024; 15:5378. [PMID: 38918369 PMCID: PMC11199551 DOI: 10.1038/s41467-024-49620-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2023] [Accepted: 06/10/2024] [Indexed: 06/27/2024] [Open Access]
Abstract
Artificial intelligence transforms drug discovery, with phenotype-based approaches emerging as a promising alternative to target-based methods, overcoming limitations like lack of well-defined targets. While chemical-induced transcriptional profiles offer a comprehensive view of drug mechanisms, inherent noise often obscures the true signal, hindering their potential for meaningful insights. Here, we highlight the development of TranSiGen, a deep generative model employing self-supervised representation learning. TranSiGen analyzes basal cell gene expression and molecular structures to reconstruct chemical-induced transcriptional profiles with high accuracy. By capturing both cellular and compound information, TranSiGen-derived representations demonstrate efficacy in diverse downstream tasks like ligand-based virtual screening, drug response prediction, and phenotype-based drug repurposing. Notably, in vitro validation of TranSiGen's application in pancreatic cancer drug discovery highlights its potential for identifying effective compounds. We envisage that integrating TranSiGen into drug discovery and mechanism research holds significant promise for advancing biomedicine.
Affiliation(s)
- Xiaochu Tong
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Ning Qu
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Xiangtai Kong
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Shengkun Ni
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Jingyi Zhou
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
  - School of Physical Science and Technology, ShanghaiTech University, Shanghai, 201210, China
  - Lingang Laboratory, Shanghai, 200031, China
- Kun Wang
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
  - School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, 230026, China
- Lehan Zhang
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Yiming Wen
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
  - School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, 310024, China
- Jiangshan Shi
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Sulin Zhang
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Xutong Li
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Mingyue Zheng
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
  - School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, 310024, China
5
Han J, Kang MJ, Lee S. DRSPRING: Graph convolutional network (GCN)-Based drug synergy prediction utilizing drug-induced gene expression profile. Comput Biol Med 2024; 174:108436. [PMID: 38643597 DOI: 10.1016/j.compbiomed.2024.108436] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2024] [Revised: 04/01/2024] [Accepted: 04/07/2024] [Indexed: 04/23/2024]
Abstract
Great efforts have been made over the years to identify novel drug pairs with synergistic effects. Although numerous computational approaches have been proposed to analyze diverse types of biological big data, pharmacogenomic profiles, presumably the most direct proxy of drug effects, have rarely been used due to the data sparsity problem. In this study, we developed a composite deep-learning-based model that predicts the drug synergy effect utilizing pharmacogenomic profiles as well as molecular properties. A graph convolutional network (GCN) was used to represent and integrate the chemical structure, genetic interactions, drug-target information, and gene expression profiles of cell lines. The insufficient amount of pharmacogenomic data, i.e., drug-induced expression profiles from the LINCS project, was addressed by augmenting the data with predicted profiles. Our method learned and predicted the Loewe synergy score in the DrugComb database and achieved better or comparable performance compared to other published methods in a benchmark test. We also investigated the contribution of various input features, which highlighted the value of basal gene expression and pharmacogenomic profiles of each cell line. Importantly, DRSPRING (DRug Synergy PRediction by INtegrated GCN) can be applied to any drug pair and any cell line, greatly expanding its applicability compared to previous methods.
Affiliation(s)
- Jiyeon Han
  - Department of Bio-Information Science, Ewha Womans University, Seoul, 03760, Republic of Korea
- Min Ji Kang
  - Department of Life Sciences, Ewha Womans University, Seoul, 03760, Republic of Korea
- Sanghyuk Lee
  - Department of Bio-Information Science, Ewha Womans University, Seoul, 03760, Republic of Korea
  - Department of Life Sciences, Ewha Womans University, Seoul, 03760, Republic of Korea
6
Wu Y, Liu Q, Xie L. Hierarchical multi-omics data integration and modeling predict cell-specific chemical proteomics and drug responses. Cell Rep Methods 2023; 3:100452. [PMID: 37159671 PMCID: PMC10163019 DOI: 10.1016/j.crmeth.2023.100452] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2022] [Revised: 12/28/2022] [Accepted: 03/22/2023] [Indexed: 05/11/2023]
Abstract
Drug-induced phenotypes result from biomolecular interactions across various levels of a biological system. Characterization of pharmacological actions therefore requires integration of multi-omics data. Proteomics profiles, which may more directly reflect disease mechanisms and biomarkers than transcriptomics, have not been widely exploited due to data scarcity and frequent missing values. A computational method for inferring drug-induced proteome patterns would therefore enable progress in systems pharmacology. To predict the proteome profiles and corresponding phenotypes of an uncharacterized cell or tissue type that has been disturbed by an uncharacterized chemical, we developed an end-to-end deep learning framework: TransPro. TransPro hierarchically integrated multi-omics data, in line with the central dogma of molecular biology. Our in-depth assessments of TransPro's predictions of anti-cancer drug sensitivity and drug adverse reactions reveal that TransPro's accuracy is on par with that of experimental data. Hence, TransPro may facilitate the imputation of proteomics data and compound screening in systems pharmacology.
Affiliation(s)
- You Wu
  - The Graduate Center, City University of New York, New York, NY 10016, USA
- Qiao Liu
  - The Graduate Center, City University of New York, New York, NY 10016, USA
- Lei Xie
  - The Graduate Center, City University of New York, New York, NY 10016, USA
  - Hunter College, City University of New York, New York, NY 10065, USA
  - Weill Cornell Medicine, Cornell University, New York, NY 10021, USA
7
Pruteanu LL, Bender A. Using Transcriptomics and Cell Morphology Data in Drug Discovery: The Long Road to Practice. ACS Med Chem Lett 2023; 14:386-395. [PMID: 37077392 PMCID: PMC10107910 DOI: 10.1021/acsmedchemlett.3c00015] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 01/17/2023] [Accepted: 03/10/2023] [Indexed: 04/21/2023] [Open Access]
Abstract
Gene expression and cell morphology data are high-dimensional biological readouts of much recent interest for drug discovery. They are able to describe biological systems in different states (e.g., healthy and diseased), as well as biological systems before and after compound treatment, and they are hence useful for matching both spaces (e.g., for drug repurposing) as well as for characterizing compounds with respect to efficacy and safety endpoints. This Microperspective describes recent advances in this direction with a focus on applied drug discovery and drug repurposing, as well as outlining what else is needed to advance further, with a particular focus on better understanding the applicability domain of readouts and their relevance for decision making, which is currently often still unclear.
Affiliation(s)
- Lavinia-Lorena Pruteanu
  - Department of Chemistry and Biology, North University Center at Baia Mare, Technical University of Cluj-Napoca, Victoriei 76, 430122 Baia Mare, Romania
  - Research Center for Functional Genomics, Biomedicine, and Translational Medicine, “Iuliu Haţieganu” University of Medicine and Pharmacy, 400337 Cluj-Napoca, Romania
- Andreas Bender
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom