1
Dai W, Chen G, Peng W, Chen C, Fu X, Liu L, Liu L, Yu N. Domain alignment method based on masked variational autoencoder for predicting patient anticancer drug response. Methods 2025; 238:61-73. [PMID: 40090506 DOI: 10.1016/j.ymeth.2025.03.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2024] [Revised: 02/03/2025] [Accepted: 03/14/2025] [Indexed: 03/18/2025] Open
Abstract
Predicting a patient's response to anticancer drugs is essential for personalized treatment planning. However, because cell line data and patient data differ significantly in distribution, models that train well on cell line data may perform poorly when predicting patient anticancer drug responses. Some existing methods use transfer learning to align domain features between cell lines and patients and leverage knowledge from cell lines to predict patient anticancer drug responses. This study proposes MVAEDA, a domain alignment method based on masked variational autoencoders, to predict patient anticancer drug responses. The model constructs multiple variational autoencoders (VAEs) and mask predictors to extract domain-specific and domain-invariant features of cell lines and patients. It then masks and reconstructs the gene expression matrix, using generative adversarial training to learn domain-invariant features from the cell line and patient domains. These domain-invariant features are then used to train a classifier. Finally, the trained model predicts the anticancer drug response in the target domain. Our model is evaluated experimentally on clinical and preclinical datasets. The results show that our method performs better than other state-of-the-art methods.
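The masking-and-reconstruction step mentioned in the abstract can be illustrated with a minimal sketch. It assumes element-wise random masking with zeros and a mean-squared-error objective restricted to the masked entries; the abstract specifies neither choice, and the names `mask_expression` and `masked_mse` are hypothetical. The VAEs, mask predictors, and adversarial training are omitted.

```python
import random


def mask_expression(matrix, mask_rate=0.2, seed=0):
    """Zero out a random subset of entries in a gene expression matrix.

    Assumption: masking is element-wise and uniform at random.
    Returns the masked matrix and a boolean mask (True = entry was hidden).
    """
    rng = random.Random(seed)
    mask = [[rng.random() < mask_rate for _ in row] for row in matrix]
    masked = [
        [0.0 if hidden else value for value, hidden in zip(row, mask_row)]
        for row, mask_row in zip(matrix, mask)
    ]
    return masked, mask


def masked_mse(original, reconstructed, mask):
    """Mean squared error computed only over the masked positions."""
    errors = [
        (o - r) ** 2
        for o_row, r_row, m_row in zip(original, reconstructed, mask)
        for o, r, hidden in zip(o_row, r_row, m_row)
        if hidden
    ]
    return sum(errors) / len(errors) if errors else 0.0


# Toy example: 2 samples x 3 genes (values are illustrative only).
expr = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
masked_expr, hidden = mask_expression(expr, mask_rate=0.5, seed=1)
loss = masked_mse(expr, masked_expr, hidden)
```

In a full model, `reconstructed` would be the decoder output for the masked input rather than the masked input itself; the loss function stays the same.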
Affiliation(s)
- Wei Dai
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, China; Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming 650050, China
- Gong Chen
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, China
- Wei Peng
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, China; Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming 650050, China
- Chuyue Chen
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, China
- Xiaodong Fu
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, China; Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming 650050, China
- Li Liu
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, China; Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming 650050, China
- Lijun Liu
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, China; Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming 650050, China
- Ning Yu
  - State University of New York, The College at Brockport, Department of Computing Sciences, 350 New Campus Drive, Brockport, NY 14422, United States
2
Shi H, Xu T, Li X, Gao Q, Xiong Z, Xia J, Yue Z. DRExplainer: Quantifiable interpretability in drug response prediction with directed graph convolutional network. Artif Intell Med 2025; 163:103101. [PMID: 40056540 DOI: 10.1016/j.artmed.2025.103101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2024] [Revised: 01/08/2025] [Accepted: 02/23/2025] [Indexed: 03/10/2025]
Abstract
Predicting the response of a cancer cell line to a therapeutic drug is pivotal for personalized medicine. Although numerous deep learning methods have been developed for drug response prediction, integrating diverse information about biological entities and predicting the directional response remain major challenges. Here, we propose a novel interpretable predictive model, DRExplainer, which leverages a directed graph convolutional network to enhance prediction in a directed bipartite network framework. DRExplainer constructs a directed bipartite network integrating multi-omics profiles of cell lines, the chemical structure of drugs, and known drug responses to achieve directed prediction. Then, DRExplainer identifies the most relevant subgraph for each prediction in this directed bipartite network by learning a mask, facilitating critical medical decision-making. Additionally, we introduce a quantifiable method for model interpretability that leverages a ground truth benchmark dataset curated from biological features. In computational experiments, DRExplainer outperforms state-of-the-art predictive methods and another graph-based explanation method under the same experimental setting. Finally, case studies further validate the interpretability and effectiveness of DRExplainer in predicting novel drug responses. Our code is available at: https://github.com/vshy-dream/DRExplainer.
Affiliation(s)
- Haoyuan Shi
  - University of Science and Technology of China, Hefei, 230026, Anhui, China; School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, 230036, Anhui, China
- Tao Xu
  - School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, 230036, Anhui, China
- Xiaodi Li
  - School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, 230036, Anhui, China
- Qian Gao
  - School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, 230036, Anhui, China
- Zhiwei Xiong
  - University of Science and Technology of China, Hefei, 230026, Anhui, China
- Junfeng Xia
  - Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230036, Anhui, China
- Zhenyu Yue
  - School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, 230036, Anhui, China
3
Peng W, Ma Y, Li C, Dai W, Fu X, Liu L, Liu L, Liu J. Fusion of brain imaging genetic data for Alzheimer's disease diagnosis and causal factors identification using multi-stream attention mechanisms and graph convolutional networks. Neural Netw 2025; 184:107020. [PMID: 39721106 DOI: 10.1016/j.neunet.2024.107020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2024] [Revised: 11/03/2024] [Accepted: 12/03/2024] [Indexed: 12/28/2024]
Abstract
Correctly diagnosing Alzheimer's disease (AD) and identifying pathogenic brain regions and genes play a vital role in understanding AD and developing effective prevention and treatment strategies. Recent works combine imaging and genetic data and leverage the strengths of both modalities to achieve better classification results. In this work, we propose MCA-GCN, a Multi-stream Cross-Attention and Graph Convolutional Network-based classification method for AD patients. It first constructs a brain region-gene association network based on brain region fMRI time series and gene SNP data. It then integrates the absolute and relative positions of the brain region time series to obtain a new brain region time series containing temporal information. Next, long-range and local association features between brain regions and genes are sequentially aggregated by multi-stream cross-attention and graph convolutional networks. Finally, the learned brain region and gene features are input to a fully connected network to predict AD types. Experimental results on the ADNI dataset show that our model outperforms other methods in AD classification tasks. Moreover, MCA-GCN includes a multi-stage feature scoring process to extract high-risk genes and brain regions related to disease classification.
Affiliation(s)
- Wei Peng
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology; Kunming 650500, PR China; Computer Technology Application Key Lab of Yunnan Province; Kunming 650500, PR China
- Yanhan Ma
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology; Kunming 650500, PR China; Computer Technology Application Key Lab of Yunnan Province; Kunming 650500, PR China
- Chunshan Li
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology; Kunming 650500, PR China; Computer Technology Application Key Lab of Yunnan Province; Kunming 650500, PR China
- Wei Dai
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology; Kunming 650500, PR China; Computer Technology Application Key Lab of Yunnan Province; Kunming 650500, PR China
- Xiaodong Fu
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology; Kunming 650500, PR China; Computer Technology Application Key Lab of Yunnan Province; Kunming 650500, PR China
- Li Liu
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology; Kunming 650500, PR China; Computer Technology Application Key Lab of Yunnan Province; Kunming 650500, PR China
- Lijun Liu
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology; Kunming 650500, PR China; Computer Technology Application Key Lab of Yunnan Province; Kunming 650500, PR China
- Jin Liu
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, PR China
4
Jiang W, Ye W, Tan X, Bao YJ. Network-based multi-omics integrative analysis methods in drug discovery: a systematic review. BioData Min 2025; 18:27. [PMID: 40155979 PMCID: PMC11954193 DOI: 10.1186/s13040-025-00442-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2024] [Accepted: 03/17/2025] [Indexed: 04/01/2025] Open
Abstract
The integration of multi-omics data from diverse high-throughput technologies has revolutionized drug discovery. While various network-based methods have been developed to integrate multi-omics data, systematic evaluation and comparison of these methods remain challenging. This review aims to analyze network-based approaches for multi-omics integration and evaluate their applications in drug discovery. We conducted a comprehensive review of the literature (2015-2024) on network-based multi-omics integration methods in drug discovery and categorized the methods into four primary types: network propagation/diffusion, similarity-based approaches, graph neural networks, and network inference models. We also discussed the applications of the methods in three scenarios of drug discovery (drug target identification, drug response prediction, and drug repurposing) and finally evaluated the performance of the methods by highlighting their advantages and limitations in specific applications. While network-based multi-omics integration has shown promise in drug discovery, challenges remain in computational scalability, data integration, and biological interpretation. Future developments should focus on incorporating temporal and spatial dynamics, improving model interpretability, and establishing standardized evaluation frameworks.
Affiliation(s)
- Wei Jiang
  - School of Life Sciences, Hubei University, Wuhan, China
- Weicai Ye
  - School of Computer Science and Engineering, Guangdong Province Key Laboratory of Computational Science, National Engineering Laboratory for Big Data Analysis and Application, Sun Yat-sen University, Guangzhou, China
- Xiaoming Tan
  - School of Life Sciences, Hubei University, Wuhan, China
- Yun-Juan Bao
  - School of Life Sciences, Hubei University, Wuhan, China
  - No.368 Youyi Avenue, Wuhan, 430062, China
5
Luo H, Yang H, Zhang G, Wang J, Luo J, Yan C. KGRDR: a deep learning model based on knowledge graph and graph regularized integration for drug repositioning. Front Pharmacol 2025; 16:1525029. [PMID: 40008124 PMCID: PMC11850324 DOI: 10.3389/fphar.2025.1525029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2024] [Accepted: 01/13/2025] [Indexed: 02/27/2025] Open
Abstract
Computational drug repositioning, serving as an effective alternative to traditional drug discovery, plays a key role in optimizing drug development. This approach can accelerate the development of new therapeutic options while reducing costs and mitigating risks. In this study, we propose a novel deep learning-based framework, KGRDR, which combines multi-similarity integration and knowledge graph learning to predict potential drug-disease interactions. Specifically, a graph regularized approach is applied to integrate multiple sources of drug and disease similarity information, which can effectively eliminate noisy data and obtain integrated similarity features of drugs and diseases. Then, topological feature representations of drugs and diseases are learned from constructed biomedical knowledge graphs (KGs), which encompass known drug-related and disease-related interactions. Next, the similarity features and topological features are fused using an attention-based feature fusion method. Finally, drug-disease associations are predicted using a graph convolutional network. Experimental results demonstrate that KGRDR achieves better performance than state-of-the-art drug-disease prediction methods. Moreover, case study results further validate the effectiveness of KGRDR in predicting novel drug-disease interactions.
Affiliation(s)
- Huimin Luo
  - School of Computer and Information Engineering, Henan University, Kaifeng, China
  - Henan Key Laboratory of Big Data Analysis and Processing, Henan University, Kaifeng, China
- Hui Yang
  - School of Computer and Information Engineering, Henan University, Kaifeng, China
  - Henan Key Laboratory of Big Data Analysis and Processing, Henan University, Kaifeng, China
- Ge Zhang
  - School of Computer and Information Engineering, Henan University, Kaifeng, China
  - Henan Key Laboratory of Big Data Analysis and Processing, Henan University, Kaifeng, China
- Jianlin Wang
  - School of Computer and Information Engineering, Henan University, Kaifeng, China
  - Henan Key Laboratory of Big Data Analysis and Processing, Henan University, Kaifeng, China
- Junwei Luo
  - College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, China
- Chaokun Yan
  - School of Computer and Information Engineering, Henan University, Kaifeng, China
  - Henan Key Laboratory of Big Data Analysis and Processing, Henan University, Kaifeng, China
  - Academy for Advanced Interdisciplinary Studies, Henan University, Zhengzhou, China
6
Gao Q, Xu T, Li X, Gao W, Shi H, Zhang Y, Chen J, Yue Z. Interpretable Dynamic Directed Graph Convolutional Network for Multi-Relational Prediction of Missense Mutation and Drug Response. IEEE J Biomed Health Inform 2025; 29:1514-1524. [PMID: 39423073 DOI: 10.1109/jbhi.2024.3483316] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2024]
Abstract
Tumor heterogeneity presents a significant challenge in predicting drug responses, especially as missense mutations within the same gene can lead to varied outcomes such as drug resistance, enhanced sensitivity, or therapeutic ineffectiveness. These complex relationships highlight the need for advanced analytical approaches in oncology. Due to their powerful ability to handle heterogeneous data, graph convolutional networks (GCNs) represent a promising approach for predicting drug responses. However, simple bipartite graphs cannot accurately capture the complex relationships involved in missense mutation and drug response. Furthermore, deep learning models for drug response are often considered "black boxes", and their interpretability remains a widely discussed issue. To address these challenges, we propose an Interpretable Dynamic Directed Graph Convolutional Network (IDDGCN) framework, which incorporates four key features: 1) the use of directed graphs to differentiate between sensitivity and resistance relationships, 2) the dynamic updating of node weights based on node-specific interactions, 3) the exploration of associations between different mutations within the same gene and drug response, and 4) the enhancement of model interpretability through the integration of a weighted mechanism that accounts for biological significance, alongside a ground truth construction method to evaluate prediction transparency. The experimental results demonstrate that IDDGCN outperforms existing state-of-the-art models, exhibiting excellent predictive power. Both qualitative and quantitative evaluations of its interpretability further highlight its ability to explain predictions, offering a fresh perspective for precision oncology and targeted drug development.
7
Liu F, Cai B, Lian S, Chang X, Chen D, Pu Z, Bao L, Wang J, Lv J, Zheng H, Bao Z, Zhang L, Wang S, Li Y. MolluscDB 2.0: a comprehensive functional and evolutionary genomics database for over 1400 molluscan species. Nucleic Acids Res 2025; 53:D1075-D1086. [PMID: 39530242 PMCID: PMC11701707 DOI: 10.1093/nar/gkae1026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2024] [Revised: 10/14/2024] [Accepted: 10/17/2024] [Indexed: 11/16/2024] Open
Abstract
Mollusca represents the second-largest animal phylum but remains less explored genomically. The increase in high-quality genomes and diverse functional genomic data holds great promise for advancing our understanding of molluscan biology and evolution. To address the opportunities and challenges facing the molluscan research community in managing vast multi-omics resources, we developed MolluscDB 2.0 (http://mgbase.qnlm.ac), which integrates extensive functional genomic data and offers user-friendly tools for multilevel integrative and comparative analyses. MolluscDB 2.0 covers 1450 species across all eight molluscan classes and compiles ∼4200 datasets, making it the most comprehensive multi-omics resource for molluscs to date. MolluscDB 2.0 expands the layers of multi-omics data, including genomes, bulk transcriptomes, single-cell transcriptomes, proteomes, epigenomes and metagenomes. MolluscDB 2.0 also more than doubles the number of functional modules and analytical tools, updating 14 original modules and introducing 20 new, specialized modules. Overall, MolluscDB 2.0 provides a highly valuable, open-access multi-omics platform for the molluscan research community, expediting scientific discoveries and deepening our understanding of molluscan biology and evolution.
Affiliation(s)
- Fuyun Liu
  - Fang Zongxi Center for Marine Evo-Devo & MOE Key Laboratory of Marine Genetics and Breeding, College of Marine Life Sciences, Ocean University of China, Qingdao 266003, China
  - Laboratory for Marine Biology and Biotechnology & Laboratory for Marine Fisheries Science and Food Production Processes, Qingdao Marine Science and Technology Center, Qingdao 266237, China
- Bingcheng Cai
  - Fang Zongxi Center for Marine Evo-Devo & MOE Key Laboratory of Marine Genetics and Breeding, College of Marine Life Sciences, Ocean University of China, Qingdao 266003, China
- Shanshan Lian
  - Fang Zongxi Center for Marine Evo-Devo & MOE Key Laboratory of Marine Genetics and Breeding, College of Marine Life Sciences, Ocean University of China, Qingdao 266003, China
  - Laboratory for Marine Biology and Biotechnology & Laboratory for Marine Fisheries Science and Food Production Processes, Qingdao Marine Science and Technology Center, Qingdao 266237, China
- Xinyao Chang
  - Fang Zongxi Center for Marine Evo-Devo & MOE Key Laboratory of Marine Genetics and Breeding, College of Marine Life Sciences, Ocean University of China, Qingdao 266003, China
- Dongsheng Chen
  - Fang Zongxi Center for Marine Evo-Devo & MOE Key Laboratory of Marine Genetics and Breeding, College of Marine Life Sciences, Ocean University of China, Qingdao 266003, China
- Zhongqi Pu
  - Fang Zongxi Center for Marine Evo-Devo & MOE Key Laboratory of Marine Genetics and Breeding, College of Marine Life Sciences, Ocean University of China, Qingdao 266003, China
- Lisui Bao
  - Institute of Evolution & Marine Biodiversity, Ocean University of China, Qingdao 266003, China
- Jing Wang
  - Fang Zongxi Center for Marine Evo-Devo & MOE Key Laboratory of Marine Genetics and Breeding, College of Marine Life Sciences, Ocean University of China, Qingdao 266003, China
  - Laboratory for Marine Biology and Biotechnology & Laboratory for Marine Fisheries Science and Food Production Processes, Qingdao Marine Science and Technology Center, Qingdao 266237, China
- Jia Lv
  - Fang Zongxi Center for Marine Evo-Devo & MOE Key Laboratory of Marine Genetics and Breeding, College of Marine Life Sciences, Ocean University of China, Qingdao 266003, China
  - Laboratory for Marine Biology and Biotechnology & Laboratory for Marine Fisheries Science and Food Production Processes, Qingdao Marine Science and Technology Center, Qingdao 266237, China
- Hongkun Zheng
  - Biomarker Technologies Corporation, Beijing 101300, China
- Zhenmin Bao
  - Southern Marine Science and Engineering Guangdong Laboratory (Guangzhou), Guangzhou 511458, China
  - Key Laboratory of Tropical Aquatic Germplasm of Hainan Province, Sanya Oceanographic Institution, Ocean University of China, Sanya 572000, China
- Lingling Zhang
  - Fang Zongxi Center for Marine Evo-Devo & MOE Key Laboratory of Marine Genetics and Breeding, College of Marine Life Sciences, Ocean University of China, Qingdao 266003, China
  - Laboratory for Marine Biology and Biotechnology & Laboratory for Marine Fisheries Science and Food Production Processes, Qingdao Marine Science and Technology Center, Qingdao 266237, China
- Shi Wang
  - Fang Zongxi Center for Marine Evo-Devo & MOE Key Laboratory of Marine Genetics and Breeding, College of Marine Life Sciences, Ocean University of China, Qingdao 266003, China
  - Laboratory for Marine Biology and Biotechnology & Laboratory for Marine Fisheries Science and Food Production Processes, Qingdao Marine Science and Technology Center, Qingdao 266237, China
  - Southern Marine Science and Engineering Guangdong Laboratory (Guangzhou), Guangzhou 511458, China
  - Key Laboratory of Tropical Aquatic Germplasm of Hainan Province, Sanya Oceanographic Institution, Ocean University of China, Sanya 572000, China
- Yuli Li
  - Fang Zongxi Center for Marine Evo-Devo & MOE Key Laboratory of Marine Genetics and Breeding, College of Marine Life Sciences, Ocean University of China, Qingdao 266003, China
  - Laboratory for Marine Biology and Biotechnology & Laboratory for Marine Fisheries Science and Food Production Processes, Qingdao Marine Science and Technology Center, Qingdao 266237, China
  - Key Laboratory of Tropical Aquatic Germplasm of Hainan Province, Sanya Oceanographic Institution, Ocean University of China, Sanya 572000, China
8
Dong Y, Zhang Y, Qian Y, Zhao Y, Yang Z, Feng X. ASGCL: Adaptive Sparse Mapping-based graph contrastive learning network for cancer drug response prediction. PLoS Comput Biol 2025; 21:e1012748. [PMID: 39883719 PMCID: PMC11781687 DOI: 10.1371/journal.pcbi.1012748] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2024] [Accepted: 12/23/2024] [Indexed: 02/01/2025] Open
Abstract
Personalized cancer drug treatment is emerging as a frontier issue in modern medical research. Considering the genomic differences among cancer patients, determining the most effective drug treatment plan is a complex and crucial task. In response to these challenges, this study introduces the Adaptive Sparse Graph Contrastive Learning Network (ASGCL), an innovative approach to unraveling latent interactions in the complex context of cancer cell lines and drugs. The core of ASGCL is the GraphMorpher module, an innovative component that enhances the input graph structure via strategic node attribute masking and topological pruning. By contrasting the augmented graph with the original input, the model delineates distinct positive and negative sample sets at both node and graph levels. This dual-level contrastive approach significantly amplifies the model's discriminatory prowess in identifying nuanced drug responses. Leveraging a synergistic combination of supervised and contrastive loss, ASGCL accomplishes end-to-end learning of feature representations, substantially outperforming existing methodologies. Comprehensive ablation studies underscore the efficacy of each component, corroborating the model's robustness. Experimental evaluations further illuminate ASGCL's proficiency in predicting drug responses, offering a potent tool for guiding clinical decision-making in cancer therapy.
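The two augmentations named in the abstract, node attribute masking and topological pruning, can be sketched as follows. This is a deliberately simplified stand-in: it masks feature columns and drops edges uniformly at random, whereas the abstract describes the GraphMorpher module as adaptive and strategic. The function name `augment_graph` and both rate parameters are hypothetical.

```python
import random


def augment_graph(features, edges, feat_mask_rate=0.2, edge_drop_rate=0.2, seed=0):
    """Return an augmented view of a graph for contrastive learning.

    features: list of per-node feature vectors (lists of floats)
    edges: list of (source, target) node-index pairs
    Assumption: uniform random masking and pruning, not a learned policy.
    """
    rng = random.Random(seed)
    n_feats = len(features[0]) if features else 0

    # Node attribute masking: zero the same feature columns for every node.
    masked_cols = {j for j in range(n_feats) if rng.random() < feat_mask_rate}
    new_features = [
        [0.0 if j in masked_cols else value for j, value in enumerate(row)]
        for row in features
    ]

    # Topological pruning: keep each edge with probability 1 - edge_drop_rate.
    kept_edges = [edge for edge in edges if rng.random() >= edge_drop_rate]
    return new_features, kept_edges


# Toy graph: 3 nodes with 2 features each, 3 edges (values are illustrative).
node_feats = [[0.5, 1.0], [0.2, 0.1], [0.9, 0.4]]
edge_list = [(0, 1), (1, 2), (0, 2)]
view_feats, view_edges = augment_graph(node_feats, edge_list, seed=42)
```

A contrastive objective would then treat a node's embedding in the original graph and in this augmented view as a positive pair, and embeddings of other nodes as negatives.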
Affiliation(s)
- Yunyun Dong
  - School of Software, Taiyuan University of Technology, Taiyuan, China
  - Institute of Big Data Science and Industry, Shanxi University, Taiyuan, China
- Yuanrong Zhang
  - School of Software, Taiyuan University of Technology, Taiyuan, China
- Yuhua Qian
  - Institute of Big Data Science and Industry, Shanxi University, Taiyuan, China
  - School of Computer and Information Technology, Shanxi University, Taiyuan, China
- Yiming Zhao
  - School of Software, Taiyuan University of Technology, Taiyuan, China
- Ziting Yang
  - School of Software, Taiyuan University of Technology, Taiyuan, China
- Xiufang Feng
  - School of Software, Taiyuan University of Technology, Taiyuan, China
9
Ballard JL, Wang Z, Li W, Shen L, Long Q. Deep learning-based approaches for multi-omics data integration and analysis. BioData Min 2024; 17:38. [PMID: 39358793 PMCID: PMC11446004 DOI: 10.1186/s13040-024-00391-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2024] [Accepted: 09/06/2024] [Indexed: 10/04/2024] Open
Abstract
BACKGROUND The rapid growth of deep learning, together with the vast and ever-growing amount of available data, has provided ample opportunity for advances in fusion and analysis of complex and heterogeneous data types. Different data modalities provide complementary information that can be leveraged to gain a more complete understanding of each subject. In the biomedical domain, multi-omics data includes molecular (genomics, transcriptomics, proteomics, epigenomics, metabolomics, etc.) and imaging (radiomics, pathomics) modalities which, when combined, have the potential to improve performance on prediction, classification, clustering and other tasks. Deep learning encompasses a wide variety of methods, each of which has certain strengths and weaknesses for multi-omics integration. METHOD In this review, we categorize recent deep learning-based approaches by their basic architectures and discuss their unique capabilities in relation to one another. We also discuss some emerging themes advancing the field of multi-omics integration. RESULTS Deep learning-based multi-omics integration methods were categorized broadly into non-generative (feedforward neural networks, graph convolutional neural networks, and autoencoders) and generative (variational methods, generative adversarial models, and a generative pretrained model). Generative methods have the advantage of being able to impose constraints on the shared representations to enforce certain properties or incorporate prior knowledge. They can also be used to generate or impute missing modalities. Recent advances achieved by these methods include the ability to handle incomplete data as well as going beyond the traditional molecular omics data types to integrate other modalities such as imaging data. CONCLUSION We expect to see further growth in methods that can handle missingness, as this is a common challenge in working with complex and heterogeneous data. Additionally, methods that integrate more data types are expected to improve performance on downstream tasks by capturing a comprehensive view of each sample.
Affiliation(s)
- Jenna L Ballard
  - Graduate Group in Genomics and Computational Biology, Perelman School of Medicine, University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, PA, 19104, USA
- Zexuan Wang
  - Graduate Group in Applied Mathematics and Computational Science, University of Pennsylvania, 209 S. 33rd Street, Philadelphia, PA, 19104, USA
- Wenrui Li
  - Department of Statistics, University of Connecticut, 215 Glenbrook Road, Storrs, CT, 06269, USA
- Li Shen
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, 423 Guardian Drive, Philadelphia, PA, 19104, USA
- Qi Long
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, 423 Guardian Drive, Philadelphia, PA, 19104, USA
10
Saranya KR, Vimina ER. DRN-CDR: A cancer drug response prediction model using multi-omics and drug features. Comput Biol Chem 2024; 112:108175. [PMID: 39191166 DOI: 10.1016/j.compbiolchem.2024.108175] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2024] [Revised: 08/09/2024] [Accepted: 08/14/2024] [Indexed: 08/29/2024]
Abstract
Cancer drug response (CDR) prediction is an important area of research that aims to personalize cancer therapy, optimizing treatment plans for maximum effectiveness while minimizing potential negative effects. Despite advances in deep learning techniques, the effective integration of multi-omics data for drug response prediction remains challenging. In this paper, a regression method using a deep ResNet for CDR prediction (DRN-CDR) is proposed. We aim to explore the potential of considering only cancer genes in drug response prediction. Multi-omics data (gene expression, mutation, and methylation data) were integrated with the molecular structural information of drugs to predict the IC50 values of drugs. Drug features are extracted using a Uniform Graph Convolution Network, while cell line features are extracted using a combination of a Convolutional Neural Network and Fully Connected Networks. These features are then concatenated and fed into a deep ResNet to predict IC50 values for drug-cell line pairs. The proposed method yielded a higher Pearson's correlation coefficient (rp) of 0.7938 and the lowest Root Mean Squared Error (RMSE) of 0.92 when compared with the similar methods tCNNS, MOLI, DeepCDR, TGSA, NIHGCN, DeepTTA, GraTransDRP and TSGCNN. Further, when the model is extended to a classification problem to categorize drugs as sensitive or resistant, we achieved AUC and AUPR measures of 0.7623 and 0.7691, respectively. Drugs such as Tivozanib, SNX-2112, CGP-60474, PHA-665752, and Foretinib exhibited low median IC50 values and were found to be effective anti-cancer drugs. The case studies with different TCGA cancer types also revealed the effectiveness of SNX-2112, CGP-60474, Foretinib, Cisplatin, Vinblastine, etc. This consistent pattern strongly suggests the effectiveness of the model in predicting CDR.
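The two regression metrics reported in the abstract, Pearson's correlation coefficient and RMSE, can be computed as in the following sketch. It uses the standard textbook definitions and only the Python standard library; the function names and the toy IC50 values are illustrative assumptions, not data from the paper.

```python
import math


def pearson_r(y_true, y_pred):
    """Pearson's correlation coefficient between observed and predicted values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    mean_pred = sum(y_pred) / n
    cov = sum((t - mean_true) * (p - mean_pred) for t, p in zip(y_true, y_pred))
    var_true = sum((t - mean_true) ** 2 for t in y_true)
    var_pred = sum((p - mean_pred) ** 2 for p in y_pred)
    return cov / math.sqrt(var_true * var_pred)


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


# Toy observed vs. predicted IC50 values (made up for illustration).
observed = [1.2, 3.4, 2.2, 5.0]
predicted = [1.0, 3.0, 2.5, 4.6]
r_value = pearson_r(observed, predicted)
error = rmse(observed, predicted)
```

Higher `r_value` and lower `error` indicate better agreement; `pearson_r` is undefined when either input is constant, since the denominator is then zero.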
Affiliation(s)
- K R Saranya
- Department of Computer Science and IT, School of Computing, Amrita Vishwa Vidyapeetham, Kochi Campus, India
| | - E R Vimina
- Department of Computer Science and IT, School of Computing, Amrita Vishwa Vidyapeetham, Kochi Campus, India.
|
11
|
Xu M, Zhu Z, Zhao Y, He K, Huang Q, Zhao Y. RedCDR: Dual Relation Distillation for Cancer Drug Response Prediction. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2024; 21:1468-1479. [PMID: 38776197 DOI: 10.1109/tcbb.2024.3404262] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/24/2024]
Abstract
Predicting the response of cancer cell lines to drugs from multi-omics data and drug information is a crucial area of research in modern oncology, as it can promote the development of personalized treatments. Despite the promising performance achieved by existing models, most of them overlook the variations among different omics and lack effective integration of multi-omics data. Moreover, the explicit modeling of cell line/drug attributes and cell line-drug associations has not been thoroughly investigated in existing approaches. To address these issues, we propose RedCDR, a dual relation distillation model for cancer drug response (CDR) prediction. Specifically, a parallel dual-branch architecture is designed to enable both independent learning and interactive fusion of cell line/drug attribute and cell line-drug association information. To facilitate the adaptive, interactive integration of multi-omics data, the proposed multi-omics encoder introduces multiple similarity relations between cell lines and takes the importance of different omics data into account. To transfer knowledge from the two independent attribute and association branches to their fusion, a dual relation distillation mechanism consisting of representation distillation and prediction distillation is presented. Experiments conducted on the GDSC and CCLE datasets show that RedCDR outperforms previous state-of-the-art approaches in CDR prediction.
|
12
|
Guan Y, Xue Z, Wang J, Ai X, Chen R, Yi X, Lu S, Liu Y. SAFE-MIL: a statistically interpretable framework for screening potential targeted therapy patients based on risk estimation. Front Genet 2024; 15:1381851. [PMID: 39211737 PMCID: PMC11357964 DOI: 10.3389/fgene.2024.1381851] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2024] [Accepted: 07/31/2024] [Indexed: 09/04/2024] Open
Abstract
Patients with the target gene mutation frequently derive significant clinical benefits from targeted therapy. However, differences in the abundance level of mutations among patients result in varying survival benefits, even among patients with the same target gene mutations. Currently, there is a lack of rational and interpretable models to assess the risk of treatment failure. In this study, we investigated the underlying coupled factors contributing to variations in medication sensitivity and established a statistically interpretable framework, named SAFE-MIL, for risk estimation. We first constructed an effectiveness label for each patient from the perspective of exploring the optimal grouping of patients' positive judgment values and sampled patients into 600 and 1,000 groups, respectively, based on multi-instance learning (MIL). A novel and interpretable loss function was further designed for this framework based on the Hosmer-Lemeshow test. By integrating multi-instance learning with the Hosmer-Lemeshow test, SAFE-MIL can accurately estimate the risk of drug treatment failure across diverse patient cohorts while also providing the optimal threshold for risk stratification. We conducted a comprehensive case study involving 457 non-small cell lung cancer patients with EGFR mutations treated with EGFR tyrosine kinase inhibitors. Results demonstrate that SAFE-MIL outperforms traditional regression methods with higher accuracy and can accurately assess patients' risk stratification. This underscores its ability to accurately capture inter-patient variability in risk while providing statistical interpretability. SAFE-MIL is able to effectively guide clinical decision-making regarding the use of drugs in targeted therapy and provides an interpretable computational framework for other patient stratification problems.
The SAFE-MIL framework has proven its effectiveness in capturing inter-patient variability in risk and providing statistical interpretability. It outperforms traditional regression methods and can effectively guide clinical decision-making in the use of drugs for targeted therapy. SAFE-MIL offers a valuable interpretable computational framework that can be applied to other patient stratification problems, enhancing the precision of risk assessment in personalized medicine. The source code for SAFE-MIL is available for further exploration and application at https://github.com/Nevermore233/SAFE-MIL.
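The abstract above builds its loss function on the Hosmer-Lemeshow test. As a rough illustration only, and not the SAFE-MIL loss itself (which is defined in the paper and its repository), the classical Hosmer-Lemeshow statistic can be sketched in Python. The function name `hosmer_lemeshow` and the equal-size grouping of patients by sorted predicted risk are assumptions made for this example.

```python
import numpy as np

def hosmer_lemeshow(y_true, y_prob, n_groups=10):
    """Classical Hosmer-Lemeshow statistic over groups of sorted predictions.

    Hypothetical helper for illustration; not taken from the SAFE-MIL code.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    order = np.argsort(y_prob)              # sort samples by predicted risk
    stat = 0.0
    for idx in np.array_split(order, n_groups):
        if idx.size == 0:
            continue
        observed = y_true[idx].sum()        # observed events in the group
        expected = y_prob[idx].sum()        # expected events in the group
        mean_p = expected / idx.size
        denom = idx.size * mean_p * (1.0 - mean_p)
        if denom > 0:                       # skip groups with mean risk 0 or 1
            stat += (observed - expected) ** 2 / denom
    return stat
```

A value near zero indicates that observed and expected event counts agree within each risk group; larger values indicate poorer calibration.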
Affiliation(s)
- Yanfang Guan
- School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an, China
- Shaanxi Engineering Research Center of Medical and Health Big Data, Xi’an Jiaotong University, Xi’an, China
- Geneplus Beijing Institute, Beijing, China
| | - Zhengfa Xue
- School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an, China
- Shaanxi Engineering Research Center of Medical and Health Big Data, Xi’an Jiaotong University, Xi’an, China
| | - Jiayin Wang
- School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an, China
- Shaanxi Engineering Research Center of Medical and Health Big Data, Xi’an Jiaotong University, Xi’an, China
| | - Xinghao Ai
- Shanghai Chest Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | | | - Xin Yi
- Geneplus Beijing Institute, Beijing, China
| | - Shun Lu
- Shanghai Chest Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Yuqian Liu
- School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an, China
- Shaanxi Engineering Research Center of Medical and Health Big Data, Xi’an Jiaotong University, Xi’an, China
|
13
|
Luo Y, Li S, Peng L, Ding P, Liang W. Predicting associations between drugs and G protein-coupled receptors using a multi-graph convolutional network. Comput Biol Chem 2024; 110:108060. [PMID: 38579550 DOI: 10.1016/j.compbiolchem.2024.108060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Revised: 03/12/2024] [Accepted: 03/22/2024] [Indexed: 04/07/2024]
Abstract
Developing new drugs is an expensive, time-consuming process that frequently involves safety concerns. By discovering novel uses for previously verified drugs, drug repurposing helps to bypass the time-consuming and costly process of drug development. As the largest family of proteins targeted by verified drugs, G protein-coupled receptors (GPCR) are vital to efficiently repurpose drugs by inferring their associations with drugs. Drug repurposing may be sped up by computational models that predict the strength of novel drug-GPCR pairs interaction. To this end, a number of models have been put forth. In existing methods, however, drug structure, drug-drug interactions, GPCR sequence, and subfamily information couldn't simultaneously be taken into account to detect novel drugs-GPCR relationships. In this study, based on a multi-graph convolutional network, an end-to-end deep model was developed to efficiently and precisely discover latent drug-GPCR relationships by combining data from multi-sources. We demonstrated that our model, based on multi-graph convolutional networks, outperformed rival deep learning techniques as well as non-deep learning models in terms of inferring drug-GPCR relationships. Our results indicated that integrating data from multi-sources can lead to further advancement.
Affiliation(s)
- Yuxun Luo
- School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan, Hunan 411201, China; Hunan Key Laboratory for Service Computing and Novel Software Technology, Hunan University of Science and Technology, Xiangtan, Hunan 411201, China
| | - Shasha Li
- Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong 999077, China
| | - Li Peng
- School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan, Hunan 411201, China; Hunan Key Laboratory for Service Computing and Novel Software Technology, Hunan University of Science and Technology, Xiangtan, Hunan 411201, China.
| | - Pingjian Ding
- School of Computer Science, University of South China, Hengyang, Hunan 421001, China
| | - Wei Liang
- School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan, Hunan 411201, China; Hunan Key Laboratory for Service Computing and Novel Software Technology, Hunan University of Science and Technology, Xiangtan, Hunan 411201, China.
|
14
|
Yang X, Tang X, Li C, Han H. Singular value thresholding two-stage matrix completion for drug sensitivity discovery. Comput Biol Chem 2024; 110:108071. [PMID: 38718497 DOI: 10.1016/j.compbiolchem.2024.108071] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2023] [Revised: 04/06/2024] [Accepted: 04/11/2024] [Indexed: 05/27/2024]
Abstract
Incomplete data presents significant challenges in drug sensitivity analysis, especially in critical areas like oncology, where precision is paramount. Our study introduces an innovative imputation method designed specifically for low-rank matrices, addressing the crucial challenge of data completion in anticancer drug sensitivity testing. Our method unfolds in two main stages: Initially, the singular value thresholding algorithm is employed for preliminary matrix completion, establishing a solid foundation for subsequent steps. Then, the matrix rows are segmented into distinct blocks based on hierarchical clustering of correlation coefficients, applying singular value thresholding to the largest block, which has been proved to possess the largest entropy. This is followed by a refined data restoration process, where the reconstructed largest block is integrated into the initial matrix completion to achieve the final matrix completion. Compared to other methods, our approach not only improves the accuracy of data restoration but also ensures the integrity and reliability of the imputed values, establishing it as a robust tool for future drug sensitivity analysis.
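The first stage described above, preliminary completion by singular value thresholding, can be sketched as follows. This is a minimal generic version of the singular value thresholding iteration, not the authors' implementation; the function names, the zero initialization, and the default step size are assumptions, and the second stage (clustering rows into blocks and re-completing the largest block) is not shown.

```python
import numpy as np

def shrink_singular_values(Y, tau):
    """Soft-threshold the singular values of Y by tau."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    # Scale each left singular vector by its shrunken singular value.
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def svt_complete(M, mask, tau, delta=1.0, n_iter=200):
    """Estimate a low-rank matrix from the entries of M where mask is True.

    Hypothetical helper: tau is the shrinkage threshold, delta the step size.
    """
    M = np.asarray(M, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    Y = np.zeros_like(M)
    X = np.zeros_like(M)
    for _ in range(n_iter):
        X = shrink_singular_values(Y, tau)   # low-rank estimate
        Y = Y + delta * mask * (M - X)       # correct on observed entries only
    return X
```

In practice `tau` and `delta` need tuning for the size and sparsity of the drug sensitivity matrix; larger `tau` favors lower-rank estimates.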
Affiliation(s)
- Xuemei Yang
- School of Mathematics and Statistics, Xianyang Normal University, Xianyang, 712000, China.
| | - Xiaoduan Tang
- School of Mathematics and Statistics, Hainan Normal University, Haikou 571158, China.
| | - Chun Li
- College of Elementary Education, Hainan Normal University, Haikou 571158, China; Key Laboratory of Data Science and Intelligence Education of Ministry of Education, Hainan Normal University, Haikou 571158, China.
| | - Henry Han
- The Laboratory of Data Science and Artificial Intelligence Innovation, Department of Computer Science, School of Engineering and Computer Science, Baylor University, Waco, TX 76798 USA.
|
15
|
Dey V, Ning X. Improving Anticancer Drug Selection and Prioritization via Neural Learning to Rank. J Chem Inf Model 2024; 64:4071-4088. [PMID: 38740382 PMCID: PMC11134508 DOI: 10.1021/acs.jcim.3c01060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2023] [Revised: 03/27/2024] [Accepted: 04/16/2024] [Indexed: 05/16/2024]
Abstract
Personalized cancer treatment requires a thorough understanding of complex interactions between drugs and cancer cell lines in varying genetic and molecular contexts. To address this, high-throughput screening has been used to generate large-scale drug response data, facilitating data-driven computational models. Such models can capture complex drug-cell line interactions across various contexts in a fully data-driven manner. However, accurately prioritizing the most effective drugs for each cell line still remains a significant challenge. To address this, we developed multiple neural ranking approaches that leverage large-scale drug response data across multiple cell lines from diverse cancer types. Unlike existing approaches that primarily utilize regression and classification techniques for drug response prediction, we formulated the objective of drug selection and prioritization as a drug ranking problem. In this work, we proposed multiple pairwise and listwise neural ranking methods that learn latent representations of drugs and cell lines and then use those representations to score drugs in each cell line via a learnable scoring function. Specifically, we developed neural pairwise and listwise ranking methods, Pair-PushC and List-One on top of the existing methods, pLETORg and ListNet, respectively. Additionally, we proposed a novel listwise ranking method, List-All, that focuses on all the effective drugs instead of the top effective drug, unlike List-One. We also provide an exhaustive empirical evaluation with state-of-the-art regression and ranking baselines on large-scale data sets across multiple experimental settings. Our results demonstrate that our proposed ranking methods mostly outperform the best baselines with significant improvements of as much as 25.6% in terms of selecting truly effective drugs within the top 20 predicted drugs (i.e., hit@20) across 50% test cell lines. 
Furthermore, our analyses suggest that the learned latent spaces from our proposed methods demonstrate informative clustering structures and capture relevant underlying biological features. Moreover, our comprehensive evaluation provides a thorough and objective comparison of the performance of different methods (including our proposed ones).
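The hit@20 measure mentioned above counts truly effective drugs among the top 20 predicted drugs for a cell line. A minimal sketch of one plausible reading of that metric follows; the function name `hits_at_k` and the boolean encoding of "truly effective" are assumptions, and the paper's exact definition and aggregation across cell lines may differ.

```python
import numpy as np

def hits_at_k(scores, effective, k=20):
    """Count truly effective drugs among the k highest-scored drugs.

    Hypothetical helper: `scores` are predicted scores for one cell line,
    `effective` marks which drugs are truly effective in that cell line.
    """
    scores = np.asarray(scores, dtype=float)
    effective = np.asarray(effective, dtype=bool)
    top_k = np.argsort(-scores)[:k]          # indices of the k best-scored drugs
    return int(effective[top_k].sum())
```

For example, with scores `[0.9, 0.1, 0.8, 0.3]` and only the first two drugs truly effective, the top two predictions contain one effective drug.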
Affiliation(s)
- Vishal Dey
- Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio 43210, United States
| | - Xia Ning
- Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio 43210, United States
- Biomedical Informatics, The Ohio State University, Columbus, Ohio 43210, United States
- Translational Data Analytics Institute, The Ohio State University, Columbus, Ohio 43210, United States
|
16
|
Liu H, Wang F, Yu J, Pan Y, Gong C, Zhang L, Zhang L. DBDNMF: A Dual Branch Deep Neural Matrix Factorization method for drug response prediction. PLoS Comput Biol 2024; 20:e1012012. [PMID: 38574114 PMCID: PMC11020650 DOI: 10.1371/journal.pcbi.1012012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2023] [Revised: 04/16/2024] [Accepted: 03/19/2024] [Indexed: 04/06/2024] Open
Abstract
Predicting the anti-cancer response of cell lines to drugs is urgently needed for individualized medical decision-making in the era of precision medicine. Measuring responses with wet-lab experiments is time-consuming and expensive, which makes it almost impossible to apply at scale. The design of computational models that can precisely predict the responses between drugs and cell lines could provide a credible reference for further research. Existing methods of response prediction based on matrix factorization or neural networks have revealed that both linear and nonlinear latent characteristics are applicable and effective for the precise prediction of drug responses. However, the majority of them consider only linear or only nonlinear relationships for drug response prediction. Herein, we propose a Dual Branch Deep Neural Matrix Factorization (DBDNMF) method to address the above-mentioned issues. DBDNMF learns the latent representation of drugs and cell lines through flexible inputs and reconstructs the partially observed matrix through a series of hidden neural network layers. Experimental results on the Cancer Cell Line Encyclopedia (CCLE) and Genomics of Drug Sensitivity in Cancer (GDSC) datasets show that its prediction accuracy exceeds that of state-of-the-art drug response prediction algorithms, demonstrating its reliability and stability. The hierarchical clustering results show that drugs with similar response levels tend to target similar signaling pathways, and cell lines from the same tissue subtype tend to share the same pattern of response, which is consistent with previously published studies.
Affiliation(s)
- Hui Liu
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, Jiangsu, China
| | - Feng Wang
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, Jiangsu, China
| | - Jian Yu
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, Jiangsu, China
| | - Yong Pan
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, Jiangsu, China
| | - Chaoju Gong
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, Jiangsu, China
- Department of Ophthalmology, Xuzhou First People’s Hospital, Xuzhou, Jiangsu, China
| | - Liang Zhang
- Department of Gastrointestinal Surgery, Xuzhou Central Hospital, Xuzhou, Jiangsu, China
| | - Lin Zhang
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, Jiangsu, China
|
17
|
Nguyen T, Campbell A, Kumar A, Amponsah E, Fiterau M, Shahriyari L. Optimal fusion of genotype and drug embeddings in predicting cancer drug response. Brief Bioinform 2024; 25:bbae227. [PMID: 38754407 PMCID: PMC11097979 DOI: 10.1093/bib/bbae227] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2023] [Revised: 04/14/2024] [Accepted: 04/25/2024] [Indexed: 05/18/2024] Open
Abstract
Predicting cancer drug response using both genomics and drug features has shown some success compared to using genomics features alone. However, there has been limited research done on how best to combine or fuse the two types of features. Using a visible neural network with two deep learning branches for genes and drug features as the base architecture, we experimented with different fusion functions and fusion points. Our experiments show that injecting multiplicative relationships between gene and drug latent features into the original concatenation-based architecture DrugCell significantly improved the overall predictive performance and outperformed other baseline models. We also show that different fusion methods respond differently to different fusion points, indicating that the relationship between drug features and different hierarchical biological level of gene features is optimally captured using different methods. Considering both predictive performance and runtime speed, tensor product partial is the best-performing fusion function to combine late-stage representations of drug and gene features to predict cancer drug response.
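To make the contrast between concatenation-based and multiplicative fusion concrete, the sketch below shows three generic ways to combine a gene embedding with a drug embedding. These are textbook fusion functions written for illustration; the function names are assumptions, and none of them is claimed to be the paper's "tensor product partial" function, whose definition is given in the article itself.

```python
import numpy as np

def fuse_concat(gene, drug):
    """Concatenation: keeps both embeddings side by side, no interaction terms."""
    return np.concatenate([gene, drug], axis=-1)

def fuse_elementwise_product(gene, drug):
    """Multiplicative fusion: one interaction term per shared latent dimension."""
    return gene * drug

def fuse_outer_product(gene, drug):
    """Full tensor (outer) product: every gene feature times every drug feature."""
    batch = gene.shape[0]
    return np.einsum("bi,bj->bij", gene, drug).reshape(batch, -1)
```

Concatenation leaves any interaction to later layers, while the two multiplicative variants supply interaction terms directly, at the cost of a larger output in the outer-product case (dimension `d_gene * d_drug`).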
Affiliation(s)
- Trang Nguyen
- Department of Computer Science, University of Massachusetts Amherst, Amherst 01002, MA, United States
| | - Anthony Campbell
- Department of Computer Science, University of Massachusetts Amherst, Amherst 01002, MA, United States
| | - Ankit Kumar
- Department of Mathematics and Statistics, University of Massachusetts Amherst, Amherst 01002, MA, United States
| | - Edwin Amponsah
- Department of Mathematics and Statistics, University of Massachusetts Amherst, Amherst 01002, MA, United States
| | - Madalina Fiterau
- Department of Computer Science, University of Massachusetts Amherst, Amherst 01002, MA, United States
| | - Leili Shahriyari
- Department of Mathematics and Statistics, University of Massachusetts Amherst, Amherst 01002, MA, United States
|
18
|
Lao C, Zheng P, Chen H, Liu Q, An F, Li Z. DeepAEG: a model for predicting cancer drug response based on data enhancement and edge-collaborative update strategies. BMC Bioinformatics 2024; 25:105. [PMID: 38461284 PMCID: PMC10925015 DOI: 10.1186/s12859-024-05723-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/25/2023] [Accepted: 02/27/2024] [Indexed: 03/11/2024] Open
Abstract
MOTIVATION The prediction of cancer drug response is a challenging subject in modern personalized cancer therapy due to the uncertainty of drug efficacy and the heterogeneity of patients. It has been shown that the characteristics of the drug itself and the genomic characteristics of the patient can greatly influence the results of cancer drug response. Therefore, accurate, efficient, and comprehensive methods for drug feature extraction and genomics integration are crucial to improve the prediction accuracy. RESULTS Accurate prediction of cancer drug response is vital for guiding the design of anticancer drugs. In this study, we propose an end-to-end deep learning model named DeepAEG which is based on a complete-graph update mode to predict IC50. Specifically, we integrate an edge update mechanism on the basis of a hybrid graph convolutional network to comprehensively learn the potential high-dimensional representation of topological structures in drugs, including atomic characteristics and chemical bond information. Additionally, we present a novel approach for enhancing simplified molecular input line entry specification data by employing sequence recombination to eliminate the defect of single sequence representation of drug molecules. Our extensive experiments show that DeepAEG outperforms other existing methods across multiple evaluation parameters in multiple test sets. Furthermore, we identify several potential anticancer agents, including bortezomib, which has proven to be an effective clinical treatment option. Our results highlight the potential value of DeepAEG in guiding the design of specific cancer treatment regimens.
Affiliation(s)
- Chuanqi Lao
- Research Center for Graph Computing, Zhejiang Lab, Yuhang, Hangzhou, 311121, Zhejiang, China
| | - Pengfei Zheng
- Research Center for Graph Computing, Zhejiang Lab, Yuhang, Hangzhou, 311121, Zhejiang, China
| | - Hongyang Chen
- Research Center for Graph Computing, Zhejiang Lab, Yuhang, Hangzhou, 311121, Zhejiang, China.
| | - Qiao Liu
- Department of Statistics, Stanford University, Stanford, Palo Alto, CA, 94305, USA
| | - Feng An
- Research Center for Graph Computing, Zhejiang Lab, Yuhang, Hangzhou, 311121, Zhejiang, China
| | - Zhao Li
- Research Center for Graph Computing, Zhejiang Lab, Yuhang, Hangzhou, 311121, Zhejiang, China
|
19
|
Xuan P, Gu J, Cui H, Wang S, Toshiya N, Liu C, Zhang T. Multi-scale topology and position feature learning and relationship-aware graph reasoning for prediction of drug-related microbes. Bioinformatics 2024; 40:btae025. [PMID: 38269610 PMCID: PMC10868329 DOI: 10.1093/bioinformatics/btae025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2023] [Revised: 12/26/2023] [Accepted: 01/22/2024] [Indexed: 01/26/2024] Open
Abstract
MOTIVATION The human microbiome may impact the effectiveness of drugs by modulating their activities and toxicities. Predicting candidate microbes for drugs can facilitate the exploration of the therapeutic effects of drugs. Most recent methods concentrate on constructing prediction models based on graph reasoning. They fail to sufficiently exploit the topology and position information, the heterogeneity of multiple types of nodes and connections, and the long-distance correlations among nodes in the microbe-drug heterogeneous graph. RESULTS We propose a new microbe-drug association prediction model, NGMDA, to encode the position and topological features of microbe (drug) nodes, and fuse the different types of features from neighbors and the whole heterogeneous graph. First, we formulate the position and topology features of microbe (drug) nodes by t-step random walks, and the features reveal the topological neighborhoods at multiple scales and the position of each node. Second, as the features of nodes are high-dimensional and sparse, we design an embedding enhancement strategy based on supervised fully connected autoencoders to form embeddings with representative features and more discriminative node distributions. Third, we propose an adaptive neighbor feature fusion module, which fuses features of neighbors by the constructed position- and topology-sensitive heterogeneous graph neural networks. A novel self-attention mechanism is developed to estimate the importance of the position and topology of each neighbor to a target node. Finally, a heterogeneous graph feature fusion module is constructed to learn the long-distance correlations among the nodes in the whole heterogeneous graph by a relationship-aware graph transformer. The relationship-aware graph transformer includes a strategy for encoding the connection relationship types among the nodes, which is helpful for integrating the diverse semantics of these connections.
The extensive comparison experimental results demonstrate NGMDA's superior performance over five state-of-the-art prediction methods. The ablation experiment shows the contributions of the multi-scale topology and position feature learning, the embedding enhancement strategy, the neighbor feature fusion, and the heterogeneous graph feature fusion. Case studies over three drugs further indicate that NGMDA has ability in discovering the potential drug-related microbes. AVAILABILITY AND IMPLEMENTATION Source codes and Supplementary Material are available at https://github.com/pingxuan-hlju/NGMDA.
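The t-step random walk features mentioned in the abstract above can be illustrated with a generic construction: powers of the row-normalized transition matrix of a graph, where the t-th power gives the probability of reaching each node in exactly t steps. This is a sketch under that assumption; the function name `random_walk_features` is hypothetical, and the exact feature construction used by NGMDA is defined in the paper and its repository.

```python
import numpy as np

def random_walk_features(adj, steps):
    """Return [P, P^2, ..., P^steps] for the row-normalized transition matrix P.

    Row i of P^t holds the probabilities of landing on each node after a
    t-step random walk starting from node i.
    """
    adj = np.asarray(adj, dtype=float)
    degree = adj.sum(axis=1, keepdims=True)
    # Row-normalize; isolated nodes (degree 0) keep an all-zero row.
    P = np.divide(adj, degree, out=np.zeros_like(adj), where=degree > 0)
    feats, current = [], np.eye(adj.shape[0])
    for _ in range(steps):
        current = current @ P
        feats.append(current)
    return feats
```

Small `t` captures a node's immediate neighborhood, while larger `t` reflects its position in the wider graph, which matches the multi-scale intent described in the abstract.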
Affiliation(s)
- Ping Xuan
- School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
- Department of Computer Science, Shantou University, Shantou 515063, China
| | - Jing Gu
- School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
| | - Hui Cui
- Department of Computer Science and Information Technology, La Trobe University, Melbourne, VIC 3083, Australia
| | - Shuai Wang
- School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, China
| | - Nakaguchi Toshiya
- Center for Frontier Medical Engineering, Chiba University, Chiba 2638522, Japan
| | - Cheng Liu
- Department of Computer Science, Shantou University, Shantou 515063, China
| | - Tiangang Zhang
- School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
- School of Mathematical Science, Heilongjiang University, Harbin 150080, China
|
20
|
Lin CX, Guan Y, Li HD. Artificial intelligence approaches for molecular representation in drug response prediction. Curr Opin Struct Biol 2024; 84:102747. [PMID: 38091924 DOI: 10.1016/j.sbi.2023.102747] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Revised: 11/26/2023] [Accepted: 11/26/2023] [Indexed: 02/09/2024]
Abstract
Drug response prediction is essential for drug development and disease treatment. One key question in predicting drug response is the representation of molecules, which has been greatly advanced by artificial intelligence (AI) techniques in recent years. In this review, we first describe different types of representation methods, pinpointing their key principles and discussing their limitations. Thereafter we discuss potential ways how these methods could be further developed. We expect that this review will provide useful guidance for researchers in the community.
Affiliation(s)
- Cui-Xiang Lin
- School of Mathematics and Computational Science, Xiangtan University, Xiangtan, 411105, Hunan Province, PR China
| | - Yuanfang Guan
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA.
| | - Hong-Dong Li
- School of Computer Science and Engineering, Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, PR China.
|
21
|
Liu H, Peng W, Dai W, Lin J, Fu X, Liu L, Liu L, Yu N. Improving anti-cancer drug response prediction using multi-task learning on graph convolutional networks. Methods 2024; 222:41-50. [PMID: 38157919 DOI: 10.1016/j.ymeth.2023.11.018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Revised: 09/19/2023] [Accepted: 11/19/2023] [Indexed: 01/03/2024] Open
Abstract
Predicting the therapeutic effect of anti-cancer drugs on tumors based on the characteristics of tumors and patients is an important part of precision oncology. Existing computational methods regard the drug response prediction problem as a classification or regression task. However, few of them consider leveraging the relationship between the two tasks. In this work, we propose a Multi-task Interaction Graph Convolutional Network (MTIGCN) for anti-cancer drug response prediction. MTIGCN first utilizes a graph convolutional network-based model to produce embeddings for both cell lines and drugs. After that, the model employs multi-task learning to predict anti-cancer drug response, which involves training the model on three tasks simultaneously: the main task of classifying a drug as sensitive or resistant, and the two auxiliary tasks of regression prediction and similarity network reconstruction. By sharing parameters and optimizing the losses of different tasks simultaneously, MTIGCN enhances the feature representation and reduces overfitting. The results of the experiments on two in vitro datasets demonstrated that MTIGCN outperformed seven state-of-the-art baseline methods. Moreover, the model trained on the in vitro dataset GDSC exhibited good performance when applied to predict drug responses in the in vivo datasets PDX and TCGA. The case study confirmed the model's ability to discover unknown drug responses in cell lines.
Affiliation(s)
- Hancheng Liu
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, China
- Wei Peng
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, China; Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming 650050, China
- Wei Dai
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, China; Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming 650050, China
- Jiangzhen Lin
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, China
- Xiaodong Fu
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, China; Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming 650050, China
- Li Liu
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, China; Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming 650050, China
- Lijun Liu
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, China; Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming 650050, China
- Ning Yu
  - State University of New York, The College at Brockport, Department of Computing Sciences, 350 New Campus Drive, Brockport NY 14422
22
Baek B, Jang E, Park S, Park SH, Williams DR, Jung DW, Lee H. Integrated drug response prediction models pinpoint repurposed drugs with effectiveness against rhabdomyosarcoma. PLoS One 2024; 19:e0295629. [PMID: 38277404 PMCID: PMC10817174 DOI: 10.1371/journal.pone.0295629] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Accepted: 11/24/2023] [Indexed: 01/28/2024] Open
Abstract
Targeted therapies for inhibiting the growth of cancer cells or inducing apoptosis are urgently needed for effective rhabdomyosarcoma (RMS) treatment. However, identifying cancer-targeting compounds with few side effects among the many potential compounds is expensive and time-consuming. A computational approach that reduces the number of candidate drugs can facilitate the discovery of attractive lead compounds. To address this and obtain reliable predictions of novel cell-line-specific drugs, we apply prediction models that have the potential to improve drug discovery approaches for RMS treatment. The results of two prediction models were combined into an ensemble and validated via in vitro experiments. The computational models were trained using data extracted from the Genomics of Drug Sensitivity in Cancer database and tested on two RMS cell lines to select potential RMS drug candidates. Among 235 candidate drugs, 22 were selected based on the results of the computational approach, and three candidate drugs (NSC207895, vorinostat, and belinostat) were identified that showed selective effectiveness in RMS cell lines in vitro via the induction of apoptosis. Our in vitro experiments demonstrate that the proposed methods can effectively identify and repurpose drugs for treating RMS.
Affiliation(s)
- Bin Baek
  - School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju, Republic of Korea
- Eunmi Jang
  - School of Life Sciences, Gwangju Institute of Science and Technology, Gwangju, Republic of Korea
- Sejin Park
  - School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju, Republic of Korea
- Sung-Hye Park
  - Department of Pathology, Seoul National University Hospital, Seoul National University College of Medicine, Seoul, Republic of Korea
  - Institute of Neuroscience, Seoul National University Hospital, Seoul, Republic of Korea
- Darren Reece Williams
  - School of Life Sciences, Gwangju Institute of Science and Technology, Gwangju, Republic of Korea
- Da-Woon Jung
  - School of Life Sciences, Gwangju Institute of Science and Technology, Gwangju, Republic of Korea
- Hyunju Lee
  - School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju, Republic of Korea
  - Artificial Intelligence Graduate School, Gwangju Institute of Science and Technology, Gwangju, Republic of Korea
23
Wang Y, Yu X, Gu Y, Li W, Zhu K, Chen L, Tang Y, Liu G. XGraphCDS: An explainable deep learning model for predicting drug sensitivity from gene pathways and chemical structures. Comput Biol Med 2024; 168:107746. [PMID: 38039896 DOI: 10.1016/j.compbiomed.2023.107746] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2023] [Revised: 10/29/2023] [Accepted: 11/20/2023] [Indexed: 12/03/2023]
Abstract
Cancer is a highly complex disease characterized by genetic and phenotypic heterogeneity among individuals. In the era of precision medicine, understanding the genetic basis of these individual differences is crucial for developing new drugs and achieving personalized treatment. Despite the increasing abundance of cancer genomics data, predicting the relationship between cancer samples and drug sensitivity remains challenging. In this study, we developed an explainable graph neural network framework for predicting cancer drug sensitivity (XGraphCDS) based on comparative learning by integrating cancer gene expression information and drug chemical structure knowledge. Specifically, XGraphCDS consists of a unified heterogeneous network and multiple sub-networks, with molecular graphs representing drugs and gene enrichment scores representing cell lines. Experimental results showed that XGraphCDS consistently outperformed most state-of-the-art baselines (R² = 0.863, AUC = 0.858). We also constructed a separate in vivo prediction model by using transfer learning strategies with in vitro experimental data and achieved good predictive power (AUC = 0.808). In addition, our framework is interpretable, providing insights into resistance mechanisms alongside accurate predictions. The excellent performance of XGraphCDS highlights its potential to aid the development of selective anti-tumor drugs and personalized dosing strategies in precision medicine.
Affiliation(s)
- Yimeng Wang
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Xinxin Yu
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Yaxin Gu
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Weihua Li
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Keyun Zhu
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Long Chen
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Yun Tang
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Guixia Liu
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
24
Zhu W, Zhang L, Jiang X, Zhou P, Xie X, Wang H. A method combining LDA and neural networks for antitumor drug efficacy prediction. Digit Health 2024; 10:20552076241280103. [PMID: 39257869 PMCID: PMC11384538 DOI: 10.1177/20552076241280103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/26/2023] [Accepted: 08/09/2024] [Indexed: 09/12/2024] Open
Abstract
Background: Personalized medicine has gained increasing attention in precision cancer treatment in recent years because of genetic heterogeneity among patients. However, predicting the efficacy of antitumor drugs in advance remains a significant challenge.
Objective: This study aims to predict the efficacy of antitumor drugs in individual cancer patients based on clinical data.
Methods: We encode the clinical text of cancer patients as a probability distribution vector over a hidden topic space using the Latent Dirichlet Allocation (LDA) model; we call this the LDA representation. A neural network then takes the LDA representation as input to predict drug response in cancer patients treated with platinum drugs. To evaluate the proposed method, we gathered and organized clinical records of lung and bowel cancer patients who underwent platinum-based treatment. Prediction performance is assessed using Precision, Recall, F1-score, Accuracy, and Area Under the ROC Curve (AUC).
Results: The study analyzed a dataset of 958 patients with non-small cell cancer treated with antitumor drugs. In stratified 5-fold cross-validation, the proposed method achieved an average Precision of 0.81, Recall of 0.89, F1-score of 0.85, Accuracy of 0.77, and AUC of 0.81 for cisplatin efficacy prediction, most of which are better than those of previous methods. The AUC is at least 4% higher than that of previous methods. The advantage over previous methods persisted on an independent dataset of 266 bowel cancer patients, showing the generalizability of the proposed method. These results demonstrate the potential value of the method for precise tumor treatment in clinical practice.
Conclusions: Combining LDA and neural networks can help predict the efficacy of antitumor drugs based on clinical text. Our approach outperforms previous methods in predicting clinical drug efficacy.
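As a reading aid for the metrics reported above, the following Python sketch shows how Precision, Recall, F1-score, and Accuracy follow from confusion-matrix counts. The function names and the counts are hypothetical and are not taken from the paper; AUC is omitted because it needs ranked prediction scores rather than counts.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, F1 and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
        "accuracy": accuracy,
    }


# Hypothetical counts for illustration only.
print(classification_metrics(tp=8, fp=2, fn=2, tn=8))

# F1 implied by the averaged precision and recall quoted in the abstract.
print(round(f1_score(0.81, 0.89), 3))
```

The harmonic mean of the reported averages (0.81 and 0.89) is about 0.848, which agrees with the reported F1-score of 0.85 after rounding. This is only a consistency check: the mean of per-fold F1 scores does not have to equal the F1 of the mean precision and recall.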
Affiliation(s)
- Weiwei Zhu
  - University of Science and Technology of China, Hefei, Anhui, China
  - Institute of Intelligent Machines, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, Anhui, China
- Lei Zhang
  - Department of Pharmacy, The First Affiliated Hospital of USTC, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui, China
- Xiaodong Jiang
  - Medical Oncology Department, The First Affiliated Hospital of University of Science and Technology of China, Hefei, Anhui, China
- Peng Zhou
  - School of Life Science, Hefei Normal University, Hefei, Anhui, China
- Xinping Xie
  - School of Mathematics and Physics, Anhui Jianzhu University, Hefei, Anhui, China
- Hongqiang Wang
  - Institute of Intelligent Machines, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, Anhui, China
25
Park S, Lee H. Molecular data representation based on gene embeddings for cancer drug response prediction. Sci Rep 2023; 13:21898. [PMID: 38081928 PMCID: PMC10713675 DOI: 10.1038/s41598-023-49003-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2023] [Accepted: 12/02/2023] [Indexed: 12/18/2023] Open
Abstract
Cancer drug response prediction is a crucial task in precision medicine, but existing models have limitations in effectively representing the molecular profiles of cancer cells. Specifically, when these models represent molecular omics data such as gene expression, they employ a one-hot encoding-based approach, where a fixed gene set is selected for all samples and omics data values are assigned to specific positions in a vector. However, this approach restricts the use of embedding-vector-based methods, such as attention-based models, and limits the flexibility of gene selection. To address these issues, our study proposes gene embedding-based fully connected neural networks (GEN) that use gene embedding vectors as input data for cancer drug response prediction. GEN allows for the use of embedding-vector-based architectures and different gene sets for each sample, providing enhanced flexibility. To validate the efficacy of GEN, we conducted experiments on three cancer drug response datasets. Our results demonstrate that GEN outperforms other recently developed methods in cancer drug prediction tasks and offers improved gene representation capabilities. All source code is available at https://github.com/DMCB-GIST/GEN/.
Affiliation(s)
- Sejin Park
  - School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju, 61005, Republic of Korea
- Hyunju Lee
  - School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju, 61005, Republic of Korea
  - Artificial Intelligence Graduate School, Gwangju Institute of Science and Technology, Gwangju, 61005, Republic of Korea
26
Li Y, Guo Z, Gao X, Wang G. MMCL-CDR: enhancing cancer drug response prediction with multi-omics and morphology images contrastive representation learning. Bioinformatics 2023; 39:btad734. [PMID: 38070154 PMCID: PMC10756335 DOI: 10.1093/bioinformatics/btad734] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2023] [Revised: 11/09/2023] [Indexed: 12/30/2023] Open
Abstract
MOTIVATION: Cancer is a complex disease that causes a significant number of deaths worldwide. Treatment strategies can vary among patients, even if they have the same type of cancer. The application of precision medicine in cancer shows promise for treating different types of cancer, reducing healthcare expenses, and improving recovery rates. To achieve personalized cancer treatment, machine learning models have been developed to predict drug responses based on tumor and drug characteristics. However, current studies focus on constructing either homogeneous networks from a single data source or heterogeneous networks from multiomics data. While multiomics data have shown potential in predicting drug responses in cancer cell lines, there is still a lack of research that effectively utilizes insights from different modalities. Furthermore, effectively utilizing the multimodal knowledge of cancer cell lines is challenging because of the heterogeneity inherent in these modalities.
RESULTS: To address these challenges, we introduce MMCL-CDR (Multimodal Contrastive Learning for Cancer Drug Responses), a multimodal approach for cancer drug response prediction that integrates copy number variation, gene expression, morphology images of cell lines, and the chemical structure of drugs. The objective of MMCL-CDR is to align cancer cell lines across different data modalities by learning cell line representations from omic and image data, and to combine them with structural drug representations to enhance the prediction of cancer drug responses (CDR). We carried out comprehensive experiments and show that our model significantly outperforms other state-of-the-art methods in CDR prediction. The experimental results also show that the model can learn more accurate cell line representations by integrating multiomics and morphological data from cell lines, thereby improving the accuracy of CDR prediction. In addition, the ablation study and qualitative analysis confirm the effectiveness of each part of our proposed model. Finally, MMCL-CDR opens up a new dimension for cancer drug response prediction through multimodal contrastive learning, pioneering a novel approach that integrates multiomics and multimodal drug and cell line modeling.
AVAILABILITY AND IMPLEMENTATION: MMCL-CDR is available at https://github.com/catly/MMCL-CDR.
Affiliation(s)
- Yang Li
  - College of Computer and Control Engineering, Northeast Forestry University, Harbin 150006, China
- Zihou Guo
  - College of Computer and Control Engineering, Northeast Forestry University, Harbin 150006, China
- Xin Gao
  - Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
  - Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Guohua Wang
  - College of Computer and Control Engineering, Northeast Forestry University, Harbin 150006, China
27
Zhao BW, Su XR, Yang Y, Li DX, Li GD, Hu PW, Zhao YG, Hu L. Drug-disease association prediction using semantic graph and function similarity representation learning over heterogeneous information networks. Methods 2023; 220:106-114. [PMID: 37972913 DOI: 10.1016/j.ymeth.2023.10.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Revised: 10/13/2023] [Accepted: 10/28/2023] [Indexed: 11/19/2023] Open
Abstract
Discovering new indications for existing drugs is a promising development strategy at various stages of drug research and development. However, most existing methods complete this task by constructing a variety of heterogeneous networks without considering the higher-order connectivity patterns available in heterogeneous biological information networks, which are believed to be useful for improving the accuracy of drug discovery. To this end, we propose a computational model, called SFRLDDA, for drug-disease association prediction using semantic graph and function similarity representation learning. Specifically, SFRLDDA first builds a heterogeneous information network (HIN) from drug-disease, drug-protein, and protein-disease associations and their biological knowledge. Second, different representation learning strategies are applied to obtain feature representations of drugs and diseases from different perspectives over the constructed semantic graph and function similarity graphs, respectively. Finally, SFRLDDA uses a Random Forest classifier to discover potential drug-disease associations (DDAs). Experimental results demonstrate that SFRLDDA yields the best performance when compared with other state-of-the-art models on three benchmark datasets. Moreover, case studies indicate that considering the semantic graph and the function similarity of drugs and diseases in the HIN together allows SFRLDDA to predict DDAs precisely and more comprehensively.
Affiliation(s)
- Bo-Wei Zhao
  - The Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011, China; University of Chinese Academy of Sciences, Beijing 100049, China; Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China
- Xiao-Rui Su
  - The Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011, China; University of Chinese Academy of Sciences, Beijing 100049, China; Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China
- Yue Yang
  - The Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011, China; University of Chinese Academy of Sciences, Beijing 100049, China; Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China
- Dong-Xu Li
  - The Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011, China; University of Chinese Academy of Sciences, Beijing 100049, China; Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China
- Guo-Dong Li
  - The Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011, China; University of Chinese Academy of Sciences, Beijing 100049, China; Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China
- Peng-Wei Hu
  - The Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011, China; University of Chinese Academy of Sciences, Beijing 100049, China; Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China
- Yong-Gang Zhao
  - Department of Orthopaedic Surgery (hand and foot trauma), People's Hospital of Dongxihu, Wuhan 420100, China
- Lun Hu
  - The Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011, China; University of Chinese Academy of Sciences, Beijing 100049, China; Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China
28
Peng W, Yu P, Dai W, Fu X, Liu L, Pan Y. A Graph Convolution Network-Based Model for Prioritizing Personalized Cancer Driver Genes of Individual Patients. IEEE Trans Nanobioscience 2023; 22:744-754. [PMID: 37195839 DOI: 10.1109/tnb.2023.3277316] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/19/2023]
Abstract
Cancer driver genes are mutated genes that play a key role in the growth of cancer cells. Accurately identifying cancer driver genes helps us understand cancer's pathogenesis and develop effective treatment strategies. However, cancers are highly heterogeneous diseases; patients with the same cancer type may have different genomic characteristics and clinical symptoms. Hence, effective methods are urgently needed to identify the personalized cancer driver genes of individual patients and help determine whether a patient can be treated with a certain targeted drug. This work presents NIGCNDriver, a method for predicting personalized cancer driver genes of individual patients based on Graph Convolution Networks and Neighbor Interactions. NIGCNDriver first constructs a gene-sample association matrix using the associations between a sample and its known driver genes. Then, it applies graph convolution models to the gene-sample network to aggregate the features of neighbor nodes together with each node's own features, and combines them with element-wise interactions between neighbors to learn new feature representations for the sample and gene nodes. Finally, a linear correlation coefficient decoder is used to reconstruct the association between the sample and the mutant gene, enabling the prediction of personalized driver genes for the individual sample. We applied NIGCNDriver to predict cancer driver genes for individual samples in the TCGA and cancer cell line datasets. The results show that our method outperforms the baseline methods in cancer driver gene prediction for individual samples.
29
Wang Q, He M, Guo L, Chai H. AFEI: adaptive optimized vertical federated learning for heterogeneous multi-omics data integration. Brief Bioinform 2023; 24:bbad269. [PMID: 37497720 DOI: 10.1093/bib/bbad269] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2023] [Revised: 06/26/2023] [Accepted: 07/04/2023] [Indexed: 07/28/2023] Open
Abstract
Vertical federated learning has gained popularity as a means of enabling collaboration and information sharing between different entities while maintaining data privacy and security. This approach has potential applications in healthcare, cancer prognosis prediction, and other fields where data privacy is a major concern. Although using multi-omics data for cancer prognosis prediction provides more information for treatment selection, collecting different types of omics data can be challenging because they are produced in various medical institutions. Data owners must comply with strict data protection regulations such as the European Union (EU) General Data Protection Regulation. To share patient data across multiple institutions, privacy and security issues must be addressed. Therefore, we propose AFEI, an adaptive optimized vertical federated learning framework for heterogeneous multi-omics data integration, to integrate multi-omics data collected from multiple institutions for cancer prognosis prediction. AFEI enables participating parties to build an accurate joint evaluation model that learns more information about cancer patients from different perspectives, based on the distributed and encrypted multi-omics features shared by multiple institutions. The experimental results demonstrate that, by using the encrypted multi-omics data from different institutions, AFEI achieves higher prediction accuracy (by 6.5% on average) than using single omics data, and it performs almost as well as prognosis prediction that directly integrates multi-omics data. Overall, AFEI can be seen as an efficient solution for breaking down barriers to multi-institutional collaboration and promoting the development of cancer prognosis prediction.
Affiliation(s)
- Qingyong Wang
  - School of Information and Computer, Anhui Agricultural University, Hefei 230000, China
- Minfan He
  - School of Mathematics and Big Data, Foshan University, Foshan 528000, China
- Longyi Guo
  - Guangdong Provincial Hospital of Traditional Chinese Medical, Guangzhou 510000, China
- Hua Chai
  - School of Mathematics and Big Data, Foshan University, Foshan 528000, China
30
Xiao S, Lin H, Wang C, Wang S, Rajapakse JC. Graph Neural Networks With Multiple Prior Knowledge for Multi-Omics Data Analysis. IEEE J Biomed Health Inform 2023; 27:4591-4600. [PMID: 37307177 DOI: 10.1109/jbhi.2023.3284794] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/14/2023]
Abstract
With the development of biotechnology, a large amount of multi-omics data has been collected for precision medicine. Multiple sources of graph-based prior biological knowledge about omics data exist, such as gene-gene interaction networks. Recently, there has been increasing interest in introducing graph neural networks (GNNs) into multi-omics learning. However, existing methods have not fully exploited these graphical priors, since none can integrate knowledge from multiple sources simultaneously. To solve this problem, we propose a multi-omics data analysis framework that incorporates multiple sources of prior knowledge into a graph neural network (MPK-GNN). To the best of our knowledge, this is the first attempt to introduce multiple prior graphs into multi-omics data analysis. Specifically, the proposed method contains four parts: (1) a feature-level learning module to aggregate information from prior graphs; (2) a projection module to maximize the agreement among prior networks by optimizing a contrastive loss; (3) a sample-level module to learn a global representation from input multi-omics features; (4) a task-specific module to flexibly extend MPK-GNN to various downstream multi-omics analysis tasks. Finally, we verify the effectiveness of the proposed multi-omics learning algorithm on the cancer molecular subtype classification task. Experimental results show that MPK-GNN outperforms other state-of-the-art algorithms, including multi-view learning methods and multi-omics integrative approaches.
31
Wu P, Sun R, Fahira A, Chen Y, Jiangzhou H, Wang K, Yang Q, Dai Y, Pan D, Shi Y, Wang Z. DROEG: a method for cancer drug response prediction based on omics and essential genes integration. Brief Bioinform 2023; 24:7008798. [PMID: 36715269 DOI: 10.1093/bib/bbad003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2022] [Revised: 12/06/2022] [Accepted: 12/30/2022] [Indexed: 01/31/2023] Open
Abstract
Predicting therapeutic responses in cancer patients is a major challenge in precision medicine because of high inter- and intra-tumor heterogeneity. Most drug response models need to be improved in terms of accuracy, and there is limited research assessing the therapeutic responses of particular tumor types. Here, we developed a novel method, DROEG (Drug Response based on Omics and Essential Genes), for predicting drug response in tumor cell lines by integrating genomic, transcriptomic and methylomic data along with CRISPR essential genes, and showed that incorporating tumor proliferation essential genes can improve drug sensitivity prediction. Briefly, DROEG integrates literature-based and statistics-based methods to select features and uses Support Vector Regression for model construction. We demonstrate that DROEG outperforms most state-of-the-art algorithms in both qualitative evaluation (prediction accuracy for drug-sensitive/resistant) and quantitative evaluation (Pearson correlation coefficient between the predicted and actual IC50) on the Genomics of Drug Sensitivity in Cancer and Cancer Cell Line Encyclopedia datasets. In addition, DROEG is applied as a case study to pan-gastrointestinal tumors, which have high prevalence and mortality, at both the cell line and clinical levels to evaluate the model's efficacy and discover potential prognostic biomarkers in Cisplatin and Epirubicin treatment. Interestingly, the CRISPR essential gene information is found to be the most important contributor to the accuracy of the DROEG model. To our knowledge, this is the first study to integrate essential genes with multi-omics data to improve cancer drug response prediction and provide insights into personalized precision treatment.
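The quantitative evaluation mentioned in the abstract above, the Pearson correlation coefficient between predicted and actual IC50, can be computed as in the following Python sketch. The IC50 values are made-up placeholders and `pearson_r` is a hypothetical helper name; neither comes from DROEG.

```python
import math


def pearson_r(predicted, actual):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(predicted)
    if n != len(actual) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    if var_p == 0 or var_a == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_a)


# Placeholder log-IC50 values for one drug across five cell lines.
predicted_ic50 = [1.2, 0.4, 2.9, 3.1, 0.8]
actual_ic50 = [1.0, 0.6, 2.5, 3.4, 1.1]
print(pearson_r(predicted_ic50, actual_ic50))
```

A value near 1 means the model ranks and scales cell-line sensitivity much as the measurements do. A value near 0 means the predictions carry little linear information about the measured IC50.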
Affiliation(s)
- Peike Wu
  - Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education), Shanghai Jiao Tong University, Shanghai, China
  - Collaborative Innovation Centre for Brain Science, Shanghai Jiao Tong University, Shanghai, China
- Renliang Sun
  - Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education), Shanghai Jiao Tong University, Shanghai, China
  - CAS Key Laboratory of Computational Biology, Bio-Med Big Data Center, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Aamir Fahira
  - Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education), Shanghai Jiao Tong University, Shanghai, China
  - Collaborative Innovation Centre for Brain Science, Shanghai Jiao Tong University, Shanghai, China
- Yongzhou Chen
  - School of Mathematical Sciences, Shanghai Jiao Tong University, Shanghai, China
- Huiting Jiangzhou
  - Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education), Shanghai Jiao Tong University, Shanghai, China
  - Collaborative Innovation Centre for Brain Science, Shanghai Jiao Tong University, Shanghai, China
- Ke Wang
  - Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education), Shanghai Jiao Tong University, Shanghai, China
  - Collaborative Innovation Centre for Brain Science, Shanghai Jiao Tong University, Shanghai, China
- Qiangzhen Yang
  - Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education), Shanghai Jiao Tong University, Shanghai, China
  - Collaborative Innovation Centre for Brain Science, Shanghai Jiao Tong University, Shanghai, China
- Yang Dai
  - Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education), Shanghai Jiao Tong University, Shanghai, China
  - Collaborative Innovation Centre for Brain Science, Shanghai Jiao Tong University, Shanghai, China
- Dun Pan
  - Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education), Shanghai Jiao Tong University, Shanghai, China
  - Collaborative Innovation Centre for Brain Science, Shanghai Jiao Tong University, Shanghai, China
- Yongyong Shi
  - Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education), Shanghai Jiao Tong University, Shanghai, China
  - Collaborative Innovation Centre for Brain Science, Shanghai Jiao Tong University, Shanghai, China
- Zhuo Wang
  - Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education), Shanghai Jiao Tong University, Shanghai, China
  - Collaborative Innovation Centre for Brain Science, Shanghai Jiao Tong University, Shanghai, China
|
32
|
Partin A, Brettin TS, Zhu Y, Narykov O, Clyde A, Overbeek J, Stevens RL. Deep learning methods for drug response prediction in cancer: Predominant and emerging trends. Front Med (Lausanne) 2023; 10:1086097. [PMID: 36873878 PMCID: PMC9975164 DOI: 10.3389/fmed.2023.1086097] [Citation(s) in RCA: 38] [Impact Index Per Article: 19.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/01/2022] [Accepted: 01/23/2023] [Indexed: 02/17/2023] Open
Abstract
Cancer claims millions of lives yearly worldwide. While many therapies have been made available in recent years, by and large cancer remains unsolved. Exploiting computational predictive models to study and treat cancer holds great promise in improving drug development and personalized design of treatment plans, ultimately suppressing tumors, alleviating suffering, and prolonging lives of patients. A wave of recent papers demonstrates promising results in predicting cancer response to drug treatments while utilizing deep learning methods. These papers investigate diverse data representations, neural network architectures, learning methodologies, and evaluation schemes. However, deciphering promising predominant and emerging trends is difficult due to the variety of explored methods and the lack of a standardized framework for comparing drug response prediction models. To obtain a comprehensive landscape of deep learning methods, we conducted an extensive search and analysis of deep learning models that predict the response to single drug treatments. A total of 61 deep learning-based models have been curated, and summary plots were generated. Based on the analysis, observable patterns and prevalence of methods have been revealed. This review helps readers better understand the current state of the field and identify major challenges and promising solution paths.
Affiliation(s)
- Alexander Partin
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
| | - Thomas S. Brettin
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
| | - Yitan Zhu
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
| | - Oleksandr Narykov
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
| | - Austin Clyde
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
| | - Jamie Overbeek
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
| | - Rick L. Stevens
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Department of Computer Science, The University of Chicago, Chicago, IL, United States
|
33
|
Feldner-Busztin D, Firbas Nisantzis P, Edmunds SJ, Boza G, Racimo F, Gopalakrishnan S, Limborg MT, Lahti L, de Polavieja GG. Dealing with dimensionality: the application of machine learning to multi-omics data. Bioinformatics 2023; 39:6986971. [PMID: 36637211 PMCID: PMC9907220 DOI: 10.1093/bioinformatics/btad021] [Citation(s) in RCA: 32] [Impact Index Per Article: 16.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/20/2022] [Revised: 12/02/2022] [Accepted: 01/11/2023] [Indexed: 01/14/2023] Open
Abstract
MOTIVATION Machine learning (ML) methods are motivated by the need to automate information extraction from large datasets in order to support human users in data-driven tasks. This is an attractive approach for integrative joint analysis of vast amounts of omics data produced in next generation sequencing and other -omics assays. A systematic assessment of the current literature can help to identify key trends and potential gaps in methodology and applications. We surveyed the literature on ML multi-omic data integration and quantitatively explored the goals, techniques and data involved in this field. We were particularly interested in examining how researchers use ML to deal with the volume and complexity of these datasets. RESULTS Our main finding is that the methods used are those that address the challenges of datasets with few samples and many features. Dimensionality reduction methods are used to reduce the feature count alongside models that can also appropriately handle relatively few samples. Popular techniques include autoencoders, random forests and support vector machines. We also found that the field is heavily influenced by the use of The Cancer Genome Atlas dataset, which is accessible and contains many diverse experiments. AVAILABILITY AND IMPLEMENTATION All data and processing scripts are available at this GitLab repository: https://gitlab.com/polavieja_lab/ml_multi-omics_review/ or in Zenodo: https://doi.org/10.5281/zenodo.7361807. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Dylan Feldner-Busztin
- Champalimaud Centre for the Unknown, Champalimaud Foundation, 1400-038 Lisbon, Portugal
| | | | - Shelley Jane Edmunds
- Center for Evolutionary Hologenomics, GLOBE Institute, Faculty of Health and Medical Sciences, University of Copenhagen, 1353 Copenhagen, Denmark
| | - Gergely Boza
- Centre for Ecological Research, 1113 Budapest, Hungary
| | - Fernando Racimo
- Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen, Denmark
| | - Shyam Gopalakrishnan
- Center for Evolutionary Hologenomics, GLOBE Institute, Faculty of Health and Medical Sciences, University of Copenhagen, 1353 Copenhagen, Denmark
| | - Morten Tønsberg Limborg
- Center for Evolutionary Hologenomics, GLOBE Institute, Faculty of Health and Medical Sciences, University of Copenhagen, 1353 Copenhagen, Denmark
| | - Leo Lahti
- Department of Computing, University of Turku, 20014 Turku, Finland
|
34
|
Wang H, Dai C, Wen Y, Wang X, Liu W, He S, Bo X, Peng S. GADRP: graph convolutional networks and autoencoders for cancer drug response prediction. Brief Bioinform 2023; 24:6865039. [PMID: 36460622 DOI: 10.1093/bib/bbac501] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/28/2022] [Revised: 10/19/2022] [Accepted: 10/22/2022] [Indexed: 12/04/2022] Open
Abstract
Drug response prediction in cancer cell lines is of great significance in personalized medicine. In this study, we propose GADRP, a cancer drug response prediction model based on graph convolutional networks (GCNs) and autoencoders (AEs). We first use a stacked deep AE to extract low-dimensional representations from cell line features, and then construct a sparse drug cell line pair (DCP) network incorporating drug, cell line, and DCP similarity information. Next, an initial residual and layer attention-based GCN (ILGCN), which can alleviate the over-smoothing problem, is used to learn DCP features. Finally, a fully connected network is employed to make predictions. Benchmarking results demonstrate that GADRP can significantly improve prediction performance on all metrics compared with baselines on five datasets. In particular, experiments on predicting unknown DCP responses, drug-cancer tissue associations, and drug-pathway associations illustrate the predictive power of GADRP. All results highlight the effectiveness of GADRP in predicting drug responses, and its potential value in guiding anti-cancer drug selection.
Affiliation(s)
- Hong Wang
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China
| | - Chong Dai
- College of Life Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China; Department of Bioinformatics, Beijing Institute of Health Service and Transfusion Medicine, Beijing 100850, China
| | - Yuqi Wen
- Department of Bioinformatics, Beijing Institute of Health Service and Transfusion Medicine, Beijing 100850, China
| | - Xiaoqi Wang
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China
| | - Wenjuan Liu
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China
| | - Song He
- Department of Bioinformatics, Beijing Institute of Health Service and Transfusion Medicine, Beijing 100850, China
| | - Xiaochen Bo
- Department of Bioinformatics, Beijing Institute of Health Service and Transfusion Medicine, Beijing 100850, China
| | - Shaoliang Peng
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China; The State Key Laboratory of Chemo/Biosensing and Chemometrics, Hunan University, Changsha 410082, China
|
35
|
Peng W, Wu R, Dai W, Yu N. Identifying cancer driver genes based on multi-view heterogeneous graph convolutional network and self-attention mechanism. BMC Bioinformatics 2023; 24:16. [PMID: 36639646 PMCID: PMC9838012 DOI: 10.1186/s12859-023-05140-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/06/2022] [Accepted: 01/06/2023] [Indexed: 01/15/2023] Open
Abstract
BACKGROUND Correctly identifying the driver genes that promote cell growth can significantly assist drug design, cancer diagnosis and treatment. Recent large-scale cancer genomics projects have produced multi-omics data from thousands of cancer patients, which calls for effective models that unlock the hidden knowledge within these valuable data and discover cancer drivers contributing to tumorigenesis. RESULTS In this work, we propose a graph convolution network-based method called MRNGCN that integrates multiple gene relationship networks to identify cancer driver genes. First, we constructed three gene relationship networks, including the gene-gene, gene-outlying gene and gene-miRNA networks. Then, genes learned feature representations from the three networks through three parameter-sharing heterogeneous graph convolution network (HGCN) models with the self-attention mechanism. After that, these gene features pass through a convolution layer to generate fused features. Finally, we utilized the fused features and the original features to optimize the model by minimizing the node and link prediction losses. Meanwhile, we combined the fused features, the original features and the three features learned from each network through a logistic regression model to predict cancer driver genes. CONCLUSIONS We applied MRNGCN to predict pan-cancer and cancer type-specific driver genes. Experimental results show that our model performs well in terms of the area under the ROC curve (AUC) and the area under the precision-recall curve (AUPRC) compared to state-of-the-art methods. Ablation experimental results show that our model successfully improved cancer driver identification by integrating multiple gene relationship networks.
Affiliation(s)
- Wei Peng
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650050 China; Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming, 650050 China
| | - Rong Wu
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650050 China
| | - Wei Dai
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650050 China; Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming, 650050 China
| | - Ning Yu
- Department of Computing Sciences, The College at Brockport, State University of New York, 350 New Campus Drive, Brockport, NY 14422 USA
|
36
|
Rodríguez Ruiz N, Abd Own S, Ekström Smedby K, Eloranta S, Koch S, Wästerlid T, Krstic A, Boman M. Data-driven support to decision-making in molecular tumour boards for lymphoma: A design science approach. Front Oncol 2022; 12:984021. [PMID: 36457495 PMCID: PMC9705761 DOI: 10.3389/fonc.2022.984021] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/01/2022] [Accepted: 10/03/2022] [Indexed: 09/10/2024] Open
Abstract
Background The increasing amount of molecular data and knowledge about genomic alterations from next-generation sequencing processes together allow for a greater understanding of individual patients, thereby advancing precision medicine. Molecular tumour boards feature multidisciplinary teams of clinical experts who meet to discuss complex individual cancer cases. Preparing the meetings is a manual and time-consuming process. Purpose To design a clinical decision support system to improve the multimodal data interpretation in molecular tumour board meetings for lymphoma patients at Karolinska University Hospital, Stockholm, Sweden. We investigated user needs and system requirements, explored the employment of artificial intelligence, and evaluated the proposed design with primary stakeholders. Methods Design science methodology was used to form and evaluate the proposed artefact. Requirements elicitation was done through a scoping review followed by five semi-structured interviews. We used UML Use Case diagrams to model user interaction and UML Activity diagrams to inform the proposed flow of control in the system. Additionally, we modelled the current and future workflow for MTB meetings and its proposed machine learning pipeline. Interactive sessions with end-users validated the initial requirements based on a fictive patient scenario which helped further refine the system. Results The analysis showed that an interactive secure Web-based information system supporting the preparation of the meeting, multidisciplinary discussions, and clinical decision-making could address the identified requirements. Integrating artificial intelligence via continual learning and multimodal data fusion were identified as crucial elements that could provide accurate diagnosis and treatment recommendations. Impact Our work is of methodological importance in that using artificial intelligence for molecular tumour boards is novel. 
We provide a consolidated proof-of-concept system that could support the end-to-end clinical decision-making process and positively and immediately impact patients. Conclusion Augmenting a digital decision support system for molecular tumour boards with retrospective patient material is promising. This generates realistic and constructive material for human learning, and also digital data for continual learning by data-driven artificial intelligence approaches. The latter makes the future system adaptable to human bias, improving adequacy and decision quality over time and over tasks, while building and maintaining a digital log.
Affiliation(s)
- Núria Rodríguez Ruiz
- Department of Learning, Informatics, Management and Ethics (LIME), Health Informatics Centre, Karolinska Institutet, Stockholm, Sweden
| | - Sulaf Abd Own
- Department of Medicine Solna, Clinical Epidemiology Division, Karolinska Institutet, Stockholm, Sweden
- Department of Laboratory Medicine, Division of Pathology, Karolinska University Hospital Huddinge, Stockholm, Sweden
| | - Karin Ekström Smedby
- Department of Medicine Solna, Clinical Epidemiology Division, Karolinska Institutet, Stockholm, Sweden
- Department of Hematology, Karolinska University Hospital, Stockholm, Sweden
| | - Sandra Eloranta
- Department of Medicine Solna, Clinical Epidemiology Division, Karolinska Institutet, Stockholm, Sweden
| | - Sabine Koch
- Department of Learning, Informatics, Management and Ethics (LIME), Health Informatics Centre, Karolinska Institutet, Stockholm, Sweden
| | - Tove Wästerlid
- Department of Medicine Solna, Clinical Epidemiology Division, Karolinska Institutet, Stockholm, Sweden
- Department of Hematology, Karolinska University Hospital, Stockholm, Sweden
| | - Aleksandra Krstic
- Center for Hematology and Regenerative Medicine, Karolinska Institutet, Stockholm, Sweden
- Department of Laboratory Medicine, Karolinska Institutet, Stockholm, Sweden
| | - Magnus Boman
- Department of Learning, Informatics, Management and Ethics (LIME), Health Informatics Centre, Karolinska Institutet, Stockholm, Sweden
- School of Electrical Engineering and Computer Science (EECS)/Software and Computer Systems, KTH Royal Institute of Technology, Stockholm, Sweden
|
37
|
Peng W, Liu H, Dai W, Yu N, Wang J. Predicting cancer drug response using parallel heterogeneous graph convolutional networks with neighborhood interactions. Bioinformatics 2022; 38:4546-4553. [PMID: 35997568 DOI: 10.1093/bioinformatics/btac574] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/03/2022] [Revised: 07/26/2022] [Accepted: 08/22/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION Due to cancer heterogeneity, the therapeutic effect may not be the same when a cohort of patients of the same cancer type receive the same treatment. Anticancer drug response prediction may help develop personalized therapy regimens to increase survival and reduce patients' expenses. Recently, graph neural network-based methods have aroused widespread interest and achieved impressive results on the drug response prediction task. However, most of them apply graph convolution to process cell line-drug bipartite graphs while ignoring the intrinsic differences between cell lines and drug nodes. Moreover, most of these methods aggregate node-wise neighbor features but fail to consider the element-wise interaction between cell lines and drugs. RESULTS This work proposes a neighborhood interaction (NI)-based heterogeneous graph convolution network method, namely NIHGCN, for anticancer drug response prediction in an end-to-end way. Firstly, it constructs a heterogeneous network consisting of drugs, cell lines and the known drug response information. Cell line gene expression and drug molecular fingerprints are linearly transformed and input as node attributes into an interaction module. The interaction module consists of a parallel graph convolution network layer and an NI layer; it aggregates node-level features from their neighbors through the graph convolution operation and considers element-level interactions with their neighbors in the NI layer. Finally, the drug response predictions are made by calculating the linear correlation coefficients of feature representations of cell lines and drugs. We have conducted extensive experiments to assess the effectiveness of our model on the Genomics of Drug Sensitivity in Cancer (GDSC) and Cancer Cell Line Encyclopedia (CCLE) datasets. It has achieved the best performance compared with the state-of-the-art algorithms, especially in predicting drug responses for new cell lines, new drugs and targeted drugs. 
Furthermore, our model that was well trained on the GDSC dataset can be successfully applied to predict samples of PDX and TCGA, which verified the transferability of our model from cell line in vitro to the datasets in vivo. AVAILABILITY AND IMPLEMENTATION The source code can be obtained from https://github.com/weiba/NIHGCN. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
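The final prediction step described in this abstract (scoring a cell line-drug pair by the linear correlation coefficient of their feature representations) can be illustrated with a minimal sketch. This is not the NIHGCN implementation: the function names `pearson` and `response_scores` and the toy embeddings are hypothetical, and the embeddings are assumed to have already been produced by some upstream model.

```python
import math

def pearson(u, v):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(u)
    mu_u = sum(u) / n
    mu_v = sum(v) / n
    du = [x - mu_u for x in u]
    dv = [y - mu_v for y in v]
    num = sum(a * b for a, b in zip(du, dv))
    den = math.sqrt(sum(a * a for a in du)) * math.sqrt(sum(b * b for b in dv))
    # A constant vector has no defined correlation; return 0.0 by convention (assumption).
    return num / den if den > 0 else 0.0

def response_scores(cell_embeddings, drug_embeddings):
    """Score every (cell line, drug) pair by the correlation of their embeddings."""
    return [[pearson(c, d) for d in drug_embeddings] for c in cell_embeddings]

# Toy embeddings (hypothetical values, not taken from any dataset).
cells = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
drugs = [[2.0, 4.0, 6.0], [1.0, 0.0, 1.0]]
scores = response_scores(cells, drugs)
```

Scores lie in [-1, 1]; how they are mapped to a sensitive/resistant label (for example through a threshold or a sigmoid) is a modeling choice that the abstract does not specify.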
Affiliation(s)
- Wei Peng
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, P.R. China
| | - Hancheng Liu
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, P.R. China
| | - Wei Dai
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, P.R. China
| | - Ning Yu
- Department of Computing Sciences, The College at Brockport, State University of New York, Brockport, NY 14422, USA
| | - Jianxin Wang
- School of Computer Science and Engineering, Central South University, Changsha 410083, P.R. China; Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha 410083, P.R. China
| |
Collapse
|
38
|
Zhang S, Ren Y, Wang J, Song B, Li R, Xu Y. GSTCNet: Gated spatio-temporal correlation network for stroke mortality prediction. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2022; 19:9966-9982. [PMID: 36031978 DOI: 10.3934/mbe.2022465] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/15/2023]
Abstract
Stroke continues to be the most common cause of death in China. Mortality prediction for stroke patients is therefore of great significance, especially in terms of analyzing the complex interactions between non-negligible factors. In this paper, we present a gated spatio-temporal correlation network (GSTCNet) to predict the one-year post-stroke mortality. Based on four categories of risk factors (vascular event, chronic disease, medical usage and surgery), we designed a gated correlation graph convolution kernel to capture spatial features and enhance the spatial correlation between feature categories. A Bi-LSTM represents the temporal features of five timestamps. The novel gated correlation attention mechanism is then connected to the Bi-LSTM to realize the comprehensive mining of spatio-temporal correlations. Using the data on 2275 patients obtained from the neurology department of a local hospital, we constructed a series of sequential experiments. The experimental results show that the proposed model achieves competitive results on each evaluation metric, reaching an AUC of 89.17%, a precision of 97.75%, a recall of 95.33% and an F1-score of 95.19%. The interpretability analysis of the feature categories and timestamps also verified the potential application value of the model for stroke.
Affiliation(s)
- Shuo Zhang
- School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450000, China
- Cooperative Innovation Center of Internet Healthcare, Zhengzhou University, Zhengzhou 450000, China
| | - Yonghao Ren
- School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450000, China
- Cooperative Innovation Center of Internet Healthcare, Zhengzhou University, Zhengzhou 450000, China
| | - Jing Wang
- School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450000, China
- Cooperative Innovation Center of Internet Healthcare, Zhengzhou University, Zhengzhou 450000, China
| | - Bo Song
- Department of Neurology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou 450000, China
- NHC Key Laboratory of Prevention and Treatment of Cerebrovascular Diseases, Zhengzhou 450000, China
| | - Runzhi Li
- Cooperative Innovation Center of Internet Healthcare, Zhengzhou University, Zhengzhou 450000, China
| | - Yuming Xu
- Department of Neurology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou 450000, China
- NHC Key Laboratory of Prevention and Treatment of Cerebrovascular Diseases, Zhengzhou 450000, China
|
39
|
Gu Y, Zheng S, Xu Z, Yin Q, Li L, Li J. An efficient curriculum learning-based strategy for molecular graph learning. Brief Bioinform 2022; 23:6562682. [DOI: 10.1093/bib/bbac099] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/06/2021] [Revised: 01/18/2022] [Accepted: 02/27/2022] [Indexed: 12/14/2022] Open
Abstract
Computational methods have been widely applied to resolve various core issues in drug discovery, such as molecular property prediction. In recent years, deep learning, a data-driven computational method, has achieved a number of impressive successes in various domains. In drug discovery, graph neural networks (GNNs) take molecular graph data as input and learn graph-level representations in non-Euclidean space. A large number of well-performing GNNs have been proposed for molecular graph learning. However, the efficient use of molecular data during the training process has not received enough attention. Curriculum learning (CL) has been proposed as a training strategy that rearranges the training queue based on the calculated difficulties of samples, yet the effectiveness of CL has not been determined in molecular graph learning. In this study, inspired by chemical domain knowledge and task prior information, we proposed a novel CL-based training strategy to improve the training efficiency of molecular graph learning, called CurrMG. Consisting of a difficulty measurer and a training scheduler, CurrMG is designed as a plug-and-play module, which is model-independent and easy to use on molecular data. Extensive experiments demonstrated that molecular graph learning models could benefit from CurrMG and gain noticeable improvement on five GNN models and eight molecular property prediction tasks (overall improvement is 4.08%). We further observed CurrMG’s encouraging potential in resource-constrained molecular property prediction. These results indicate that CurrMG can be used as a reliable and efficient training strategy for molecular graph learning.
Availability: The source code is available at https://github.com/gu-yaowen/CurrMG.
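The two components named in this abstract, a difficulty measurer and a training scheduler, can be sketched generically as follows. This is an illustration of curriculum learning in general, not the CurrMG code: the SMILES-length difficulty score, the linear pacing function and all names (`difficulty`, `pacing`, `curriculum_subsets`) are assumptions made for the example, whereas the paper derives difficulty from chemical domain knowledge and task prior information.

```python
import math

def difficulty(smiles):
    """Hypothetical difficulty measurer: longer SMILES strings count as harder."""
    return len(smiles)

def pacing(step, total_steps, start_frac=0.25):
    """Linear pacing: fraction of the easiest samples available at a given step."""
    frac = start_frac + (1.0 - start_frac) * step / max(1, total_steps - 1)
    return min(1.0, frac)

def curriculum_subsets(samples, total_steps, start_frac=0.25):
    """Training scheduler: yield a growing, easy-to-hard training pool per step."""
    ordered = sorted(samples, key=difficulty)
    for step in range(total_steps):
        k = max(1, math.ceil(pacing(step, total_steps, start_frac) * len(ordered)))
        yield ordered[:k]

# Toy molecules written as SMILES strings (illustrative only).
molecules = ["CCO", "C", "c1ccccc1O", "CC(=O)OC1=CC=CC=C1C(=O)O"]
pools = list(curriculum_subsets(molecules, total_steps=4))
```

Each pool would be handed to an ordinary training loop for that step, which is what makes such a scheduler independent of the underlying model.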
Affiliation(s)
- Yaowen Gu
- Institute of Medical Information (IMI), Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing 100020, China
| | - Si Zheng
- Institute of Medical Information (IMI), Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing 100020, China
- Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
| | - Zidu Xu
- Institute of Medical Information (IMI), Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing 100020, China
| | - Qijin Yin
- Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
| | - Liang Li
- Key Laboratory of Antibiotic Bioengineering of National Health and Family Planning Commission (NHFPC), Institute of Medicinal Biotechnology (IMB), Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing 100020, China
| | - Jiao Li
- Institute of Medical Information (IMI), Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing 100020, China
| |
Collapse
|
40
|
Dai W, Yue W, Peng W, Fu X, Liu L, Liu L. Identifying Cancer Subtypes Using a Residual Graph Convolution Model on a Sample Similarity Network. Genes (Basel) 2021; 13:genes13010065. [PMID: 35052405 PMCID: PMC8774659 DOI: 10.3390/genes13010065] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/15/2021] [Revised: 12/23/2021] [Accepted: 12/24/2021] [Indexed: 11/16/2022] Open
Abstract
Cancer subtype classification helps us to understand the pathogenesis of cancer and to develop new cancer drugs and treatments from which patients would benefit most. Most previous studies detect cancer subtypes by extracting features from individual samples, ignoring their associations with others. We believe that the interactions of cancer samples can help identify cancer subtypes. This work proposes a cancer subtype classification method based on a residual graph convolutional network and a sample similarity network. First, we constructed a sample similarity network based on cancer gene co-expression patterns. Then, the gene expression profiles of cancer samples, used as initial features, and the sample similarity network were passed into a two-layer graph convolutional network (GCN) model. We introduced the initial features to the GCN model to avoid over-smoothing during the training process. Finally, the classification of cancer subtypes was obtained through a softmax activation function. Our model was applied to breast invasive carcinoma (BRCA), glioblastoma multiforme (GBM) and lung cancer (LUNG) datasets. The accuracy values of our model reached 82.58%, 85.13% and 79.18% for BRCA, GBM and LUNG, respectively, which outperformed the existing methods. The survival analysis of our results confirms that the cancer subtypes identified by our model have significant clinical features. Moreover, we can leverage our model to detect the essential genes enriched in gene ontology (GO) terms and the biological pathways related to a cancer subtype.
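The pipeline in this abstract (a two-layer GCN over a sample similarity network, with the initial features re-injected to limit over-smoothing, followed by softmax) can be sketched as a forward pass. The exact residual formulation is not given in the abstract, so the version below, which adds a linear projection of the input features to the second layer's output, is an assumption, as are all function names, weights and toy inputs.

```python
import math

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def normalize_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 with self-loops added."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def relu(m):
    return [[max(0.0, v) for v in row] for row in m]

def softmax_rows(m):
    out = []
    for row in m:
        mx = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(v - mx) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def residual_gcn_forward(adj, x, w0, w1, w_res):
    """Two GCN layers plus an initial-feature skip connection (assumed form)."""
    a = normalize_adjacency(adj)
    h1 = relu(matmul(matmul(a, x), w0))   # layer 1: aggregate neighbors, transform
    h2 = matmul(matmul(a, h1), w1)        # layer 2: aggregate again, map to classes
    skip = matmul(x, w_res)               # initial features projected to class space
    logits = [[p + q for p, q in zip(r1, r2)] for r1, r2 in zip(h2, skip)]
    return softmax_rows(logits)

# Toy similarity network over three samples with two features and two subtypes.
adj = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
x = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
w0 = [[0.6, -0.2], [-0.3, 0.8]]
w1 = [[1.0, -1.0], [-1.0, 1.0]]
w_res = [[0.5, -0.5], [-0.5, 0.5]]
probs = residual_gcn_forward(adj, x, w0, w1, w_res)
```

Each row of `probs` is a probability distribution over subtypes for one sample; in practice the weights would be learned by minimizing cross-entropy on labeled samples.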
Affiliation(s)
- Wei Dai
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, China
| | - Wenhao Yue
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, China
| | - Wei Peng
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, China
- Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming 650050, China
- Correspondence: Tel.: +86-13700600056
| | - Xiaodong Fu
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, China
- Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming 650050, China
| | - Li Liu
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, China
- Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming 650050, China
| | - Lijun Liu
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, China
- Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming 650050, China
|