1
Inoue Y, Song T, Wang X, Luna A, Fu T. DrugAgent: Multi-Agent Large Language Model-Based Reasoning for Drug-Target Interaction Prediction. ARXIV 2025:arXiv:2408.13378v4. [PMID: 40297237 PMCID: PMC12036430] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/30/2025]
Abstract
Advancements in large language models (LLMs) allow them to address diverse questions using human-like interfaces. Still, limitations in their training prevent them from answering accurately in scenarios that could benefit from multiple perspectives. Multi-agent systems allow the resolution of questions to enhance result consistency and reliability. While drug-target interaction (DTI) prediction is important for drug discovery, existing approaches face challenges due to complex biological systems and the lack of interpretability needed for clinical applications. DrugAgent is a multi-agent LLM system for DTI prediction that combines multiple specialized perspectives with transparent reasoning. Our system adapts and extends existing multi-agent frameworks by (1) applying coordinator-based architecture to the DTI domain, (2) integrating domain-specific data sources, including ML predictions, knowledge graphs, and literature evidence, and (3) incorporating Chain-of-Thought (CoT) and ReAct (Reason+Act) frameworks for transparent DTI reasoning. We conducted comprehensive experiments using a kinase inhibitor dataset, where our multi-agent LLM method outperformed the non-reasoning multi-agent model (GPT-4o mini) by 45% in F1 score (0.514 vs 0.355). Through ablation studies, we demonstrated the contributions of each agent, with the AI agent being the most impactful, followed by the KG agent and search agent. Most importantly, our approach provides detailed, human-interpretable reasoning for each prediction by combining evidence from multiple sources - a critical feature for biomedical applications where understanding the rationale behind predictions is essential for clinical decision-making and regulatory compliance. Code is available at https://anonymous.4open.science/r/DrugAgent-B2EA.
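The reported 45% gain can be checked as a relative improvement of the two F1 scores quoted in the abstract. The short sketch below only reproduces that arithmetic; the function name `relative_improvement` is an illustrative choice and is not taken from the authors' code.

```python
def relative_improvement(new: float, baseline: float) -> float:
    """Relative gain of `new` over `baseline`, as a fraction of the baseline."""
    return (new - baseline) / baseline


# F1 scores quoted in the abstract.
f1_drugagent = 0.514
f1_baseline = 0.355  # non-reasoning multi-agent model (GPT-4o mini)

gain = relative_improvement(f1_drugagent, f1_baseline)
print(f"Relative F1 improvement: {gain:.1%}")
```

The absolute difference is 0.159 F1 points; dividing by the baseline gives roughly 0.448, which rounds to the 45% stated above.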
Affiliation(s)
- Yoshitaka Inoue
  - Dept of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, USA
  - Computational Biology Branch, National Library of Medicine, Developmental Therapeutics Branch, National Cancer Institute, Bethesda, MD, USA
- Tianci Song
  - Dept of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, USA
- Xinling Wang
  - Khoury College of Computer Sciences, Northeastern University, Arlington, VA, USA
- Augustin Luna
  - Computational Biology Branch, National Library of Medicine, Developmental Therapeutics Branch, National Cancer Institute, Bethesda, MD, USA
- Tianfan Fu
  - Department of Computer Science, Nanjing University, Nanjing, Jiangsu, China
2
Du D, Bhardwaj S, Lu Y, Wang Y, Parker SJ, Zhang Z, Van Eyk JE, Yu G, Clarke R, Herrington DM, Wang Y. Embracing the informative missingness and silent gene in analyzing biologically diverse samples. Sci Rep 2024; 14:28265. [PMID: 39550430 PMCID: PMC11569126 DOI: 10.1038/s41598-024-78076-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2024] [Accepted: 10/28/2024] [Indexed: 11/18/2024] Open
Abstract
Bioinformatics software tools are essential to identify informative molecular features that define different phenotypic sample groups. Among the most fundamental and interrelated tasks are missing value imputation, signature gene detection, and differential pattern visualization. However, many commonly used analytics tools can be problematic when handling biologically diverse samples if either informative missingness possesses high missing rates with mixed missing mechanisms, or multiple sample groups are compared and visualized in parallel. We developed the ABDS tool suite specifically for analyzing biologically diverse samples. Collectively, a mechanism-integrated group-wise pre-imputation scheme is proposed to retain informative missingness associated with signature genes, a cosine-based one-sample test is extended to detect group-silenced signature genes, and a unified heatmap is designed to display multiple sample groups. We describe the methodological principles and demonstrate the effectiveness of the three analytics tools under targeted scenarios, supported by comparative evaluations and biomedical showcases. As an open-source R package, the ABDS tool suite complements rather than replaces existing tools and will allow biologists to more accurately detect interpretable molecular signals among phenotypically diverse sample groups.
Affiliation(s)
- Dongping Du
  - Department of Electrical & Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA, 22203, USA
- Saurabh Bhardwaj
  - Department of Electrical & Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA, 22203, USA
  - Department of Electrical and Instrumentation Engineering, Thapar Institute of Engineering and Technology, Patiala, 147004, Punjab, India
- Yingzhou Lu
  - Department of Electrical & Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA, 22203, USA
- Yizhi Wang
  - Department of Electrical & Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA, 22203, USA
- Sarah J Parker
  - Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, CA, 90048, USA
- Zhen Zhang
  - Department of Pathology, Johns Hopkins University, Baltimore, MD, 21231, USA
- Jennifer E Van Eyk
  - Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, CA, 90048, USA
- Guoqiang Yu
  - Department of Automation, Tsinghua University, Beijing, 100084, P. R. China
- Robert Clarke
  - The Hormel Institute, University of Minnesota, Austin, MN, 55912, USA
- David M Herrington
  - Department of Internal Medicine, Wake Forest University, Winston-Salem, NC, 27157, USA
- Yue Wang
  - Department of Electrical & Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA, 22203, USA
  - Dept. of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, 900 N. Glebe Road, Arlington, VA, 22203, USA
3
Du D, Bhardwaj S, Lu Y, Wang Y, Parker SJ, Zhang Z, Van Eyk JE, Yu G, Clarke R, Herrington DM, Wang Y. ABDS: a bioinformatics tool suite for analyzing biologically diverse samples. RESEARCH SQUARE 2024:rs.3.rs-4419408. [PMID: 38853832 PMCID: PMC11160903 DOI: 10.21203/rs.3.rs-4419408/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2024]
Abstract
Bioinformatics software tools are essential to identify informative molecular features that define different phenotypic sample groups. Among the most fundamental and interrelated tasks are missing value imputation, signature gene detection, and differential pattern visualization. However, many commonly used analytics tools can be problematic when handling biologically diverse samples if either informative missingness possesses high missing rates with mixed missing mechanisms, or multiple sample groups are compared and visualized in parallel. We developed the ABDS tool suite specifically for analyzing biologically diverse samples. Collectively, a mechanism-integrated group-wise pre-imputation scheme is proposed to retain informative missingness associated with signature genes, a cosine-based one-sample test is extended to detect group-silenced signature genes, and a unified heatmap is designed to display multiple sample groups. We describe the methodological principles and demonstrate the effectiveness of the three analytics tools under targeted scenarios, supported by comparative evaluations and biomedical showcases. As an open-source R package, the ABDS tool suite complements rather than replaces existing tools and will allow biologists to more accurately detect interpretable molecular signals among phenotypically diverse sample groups.
Affiliation(s)
- Dongping Du
  - Department of Electrical & Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Saurabh Bhardwaj
  - Department of Electrical & Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
  - Department of Electrical and Instrumentation Engineering, Thapar Institute of Engineering and Technology, Patiala, Punjab 147004, India
- Yingzhou Lu
  - Department of Electrical & Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Yizhi Wang
  - Department of Electrical & Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Sarah J. Parker
  - Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, CA 90048, USA
- Zhen Zhang
  - Department of Pathology, Johns Hopkins University, Baltimore, MD 21231, USA
- Jennifer E. Van Eyk
  - Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, CA 90048, USA
- Guoqiang Yu
  - Department of Automation, Tsinghua University, Beijing 100084, P. R. China
- Robert Clarke
  - The Hormel Institute, University of Minnesota, Austin, MN 55912, USA
- David M. Herrington
  - Department of Internal Medicine, Wake Forest University, Winston-Salem, NC 27157, USA
- Yue Wang
  - Department of Electrical & Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
4
Inoue Y, Lee H, Fu T, Luna A. drGAT: Attention-Guided Gene Assessment of Drug Response Utilizing a Drug-Cell-Gene Heterogeneous Network. ARXIV 2024:arXiv:2405.08979v1. [PMID: 38800657 PMCID: PMC11118660] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/29/2024]
Abstract
Drug development is a lengthy process with a high failure rate. Increasingly, machine learning is utilized to facilitate the drug development process. These models aim to enhance our understanding of drug characteristics, including their activity in biological contexts. However, a major challenge in drug response (DR) prediction is model interpretability, as it aids in the validation of findings. This is important in biomedicine, where models need to be understandable in comparison with established knowledge of drug interactions with proteins. drGAT, a graph deep learning model, leverages a heterogeneous graph composed of relationships between proteins, cell lines, and drugs. drGAT is designed with two objectives: DR prediction as a binary sensitivity prediction and elucidation of drug mechanism from attention coefficients. drGAT has demonstrated superior performance over existing models, achieving 78% accuracy (and precision) and a 76% F1 score for 269 DNA-damaging compounds of the NCI60 drug response dataset. To assess the model's interpretability, we conducted a review of drug-gene co-occurrences in PubMed abstracts in comparison to the top 5 genes with the highest attention coefficients for each drug. We also examined whether known relationships were retained in the model by inspecting the neighborhoods of topoisomerase-related drugs. For example, our model retained TOP1 as a highly weighted predictive feature for irinotecan and topotecan, in addition to other genes that could potentially be regulators of the drugs. Our method can be used to accurately predict sensitivity to drugs and may be useful in the identification of biomarkers relating to the treatment of cancer patients.
Affiliation(s)
- Yoshitaka Inoue
  - Department of Computer Science and Engineering, University of Minnesota
  - Computational Biology Branch, National Library of Medicine
- Hunmin Lee
  - Department of Computer Science and Engineering, University of Minnesota
- Tianfan Fu
  - Computer Science Department, Rensselaer Polytechnic Institute
- Augustin Luna
  - Computational Biology Branch, National Library of Medicine
  - Developmental Therapeutics Branch, National Cancer Institute
5
Lu Y, Chen T, Hao N, Van Rechem C, Chen J, Fu T. Uncertainty Quantification and Interpretability for Clinical Trial Approval Prediction. HEALTH DATA SCIENCE 2024; 4:0126. [PMID: 38645573 PMCID: PMC11031120 DOI: 10.34133/hds.0126] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2024] [Accepted: 03/17/2024] [Indexed: 04/23/2024]
Abstract
Background: A clinical trial is a crucial step in the development of a new therapy (e.g., medication) and is remarkably expensive and time-consuming. Forecasting the approval of clinical trials accurately would enable us to circumvent trials destined to fail, thereby allowing us to allocate more resources to therapies with better chances. However, existing approval prediction algorithms neither quantify uncertainty nor provide interpretability, limiting their usage in real-world clinical trial management. Methods: This paper quantifies uncertainty and improves interpretability in clinical trial approval predictions. We devised a selective classification approach and integrated it with the Hierarchical Interaction Network, the state-of-the-art clinical trial prediction model. Selective classification, encompassing a spectrum of methods for uncertainty quantification, empowers the model to withhold decision-making in the face of samples marked by ambiguity or low confidence. This approach not only amplifies the accuracy of predictions for the instances it chooses to classify but also notably enhances the model's interpretability. Results: Comprehensive experiments demonstrate that incorporating uncertainty markedly enhances the model's performance. Specifically, the proposed method achieved 32.37%, 21.43%, and 13.27% relative improvement on area under the precision-recall curve over the base model (Hierarchical Interaction Network) in phase I, II, and III trial approval predictions, respectively. For phase III trials, our method reaches an area under the precision-recall curve of 0.9022. In addition, we show a case study of interpretability that helps domain experts to understand the model's outcome. The code is publicly available at https://github.com/Vincent-1125/Uncertainty-Quantification-on-Clinical-Trial-Outcome-Prediction. Conclusion: Our approach not only measures model uncertainty but also greatly improves interpretability and performance for clinical trial approval prediction.
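The idea of selective classification described in this abstract, withholding a decision on low-confidence samples, can be illustrated with a minimal sketch. This is a generic confidence-threshold abstention rule and not the authors' implementation; the function names, the 0.8 threshold, and the toy probabilities and labels are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence, Tuple


def selective_predict(probs: Sequence[float], threshold: float = 0.8) -> List[Optional[int]]:
    """Predict 1/0 when confident enough; return None (abstain) otherwise."""
    preds: List[Optional[int]] = []
    for p in probs:
        confidence = max(p, 1.0 - p)  # confidence of the more likely class
        preds.append(int(p >= 0.5) if confidence >= threshold else None)
    return preds


def coverage_and_accuracy(
    preds: Sequence[Optional[int]], labels: Sequence[int]
) -> Tuple[float, float]:
    """Fraction of samples classified, and accuracy on that classified subset."""
    kept = [(p, y) for p, y in zip(preds, labels) if p is not None]
    coverage = len(kept) / len(labels)
    accuracy = sum(p == y for p, y in kept) / len(kept) if kept else float("nan")
    return coverage, accuracy


# Toy (hypothetical) approval probabilities and true outcomes.
probs = [0.95, 0.55, 0.10, 0.45, 0.85]
labels = [1, 0, 0, 1, 1]

preds = selective_predict(probs, threshold=0.8)
coverage, accuracy = coverage_and_accuracy(preds, labels)
print(preds, coverage, accuracy)
```

In this toy data the two abstained samples are exactly the ones a plain 0.5 cutoff would misclassify, so accuracy on the retained subset rises at the cost of lower coverage. That trade-off is the general motivation for selective classification.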
Affiliation(s)
- Yingzhou Lu
  - School of Medicine, Stanford University, Stanford, CA, USA
- Tianyi Chen
  - Computer Science Department, Rensselaer Polytechnic Institute, Troy, NY, USA
- Nan Hao
  - Stony Brook University Hospital, Stony Brook, NY, USA
- Jintai Chen
  - Computer Science Department, University of Illinois Urbana-Champaign, Urbana, IL, USA
- Tianfan Fu
  - School of Medicine, Stanford University, Stanford, CA, USA
6
Du D, Bhardwaj S, Parker SJ, Cheng Z, Zhang Z, Lu Y, Van Eyk JE, Yu G, Clarke R, Herrington DM, Wang Y. ABDS: tool suite for analyzing biologically diverse samples. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.07.05.547797. [PMID: 37461566 PMCID: PMC10349978 DOI: 10.1101/2023.07.05.547797] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 07/23/2023]
Abstract
Motivation: Analytics tools are essential to identify informative molecular features about different phenotypic groups. Among the most fundamental tasks are missing value imputation, signature gene detection, and expression pattern visualization. However, most commonly used analytics tools may be problematic for characterizing biologically diverse samples when either signature genes possess uneven missing rates across different groups and involve complex missing mechanisms, or multiple biological groups are simultaneously compared and visualized. Results: We develop the ABDS tool suite, tailored specifically to analyzing biologically diverse samples. Mechanism-integrated group-wise imputation is developed to recruit signature genes involving informative missingness, a cosine-based one-sample test is extended to detect enumerated signature genes, and a unified heatmap is designed to comparably display complex expression patterns. We discuss the methodological principles and demonstrate the conceptual advantages of the three software tools. We also showcase the biomedical applications of these individual tools. Implemented in open-source R scripts, the ABDS tool suite complements rather than replaces the existing tools and will allow biologists to more accurately detect interpretable molecular signals among diverse phenotypic samples. Availability and implementation: The R scripts of the ABDS tool suite are freely available at https://github.com/niccolodpdu/ABDS.
Affiliation(s)
- Dongping Du
  - Department of Electrical & Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Saurabh Bhardwaj
  - Department of Electrical & Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
  - Department of Electrical and Instrumentation Engineering, Thapar Institute of Engineering and Technology, Patiala, Punjab 147004, India
- Sarah J. Parker
  - Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, CA 90048, USA
- Zuolin Cheng
  - Department of Electrical & Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Zhen Zhang
  - Department of Pathology, Johns Hopkins University, Baltimore, MD 21231, USA
- Yingzhou Lu
  - Department of Electrical & Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Jennifer E. Van Eyk
  - Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, CA 90048, USA
- Guoqiang Yu
  - Department of Electrical & Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Robert Clarke
  - The Hormel Institute, University of Minnesota, Austin, MN 55912, USA
- David M. Herrington
  - Department of Internal Medicine, Wake Forest University, Winston-Salem, NC 27157, USA
- Yue Wang
  - Department of Electrical & Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
7
Wu CT, Shen M, Du D, Cheng Z, Parker SJ, Lu Y, Van Eyk JE, Yu G, Clarke R, Herrington DM, Wang Y. Cosbin: cosine score-based iterative normalization of biologically diverse samples. BIOINFORMATICS ADVANCES 2022; 2:vbac076. [PMID: 36330358 PMCID: PMC9614059 DOI: 10.1093/bioadv/vbac076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2022] [Revised: 10/02/2022] [Accepted: 10/18/2022] [Indexed: 11/06/2022]
Abstract
Motivation: Data normalization is essential to ensure accurate inference and comparability of gene expression measures across samples or conditions. Ideally, gene expression data should be rescaled based on consistently expressed reference genes. However, to normalize biologically diverse samples, the most commonly used reference genes exhibit striking expression variability and size-factor or distribution-based normalization methods can be problematic when the amount of asymmetry in differential expression is significant. Results: We report an efficient and accurate data-driven method, Cosine score-based iterative normalization (Cosbin), to normalize biologically diverse samples. Based on the Cosine scores of cross-condition expression patterns, the Cosbin pipeline iteratively eliminates asymmetric differentially expressed genes, identifies consistently expressed genes, and calculates sample-wise normalization factors. We demonstrate the superior performance and enhanced utility of Cosbin compared with six representative peer methods using both simulation and real multi-omics expression datasets. Implemented in open-source R scripts and specifically designed to address normalization bias due to significant asymmetry in differential expression across multiple conditions, the Cosbin tool complements rather than replaces the existing methods and will allow biologists to more accurately detect true molecular signals among diverse phenotypic groups. Availability and implementation: The R scripts of the Cosbin pipeline are freely available at https://github.com/MinjieSh/Cosbin. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
Affiliation(s)
- Dongping Du
  - Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Zuolin Cheng
  - Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Sarah J Parker
  - Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, CA 90048, USA
- Yingzhou Lu
  - Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Jennifer E Van Eyk
  - Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, CA 90048, USA
- Guoqiang Yu
  - Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Robert Clarke
  - The Hormel Institute, University of Minnesota, Austin, MN 55912, USA
- David M Herrington
  - Department of Internal Medicine, Wake Forest University, Winston-Salem, NC 27157, USA
- Yue Wang
  - To whom correspondence should be addressed.