1
Chen L, Lu Y, Xu J, Zhou B. Prediction of drug's anatomical therapeutic chemical (ATC) code by constructing biological profiles of ATC codes. BMC Bioinformatics 2025; 26:86. [PMID: 40119265 PMCID: PMC11927162 DOI: 10.1186/s12859-025-06102-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2024] [Accepted: 03/04/2025] [Indexed: 03/24/2025] Open
Abstract
BACKGROUND The Anatomical Therapeutic Chemical (ATC) classification system, proposed and maintained by the World Health Organization, is among the most widely used drug classification schemes. Recently, it has become a key research focus in drug repositioning. Computational models often pair drugs with ATC codes to explore drug-ATC code associations. However, the limited information available for ATC codes constrains these models, leaving significant room for improvement. RESULTS This study presents an inference method to identify highly related target proteins, structural features, and side effects for each ATC code, constructing comprehensive biological profiles. Association networks for target proteins, structural features, and side effects are established, and a random walk with restart algorithm is applied to these networks to extract raw associations. A permutation test is then conducted to exclude false positives, yielding robust biological profiles for ATC codes. These profiles are used to construct new ATC code kernels, which are integrated with ATC code kernels from the existing model PDATC-NCPMKL. The recommendation matrix is subsequently generated using the procedures of PDATC-NCPMKL. Cross-validation results demonstrate that the new model achieves AUROC and AUPR values exceeding 0.96. CONCLUSION The proposed model outperforms PDATC-NCPMKL and other previous models. Analysis of the contributions of the newly added ATC code kernels confirms the value of biological profiles in enhancing the prediction of drug-ATC code associations.
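The random walk with restart and permutation test mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy network, the restart probability of 0.8, the function names, and the choice of random seed sets of equal size as the null model are all assumptions made for illustration.

```python
import random


def column_normalize(adj):
    """Scale each column of a square adjacency matrix so it sums to 1."""
    n = len(adj)
    col_sums = [sum(adj[r][c] for r in range(n)) for c in range(n)]
    return [[adj[r][c] / col_sums[c] if col_sums[c] else 0.0
             for c in range(n)] for r in range(n)]


def random_walk_with_restart(adj, seeds, restart=0.8, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until p stops changing."""
    n = len(adj)
    w = column_normalize(adj)
    p0 = [1.0 / len(seeds) if i in seeds else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(w[i][j] * p[j] for j in range(n))
               + restart * p0[i] for i in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


def permutation_p_values(adj, seeds, n_perm=200, seed=0, restart=0.8):
    """Empirical p-value per node: how often a random seed set of the same
    size gives that node a score at least as high as the observed one."""
    rng = random.Random(seed)
    n = len(adj)
    observed = random_walk_with_restart(adj, seeds, restart)
    exceed = [0] * n
    for _ in range(n_perm):
        fake_seeds = set(rng.sample(range(n), len(seeds)))
        scores = random_walk_with_restart(adj, fake_seeds, restart)
        for i in range(n):
            if scores[i] >= observed[i]:
                exceed[i] += 1
    # Add-one correction keeps every p-value strictly positive.
    return [(e + 1) / (n_perm + 1) for e in exceed]


if __name__ == "__main__":
    # Hypothetical 4-node path network 0-1-2-3, walk seeded at node 0.
    toy = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
    print(random_walk_with_restart(toy, {0}))
    print(permutation_p_values(toy, {0}, n_perm=50))
```

In a profile-building setting, nodes with a high walk score and a small empirical p-value would be kept as raw associations that survive the false-positive filter; the actual thresholds used in the study are not stated in the abstract.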
Affiliation(s)
- Lei Chen
  - College of Information Engineering, Shanghai Maritime University, Shanghai, 201306, People's Republic of China
- Yiwen Lu
  - College of Information Engineering, Shanghai Maritime University, Shanghai, 201306, People's Republic of China
- Jing Xu
  - College of Information Engineering, Shanghai Maritime University, Shanghai, 201306, People's Republic of China
- Bo Zhou
  - School of Basic Medical Sciences, Shanghai University of Medicine and Health Sciences, Shanghai, 201318, People's Republic of China
2
Zhang W, Tian Q, Cao Y, Fan W, Jiang D, Wang Y, Li Q, Wei XY. GraphATC: advancing multilevel and multi-label anatomical therapeutic chemical classification via atom-level graph learning. Brief Bioinform 2025; 26:bbaf194. [PMID: 40285359 PMCID: PMC12031726 DOI: 10.1093/bib/bbaf194] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2024] [Revised: 03/06/2025] [Accepted: 04/07/2025] [Indexed: 04/29/2025] Open
Abstract
The accurate categorization of compounds within the anatomical therapeutic chemical (ATC) system is fundamental for drug development and basic research. Although this area has garnered significant research focus for over a decade, the majority of prior studies have concentrated solely on the Level 1 labels defined by the World Health Organization (WHO), neglecting the labels of the remaining four levels. This narrow focus fails to address the true nature of the task as a multilevel, multi-label classification challenge. Moreover, existing benchmarks like Chen-2012 and ATC-SMILES have become outdated: they lack the new drugs, and the updated properties of existing drugs, that have been integrated into the WHO ATC system in recent years. To tackle these shortcomings, we present a comprehensive approach in this paper. First, we systematically cleanse and enhance the drug dataset, expanding it to encompass all five levels through a rigorous cross-resource validation process involving KEGG, PubChem, ChEMBL, ChemSpider, and ChemicalBook. This effort culminates in the creation of a novel benchmark termed ATC-GRAPH. Second, we extend the classification task to encompass Level 2 and introduce graph-based learning techniques to provide more accurate representations of drug molecular structures. This approach not only facilitates the modeling of Polymers, Macromolecules, and Multi-Component drugs more precisely but also enhances the overall fidelity of the classification process. The efficacy of our proposed framework is validated through extensive experiments, establishing a new state-of-the-art methodology. To facilitate the replication of this study, we have made the benchmark dataset, source code, and web server openly accessible.
Affiliation(s)
- Wengyu Zhang
  - Department of Computer Science, Sichuan University, Chengdu 610065, China
  - Department of Computing, The Hong Kong Polytechnic University, Kowloon, Hong Kong
- Qi Tian
  - Department of Computer Science, Sichuan University, Chengdu 610065, China
- Yi Cao
  - Department of Computer Science, Sichuan University, Chengdu 610065, China
- Wenqi Fan
  - Department of Computing, The Hong Kong Polytechnic University, Kowloon, Hong Kong
- Yaowei Wang
  - Peng Cheng Laboratory, Shenzhen 518000, China
- Qing Li
  - Department of Computing, The Hong Kong Polytechnic University, Kowloon, Hong Kong
- Xiao-Yong Wei
  - Department of Computer Science, Sichuan University, Chengdu 610065, China
  - Department of Computing, The Hong Kong Polytechnic University, Kowloon, Hong Kong
3
Li X, Peng L, Wang YP, Zhang W. Open challenges and opportunities in federated foundation models towards biomedical healthcare. BioData Min 2025; 18:2. [PMID: 39755653 DOI: 10.1186/s13040-024-00414-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2024] [Accepted: 12/09/2024] [Indexed: 01/06/2025] Open
Abstract
This survey explores the transformative impact of foundation models (FMs) in artificial intelligence, focusing on their integration with federated learning (FL) in biomedical research. Foundation models such as ChatGPT, LLaMa, and CLIP, which are trained on vast datasets through methods including unsupervised pretraining, self-supervised learning, instructed fine-tuning, and reinforcement learning from human feedback, represent significant advancements in machine learning. These models, with their ability to generate coherent text and realistic images, are crucial for biomedical applications that require processing diverse data forms such as clinical reports, diagnostic images, and multimodal patient interactions. The incorporation of FL with these sophisticated models presents a promising strategy to harness their analytical power while safeguarding the privacy of sensitive medical data. This approach not only enhances the capabilities of FMs in medical diagnostics and personalized treatment but also addresses critical concerns about data privacy and security in healthcare. This survey reviews the current applications of FMs in federated settings, underscores the challenges, and identifies future research directions including scaling FMs, managing data diversity, and enhancing communication efficiency within FL frameworks. The objective is to encourage further research into the combined potential of FMs and FL, laying the groundwork for healthcare innovations.
Affiliation(s)
- Xingyu Li
  - Department of Computer Science, Tulane University, New Orleans, LA, USA
- Lu Peng
  - Department of Computer Science, Tulane University, New Orleans, LA, USA
- Yu-Ping Wang
  - Department of Biomedical Engineering, Tulane University, New Orleans, LA, USA
- Weihua Zhang
  - School of Computer Science, Fudan University, Shanghai, China
4
Chen L, Xu J, Zhou Y. PDATC-NCPMKL: Predicting drug's Anatomical Therapeutic Chemical (ATC) codes based on network consistency projection and multiple kernel learning. Comput Biol Med 2024; 169:107862. [PMID: 38150886 DOI: 10.1016/j.compbiomed.2023.107862] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 09/18/2023] [Revised: 11/19/2023] [Accepted: 12/17/2023] [Indexed: 12/29/2023]
Abstract
The development and discovery of new drugs is time-consuming and requires substantial human and material resources. Discovering novel effects of existing drugs is therefore an important alternative that can accelerate the process of designing "new" drugs. The Anatomical Therapeutic Chemical (ATC) classification system recommended by the World Health Organization (WHO) is a basic research area in this regard: a novel ATC code for an existing drug suggests novel effects. Several computational models have been proposed to predict drug-ATC code associations, but their performance leaves room for improvement. In this study, a new recommendation system (named PDATC-NCPMKL), which incorporates network consistency projection and multiple kernel learning, was designed to identify drug-ATC code associations. For drugs and for ATC codes, several kernels were constructed and fused by a multiple kernel learning method and an additional kernel integration scheme. To enhance performance, the drug-ATC code association adjacency matrix was reformulated by a variant of weighted K nearest known neighbors (WKNKN). The reformulated adjacency matrix and the drug and ATC code kernels were fed into network consistency projection to generate the association score matrix. The proposed recommendation system was tested on ATC codes at the second, third and fourth levels of the ATC classification system using ten-fold cross-validation. The results indicated that all AUROC and AUPR values were close to or exceeded 0.96, higher than those of some existing computational models. Additional tests were conducted to demonstrate the utility of adjacency matrix reformulation and to analyze the importance of the drug and ATC code kernels.
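As a rough illustration of the final scoring step, the sketch below implements one commonly cited formulation of network consistency projection: the association matrix is projected onto the drug kernel and onto the ATC code kernel, and the two projections are combined and normalized. It is an assumption that PDATC-NCPMKL follows exactly this form; the kernel construction, multiple kernel learning fusion, and WKNKN reformulation are not reproduced, and all names and toy values are hypothetical.

```python
import math


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _norm(v):
    return math.sqrt(_dot(v, v))


def network_consistency_projection(assoc, drug_kernel, atc_kernel):
    """Score every (drug, ATC code) pair from an association matrix
    (drugs x codes) and two square similarity kernels."""
    n_drugs, n_codes = len(assoc), len(assoc[0])
    assoc_cols = [[assoc[i][j] for i in range(n_drugs)] for j in range(n_codes)]
    atc_cols = [[atc_kernel[k][j] for k in range(n_codes)] for j in range(n_codes)]
    scores = [[0.0] * n_codes for _ in range(n_drugs)]
    for i in range(n_drugs):
        row_norm = _norm(assoc[i])
        for j in range(n_codes):
            col_norm = _norm(assoc_cols[j])
            # Drug-space projection: similarity of drug i to drugs known for code j.
            drug_side = _dot(drug_kernel[i], assoc_cols[j]) / col_norm if col_norm else 0.0
            # ATC-space projection: similarity of code j to codes known for drug i.
            atc_side = _dot(assoc[i], atc_cols[j]) / row_norm if row_norm else 0.0
            denom = _norm(drug_kernel[i]) + _norm(atc_cols[j])
            scores[i][j] = (drug_side + atc_side) / denom if denom else 0.0
    return scores


if __name__ == "__main__":
    # Hypothetical data: drug 1 has no known code but resembles drug 0.
    assoc = [[1, 0], [0, 0], [0, 1]]
    drug_kernel = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.1], [0.1, 0.1, 1.0]]
    atc_kernel = [[1.0, 0.2], [0.2, 1.0]]
    for row in network_consistency_projection(assoc, drug_kernel, atc_kernel):
        print(row)
```

In the toy data, the drug with no known code inherits a higher score for the code held by its most similar neighbour, which is the behaviour a recommendation matrix of this kind is meant to capture.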
Affiliation(s)
- Lei Chen
  - College of Information Engineering, Shanghai Maritime University, Shanghai, 201306, China
- Jing Xu
  - College of Information Engineering, Shanghai Maritime University, Shanghai, 201306, China
- Yubin Zhou
  - Department of Thoracic Surgery, Sichuan Provincial People's Hospital, University of Electronic Science and Technology of China, Chengdu, 610072, China
5
Zhou B, Ran B, Chen L. A GraphSAGE-based model with fingerprints only to predict drug-drug interactions. Mathematical Biosciences and Engineering: MBE 2024; 21:2922-2942. [PMID: 38454713 DOI: 10.3934/mbe.2024130] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/09/2024]
Abstract
Drugs are an effective way to treat various diseases. Some diseases are so complicated that the effect of a single drug is limited, which has led to the emergence of combination drug therapy. The use of multiple drugs to treat these diseases can improve efficacy, but it can also bring adverse effects. Thus, it is essential to determine drug-drug interactions (DDIs). Recently, deep learning algorithms have become popular for designing DDI prediction models. However, most deep learning-based models need several types of drug properties, which limits their application to drugs lacking these properties. In this study, a new deep learning-based model was designed to predict DDIs. For wide applicability, drugs were first represented by commonly used properties, referred to as fingerprint features. Then, these features were fused with the drug interaction network by a type of graph convolutional network method, GraphSAGE, yielding high-level drug features. The inner product was adopted to score the strength of drug pairs. The model was evaluated by 10-fold cross-validation, resulting in an AUROC of 0.9704 and AUPR of 0.9727. Such performance was better than that of the previous model, which directly used drug fingerprint features, and was competitive with some other previous models that used more drug properties. Furthermore, the ablation tests indicated the importance of the main parts of the model, and we analyzed the strengths and limitations of the model for drugs with different degrees in the network. This model identified some novel DDIs that may bring expected benefits, such as the combination of PEA and cannabinol that may produce better effects. DDIs that may cause unexpected side effects have also been discovered, such as the combined use of WIN 55,212-2 and cannabinol. These DDIs can provide novel insights for treating complex diseases or avoiding adverse drug events.
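The pipeline described above (fingerprint features, neighbourhood aggregation, inner-product scoring) can be sketched in a few lines. This is a toy under stated assumptions, not the published model: it uses a single GraphSAGE-style mean-aggregation layer with hand-picked, untrained weight matrices, and the three-bit "fingerprints", drug names, and function names are hypothetical.

```python
def _matvec(matrix, vector):
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def sage_mean_layer(features, neighbors, w_self, w_neigh):
    """One mean-aggregator layer: h_v' = ReLU(W_self h_v + W_neigh mean(h_u))."""
    out = {}
    for node, h in features.items():
        nbrs = neighbors.get(node, [])
        if nbrs:
            mean = [sum(features[u][k] for u in nbrs) / len(nbrs)
                    for k in range(len(h))]
        else:
            mean = [0.0] * len(h)  # isolated drug: no neighbourhood signal
        z = [a + b for a, b in zip(_matvec(w_self, h), _matvec(w_neigh, mean))]
        out[node] = [max(0.0, x) for x in z]
    return out


def pair_score(embeddings, a, b):
    """Inner product of two drug embeddings, used as the interaction score."""
    return sum(x * y for x, y in zip(embeddings[a], embeddings[b]))


if __name__ == "__main__":
    # Hypothetical 3-bit fingerprints and a tiny known-interaction network.
    fingerprints = {"A": [1, 0, 1], "B": [0, 1, 1], "C": [1, 1, 0]}
    known = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
    w_self = [[1, 0, 0], [0, 1, 0]]
    w_neigh = [[0, 0, 1], [0.5, 0.5, 0]]
    emb = sage_mean_layer(fingerprints, known, w_self, w_neigh)
    print(pair_score(emb, "A", "C"))
```

In a trained model the weight matrices would be learned from known interactions, and the raw inner product would typically be passed through a sigmoid before computing AUROC or AUPR.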
Affiliation(s)
- Bo Zhou
  - Institute of Wound Prevention and Treatment, Shanghai University of Medicine and Health Sciences, Shanghai 201318, China
  - School of Basic Medical Sciences, Shanghai University of Medicine and Health Sciences, Shanghai 201318, China
- Bing Ran
  - College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
- Lei Chen
  - College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
6
Wu Y, Li K, Li M, Pu X, Guo Y. Attention Mechanism-Based Graph Neural Network Model for Effective Activity Prediction of SARS-CoV-2 Main Protease Inhibitors: Application to Drug Repurposing as Potential COVID-19 Therapy. J Chem Inf Model 2023; 63:7011-7031. [PMID: 37960886 DOI: 10.1021/acs.jcim.3c01280] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]
Abstract
Compared to de novo drug discovery, drug repurposing provides a time-efficient way to treat coronavirus disease 2019 (COVID-19), which is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). SARS-CoV-2 main protease (Mpro) has been shown to be an attractive drug target due to its pivotal involvement in viral replication and transcription. Here, we present a graph neural network-based deep-learning (DL) strategy to prioritize existing drugs for their potential therapeutic effects against SARS-CoV-2 Mpro. Mpro inhibitors were represented as molecular graphs for graph attention network (GAT) and graph isomorphism network (GIN) modeling to predict their inhibitory activities. The results show that the GAT model outperforms the GIN and other competitive models and yields satisfactory predictions for unseen Mpro inhibitors, confirming its robustness and generalization. The attention mechanism of GAT makes it possible to capture the dominant substructures and thus makes the model interpretable. Finally, we applied the optimal GAT model in conjunction with molecular docking simulations to screen the Drug Repurposing Hub (DRH) database. As a result, 18 drug hits with the best consensus prediction scores and binding affinity values were identified as potential therapeutics against COVID-19. Both an extensive literature search and evaluations of absorption, distribution, metabolism, excretion, and toxicity (ADMET) illustrate the favorable drug-likeness and pharmacokinetic properties of the drug candidates. Overall, our work not only provides an effective GAT-based DL prediction tool for the inhibitory activity of SARS-CoV-2 Mpro inhibitors but also provides theoretical guidelines for drug discovery in COVID-19 treatment.
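To show how an attention mechanism can highlight dominant substructures, the sketch below computes graph-attention coefficients for one atom in the standard GAT form: a LeakyReLU applied to a learned vector dotted with the concatenated projected features, followed by a softmax over the neighbourhood. The weights, atom features, inclusion of a self-loop, and all names are illustrative assumptions and are not taken from the paper.

```python
import math


def _leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def attention_weights(features, neighbors, weight, attn_vec, node):
    """Softmax-normalized attention of `node` over itself and its neighbours."""
    def project(v):
        return [sum(w * x for w, x in zip(row, v)) for row in weight]

    z_node = project(features[node])
    targets = [node] + list(neighbors[node])
    logits = [
        _leaky_relu(sum(a * x for a, x in zip(attn_vec, z_node + project(features[j]))))
        for j in targets
    ]
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(value - top) for value in logits]
    total = sum(exps)
    return {j: e / total for j, e in zip(targets, exps)}


if __name__ == "__main__":
    # Hypothetical 2-dimensional atom features on a 3-atom fragment.
    atoms = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    bonds = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
    identity = [[1.0, 0.0], [0.0, 1.0]]
    attn_vec = [0.5, -0.5, 1.0, 0.25]
    print(attention_weights(atoms, bonds, identity, attn_vec, "a"))
```

Atoms that receive consistently large coefficients across a molecule are the ones an interpretability analysis of this kind would flag as belonging to influential substructures.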
Affiliation(s)
- Yanling Wu
  - College of Chemistry, Sichuan University, Chengdu 610064, China
- Kun Li
  - College of Chemistry, Sichuan University, Chengdu 610064, China
- Menglong Li
  - College of Chemistry, Sichuan University, Chengdu 610064, China
- Xuemei Pu
  - College of Chemistry, Sichuan University, Chengdu 610064, China
- Yanzhi Guo
  - College of Chemistry, Sichuan University, Chengdu 610064, China
7
Zhang P, Zhang D, Zhou W, Wang L, Wang B, Zhang T, Li S. Network pharmacology: towards the artificial intelligence-based precision traditional Chinese medicine. Brief Bioinform 2023; 25:bbad518. [PMID: 38197310 PMCID: PMC10777171 DOI: 10.1093/bib/bbad518] [Citation(s) in RCA: 121] [Impact Index Per Article: 60.5] [Received: 09/02/2023] [Revised: 11/03/2023] [Accepted: 11/30/2023] [Indexed: 01/11/2024] Open
Abstract
Network pharmacology (NP) provides a new methodological perspective for understanding traditional medicine from a holistic perspective, giving rise to frontiers such as traditional Chinese medicine network pharmacology (TCM-NP). With the development of artificial intelligence (AI) technology, it is key for NP to develop network-based AI methods to reveal the treatment mechanism of complex diseases from massive omics data. In this review, focusing on the TCM-NP, we summarize involved AI methods into three categories: network relationship mining, network target positioning and network target navigating, and present the typical application of TCM-NP in uncovering biological basis and clinical value of Cold/Hot syndromes. Collectively, our review provides researchers with an innovative overview of the methodological progress of NP and its application in TCM from the AI perspective.
Affiliation(s)
- Peng Zhang
  - Institute for TCM-X, MOE Key Laboratory of Bioinformatics/Bioinformatics Division, BNRIST, Department of Automation, Tsinghua University, Beijing 100084, China
- Dingfan Zhang
  - Institute for TCM-X, MOE Key Laboratory of Bioinformatics/Bioinformatics Division, BNRIST, Department of Automation, Tsinghua University, Beijing 100084, China
- Wuai Zhou
  - China Mobile Information System Integration Co., Ltd, Beijing 100032, China
- Lan Wang
  - Institute for TCM-X, MOE Key Laboratory of Bioinformatics/Bioinformatics Division, BNRIST, Department of Automation, Tsinghua University, Beijing 100084, China
- Boyang Wang
  - Institute for TCM-X, MOE Key Laboratory of Bioinformatics/Bioinformatics Division, BNRIST, Department of Automation, Tsinghua University, Beijing 100084, China
- Tingyu Zhang
  - Institute for TCM-X, MOE Key Laboratory of Bioinformatics/Bioinformatics Division, BNRIST, Department of Automation, Tsinghua University, Beijing 100084, China
- Shao Li
  - Institute for TCM-X, MOE Key Laboratory of Bioinformatics/Bioinformatics Division, BNRIST, Department of Automation, Tsinghua University, Beijing 100084, China
8
Cao Y, Yang ZQ, Zhang XL, Fan W, Wang Y, Shen J, Wei DQ, Li Q, Wei XY. Identifying the kind behind SMILES-anatomical therapeutic chemical classification using structure-only representations. Brief Bioinform 2022; 23:6677124. [PMID: 36027578 DOI: 10.1093/bib/bbac346] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/06/2022] [Revised: 07/11/2022] [Accepted: 07/26/2022] [Indexed: 01/25/2023] Open
Abstract
Anatomical Therapeutic Chemical (ATC) classification for compounds/drugs plays an important role in drug development and basic research. However, previous methods depend on interactions extracted from the STITCH dataset, which may make them dependent on lab experiments. We present a pilot study to explore the possibility of conducting ATC prediction solely based on molecular structures. The motivation is to eliminate the reliance on costly lab experiments so that the characteristics of a drug can be pre-assessed for better decision-making and effort-saving before the actual development. To this end, we construct a new benchmark consisting of 4545 compounds, which is larger than the one used in previous work. A light-weight prediction model is proposed. The model offers better explainability in the sense that it consists of a straightforward tokenization that extracts and embeds statistically and physicochemically meaningful tokens, and a deep network backed by a set of pyramid kernels to capture multi-resolution chemical structural characteristics. Its efficacy has been validated in experiments where it outperforms the state-of-the-art methods by 15.53% in accuracy and by 69.66% in terms of efficiency. We make the benchmark dataset, source code and web server open to ease the reproduction of this study.
Affiliation(s)
- Yi Cao
  - Department of Computer Science, Sichuan University, 610065, Chengdu, China
- Zhen-Qun Yang
  - Department of Biomedical Engineering, Chinese University of Hong Kong, Street, Shatin, Hong Kong
- Xu-Lu Zhang
  - Department of Computer Science, Sichuan University, 610065, Chengdu, China
- Wenqi Fan
  - Department of Computing, Hong Kong Polytechnic University, Kowloon, Hong Kong
- Yaowei Wang
  - Peng Cheng Laboratory, 518000, Shenzhen, China
- Dong-Qing Wei
  - School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Qing Li
  - Department of Computing, Hong Kong Polytechnic University, Kowloon, Hong Kong
- Xiao-Yong Wei
  - Department of Computer Science, Sichuan University, 610065, Chengdu, China
  - Department of Computing, Hong Kong Polytechnic University, Kowloon, Hong Kong
9
Assessing sequence-based protein-protein interaction predictors for use in therapeutic peptide engineering. Sci Rep 2022; 12:9610. [PMID: 35688894 PMCID: PMC9187631 DOI: 10.1038/s41598-022-13227-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/01/2021] [Accepted: 04/25/2022] [Indexed: 12/01/2022] Open
Abstract
Engineering peptides to achieve a desired therapeutic effect through the inhibition of a specific target activity or protein interaction is a non-trivial task. Few of the existing in silico peptide design algorithms generate target-specific peptides. Instead, many methods produce peptides that achieve a desired effect through an unknown mechanism. In contrast with resource-intensive high-throughput experiments, in silico screening is a cost-effective alternative that can prune the space of candidates when engineering target-specific peptides. Using a set of FDA-approved peptides we curated specifically for this task, we assess the applicability of several sequence-based protein–protein interaction predictors as a screening tool within the context of peptide therapeutic engineering. We show that similarity-based protein–protein interaction predictors are more suitable for this purpose than the state-of-the-art deep learning methods publicly available at the time of writing. We also show that this approach is mostly useful when designing new peptides against targets for which naturally-occurring interactors are already known, and that deploying it for de novo peptide engineering tasks may require gathering additional target-specific training data. Taken together, this work offers evidence that supports the use of similarity-based protein–protein interaction predictors for peptide therapeutic engineering, especially peptide analogs.