1. Yan H, Xie Y, Liu Y, Yuan L, Sheng R. ComABAN: refining molecular representation with the graph attention mechanism to accelerate drug discovery. Brief Bioinform 2022; 23:6674166. [PMID: 35998925 DOI: 10.1093/bib/bbac350] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2022] [Revised: 07/16/2022] [Accepted: 07/27/2022] [Indexed: 11/14/2022]
Abstract
An unsolved challenge in developing molecular representations is determining an optimal method to characterize molecular structure. Understanding intramolecular interactions is essential to achieving this goal. In this study, ComABAN, a new graph-attention-based approach, is proposed to improve the accuracy of molecular representation by simultaneously considering atom-atom, bond-bond and atom-bond interactions. In addition, we benchmark models extensively on 8 public and 680 proprietary industrial datasets spanning a wide variety of chemical end points. The results show that ComABAN achieves higher prediction accuracy than the classical machine learning method and the deep learning-based methods. Furthermore, the trained neural network was used to screen a library of 1.5 million molecules and to pick out compounds classified as grade I. These predicted molecules were then scored and ranked using cascade docking and molecular dynamics simulations to yield five potential candidates. All five molecules showed high similarity to nanomolar bioactive inhibitors that suppress the expression of HIF-1α, and we synthesized three compounds (Y-1, Y-3, Y-4) and tested their inhibitory ability in vitro. Our results indicate that ComABAN is an effective tool for accelerating drug discovery.
Affiliation(s)
- Huihui Yan
  - Collaborative Innovation Center of Yangtze River Delta Region Green Pharmaceuticals, Zhejiang University of Technology, Hangzhou, 310014, P. R. China
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, P. R. China Fax/Tel: 86-571-8820-845 E-mail:
- Yuanyuan Xie
  - Collaborative Innovation Center of Yangtze River Delta Region Green Pharmaceuticals, Zhejiang University of Technology, Hangzhou, 310014, P. R. China
- Yao Liu
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, P. R. China Fax/Tel: 86-571-8820-845 E-mail:
- Leer Yuan
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, P. R. China Fax/Tel: 86-571-8820-845 E-mail:
- Rong Sheng
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, P. R. China Fax/Tel: 86-571-8820-845 E-mail:
2. Kawai K, Asanuma Y, Kato T, Karuo Y, Tarui A, Sato K, Omote M. LCP: Simple Representation of Docking Poses for Machine Learning: A Case Study on Xanthine Oxidase Inhibitors. Mol Inform 2021; 41:e2100245. [PMID: 34843171 DOI: 10.1002/minf.202100245] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2021] [Accepted: 11/21/2021] [Indexed: 11/05/2022]
Abstract
In this paper, we propose a simple descriptor called the ligand coordinate profile (LCP) for describing docking poses. The LCP descriptor is generated from the coordinates of the polar hydrogen and heavy atoms of the docked ligand. We hypothesize that the prediction of binding poses can be enhanced by combining machine learning methods with the LCP descriptor. Two docking programs were used to predict ligand docking against xanthine oxidase. Four machine learning methods (k-nearest neighbors, random forest, support vector machine, and LightGBM) were used to determine whether machine learning-based models could accurately identify the correct binding poses. Regardless of the machine learning method employed, the LCP descriptor demonstrated improved performance compared to the existing descriptor. The leave-one-PDB-out evaluation, like the cross-validation, showed that the choice of pose descriptor had a significant influence. When evaluated using top-N metrics, the machine learning models were generally more effective than the docking programs. In addition, the LCP-based models outperformed those based on the existing descriptor. The results obtained in this study suggest that our proposed binding pose descriptor is effective for improving the docking accuracy of xanthine oxidase inhibitors.
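To illustrate the top-N evaluation mentioned in the abstract, the following minimal Python sketch computes a top-N success rate: the fraction of ligands for which at least one correct pose appears among the N best-ranked poses. The function name `top_n_success_rate`, the boolean input layout, and the toy data are assumptions made for illustration and are not the authors' code. How a pose is judged correct (for example, by an RMSD cutoff against the crystal pose) is left to the caller.

```python
from typing import Sequence


def top_n_success_rate(ranked_correct: Sequence[Sequence[bool]], n: int) -> float:
    """Fraction of ligands with at least one correct pose in the top n.

    ranked_correct[i] holds one flag per pose of ligand i, ordered from the
    best-ranked pose to the worst; True marks a pose judged correct.
    (Hypothetical input layout, chosen for this sketch.)
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if not ranked_correct:
        raise ValueError("need at least one ligand")
    hits = sum(1 for flags in ranked_correct if any(flags[:n]))
    return hits / len(ranked_correct)


# Toy data: three ligands, four ranked poses each (made-up values).
toy = [
    [True, False, False, False],   # correct pose ranked first
    [False, False, True, False],   # correct pose ranked third
    [False, False, False, False],  # no correct pose generated
]
top1 = top_n_success_rate(toy, 1)
top3 = top_n_success_rate(toy, 3)
```

Under this definition, the same function can compare a docking program's native ranking with a machine learning model's re-ranking by passing each ordering of the same poses.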
Affiliation(s)
- Kentaro Kawai
  - Faculty of Pharmaceutical Sciences, Setsunan University, 45-1, Nagaotoge-cho, Hirakata, Osaka, 573-0101, Japan
- Yoshitaka Asanuma
  - Faculty of Pharmaceutical Sciences, Setsunan University, 45-1, Nagaotoge-cho, Hirakata, Osaka, 573-0101, Japan
- Toshiki Kato
  - Faculty of Pharmaceutical Sciences, Setsunan University, 45-1, Nagaotoge-cho, Hirakata, Osaka, 573-0101, Japan
- Yukiko Karuo
  - Faculty of Pharmaceutical Sciences, Setsunan University, 45-1, Nagaotoge-cho, Hirakata, Osaka, 573-0101, Japan
- Atsushi Tarui
  - Faculty of Pharmaceutical Sciences, Setsunan University, 45-1, Nagaotoge-cho, Hirakata, Osaka, 573-0101, Japan
- Kazuyuki Sato
  - Faculty of Pharmaceutical Sciences, Setsunan University, 45-1, Nagaotoge-cho, Hirakata, Osaka, 573-0101, Japan
- Masaaki Omote
  - Faculty of Pharmaceutical Sciences, Setsunan University, 45-1, Nagaotoge-cho, Hirakata, Osaka, 573-0101, Japan
3. Sadeghi F, Afkhami A, Madrakian T, Ghavami R. Computational study on subfamilies of piperidine derivatives: QSAR modelling, model external verification, the inter-subset similarity determination, and structure-based drug designing. SAR QSAR Environ Res 2021; 32:433-462. [PMID: 33960256 DOI: 10.1080/1062936x.2021.1891568] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/19/2020] [Accepted: 02/14/2021] [Indexed: 06/12/2023]
Abstract
A new subset of furan-pyrazole piperidine derivatives was used for QSAR model development. These compounds exhibit good Akt1 inhibitory activity, and their antiproliferative activities in vitro against OVCAR-8 (human ovarian carcinoma cells) and HCT116 (human colon cancer cells) were also confirmed. Based on relevant three-dimensional (3D) and 2D autocorrelation descriptors selected by a genetic algorithm (GA), multiple linear regression (MLR) models were established for the half-maximal inhibitory concentration (IC50) against Akt1 and against the cancer cell lines independently. Robustness, stability, and predictive ability of the models were evaluated using external and internal validation (r²: 0.742-0.832, Q²_LOO: 0.684-0.796, RMSE: 0.247-0.299, F: 32.283-57.578, and r²_y-random: 0.049-0.080). Furthermore, in the new strategy, each of the evaluated models was generalized to two other subfamilies of piperidines to simultaneously compare the activities and structural similarity of these three subsets. Structural similarity can probably be regarded as a criterion for similarity in the mechanism of action. In addition, external verification of the suggested predictive models was performed with another subset. Finally, by focusing on M64 as the most potent in vivo antitumor compound, 15 new derivatives were designed and six potent candidates were proposed for further investigation.
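The leave-one-out cross-validated Q² reported in the abstract can be illustrated with a short Python sketch. It assumes the common definition Q² = 1 - PRESS / SS_tot, where PRESS sums the squared errors of predictions made for each compound by a model fitted without it, and SS_tot is the total sum of squares around the mean response. For brevity the sketch fits a single-descriptor linear model rather than the multi-descriptor MLR of the study; the function names and the toy numbers are hypothetical.

```python
from statistics import mean
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def q2_loo(xs: List[float], ys: List[float]) -> float:
    """Leave-one-out Q^2 = 1 - PRESS / SS_tot (definition assumed here)."""
    press = 0.0
    for i in range(len(xs)):
        # Refit the model without sample i, then predict sample i.
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (a + b * xs[i])) ** 2
    my = mean(ys)
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - press / ss_tot


# Toy descriptor values and responses (made-up numbers, not study data).
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]
response = [1.1, 1.9, 3.2, 3.9, 5.1]
q2 = q2_loo(descriptor, response)
```

Because each prediction comes from a model that never saw the held-out compound, Q² is never larger than the fitted r² of the same model, which is consistent with the ranges quoted in the abstract.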
Affiliation(s)
- F Sadeghi
  - Faculty of Chemistry, Bu-Ali Sina University, Hamedan, Iran
- A Afkhami
  - Faculty of Chemistry, Bu-Ali Sina University, Hamedan, Iran
  - Department of Chemistry, D-8 International University, Hamedan, Iran
- T Madrakian
  - Faculty of Chemistry, Bu-Ali Sina University, Hamedan, Iran
- R Ghavami
  - Chemometrics Laboratory, Chemistry Department, Faculty of Science, University of Kurdistan, Sanandaj, Iran
4. Kawai K, Tomonou M, Machida Y, Karuo Y, Tarui A, Sato K, Ikeda Y, Kinashi T, Omote M. Effect of Learning Dataset for Identification of Active Molecules: A Case Study of Integrin αIIbβ3 Inhibitors. Mol Inform 2021; 40:e2060040. [PMID: 33738924 DOI: 10.1002/minf.202060040] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/29/2021] [Accepted: 01/30/2021] [Indexed: 01/13/2023]
Abstract
Efficient in silico approaches are needed to identify strong integrin αIIbβ3 inhibitors through a small number of measurements. To address this challenge, we investigated the effect of the learning dataset on the classification performance of machine learning models, focusing on weak and inactive compounds. The structure and activity information of the compounds were obtained from ChEMBL, and pChEMBL values were used to classify them as active, inactive, or weak. Datasets with active:inactive ratios ranging from 1:1 to 1:1000 were used for machine learning. The prediction scores of the weak samples were found to lie between those of the active and inactive compounds. In addition, another dataset consisting of 149 actives and 6.9 million inactives was screened; the results indicated that the number of positive predictions decreased for models trained with a higher number of inactives. Although there is a trade-off between false positives and false negatives, to identify compounds with strong activity using a reduced number of measurements, it is better to train on a large number of inactives and to select compounds that score higher than the weak samples.
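One way to build training sets at a chosen active:inactive ratio is sketched below in Python. The abstract does not say how the datasets were assembled, so random undersampling of the inactive pool is an assumption here, as are the function name `make_imbalanced_set`, the fixed seed, and the placeholder compound identifiers.

```python
import random
from typing import List, Sequence, Tuple


def make_imbalanced_set(
    actives: Sequence[str],
    inactives: Sequence[str],
    ratio: int,
    seed: int = 0,
) -> Tuple[List[str], List[int]]:
    """Keep every active and draw ratio * len(actives) inactives at random.

    Returns (samples, labels) with label 1 for actives and 0 for inactives.
    Random undersampling is an assumption of this sketch.
    """
    if ratio < 1:
        raise ValueError("ratio must be a positive integer")
    n_inactive = len(actives) * ratio
    if n_inactive > len(inactives):
        raise ValueError("not enough inactives for the requested ratio")
    rng = random.Random(seed)  # seeded so the draw is reproducible
    chosen = rng.sample(list(inactives), n_inactive)
    samples = list(actives) + chosen
    labels = [1] * len(actives) + [0] * n_inactive
    return samples, labels


# Placeholder identifiers standing in for compounds (hypothetical data).
toy_actives = ["A0", "A1"]
toy_inactives = [f"I{i}" for i in range(50)]
samples_1, labels_1 = make_imbalanced_set(toy_actives, toy_inactives, ratio=1)
samples_10, labels_10 = make_imbalanced_set(toy_actives, toy_inactives, ratio=10)
```

Calling the function once per ratio yields a series of datasets that share the same actives and differ only in how many inactives they contain, which mirrors the comparison described in the abstract.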
Affiliation(s)
- Kentaro Kawai
  - Faculty of Pharmaceutical Sciences, Setsunan University, 45-1, Nagaotoge-cho, Hirakata, Osaka, 573-0101, Japan
- Mami Tomonou
  - Faculty of Pharmaceutical Sciences, Setsunan University, 45-1, Nagaotoge-cho, Hirakata, Osaka, 573-0101, Japan
- Yume Machida
  - Faculty of Pharmaceutical Sciences, Setsunan University, 45-1, Nagaotoge-cho, Hirakata, Osaka, 573-0101, Japan
- Yukiko Karuo
  - Faculty of Pharmaceutical Sciences, Setsunan University, 45-1, Nagaotoge-cho, Hirakata, Osaka, 573-0101, Japan
- Atsushi Tarui
  - Faculty of Pharmaceutical Sciences, Setsunan University, 45-1, Nagaotoge-cho, Hirakata, Osaka, 573-0101, Japan
- Kazuyuki Sato
  - Faculty of Pharmaceutical Sciences, Setsunan University, 45-1, Nagaotoge-cho, Hirakata, Osaka, 573-0101, Japan
- Yoshiki Ikeda
  - Department of Molecular Genetics, Institute of Biomedical Science, Kansai Medical University, 2-5-1 Shin-machi, Hirakata, Osaka, 573-1010, Japan
- Tatsuo Kinashi
  - Department of Molecular Genetics, Institute of Biomedical Science, Kansai Medical University, 2-5-1 Shin-machi, Hirakata, Osaka, 573-1010, Japan
- Masaaki Omote
  - Faculty of Pharmaceutical Sciences, Setsunan University, 45-1, Nagaotoge-cho, Hirakata, Osaka, 573-0101, Japan
5. Kim H, Kim E, Lee I, Bae B, Park M, Nam H. Artificial Intelligence in Drug Discovery: A Comprehensive Review of Data-driven and Machine Learning Approaches. Biotechnol Bioprocess Eng 2021; 25:895-930. [PMID: 33437151 PMCID: PMC7790479 DOI: 10.1007/s12257-020-0049-y] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 02/13/2020] [Revised: 05/27/2020] [Accepted: 06/03/2020] [Indexed: 02/07/2023]
Abstract
As expenditure on drug development increases exponentially, the overall drug discovery process requires a sustainable revolution. Since artificial intelligence (AI) is leading the fourth industrial revolution, AI can be considered a viable solution for unstable drug research and development. Generally, AI is applied to fields with sufficient data, such as computer vision and natural language processing, but there are many efforts to revolutionize the existing drug discovery process by applying AI. This review provides a comprehensive, organized summary of recent research trends in the AI-guided drug discovery process, including target identification, hit identification, ADMET prediction, lead optimization, and drug repositioning. The main data sources in each field are also summarized. In addition, an in-depth analysis of the remaining challenges and limitations is provided, along with proposals for promising future directions in each of the aforementioned areas.
Affiliation(s)
- Hyunho Kim
  - School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea
- Eunyoung Kim
  - School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea
- Ingoo Lee
  - School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea
- Bongsung Bae
  - School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea
- Minsu Park
  - School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea
- Hojung Nam
  - School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea