1
Yi JC, Yang ZY, Zhao WT, Yang ZJ, Zhang XC, Wu CK, Lu AP, Cao DS. ChemMORT: an automatic ADMET optimization platform using deep learning and multi-objective particle swarm optimization. Brief Bioinform 2024; 25:bbae008. [PMID: 38385872 PMCID: PMC10883642 DOI: 10.1093/bib/bbae008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2023] [Revised: 12/17/2023] [Accepted: 01/02/2024] [Indexed: 02/23/2024] Open
Abstract
Drug discovery and development constitute a laborious and costly undertaking. The success of a drug hinges not only on good efficacy but also on acceptable absorption, distribution, metabolism, elimination, and toxicity (ADMET) properties. Overall, up to 50% of drug development failures have been attributed to undesirable ADMET profiles. As a multi-parameter objective, the optimization of ADMET properties is extremely challenging owing to the vast chemical space and limited human expert knowledge. In this study, a freely available platform called Chemical Molecular Optimization, Representation and Translation (ChemMORT) is developed for the optimization of multiple ADMET endpoints without the loss of potency (https://cadd.nscc-tj.cn/deploy/chemmort/). ChemMORT contains three modules: Simplified Molecular Input Line Entry System (SMILES) Encoder, Descriptor Decoder and Molecular Optimizer. The SMILES Encoder represents a molecule as a 512-dimensional vector, and the Descriptor Decoder translates that representation back to the corresponding molecular structure with high accuracy. Based on this reversible molecular representation and a particle swarm optimization strategy, the Molecular Optimizer can effectively optimize undesirable ADMET properties without the loss of bioactivity, which essentially accomplishes inverse QSAR design. The constrained multi-objective optimization of a poly(ADP-ribose) polymerase-1 inhibitor is provided as a case study to explore the utility of ChemMORT.
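The particle swarm strategy mentioned in the abstract can be illustrated with a minimal sketch. It assumes a single scalar objective over a small continuous vector, which is a simplification of the constrained multi-objective search over the 512-dimensional representation that the abstract describes. The function name `pso_minimize`, the coefficients `w`, `c1`, `c2`, the bounds, and the toy objective are illustrative assumptions and are not taken from ChemMORT.

```python
import random


def toy_objective(x):
    """Hypothetical stand-in for a property score: squared distance from the origin."""
    return sum(v * v for v in x)


def pso_minimize(objective, dim, n_particles=20, n_iters=100,
                 bounds=(-5.0, 5.0), w=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic single-objective particle swarm optimization (assumed settings)."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Random starting positions, zero starting velocities.
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Each particle remembers its own best position; the swarm shares a global best.
    best_pos = [p[:] for p in pos]
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: best_val[i])
    gbest_pos, gbest_val = best_pos[g][:], best_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (gbest_pos[d] - pos[i][d]))
                # Move and clamp to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < gbest_val:
                    gbest_val, gbest_pos = val, pos[i][:]
    return gbest_pos, gbest_val
```

Calling `pso_minimize(toy_objective, dim=3)` returns the best position found and its score. In a setting like the one the abstract describes, the position would be a latent vector and the objective would combine predicted ADMET endpoints with a similarity or potency constraint; how those are combined is not specified here.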
Affiliation(s)
- Jia-Cai Yi
  - School of Computer Science, National University of Defense Technology, Changsha 410073, Hunan, PR China
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
- Zi-Yi Yang
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
- Wen-Tao Zhao
  - School of Computer Science, National University of Defense Technology, Changsha 410073, Hunan, PR China
- Zhi-Jiang Yang
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
- Xiao-Chen Zhang
  - School of Computer Science, National University of Defense Technology, Changsha 410073, Hunan, PR China
- Cheng-Kun Wu
  - State Key Laboratory of High-Performance Computing, Changsha 410073, Hunan, PR China
- Ai-Ping Lu
  - Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, P. R. China
- Dong-Sheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
  - Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, P. R. China
2
Zhang XC, Yi JC, Yang GP, Wu CK, Hou TJ, Cao DS. ABC-Net: a divide-and-conquer based deep learning architecture for SMILES recognition from molecular images. Brief Bioinform 2022; 23:6535678. [PMID: 35212357 DOI: 10.1093/bib/bbac033] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/07/2021] [Revised: 01/10/2022] [Accepted: 01/24/2022] [Indexed: 11/14/2022] Open
Abstract
Structural information for chemical compounds is often described by pictorial images in most scientific documents, which cannot be easily understood and manipulated by computers. This dilemma makes optical chemical structure recognition (OCSR) an essential tool for automatically mining knowledge from an enormous amount of literature. However, existing OCSR methods fall far short of our expectations for realistic requirements due to their poor recovery accuracy. In this paper, we developed a deep neural network model named ABC-Net (Atom and Bond Center Network) to predict graph structures directly. Based on the divide-and-conquer principle, we propose to model an atom or a bond as a single point in the center. In this way, we can leverage a fully convolutional neural network (CNN) to generate a series of heat-maps to identify these points and predict relevant properties, such as atom types, atom charges, bond types and other properties. Thus, the molecular structure can be recovered by assembling the detected atoms and bonds. Our approach integrates all the detection and property prediction tasks into a single fully convolutional network, which is scalable and capable of processing molecular images quite efficiently. Experimental results demonstrate that our method could achieve a significant improvement in recognition performance compared with publicly available tools. The proposed method could be considered as a promising solution to OCSR problems and a starting point for the acquisition of molecular information in the literature.
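The idea of treating each atom or bond as a single center point on a heat-map can be illustrated with a small post-processing sketch. It assumes the network output is already available as a 2D grid of scores and simply reads off cells that are above a threshold and strictly larger than their neighbours. The function name `find_centers`, the 3x3 neighbourhood, and the threshold of 0.5 are hypothetical choices for illustration; the abstract does not state how ABC-Net extracts points from its heat-maps.

```python
def find_centers(heatmap, threshold=0.5):
    """Return (row, col) cells that pass the threshold and are strict local maxima.

    `heatmap` is a list of equal-length rows of floats (an assumed stand-in
    for one heat-map channel produced by a detection network).
    """
    rows, cols = len(heatmap), len(heatmap[0])
    centers = []
    for r in range(rows):
        for c in range(cols):
            value = heatmap[r][c]
            if value < threshold:
                continue
            # Collect the up-to-8 surrounding cells, staying inside the grid.
            neighbours = [
                heatmap[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(value > n for n in neighbours):
                centers.append((r, c))
    return centers


# Toy 4x4 heat-map with two well-separated peaks (illustrative data only).
example_heatmap = [
    [0.0, 0.1, 0.0, 0.0],
    [0.1, 0.9, 0.1, 0.0],
    [0.0, 0.1, 0.0, 0.2],
    [0.0, 0.0, 0.2, 0.8],
]
```

In a full pipeline, each detected center would then be paired with the property predictions at the same location (atom type, charge, bond type) before the structure is assembled; that assembly step is not sketched here.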
Affiliation(s)
- Xiao-Chen Zhang
  - School of Computer Science, National University of Defense Technology, China
- Jia-Cai Yi
  - School of Computer Science and Technology, National University of Defense Technology, China
- Guo-Ping Yang
  - Center of Clinical Pharmacology, the Third Xiangya Hospital, Central South University, China
- Cheng-Kun Wu
  - Institute for Quantum Information & State Key Laboratory of High-Performance Computing, College of Computer Science and Technology, National University of Defense Technology, China
- Ting-Jun Hou
  - College of Pharmaceutical Sciences, Zhejiang University, China
- Dong-Sheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, China
3
Zhang XC, Wu CK, Yi JC, Zeng XX, Yang CQ, Lu AP, Hou TJ, Cao DS. Pushing the Boundaries of Molecular Property Prediction for Drug Discovery with Multitask Learning BERT Enhanced by SMILES Enumeration. Research 2022. [DOI: 10.34133/research.0004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2022]
Abstract
Accurate prediction of pharmacological properties of small molecules is becoming increasingly important in drug discovery. Traditional feature-engineering approaches heavily rely on handcrafted descriptors and/or fingerprints, which need extensive human expert knowledge. With the rapid progress of artificial intelligence technology, data-driven deep learning methods have shown unparalleled advantages over feature-engineering-based methods. However, existing deep learning methods usually suffer from the scarcity of labeled data and the inability to share information between different tasks when applied to predicting molecular properties, thus resulting in poor generalization capability. Here, we proposed a novel multitask learning BERT (Bidirectional Encoder Representations from Transformers) framework, named MTL-BERT, which leverages large-scale pre-training, multitask learning, and SMILES (simplified molecular input line entry system) enumeration to alleviate the data scarcity problem. MTL-BERT first exploits a large amount of unlabeled data through self-supervised pretraining to mine the rich contextual information in SMILES strings and then fine-tunes the pretrained model for multiple downstream tasks simultaneously by leveraging their shared information. Meanwhile, SMILES enumeration is used as a data enhancement strategy during the pretraining, fine-tuning, and test phases to substantially increase data diversity and help to learn the key relevant patterns from complex SMILES strings. The experimental results showed that the pretrained MTL-BERT model with little additional fine-tuning can achieve much better performance than the state-of-the-art methods on most of the 60 practical molecular datasets. Additionally, the MTL-BERT model leverages attention mechanisms to focus on SMILES character features essential to target properties for model interpretability.
Affiliation(s)
- Xiao-Chen Zhang
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
  - Shangqiu Normal University, School of Information Technology, Shangqiu 476000, Henan, P. R. China
  - College of Computer, National University of Defense Technology, Changsha 410005, Hunan, P. R. China
- Cheng-Kun Wu
  - College of Computer, National University of Defense Technology, Changsha 410005, Hunan, P. R. China
- Jia-Cai Yi
  - College of Computer, National University of Defense Technology, Changsha 410005, Hunan, P. R. China
- Xiang-Xiang Zeng
  - Department of Computer Science, Hunan University, Changsha 410082, Hunan, P. R. China
- Can-Qun Yang
  - College of Computer, National University of Defense Technology, Changsha 410005, Hunan, P. R. China
- Ai-Ping Lu
  - Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR 999077, P. R. China
- Ting-Jun Hou
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
- Dong-Sheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
  - Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR 999077, P. R. China
4
Zhang XC, Wu CK, Yang ZJ, Wu ZX, Yi JC, Hsieh CY, Hou TJ, Cao DS. MG-BERT: leveraging unsupervised atomic representation learning for molecular property prediction. Brief Bioinform 2021; 22:6265201. [PMID: 33951729 DOI: 10.1093/bib/bbab152] [Citation(s) in RCA: 40] [Impact Index Per Article: 13.3] [Received: 01/27/2021] [Revised: 03/11/2021] [Accepted: 04/01/2021] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION: Accurate and efficient prediction of molecular properties is one of the fundamental issues in drug design and discovery pipelines. Traditional feature engineering-based approaches require extensive expertise in the feature design and selection process. With the development of artificial intelligence (AI) technologies, data-driven methods exhibit unparalleled advantages over the feature engineering-based methods in various domains. Nevertheless, when applied to molecular property prediction, AI models usually suffer from the scarcity of labeled data and show poor generalization ability.
RESULTS: In this study, we proposed molecular graph BERT (MG-BERT), which integrates the local message passing mechanism of graph neural networks (GNNs) into the powerful BERT model to facilitate learning from molecular graphs. Furthermore, an effective self-supervised learning strategy named masked atoms prediction was proposed to pretrain the MG-BERT model on a large amount of unlabeled data to mine context information in molecules. We found the MG-BERT model can generate context-sensitive atomic representations after pretraining and transfer the learned knowledge to the prediction of a variety of molecular properties. The experimental results show that the pretrained MG-BERT model with a little extra fine-tuning can consistently outperform the state-of-the-art methods on all 11 ADMET datasets. Moreover, the MG-BERT model leverages attention mechanisms to focus on atomic features essential to the target property, providing excellent interpretability for the trained model. The MG-BERT model does not require any hand-crafted feature as input and is more reliable due to its excellent interpretability, providing a novel framework to develop state-of-the-art models for a wide range of drug discovery tasks.
Affiliation(s)
- Xiao-Chen Zhang
  - State Key Laboratory of High-Performance Computing, School of Computer Science, National University of Defense Technology, China
- Cheng-Kun Wu
  - State Key Laboratory of High-Performance Computing, School of Computer Science, National University of Defense Technology, China
- Zhi-Jiang Yang
  - Xiangya School of Pharmaceutical Sciences, Central South University, China
- Zhen-Xing Wu
  - College of Pharmaceutical Sciences, Zhejiang University, China
- Jia-Cai Yi
  - State Key Laboratory of High-Performance Computing, School of Computer Science, National University of Defense Technology, China
- Chang-Yu Hsieh
  - Tencent Quantum Laboratory (since 2018). PhD in Physics, University of Ottawa (2012); postdoctoral researcher, University of Toronto (2012-2013) and Massachusetts Institute of Technology (2013-2016); senior researcher, Singapore-MIT Alliance for Science and Technology (2017-2018)
- Ting-Jun Hou
  - College of Pharmaceutical Sciences, Zhejiang University, China
- Dong-Sheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, China
5
Byun YT, Park KH, Kim SH, Choi SS, Yi JC, Lim TK. Efficient Single-Mode GaAs/AlGaAs W Waveguide Phase Modulator with a Low Propagation Loss. Appl Opt 1998; 37:496-501. [PMID: 18268612 DOI: 10.1364/ao.37.000496] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Indexed: 05/25/2023]
Abstract
We report a single-mode P-P-p-i-n-N-N GaAs/AlGaAs W waveguide phase modulator with a high phase modulation efficiency and a low propagation loss. The phase modulator with a W-shaped refractive-index profile utilizes a novel epilayer structure to reduce the propagation loss associated with doped layers and to obtain a phase modulation efficiency larger than those of P-i-N double heterostructure modulators. The phase shift and propagation loss were measured with a Fabry-Perot resonance method at a wavelength of 1.31 μm. A phase modulation efficiency as high as 34.6°/(V·mm) was measured for TE-polarized light. Propagation losses of less than 0.6 dB/cm were also achieved. As a result, a W waveguide phase modulator that exhibits a high phase modulation efficiency and a low propagation loss has been experimentally realized for the first time, as far as we know.