1
Qin R, Zhang H, Huang W, Shao Z, Lei J. Deep learning-based design and screening of benzimidazole-pyrazine derivatives as adenosine A2B receptor antagonists. J Biomol Struct Dyn 2025; 43:3225-3241. [PMID: 38133953 DOI: 10.1080/07391102.2023.2295974] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2023] [Accepted: 12/11/2023] [Indexed: 12/24/2023]
Abstract
The adenosine A2B receptor (A2BAR) is considered a novel potential target for cancer immunotherapy, and A2BAR antagonists have an inhibitory effect on tumor growth, proliferation, and metastasis. In our previous studies, we identified a class of benzimidazole-pyrazine scaffolds whose derivatives exhibited an antagonistic effect but lacked subtype selectivity towards A2BAR. In this work, we developed a scaffold-based protocol that incorporates a deep generative model and multilayer virtual screening to design benzimidazole-pyrazine derivatives as potential selective A2BAR antagonists. Using a generative model trained on reported A2BAR antagonists, we built a scaffold-focused library of benzimidazole-pyrazine derivatives and applied a virtual screening protocol to discover potential A2BAR antagonists. Finally, five molecules with different Bemis-Murcko scaffolds were identified and exhibited higher binding free energies than the reference molecule 12o. Further computational analysis revealed that the 3-benzyl derivative ABA-1266 presented high selectivity toward A2BAR and showed preferable druggability, providing a basis for the future development of potent, selective A2BAR antagonists.
Affiliation(s)
- Rui Qin: School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou, China; College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Hao Zhang: School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou, China
- Weifeng Huang: School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou, China
- Zhenglin Shao: School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou, China
- Jinping Lei: School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou, China

2
Lv Q, Chen G, Yang Z, Zhong W, Chen CYC. Meta-MolNet: A Cross-Domain Benchmark for Few Examples Drug Discovery. IEEE Trans Neural Netw Learn Syst 2025; 36:4849-4863. [PMID: 40038923 DOI: 10.1109/tnnls.2024.3359657] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Indexed: 03/06/2025]
Abstract
Predicting the pharmacological activity, toxicity, and pharmacokinetic properties of molecules is a central task in drug discovery. Existing machine learning methods transfer knowledge from one resource-rich molecular property to another data-scarce property within the same scaffold dataset. However, existing models may produce fragile and highly uncertain predictions for new scaffold molecules. Moreover, these models were tested on different benchmarks, which seriously affected the quality of their evaluation results. In this article, we introduce Meta-MolNet, a collection of benchmark data and algorithms that serves as a standard benchmark platform for measuring model generalization and uncertainty quantification capabilities. Meta-MolNet manages a wide range of molecular datasets with a high ratio of molecules to scaffolds, which often leads to more difficult data shift and generalization problems. Furthermore, we propose a graph attention network based on cross-domain meta-learning, Meta-GAT, which uses bilevel optimization to learn meta-knowledge from the scaffold family molecular dataset in the source domain. Meta-GAT benefits from meta-knowledge that reduces the sample complexity required to make reliable predictions for new scaffold molecules in the target domain through internal iteration on a few examples. We evaluate existing methods as baselines for the community, and the Meta-MolNet benchmark demonstrates its effectiveness in measuring the domain generalization and uncertainty quantification of the proposed algorithm. Extensive experiments demonstrate that the Meta-GAT model has state-of-the-art domain generalization performance and robustly estimates uncertainty under few-example constraints. By publishing AI-ready data, evaluation frameworks, and baseline results, we hope to see the Meta-MolNet suite become a comprehensive resource for the AI-assisted drug discovery community. Meta-MolNet is freely accessible at https://github.com/lol88/Meta-MolNet.

3
Dou H, Virtanen S, Ravikumar N, Frangi AF. A Generative Shape Compositional Framework to Synthesize Populations of Virtual Chimeras. IEEE Trans Neural Netw Learn Syst 2025; 36:4750-4764. [PMID: 38502618 DOI: 10.1109/tnnls.2024.3374121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/21/2024]
Abstract
Generating virtual organ populations that capture sufficient variability while remaining plausible is essential to conduct in silico trials (ISTs) of medical devices. However, not all anatomical shapes of interest are always available for each individual in a population. The imaging examinations and modalities used can vary between subjects depending on their individualized clinical pathways. Different imaging modalities may have various fields of view and are sensitive to signals from other tissues/organs, or both. Hence, missing/partially overlapping anatomical information is often available across individuals. We introduce a generative shape model for multipart anatomical structures, learnable from sets of unpaired datasets, i.e., where each substructure in the shape assembly comes from datasets with missing or partially overlapping substructures from disjoint subjects of the same population. The proposed generative model can synthesize complete multipart shape assemblies coined virtual chimeras (VCs). We applied this framework to build VCs from databases of whole-heart shape assemblies that each contribute samples for heart substructures. Specifically, we propose a graph neural network-based generative shape compositional framework, which comprises two components, a part-aware generative shape model that captures the variability in shape observed for each structure of interest in the training population and a spatial composition network that assembles/composes the structures synthesized by the former into multipart shape assemblies (i.e., VCs). We also propose a novel self-supervised learning scheme that enables the spatial composition network to be trained with partially overlapping data and weak labels. We trained and validated our approach using shapes of cardiac structures derived from cardiac magnetic resonance (MR) images in the UK Biobank (UKBB). 
When trained with complete and partially overlapping data, our approach significantly outperforms a principal component analysis (PCA)-based shape model (trained with complete data) in terms of generalizability and specificity. This demonstrates the superiority of the proposed method, as the synthesized cardiac virtual populations are more plausible and capture a greater degree of shape variability than those generated by the PCA-based shape model.

4
Li M, Cao Y, Liu X, Ji H. Structure-Aware Graph Attention Diffusion Network for Protein-Ligand Binding Affinity Prediction. IEEE Trans Neural Netw Learn Syst 2024; 35:18370-18380. [PMID: 37751351 DOI: 10.1109/tnnls.2023.3314928] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/28/2023]
Abstract
Accurate prediction of protein-ligand binding affinities can significantly advance the development of drug discovery. Several graph neural network (GNN)-based methods learn representations of protein-ligand complexes via modeling intermolecule interactions and spatial structures (e.g., distances and angles) of complexes. However, these methods fail to emphasize the importance of bonds and learn hierarchical structures of complexes, which are significant for binding affinity prediction. In this article, we propose the structure-aware graph attention diffusion network (SGADN) to incorporate both distance and angle information for efficient spatial structure learning. We model complexes as line graphs with distance and angle information, focusing on bonds as nodes. Then we perform line graph attention diffusion layers (LGADLs) on line graphs to explore long-range bond node interactions and enhance spatial structure learning. Furthermore, we propose an attentive pooling layer (APL) to refine the hierarchical structures in complexes. Extensive experimental studies on two benchmarks demonstrate the superiority of SGADN for binding affinity prediction.
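The line-graph idea mentioned in this abstract (bonds become nodes) can be illustrated with a minimal sketch. This is not the SGADN implementation: the function name `line_graph` and the toy bond list are hypothetical, and the distance and angle features the abstract describes are omitted.

```python
from itertools import combinations


def line_graph(bonds):
    """Build a line graph: each bond becomes a node, and two bond-nodes
    are connected when the underlying bonds share an atom."""
    bonds = [tuple(sorted(b)) for b in bonds]
    adjacency = {b: set() for b in bonds}
    for b1, b2 in combinations(bonds, 2):
        if set(b1) & set(b2):  # the two bonds meet at a common atom
            adjacency[b1].add(b2)
            adjacency[b2].add(b1)
    return adjacency


# Hypothetical toy molecule: chain 0-1-2-3 with a branch 1-4 (atom indices).
bonds = [(0, 1), (1, 2), (2, 3), (1, 4)]
lg = line_graph(bonds)
for bond, neighbours in lg.items():
    print(bond, "->", sorted(neighbours))
```

Each edge of the line graph corresponds to a pair of bonds meeting at an atom, which is where a bond angle could be attached as an edge feature in a fuller model.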

5
Mukherjee J, Sharma R, Dutta P, Bhunia B. Artificial intelligence in healthcare: a mastery. Biotechnol Genet Eng Rev 2024; 40:1659-1708. [PMID: 37013913 DOI: 10.1080/02648725.2023.2196476] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/11/2023] [Accepted: 03/22/2023] [Indexed: 04/05/2023]
Abstract
Artificial intelligence (AI) has developed rapidly in recent years. Computational technology, digitized data collection and enormous advances in this field have allowed AI applications to penetrate core areas of human specialization. In this review article, we describe current progress in the AI field, highlighting constraints on the smooth development of the medical AI sector, and discuss its implementation in healthcare from commercial, regulatory and sociological standpoints. Utilizing sizable multidimensional biological datasets that capture individual heterogeneity in genomes, functionality and milieu, precision medicine strives to create and optimize approaches for diagnosis, treatment and assessment. With the rising complexity and volume of data in the health-care industry, AI can be applied more frequently. The main application categories include indications for diagnosis and therapy, patient involvement and commitment, and administrative tasks. There has recently been a sharp rise in interest in medical AI applications due to developments in AI software and technology, particularly in deep learning algorithms and artificial neural networks (ANNs). In this overview, we list the major categories of issues that AI systems are ideally equipped to resolve, followed by clinical diagnostic tasks. The review also discusses the future potential of AI, particularly for risk prediction in complex diseases, and the difficulties, constraints and biases that must be meticulously addressed for the effective delivery of AI in the health-care sector.
Affiliation(s)
- Jayanti Mukherjee: Department of Pharmaceutical Chemistry, CMR College of Pharmacy Affiliated to Jawaharlal Nehru Technological University, Hyderabad, Telangana, India
- Ramesh Sharma: Department of Bioengineering, National Institute of Technology, Agartala, India
- Prasenjit Dutta: Department of Production Engineering, National Institute of Technology, Agartala, India
- Biswanath Bhunia: Department of Bioengineering, National Institute of Technology, Agartala, India

6
Yang Y, Sun Y, Wang S, Gao J, Ju F, Yin B. A Dual-Masked Deep Structural Clustering Network With Adaptive Bidirectional Information Delivery. IEEE Trans Neural Netw Learn Syst 2024; 35:14783-14796. [PMID: 37459264 DOI: 10.1109/tnnls.2023.3281570] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/09/2024]
Abstract
Structured clustering networks, which alleviate the oversmoothing issue by delivering hidden features from autoencoder (AE) to graph convolutional networks (GCNs), involve two shortcomings for the clustering task. For one thing, they used vanilla structure to learn clustering representations without considering feature and structure corruption; for another thing, they exhibit network degradation and vanishing gradient issues after stacking multilayer GCNs. In this article, we propose a clustering method called dual-masked deep structural clustering network (DMDSC) with adaptive bidirectional information delivery (ABID). Specifically, DMDSC enables generative self-supervised learning to mine deeper interstructure and interfeature correlations by simultaneously reconstructing corrupted structures and features. Furthermore, DMDSC develops an ABID module to establish an information transfer channel between each pairwise layer of AE and GCNs to alleviate the oversmoothing and vanishing gradient problems. Numerous experiments on six benchmark datasets have shown that the proposed DMDSC outperforms the most advanced deep clustering algorithms.

7
Li Y, Lin Y, Hu P, Peng D, Luo H, Peng X. Single-Cell RNA-Seq Debiased Clustering via Batch Effect Disentanglement. IEEE Trans Neural Netw Learn Syst 2024; 35:11371-11381. [PMID: 37030864 DOI: 10.1109/tnnls.2023.3260003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
A variety of single-cell RNA-seq (scRNA-seq) clustering methods have achieved great success in discovering cellular phenotypes. However, clustering remains challenging when the data are confounded by batch effects arising from different experimental conditions or technologies; that is, the data partitions become biased toward these nonbiological factors. Meanwhile, the batch differences are not always much smaller than true biological variations, hindering the cooperation of batch integration and clustering methods. To overcome this challenge, we propose single-cell RNA-seq debiased clustering (SCDC), an end-to-end clustering method that is debiased toward batch effects by disentangling the biological and nonbiological information from scRNA-seq data during data partitioning. In six analyses, SCDC qualitatively and quantitatively outperforms both the state-of-the-art clustering and batch integration methods in handling scRNA-seq data with batch effects. Furthermore, SCDC clusters data with a running time that increases linearly with the number of cells and with fixed graphics processing unit (GPU) memory consumption, making it scalable to large datasets. The code will be released on GitHub.

8
Liu Y, Xu C, Yang X, Zhang Y, Chen Y, Liu H. Application progress of deep generative models in de novo drug design. Mol Divers 2024; 28:2411-2427. [PMID: 39097862 DOI: 10.1007/s11030-024-10942-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2024] [Accepted: 07/16/2024] [Indexed: 08/05/2024]
Abstract
Deep molecular generative models have recently become a research hotspot in pharmacy. This paper analyzes a large number of recent reports and reviews these models. In the central part of the paper, four compound databases and two molecular representation methods are compared. Five model architectures and their applications are introduced in detail, and three metrics for model evaluation are listed. Finally, the limitations and challenges in this field are discussed to provide a reference and basis for developing and researching new models in the future.
Affiliation(s)
- Yingxu Liu: School of Science, China Pharmaceutical University, Nanjing, 210009, China
- Chengcheng Xu: School of Science, China Pharmaceutical University, Nanjing, 210009, China
- Xinyi Yang: School of Science, China Pharmaceutical University, Nanjing, 210009, China
- Yanmin Zhang: School of Science, China Pharmaceutical University, Nanjing, 210009, China
- Yadong Chen: School of Science, China Pharmaceutical University, Nanjing, 210009, China
- Haichun Liu: School of Science, China Pharmaceutical University, Nanjing, 210009, China

9
Tang Z, Chen G, Yang H, Zhong W, Chen CYC. DSIL-DDI: A Domain-Invariant Substructure Interaction Learning for Generalizable Drug-Drug Interaction Prediction. IEEE Trans Neural Netw Learn Syst 2024; 35:10552-10560. [PMID: 37022856 DOI: 10.1109/tnnls.2023.3242656] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Drug-drug interactions (DDIs) trigger unexpected pharmacological effects in vivo, often with unknown causal mechanisms. Deep learning methods have been developed to better understand DDIs. However, learning domain-invariant representations for DDIs remains a challenge. Generalizable DDI predictions are closer to reality than source domain predictions, yet existing methods struggle to make out-of-distribution (OOD) predictions. In this article, focusing on substructure interaction, we propose DSIL-DDI, a pluggable substructure interaction module that can learn domain-invariant representations of DDIs from the source domain. We evaluate DSIL-DDI in three scenarios: the transductive setting (all drugs in the test set appear in the training set), the inductive setting (the test set contains new drugs that were not present in the training set), and the OOD generalization setting (the training set and test set belong to two different datasets). The results demonstrate that DSIL-DDI improves the generalization and interpretability of DDI prediction modeling and provides valuable insights for OOD DDI predictions. DSIL-DDI can help doctors ensure the safety of drug administration and reduce the harm caused by drug abuse.
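The transductive and inductive settings defined in this abstract can be told apart with a small check on the drugs in each split. The sketch below is illustrative only: the helper names `drugs_in` and `classify_setting` and the toy drug labels are hypothetical, and the OOD setting is not covered because it depends on dataset provenance rather than on drug overlap.

```python
def drugs_in(pairs):
    """Collect every drug that appears in a list of (drug, drug) pairs."""
    return {drug for pair in pairs for drug in pair}


def classify_setting(train_pairs, test_pairs):
    """Return ("transductive", set()) when every test drug was seen in
    training, otherwise ("inductive", unseen_drugs)."""
    unseen = drugs_in(test_pairs) - drugs_in(train_pairs)
    return ("inductive", unseen) if unseen else ("transductive", unseen)


# Hypothetical toy data: letters stand in for drug identifiers.
train = [("A", "B"), ("B", "C")]
print(classify_setting(train, [("A", "C")]))  # all test drugs seen in training
print(classify_setting(train, [("A", "D")]))  # drug "D" is new
```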

10
Lv Q, Chen G, Yang Z, Zhong W, Chen CYC. Meta Learning With Graph Attention Networks for Low-Data Drug Discovery. IEEE Trans Neural Netw Learn Syst 2024; 35:11218-11230. [PMID: 37028032 DOI: 10.1109/tnnls.2023.3250324] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 06/19/2023]
Abstract
Finding candidate molecules with favorable pharmacological activity, low toxicity, and proper pharmacokinetic properties is an important task in drug discovery. Deep neural networks have made impressive progress in accelerating and improving drug discovery. However, these techniques rely on a large amount of labeled data to form accurate predictions of molecular properties. At each stage of the drug discovery pipeline, usually only limited biological data on candidate molecules and derivatives are available, so applying deep neural networks to low-data drug discovery remains a formidable challenge. Here, we propose a meta learning architecture with a graph attention network, Meta-GAT, to predict molecular properties in low-data drug discovery. The GAT captures the local effects of atomic groups at the atom level through the triple attentional mechanism and implicitly captures the interactions between different atomic groups at the molecular level. GAT is used to perceive the molecular chemical environment and connectivity, thereby effectively reducing sample complexity. Meta-GAT further develops a meta learning strategy based on bilevel optimization, which transfers meta knowledge from other attribute prediction tasks to low-data target tasks. In summary, our work demonstrates how meta learning can reduce the amount of data required to make meaningful predictions of molecules in low-data scenarios. Meta learning is likely to become the new learning paradigm in low-data drug discovery. The source code is publicly available at: https://github.com/lol88/Meta-GAT.
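The bilevel structure mentioned in this abstract (an inner loop that adapts to one task, an outer loop that updates shared parameters across tasks) can be sketched on a toy problem. This is not Meta-GAT: it assumes a one-parameter model y = w * x, tasks of the form y = a * x, and a first-order approximation of the outer gradient; all names and numbers are hypothetical.

```python
def grad(w, xs, a):
    """Gradient of the mean squared error of y = w * x on a task y = a * x."""
    return sum(2 * (w * x - a * x) * x for x in xs) / len(xs)


def adapt(w, support_xs, a, inner_lr=0.1):
    """Inner loop: one gradient step on the task's few support examples."""
    return w - inner_lr * grad(w, support_xs, a)


def meta_train(task_slopes, support_xs, query_xs, steps=200, outer_lr=0.01):
    """Outer loop: update the shared initialization using query-set gradients
    evaluated at the adapted parameters (first-order approximation)."""
    w = 0.0
    for _ in range(steps):
        outer_grad = 0.0
        for a in task_slopes:
            w_task = adapt(w, support_xs, a)
            outer_grad += grad(w_task, query_xs, a)
        w -= outer_lr * outer_grad / len(task_slopes)
    return w


# Two hypothetical source tasks with slopes 1.0 and 3.0.
meta_w = meta_train([1.0, 3.0], support_xs=[1.0, 2.0], query_xs=[3.0])
# Adapt the learned initialization to a target task from two support points.
adapted = adapt(meta_w, [1.0, 2.0], a=3.0)
print(meta_w, adapted)
```

In this toy setting the learned initialization settles between the source-task slopes, so a single inner step on a handful of support points already moves it toward the target task.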

11
Yi Y, Wan X, Zhao K, Ou-Yang L, Zhao P. Equivariant Line Graph Neural Network for Protein-Ligand Binding Affinity Prediction. IEEE J Biomed Health Inform 2024; 28:4336-4347. [PMID: 38551822 DOI: 10.1109/jbhi.2024.3383245] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/03/2024]
Abstract
Binding affinity prediction of three-dimensional (3D) protein-ligand complexes is critical for drug repositioning and virtual drug screening. Existing approaches usually transform a 3D protein-ligand complex into a two-dimensional (2D) graph, and then use graph neural networks (GNNs) to predict its binding affinity. However, the node and edge features of the 2D graph are extracted based on invariant local coordinate systems of the 3D complex. As a result, these approaches cannot fully learn the global information of the complex, such as the physical symmetry and the topological information of bonds. To address these issues, we propose a novel Equivariant Line Graph Network (ELGN) for binding affinity prediction of 3D protein-ligand complexes. The proposed ELGN first adds a super node to the 3D complex, and then builds a line graph based on the 3D complex. After that, ELGN uses a new E(3)-equivariant network layer to pass messages between nodes and edges based on the global coordinate system of the 3D complex. Experimental results on two real datasets demonstrate the effectiveness of ELGN over several state-of-the-art baselines.

12
Bai Q, Xu T, Huang J, Pérez-Sánchez H. Geometric deep learning methods and applications in 3D structure-based drug design. Drug Discov Today 2024; 29:104024. [PMID: 38759948 DOI: 10.1016/j.drudis.2024.104024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2024] [Revised: 05/02/2024] [Accepted: 05/10/2024] [Indexed: 05/19/2024]
Abstract
3D structure-based drug design (SBDD) is considered a challenging and rational way for innovative drug discovery. Geometric deep learning is a promising approach that solves the accurate model training of 3D SBDD through building neural network models to learn non-Euclidean data, such as 3D molecular graphs and manifold data. Here, we summarize geometric deep learning methods and applications that contain 3D molecular representations, equivariant graph neural networks (EGNNs), and six generative model methods [diffusion model, flow-based model, generative adversarial networks (GANs), variational autoencoder (VAE), autoregressive models, and energy-based models]. Our review provides insights into geometric deep learning methods and advanced applications of 3D SBDD that will be of relevance for the drug discovery community.
Affiliation(s)
- Qifeng Bai: School of Basic Medical Sciences, Lanzhou University, Lanzhou 730000, Gansu, PR China
- Junzhou Huang: Department of Computer Science and Engineering, the University of Texas at Arlington, Arlington, TX 76019, USA
- Horacio Pérez-Sánchez: Structural Bioinformatics and High Performance Computing Research Group (BIO-HPC), Computer Engineering Department, UCAM Universidad Católica de Murcia, Murcia 30107, Spain

13
Pang C, Qiao J, Zeng X, Zou Q, Wei L. Deep Generative Models in De Novo Drug Molecule Generation. J Chem Inf Model 2024; 64:2174-2194. [PMID: 37934070 DOI: 10.1021/acs.jcim.3c01496] [Citation(s) in RCA: 20] [Impact Index Per Article: 20.0] [Indexed: 11/08/2023]
Abstract
The discovery of new drugs has important implications for human health. Traditional methods for drug discovery rely on experiments to optimize the structure of lead molecules, which are time-consuming and high-cost. Recently, artificial intelligence has exhibited promising and efficient performance for drug-like molecule generation. In particular, deep generative models achieve great success in de novo generation of drug-like molecules with desired properties, showing massive potential for novel drug discovery. In this study, we review the recent progress of molecule generation using deep generative models, mainly focusing on molecule representations, public databases, data processing tools, and advanced artificial intelligence based molecule generation frameworks. In particular, we present a comprehensive comparison of state-of-the-art deep generative models for molecule generation and a summary of commonly used molecular design strategies. We identify research gaps and challenges of molecule generation such as the need for better databases, missing 3D information in molecular representation, and the lack of high-precision evaluation metrics. We suggest future directions for molecular generation and drug discovery.
Affiliation(s)
- Chao Pang: School of Software, Shandong University, Jinan 250100, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250100, China
- Jianbo Qiao: School of Software, Shandong University, Jinan 250100, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250100, China
- Xiangxiang Zeng: College of Information Science and Engineering, Hunan University, Changsha 410082, China
- Quan Zou: Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
- Leyi Wei: School of Software, Shandong University, Jinan 250100, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250100, China

14
Ilnicka A, Schneider G. Designing molecules with autoencoder networks. Nat Comput Sci 2023; 3:922-933. [PMID: 38177601 DOI: 10.1038/s43588-023-00548-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2023] [Accepted: 10/03/2023] [Indexed: 01/06/2024]
Abstract
Autoencoders are versatile tools in molecular informatics. These unsupervised neural networks serve diverse tasks such as data-driven molecular representation and constructive molecular design. This Review explores their algorithmic foundations and applications in drug discovery, highlighting the most active areas of development and the contributions autoencoder networks have made in advancing this field. We also explore the challenges and prospects concerning the utilization of autoencoders and the various adaptations of this neural network architecture in molecular design.
Affiliation(s)
- Agnieszka Ilnicka: Department of Chemistry and Applied Biosciences, ETH Zurich, Zurich, Switzerland
- Gisbert Schneider: Department of Chemistry and Applied Biosciences, ETH Zurich, Zurich, Switzerland

15
Hierarchical Molecular Graph Self-Supervised Learning for property prediction. Commun Chem 2023; 6:34. [PMID: 36801953 PMCID: PMC9938270 DOI: 10.1038/s42004-023-00825-5] [Citation(s) in RCA: 33] [Impact Index Per Article: 16.5] [Received: 09/19/2022] [Accepted: 01/31/2023] [Indexed: 02/19/2023]
Abstract
Molecular graph representation learning has shown considerable strength in molecular analysis and drug discovery. Due to the difficulty of obtaining molecular property labels, pre-training models based on self-supervised learning has become increasingly popular in molecular representation learning. Notably, Graph Neural Networks (GNN) are employed as the backbones to encode implicit representations of molecules in most existing works. However, vanilla GNN encoders ignore chemical structural information and functions implied in molecular motifs, and obtaining the graph-level representation via the READOUT function hinders the interaction of graph and node representations. In this paper, we propose Hierarchical Molecular Graph Self-supervised Learning (HiMol), which introduces a pre-training framework to learn molecule representation for property prediction. First, we present a Hierarchical Molecular Graph Neural Network (HMGNN), which encodes motif structure and extracts node-motif-graph hierarchical molecular representations. Then, we introduce Multi-level Self-supervised Pre-training (MSP), in which corresponding multi-level generative and predictive tasks are designed as self-supervised signals of HiMol model. Finally, superior molecular property prediction results on both classification and regression tasks demonstrate the effectiveness of HiMol. Moreover, the visualization performance in the downstream dataset shows that the molecule representations learned by HiMol can capture chemical semantic information and properties.