1. Wang L, Liu Y, Fu X, Ye X, Shi J, Yen GG, Zou Q, Zeng X, Cao D. HMAMP: Designing Highly Potent Antimicrobial Peptides Using a Hypervolume-Driven Multiobjective Deep Generative Model. J Med Chem 2025; 68:8346-8360. [PMID: 40232176; DOI: 10.1021/acs.jmedchem.4c03073] [Indexed: 04/16/2025]
Abstract
Antimicrobial peptides (AMPs) have exhibited unprecedented potential as biomaterials in combating multidrug-resistant bacteria, prompting the proposal of many excellent generative models. However, the multiobjective nature of AMP discovery is often overlooked, contributing to the high attrition rate of drug candidates. Here, we propose a novel approach termed hypervolume-driven multiobjective AMP design (HMAMP), which prioritizes the simultaneous optimization of multiple AMP attributes. By combining reinforcement learning with a gradient descent algorithm rooted in the concept of hypervolume maximization, HMAMP effectively biases the generative process and mitigates the pattern collapse issue. Comparative experiments show that HMAMP significantly outperforms state-of-the-art methods in effectiveness and diversity. A knee-based decision strategy is then employed to rapidly screen candidates with favorable physicochemical properties, which align with enhanced antimicrobial activity and reduced side effects. Molecular visualization further elucidates the structural and functional properties of the AMPs. Overall, HMAMP is an effective approach for traversing large and complex exploration spaces in search of AMPs that strike an idealism-realism trade-off.
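The two ideas this abstract relies on, the hypervolume indicator and knee-based selection, can be illustrated for two objectives. The following is a minimal sketch, not HMAMP's implementation: it assumes both objectives are maximized (for example, hypothetical predicted-activity and predicted-safety scores), that the reference point is dominated by every candidate of interest, and it uses one common knee definition (the Pareto point farthest from the line joining the two extreme points after min-max normalization).

```python
import math


def pareto_front(points):
    """Return the distinct points not dominated by any other (both objectives maximized)."""
    unique = set(points)
    return [
        p for p in unique
        if not any(q[0] >= p[0] and q[1] >= p[1] and q != p for q in unique)
    ]


def hypervolume_2d(points, ref):
    """Area dominated by the Pareto front and bounded below by the reference point `ref`."""
    # Sweep from the largest first objective to the smallest; along a 2-D front
    # the second objective then increases monotonically.
    front = sorted(pareto_front(points), key=lambda p: p[0], reverse=True)
    volume, prev_y = 0.0, ref[1]
    for x, y in front:
        if x <= ref[0] or y <= ref[1]:
            continue  # points outside the reference box contribute nothing
        volume += (x - ref[0]) * (y - prev_y)
        prev_y = y
    return volume


def knee_point(points):
    """Pareto point farthest from the line joining the extremes, in normalized space."""
    front = pareto_front(points)
    if len(front) == 1:
        return front[0]
    xs, ys = [p[0] for p in front], [p[1] for p in front]
    x_lo, x_hi, y_lo, y_hi = min(xs), max(xs), min(ys), max(ys)

    def bulge(p):
        # After min-max normalization the extremes sit at (1, 0) and (0, 1), so the
        # signed distance towards the ideal point (1, 1) is (nx + ny - 1) / sqrt(2).
        nx = (p[0] - x_lo) / (x_hi - x_lo)
        ny = (p[1] - y_lo) / (y_hi - y_lo)
        return (nx + ny - 1) / math.sqrt(2)

    return max(front, key=bulge)


# Usage with made-up candidate scores (higher is better for both objectives).
candidates = [(1, 3), (2, 2), (3, 1), (1, 1)]
print(hypervolume_2d(candidates, ref=(0, 0)))  # 6.0
print(knee_point([(1, 4), (3, 3), (4, 1), (2, 2)]))  # (3, 3)
```

Because the hypervolume grows whenever the front moves outward or spreads along itself, it acts as a single scalar that rewards both quality and coverage; exact computation, however, becomes expensive as the number of objectives increases.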
Affiliation(s)
- Li Wang
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China
- Yiping Liu
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China
- Xiangzheng Fu
  - School of Chinese Medicine, Hong Kong Baptist University, Hong Kong 999077, China
- Xiucai Ye
  - System Information and Engineering, University of Tsukuba, Tsukuba 305-8571, Japan
- Junfeng Shi
  - Interdisciplinary Life Sciences, Hunan University, Changsha 410082, China
- Gary G Yen
  - Electrical and Computer Engineering, Oklahoma State University, Stillwater, Oklahoma 74078, United States
- Quan Zou
  - Basic and Frontier Research Institute, University of Electronic Science and Technology of China, Chengdu 611731, China
- Xiangxiang Zeng
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China
- Dongsheng Cao
  - Xiangya School of Pharmacy, Central South University, Changsha 410083, China
2. Zou Y, Guo T, Fu Z, Guo Z, Bo W, Yan D, Wang Q, Zeng J, Xu D, Wang T, Chen L. A structure-based framework for selective inhibitor design and optimization. Commun Biol 2025; 8:422. [PMID: 40075154; PMCID: PMC11903766; DOI: 10.1038/s42003-025-07840-3] [Received: 10/21/2024] [Accepted: 02/27/2025] [Indexed: 03/14/2025]
Abstract
Structure-based drug design aims to create active compounds with favorable properties by analyzing target structures. Recently, deep generative models have facilitated structure-specific molecular generation. However, many methods are limited by inadequate pharmaceutical data, resulting in suboptimal molecular properties and unstable conformations. Additionally, these approaches often overlook binding pocket interactions and struggle with selective inhibitor design. To address these challenges, we developed a framework called Coarse-grained and Multi-dimensional Data-driven molecular generation (CMD-GEN). CMD-GEN bridges ligand-protein complexes with drug-like molecules by using coarse-grained pharmacophore points sampled from a diffusion model, thereby enriching the training data. Through a hierarchical architecture, it decomposes three-dimensional molecule generation within the pocket into pharmacophore point sampling, chemical structure generation, and conformation alignment, mitigating instability issues. CMD-GEN outperforms other methods in benchmark tests and effectively controls drug-likeness. Furthermore, CMD-GEN performs well in case studies on three synthetic lethal targets, and wet-lab validation with PARP1/2 inhibitors confirms its potential in selective inhibitor design.
Affiliation(s)
- Yurong Zou
  - State Key Laboratory of Biotherapy and Collaborative Innovation Center of Biotherapy, West China Hospital, Sichuan University, Chengdu, China
- Tao Guo
  - State Key Laboratory of Biotherapy and Collaborative Innovation Center of Biotherapy, West China Hospital, Sichuan University, Chengdu, China
- Zhiyuan Fu
  - State Key Laboratory of Biotherapy and Collaborative Innovation Center of Biotherapy, West China Hospital, Sichuan University, Chengdu, China
- Zhongning Guo
  - State Key Laboratory of Biotherapy and Collaborative Innovation Center of Biotherapy, West China Hospital, Sichuan University, Chengdu, China
- Weichen Bo
  - State Key Laboratory of Biotherapy and Collaborative Innovation Center of Biotherapy, West China Hospital, Sichuan University, Chengdu, China
- Dengjie Yan
  - Key Laboratory of Drug-Targeting and Drug Delivery System of the Education Ministry and Sichuan Province, West China School of Pharmacy, Sichuan University, Chengdu, China
- Qiantao Wang
  - Key Laboratory of Drug-Targeting and Drug Delivery System of the Education Ministry and Sichuan Province, West China School of Pharmacy, Sichuan University, Chengdu, China
- Jun Zeng
  - Western Health, Faculty of Medicine, Dentistry and Health Sciences, University of Melbourne, Carlton, VIC, Australia
- Dingguo Xu
  - MOE Key Laboratory of Green Chemistry and Technology, College of Chemistry, Sichuan University, Chengdu, China
- Taijin Wang
  - Chengdu Zenitar Biomedical Technology Co., Ltd., Chengdu, China
- Lijuan Chen
  - State Key Laboratory of Biotherapy and Collaborative Innovation Center of Biotherapy, West China Hospital, Sichuan University, Chengdu, China
  - Chengdu Zenitar Biomedical Technology Co., Ltd., Chengdu, China
3. Kyro GW, Martin MT, Watt ED, Batista VS. CardioGenAI: a machine learning-based framework for re-engineering drugs for reduced hERG liability. J Cheminform 2025; 17:30. [PMID: 40045386; PMCID: PMC11881490; DOI: 10.1186/s13321-025-00976-8] [Received: 08/11/2024] [Accepted: 02/21/2025] [Indexed: 03/09/2025]
Abstract
The link between in vitro hERG ion channel inhibition and subsequent in vivo QT interval prolongation, a critical risk factor for the development of arrhythmias such as Torsade de Pointes, is so well established that in vitro hERG activity alone is often sufficient to end the development of an otherwise promising drug candidate. It is therefore of tremendous interest to develop advanced methods for identifying hERG-active compounds in the early stages of drug development, as well as for proposing redesigned compounds with reduced hERG liability and preserved primary pharmacology. In this work, we present CardioGenAI, a machine learning-based framework for re-engineering both developmental and commercially available drugs for reduced hERG activity while preserving their pharmacological activity. The framework incorporates novel state-of-the-art discriminative models for predicting hERG channel activity, as well as activity against the voltage-gated NaV1.5 and CaV1.2 channels, because of their potential role in modulating the arrhythmogenic potential induced by hERG channel blockade. We applied the complete framework to pimozide, an FDA-approved antipsychotic agent that demonstrates high affinity for the hERG channel, and generated 100 refined candidates. Remarkably, among the candidates is fluspirilene, a compound of the same drug class as pimozide (diphenylmethanes), and therefore with similar pharmacological activity, that exhibits over 700-fold weaker binding to hERG. Furthermore, we demonstrated the framework's ability to optimize the hERG, NaV1.5 and CaV1.2 profiles of multiple FDA-approved compounds while maintaining the physicochemical nature of the original drugs. We envision that this method can be applied effectively to developmental compounds exhibiting hERG liabilities, providing a means of rescuing drug development programs that have stalled due to hERG-related safety concerns.
The discriminative models can also serve independently as effective components of virtual screening pipelines. We have made all of our software open-source at https://github.com/gregory-kyro/CardioGenAI to facilitate integration of the CardioGenAI framework for molecular hypothesis generation into drug discovery workflows.
Scientific contribution: This work introduces CardioGenAI, an open-source machine learning-based framework designed to re-engineer drugs for reduced hERG liability while preserving their pharmacological activity. The complete CardioGenAI framework can be applied to developmental compounds exhibiting hERG liabilities to provide a means of rescuing drug discovery programs facing hERG-related challenges. In addition, the framework incorporates novel state-of-the-art discriminative models for predicting hERG, NaV1.5 and CaV1.2 channel activity, which can function independently as effective components of virtual screening pipelines.
Affiliation(s)
- Gregory W Kyro
  - Department of Chemistry, Yale University, New Haven, CT, 06511, USA
  - Drug Safety Research & Development, Pfizer Research & Development, Groton, CT, 06340, USA
- Matthew T Martin
  - Drug Safety Research & Development, Pfizer Research & Development, Groton, CT, 06340, USA
- Eric D Watt
  - Drug Safety Research & Development, Pfizer Research & Development, Groton, CT, 06340, USA
- Victor S Batista
  - Department of Chemistry, Yale University, New Haven, CT, 06511, USA
4. Creanza TM, Alberga D, Patruno C, Mangiatordi GF, Ancona N. Transformer Decoder Learns from a Pretrained Protein Language Model to Generate Ligands with High Affinity. J Chem Inf Model 2025; 65:1258-1277. [PMID: 39871540; DOI: 10.1021/acs.jcim.4c02019] [Indexed: 01/29/2025]
Abstract
The drug discovery process can be significantly accelerated by using deep learning methods to suggest molecules that have druglike features and, more importantly, are good candidates to bind specific proteins of interest. We present a novel deep learning generative model, Prot2Drug, that learns to generate ligands binding specific targets by leveraging (i) the information carried by a pretrained protein language model and (ii) the ability of transformers to capitalize on the knowledge gathered from thousands of protein-ligand interactions. The embedding unveils the recipe to follow for designing molecules that bind a given protein, and Prot2Drug translates these instructions into the syntax of the molecular language, generating novel compounds that are predicted to have favorable physicochemical properties and high affinity toward specific targets. Moreover, Prot2Drug reproduced numerous known interactions between compounds and the proteins used to generate them, and suggested novel protein targets for known compounds, indicating potential drug repurposing strategies. Remarkably, Prot2Drug facilitates the design of promising ligands even for protein targets with limited or no information about their ligands or 3D structure.
Affiliation(s)
- Teresa Maria Creanza
  - Institute of Intelligent Industrial Technologies and Systems for Advanced Manufacturing, Consiglio Nazionale delle Ricerche, Via G. Amendola, 122/d, Bari 70126, Italy
- Domenico Alberga
  - Institute of Crystallography, Consiglio Nazionale delle Ricerche, Via G. Amendola, 122/d, Bari 70126, Italy
- Cosimo Patruno
  - Institute of Intelligent Industrial Technologies and Systems for Advanced Manufacturing, Consiglio Nazionale delle Ricerche, Via G. Amendola, 122/d, Bari 70126, Italy
- Nicola Ancona
  - Institute of Intelligent Industrial Technologies and Systems for Advanced Manufacturing, Consiglio Nazionale delle Ricerche, Via G. Amendola, 122/d, Bari 70126, Italy
5. Rajagopal N, Choudhary U, Tsang K, Martin KP, Karadag M, Chen HT, Kwon NY, Mozdzierz J, Horspool AM, Li L, Tessier PM, Marlow MS, Nixon AE, Kumar S. Deep learning-based design and experimental validation of a medicine-like human antibody library. Brief Bioinform 2024; 26:bbaf023. [PMID: 39851074; PMCID: PMC11757908; DOI: 10.1093/bib/bbaf023] [Received: 08/02/2024] [Revised: 12/31/2024] [Accepted: 01/09/2025] [Indexed: 01/25/2025]
Abstract
Antibody generation requires one or more time-consuming methods, namely animal immunization and in vitro display technologies. However, the recent availability of large amounts of antibody sequence and structural data in the public domain, along with the advent of generative deep learning algorithms, raises the possibility of computationally generating novel antibody sequences with desirable developability attributes. Here, we describe a deep learning model for computationally generating libraries of highly human antibody variable regions whose intrinsic physicochemical properties resemble those of the variable regions of marketed antibody-based biotherapeutics (medicine-likeness). We generated 100,000 variable region sequences of antigen-agnostic human antibodies belonging to the IGHV3-IGKV1 germline pair, using a training dataset of 31,416 human antibodies that satisfied our computational developability criteria. The in silico generated antibodies recapitulate the intrinsic sequence, structural, and physicochemical properties of the training antibodies, and compare favorably with the experimentally measured biophysical attributes of 100 variable regions of marketed and clinical-stage antibody-based biotherapeutics. A sample of 51 highly diverse in silico generated antibodies with >90th percentile medicine-likeness and >90% humanness was evaluated by two independent experimental laboratories. Our data show that the in silico generated sequences exhibit high expression, monomer content, and thermal stability, along with low hydrophobicity, self-association, and non-specific binding, when produced as full-length monoclonal antibodies. The ability to computationally generate developable human antibody libraries is a first step towards enabling in silico discovery of antibody-based biotherapeutics.
These findings are expected to accelerate in silico discovery of antibody-based biotherapeutics and to expand the druggable antigen space to include targets refractory to conventional antibody discovery methods that require in vitro antigen production.
Affiliation(s)
- Nandhini Rajagopal
  - Biotherapeutics Molecule Discovery, Boehringer Ingelheim Pharmaceutical Inc., 900 Ridgebury Road, Ridgefield, CT 06877, United States
- Udit Choudhary
  - Global Computational Biology and Digital Sciences, Boehringer Ingelheim Pharmaceutical Inc., 900 Ridgebury Road, Ridgefield, CT 06877, United States
- Kenny Tsang
  - Biotherapeutics Molecule Discovery, Boehringer Ingelheim Pharmaceutical Inc., 900 Ridgebury Road, Ridgefield, CT 06877, United States
- Kyle P Martin
  - Biotherapeutics Molecule Discovery, Boehringer Ingelheim Pharmaceutical Inc., 900 Ridgebury Road, Ridgefield, CT 06877, United States
- Murat Karadag
  - Departments of Chemical Engineering, Pharmaceutical Sciences and Biomedical Engineering, Biointerfaces Institute, University of Michigan, 2800 Plymouth Road, Ann Arbor, MI 48105, United States
- Hsin-Ting Chen
  - Departments of Chemical Engineering, Pharmaceutical Sciences and Biomedical Engineering, Biointerfaces Institute, University of Michigan, 2800 Plymouth Road, Ann Arbor, MI 48105, United States
- Na-Young Kwon
  - Departments of Chemical Engineering, Pharmaceutical Sciences and Biomedical Engineering, Biointerfaces Institute, University of Michigan, 2800 Plymouth Road, Ann Arbor, MI 48105, United States
- Joseph Mozdzierz
  - Biotherapeutics Molecule Discovery, Boehringer Ingelheim Pharmaceutical Inc., 900 Ridgebury Road, Ridgefield, CT 06877, United States
- Alexander M Horspool
  - Biotherapeutics Molecule Discovery, Boehringer Ingelheim Pharmaceutical Inc., 900 Ridgebury Road, Ridgefield, CT 06877, United States
- Li Li
  - Global Computational Biology and Digital Sciences, Boehringer Ingelheim Pharmaceutical Inc., 900 Ridgebury Road, Ridgefield, CT 06877, United States
- Peter M Tessier
  - Departments of Chemical Engineering, Pharmaceutical Sciences and Biomedical Engineering, Biointerfaces Institute, University of Michigan, 2800 Plymouth Road, Ann Arbor, MI 48105, United States
- Michael S Marlow
  - Biotherapeutics Molecule Discovery, Boehringer Ingelheim Pharmaceutical Inc., 900 Ridgebury Road, Ridgefield, CT 06877, United States
- Andrew E Nixon
  - Biotherapeutics Molecule Discovery, Boehringer Ingelheim Pharmaceutical Inc., 900 Ridgebury Road, Ridgefield, CT 06877, United States
- Sandeep Kumar
  - Biotherapeutics Molecule Discovery, Boehringer Ingelheim Pharmaceutical Inc., 900 Ridgebury Road, Ridgefield, CT 06877, United States
6. Flores-Hernandez H, Martinez-Ledesma E. A systematic review of deep learning chemical language models in recent era. J Cheminform 2024; 16:129. [PMID: 39558376; PMCID: PMC11571686; DOI: 10.1186/s13321-024-00916-y] [Received: 07/26/2024] [Accepted: 10/17/2024] [Indexed: 11/20/2024]
Abstract
Discovering new chemical compounds with specific properties can benefit fields that rely on materials for their development, although this task is costly in terms of complexity and resources. Since the beginning of the data age, deep learning techniques have revolutionized molecular design by analyzing and learning from representations of molecular data, greatly reducing the resources and time involved. Various deep learning approaches, using a variety of architectures and strategies, have been developed to explore the extensive and discontinuous chemical space and to generate compounds with specific properties. In this study, we present a systematic review that offers a statistical description and comparison of the strategies used to generate molecules with deep learning techniques, based on the metrics proposed in Molecular Sets (MOSES) or GuacaMol. The study included 48 articles retrieved from a query-based search of Scopus and Web of Science and 25 articles retrieved from citation search, yielding a total of 72 retrieved articles, of which 62 correspond to chemical language model approaches to molecule generation and the other 10 correspond to molecular graph representations. Transformers, recurrent neural networks (RNNs), generative adversarial networks (GANs), structured state space sequence (S4) models, and variational autoencoders (VAEs) are the main deep learning architectures used for molecule generation in the retrieved articles. In addition, transfer learning, reinforcement learning, and conditional learning are the most widely employed techniques for biased model generation and for exploring specific regions of chemical space. Finally, this analysis focuses on the central themes of molecular representation, databases, training dataset size, the validity-novelty trade-off, and the performance of unbiased and biased chemical language models.
These themes were selected to conduct a statistical analysis using graphical representations and statistical tests. The resulting analysis reveals the main challenges, advantages, and opportunities in the field of chemical language models over the past four years.
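Three of the MOSES-style metrics discussed above (validity, uniqueness, novelty) reduce to simple set arithmetic once molecules are in canonical form. The sketch below is a simplified illustration, not the MOSES or GuacaMol code: it assumes the SMILES strings are already canonicalized and takes a caller-supplied `is_valid` function. In practice that check would typically be RDKit's `Chem.MolFromSmiles(s) is not None`; the parenthesis-balancing `toy_is_valid` in the usage example is only a hypothetical placeholder.

```python
def generation_metrics(generated, training, is_valid):
    """Simplified validity, uniqueness and novelty of a batch of generated SMILES.

    Assumes every string is already canonical, so string equality means
    molecular identity.
    """
    generated = list(generated)
    if not generated:
        raise ValueError("need at least one generated molecule")
    valid = [s for s in generated if is_valid(s)]
    unique = set(valid)
    training_set = set(training)
    return {
        # fraction of all samples that parse as molecules
        "validity": len(valid) / len(generated),
        # fraction of valid samples that are distinct
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        # fraction of distinct valid samples absent from the training data
        "novelty": len(unique - training_set) / len(unique) if unique else 0.0,
    }


def toy_is_valid(smiles):
    """Hypothetical stand-in for a real parser: non-empty with balanced parentheses."""
    return bool(smiles) and smiles.count("(") == smiles.count(")")


metrics = generation_metrics(
    generated=["CCO", "CCO", "c1ccccc1", "C(C", "CCN"],
    training=["CCO"],
    is_valid=toy_is_valid,
)
print(metrics)  # {'validity': 0.8, 'uniqueness': 0.75, 'novelty': 0.6666666666666666}
```

Reading the three numbers together matters: a model can look highly novel while producing few valid or distinct molecules, which is the kind of validity-novelty tension the review examines.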
Affiliation(s)
- Hector Flores-Hernandez
  - Tecnológico de Monterrey, School of Engineering and Sciences, Monterrey, 64710, Nuevo León, México
- Emmanuel Martinez-Ledesma
  - Tecnológico de Monterrey, School of Medicine and Health Sciences, Monterrey, 64710, Nuevo León, México
  - Institute for Obesity Research, Tecnológico de Monterrey, Monterrey, 64710, Nuevo León, México
7. Xu C, Zheng L, Fan Q, Liu Y, Zeng C, Ning X, Liu H, Du K, Lu T, Chen Y, Zhang Y. Progress in the application of artificial intelligence in molecular generation models based on protein structure. Eur J Med Chem 2024; 277:116735. [PMID: 39098131; DOI: 10.1016/j.ejmech.2024.116735] [Received: 05/20/2024] [Revised: 07/12/2024] [Accepted: 07/30/2024] [Indexed: 08/06/2024]
Abstract
Molecular generation models based on protein structures represent a cutting-edge research direction in artificial intelligence-assisted drug discovery. This article comprehensively summarizes the research methods and developments in this area by analyzing a series of novel molecular generation models predicated on protein structures. We first categorize these models and highlight the architectural frameworks they use. We then detail the design and implementation of protein structure-based molecular generation models through specific examples. Lastly, we outline the current opportunities and challenges in this field, with the aim of offering guidance and a reference framework for developing and studying new models in related fields.
Affiliation(s)
- Chengcheng Xu
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
- Lidan Zheng
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
- Qing Fan
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
- Yingxu Liu
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
- Chen Zeng
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
- Xiangzhen Ning
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
- Haichun Liu
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
- Ke Du
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
- Tao Lu
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
  - State Key Laboratory of Natural Medicines, China Pharmaceutical University, 24 Tongjiaxiang, Nanjing, 210009, China
- Yadong Chen
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
- Yanmin Zhang
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
8. Zhang X, Zhao S, Su X, Xu L. From docking to dynamics: Unveiling the potential non-peptide and non-covalent inhibitors of Mpro from natural products. Comput Biol Med 2024; 181:108963. [PMID: 39216402; DOI: 10.1016/j.compbiomed.2024.108963] [Received: 04/24/2024] [Revised: 07/05/2024] [Accepted: 07/26/2024] [Indexed: 09/04/2024]
Abstract
MOTIVATION: This study investigates non-covalent, non-peptide inhibitors of Mpro, a crucial protein target, using a comprehensive approach that integrates molecular docking, molecular dynamics simulations, and top-hits activity predictions. The focus is on elucidating the non-covalent, non-peptide binding modes of potential inhibitors with Mpro.
METHODS: We employed structure-based semi-flexible molecular docking, binding-score filtering, and ADME screening to screen compounds from CMNPD and HERB in silico. These methods allowed us to identify potential candidates based on their binding scores and interactions with the binding site of the main protease. To further evaluate the stability of these interactions, we conducted molecular dynamics simulations and calculated binding energies. Finally, a top-hits activity prediction method was employed to prioritize compounds based on their predicted inhibitory potential.
RESULTS: Through a combination of binding energy calculations and activity predictions, we identified six potential inhibitor molecules with promising activity against Mpro. These compounds showed favorable binding interactions and stability profiles, making them attractive candidates for further experimental validation and drug development efforts targeting Mpro.
Affiliation(s)
- Xin Zhang
  - The Quzhou Affiliated Hospital of Wenzhou Medical University, Quzhou People's Hospital, Quzhou, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Shulin Zhao
  - The Quzhou Affiliated Hospital of Wenzhou Medical University, Quzhou People's Hospital, Quzhou, China
- Xi Su
  - Foshan Women and Children Hospital, Foshan, China
- Lifeng Xu
  - The Quzhou Affiliated Hospital of Wenzhou Medical University, Quzhou People's Hospital, Quzhou, China
9. Hu X, Liu G, Yao Q, Zhao Y, Zhang H. Hamiltonian diversity: effectively measuring molecular diversity by shortest Hamiltonian circuits. J Cheminform 2024; 16:94. [PMID: 39113120; PMCID: PMC11308660; DOI: 10.1186/s13321-024-00883-4] [Received: 02/08/2024] [Accepted: 07/11/2024] [Indexed: 08/10/2024]
Abstract
In recent years, significant advances have been made in molecular generation algorithms aimed at facilitating drug development, and molecular diversity is of paramount importance in molecular generation. Nonetheless, effectively quantifying molecular diversity remains an elusive challenge, as existing metrics such as Richness and Internal Diversity fail to capture both main aspects of diversity simultaneously: quantity and dissimilarity. To address this problem, we propose Hamiltonian diversity, a novel molecular diversity metric based on the shortest Hamiltonian circuit. In principle, this metric embodies both aspects of molecular diversity, and we implement its calculation with high efficiency and accuracy. Furthermore, through empirical experiments we demonstrate that Hamiltonian diversity is highly consistent with real-world chemical diversity, and we substantiate its effect in promoting the diversity of molecular generation algorithms. Our Python implementation of Hamiltonian diversity is available at https://github.com/HXYfighter/HamDiv.
Scientific contribution: We propose a more rational molecular diversity metric for the cheminformatics and drug development community. This metric can be applied to evaluate existing molecular generation methods and to enhance drug design algorithms.
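The core quantity behind Hamiltonian diversity, the length of the shortest closed tour that visits every molecule exactly once, can be sketched with the exact Held-Karp dynamic program. This is an illustration under stated assumptions, not the HamDiv package: molecules are represented as hypothetical fingerprint on-bit sets, pairwise distance is taken to be Tanimoto distance, and the exact algorithm is only practical for small sets because its cost grows as O(n^2 * 2^n). Large libraries need the efficient implementation the authors provide.

```python
from itertools import combinations


def tanimoto_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| for fingerprints given as sets of on-bit indices."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def shortest_hamiltonian_circuit(dist):
    """Exact Held-Karp solution for a symmetric distance matrix (small n only)."""
    n = len(dist)
    if n < 2:
        return 0.0
    # best[(mask, j)]: shortest path that starts at node 0, visits exactly the
    # nodes in `mask` (a bitmask over nodes 1..n-1) and ends at node j.
    best = {(1 << j, j): dist[0][j] for j in range(1, n)}
    for size in range(2, n):
        for subset in combinations(range(1, n), size):
            mask = sum(1 << j for j in subset)
            for j in subset:
                prev = mask & ~(1 << j)
                best[(mask, j)] = min(
                    best[(prev, k)] + dist[k][j] for k in subset if k != j
                )
    full = (1 << n) - 2  # every node except the start node 0
    return min(best[(full, j)] + dist[j][0] for j in range(1, n))


def hamiltonian_diversity(fingerprints):
    """Length of the shortest circuit through all molecules under Tanimoto distance."""
    n = len(fingerprints)
    dist = [
        [tanimoto_distance(fingerprints[i], fingerprints[j]) for j in range(n)]
        for i in range(n)
    ]
    return shortest_hamiltonian_circuit(dist)


# Two pairs of near-duplicates: the best tour links each pair, then jumps between pairs.
fps = [{1, 2, 3}, {1, 2, 4}, {5, 6}, {5, 7}]
print(round(hamiltonian_diversity(fps), 4))  # 3.1667 (0.5 + 2/3 + 1 + 1)
```

Because Tanimoto (Jaccard) distance on sets is a metric, adding a molecule never shortens the optimal tour, which is how a circuit length can reflect quantity as well as dissimilarity, unlike a mean pairwise distance.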
Affiliation(s)
- Xiuyuan Hu
  - Department of Electronic Engineering, Tsinghua University, Beijing, China
  - Microsoft Research AI for Science, Beijing, China
- Guoqing Liu
  - Microsoft Research AI for Science, Beijing, China
- Quanming Yao
  - Department of Electronic Engineering, Tsinghua University, Beijing, China
- Yang Zhao
  - Department of Electronic Engineering, Tsinghua University, Beijing, China
- Hao Zhang
  - Department of Electronic Engineering, Tsinghua University, Beijing, China
10. Fallani A, Medrano Sandonas L, Tkatchenko A. Inverse mapping of quantum properties to structures for chemical space of small organic molecules. Nat Commun 2024; 15:6061. [PMID: 39025883; PMCID: PMC11258234; DOI: 10.1038/s41467-024-50401-1] [Received: 09/14/2023] [Accepted: 07/01/2024] [Indexed: 07/20/2024]
Abstract
Computer-driven molecular design combines the principles of chemistry, physics, and artificial intelligence to identify chemical compounds with tailored properties. While quantum-mechanical (QM) methods, coupled with machine learning, already offer a direct mapping from 3D molecular structures to their properties, effective methodologies for the inverse mapping in chemical space remain elusive. We address this challenge by demonstrating the possibility of parametrizing a chemical space with a finite set of QM properties. Our proof-of-concept implementation achieves an approximate property-to-structure mapping, the QIM model (which stands for "Quantum Inverse Mapping"), by forcing a variational auto-encoder with a property encoder to obtain a common internal representation for both structures and properties. After validating this mapping for small drug-like molecules, we illustrate its capabilities with an explainability study as well as by the generation of de novo molecular structures with targeted properties and transition pathways between conformational isomers. Our findings thus provide a proof-of-principle demonstration aiming to enable the inverse property-to-structure design in diverse chemical spaces.
Affiliation(s)
- Alessio Fallani
  - Department of Physics and Materials Science, University of Luxembourg, L-1511, Luxembourg City, Luxembourg
- Leonardo Medrano Sandonas
  - Department of Physics and Materials Science, University of Luxembourg, L-1511, Luxembourg City, Luxembourg
  - Institute for Materials Science and Max Bergmann Center of Biomaterials, TU Dresden, 01062, Dresden, Germany
- Alexandre Tkatchenko
  - Department of Physics and Materials Science, University of Luxembourg, L-1511, Luxembourg City, Luxembourg
11. Cheng N, Wang L, Liu Y, Song B, Ding C. HANSynergy: Heterogeneous Graph Attention Network for Drug Synergy Prediction. J Chem Inf Model 2024; 64:4334-4347. [PMID: 38709204; PMCID: PMC11135324; DOI: 10.1021/acs.jcim.4c00003] [Received: 01/01/2024] [Revised: 04/23/2024] [Accepted: 04/24/2024] [Indexed: 05/07/2024]
Abstract
Drug synergy therapy is a promising strategy for cancer treatment. However, the extensive variety of available drugs and the time-intensive process of determining effective drug combinations through clinical trials pose significant challenges, which call for a reliable method for the rapid and precise selection of synergistic drug combinations. In response, various computational strategies have been developed for predicting drug synergies, yet the exploitation of heterogeneous biological network features remains underexplored. In this study, we construct a heterogeneous graph that encompasses diverse biological entities and interactions, utilizing rich data sets from sources such as DrugCombDB, PubChem, UniProt, and the Cancer Cell Line Encyclopedia (CCLE). We initialize node feature representations and introduce a novel virtual node to enhance drug representation. Our proposed method, the heterogeneous graph attention network for drug-drug synergy prediction (HANSynergy), has been experimentally validated to demonstrate that the heterogeneous graph attention network can extract key node features, efficiently harness the diversity of information, and further enhance network functionality through the incorporation of a multihead attention mechanism. In the comparative experiment, HANSynergy achieves the highest accuracy (Acc) and area under the curve (AUC), 0.877 and 0.947, respectively, on the DrugCombDB_early data set, demonstrating its superiority over the competing methods. Moreover, protein-protein interactions are important in understanding the mechanism of action of drugs, and the heterogeneous attention mechanism facilitates protein-protein interaction analysis. By analyzing the changes in attention weights before and after heterogeneous network training, we investigated proteins that may be associated with drug combinations. Additionally, case studies align our findings with existing research, underscoring the potential of HANSynergy in drug synergy prediction. This advancement not only contributes to the burgeoning field of drug synergy prediction but also holds the potential to provide valuable insights and uncover new drug synergies for combating cancer.
Affiliation(s)
- Ning Cheng
- School of Informatics, Hunan University of Chinese Medicine, Changsha, Hunan 410208, China
- Li Wang
- Degree Programs in Systems and Information Engineering, University of Tsukuba, Tsukuba, Ibaraki 305-8577, Japan
- Yiping Liu
- College of Information Science and Engineering, Hunan University, Changsha, Hunan 410082, China
- Bosheng Song
- College of Information Science and Engineering, Hunan University, Changsha, Hunan 410082, China
- Changsong Ding
- School of Informatics, Hunan University of Chinese Medicine, Changsha, Hunan 410208, China
- Big Data Analysis Laboratory of Traditional Chinese Medicine, Hunan University of Chinese Medicine, Changsha, Hunan 410208, China
12
Jiao S, Ye X, Sakurai T, Zou Q, Liu R. Integrated convolution and self-attention for improving peptide toxicity prediction. Bioinformatics 2024; 40:btae297. [PMID: 38696758 PMCID: PMC11654579 DOI: 10.1093/bioinformatics/btae297] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2024] [Revised: 04/02/2024] [Accepted: 04/30/2024] [Indexed: 05/04/2024]
Abstract
MOTIVATION Peptides are promising agents for the treatment of a variety of diseases due to their specificity and efficacy. However, the development of peptide-based drugs is often hindered by the potential toxicity of peptides, which poses a significant barrier to their clinical application. Traditional experimental methods for evaluating peptide toxicity are time-consuming and costly, making the development process inefficient. Therefore, there is an urgent need for computational tools specifically designed to predict peptide toxicity accurately and rapidly, facilitating the identification of safe peptide candidates for drug development. RESULTS We provide here a novel computational approach, CAPTP, which combines convolution and self-attention to enhance the prediction of peptide toxicity from amino acid sequences. CAPTP demonstrates outstanding performance, achieving a Matthews correlation coefficient of approximately 0.82 both in cross-validation and on independent test datasets. This performance surpasses that of existing state-of-the-art peptide toxicity predictors. Importantly, CAPTP maintains its robustness and generalizability even when dealing with data imbalances. Further analysis by CAPTP reveals that certain sequential patterns, particularly in the head and central regions of peptides, are crucial in determining their toxicity. This insight can significantly inform and guide the design of safer peptide drugs. AVAILABILITY AND IMPLEMENTATION The source code for CAPTP is freely available at https://github.com/jiaoshihu/CAPTP.
Affiliation(s)
- Shihu Jiao
- Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan
- Xiucai Ye
- Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan
- Tetsuya Sakurai
- Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
- Ruijun Liu
- School of Software, Beihang University, Beijing 100191, China
13
Nandi S, Bhaduri S, Das D, Ghosh P, Mandal M, Mitra P. Deciphering the Lexicon of Protein Targets: A Review on Multifaceted Drug Discovery in the Era of Artificial Intelligence. Mol Pharm 2024; 21:1563-1590. [PMID: 38466810 DOI: 10.1021/acs.molpharmaceut.3c01161] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/13/2024]
Abstract
Understanding protein sequence and structure is essential for understanding protein-protein interactions (PPIs), which are central to many biological processes and diseases. Targeting protein binding hot spots, which regulate signaling and growth, with rational drug design is promising. Rational drug design uses structural data and computational tools to study protein binding sites and protein interfaces in order to design inhibitors that can modulate these interactions, potentially leading to therapeutic approaches. Artificial intelligence (AI), such as machine learning (ML) and deep learning (DL), has advanced drug discovery and design by providing computational resources and methods. Quantum chemistry is essential for studying drug reactivity, toxicology, drug screening, and quantitative structure-activity relationship (QSAR) properties. This review discusses the methodologies and challenges of identifying and characterizing hot spots and binding sites. It also explores the strategies and applications of artificial-intelligence-based rational drug design technologies that target proteins and protein-protein interaction (PPI) binding hot spots. It provides valuable insights for drug design with therapeutic implications. We have also described the pathological conditions associated with heat shock protein 27 (HSP27) and matrix metalloproteinases (MMP2 and MMP9) and designed inhibitors of these proteins using the drug discovery paradigm in a case study on the discovery of drug molecules for cancer treatment. Additionally, the implications of benzothiazole derivatives for anticancer drug design and discovery are discussed.
Affiliation(s)
- Suvendu Nandi
- School of Medical Science and Technology, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal 721302, India
- Soumyadeep Bhaduri
- Centre for Computational and Data Sciences, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal 721302, India
- Debraj Das
- Centre for Computational and Data Sciences, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal 721302, India
- Priya Ghosh
- School of Medical Science and Technology, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal 721302, India
- Mahitosh Mandal
- School of Medical Science and Technology, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal 721302, India
- Pralay Mitra
- Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal 721302, India
14
Zhang ZY, Zhang Z, Ye X, Sakurai T, Lin H. A BERT-based model for the prediction of lncRNA subcellular localization in Homo sapiens. Int J Biol Macromol 2024; 265:130659. [PMID: 38462114 DOI: 10.1016/j.ijbiomac.2024.130659] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2024] [Revised: 02/19/2024] [Accepted: 03/04/2024] [Indexed: 03/12/2024]
Abstract
Understanding the subcellular localization of lncRNAs is crucial for comprehending their regulatory activities. Conventional detection of lncRNA subcellular location usually relies on in situ detection techniques, which are resource intensive. Some machine learning-based algorithms have been proposed for lncRNA subcellular location prediction in mammals. However, due to the low conservation of lncRNA sequences, the performance of cross-species models remains unsatisfactory. In this study, we curated a novel dataset containing subcellular location information of lncRNAs in Homo sapiens. Subsequently, based on the pre-trained BERT language model, we developed a model for lncRNA subcellular location prediction. Our model achieved a micro-averaged area under the receiver operating characteristic curve (AUROC) of 0.791 on the training set and an AUROC of 0.700 on the nucleus test set. Additionally, we conducted cross-species validation and motif discovery to further investigate underlying patterns. In summary, our study provides valuable guidance and computational analysis tools for exploring the mechanisms of lncRNA subcellular localization and the dynamic spatial changes of RNA in abnormal physiological states.
Affiliation(s)
- Zhao-Yue Zhang
- Tsukuba Life Science Innovation Program, University of Tsukuba, Tsukuba 3058577, Japan
- Zheng Zhang
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL 36849, USA
- Xiucai Ye
- Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan
- Tetsuya Sakurai
- Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan
- Hao Lin
- Center for Information Biology, University of Electronic Science and Technology of China, Chengdu 611731, China
15
Kyro GW, Morgunov A, Brent RI, Batista VS. ChemSpaceAL: An Efficient Active Learning Methodology Applied to Protein-Specific Molecular Generation. J Chem Inf Model 2024; 64:653-665. [PMID: 38287889 DOI: 10.1021/acs.jcim.3c01456] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/31/2024]
Abstract
The incredible capabilities of generative artificial intelligence models have inevitably led to their application in the domain of drug discovery. Within this domain, the vastness of chemical space motivates the development of more efficient methods for identifying regions with molecules that exhibit desired characteristics. In this work, we present a computationally efficient active learning methodology and demonstrate its applicability to targeted molecular generation. When applied to c-Abl kinase, a protein with FDA-approved small-molecule inhibitors, the model learns to generate molecules similar to the inhibitors without prior knowledge of their existence and even reproduces two of them exactly. We also show that the methodology is effective for a protein without any commercially available small-molecule inhibitors, the HNH domain of the CRISPR-associated protein 9 (Cas9) enzyme. To facilitate implementation and reproducibility, we made all of our software available through the open-source ChemSpaceAL Python package.
Affiliation(s)
- Gregory W Kyro
- Department of Chemistry, Yale University, New Haven, Connecticut 06511-8499, United States
- Anton Morgunov
- Department of Chemistry, Yale University, New Haven, Connecticut 06511-8499, United States
- Rafael I Brent
- Department of Chemistry, Yale University, New Haven, Connecticut 06511-8499, United States
- Victor S Batista
- Department of Chemistry, Yale University, New Haven, Connecticut 06511-8499, United States
16
García-Sosa AT. Benford's Law and distributions for better drug design. Expert Opin Drug Discov 2024; 19:131-137. [PMID: 37921672 DOI: 10.1080/17460441.2023.2277342] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2023] [Accepted: 10/26/2023] [Indexed: 11/04/2023]
Abstract
INTRODUCTION Modern drug discovery incorporates various tools and data, heralding the beginning of the data-driven drug design (DD) era. It has thus become highly important to understand the distributions of the chemical and physical data used for Artificial Intelligence (AI)/Machine Learning (ML) and to drive DD, and to use them effectively. AREAS COVERED The authors perform a comprehensive exploration of the statistical distributions driving the data-intensive era of drug discovery, including Benford's Law in AI/ML-based DD. EXPERT OPINION As the relevance of data-driven discovery escalates, we anticipate meticulous scrutiny of datasets utilizing principles like Benford's Law to enhance data integrity and guide efficient resource allocation and experimental planning. In this data-driven era of the pharmaceutical and medical industries, addressing critical aspects such as bias mitigation, algorithm effectiveness, data stewardship, effects, and fraud prevention is essential. Harnessing Benford's Law and other distributions and statistical tests in DD provides a potent strategy to detect data anomalies, fill data gaps, and enhance dataset quality. Benford's Law offers a fast check of the integrity and quality of datasets, which are the backbone of AI/ML and other modeling approaches, making it very useful in the design process.
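The first-digit test this abstract refers to can be made concrete. Benford's Law predicts that the leading digit d of many naturally occurring quantities appears with probability P(d) = log10(1 + 1/d), so about 30.1% of values start with 1 and only about 4.6% with 9. The sketch below is an illustration under our own assumptions, not the review's procedure: it compares the observed leading digits of a dataset against that distribution with a Pearson chi-square statistic, and the function names and the choice of chi-square as the goodness-of-fit test are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, Optional


def benford_expected() -> dict:
    """Expected first-digit probabilities under Benford's Law: P(d) = log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(x: float) -> Optional[int]:
    """First significant digit of |x|, or None for zero and non-finite values."""
    x = abs(x)
    if x == 0 or not math.isfinite(x):
        return None
    # Scale |x| into [1, 10) so that its integer part is the leading digit.
    exponent = math.floor(math.log10(x))
    scaled = x / 10 ** exponent
    # log10 can land on the wrong side of a power of ten by one ulp; correct for it.
    if scaled >= 10:
        scaled /= 10
    elif scaled < 1:
        scaled *= 10
    return int(scaled)


def benford_chi_square(values: Iterable[float]) -> float:
    """Pearson chi-square of observed first digits against Benford's Law (8 degrees of freedom)."""
    digits = [d for d in map(leading_digit, values) if d is not None]
    n = len(digits)
    if n == 0:
        raise ValueError("need at least one nonzero, finite value")
    counts = Counter(digits)
    return sum((counts[d] - n * p) ** 2 / (n * p) for d, p in benford_expected().items())
```

With 8 degrees of freedom, a statistic above roughly 15.51 rejects conformity at the 5% level. Conformity is generally only expected for quantities spanning several orders of magnitude, so a deviation flags a dataset for inspection rather than proving an error.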
Affiliation(s)
- Alfonso T García-Sosa
- Chair of Molecular Technology, Institute of Chemistry, University of Tartu, Tartu, Estonia
17
Lee J, Jun DW, Song I, Kim Y. DLM-DTI: a dual language model for the prediction of drug-target interaction with hint-based learning. J Cheminform 2024; 16:14. [PMID: 38297330 PMCID: PMC10832108 DOI: 10.1186/s13321-024-00808-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2023] [Accepted: 01/22/2024] [Indexed: 02/02/2024]
Abstract
The drug discovery process is demanding and time-consuming, and machine learning-based research is increasingly proposed to enhance its efficiency. A significant challenge in this field is predicting whether a drug molecule will interact with a target protein. A recent study addressed this challenge with an encoder that leverages prior knowledge of molecular and protein structures, notably improving prediction performance on the drug-target interaction task. Nonetheless, the target encoders employed in previous studies have computational complexity that grows quadratically with input length, which limits their practical utility. To overcome this challenge, we adopt a hint-based learning strategy to develop a compact and efficient target encoder. With an adaptation parameter, our model can blend general knowledge and target-oriented knowledge to build features of the protein sequences. This approach yielded considerable performance enhancements and improved learning efficiency on three benchmark datasets: BIOSNAP, DAVIS, and BindingDB. Furthermore, our method requires only 7.7 GB of video RAM (VRAM) during training (16.24% of that required by the previous state-of-the-art model), which makes training and inference feasible even with constrained computational resources.
Affiliation(s)
- Jonghyun Lee
- Department of Medical and Digital Engineering, Hanyang University College of Engineering, 222, Wangsimni-ro, Seongdong-gu, Seoul, 04763, Korea
- Dae Won Jun
- Department of Medical and Digital Engineering, Hanyang University College of Engineering, 222, Wangsimni-ro, Seongdong-gu, Seoul, 04763, Korea
- Department of Internal Medicine, Hanyang University College of Medicine, 222, Wangsimni-ro, Seongdong-gu, Seoul, 04763, Korea
- Ildae Song
- Department of Pharmaceutical Science and Technology, Kyungsung University, 309, Suyeong-ro, Nam-gu, Busan, 48434, Korea
- Yun Kim
- College of Pharmacy, Daegu Catholic University, 13-13, Hayang-ro, Hayang-eup, Gyeongsan-si, 38430, Gyeongsangbuk-do, Korea
18
Nowak D, Huczyński A, Bachorz RA, Hoffmann M. Machine Learning Application for Medicinal Chemistry: Colchicine Case, New Structures, and Anticancer Activity Prediction. Pharmaceuticals (Basel) 2024; 17:173. [PMID: 38399388 PMCID: PMC10892630 DOI: 10.3390/ph17020173] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2023] [Revised: 01/02/2024] [Accepted: 01/12/2024] [Indexed: 02/25/2024]
Abstract
In the contemporary era, the exploration of machine learning (ML) has gained widespread attention and is being leveraged to augment traditional methodologies in quantitative structure-activity relationship (QSAR) investigations. The principal objective of this research was to assess the anticancer potential of colchicine-based compounds across five distinct cell lines. This research endeavor ultimately sought to construct ML models proficient in forecasting anticancer activity as quantified by the IC50 value, while concurrently generating innovative colchicine-derived compounds. The resistance index (RI) is computed to evaluate the drug resistance exhibited by LoVo/DX cells relative to LoVo cancer cell lines. Meanwhile, the selectivity index (SI) is computed to determine the potential of a compound to demonstrate superior efficacy against tumor cells compared to its toxicity against normal cells, such as BALB/3T3. We introduce a novel ML system adept at recommending novel chemical structures predicated on known anticancer activity. Our investigation entailed the assessment of inhibitory capabilities across five cell lines, employing predictive models utilizing various algorithms, including random forest, decision tree, support vector machines, k-nearest neighbors, and multiple linear regression. The most proficient model, as determined by quality metrics, was employed to predict the anticancer activity of novel colchicine-based compounds. This methodological approach yielded the establishment of a library encompassing new colchicine-based compounds, each assigned an IC50 value. Additionally, this study resulted in the development of a validated predictive model, capable of reasonably estimating IC50 values based on molecular structure input.
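The abstract names the resistance index (RI) and selectivity index (SI) without giving formulas. Under the conventional definitions, which are assumed here rather than quoted from the paper, RI = IC50(LoVo/DX) / IC50(LoVo), so RI > 1 means the resistant subline needs a higher concentration for 50% inhibition, and SI = IC50(BALB/3T3) / IC50(cancer line), so SI > 1 means the compound inhibits tumor cells at a lower concentration than normal cells. A minimal Python sketch; the function names, the positivity check, and the IC50 values in the usage block are hypothetical and for illustration only.

```python
def _require_positive(**ic50_values: float) -> None:
    """Reject zero, negative, or NaN IC50 values before taking a ratio."""
    for name, value in ic50_values.items():
        if not value > 0:  # also catches NaN, since NaN > 0 is False
            raise ValueError(f"{name} must be a positive IC50, got {value!r}")


def resistance_index(ic50_resistant: float, ic50_parental: float) -> float:
    """RI = IC50 in the resistant subline (e.g. LoVo/DX) / IC50 in the parental line (e.g. LoVo)."""
    _require_positive(ic50_resistant=ic50_resistant, ic50_parental=ic50_parental)
    return ic50_resistant / ic50_parental


def selectivity_index(ic50_normal: float, ic50_cancer: float) -> float:
    """SI = IC50 in normal cells (e.g. BALB/3T3) / IC50 in a cancer cell line."""
    _require_positive(ic50_normal=ic50_normal, ic50_cancer=ic50_cancer)
    return ic50_normal / ic50_cancer


if __name__ == "__main__":
    # Hypothetical IC50 values, all in the same units; not data from the study.
    ic50 = {"LoVo": 0.02, "LoVo/DX": 0.30, "BALB/3T3": 0.50}
    print(f"RI = {resistance_index(ic50['LoVo/DX'], ic50['LoVo']):.1f}")
    print(f"SI (vs LoVo) = {selectivity_index(ic50['BALB/3T3'], ic50['LoVo']):.1f}")
```

Because both indices are ratios, the IC50 values only need to share a unit; if a model predicts pIC50 instead, the values must be converted back to concentrations before dividing.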
Affiliation(s)
- Damian Nowak
- Department of Quantum Chemistry, Faculty of Chemistry, Adam Mickiewicz University in Poznan, Uniwersytetu Poznanskiego 8, 61-614 Poznan, Poland
- Adam Huczyński
- Department of Medical Chemistry, Faculty of Chemistry, Adam Mickiewicz University in Poznan, Uniwersytetu Poznanskiego 8, 61-614 Poznan, Poland
- Rafał Adam Bachorz
- Institute of Medical Biology of Polish Academy of Sciences, Lodowa 106, 93-232 Lodz, Poland
- Institute of Computing Science, Faculty of Computing, Poznań University of Technology, Piotrowo 2, 60-965 Poznań, Poland
- Marcin Hoffmann
- Department of Quantum Chemistry, Faculty of Chemistry, Adam Mickiewicz University in Poznan, Uniwersytetu Poznanskiego 8, 61-614 Poznan, Poland
19
He S, Ye X, Dou L, Sakurai T. FIAMol-AB: A feature fusion and attention-based deep learning method for enhanced antibiotic discovery. Comput Biol Med 2024; 168:107762. [PMID: 38056212 DOI: 10.1016/j.compbiomed.2023.107762] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2023] [Revised: 10/31/2023] [Accepted: 11/21/2023] [Indexed: 12/08/2023]
Abstract
Antibiotic resistance continues to be a growing concern for global health, accentuating the need for novel antibiotic discoveries. Traditional methodologies in this field have relied heavily on extensive experimental screening, which is often time-consuming and costly. By contrast, computer-assisted drug screening offers rapid, cost-effective solutions. In this work, we propose FIAMol-AB, a deep learning model that combines graph neural networks, text convolutional networks, and molecular fingerprint techniques, and uses an attention mechanism to fuse these multiple forms of information within the model. The experiments show that FIAMol-AB may offer advantages over some existing methods in antibiotic discovery tasks. Our analysis of the model's results helps highlight the potential significance of certain features for its predictive performance. Compared with other models, ours demonstrates promising results, indicating potential robustness and versatility. This suggests that by integrating multi-view information and attention mechanisms, FIAMol-AB might better learn complex molecular structures, potentially improving the precision and efficiency of antibiotic discovery. We hope FIAMol-AB can serve as a useful method in the ongoing fight against antibiotic resistance.
Affiliation(s)
- Shida He
- Department of Computer Science, University of Tsukuba, Tsukuba, Ibaraki, 305-8577, Japan
- Xiucai Ye
- Department of Computer Science, University of Tsukuba, Tsukuba, Ibaraki, 305-8577, Japan
- Lijun Dou
- Genomic Medicine Institute, Lerner Research Institute, Cleveland, OH, 44106, USA
- Tetsuya Sakurai
- Department of Computer Science, University of Tsukuba, Tsukuba, Ibaraki, 305-8577, Japan
20
Kyro GW, Morgunov A, Brent RI, Batista VS. ChemSpaceAL: An Efficient Active Learning Methodology Applied to Protein-Specific Molecular Generation. ARXIV 2023:arXiv:2309.05853v2. [PMID: 37744464 PMCID: PMC10516108] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/26/2023]
Abstract
The incredible capabilities of generative artificial intelligence models have inevitably led to their application in the domain of drug discovery. Within this domain, the vastness of chemical space motivates the development of more efficient methods for identifying regions with molecules that exhibit desired characteristics. In this work, we present a computationally efficient active learning methodology that requires evaluation of only a subset of the generated data in the constructed sample space to successfully align a generative model with respect to a specified objective. We demonstrate the applicability of this methodology to targeted molecular generation by fine-tuning a GPT-based molecular generator toward a protein with FDA-approved small-molecule inhibitors, c-Abl kinase. Remarkably, the model learns to generate molecules similar to the inhibitors without prior knowledge of their existence, and even reproduces two of them exactly. We also show that the methodology is effective for a protein without any commercially available small-molecule inhibitors, the HNH domain of the CRISPR-associated protein 9 (Cas9) enzyme. We believe that the inherent generality of this method ensures that it will remain applicable as the exciting field of in silico molecular generation evolves. To facilitate implementation and reproducibility, we have made all of our software available through the open-source ChemSpaceAL Python package.
21
Ru X, Zou Q, Lin C. Optimization of drug-target affinity prediction methods through feature processing schemes. Bioinformatics 2023; 39:btad615. [PMID: 37812388 PMCID: PMC10636279 DOI: 10.1093/bioinformatics/btad615] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Revised: 09/19/2023] [Accepted: 10/07/2023] [Indexed: 10/10/2023]
Abstract
MOTIVATION Numerous high-accuracy drug-target affinity (DTA) prediction models, whose performance is heavily reliant on the drug and target feature information, are developed at the expense of complexity and interpretability. Feature extraction and optimization constitute a critical step that significantly influences the enhancement of model performance, robustness, and interpretability. Many existing studies aim to comprehensively characterize drugs and targets by extracting features from multiple perspectives; however, this approach has drawbacks: (i) an abundance of redundant or noisy features; and (ii) the feature sets often suffer from high dimensionality. RESULTS In this study, to obtain a model with high accuracy and strong interpretability, we utilize various traditional and cutting-edge feature selection and dimensionality reduction techniques to process self-associated features and adjacent associated features. These optimized features are then fed into a learning-to-rank model to achieve efficient DTA prediction. Extensive experimental results on two commonly used datasets indicate that, among various feature optimization methods, the regression tree-based feature selection method is most beneficial for constructing models with good performance and strong robustness. Then, by using Shapley Additive Explanations (SHAP) values and the incremental feature selection approach, we find that the high-quality feature subset consists of the top 150 feature dimensions and that the top 20 dimensions have a pronounced impact on DTA prediction. In conclusion, our study thoroughly validates the importance of feature optimization in DTA prediction and serves as inspiration for constructing high-performance and highly interpretable models. AVAILABILITY AND IMPLEMENTATION https://github.com/RUXIAOQING964914140/FS_DTA.
Affiliation(s)
- Xiaoqing Ru
- Department of Computer Science, University of Tsukuba, Tsukuba, Japan
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
- Chen Lin
- Department of Computer Science and Technology, School of Informatics, Xiamen University, Xiamen, Fujian, 361005, China
22
Haroon S, C A H, A S J. Generative Pre-trained Transformer (GPT) based model with relative attention for de novo drug design. Comput Biol Chem 2023; 106:107911. [PMID: 37450999 DOI: 10.1016/j.compbiolchem.2023.107911] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/22/2023] [Revised: 06/24/2023] [Accepted: 06/28/2023] [Indexed: 07/18/2023]
Abstract
De novo drug design refers to the process of designing new drug molecules from scratch using computational methods. In contrast to other computational methods that primarily focus on modifying existing molecules, designing from scratch enables the exploration of new chemical space and the potential discovery of novel molecules with enhanced properties. In this research, we proposed a model that utilizes Generative Pre-trained Transformer (GPT) architecture and relative attention for de novo drug design. GPT is a language model that utilizes transformer architecture to predict the next word or token in a given sequence. Representation of molecules using SMILES notation has enabled the use of next-token prediction techniques in de novo drug design. GPT uses attention mechanisms to capture the dependencies and relationships between different tokens in a sequence and allows the model to focus on the most important information when processing the input. Relative attention is a variant of the attention mechanism, which allows the model to capture the relative distances and relationships between tokens in the input sequence. In the standard attention mechanism, positional information is typically encoded using fixed-position embeddings. In relative attention, positional information is supplied dynamically during attention calculation by incorporating relative positional encodings, enabling the model to quickly learn the syntax of new unseen tokens. Relative attention enables the GPT model to better understand the relative positions of tokens in the sequence, which can be particularly useful when dealing with limited dataset sizes or generating target-specific drugs. The proposed model was trained on benchmark datasets, and performance was compared with other generative models. We show that relative attention and transfer learning could enable the GPT model to generate molecules with improved validity, uniqueness, and novelty in the context of de novo drug design. To illustrate the effectiveness of relative attention, the model was trained using transfer learning on three target-specific datasets, and the performance was compared with standard attention.
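The abstract describes relative attention as injecting positional information inside the attention computation rather than adding fixed position embeddings to the inputs. A minimal NumPy sketch of that idea, assuming the Shaw et al. (2018) formulation (learned embeddings for the clipped distance j - i added to keys and values) together with a GPT-style causal mask; the paper's exact variant is not given here, and the function name, argument layout, and `max_dist` clipping parameter are hypothetical.

```python
import numpy as np


def relative_causal_attention(x, w_q, w_k, w_v, rel_k, rel_v, max_dist):
    """Single-head causal self-attention with clipped relative position embeddings.

    x:            (T, d_model) token representations
    w_q/w_k/w_v:  (d_model, d) projection matrices
    rel_k/rel_v:  (2 * max_dist + 1, d) tables, one row per clipped distance j - i
    Returns an array of shape (T, d).
    """
    seq_len = x.shape[0]
    q, k, v = x @ w_q, x @ w_k, x @ w_v  # each (T, d)
    d = q.shape[-1]

    # Relative distance j - i for every (query i, key j) pair, clipped and shifted to table rows.
    pos = np.arange(seq_len)
    rel_idx = np.clip(pos[None, :] - pos[:, None], -max_dist, max_dist) + max_dist
    a_k = rel_k[rel_idx]  # (T, T, d)
    a_v = rel_v[rel_idx]  # (T, T, d)

    # e_ij = q_i . (k_j + a_ij^K) / sqrt(d): position enters the score itself.
    scores = (q @ k.T + np.einsum("id,ijd->ij", q, a_k)) / np.sqrt(d)

    # GPT-style causal mask: token i may only attend to positions j <= i.
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)

    # Numerically stable softmax over keys (every row keeps at least its diagonal entry).
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)

    # z_i = sum_j alpha_ij (v_j + a_ij^V)
    return weights @ v + np.einsum("ij,ijd->id", weights, a_v)
```

Because distances beyond `max_dist` share one embedding, only 2 * max_dist + 1 vectors are learned per table, and the same tables apply unchanged to sequences longer than those seen during training.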
Affiliation(s)
- Suhail Haroon
- Bioinformatics Lab, Department of Computer Science, Cochin University of Science and Technology, Kerala 682022, India
- Hafsath C A
- Bioinformatics Lab, Department of Computer Science, Cochin University of Science and Technology, Kerala 682022, India
- Jereesh A S
- Bioinformatics Lab, Department of Computer Science, Cochin University of Science and Technology, Kerala 682022, India