1
Tomme L, Ureel Y, Dobbelaere MR, Lengyel I, Vermeire FH, Stevens CV, Van Geem KM. Machine learning applications for thermochemical and kinetic property prediction. Rev Chem Eng 2025; 41:419-449. [PMID: 40303423 PMCID: PMC12037204 DOI: 10.1515/revce-2024-0027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2024] [Accepted: 10/07/2024] [Indexed: 05/02/2025]
Abstract
Detailed kinetic models play a crucial role in comprehending and enhancing chemical processes. A cornerstone of these models is accurate thermodynamic and kinetic properties, ensuring fundamental insights into the processes they describe. The prediction of these thermochemical and kinetic properties presents an opportunity for machine learning, given the challenges associated with their experimental or quantum chemical determination. This study reviews recent advancements in predicting thermochemical and kinetic properties for gas-phase, liquid-phase, and catalytic processes within kinetic modeling. We assess the state-of-the-art of machine learning in property prediction, focusing on three core aspects: data, representation, and model. Moreover, emphasis is placed on machine learning techniques to efficiently utilize available data, thereby enhancing model performance. Finally, we pinpoint the lack of high-quality data as a key obstacle in applying machine learning to detailed kinetic models. Accordingly, the generation of large new datasets and further development of data-efficient machine learning techniques are identified as pivotal steps in advancing machine learning's role in kinetic modeling.
Affiliation(s)
- Lowie Tomme
  - Laboratory for Chemical Technology, Department of Materials, Textiles and Chemical Engineering, Ghent University, Technologiepark 125, 9052 Gent, Belgium
- Yannick Ureel
  - Laboratory for Chemical Technology, Department of Materials, Textiles and Chemical Engineering, Ghent University, Technologiepark 125, 9052 Gent, Belgium
- Maarten R. Dobbelaere
  - Laboratory for Chemical Technology, Department of Materials, Textiles and Chemical Engineering, Ghent University, Technologiepark 125, 9052 Gent, Belgium
- István Lengyel
  - Laboratory for Chemical Technology, Department of Materials, Textiles and Chemical Engineering, Ghent University, Technologiepark 125, 9052 Gent, Belgium
  - ChemInsights LLC, Dover, DE 19901, USA
- Florence H. Vermeire
  - Department of Chemical Engineering, KU Leuven, Celestijnenlaan 200F, 3001 Leuven, Belgium
- Christian V. Stevens
  - SynBioC Research Group, Department of Green Chemistry and Technology, Faculty of Bioscience Engineering, Ghent University, 9000 Ghent, Belgium
- Kevin M. Van Geem
  - Laboratory for Chemical Technology, Department of Materials, Textiles and Chemical Engineering, Ghent University, Technologiepark 125, 9052 Gent, Belgium
2
Wiesner A, Zagrodzki P, Gawalska A, Marcinkowska M, Cios A, Paśko P. Navigating through chemometrics: Unveiling antibiotic-food interactions for improved pediatric formulations ahead. Eur J Pharm Biopharm 2025; 208:114652. [PMID: 39875059 DOI: 10.1016/j.ejpb.2025.114652] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2024] [Revised: 01/17/2025] [Accepted: 01/24/2025] [Indexed: 01/30/2025]
Abstract
BACKGROUND Given the challenges of pediatric antibacterial therapy, it is crucial to formulate antibiotics with a lower potential for interaction with dietary interventions and tailor them for optimal administration in children. Chemometric methods allow us to analyze multiple interrelated variables simultaneously and uncover correlations. AIM We applied a chemometric approach to examine how food, beverages, antacids, and mineral supplements affect antibiotic bioavailability in adults and children, aiming to explore relationships between antibiotic structure, physicochemical properties, and post-meal changes in pharmacokinetic (PK) parameters. METHODS We selected 95 antibacterial drugs for analysis, including beta-lactams (32), quinolones (25), macrolides (13), tetracyclines (16), and others (9). The input dataset comprised information from published clinical trials, chemical records, and calculations. We constructed hierarchical partial least squares (PLS) models with changes in PK parameters (ΔAUC, ΔCmax, ΔTmax, and Δt½) as response parameters and nine groups of molecular descriptors (M1-M9) as predictor parameters. We performed analyses separately in children and adults for different dietary interventions. RESULTS In the final 10 PLS models, significant components explained 61-90% and 10.3-54.4% of the variance in the predictor and response parameter sets, respectively. We obtained 59 significant positive and negative correlations between antibiotic structure or physicochemical properties (molecular descriptors) and action in the human body in the presence of food, antacids, or mineral supplements (changes in PK parameters), of which 41 concern pediatric patients. CONCLUSIONS Chemometric methods can be helpful and valuable in investigating the interactions between antibiotics and dietary interventions. Using chemometrics may pave the way for formulating antibiotics for children with a lower potential to interact with food.
Affiliation(s)
- Agnieszka Wiesner
  - Doctoral School of Medical and Health Sciences Jagiellonian University Medical College Cracow Poland; Department of Food Chemistry and Nutrition Faculty of Pharmacy Jagiellonian University Medical College Cracow Poland
- Paweł Zagrodzki
  - Department of Food Chemistry and Nutrition Faculty of Pharmacy Jagiellonian University Medical College Cracow Poland
- Alicja Gawalska
  - Department of Medicinal Chemistry Faculty of Pharmacy Jagiellonian University Medical College Cracow Poland
- Monika Marcinkowska
  - Department of Medicinal Chemistry Faculty of Pharmacy Jagiellonian University Medical College Cracow Poland
- Agnieszka Cios
  - Department of Clinical Pharmacy Faculty of Pharmacy Jagiellonian University Medical College Cracow Poland
- Paweł Paśko
  - Department of Food Chemistry and Nutrition Faculty of Pharmacy Jagiellonian University Medical College Cracow Poland
3
Yan X, Feng B, Song H, Wang L, Wang Y, Sun Y, Cai X, Rong Y, Wang X, Wang Y. Identification and mechanistic study of piceatannol as a natural xanthine oxidase inhibitor. Int J Biol Macromol 2025; 293:139231. [PMID: 39732228 DOI: 10.1016/j.ijbiomac.2024.139231] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2024] [Revised: 12/20/2024] [Accepted: 12/24/2024] [Indexed: 12/30/2024]
Abstract
Natural xanthine oxidase (XOD) inhibitors represent promising therapeutic agents for hyperuricemia (HUA) treatment due to their potent efficacy and favorable safety profiles. This study involved the construction of a comprehensive database of 315 XOD inhibitors and the development of 28 machine learning-based QSAR models. The ChemoPy light gradient boosting machine model exhibited the best performance (AUC = 0.9371 and MCC = 0.7423). This model identified three potential XOD inhibitors from the FooDB database: daphnetin, 7-hydroxycoumarin, and piceatannol. Molecular docking and dynamics simulations revealed favorable interactions, with piceatannol showing remarkable stability through hydrogen bonding and hydrophobic interactions. ADME predictions suggested that all three compounds possess desirable drug-like properties and safety characteristics. Subsequent in vitro enzyme inhibition assays validated the computational predictions, with piceatannol exhibiting the strongest inhibitory activity (IC50 = 8.80 ± 0.05 μM). Multispectroscopic analyses revealed that piceatannol-XOD binding was predominantly mediated by hydrogen bonding and van der Waals forces, which induced conformational changes characterized by decreased α-helical content and increased proportions of β-sheets, β-turns, and random coils. This study presents an efficient strategy for the identification of natural XOD inhibitors, elucidates the molecular mechanism of piceatannol-mediated XOD inhibition, and establishes a foundation for its therapeutic application in HUA treatment.
Affiliation(s)
- Xinxu Yan
  - College of Food Science, Northeast Agricultural University, Harbin 150030, Heilongjiang, China; Western Agricultural Research Center, Chinese Academy of Agricultural Sciences, Changji 831100, PR China
- Baolong Feng
  - Center for Education Technology, Northeast Agricultural University, Harbin 150030, Heilongjiang, China
- Hongjie Song
  - College of Food Science, Northeast Agricultural University, Harbin 150030, Heilongjiang, China
- Lili Wang
  - College of Food Science, Northeast Agricultural University, Harbin 150030, Heilongjiang, China
- Yehui Wang
  - College of Food Science, Northeast Agricultural University, Harbin 150030, Heilongjiang, China
- Yulin Sun
  - College of Food Science, Northeast Agricultural University, Harbin 150030, Heilongjiang, China
- Xiaoshuang Cai
  - College of Food Science, Northeast Agricultural University, Harbin 150030, Heilongjiang, China
- Yating Rong
  - College of Food Science, Northeast Agricultural University, Harbin 150030, Heilongjiang, China
- Xibo Wang
  - College of Food Science, Northeast Agricultural University, Harbin 150030, Heilongjiang, China
- Yutang Wang
  - Institute of Agro-Products Processing Science and Technology, Chinese Academy of Agricultural Sciences/Key Laboratory of Agro-Products Processing, Ministry of Agriculture, Beijing 100193, China; Western Agricultural Research Center, Chinese Academy of Agricultural Sciences, Changji 831100, PR China
4
Guan J, Dong D, Xie P, Zhao Z, Guo Y, Lee TY, Yao L, Chiang YC. StackDILI: Enhancing Drug-Induced Liver Injury Prediction through Stacking Strategy with Effective Molecular Representations. J Chem Inf Model 2025; 65:1027-1039. [PMID: 39786982 DOI: 10.1021/acs.jcim.4c02079] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/12/2025]
Abstract
Drug-induced liver injury (DILI) is a major challenge in drug development, often leading to clinical trial failures and market withdrawals due to liver toxicity. This study presents StackDILI, a computational framework designed to accelerate toxicity assessment by predicting DILI risk. StackDILI integrates multiple molecular descriptors to extract structural and physicochemical features, including the constitution, pharmacophore, MACCS, and E-state descriptors. Additionally, a genetic algorithm is employed for feature selection and optimization, ensuring that the most relevant features are used. These optimized features are processed through a stacking ensemble model comprising multiple tree-based machine learning models, improving prediction accuracy and interpretability. Notably, StackDILI demonstrates a strong performance on the DILIrank test set and maintains robustness across cross-validation. Moreover, interpretability analysis reveals key molecular features associated with DILI risks, providing valuable insights into toxicity prediction. To further improve accessibility, a user-friendly web interface is developed, allowing users to input SMILES strings and receive rapid predictions with ease. The StackDILI model provides a powerful tool for efficient DILI assessment, supporting safer drug development. The web interface is accessible at https://awi.cuhk.edu.cn/biosequence/StackDILI/.
Affiliation(s)
- Jiahui Guan
  - Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172 Shenzhen, China
- Danhong Dong
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172 Shenzhen, China
- Peilin Xie
  - Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172 Shenzhen, China
  - School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172 Shenzhen, China
- Zhihao Zhao
  - Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172 Shenzhen, China
- Yilin Guo
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172 Shenzhen, China
- Tzong-Yi Lee
  - Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300093, Taiwan
  - Center for Intelligent Drug Systems and Smart Bio-Devices (IDS2B), National Yang Ming Chiao Tung University, Hsinchu 300093, Taiwan
- Lantian Yao
  - Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172 Shenzhen, China
  - School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172 Shenzhen, China
- Ying-Chih Chiang
  - Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172 Shenzhen, China
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172 Shenzhen, China
5
Rong Y, Feng B, Cai X, Song H, Wang L, Wang Y, Yan X, Sun Y, Zhao J, Li P, Yang H, Wang Y, Wang F. Predicting variable-length ACE inhibitory peptides based on graph convolutional network. Int J Biol Macromol 2024; 282:137060. [PMID: 39481706 DOI: 10.1016/j.ijbiomac.2024.137060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2024] [Revised: 10/07/2024] [Accepted: 10/28/2024] [Indexed: 11/02/2024]
Abstract
Traditional molecular descriptors have contributed to the prediction of angiotensin I-converting enzyme (ACE) inhibitory peptides, but they often fall short in capturing the complex structure of the molecule. To address these limitations, this study introduces molecular graphs as an advanced method for peptide characterization. Peptides containing 2-10 amino acids were represented using molecular graphs, and a graph convolutional network (GCN) model was constructed to predict variable-length peptides. This model was compared with machine learning (ML) models based on molecular descriptors, including Random Forest (RF), Support Vector Machine (SVM), and k-Nearest Neighbor (kNN), under the same benchmark. Notably, the GCN model outperformed the other models with an accuracy of 0.78, effectively identifying ACE inhibitory potential. Furthermore, the GCN model exceeded existing methods, with an accuracy of over 98% on an independent test set. To validate our predictions, we synthesized the peptides VAPE and AQQKEP, which had high predicted probabilities; their IC50 values were 2.25 ± 0.11 and 3.75 ± 0.17 μM, respectively, indicating potent ACE inhibitory activity. The developed GCN model presents a powerful tool for the rapid screening and identification of ACE inhibitory peptides, offering promising opportunities for developing antihypertensive components in functional foods.
Affiliation(s)
- Yating Rong
  - Institute of Food Science and Technology, Chinese Academy of Agricultural Sciences/Key Laboratory of Agro-Products Processing, Ministry of Agriculture, Beijing 100193, China; Food College, Northeast Agricultural University, Harbin 150030, China
- Baolong Feng
  - Center for Education Technology, Northeast Agricultural University, Harbin 150030, PR China
- Xiaoshuang Cai
  - Food College, Northeast Agricultural University, Harbin 150030, China
- Hongjie Song
  - Food College, Northeast Agricultural University, Harbin 150030, China
- Lili Wang
  - Food College, Northeast Agricultural University, Harbin 150030, China
- Yehui Wang
  - Food College, Northeast Agricultural University, Harbin 150030, China
- Xinxu Yan
  - Food College, Northeast Agricultural University, Harbin 150030, China
- Yulin Sun
  - Food College, Northeast Agricultural University, Harbin 150030, China
- Jinyong Zhao
  - Institute of Food Science and Technology, Chinese Academy of Agricultural Sciences/Key Laboratory of Agro-Products Processing, Ministry of Agriculture, Beijing 100193, China
- Ping Li
  - Institute of Food Science and Technology, Chinese Academy of Agricultural Sciences/Key Laboratory of Agro-Products Processing, Ministry of Agriculture, Beijing 100193, China
- Huihui Yang
  - Institute of Food Science and Technology, Chinese Academy of Agricultural Sciences/Key Laboratory of Agro-Products Processing, Ministry of Agriculture, Beijing 100193, China
- Yutang Wang
  - Institute of Food Science and Technology, Chinese Academy of Agricultural Sciences/Key Laboratory of Agro-Products Processing, Ministry of Agriculture, Beijing 100193, China; Western Agricultural Research Center, Chinese Academy of Agricultural Sciences, Changji 831100, China
- Fengzhong Wang
  - Institute of Food Science and Technology, Chinese Academy of Agricultural Sciences/Key Laboratory of Agro-Products Processing, Ministry of Agriculture, Beijing 100193, China
6
Boczar D, Michalska K. A Review of Machine Learning and QSAR/QSPR Predictions for Complexes of Organic Molecules with Cyclodextrins. Molecules 2024; 29:3159. [PMID: 38999108 PMCID: PMC11243237 DOI: 10.3390/molecules29133159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2024] [Revised: 06/27/2024] [Accepted: 06/28/2024] [Indexed: 07/14/2024] Open
Abstract
Cyclodextrins are macrocyclic rings composed of glucose residues. Due to their remarkable structural properties, they can form host-guest inclusion complexes, which is why they are frequently used in the pharmaceutical, cosmetic, and food industries, as well as in environmental and analytical chemistry. This review presents the reports from 2011 to 2023 on the quantitative structure-activity/property relationship (QSAR/QSPR) approach, which is primarily employed to predict the thermodynamic stability of inclusion complexes. This article extensively discusses the significant developments related to the size of available experimental data, the available sets of descriptors, and the machine learning (ML) algorithms used, such as support vector machines, random forests, artificial neural networks, and gradient boosting. As QSAR/QSPR analysis only requires molecular structures of guests and experimental values of stability constants, this approach may be particularly useful for predicting these values for complexes with randomly substituted cyclodextrins, as well as for estimating their dependence on pH. This work proposes solutions on how to effectively use this knowledge, which is especially important for researchers who will deal with this topic in the future. This review also presents other applications of ML in relation to CD complexes, including the prediction of physicochemical properties of CD complexes, the development of analytical methods based on complexation with CDs, and the optimisation of experimental conditions for the preparation of the complexes.
Affiliation(s)
- Dariusz Boczar
  - Department of Synthetic Drugs, National Medicines Institute, Chełmska 30/34, 00-725 Warsaw, Poland
- Katarzyna Michalska
  - Department of Synthetic Drugs, National Medicines Institute, Chełmska 30/34, 00-725 Warsaw, Poland
7
Haider S, Shafiq M, Siddiqui AR, Sardar M, Mushtaq M, Shafeeq S, Nur-E-Alam M, Ahmad A, Ul-Haq Z. Uncovering PPAR-γ agonists: An integrated computational approach driven by machine learning. J Mol Graph Model 2024; 129:108742. [PMID: 38422823 DOI: 10.1016/j.jmgm.2024.108742] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2023] [Revised: 02/16/2024] [Accepted: 02/16/2024] [Indexed: 03/02/2024]
Abstract
Peroxisome proliferator-activated receptor gamma (PPAR-γ) serves as a nuclear receptor with a pivotal function in governing diverse facets of metabolic processes. In diabetes, the prime physiological role of PPAR-γ is to enhance insulin sensitivity and regulate glucose metabolism. Although PPAR-γ agonists such as thiazolidinediones are effective in addressing diabetes complications, they are associated with substantial side effects that could give rise to health challenges. The recent surge in the discovery of selective modulators of PPAR-γ inspired us to formulate an integrated computational strategy by leveraging the promising capabilities of both machine learning and in silico drug design approaches. In pursuit of our objectives, the initial stage of our work involved constructing an advanced machine learning classification model, which was trained utilizing chemical information and physicochemical descriptors obtained from known PPAR-γ modulators. The subsequent application of machine learning-based virtual screening, using a library of 31,750 compounds, allowed us to identify 68 compounds having suitable characteristics for further investigation. A total of four compounds were identified, and the most favorable configurations were complemented with docking scores ranging from -8.0 to -9.1 kcal/mol. Additionally, the compounds engaged in hydrogen bond interactions with essential conserved residues, including His323, Leu330, Phe363, His449 and Tyr473, that describe the ligand binding site. The stability indices investigated herein, for instance root-mean-square fluctuations in the backbone atoms, indicated higher mobility in the region of the orthosteric site in the presence of agonist, with deviation peaks in the range of 0.07-0.69 nm, signifying moderate conformational changes. The deviations at the global level revealed that the average values lie in the range of 0.25-0.32 nm. In conclusion, our identified hits, particularly CHEMBL-3185642 and CHEMBL-3554847, presented outstanding results and highlighted the stable conformation within the orthosteric site of PPAR-γ to positively modulate the activity.
Affiliation(s)
- Sajjad Haider
  - H. E. J. Research Institute of Chemistry, International Center for Chemical and Biological Sciences, University of Karachi, Karachi, 75270, Pakistan
- Muhammad Shafiq
  - H. E. J. Research Institute of Chemistry, International Center for Chemical and Biological Sciences, University of Karachi, Karachi, 75270, Pakistan
- Ali Raza Siddiqui
  - H. E. J. Research Institute of Chemistry, International Center for Chemical and Biological Sciences, University of Karachi, Karachi, 75270, Pakistan
- Madiha Sardar
  - H. E. J. Research Institute of Chemistry, International Center for Chemical and Biological Sciences, University of Karachi, Karachi, 75270, Pakistan
- Mamona Mushtaq
  - Dr. Panjwani Center for Molecular Medicine and Drug Research, International Center for Chemical and Biological Sciences, University of Karachi, Karachi, 75270, Pakistan
- Sehrish Shafeeq
  - Dr. Panjwani Center for Molecular Medicine and Drug Research, International Center for Chemical and Biological Sciences, University of Karachi, Karachi, 75270, Pakistan
- Mohammad Nur-E-Alam
  - Department of Pharmacognosy, College of Pharmacy, King Saud University, P.O. Box. 2457, Riyadh, 11451, Kingdom of Saudi Arabia
- Aftab Ahmad
  - Department of Biomedical and Pharmaceutical Sciences, Chapman University School of Pharmacy, Irvine, CA, 92618, USA
- Zaheer Ul-Haq
  - Dr. Panjwani Center for Molecular Medicine and Drug Research, International Center for Chemical and Biological Sciences, University of Karachi, Karachi, 75270, Pakistan
8
Wang Z, Feng B, Gao Q, Wang Y, Yang Y, Luo B, Zhang Q, Wang F, Li B. A prediction method of interaction based on Bilinear Attention Networks for designing polyphenol-protein complexes delivery systems. Int J Biol Macromol 2024; 269:131959. [PMID: 38692548 DOI: 10.1016/j.ijbiomac.2024.131959] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2024] [Revised: 04/20/2024] [Accepted: 04/27/2024] [Indexed: 05/03/2024]
Abstract
Polyphenol-protein complex delivery systems are gaining attention for their potential health benefits and their value to food industry development. However, creating an ideal delivery system requires extensive wet-lab experimentation. To address this, we collected 525 ligand-protein interaction data pairs and established an interaction prediction model using Bilinear Attention Networks. We used 10-fold cross-validation to address potential overfitting in the model, which yielded average AUROC (0.8443), AUPRC (0.7872), and F1 (0.8164) scores. The optimal threshold (0.3739) was selected for the model to be used in subsequent analysis. Guided by the model predictions and the optimal threshold, experimental verification measured the interaction of paeonol with six proteins: bovine serum albumin (lgKa = 6.2759), bovine β-lactoglobulin (lgKa = 6.7479), egg ovalbumin (lgKa = 5.1806), zein (lgKa = 6.0122), bovine α-lactalbumin (lgKa = 3.9170), and bovine lactoferrin (lgKa = 4.5380). The first four proteins, with lgKa > 5, are consistent with the model's predictions. The established model can accurately and rapidly predict the interaction of polyphenol-protein complexes. This study is the first to combine open ligand-protein interaction experiments with deep learning algorithms in the food industry, greatly improving research efficiency and providing a novel perspective for future complex delivery system construction.
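The lgKa comparison reported in this abstract can be checked with a few lines of Python. This is only a sketch: the six lgKa values and the cutoff of 5 come from the abstract, while the function name `split_by_cutoff` and the reading of lgKa > 5 as the criterion for agreement with the model are assumptions made for illustration, not the authors' code.

```python
# Measured lgKa values for paeonol, as quoted in the abstract.
# Protein names are written in ASCII (beta, alpha) for portability.
measured_lg_ka = {
    "bovine serum albumin": 6.2759,
    "bovine beta-lactoglobulin": 6.7479,
    "egg ovalbumin": 5.1806,
    "zein": 6.0122,
    "bovine alpha-lactalbumin": 3.9170,
    "bovine lactoferrin": 4.5380,
}

# Cutoff quoted in the abstract (lgKa > 5); treating it as the
# agreement criterion is an assumption of this sketch.
LG_KA_CUTOFF = 5.0


def split_by_cutoff(values, cutoff):
    """Return (above, below): names with value > cutoff, and the rest."""
    above = [name for name, value in values.items() if value > cutoff]
    below = [name for name, value in values.items() if value <= cutoff]
    return above, below


above, below = split_by_cutoff(measured_lg_ka, LG_KA_CUTOFF)
print("lgKa > 5:", above)
print("lgKa <= 5:", below)
```

Under this reading, the first four proteins fall above the cutoff and the last two below it, which matches the split described in the abstract.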
Affiliation(s)
- Zhipeng Wang
  - Department of Food Science, Northeast Agricultural University, Harbin 150030, PR China
- Baolong Feng
  - Center for Education Technology, Northeast Agricultural University, Harbin 150030, PR China
- Qizhou Gao
  - Department of Food Science, Northeast Agricultural University, Harbin 150030, PR China
- Yutang Wang
  - Institute of Agro-Products Processing Science and Technology, Chinese Academy of Agricultural Sciences, Key Laboratory of Agro-Products Processing, Ministry of Agriculture, Beijing 100193, PR China
- Yan Yang
  - Department of Food Science, Northeast Agricultural University, Harbin 150030, PR China
- Bowen Luo
  - Department of Food Science, Northeast Agricultural University, Harbin 150030, PR China
- Qi Zhang
  - Department of Food Science, Northeast Agricultural University, Harbin 150030, PR China
- Fengzhong Wang
  - Institute of Agro-Products Processing Science and Technology, Chinese Academy of Agricultural Sciences, Key Laboratory of Agro-Products Processing, Ministry of Agriculture, Beijing 100193, PR China
- Bailiang Li
  - Department of Food Science, Northeast Agricultural University, Harbin 150030, PR China; Key Laboratory of Dairy Science, Ministry of Education, Northeast Agricultural University, Harbin 150030, PR China
9
Cao J, Xu Z. Providing a Photovoltaic Performance Enhancement Relationship from Binary to Ternary Polymer Solar Cells via Machine Learning. Polymers (Basel) 2024; 16:1496. [PMID: 38891443 PMCID: PMC11174796 DOI: 10.3390/polym16111496] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2024] [Revised: 05/17/2024] [Accepted: 05/20/2024] [Indexed: 06/21/2024] Open
Abstract
Ternary polymer solar cells (PSCs) are currently the simplest and most efficient way to further improve the device performance in PSCs. To find high-performance organic photovoltaic materials, establishing a connection between the material structure and device performance before fabrication is of great significance. Herein, a database of the photovoltaic performance of 874 experimental PSCs reported in the literature is first established, and three different fingerprint expressions of a molecular structure are explored as input features; the results show that long fingerprints of 2D atom pairs can contain more effective information and improve the accuracy of the models. Through supervised learning, five machine learning (ML) models were trained to build a mapping of the photovoltaic performance improvement relationship from binary to ternary PSCs. The GBDT model had the best predictive ability and generalization. Eighteen key structural features from a non-fullerene acceptor and the third components that affect the device's PCE were screened based on this model, including a nitrile group with a lone-pair electron, a halogen atom, an oxygen atom, etc. Interestingly, the structural features for the enhanced device's PCE were essentially increased by the Jsc or FF. More importantly, the reliability of the ML model was further verified by preparing highly efficient PSCs. Taking the PM6:BTP-eC9:PY-IT ternary PSC as an example, the PCE prediction (18.03%) by the model was in good agreement with the experimental results (17.78%), the relative prediction error was 1.41%, and the relative error between all experimental results and predicted results was less than 5%. These results indicate that ML is a useful tool for exploring the photovoltaic performance improvement of PSCs and accelerating the design and application of highly efficient non-fullerene materials.
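The 1.41% figure quoted in this abstract can be reproduced from the two PCE values. The sketch below assumes the relative error is taken with respect to the experimental value, since the abstract does not state the definition; the function name `relative_error_percent` is hypothetical.

```python
def relative_error_percent(predicted: float, measured: float) -> float:
    """Relative prediction error as a percentage of the measured value."""
    return abs(predicted - measured) / abs(measured) * 100.0


# PCE values quoted in the abstract for the PM6:BTP-eC9:PY-IT ternary PSC.
predicted_pce = 18.03  # model prediction, in %
measured_pce = 17.78   # experimental result, in %

error = relative_error_percent(predicted_pce, measured_pce)
print(f"relative prediction error: {error:.2f}%")  # prints 1.41%
```

Dividing by the predicted value instead would give about 1.39%, so the quoted 1.41% is consistent with normalising by the experimental result.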
Affiliation(s)
- Jingyue Cao
  - Key Laboratory of Luminescence and Optical Information, Beijing Jiaotong University, Ministry of Education, Beijing 100044, China
  - Institute of Optoelectronics Technology, Beijing Jiaotong University, Beijing 100044, China
- Zheng Xu
  - Key Laboratory of Luminescence and Optical Information, Beijing Jiaotong University, Ministry of Education, Beijing 100044, China
  - Institute of Optoelectronics Technology, Beijing Jiaotong University, Beijing 100044, China
10
Ryzhkov FV, Ryzhkova YE, Elinson MN. Python tools for structural tasks in chemistry. Mol Divers 2024:10.1007/s11030-024-10889-7. [PMID: 38744790 DOI: 10.1007/s11030-024-10889-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2024] [Accepted: 04/27/2024] [Indexed: 05/16/2024]
Abstract
In recent decades, the use of computational approaches and artificial intelligence in the scientific environment has become more widespread. In this regard, the popular and versatile programming language Python has attracted considerable attention from scientists in the field of chemistry. It is used to solve a variety of chemical and structural problems, including calculating descriptors, molecular fingerprints, graph construction, and computing chemical reaction networks. Python offers high-quality visualization tools for analyzing chemical spaces and compound libraries. This review is a list of tools for the above tasks, including scripts, libraries, ready-made programs, and web interfaces. Inevitably this manuscript does not claim to be an all-encompassing handbook including all the existing Python-based structural chemistry codes. The review serves as a starting point for scientists wishing to apply automatization or optimization to routine chemistry problems.
Affiliation(s)
- Fedor V Ryzhkov
  - N. D. Zelinsky Institute of Organic Chemistry Russian Academy of Sciences, 47 Leninsky Prospekt, Moscow, 119991, Russia
- Yuliya E Ryzhkova
  - N. D. Zelinsky Institute of Organic Chemistry Russian Academy of Sciences, 47 Leninsky Prospekt, Moscow, 119991, Russia
- Michail N Elinson
  - N. D. Zelinsky Institute of Organic Chemistry Russian Academy of Sciences, 47 Leninsky Prospekt, Moscow, 119991, Russia
11
Xu J, Ye X, Lv Z, Chen YH, Wang XS. The Role of Base in Reaction Performance of Photochemical Synthesis of Thiazoles: An Integrated Theoretical and Experimental Study. Chemistry 2024; 30:e202304279. [PMID: 38409580 DOI: 10.1002/chem.202304279] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Revised: 02/25/2024] [Accepted: 02/26/2024] [Indexed: 02/28/2024]
Abstract
Artificial intelligence (AI)/machine learning (ML) is emerging as pivotal in synthetic chemistry, offering revolutionary potential in retrosynthetic analysis, reaction conditions, and reaction prediction. We have combined chemical descriptors, primarily based on Density Functional Theory (DFT) calculations, with various AI/ML tools such as Multi-Layer Perceptron (MLP) and Random Forest (RF) to predict the synthesis of 2-arylbenzothiazole in photoredox reactions. Significantly, our models underscore the critical role of the molecular structure and physicochemical characteristics of the base, especially the total atomic polarizabilities, in the rate-determining steps involving cyclohexyl and phenethyl moieties of the substrate. Moreover, we validated our findings through experimental studies. This work showcases the power of AI/ML and quantum chemistry in shaping the future of organic chemistry.
Affiliation(s)
- Jiaxin Xu
  - The Institute for Advanced Studies (IAS), Wuhan University, Wuhan, 430072, China
- Xiaoyu Ye
  - The Institute for Advanced Studies (IAS), Wuhan University, Wuhan, 430072, China
- Zongchao Lv
  - The Institute for Advanced Studies (IAS), Wuhan University, Wuhan, 430072, China
  - CMC Pharmaceutical Research Center, Wuhan RS Pharmaceutical Co., Ltd., Wuhan, 430073, China
- Yi-Hung Chen
  - The Institute for Advanced Studies (IAS), Wuhan University, Wuhan, 430072, China
- Xiang Simon Wang
  - Howard University College of Pharmacy, 2300 Fourth Street NW, Washington, DC 20059, United States
12
Long TZ, Jiang DJ, Shi SH, Deng YC, Wang WX, Cao DS. Enhancing Multi-species Liver Microsomal Stability Prediction through Artificial Intelligence. J Chem Inf Model 2024; 64:3222-3236. [PMID: 38498003 DOI: 10.1021/acs.jcim.4c00159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2024]
Abstract
Liver microsomal stability, a crucial aspect of metabolic stability, significantly impacts practical drug discovery. However, current models for predicting liver microsomal stability are based on limited molecular information from a single species. To address this limitation, we constructed the largest public database of compounds from three common species: human, rat, and mouse. Subsequently, we developed a series of classification models using both traditional descriptor-based and classic graph-based machine learning (ML) algorithms. Remarkably, the best-performing models for the three species achieved Matthews correlation coefficients (MCCs) of 0.616, 0.603, and 0.574, respectively, on the test set. Furthermore, through the construction of consensus models based on these individual models, we have demonstrated their superior predictive performance in comparison with the existing models of the same type. To explore the similarities and differences in the properties of liver microsomal stability among multispecies molecules, we conducted preliminary interpretative explorations using the Shapley additive explanations (SHAP) and atom heatmap approaches for the models and misclassified molecules. Additionally, we further investigated representative structural modifications and substructures that decrease the liver microsomal stability in different species using the matched molecule pair analysis (MMPA) method and substructure extraction techniques. The established prediction models, along with insightful interpretation information regarding liver microsomal stability, will significantly contribute to enhancing the efficiency of exploring practical drugs for development.
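For readers unfamiliar with the Matthews correlation coefficient used to score these classifiers, the following minimal sketch computes it from a binary confusion matrix using the standard formula MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The counts in the example are hypothetical and are not taken from the paper; returning 0.0 for a zero denominator is a common convention assumed here.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC for a binary confusion matrix; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical confusion-matrix counts for a stable/unstable classifier.
mcc = matthews_corrcoef(tp=40, tn=40, fp=10, fn=10)
print(round(mcc, 3))  # prints 0.6
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction), which makes it more informative than plain accuracy on imbalanced data sets.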
Affiliation(s)
- Teng-Zhi Long
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
- De-Jun Jiang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
- Shao-Hua Shi
  - Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Kowloon, Hong Kong SAR 999077, P. R. China
- You-Chao Deng
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
- Wen-Xuan Wang
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
- Dong-Sheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
  - Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Kowloon, Hong Kong SAR 999077, P. R. China
  - Department of Pharmacy, Xiangya Hospital, Central South University, Changsha 410008, Hunan, P. R. China
13
Zhao X, Xu J, Shui Y, Xu M, Hu J, Liu X, Che K, Wang J, Liu Y. PermuteDDS: a permutable feature fusion network for drug-drug synergy prediction. J Cheminform 2024; 16:41. [PMID: 38622663 PMCID: PMC11017561 DOI: 10.1186/s13321-024-00839-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2023] [Accepted: 04/03/2024] [Indexed: 04/17/2024] Open
Abstract
MOTIVATION: Drug combination therapies have shown promise in clinical cancer treatments. However, even with high-throughput screening, it is hard to experimentally test all drug combinations for synergistic interaction because the space of potential combinations is vast. Although a number of computational methods for drug synergy prediction have proven successful in narrowing down this space, the effective fusion of drug-pair and cell-line features remains understudied, which hinders current algorithms from capturing the complex interactions between drugs and cell lines. RESULTS: In this paper, we propose a Permutable feature fusion network for Drug-Drug Synergy prediction, named PermuteDDS. PermuteDDS takes multiple representations of drugs and cell lines as input and employs a permutable fusion mechanism to combine drug and cell line features. In experiments, PermuteDDS exhibits state-of-the-art performance on two benchmark data sets. Additionally, the results on an independent test set grouped by tissue reveal that PermuteDDS generalizes well. We believe that PermuteDDS is an effective and valuable tool for identifying synergistic drug combinations. It is publicly available at https://github.com/littlewei-lazy/PermuteDDS. SCIENTIFIC CONTRIBUTION: First, this paper proposes a permutable feature fusion network for predicting drug synergy, termed PermuteDDS, which extracts diverse information from multiple drug and cell line representations. Second, the permutable fusion mechanism combines the drug and cell line features by integrating information from different channels, enabling the model to exploit complex relationships between drugs and cell lines. Third, comparative and ablation experiments provide evidence of the efficacy of PermuteDDS in predicting drug-drug synergy.
Affiliation(s)
- Xinwei Zhao
  - Department of Medical Informatics, School of Biomedical Engineering and Informatics, Nanjing Medical University, 101 Longmian Avenue, Nanjing, 211166, Jiangsu, China
- Junqing Xu
  - The Second Clinical Medical School, Nanjing Medical University, 101 Longmian Avenue, Nanjing, 211166, Jiangsu, China
- Youyuan Shui
  - Department of Medical Informatics, School of Biomedical Engineering and Informatics, Nanjing Medical University, 101 Longmian Avenue, Nanjing, 211166, Jiangsu, China
- Mengdie Xu
  - Department of Medical Informatics, School of Biomedical Engineering and Informatics, Nanjing Medical University, 101 Longmian Avenue, Nanjing, 211166, Jiangsu, China
- Jie Hu
  - Department of Medical Informatics, School of Biomedical Engineering and Informatics, Nanjing Medical University, 101 Longmian Avenue, Nanjing, 211166, Jiangsu, China
  - Institute of Medical Informatics and Management, Nanjing Medical University, 101 Longmian Avenue, Nanjing, 210029, Jiangsu, China
- Xiaoyan Liu
  - Faculty of Computing, Harbin Institute of Technology, No. 92 West Da Zhi St, Harbin, 150001, Heilongjiang, China
- Kai Che
  - Xi'an Aeronautics Computing Technique Research Institute, AVIC, No. 156, TaiBai North Road, Xi'an, 710068, Shanxi, China
  - Aviation Key Laboratory of Science and Technology on Airborne and Missileborne Computer, Xi'an, 710065, Shanxi, China
- Junjie Wang
  - Department of Medical Informatics, School of Biomedical Engineering and Informatics, Nanjing Medical University, 101 Longmian Avenue, Nanjing, 211166, Jiangsu, China
  - Institute of Medical Informatics and Management, Nanjing Medical University, 101 Longmian Avenue, Nanjing, 210029, Jiangsu, China
- Yun Liu
  - Department of Medical Informatics, School of Biomedical Engineering and Informatics, Nanjing Medical University, 101 Longmian Avenue, Nanjing, 211166, Jiangsu, China
  - Institute of Medical Informatics and Management, Nanjing Medical University, 101 Longmian Avenue, Nanjing, 210029, Jiangsu, China
  - Department of Information, the First Affiliated Hospital, Nanjing Medical University, No. 300 Guang Zhou Road, Nanjing, 210029, Jiangsu, China
14
Branning JM, Faughnan KA, Tomson AA, Bell GJ, Isbell SM, DeGroot A, Jameson L, Kilroy K, Smith M, Smith R, Mottel L, Branning EG, Worrall Z, Anderson F, Panditaradyula A, Yang W, Abdelmalek J, Brake J, Cash KJ. Multifunction fluorescence open source in vivo/in vitro imaging system (openIVIS). PLoS One 2024; 19:e0299875. [PMID: 38498588 PMCID: PMC10947658 DOI: 10.1371/journal.pone.0299875] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Accepted: 02/18/2024] [Indexed: 03/20/2024] Open
Abstract
The widespread availability and diversity of open-source microcontrollers, paired with off-the-shelf electronics and 3D printed technology, have led to the creation of a wide range of low-cost scientific instruments, including microscopes, spectrometers, sensors, data loggers, and other tools that can be used for research, education, and experimentation. These devices can be used to explore a wide range of scientific topics, from biology and chemistry to physics and engineering. In this study, we designed and built a multifunction fluorescence open-source in vivo/in vitro imaging system (openIVIS) that integrates a Raspberry Pi with commercial cameras and LEDs, 3D printed structures, and an acrylic housing. Our openIVIS provides three excitation wavelengths of 460 nm, 520 nm, and 630 nm integrated with Python control software to enable fluorescent measurements across the full visible light spectrum. To demonstrate the potential applications of our system, we tested its performance in a diverse set of experiments ranging from laboratory assays (measuring fluorescent dyes, using optical nanosensors, and DNA gel electrophoresis) to potentially fieldable applications (plant and mineral imaging). We also tested its potential use in a high school biology environment by imaging small animals and tracking their development over the course of ten days. Our system demonstrated its ability to measure a fluorescent response over a wide dynamic range, from millimolar to picomolar concentrations in the same sample, while measuring responses across visible wavelengths. These results demonstrate the power and flexibility of open-source hardware and software and how they can be integrated with customizable manufacturing to create low-cost scientific instruments with a wide range of applications. Our study provides a promising model for the development of low-cost instruments that can be used in both research and education.
Affiliation(s)
- John M. Branning
  - Quantitative Biosciences and Engineering, Colorado School of Mines, Golden, Colorado, United States of America
  - The MITRE Corporation, Bedford, Massachusetts, United States of America
- Kealy A. Faughnan
  - Chemical and Biological Engineering Department, Colorado School of Mines, Golden, Colorado, United States of America
- Austin A. Tomson
  - Mechanical Engineering, Colorado School of Mines, Golden, Colorado, United States of America
- Grant J. Bell
  - Quantitative Biosciences and Engineering, Colorado School of Mines, Golden, Colorado, United States of America
- Sydney M. Isbell
  - Chemical and Biological Engineering Department, Colorado School of Mines, Golden, Colorado, United States of America
- Allen DeGroot
  - Electrical Engineering, Colorado School of Mines, Golden, Colorado, United States of America
- Lydia Jameson
  - Electrical Engineering, Colorado School of Mines, Golden, Colorado, United States of America
- Kramer Kilroy
  - Mechanical Engineering, Colorado School of Mines, Golden, Colorado, United States of America
- Michael Smith
  - Mechanical Engineering, Colorado School of Mines, Golden, Colorado, United States of America
- Robert Smith
  - Electrical Engineering, Colorado School of Mines, Golden, Colorado, United States of America
- Landon Mottel
  - Arvada West High School, Arvada, Colorado, United States of America
- Elizabeth G. Branning
  - Colorado Early Colleges Castle Rock, Castle Rock, Colorado, United States of America
- Zoe Worrall
  - Department of Engineering, Harvey Mudd College, Claremont, California, United States of America
- Frances Anderson
  - Department of Engineering, Harvey Mudd College, Claremont, California, United States of America
- Ashrit Panditaradyula
  - Department of Engineering, Harvey Mudd College, Claremont, California, United States of America
- William Yang
  - Department of Engineering, Harvey Mudd College, Claremont, California, United States of America
- Joseph Abdelmalek
  - Department of Engineering, Harvey Mudd College, Claremont, California, United States of America
- Joshua Brake
  - Department of Engineering, Harvey Mudd College, Claremont, California, United States of America
- Kevin J. Cash
  - Quantitative Biosciences and Engineering, Colorado School of Mines, Golden, Colorado, United States of America
  - Chemical and Biological Engineering Department, Colorado School of Mines, Golden, Colorado, United States of America
15
Gao YY, Zhao W, Huang YQ, Kumar V, Zhang X, Hao GF. In silico environmental risk assessment improves efficiency for pesticide safety management. Sci Total Environ 2024; 908:167878. [PMID: 37858821 DOI: 10.1016/j.scitotenv.2023.167878] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/03/2023] [Revised: 10/09/2023] [Accepted: 10/14/2023] [Indexed: 10/21/2023]
Abstract
Pesticides are indispensable for maintaining crop quality and food production worldwide, but their use also poses environmental risks. Pesticide risk assessment involves a series of complex, expensive and time-consuming toxicity tests. To improve the efficiency and accuracy of assessing the environmental impact of pesticides, numerous computational tools have been developed. However, there is a notable lack of critical analysis or systematic summaries of environmental risk assessment tools and their applicable contexts. Here, many of the current approaches and tools for assessing environmental risks posed by pesticides are reviewed, and the question of whether these tools are fit for use in complex multicomponent scenarios is discussed. We analyze the adaptations of these tools to aquatic and terrestrial ecosystems, and then provide resources for predicting pesticide concentrations in environmental media, including air, soil and water. The successful application of computational tools for risk assessment and the interpretation of predicted results are also discussed. This assessment serves as a valuable resource, enabling scientists to use suitable models to enhance the robustness of pesticide risk assessments.
Affiliation(s)
- Yang-Yang Gao
  - National Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for Research and Development of Fine Chemicals, Guizhou University, Guiyang 550025, PR China
- Wei Zhao
  - National Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for Research and Development of Fine Chemicals, Guizhou University, Guiyang 550025, PR China
- Yuan-Qin Huang
  - National Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for Research and Development of Fine Chemicals, Guizhou University, Guiyang 550025, PR China
- Vinit Kumar
  - National Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for Research and Development of Fine Chemicals, Guizhou University, Guiyang 550025, PR China
- Xiao Zhang
  - National Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for Research and Development of Fine Chemicals, Guizhou University, Guiyang 550025, PR China
- Ge-Fei Hao
  - National Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for Research and Development of Fine Chemicals, Guizhou University, Guiyang 550025, PR China
  - National Key Laboratory of Green Pesticide, Central China Normal University, Wuhan 430079, PR China
16
McGibbon M, Shave S, Dong J, Gao Y, Houston DR, Xie J, Yang Y, Schwaller P, Blay V. From intuition to AI: evolution of small molecule representations in drug discovery. Brief Bioinform 2023; 25:bbad422. [PMID: 38033290 PMCID: PMC10689004 DOI: 10.1093/bib/bbad422] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 09/01/2023] [Revised: 10/13/2023] [Accepted: 11/01/2023] [Indexed: 12/02/2023] Open
Abstract
Within drug discovery, the goal of AI scientists and cheminformaticians is to help identify molecular starting points that will develop into safe and efficacious drugs while reducing costs, time and failure rates. To achieve this goal, it is crucial to represent molecules in a digital format that makes them machine-readable and facilitates the accurate prediction of properties that drive decision-making. Over the years, molecular representations have evolved from intuitive and human-readable formats to bespoke numerical descriptors and fingerprints, and now to learned representations that capture patterns and salient features across vast chemical spaces. Among these, sequence-based and graph-based representations of small molecules have become highly popular. However, each approach has strengths and weaknesses across dimensions such as generality, computational cost, inversibility for generative applications and interpretability, which can be critical in informing practitioners' decisions. As the drug discovery landscape evolves, opportunities for innovation continue to emerge. These include the creation of molecular representations for high-value, low-data regimes, the distillation of broader biological and chemical knowledge into novel learned representations and the modeling of up-and-coming therapeutic modalities.
Affiliation(s)
- Miles McGibbon
  - Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, Scotland EH9 3BF, United Kingdom
- Steven Shave
  - Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, Scotland EH9 3BF, United Kingdom
- Jie Dong
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410013, China
- Yumiao Gao
  - Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, Scotland EH9 3BF, United Kingdom
- Douglas R Houston
  - Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, Scotland EH9 3BF, United Kingdom
- Jiancong Xie
  - Key Laboratory of Machine Intelligence and Advanced Computing, Sun Yat-Sen University, Guangzhou, 510000, China
- Yuedong Yang
  - Key Laboratory of Machine Intelligence and Advanced Computing, Sun Yat-Sen University, Guangzhou, 510000, China
- Philippe Schwaller
  - Laboratory of Artificial Chemical Intelligence (LIAC), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Vincent Blay
  - Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, Scotland EH9 3BF, United Kingdom
17
Kumari P, Besold T, Spranger M. Perceptual metrics for odorants: Learning from non-expert similarity feedback using machine learning. PLoS One 2023; 18:e0291767. [PMID: 37939067 PMCID: PMC10631653 DOI: 10.1371/journal.pone.0291767] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2023] [Accepted: 09/05/2023] [Indexed: 11/10/2023] Open
Abstract
Defining perceptual similarity metrics for odorant comparisons is crucial to understanding the mechanism of olfactory perception. Current methods in olfaction rely on molecular physicochemical features or discrete verbal descriptors (floral, burnt, etc.) to approximate perceptual (dis)similarity between odorants. However, structural or verbal descriptors alone are limited in modeling the complex nuances of odor perception. While structural features inadequately characterize odor perception, language-based discrete descriptors lack the granularity needed to model a continuous perception space. We introduce data-driven approaches to perceptual metrics learning (PMeL) based on two key insights: a) by combining physicochemical features with users' perceptual feedback, we can leverage both structural and perceptual attributes of odors to define dissimilarity, and b) instead of discrete labels, users' perceptual feedback can be gathered as relative similarity comparisons, such as "Does molecule-A smell more like molecule-B, or molecule-C?" These triplet comparisons are easier to make, even for non-expert users, and offer a more effective representation of the continuous perception space. Experimental results on several defined tasks show the effectiveness of our approach in evaluating perceptual dissimilarity between odorants. Finally, we investigate how closely our model, trained on non-expert feedback, aligns with experts' similarity judgments. Our effort aims to reduce reliance on expert annotations.
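A triplet comparison of the form "A smells more like B than C" can be turned into a training signal in several ways. The sketch below uses a triplet hinge loss on Euclidean distances, which is one common choice in metric learning; it is an assumption made for illustration, since the abstract does not state which loss PMeL uses, and the feature vectors are hypothetical.

```python
import math
from typing import Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_hinge_loss(anchor, closer, farther, margin: float = 1.0) -> float:
    """Zero when `closer` is nearer to `anchor` than `farther` by at least `margin`."""
    return max(0.0, euclidean(anchor, closer) - euclidean(anchor, farther) + margin)


# Hypothetical 2-D feature vectors for three odorants (illustrative values only).
mol_a, mol_b, mol_c = (0.0, 0.0), (3.0, 4.0), (6.0, 8.0)

# Feedback "A smells more like B than C" is satisfied: d(A, B) = 5 < d(A, C) = 10.
print(triplet_hinge_loss(mol_a, mol_b, mol_c))  # prints 0.0
# The contradictory feedback "A smells more like C than B" is penalised.
print(triplet_hinge_loss(mol_a, mol_c, mol_b))  # prints 6.0
```

In a learned metric the distance function would have trainable parameters, and the summed loss over all collected triplets would be minimised.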
18
Suresh M, Naicker K, Solanki J, Ezirim SA, Turcio R, Tochukwu IG, Lakhdari K, Attah EI. Ligand-based pharmacophore modelling, virtual screening and docking studies to identify potential compounds against FtsZ of Mycobacterium tuberculosis. Indian J Tuberc 2023; 70:430-444. [PMID: 37968049 DOI: 10.1016/j.ijtb.2023.03.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2021] [Accepted: 03/15/2023] [Indexed: 11/17/2023]
Abstract
BACKGROUND AND INTRODUCTION: Tuberculosis (TB), caused by Mycobacterium tuberculosis (M. tb), is the most common cause of death from bacterial illness. Millions of TB infections have been recorded, including 20,800 deaths amongst HIV-positive individuals. Hence, there is a rising need for new and active compounds against M. tb protein targets, especially as there is persistent resistance to the current drug treatment regime. AIM: This study identifies new potential compounds against the M. tb target protein FtsZ via pharmacophore modelling, QSAR analysis and docking studies. METHOD: Inhibitors with known pIC50 were used as a training set, and the pharmacophore features (1 aromatic center, 2 hydrophobic, 2 hydrogen bond acceptors and 1 hydrogen bond donor) were validated against four test set compounds. The identified hits were subjected to rigorous ADMET property evaluation and docked using PyRx. DS Visualizer was used to study binding interactions. Stability was assessed based on the total number of interactions, with preference given to the number of hydrogen bond interactions. RESULTS: Based on the number of interactions, hydrogen bonds, extensive virtual screening and ADMET filtration, 40 compounds were identified as potential inhibitors of FtsZ, of which only 3 were considered the best leads. SIGNIFICANCE OF RESEARCH: The identified compounds have the potential to be drug candidates against Mycobacterium tuberculosis and may act through a novel mechanistic route in inhibiting resistant strains.
Affiliation(s)
- Madhumitha Suresh
  - Alagappa College of Technology, Centre for Biotechnology, Anna University, Chennai, TamilNadu, India
- Kerishnee Naicker
  - Discipline of Microbiology, School of Life Sciences, College of Agriculture, Engineering and Science, University of KwaZulu-Natal, Westville, South Africa
- Jaykishan Solanki
  - Centre for Bioinformatics, Pondicherry University, Pondicherry, India
- Rita Turcio
  - Pharmaceutical Biotechnology, University of Naples Federico II, Italy
- Emmanuel Ifeanyi Attah
  - Department of Pharmaceutical and Medicinal Chemistry, Faculty of Pharmaceutical Sciences, University of Nigeria, Nsukka, Nigeria
19
Rojas C, Ballabio D, Consonni V, Suárez-Estrella D, Todeschini R. Classification-based machine learning approaches to predict the taste of molecules: A review. Food Res Int 2023; 171:113036. [PMID: 37330849 DOI: 10.1016/j.foodres.2023.113036] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 02/14/2023] [Revised: 05/02/2023] [Accepted: 05/22/2023] [Indexed: 06/19/2023]
Abstract
The capacity to discriminate safe from dangerous compounds has played an important role in the evolution of species, including human beings. Highly evolved senses such as taste allow humans to navigate and survive in the environment through information that reaches the brain as electrical pulses. Specifically, taste receptors provide multiple bits of information about substances that are introduced orally. These substances may or may not be pleasant, depending on the taste responses they trigger. Tastes have been classified as basic (sweet, bitter, umami, sour and salty) or non-basic (astringent, chilling, cooling, heating, pungent), while some compounds are considered multitaste, taste modifiers or tasteless. Classification-based machine learning approaches are useful tools for developing mathematical relationships that predict the taste class of new molecules from their chemical structure. This work reviews the history of multicriteria quantitative structure-taste relationship modelling, starting from the first ligand-based (LB) classifier proposed in 1980 by Lemont B. Kier and concluding with the most recent studies published in 2022.
Affiliation(s)
- Cristian Rojas
  - Grupo de Investigación en Quimiometría y QSAR, Facultad de Ciencia y Tecnología, Universidad del Azuay, Av. 24 de Mayo 7-77 y Hernán Malo, Cuenca 010107, Ecuador
- Davide Ballabio
  - Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, P.za della Scienza 1-20126, Milano, Italy
- Viviana Consonni
  - Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, P.za della Scienza 1-20126, Milano, Italy
- Diego Suárez-Estrella
  - Grupo de Investigación en Quimiometría y QSAR, Facultad de Ciencia y Tecnología, Universidad del Azuay, Av. 24 de Mayo 7-77 y Hernán Malo, Cuenca 010107, Ecuador
- Roberto Todeschini
  - Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, P.za della Scienza 1-20126, Milano, Italy
20
Panwar P, Yang Q, Martini A. PyL3dMD: Python LAMMPS 3D molecular descriptors package. J Cheminform 2023; 15:69. [PMID: 37507792 PMCID: PMC10385924 DOI: 10.1186/s13321-023-00737-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2023] [Accepted: 07/16/2023] [Indexed: 07/30/2023] Open
Abstract
Molecular descriptors characterize the biological, physical, and chemical properties of molecules and have long been used for understanding molecular interactions and facilitating materials design. Some of the most robust descriptors are derived from geometrical representations of molecules, called 3-dimensional (3D) descriptors. When calculated from molecular dynamics (MD) simulation trajectories, 3D descriptors can also capture the effects of operating conditions such as temperature or pressure. However, extracting 3D descriptors from MD trajectories is non-trivial, which hinders their wide use by researchers developing advanced quantitative-structure-property-relationship models using machine learning. Here, we describe a suite of open-source Python-based post-processing routines, called PyL3dMD, for calculating 3D descriptors from MD simulations. PyL3dMD is compatible with the popular simulation package LAMMPS and enables users to compute more than 2000 3D molecular descriptors from atomic trajectories generated by MD simulations. PyL3dMD is freely available via GitHub and can be easily installed and used as a highly flexible Python package on all major platforms (Windows, Linux, and macOS). A performance benchmark study used descriptors calculated by PyL3dMD to develop a neural network and the results showed that PyL3dMD is fast and efficient in calculating descriptors for large and complex molecular systems with long simulation durations. PyL3dMD facilitates the calculation of 3D molecular descriptors using MD simulations, making it a valuable tool for cheminformatics studies.
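To make the idea of a trajectory-derived 3D descriptor concrete, the sketch below computes the mass-weighted radius of gyration of a molecule in each frame and averages it over frames. This is a generic, standard-library illustration and not PyL3dMD's API; the radius of gyration is chosen here only as a familiar geometric descriptor, and the coordinates and masses are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def radius_of_gyration(coords: Sequence[Coord], masses: Sequence[float]) -> float:
    """Mass-weighted radius of gyration of one molecule in one frame."""
    total = sum(masses)
    # Centre of mass, computed component by component.
    com = tuple(sum(m * c[i] for m, c in zip(masses, coords)) / total for i in range(3))
    # Mass-weighted mean squared distance from the centre of mass.
    sq = sum(m * sum((c[i] - com[i]) ** 2 for i in range(3)) for m, c in zip(masses, coords))
    return math.sqrt(sq / total)


# Two hypothetical frames of a three-atom molecule (coordinates in angstroms).
masses = [1.0, 1.0, 1.0]
frames = [
    [(-1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(-2.0, 0.0, 0.0), (0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
]

rg_per_frame = [radius_of_gyration(frame, masses) for frame in frames]
mean_rg = sum(rg_per_frame) / len(rg_per_frame)
print(rg_per_frame, mean_rg)
```

Because the descriptor is evaluated per frame, its time average and fluctuation reflect the simulated temperature and pressure, which is the advantage of MD-derived descriptors that the abstract points to.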
Affiliation(s)
- Pawan Panwar
- Department of Mechanical Engineering, University of California Merced, 5200 North Lake Road, Merced, CA, 95343, USA.
| | - Quanpeng Yang
- Department of Mechanical Engineering, University of California Merced, 5200 North Lake Road, Merced, CA, 95343, USA
| | - Ashlie Martini
- Department of Mechanical Engineering, University of California Merced, 5200 North Lake Road, Merced, CA, 95343, USA.
|
21
|
Matusevičiūtė R, Ignatavičiūtė E, Mickus R, Bordel S, Skeberdis VA, Raškevičius V. Evaluation of Cx43 Gap Junction Inhibitors Using a Quantitative Structure-Activity Relationship Model. Biomedicines 2023; 11:1972. [PMID: 37509611 PMCID: PMC10377234 DOI: 10.3390/biomedicines11071972] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/09/2023] [Revised: 07/07/2023] [Accepted: 07/10/2023] [Indexed: 07/30/2023] Open
Abstract
Gap junctions (GJs) made of connexin-43 (Cx43) are necessary for the conduction of electrical impulses in the heart. Modulation of Cx43 GJ activity may be beneficial in the treatment of cardiac arrhythmias and other dysfunctions. The search for novel GJ-modulating agents using molecular docking allows for the accurate prediction of binding affinities of ligands, which, unfortunately, often poorly correlate with their potencies. The objective of this study was to demonstrate that a Quantitative Structure-Activity Relationship (QSAR) model could be used for more precise identification of potent Cx43 GJ inhibitors. Using molecular docking, QSAR, and 3D-QSAR, we evaluated 16 known Cx43 GJ inhibitors, suggested the monocyclic monoterpene d-limonene as a putative Cx43 inhibitor, and tested it experimentally in HeLa cells expressing exogenous Cx43. The predicted concentrations required to produce 50% of the maximal effect (IC50) for each of these compounds were compared with those determined experimentally (pIC50 and eIC50, respectively). The pIC50 values of d-limonene and other Cx43 GJ inhibitors examined by our QSAR and 3D-QSAR models showed a good correlation with their eIC50 values (R = 0.88 and 0.90, respectively), in contrast to the pIC50 values obtained from molecular docking (R = 0.78). However, molecular docking suggests that inhibitor potency may depend on the docking conformation of the inhibitor on Cx43. In searching for new potent, selective, and specific inhibitors of GJ channels, we propose to perform the primary screening of new putative compounds using the QSAR model, followed by the validation of the most suitable candidates by patch-clamp techniques.
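The correlation coefficient R quoted in this abstract compares predicted and experimentally determined IC50 values. As a minimal sketch of how such a figure could be computed, the function below implements the standard Pearson correlation in plain Python. The function name and the numbers in the example are hypothetical illustrations and are not data from the study; note also that "pIC50" in this abstract denotes a predicted IC50, not a negative logarithm.

```python
import math


def pearson_r(predicted, experimental):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(predicted) != len(experimental) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_e = sum(experimental) / n
    # Covariance and variances (unnormalized; the factor 1/n cancels out).
    cov = sum((p - mean_p) * (e - mean_e) for p, e in zip(predicted, experimental))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_e = sum((e - mean_e) ** 2 for e in experimental)
    if var_p == 0 or var_e == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_e)


# Hypothetical predicted vs. experimental IC50 values (arbitrary units).
predicted_ic50 = [12.0, 35.0, 48.0, 90.0, 150.0]
experimental_ic50 = [10.0, 40.0, 45.0, 110.0, 140.0]
r = pearson_r(predicted_ic50, experimental_ic50)
```

A value of `r` close to 1 means the model ranks and scales the compounds in nearly the same way as the experiment does; it does not by itself show that the absolute predictions are accurate.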
Affiliation(s)
- Ramona Matusevičiūtė
- Faculty of Medicine, Lithuanian University of Health Sciences, 03101 Kaunas, Lithuania; (R.M.); (E.I.)
| | - Eglė Ignatavičiūtė
- Faculty of Medicine, Lithuanian University of Health Sciences, 03101 Kaunas, Lithuania; (R.M.); (E.I.)
| | - Rokas Mickus
- Institute of Cardiology, Lithuanian University of Health Sciences, 50162 Kaunas, Lithuania; (R.M.); (S.B.); (V.A.S.)
| | - Sergio Bordel
- Institute of Cardiology, Lithuanian University of Health Sciences, 50162 Kaunas, Lithuania; (R.M.); (S.B.); (V.A.S.)
- Institute of Sustainable Processes, University of Valladolid, 47011 Valladolid, Spain
| | - Vytenis Arvydas Skeberdis
- Institute of Cardiology, Lithuanian University of Health Sciences, 50162 Kaunas, Lithuania; (R.M.); (S.B.); (V.A.S.)
| | - Vytautas Raškevičius
- Institute of Cardiology, Lithuanian University of Health Sciences, 50162 Kaunas, Lithuania; (R.M.); (S.B.); (V.A.S.)
|
22
|
Emonts J, Buyel J. An overview of descriptors to capture protein properties - Tools and perspectives in the context of QSAR modeling. Comput Struct Biotechnol J 2023; 21:3234-3247. [PMID: 38213891 PMCID: PMC10781719 DOI: 10.1016/j.csbj.2023.05.022] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 03/13/2023] [Revised: 05/23/2023] [Accepted: 05/23/2023] [Indexed: 01/13/2024] Open
Abstract
Proteins are important ingredients in food and feed, they are the active components of many pharmaceutical products, and they are necessary, in the form of enzymes, for the success of many technical processes. However, production can be challenging, especially when using heterologous host cells such as bacteria to express and assemble recombinant mammalian proteins. The manufacturability of proteins can be hindered by low solubility, a tendency to aggregate, or inefficient purification. Tools such as in silico protein engineering and models that predict separation criteria can overcome these issues but usually require the complex shape and surface properties of proteins to be represented by a small number of quantitative numeric values known as descriptors, as similarly used to capture the features of small molecules. Here, we review the current status of protein descriptors, especially for application in quantitative structure activity relationship (QSAR) models. First, we describe the complexity of proteins and the properties that descriptors must accommodate. Then we introduce descriptors of shape and surface properties that quantify the global and local features of proteins. Finally, we highlight the current limitations of protein descriptors and propose strategies for the derivation of novel protein descriptors that are more informative.
Affiliation(s)
- J. Emonts
- Fraunhofer Institute for Molecular Biology and Applied Ecology IME, Germany
| | - J.F. Buyel
- University of Natural Resources and Life Sciences, Vienna (BOKU), Department of Biotechnology (DBT), Institute of Bioprocess Science and Engineering (IBSE), Muthgasse 18, 1190 Vienna, Austria
- Institute for Molecular Biotechnology, Worringerweg 1, RWTH Aachen University, 52074 Aachen, Germany
|
23
|
Hajiabolhassan H, Taheri Z, Hojatnia A, Yeganeh YT. FunQG: Molecular Representation Learning via Quotient Graphs. J Chem Inf Model 2023. [PMID: 37186874 DOI: 10.1021/acs.jcim.3c00445] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 05/17/2023]
Abstract
To accurately predict molecular properties, it is important to learn expressive molecular representations. Graph neural networks (GNNs) have made significant advances in this area, but they often face limitations like neighbors-explosion, under-reaching, oversmoothing, and oversquashing. Additionally, GNNs tend to have high computational costs due to their large number of parameters. These limitations emerge or increase when dealing with larger graphs or deeper GNN models. One potential solution is to simplify the molecular graph into a smaller, richer, and more informative one that is easier to train GNNs. Our proposed molecular graph coarsening framework called FunQG, uses Functional groups as building blocks to determine a molecule's properties, based on a graph-theoretic concept called Quotient Graph. We show through experiments that the resulting informative graphs are much smaller than the original molecular graphs and are thus more suitable for training GNNs. We apply FunQG to popular molecular property prediction benchmarks and compare the performance of popular baseline GNNs on the resulting data sets to that of state-of-the-art baselines on the original data sets. Our experiments demonstrate that FunQG yields notable results on various data sets while dramatically reducing the number of parameters and computational costs. By utilizing functional groups, we can achieve an interpretable framework that indicates their significant role in determining the properties of molecular quotient graphs. Consequently, FunQG is a straightforward, computationally efficient, and generalizable solution for addressing the molecular representation learning problem.
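The coarsening idea described in this abstract rests on the graph-theoretic notion of a quotient graph: nodes are grouped into blocks, each block becomes one super-node, and two super-nodes are linked whenever any of their members were bonded. The following is a minimal sketch of that general construction only, not the FunQG implementation. The function name, the toy molecule, and the hand-written assignment of atoms to functional groups are all assumptions made for illustration.

```python
from collections import defaultdict


def quotient_graph(edges, block_of):
    """Collapse each block of nodes into a single super-node.

    edges:    iterable of (u, v) undirected bonds between atom indices.
    block_of: dict mapping every atom index to a block label.
    Returns (nodes, q_edges): nodes maps block label -> sorted member atoms,
    q_edges is a sorted list of undirected edges between block labels.
    """
    members = defaultdict(list)
    for atom, block in block_of.items():
        members[block].append(atom)

    q_edges = set()
    for u, v in edges:
        bu, bv = block_of[u], block_of[v]
        if bu != bv:  # bonds inside one block vanish in the quotient graph
            q_edges.add(tuple(sorted((bu, bv))))

    nodes = {block: sorted(atoms) for block, atoms in members.items()}
    return nodes, sorted(q_edges)


# Toy example (hypothetical partition): heavy atoms of acetic acid, CH3-C(=O)-OH.
# 0 = methyl C, 1 = carboxyl C, 2 = carbonyl O, 3 = hydroxyl O
edges = [(0, 1), (1, 2), (1, 3)]
block_of = {0: "methyl", 1: "carboxyl", 2: "carboxyl", 3: "carboxyl"}
nodes, q_edges = quotient_graph(edges, block_of)
```

Here four atoms and three bonds reduce to two super-nodes joined by one edge, which illustrates why a graph neural network trained on the coarsened graph sees a much smaller input.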
Affiliation(s)
- Hossein Hajiabolhassan
- Department of Mathematics and Information Technology, Chair of Information Technology, Montanuniversität Leoben, Franz-Josef-Strasse 18, A-8700 Leoben, Austria
- Machine Learning and Graph Mining Lab, Department of Applied Mathematics, Faculty of Mathematical Sciences, Shahid Beheshti University, 19839-69411 Tehran, Iran
| | - Zahra Taheri
- Machine Learning and Graph Mining Lab, Department of Applied Mathematics, Faculty of Mathematical Sciences, Shahid Beheshti University, 19839-69411 Tehran, Iran
| | - Ali Hojatnia
- Machine Learning and Graph Mining Lab, Department of Applied Mathematics, Faculty of Mathematical Sciences, Shahid Beheshti University, 19839-69411 Tehran, Iran
| | - Yavar Taheri Yeganeh
- Machine Learning and Graph Mining Lab, Department of Applied Mathematics, Faculty of Mathematical Sciences, Shahid Beheshti University, 19839-69411 Tehran, Iran
|
24
|
Li X, Wang H, Jiang M, Ding M, Xu X, Xu B, Zou Y, Yu Y, Yang W. Collision Cross Section Prediction Based on Machine Learning. Molecules 2023; 28:4050. [PMID: 37241791 DOI: 10.3390/molecules28104050] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 04/13/2023] [Revised: 05/10/2023] [Accepted: 05/10/2023] [Indexed: 05/28/2023] Open
Abstract
Ion mobility-mass spectrometry (IM-MS) is a powerful separation technique providing an additional dimension of separation to support the enhanced separation and characterization of complex components from the tissue metabolome and medicinal herbs. The integration of machine learning (ML) with IM-MS can overcome the barrier to the lack of reference standards, promoting the creation of a large number of proprietary collision cross section (CCS) databases, which help to achieve the rapid, comprehensive, and accurate characterization of the contained chemical components. In this review, advances in CCS prediction using ML in the past 2 decades are summarized. The advantages of ion mobility-mass spectrometers and the commercially available ion mobility technologies with different principles (e.g., time dispersive, confinement and selective release, and space dispersive) are introduced and compared. The general procedures involved in CCS prediction based on ML (acquisition and optimization of the independent and dependent variables, model construction and evaluation, etc.) are highlighted. In addition, quantum chemistry, molecular dynamics, and CCS theoretical calculations are also described. Finally, the applications of CCS prediction in metabolomics, natural products, foods, and the other research fields are reflected.
Affiliation(s)
- Xiaohang Li
- State Key Laboratory of Component-Based Chinese Medicine, Tianjin University of Traditional Chinese Medicine, 10 Poyanghu Road, Tianjin 301617, China
- Haihe Laboratory of Modern Chinese Medicine, Tianjin University of Traditional Chinese Medicine, 10 Poyanghu Road, Tianjin 301617, China
| | - Hongda Wang
- State Key Laboratory of Component-Based Chinese Medicine, Tianjin University of Traditional Chinese Medicine, 10 Poyanghu Road, Tianjin 301617, China
- Haihe Laboratory of Modern Chinese Medicine, Tianjin University of Traditional Chinese Medicine, 10 Poyanghu Road, Tianjin 301617, China
| | - Meiting Jiang
- State Key Laboratory of Component-Based Chinese Medicine, Tianjin University of Traditional Chinese Medicine, 10 Poyanghu Road, Tianjin 301617, China
- Haihe Laboratory of Modern Chinese Medicine, Tianjin University of Traditional Chinese Medicine, 10 Poyanghu Road, Tianjin 301617, China
| | - Mengxiang Ding
- State Key Laboratory of Component-Based Chinese Medicine, Tianjin University of Traditional Chinese Medicine, 10 Poyanghu Road, Tianjin 301617, China
- Haihe Laboratory of Modern Chinese Medicine, Tianjin University of Traditional Chinese Medicine, 10 Poyanghu Road, Tianjin 301617, China
| | - Xiaoyan Xu
- State Key Laboratory of Component-Based Chinese Medicine, Tianjin University of Traditional Chinese Medicine, 10 Poyanghu Road, Tianjin 301617, China
- Haihe Laboratory of Modern Chinese Medicine, Tianjin University of Traditional Chinese Medicine, 10 Poyanghu Road, Tianjin 301617, China
| | - Bei Xu
- State Key Laboratory of Component-Based Chinese Medicine, Tianjin University of Traditional Chinese Medicine, 10 Poyanghu Road, Tianjin 301617, China
- Haihe Laboratory of Modern Chinese Medicine, Tianjin University of Traditional Chinese Medicine, 10 Poyanghu Road, Tianjin 301617, China
| | - Yadan Zou
- State Key Laboratory of Component-Based Chinese Medicine, Tianjin University of Traditional Chinese Medicine, 10 Poyanghu Road, Tianjin 301617, China
- Haihe Laboratory of Modern Chinese Medicine, Tianjin University of Traditional Chinese Medicine, 10 Poyanghu Road, Tianjin 301617, China
| | - Yuetong Yu
- State Key Laboratory of Component-Based Chinese Medicine, Tianjin University of Traditional Chinese Medicine, 10 Poyanghu Road, Tianjin 301617, China
| | - Wenzhi Yang
- State Key Laboratory of Component-Based Chinese Medicine, Tianjin University of Traditional Chinese Medicine, 10 Poyanghu Road, Tianjin 301617, China
- Haihe Laboratory of Modern Chinese Medicine, Tianjin University of Traditional Chinese Medicine, 10 Poyanghu Road, Tianjin 301617, China
|
25
|
Yang SQ, Zhang LX, Ge YJ, Zhang JW, Hu JX, Shen CY, Lu AP, Hou TJ, Cao DS. In-silico target prediction by ensemble chemogenomic model based on multi-scale information of chemical structures and protein sequences. J Cheminform 2023; 15:48. [PMID: 37088813 PMCID: PMC10123967 DOI: 10.1186/s13321-023-00720-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2022] [Accepted: 04/08/2023] [Indexed: 04/25/2023] Open
Abstract
Identification and validation of bioactive small-molecule targets is a significant challenge in drug discovery. In recent years, various in-silico approaches have been proposed to expedite time- and resource-consuming experiments for target detection. Herein, we developed several chemogenomic models for target prediction based on multi-scale information of chemical structures and protein sequences. By pairing a compound with multiple protein targets and feeding these compound-target pairs into a well-established model, scores indicating whether the compound and each target interact can be derived, and a target prediction task can thus be completed by sorting the output scores. To improve the prediction performance, we constructed several chemogenomic models using multi-scale information of chemical structures and protein sequences, and the ensemble model with the best performance was used as our final model. The model was validated by various strategies and external datasets, and its promising target prediction capability, i.e., the fraction of known targets identified in the top-k (1 to 10) list of the potential target candidates suggested by the model, was confirmed. Compared with multiple state-of-the-art target prediction methods, our model showed equivalent or better predictive ability in terms of the top-k predictions. It is expected that our method can be utilized as a powerful computational tool to narrow down the potential targets for experimental testing.
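The evaluation metric named in this abstract, the fraction of known targets recovered in the top-k ranked candidates, can be sketched in a few lines. The code below is an illustrative reading of that definition, not the authors' evaluation script: the function name, the compound and target identifiers, and the scores are all hypothetical, and ties between scores are broken arbitrarily by sort order.

```python
def top_k_recovery(scores, known, k):
    """Fraction of known compound-target pairs found in each compound's top-k list.

    scores: dict compound -> dict target -> interaction score (higher = more likely).
    known:  dict compound -> set of experimentally known targets.
    """
    hits = 0
    total = 0
    for compound, target_scores in scores.items():
        # Rank candidate targets by descending score and keep the first k.
        ranked = sorted(target_scores, key=target_scores.get, reverse=True)[:k]
        truth = known.get(compound, set())
        hits += len(truth & set(ranked))
        total += len(truth)
    return hits / total if total else 0.0


# Hypothetical scores for two compounds against three candidate targets.
scores = {
    "cmpd_A": {"T1": 0.9, "T2": 0.2, "T3": 0.6},
    "cmpd_B": {"T1": 0.1, "T2": 0.8, "T3": 0.7},
}
known = {"cmpd_A": {"T3"}, "cmpd_B": {"T2"}}

recovery_at_1 = top_k_recovery(scores, known, k=1)
recovery_at_2 = top_k_recovery(scores, known, k=2)
```

With these toy numbers only the known target of `cmpd_B` is ranked first, so recovery is 0.5 at k = 1 and rises to 1.0 at k = 2, mirroring how the metric is typically reported across k = 1 to 10.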
Affiliation(s)
- Su-Qing Yang
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410013, Hunan, People's Republic of China
- Department of Pharmacy, Jiangxi Provincial People's Hospital, The First Affiliated Hospital of Nanchang Medical College, Nanchang, 330006, Jiangxi, People's Republic of China
| | - Liu-Xia Zhang
- The First Hospital of Hunan University of Chinese Medicine, Changsha, 410007, Hunan, People's Republic of China
| | - You-Jin Ge
- Department of Pharmacy, Jiangxi Provincial People's Hospital, The First Affiliated Hospital of Nanchang Medical College, Nanchang, 330006, Jiangxi, People's Republic of China
| | - Jin-Wei Zhang
- Departments of Biomedical Engineering and Pathology, School of Basic Medical Science, Central South University, Changsha, 410013, Hunan, People's Republic of China
| | - Jian-Xin Hu
- Department of Pharmacy, Jiangxi Provincial People's Hospital, The First Affiliated Hospital of Nanchang Medical College, Nanchang, 330006, Jiangxi, People's Republic of China
| | - Cheng-Ying Shen
- Department of Pharmacy, Jiangxi Provincial People's Hospital, The First Affiliated Hospital of Nanchang Medical College, Nanchang, 330006, Jiangxi, People's Republic of China
| | - Ai-Ping Lu
- Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, People's Republic of China
| | - Ting-Jun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, People's Republic of China.
| | - Dong-Sheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410013, Hunan, People's Republic of China.
- Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, People's Republic of China.
|
26
|
Hong E, Jeon J, Kim HU. Recent development of machine learning models for the prediction of drug-drug interactions. KOREAN J CHEM ENG 2023; 40:276-285. [PMID: 36748027 PMCID: PMC9894510 DOI: 10.1007/s11814-023-1377-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/11/2022] [Revised: 12/09/2022] [Accepted: 12/16/2022] [Indexed: 02/05/2023]
Abstract
Polypharmacy, the co-administration of multiple drugs, has become an area of concern as the elderly population grows and unexpected infections, such as COVID-19, keep emerging. However, it is very costly and time-consuming to experimentally examine the pharmacological effects of polypharmacy. To address this challenge, machine learning models that predict drug-drug interactions (DDIs) have actively been developed in recent years. In particular, the growing volume of drug datasets and the advances in machine learning have facilitated model development. In this regard, this review discusses the DDI-predicting machine learning models that have been developed since 2018. Our discussion focuses on dataset sources used to develop the models, featurization approaches of molecular structures and biological information, and types of DDI prediction outcomes from the models. Finally, we make suggestions for research opportunities in this field.
Affiliation(s)
- Eujin Hong
- Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141 Korea
| | - Junhyeok Jeon
- Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141 Korea
| | - Hyun Uk Kim
- Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141 Korea
- BioProcess Engineering Research Center and BioInformatics Research Center, KAIST, Daejeon, 34141 Korea
|
27
|
Al-Sha'er MA, Basheer HA, Taha MO. Discovery of new PKN2 inhibitory chemotypes via QSAR-guided selection of docking-based pharmacophores. Mol Divers 2023; 27:443-462. [PMID: 35507210 DOI: 10.1007/s11030-022-10434-4] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 11/05/2021] [Accepted: 04/05/2022] [Indexed: 12/13/2022]
Abstract
Serine/threonine-protein kinase N2 (PKN2) plays an important role in cell cycle progression, cell migration, cell adhesion and transcription activation signaling processes. In cancer, however, it plays important roles in tumor cell migration, invasion and apoptosis. PKN2 inhibitors have been shown to be promising in treating cancer. This prompted us to model this interesting target using our QSAR-guided selection of docking-based pharmacophores approach where numerous pharmacophores are extracted from docked ligand poses and allowed to compete within the context of QSAR. The optimal pharmacophore was sterically-refined, validated by receiver operating characteristic (ROC) curve analysis and used as virtual search query to screen the National Cancer Institute (NCI) database for new promising anti-PKN2 leads of novel chemotypes. Three low micromolar hits were identified with IC50 values ranging between 9.9 and 18.6 µM. Pharmacological assays showed promising cytotoxic properties for active hits in MTT and wound healing assays against MCF-7 and PANC-1 cancer cells.
Affiliation(s)
- Mahmoud A Al-Sha'er
- Faculty of Pharmacy, Zarqa University, P.O. Box 132222, Zarqa, 13132, Jordan.
| | - Haneen A Basheer
- Faculty of Pharmacy, Zarqa University, P.O. Box 132222, Zarqa, 13132, Jordan
| | - Mutasem O Taha
- Department of Pharmaceutical Sciences, Faculty of Pharmacy, University of Jordan, Amman, Jordan.
|
28
|
Sarkar C, Das B, Rawat VS, Wahlang JB, Nongpiur A, Tiewsoh I, Lyngdoh NM, Das D, Bidarolli M, Sony HT. Artificial Intelligence and Machine Learning Technology Driven Modern Drug Discovery and Development. Int J Mol Sci 2023; 24:2026. [PMID: 36768346 PMCID: PMC9916967 DOI: 10.3390/ijms24032026] [Citation(s) in RCA: 43] [Impact Index Per Article: 21.5] [Received: 11/29/2022] [Revised: 12/27/2022] [Accepted: 12/28/2022] [Indexed: 01/22/2023] Open
Abstract
The discovery and advances of medicines may be considered as the ultimate relevant translational science effort that adds to human invulnerability and happiness. But advancing a fresh medication is a quite convoluted, costly, and protracted operation, normally costing USD ~2.6 billion and consuming a mean time span of 12 years. Methods to cut back expenditure and hasten new drug discovery have prompted an arduous and compelling brainstorming exercise in the pharmaceutical industry. The engagement of Artificial Intelligence (AI), including the deep-learning (DL) component in particular, has been facilitated by the employment of classified big data, in concert with strikingly reinforced computing prowess and cloud storage, across all fields. AI has energized computer-facilitated drug discovery. An unrestricted espousing of machine learning (ML), especially DL, in many scientific specialties, and the technological refinements in computing hardware and software, in concert with various aspects of the problem, sustain this progress. ML algorithms have been extensively engaged for computer-facilitated drug discovery. DL methods, such as artificial neural networks (ANNs) comprising multiple buried processing layers, have of late seen a resurgence due to their capability to power automatic attribute elicitations from the input data, coupled with their ability to obtain nonlinear input-output pertinencies. Such features of DL methods augment classical ML techniques which bank on human-contrived molecular descriptors. A major part of the early reluctance concerning utility of AI in pharmaceutical discovery has begun to melt, thereby advancing medicinal chemistry. AI, along with modern experimental technical knowledge, is anticipated to invigorate the quest for new and improved pharmaceuticals in an expeditious, economical, and increasingly compelling manner. DL-facilitated methods have just initiated kickstarting for some integral issues in drug discovery. 
Many technological advances, such as "message-passing paradigms", "spatial-symmetry-preserving networks", "hybrid de novo design", and other ingenious ML exemplars, will definitely come to be pervasively widespread and help dissect many of the biggest, and most intriguing inquiries. Open data allocation and model augmentation will exert a decisive hold during the progress of drug discovery employing AI. This review will address the impending utilizations of AI to refine and bolster the drug discovery operation.
Affiliation(s)
- Chayna Sarkar
- Department of Pharmacology, North Eastern Indira Gandhi Regional Institute of Health and Medical Sciences (NEIGRIHMS), Mawdiangdiang, Shillong 793018, Meghalaya, India
| | - Biswadeep Das
- Department of Pharmacology, All India Institute of Medical Sciences (AIIMS), Virbhadra Road, Rishikesh 249203, Uttarakhand, India
- Correspondence: ; Tel./Fax: +91-135-708-856-0009
| | - Vikram Singh Rawat
- Department of Psychiatry, All India Institute of Medical Sciences (AIIMS), Virbhadra Road, Rishikesh 249203, Uttarakhand, India
| | - Julie Birdie Wahlang
- Department of Pharmacology, North Eastern Indira Gandhi Regional Institute of Health and Medical Sciences (NEIGRIHMS), Mawdiangdiang, Shillong 793018, Meghalaya, India
| | - Arvind Nongpiur
- Department of Psychiatry, North Eastern Indira Gandhi Regional Institute of Health and Medical Sciences (NEIGRIHMS), Mawdiangdiang, Shillong 793018, Meghalaya, India
| | - Iadarilang Tiewsoh
- Department of Medicine, North Eastern Indira Gandhi Regional Institute of Health and Medical Sciences (NEIGRIHMS), Mawdiangdiang, Shillong 793018, Meghalaya, India
| | - Nari M. Lyngdoh
- Department of Anesthesiology, North Eastern Indira Gandhi Regional Institute of Health and Medical Sciences (NEIGRIHMS), Mawdiangdiang, Shillong 793018, Meghalaya, India
| | - Debasmita Das
- Department of Computer Science and Engineering, Vellore Institute of Technology, Vellore Campus, Tiruvalam Road, Katpadi, Vellore 632014, Tamil Nadu, India
| | - Manjunath Bidarolli
- Department of Pharmacology, All India Institute of Medical Sciences (AIIMS), Virbhadra Road, Rishikesh 249203, Uttarakhand, India
| | - Hannah Theresa Sony
- Department of Pharmacology, All India Institute of Medical Sciences (AIIMS), Virbhadra Road, Rishikesh 249203, Uttarakhand, India
|
29
|
Li TH, Wang CC, Zhang L, Chen X. SNRMPACDC: computational model focused on Siamese network and random matrix projection for anticancer synergistic drug combination prediction. Brief Bioinform 2023; 24:6843566. [PMID: 36418927 DOI: 10.1093/bib/bbac503] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Received: 07/19/2022] [Revised: 09/22/2022] [Accepted: 10/24/2022] [Indexed: 11/25/2022] Open
Abstract
Synergistic drug combinations can improve the therapeutic effect and reduce the drug dosage to avoid toxicity. In previous years, an in vitro approach was utilized to screen synergistic drug combinations. However, the in vitro method is time-consuming and expensive. With the rapid growth of high-throughput data, computational methods are becoming efficient tools to predict potential synergistic drug combinations. Considering the limitations of the previous computational methods, we developed a new model named Siamese Network and Random Matrix Projection for AntiCancer Drug Combination prediction (SNRMPACDC). Firstly, the Siamese convolutional network and random matrix projection were used to process the features of the two drugs into drug combination features. Then, the features of the cancer cell line were processed through the convolutional network. Finally, the processed features were integrated and input into the multi-layer perceptron network to get the predicted score. Compared with the traditional method of splicing drug features into drug combination features, SNRMPACDC improved the interpretability of drug combination features to a certain extent. In addition, the introduction of convolutional networks can better extract the potential information in the features. SNRMPACDC achieved the root mean-squared error of 15.01 and the Pearson correlation coefficient of 0.75 in 5-fold cross-validation of regression prediction for response data. In addition, SNRMPACDC achieved the AUC of 0.91 ± 0.03 and the AUPR of 0.62 ± 0.05 in 5-fold cross-validation of classification prediction of synergistic or not. These results are almost better than all the previous models. SNRMPACDC would be an effective approach to infer potential anticancer synergistic drug combinations.
Affiliation(s)
- Tian-Hao Li
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
| | - Chun-Chun Wang
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
| | - Li Zhang
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
| | - Xing Chen
- Artificial Intelligence Research Institute, China University of Mining and Technology, Xuzhou, 221116, China
|
30
|
Long TZ, Shi SH, Liu S, Lu AP, Liu ZQ, Li M, Hou TJ, Cao DS. Structural Analysis and Prediction of Hematotoxicity Using Deep Learning Approaches. J Chem Inf Model 2023; 63:111-125. [PMID: 36472475 DOI: 10.1021/acs.jcim.2c01088] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Indexed: 12/12/2022]
Abstract
Hematotoxicity has become a serious but overlooked toxicity in drug discovery. However, only a few in silico models have been reported for the prediction of hematotoxicity. In this study, we constructed a high-quality dataset comprising 759 hematotoxic compounds and 1623 nonhematotoxic compounds and then established a series of classification models based on a combination of seven machine learning (ML) algorithms and nine molecular representations. The results based on two data partitioning strategies and applicability domain (AD) analysis illustrate that the best prediction model, based on Attentive FP, yielded a balanced accuracy (BA) of 72.6% and an area under the receiver operating characteristic curve (AUC) of 76.8% for the validation set, and a BA of 69.2% and an AUC of 75.9% for the test set. In addition, compared with existing filtering rules and models, our model achieved the highest BA value of 67.5% for the external validation set. Additionally, the Shapley additive explanations (SHAP) and atom heatmap approaches were utilized to discover the important features and structural fragments related to hematotoxicity, which could offer helpful tips to detect undesired positive substances. Furthermore, matched molecular pair analysis (MMPA) and a representative substructure derivation technique were employed to further characterize and investigate the transformation principles and distinctive structural features of hematotoxic chemicals. We believe that the novel graph-based deep learning algorithms and insightful interpretation presented in this study can be used as a trustworthy and effective tool to assess hematotoxicity in the development of new drugs.
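The dataset in this abstract is imbalanced (759 positive versus 1623 negative compounds), which is the usual reason for reporting balanced accuracy rather than plain accuracy. As a minimal sketch, balanced accuracy for a binary classifier is the mean of sensitivity and specificity. The function name and the labels in the example are hypothetical and are not taken from the study.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = positive, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # recall on the positive (toxic) class
    specificity = tn / (tn + fp)  # recall on the negative (non-toxic) class
    return (sensitivity + specificity) / 2


# Hypothetical labels: 2 toxic and 4 non-toxic compounds.
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1]
ba = balanced_accuracy(y_true, y_pred)
plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

In this toy case plain accuracy is about 0.67 while balanced accuracy is 0.625, because the missed positive weighs more heavily once each class contributes equally.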
Affiliation(s)
- Teng-Zhi Long
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
| | - Shao-Hua Shi
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
- Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, 0000, P. R. China
| | - Shao Liu
- Department of Pharmacy, Xiangya Hospital, Central South University, Changsha 410008, Hunan, P. R. China
| | - Ai-Ping Lu
- Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, 0000, P. R. China
| | - Zhao-Qian Liu
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
| | - Min Li
- School of Computer Science and Engineering, Central South University, Changsha 410083, P. R. China
| | - Ting-Jun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
| | - Dong-Sheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China; Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, 0000, P. R. China; Department of Pharmacy, Xiangya Hospital, Central South University, Changsha 410008, Hunan, P. R. China
| |
|
31
|
Kumar S, Manoharan A, J J, Abdelgawad MA, Mahdi WA, Alshehri S, Ghoneim MM, Pappachen LK, Zachariah SM, Aneesh TP, Mathew B. Exploiting butyrylcholinesterase inhibitors through a combined 3-D pharmacophore modeling, QSAR, molecular docking, and molecular dynamics investigation. RSC Adv 2023; 13:9513-9529. [PMID: 36968055 PMCID: PMC10035067 DOI: 10.1039/d3ra00526g] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 01/25/2023] [Accepted: 03/14/2023] [Indexed: 03/25/2023] Open
Abstract
Alzheimer's disease (AD) is a neurodegenerative condition associated with ageing. AD gradually impairs memory and cognitive function, which leads to abnormal behavior, incapacity, and reliance on others. By 2050, there will likely be 100 million cases of AD in the world's population. Acetylcholinesterase (AChE) and butyrylcholinesterase (BuChE) inhibition are significant components of AD treatment. This work developed models using genetic method multiple linear regression, atom-based and field-based approaches, and 3-D pharmacophore modelling. Internal and external validation showed that all of the models are statistically sound (R2 > 0.81 and Q2 > 0.77). From a pre-plated CNS library (6055), we discovered a hit compound using virtual screening on a QSAR model. Additional hit compounds were investigated through molecular docking (XP mode). A 100 ns molecular dynamics simulation then revealed that the Molecule5093-4BDS complex was stable. Finally, the predicted ADME properties of the hit compounds (Molecule5093, Molecule1076, Molecule4412, Molecule1053, and Molecule3344) were determined. According to the results of our investigation and the prospective hit compounds, BuChE inhibitors may be used as a treatment for AD.
Affiliation(s)
- Sunil Kumar
- Department of Pharmaceutical Chemistry, Amrita School of Pharmacy, Amrita Vishwa Vidyapeetham, AIMS Health Sciences Campus, Kochi 682 041, India
| | - Amritha Manoharan
- Department of Pharmaceutical Chemistry, Amrita School of Pharmacy, Amrita Vishwa Vidyapeetham, AIMS Health Sciences Campus, Kochi 682 041, India
| | - Jayalakshmi J
- Department of Pharmaceutical Chemistry, Amrita School of Pharmacy, Amrita Vishwa Vidyapeetham, AIMS Health Sciences Campus, Kochi 682 041, India
| | - Mohamed A. Abdelgawad
- Department of Pharmaceutical Chemistry, College of Pharmacy, Jouf University, Sakaka 72341, Saudi Arabia
- Department of Pharmaceutical Organic Chemistry, Faculty of Pharmacy, Beni-Suef University, Beni-Suef, Egypt
| | - Wael A. Mahdi
- Department of Pharmaceutics, College of Pharmacy, King Saud University, Riyadh 11451, Saudi Arabia
| | - Sultan Alshehri
- Department of Pharmaceutics, College of Pharmacy, King Saud University, Riyadh 11451, Saudi Arabia
| | - Mohammed M. Ghoneim
- Department of Pharmacy Practice, College of Pharmacy, AlMaarefa University, Ad Diriyah 13713, Saudi Arabia
- Pharmacognosy and Medicinal Plants Department, Faculty of Pharmacy, Al-Azhar University, Cairo 11884, Egypt
| | - Leena K. Pappachen
- Department of Pharmaceutical Chemistry, Amrita School of Pharmacy, Amrita Vishwa Vidyapeetham, AIMS Health Sciences Campus, Kochi 682 041, India
| | - Subin Mary Zachariah
- Department of Pharmaceutical Chemistry, Amrita School of Pharmacy, Amrita Vishwa Vidyapeetham, AIMS Health Sciences Campus, Kochi 682 041, India
| | - T. P. Aneesh
- Department of Pharmaceutical Chemistry, Amrita School of Pharmacy, Amrita Vishwa Vidyapeetham, AIMS Health Sciences Campus, Kochi 682 041, India
| | - Bijo Mathew
- Department of Pharmaceutical Chemistry, Amrita School of Pharmacy, Amrita Vishwa Vidyapeetham, AIMS Health Sciences Campus, Kochi 682 041, India
| |
|
32
|
Liu Y, Zhang R, Li T, Jiang J, Ma J, Wang P. MolRoPE-BERT: An enhanced molecular representation with Rotary Position Embedding for molecular property prediction. J Mol Graph Model 2023; 118:108344. [PMID: 36242862 DOI: 10.1016/j.jmgm.2022.108344] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 05/15/2022] [Revised: 09/21/2022] [Accepted: 09/21/2022] [Indexed: 11/28/2022]
Abstract
Molecular property prediction is a significant task in drug discovery. Most deep learning-based computational methods either develop a unique chemical representation or combine complex models. However, researchers have paid less attention to the possible advantages of enormous quantities of unlabeled molecular data, and the limited amount of labeled data available makes the task more difficult. Taking inspiration from natural language processing research and current advances in pretrained models, the SMILES of a drug molecule may in some senses be regarded as a language for chemistry. In this paper, we incorporated Rotary Position Embedding (RoPE) to efficiently encode the position information of SMILES sequences, ultimately enhancing the capability of the BERT pretrained model to extract potential molecular substructure information for molecular property prediction. We proposed the MolRoPE-BERT framework, a new end-to-end deep learning framework that integrates an efficient position coding approach for capturing sequence position information with a pretrained BERT model for molecular property prediction. To generate useful molecular substructure embeddings, we first pretrain MolRoPE-BERT exclusively on four million unlabeled drug SMILES (i.e., ZINC 15 and ChEMBL 27). Then, we conduct a series of experiments to evaluate the performance of our proposed MolRoPE-BERT on four well-studied datasets. Compared with conventional and state-of-the-art baselines, our experiments demonstrated comparable or superior performance.
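To make the position-encoding idea concrete, the following is a minimal sketch of rotary position embedding applied to a single vector. It is not the MolRoPE-BERT implementation: the abstract does not give the embedding size or frequency base, so the 4-dimensional toy vectors and the commonly used base of 10000 are assumptions.

```python
import math


def rope(vec, position, base=10000.0):
    """Rotate consecutive pairs of `vec` by position-dependent angles."""
    dim = len(vec)
    if dim % 2:
        raise ValueError("RoPE needs an even embedding dimension")
    rotated = []
    for i in range(0, dim, 2):
        # Pair k = i / 2 uses frequency base^(-2k/dim) = base^(-i/dim).
        angle = position * base ** (-i / dim)
        cos_a, sin_a = math.cos(angle), math.sin(angle)
        x, y = vec[i], vec[i + 1]
        rotated.extend([x * cos_a - y * sin_a, x * sin_a + y * cos_a])
    return rotated


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Hypothetical query/key vectors for two SMILES tokens.
query = [0.3, -1.2, 0.8, 0.5]
key = [1.0, 0.4, -0.7, 0.2]

# Same relative offset (2 tokens) at different absolute positions.
score_near_start = dot(rope(query, 3), rope(key, 1))
score_further_on = dot(rope(query, 7), rope(key, 5))
print(score_near_start, score_further_on)
```

The two scores agree up to floating-point error, which illustrates the property that makes RoPE attractive for attention: the query-key dot product depends only on the relative distance between tokens, not on their absolute positions.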
Affiliation(s)
- Yunwu Liu
- School of Information Science and Engineering, Lanzhou University, Tianshui Road, Lanzhou city, 730000, Lanzhou, China.
| | - Ruisheng Zhang
- School of Information Science and Engineering, Lanzhou University, Tianshui Road, Lanzhou city, 730000, Lanzhou, China.
| | - Tongfeng Li
- School of Information Science and Engineering, Lanzhou University, Tianshui Road, Lanzhou city, 730000, Lanzhou, China.
| | - Jing Jiang
- School of Information Science and Engineering, Lanzhou University, Tianshui Road, Lanzhou city, 730000, Lanzhou, China.
| | - Jun Ma
- School of Information Science and Engineering, Lanzhou University, Tianshui Road, Lanzhou city, 730000, Lanzhou, China.
| | - Ping Wang
- School of Information Science and Engineering, Lanzhou University, Tianshui Road, Lanzhou city, 730000, Lanzhou, China.
| |
|
33
|
Liu MQ, Wang T, Wang QL, Zhou J, Wang BR, Zhang B, Wang KL, Zhu H, Zhang YH. Structure-guided discovery of food-derived GABA-T inhibitors as hunters for anti-anxiety compounds. Food Funct 2022; 13:12674-12685. [PMID: 36382616 DOI: 10.1039/d2fo01315k] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
Abstract
With the accelerating pace of life, people face many kinds of pressure, and anxiety has become a common mental health issue that seriously affects human life. Safe and effective food-derived compounds may be used as anti-anxiety compounds. In this study, anti-anxiety compounds were collected and curated for database construction. Quantitative structure-activity relationship (QSAR) models were developed using a combination of various machine-learning approaches and chemical descriptors to predict natural compounds in food with anti-anxiety effects. High-throughput molecular docking was used to screen for compounds that could function as anti-anxiety molecules by inhibiting the γ-aminobutyrate transaminase (GABA-T) enzyme, and 7 compounds were selected for in vitro activity verification. Pharmacokinetic analysis revealed three compounds (quercetin, lithocholic acid, and ferulic acid) that met Lipinski's Rule of Five and inhibited the GABA-T enzyme to alleviate anxiety in vitro. The established QSAR model, combined with molecular docking and molecular dynamics, was validated by the synthesis and discovery of novel food-derived anti-anxiety compounds.
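For readers unfamiliar with the Rule of Five filter mentioned above, here is a minimal sketch of the check. The `Descriptors` container and both example compounds are hypothetical; in practice the descriptor values would come from a cheminformatics toolkit, and the values below are not those of the compounds in the study.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    mol_weight: float       # molecular weight in g/mol
    log_p: float            # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def lipinski_violations(d):
    """Count how many of the four Rule of Five thresholds are exceeded."""
    checks = [
        d.mol_weight > 500,
        d.log_p > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ]
    return sum(checks)


def passes_rule_of_five(d, max_violations=0):
    # max_violations=0 is the strict reading; a common relaxation allows one.
    return lipinski_violations(d) <= max_violations


# Hypothetical descriptor values for illustration only.
small_polar = Descriptors(mol_weight=300.0, log_p=1.5, h_bond_donors=5, h_bond_acceptors=7)
large_greasy = Descriptors(mol_weight=650.0, log_p=6.2, h_bond_donors=2, h_bond_acceptors=12)

print(passes_rule_of_five(small_polar), passes_rule_of_five(large_greasy))
```

Note that the thresholds are strict inequalities, so a compound with exactly five hydrogen bond donors still passes.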
Affiliation(s)
- Meng-Qi Liu
- Key Laboratory of Dairy Science, Ministry of Education, Northeast Agricultural University, Harbin 150030, PR China.
- Department of Food Science, Northeast Agricultural University, Harbin 150030, PR China
| | - Tong Wang
- The Rutgers Center for Computational and Integrative Biology, Camden, New Jersey 08102, USA
- Department of Chemistry and Biochemistry, Rowan University, Glassboro, New Jersey 08028, USA
| | - Qin-Ling Wang
- Key Laboratory of Dairy Science, Ministry of Education, Northeast Agricultural University, Harbin 150030, PR China.
- Department of Food Science, Northeast Agricultural University, Harbin 150030, PR China
| | - Jie Zhou
- Key Laboratory of Dairy Science, Ministry of Education, Northeast Agricultural University, Harbin 150030, PR China.
- Department of Food Science, Northeast Agricultural University, Harbin 150030, PR China
| | - Bao-Rong Wang
- Key Laboratory of Dairy Science, Ministry of Education, Northeast Agricultural University, Harbin 150030, PR China.
- Department of Food Science, Northeast Agricultural University, Harbin 150030, PR China
| | - Bing Zhang
- Key Laboratory of Dairy Science, Ministry of Education, Northeast Agricultural University, Harbin 150030, PR China.
- Department of Food Science, Northeast Agricultural University, Harbin 150030, PR China
| | - Kun-Long Wang
- Key Laboratory of Dairy Science, Ministry of Education, Northeast Agricultural University, Harbin 150030, PR China.
- Department of Food Science, Northeast Agricultural University, Harbin 150030, PR China
| | - Hao Zhu
- The Rutgers Center for Computational and Integrative Biology, Camden, New Jersey 08102, USA
- Department of Chemistry and Biochemistry, Rowan University, Glassboro, New Jersey 08028, USA
| | - Ying-Hua Zhang
- Key Laboratory of Dairy Science, Ministry of Education, Northeast Agricultural University, Harbin 150030, PR China.
- Department of Food Science, Northeast Agricultural University, Harbin 150030, PR China
| |
|
34
|
Yang D, Wang L, Yuan P, An Q, Su B, Yu M, Chen T, Hu K, Zhang L, Lu Y, Du G. Cocrystal virtual screening based on the XGBoost machine learning model. CHINESE CHEM LETT 2022. [DOI: 10.1016/j.cclet.2022.107964] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
|
35
|
Yaseen A, Amin I, Akhter N, Ben-Hur A, Minhas F. Insights into performance evaluation of compound-protein interaction prediction methods. Bioinformatics 2022; 38:ii75-ii81. [PMID: 36124806 DOI: 10.1093/bioinformatics/btac496] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/25/2022] Open
Abstract
MOTIVATION Machine-learning-based prediction of compound-protein interactions (CPIs) is important for drug design, screening and repurposing. Despite numerous recent publications with increasing methodological sophistication claiming consistent improvements in predictive accuracy, we have observed a number of fundamental issues in experiment design that produce overoptimistic estimates of model performance.
RESULTS We systematically analyze the impact of several factors affecting generalization performance of CPI predictors that are overlooked in existing work: (i) similarity between training and test examples in cross-validation; (ii) synthesizing negative examples in absence of experimentally verified negative examples and (iii) alignment of evaluation protocol and performance metrics with real-world use of CPI predictors in screening large compound libraries. Using both state-of-the-art approaches by other researchers as well as a simple kernel-based baseline, we have found that effective assessment of generalization performance of CPI predictors requires careful control over similarity between training and test examples. We show that, under stringent performance assessment protocols, a simple kernel-based approach can exceed the predictive performance of existing state-of-the-art methods. We also show that random pairing for generating synthetic negative examples for training and performance evaluation results in models with better generalization in comparison to more sophisticated strategies used in existing studies. Our analyses indicate that using proposed experiment design strategies can offer significant improvements for CPI prediction leading to effective target compound screening for drug repurposing and discovery of putative chemical ligands of SARS-CoV-2-Spike and Human-ACE2 proteins.
AVAILABILITY AND IMPLEMENTATION Code and supplementary material available at https://github.com/adibayaseen/HKRCPI.
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
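The random-pairing strategy for synthetic negatives described in the results can be sketched as follows. This is an illustrative reading of the abstract, not code from the authors' repository; the function name, the seed handling, and the toy compound and protein identifiers are all assumptions.

```python
import random


def sample_random_negatives(positives, n_negatives, seed=0):
    """Draw compound-protein pairs at random, skipping known interactions."""
    rng = random.Random(seed)
    known = set(positives)
    compounds = sorted({c for c, _ in known})
    proteins = sorted({p for _, p in known})
    max_possible = len(compounds) * len(proteins) - len(known)
    if n_negatives > max_possible:
        raise ValueError("not enough unobserved pairs to sample from")
    negatives = set()
    while len(negatives) < n_negatives:
        pair = (rng.choice(compounds), rng.choice(proteins))
        if pair not in known:  # unobserved pair, treated as a negative
            negatives.add(pair)
    return sorted(negatives)


# Hypothetical known interactions (compound id, protein id).
positives = [("c1", "p1"), ("c2", "p2"), ("c3", "p3")]
negatives = sample_random_negatives(positives, n_negatives=4)
print(negatives)
```

Sampled pairs are only presumed negatives, since an unobserved pair may still be a true interaction; that caveat applies to any synthetic-negative scheme.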
Affiliation(s)
- Adiba Yaseen
- Department of Computer and Information Sciences (DCIS), Pakistan Institute of Engineering and Applied Sciences (PIEAS), Islamabad 45650, Pakistan
| | - Imran Amin
- National Institute for Biotechnology and Genetic Engineering, Faisalabad 38000, Pakistan
| | - Naeem Akhter
- Department of Computer and Information Sciences (DCIS), Pakistan Institute of Engineering and Applied Sciences (PIEAS), Islamabad 45650, Pakistan
| | - Asa Ben-Hur
- Department of Computer Science, Colorado State University, Fort Collins, CO 80523, USA
| | - Fayyaz Minhas
- Department of Computer Science, University of Warwick, Coventry CV4 7AL, UK
| |
|
36
|
Optimization Method of an Antibreast Cancer Drug Candidate Based on Machine Learning. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2022; 2022:4133663. [PMID: 36105244 PMCID: PMC9467812 DOI: 10.1155/2022/4133663] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2022] [Revised: 08/17/2022] [Accepted: 08/22/2022] [Indexed: 12/03/2022]
Abstract
Breast cancer is a common but serious and even lethal disease. Fortunately, compared with other cancers, breast cancer treatments currently are relatively well developed. The use of specific drugs is typically essential in the majority of breast cancer treatment strategies. Given the aforementioned factors, it is important to continue researching effective antibreast cancer drug design. Machine learning-based computer-aided drug design is currently a common practice in both drug industries and academic institutes. According to the characteristics of breast cancer, we selected multiple candidate compounds; based on the corresponding molecular descriptors, biological activities, and pharmacokinetic properties, a dataset of inhibition potency and pharmacokinetic properties paired with multiple features of compounds was constructed. On this basis, the random forest method was utilized to choose greater-influenced feature embeddings; thus, 224 main operating variables were selected for further analysis; we then employed the efficient MobileNetV3 deep neural network as the backbone to establish the prediction models for the inhibition potency and pharmacokinetic properties of the compounds. After data preprocessing, the weights are obtained by training on the refined dataset. Finally, we define an optimization problem to discover compounds with the best properties. The problem is solved using the genetic algorithm with the acquired prediction model, and the solution value for the corresponding operating variables with the best clinical properties in theory is then obtained. Analysis demonstrates that our approach could be used to aid the screening process of antibreast cancer drug candidates.
|
37
|
Mao J, Zeb A, Kim MS, Jeon HN, Wang J, Guan S, No KT. Development of an innovative data-driven system to generate descriptive prediction equation of dielectric constant on small sample sets. Heliyon 2022; 8:e10011. [PMID: 36016529 PMCID: PMC9396556 DOI: 10.1016/j.heliyon.2022.e10011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2022] [Revised: 04/13/2022] [Accepted: 07/15/2022] [Indexed: 11/29/2022] Open
Abstract
Dielectric constant (DC, ε) is a fundamental parameter in material sciences that measures the polarizability of a system. In industrial processes, its value is an important indicator that characterizes the dielectric property of a material and informs separation, chemical equilibrium, chemical reactivity analysis, and solubility modeling. The available ε-prediction models are fairly primitive and frequently suffer from serious failures, especially when dealing with strongly polar compounds. Therefore, we have developed a novel data-driven system to improve the efficiency and wide-range applicability of ε prediction in material sciences. This scheme adopts the correlation distance and a genetic algorithm to discriminate feature combinations and avoid overfitting. The prediction output of a single ML model is used as a coding to estimate the target value, simulating the layer-by-layer extraction in deep learning and enabling instant search for the optimal combination of features. Our model achieved an improved correlation value of 0.956 with the target, compared with the previously available best traditional ML result of 0.877. Our framework showed a marked improvement, especially for material systems with ε values >50. In terms of interpretability, we have derived a conceptual computational equation from a minimum generating tree. Our data-driven system compares favorably with other methods because it can be applied to the prediction of dielectric constants as well as to the prediction of overall micro- and macro-properties of any multi-component complex.
|
38
|
Lee J, Song SB, Chung YK, Jang JH, Huh J. BoostSweet: Learning molecular perceptual representations of sweeteners. Food Chem 2022; 383:132435. [PMID: 35182866 DOI: 10.1016/j.foodchem.2022.132435] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 06/05/2021] [Revised: 09/16/2021] [Accepted: 02/09/2022] [Indexed: 11/28/2022]
Abstract
The development of safe artificial sweeteners has attracted considerable interest in the food industry. Previous machine learning (ML) studies based on quantitative structure-activity relationships have provided some molecular principles for predicting sweetness, but these models can be improved via the chemical recognition of sweetness active factors. Our ML model, a soft-vote ensemble model that has a light gradient boosting machine and uses both layered fingerprints and alvaDesc molecular descriptor features, demonstrates state-of-the-art performance, with an AUROC score of 0.961. Based on an analysis of feature importance and dataset, we identified that the number of nitrogen atoms that serve as hydrogen bond donors in molecules can play an essential role in determining sweetness. These results potentially provide an advanced understanding of the relationship between molecular structure and sweetness, which can be used to design new sweeteners based on molecular structural dependence.
Affiliation(s)
- Junho Lee
- Department of Chemistry, Sungkyunkwan University, Suwon 16419, Republic of Korea; SKKU Advanced Institute of Nanotechnology (SAINT), Sungkyunkwan University, Suwon 16419, Republic of Korea
| | - Seon Bin Song
- Department of Chemistry, Sungkyunkwan University, Suwon 16419, Republic of Korea
| | - You Kyoung Chung
- Department of Chemistry, Sungkyunkwan University, Suwon 16419, Republic of Korea
| | - Jee Hwan Jang
- Ucaretron Inc., Anyang 14057, Gyeonggi-do, Republic of Korea; School of Advanced Materials Science and Engineering, Sungkyunkwan University, Suwon 16419, Republic of Korea.
| | - Joonsuk Huh
- Department of Chemistry, Sungkyunkwan University, Suwon 16419, Republic of Korea; SKKU Advanced Institute of Nanotechnology (SAINT), Sungkyunkwan University, Suwon 16419, Republic of Korea; Institute of Quantum Biophysics, Sungkyunkwan University, Suwon 16419, Republic of Korea.
| |
|
39
|
Zhong C, Ai J, Yang Y, Ma F, Sun W. Small Molecular Drug Screening Based on Clinical Therapeutic Effect. Molecules 2022; 27:4807. [PMID: 35956770 PMCID: PMC9369618 DOI: 10.3390/molecules27154807] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2022] [Revised: 07/22/2022] [Accepted: 07/25/2022] [Indexed: 11/16/2022] Open
Abstract
Virtual screening can save significant experimental time and cost in early drug discovery. Drug multi-classification can speed up virtual screening and quickly predict the most likely class for a drug. In this study, 1019 drug molecules with actual therapeutic effects were collected from multiple databases and documents, and the molecular sets were grouped according to therapeutic effect and mechanism of action. Molecular descriptors and molecular fingerprints were obtained from SMILES to quantify molecular structures. After the data set was divided using the Kennard–Stone method, the best combination was identified by comparing the results of five classification algorithms and a fusion method. Furthermore, for a specific data set, the model with the best performance was used to predict the validation data set. On the test set, prediction accuracy reaches 0.862 and the kappa coefficient reaches 0.808. The highest classification accuracy on the validation set is 0.873. A more reliable molecular set has been found, which could be used to predict potential attributes of unknown drug compounds and even to discover new uses for old drugs. We hope this research can provide a reference for virtual screening of multiple classes of drugs at the same time in the future.
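The Kennard–Stone method used for the split selects training samples that cover the descriptor space as uniformly as possible. A minimal sketch with Euclidean distance follows; the one-dimensional toy points are hypothetical, and the study's actual descriptors and split ratio are not given in the abstract.

```python
import math


def kennard_stone(points, n_train):
    """Return (train_indices, test_indices) chosen by Kennard-Stone."""
    n = len(points)
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and len(points)")
    dist = [[math.dist(a, b) for b in points] for a in points]
    # Seed the training set with the two most distant points.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda ij: dist[ij[0]][ij[1]],
    )
    selected = [first, second]
    remaining = [k for k in range(n) if k not in (first, second)]
    while len(selected) < n_train:
        # Add the point whose nearest selected neighbour is farthest away.
        nxt = max(remaining, key=lambda k: min(dist[k][s] for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected, remaining


# Hypothetical descriptor vectors (one feature each, as 2-D points on a line).
points = [(0, 0), (1, 0), (2, 0), (10, 0), (5, 0)]
train, test = kennard_stone(points, n_train=3)
print(train, test)
```

The extremes and the midpoint are selected for training, so the held-out points fall inside the range spanned by the training set, which is the intended behaviour of the method.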
Affiliation(s)
| | | | | | | | - Wei Sun
- Correspondence: Tel.: +86-10-64445826
| |
|
40
|
Yan J, Rodríguez-Martínez X, Pearce D, Douglas H, Bili D, Azzouzi M, Eisner F, Virbule A, Rezasoltani E, Belova V, Dörling B, Few S, Szumska AA, Hou X, Zhang G, Yip HL, Campoy-Quiles M, Nelson J. Identifying structure-absorption relationships and predicting absorption strength of non-fullerene acceptors for organic photovoltaics. ENERGY & ENVIRONMENTAL SCIENCE 2022; 15:2958-2973. [PMID: 35923416 PMCID: PMC9277517 DOI: 10.1039/d2ee00887d] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 03/18/2022] [Accepted: 05/20/2022] [Indexed: 06/15/2023]
Abstract
Non-fullerene acceptors (NFAs) are excellent light harvesters, yet the origin of their high optical extinction is not well understood. In this work, we investigate the absorption strength of NFAs by building a database of time-dependent density functional theory (TDDFT) calculations of ∼500 π-conjugated molecules. The calculations are first validated by comparison with experimental measurements in solution and solid state using common fullerene and non-fullerene acceptors. We find that the molar extinction coefficient (ε_d,max) shows reasonable agreement between calculation in vacuum and experiment for molecules in solution, highlighting the effectiveness of TDDFT for predicting optical properties of organic π-conjugated molecules. We then perform a statistical analysis based on molecular descriptors to identify which features are important in defining the absorption strength. This allows us to identify structural features that are correlated with high absorption strength in NFAs and could be used to guide molecular design: highly absorbing NFAs should possess a planar, linear, and fully conjugated molecular backbone with highly polarisable heteroatoms. We then exploit a random decision forest algorithm to draw predictions for ε_d,max using a computational framework based on extended tight-binding Hamiltonians, which shows reasonable predicting accuracy with lower computational cost than TDDFT. This work provides a general understanding of the relationship between molecular structure and absorption strength in π-conjugated organic molecules, including NFAs, while introducing predictive machine-learning models of low computational cost.
Affiliation(s)
- Jun Yan
- Department of Physics, Imperial College London SW7 2AZ London UK
| | - Xabier Rodríguez-Martínez
- Electronic and Photonic Materials (EFM), Department of Physics, Chemistry and Biology (IFM), Linköping University Linköping SE 581 83 Sweden
- Instituto de Ciencia de Materiales de Barcelona, ICMAB-CSIC, Campus UAB Bellaterra 08193 Spain
| | - Drew Pearce
- Department of Physics, Imperial College London SW7 2AZ London UK
| | - Hana Douglas
- Department of Physics, Imperial College London SW7 2AZ London UK
| | - Danai Bili
- Department of Physics, Imperial College London SW7 2AZ London UK
| | - Mohammed Azzouzi
- Department of Physics, Imperial College London SW7 2AZ London UK
| | - Flurin Eisner
- Department of Physics, Imperial College London SW7 2AZ London UK
| | - Alise Virbule
- Department of Physics, Imperial College London SW7 2AZ London UK
| | | | - Valentina Belova
- Instituto de Ciencia de Materiales de Barcelona, ICMAB-CSIC, Campus UAB Bellaterra 08193 Spain
| | - Bernhard Dörling
- Instituto de Ciencia de Materiales de Barcelona, ICMAB-CSIC, Campus UAB Bellaterra 08193 Spain
| | - Sheridan Few
- Department of Physics, Imperial College London SW7 2AZ London UK
- Sustainability Research Institute, School of Earth and Environment, University of Leeds LS2 9JT Leeds UK
| | - Anna A Szumska
- Department of Physics, Imperial College London SW7 2AZ London UK
| | - Xueyan Hou
- Department of Physics, Imperial College London SW7 2AZ London UK
| | - Guichuan Zhang
- Institute of Polymer Optoelectronic Materials and Devices, State Key Laboratory of Luminescent Materials and Devices, South China University of Technology Guangzhou 510640 P. R. China
| | - Hin-Lap Yip
- Institute of Polymer Optoelectronic Materials and Devices, State Key Laboratory of Luminescent Materials and Devices, South China University of Technology Guangzhou 510640 P. R. China
- Department of Materials Science and Engineering, City University of Hong Kong, Tat Chee Avenue Kowloon Hong Kong
| | - Mariano Campoy-Quiles
- Instituto de Ciencia de Materiales de Barcelona, ICMAB-CSIC, Campus UAB Bellaterra 08193 Spain
| | - Jenny Nelson
- Department of Physics, Imperial College London SW7 2AZ London UK
| |
|
41
|
Chen Z, Liu X, Zhao P, Li C, Wang Y, Li F, Akutsu T, Bain C, Gasser RB, Li J, Yang Z, Gao X, Kurgan L, Song J. iFeatureOmega: an integrative platform for engineering, visualization and analysis of features from molecular sequences, structural and ligand data sets. Nucleic Acids Res 2022; 50:W434-W447. [PMID: 35524557 PMCID: PMC9252729 DOI: 10.1093/nar/gkac351] [Citation(s) in RCA: 42] [Impact Index Per Article: 14.0] [Received: 03/21/2022] [Revised: 04/22/2022] [Accepted: 04/25/2022] [Indexed: 01/07/2023] Open
Abstract
The rapid accumulation of molecular data motivates development of innovative approaches to computationally characterize sequences, structures and functions of biological and chemical molecules in an efficient, accessible and accurate manner. Notwithstanding several computational tools that characterize protein or nucleic acids data, there are no one-stop computational toolkits that comprehensively characterize a wide range of biomolecules. We address this vital need by developing a holistic platform that generates features from sequence and structural data for a diverse collection of molecule types. Our freely available and easy-to-use iFeatureOmega platform generates, analyzes and visualizes 189 representations for biological sequences, structures and ligands. To the best of our knowledge, iFeatureOmega provides the largest scope when directly compared to the current solutions, in terms of the number of feature extraction and analysis approaches and coverage of different molecules. We release three versions of iFeatureOmega including a webserver, command line interface and graphical interface to satisfy needs of experienced bioinformaticians and less computer-savvy biologists and biochemists. With the assistance of iFeatureOmega, users can encode their molecular data into representations that facilitate construction of predictive models and analytical studies. We highlight benefits of iFeatureOmega based on three research applications, demonstrating how it can be used to accelerate and streamline research in bioinformatics, computational biology, and cheminformatics areas. The iFeatureOmega webserver is freely available at http://ifeatureomega.erc.monash.edu and the standalone versions can be downloaded from https://github.com/Superzchen/iFeatureOmega-GUI/ and https://github.com/Superzchen/iFeatureOmega-CLI/.
Affiliation(s)
- Zhen Chen
- Collaborative Innovation Center of Henan Grain Crops, Henan Agricultural University, Zhengzhou 450046, China
- Center for Crop Genome Engineering, Henan Agricultural University, Zhengzhou 450046, China
| | - Xuhan Liu
- Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Einsteinweg 55, Leiden 2333 CC, The Netherlands
| | - Pei Zhao
- State Key Laboratory of Cotton Biology, Institute of Cotton Research of Chinese Academy of Agricultural Sciences (CAAS), Anyang 455000, China
| | - Chen Li
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia
| | - Yanan Wang
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia
| | - Fuyi Li
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia
| | - Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto 611-0011, Japan
| | - Chris Bain
- Monash Data Future Institutes, Monash University, Melbourne, Victoria 3800, Australia
| | - Robin B Gasser
- Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia
| | - Junzhou Li
- Collaborative Innovation Center of Henan Grain Crops, Henan Agricultural University, Zhengzhou 450046, China
| | - Zuoren Yang
- State Key Laboratory of Cotton Biology, Institute of Cotton Research of Chinese Academy of Agricultural Sciences (CAAS), Anyang 455000, China
| | - Xin Gao
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia
| | - Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
| | - Jiangning Song
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia
- Monash Data Future Institutes, Monash University, Melbourne, Victoria 3800, Australia
| |
|
42
|
Kuru HI, Tastan O, Cicek AE. MatchMaker: A Deep Learning Framework for Drug Synergy Prediction. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:2334-2344. [PMID: 34086576 DOI: 10.1109/tcbb.2021.3086702] [Citation(s) in RCA: 55] [Impact Index Per Article: 18.3] [Indexed: 06/12/2023]
Abstract
Drug combination therapies have been a viable strategy for the treatment of complex diseases such as cancer due to increased efficacy and reduced side effects. However, experimentally validating all possible combinations for synergistic interaction, even with high-throughput screens, is intractable due to the vast combinatorial search space. Computational techniques can reduce the number of combinations to be evaluated experimentally by prioritizing promising candidates. We present MatchMaker, which predicts drug synergy scores using drug chemical structure information and gene expression profiles of cell lines in a deep learning framework. For the first time, our model utilizes the largest known drug combination dataset to date, DrugComb. We compare the performance of MatchMaker with the state-of-the-art models and observe up to ~15% correlation and ~33% mean squared error (MSE) improvements over the next best method. We investigate the cell types and drug pairs that are relatively harder to predict and present novel candidate pairs. MatchMaker is available at https://github.com/tastanlab/matchmaker.
|
43
|
Zapadka M, Dekowski P, Kupcewicz B. HATS5m as an Example of GETAWAY Molecular Descriptor in Assessing the Similarity/Diversity of the Structural Features of 4-Thiazolidinone. Int J Mol Sci 2022; 23:6576. [PMID: 35743020 PMCID: PMC9223869 DOI: 10.3390/ijms23126576] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2022] [Revised: 04/30/2022] [Accepted: 06/10/2022] [Indexed: 11/29/2022] Open
Abstract
Among the various methods for drug design, the approach using molecular descriptors for quantitative structure-activity relationships (QSAR) bears promise for the prediction of innovative molecular structures with bespoke pharmacological activity. Despite the growing number of successful potential applications, the QSAR models often remain hard to interpret. The difficulty arises from the use of advanced chemometric or machine learning methods on the one hand, and the complexity of molecular descriptors on the other hand. Thus, there is a need to interpret molecular descriptors for identifying the features of molecules crucial for desirable activity. For example, the development of structure-activity modeling of different molecule endpoints confirmed the usefulness of H-GETAWAY (H-GEometry, Topology, and Atom-Weights AssemblY) descriptors in molecular sciences. However, compared with other 3D molecular descriptors, H-GETAWAY interpretation is much more complicated. The present study provides insights into the interpretation of the HATS5m descriptor (H-GETAWAY) concerning the molecular structures of the 4-thiazolidinone derivatives with antitrypanosomal activity. According to the published study, an increase in antitrypanosomal activity is associated with both a decrease and an increase in HATS5m (leverage-weighted autocorrelation with lag 5, weighted by atomic masses) values. The substructure-based method explored how the changes in molecular features affect the HATS5m value. Based on this approach, we proposed substituents that translate into low and high HATS5m. The detailed interpretation of H-GETAWAY descriptors requires the consideration of three elements: weighting scheme, leverages, and the Dirac delta function. Particular attention should be paid to the impact of chemical compounds' size and shape and the leverage values of individual atoms.
Affiliation(s)
- Mariusz Zapadka
- Department of Inorganic and Analytical Chemistry, Faculty of Pharmacy, Nicolaus Copernicus University in Toruń, Jurasza 2, 85-089 Bydgoszcz, Poland
| | - Przemysław Dekowski
- New Technologies Department, Softmaks.pl Sp. z o.o., Kraszewskiego 1, 85-241 Bydgoszcz, Poland
| | - Bogumiła Kupcewicz
- Department of Inorganic and Analytical Chemistry, Faculty of Pharmacy, Nicolaus Copernicus University in Toruń, Jurasza 2, 85-089 Bydgoszcz, Poland
| |
|
44
|
Malavolta M, Pallante L, Mavkov B, Stojceski F, Grasso G, Korfiati A, Mavroudi S, Kalogeras A, Alexakos C, Martos V, Amoroso D, Di Benedetto G, Piga D, Theofilatos K, Deriu MA. A survey on computational taste predictors. Eur Food Res Technol 2022; 248:2215-2235. [PMID: 35637881 PMCID: PMC9134981 DOI: 10.1007/s00217-022-04044-5] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 03/11/2022] [Revised: 04/29/2022] [Accepted: 04/30/2022] [Indexed: 11/29/2022]
Abstract
Taste is a sensory modality crucial for nutrition and survival, since it allows the discrimination between healthy foods and toxic substances thanks to five tastes, i.e., sweet, bitter, umami, salty, and sour, associated with distinct nutritional or physiological needs. Today, taste prediction plays a key role in several fields, e.g., medical, industrial, or pharmaceutical, but the complexity of the taste perception process, its multidisciplinary nature, and the high number of potentially relevant players and features at the basis of the taste sensation make taste prediction a very complex task. In this context, the emerging capabilities of machine learning have provided fruitful insights in this field of research, making it possible to consider and integrate a very large number of variables and to identify hidden correlations underlying the perception of a particular taste. This review aims at summarizing the latest advances in taste prediction, analyzing available food-related databases and taste prediction tools developed in recent years. Supplementary Information: The online version contains supplementary material available at 10.1007/s00217-022-04044-5.
Affiliation(s)
- Marta Malavolta
- PolitoBIOMedLab, Department of Mechanical and Aerospace Engineering, Politecnico di Torino, Turin, Italy
- Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia
| | - Lorenzo Pallante
- PolitoBIOMedLab, Department of Mechanical and Aerospace Engineering, Politecnico di Torino, Turin, Italy
| | - Bojan Mavkov
- GIPSA-lab, F-38000, Université Grenoble Alpes, Grenoble, France
| | - Filip Stojceski
- Dalle Molle Institute for Artificial Intelligence (IDSIA-USI/SUPSI), Lugano-Viganello, Switzerland
| | - Gianvito Grasso
- Dalle Molle Institute for Artificial Intelligence (IDSIA-USI/SUPSI), Lugano-Viganello, Switzerland
| | | | - Seferina Mavroudi
- InSyBio PC, Patras, Greece
- Department of Nursing, School of Rehabilitation Sciences, University of Patras, Patras, Greece
| | | | - Christos Alexakos
- Athena Research Center, Industrial Systems Institute, Patras, Greece
| | - Vanessa Martos
- Department of Plant Physiology, Institute of Biotechnology, University of Granada, Granada, Spain
| | - Daria Amoroso
- Enginlife Engineering Solutions, Turin, Italy
- 7hc srl, Rome, Italy
| | | | - Dario Piga
- Dalle Molle Institute for Artificial Intelligence (IDSIA-USI/SUPSI), Lugano-Viganello, Switzerland
| | | | - Marco Agostino Deriu
- PolitoBIOMedLab, Department of Mechanical and Aerospace Engineering, Politecnico di Torino, Turin, Italy
| |
|
45
|
Robles-Loaiza AA, Pinos-Tamayo EA, Mendes B, Ortega-Pila JA, Proaño-Bolaños C, Plisson F, Teixeira C, Gomes P, Almeida JR. Traditional and Computational Screening of Non-Toxic Peptides and Approaches to Improving Selectivity. Pharmaceuticals (Basel) 2022; 15:323. [PMID: 35337121 PMCID: PMC8953747 DOI: 10.3390/ph15030323] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 01/26/2022] [Revised: 03/01/2022] [Accepted: 03/04/2022] [Indexed: 12/27/2022] Open
Abstract
Peptides have positively impacted the pharmaceutical industry as drugs, biomarkers, or diagnostic tools of high therapeutic value. However, only a handful have progressed to the market. Toxicity is one of the main obstacles to translating peptides into clinics. Hemolysis or hemotoxicity, the principal source of toxicity, is a natural or disease-induced event leading to the death of vital red blood cells. Initial screenings for toxicity have been widely evaluated using erythrocytes as the gold standard. More recently, many online databases filled with peptide sequences and their biological meta-data have paved the way toward hemolysis prediction using user-friendly, fast-access machine learning-driven programs. This review details the growing contributions of in silico approaches developed in the last decade for the large-scale prediction of erythrocyte lysis induced by peptides. After an overview of the pharmaceutical landscape of peptide therapeutics, we highlighted the relevance of early hemolysis studies in drug development. We emphasized the computational models and algorithms used to this end in light of historical and recent findings in this promising field. We benchmarked seven predictors using peptides from different data sets, 7-35 amino acids in length. According to our predictions, all models scored an accuracy over 50.42% and a Matthews correlation coefficient over 0.11. The maximum values for these statistical parameters reached 100.0% and 1.00, respectively. Finally, strategies for optimizing peptide selectivity were described, as well as prospects for future investigations. The development of in silico predictive approaches to peptide toxicity has just started, but their important contributions clearly demonstrate their potential for peptide science and computer-aided drug design. Methodology refinement and increasing use will motivate the timely and accurate in silico identification of selective, non-toxic peptide therapeutics.
Affiliation(s)
- Alberto A. Robles-Loaiza
- Biomolecules Discovery Group, Universidad Regional Amazónica Ikiam, Tena 150150, Ecuador
| | - Edgar A. Pinos-Tamayo
- Escuela Superior Politécnica del Litoral, ESPOL, Centro Nacional de Acuicultura e Investigaciones Marinas (CENAIM), Campus Gustavo Galindo Km. 30, 5 Vía Perimetral, Guayaquil 09-01-5863, Ecuador
| | - Bruno Mendes
- Biomolecules Discovery Group, Universidad Regional Amazónica Ikiam, Tena 150150, Ecuador
| | - Josselyn A. Ortega-Pila
- Biomolecules Discovery Group, Universidad Regional Amazónica Ikiam, Tena 150150, Ecuador
| | - Carolina Proaño-Bolaños
- Biomolecules Discovery Group, Universidad Regional Amazónica Ikiam, Tena 150150, Ecuador
| | - Fabien Plisson
- Consejo Nacional de Ciencia y Tecnología, Unidad de Genómica Avanzada, Laboratorio Nacional de Genómica para la Biodiversidad (Langebio), Centro de Investigación Y de Estudios Avanzados del IPN, Irapuato 36824, Mexico
| | - Cátia Teixeira
- Laboratório Associado para a Química Verde-REQUIMTE, Departamento de Química e Bioquímica, Faculdade de Ciências, Universidade do Porto, 4169-007 Porto, Portugal
| | - Paula Gomes
- Laboratório Associado para a Química Verde-REQUIMTE, Departamento de Química e Bioquímica, Faculdade de Ciências, Universidade do Porto, 4169-007 Porto, Portugal
| | - José R. Almeida
- Biomolecules Discovery Group, Universidad Regional Amazónica Ikiam, Tena 150150, Ecuador
| |
|
46
|
Ignacz G, Szekely G. Deep learning meets quantitative structure–activity relationship (QSAR) for leveraging structure-based prediction of solute rejection in organic solvent nanofiltration. J Memb Sci 2022. [DOI: 10.1016/j.memsci.2022.120268] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 12/26/2022]
|
47
|
Staszak M, Staszak K, Wieszczycka K, Bajek A, Roszkowski K, Tylkowski B. Machine learning in drug design: Use of artificial intelligence to explore the chemical structure–biological activity relationship. WIRES COMPUTATIONAL MOLECULAR SCIENCE 2022. [DOI: 10.1002/wcms.1568] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/13/2022]
Affiliation(s)
- Maciej Staszak
- Institute of Technology and Chemical Engineering, Poznan University of Technology, Poznan, Poland
| | - Katarzyna Staszak
- Institute of Technology and Chemical Engineering, Poznan University of Technology, Poznan, Poland
| | - Karolina Wieszczycka
- Institute of Technology and Chemical Engineering, Poznan University of Technology, Poznan, Poland
| | - Anna Bajek
- Department of Tissue Engineering, Collegium Medicum, Nicolaus Copernicus University, Bydgoszcz, Poland
| | - Krzysztof Roszkowski
- Department of Oncology, Collegium Medicum, Nicolaus Copernicus University, Bydgoszcz, Poland
| | - Bartosz Tylkowski
- Department of Chemical Engineering, University Rovira i Virgili, Tarragona, Spain
- Eurecat, Centre Tecnològic de Catalunya, Chemical Technologies Unit, Tarragona, Spain
| |
|
48
|
An Accurate and Interpretable Deep Learning Model for Environmental Properties Prediction Using Hybrid Molecular Representations. AIChE J 2022. [DOI: 10.1002/aic.17634] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/31/2022]
|
49
|
Lee M, Min K. A Comparative Study of the Performance for Predicting Biodegradability Classification: The Quantitative Structure-Activity Relationship Model vs the Graph Convolutional Network. ACS OMEGA 2022; 7:3649-3655. [PMID: 35128273 PMCID: PMC8811760 DOI: 10.1021/acsomega.1c06274] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2021] [Accepted: 12/28/2021] [Indexed: 06/14/2023]
Abstract
The prediction and evaluation of the biodegradability of molecules with computational methods are becoming increasingly important. Among the various methods, quantitative structure-activity relationship (QSAR) models have been demonstrated to predict the ready biodegradation of chemicals but have limited functionality owing to their complex implementation. In this study, we employ the graph convolutional network (GCN) method to overcome these issues. A biodegradability dataset from previous studies was used to train prediction models based on (i) QSAR models using the Mordred molecular descriptor calculator and the MACCS molecular fingerprint and (ii) a GCN model using molecular graphs. The performance comparison of the methods confirms that the GCN model is more straightforward to implement and more stable; its specificity and sensitivity values are almost identical, without the need for specific descriptors or fingerprints. In addition, the performance of the models was further verified by randomly dividing the dataset into 100 different cases of training and test sets and by varying the test set ratio from 20 to 80%. The results of the current study clearly suggest the promise of the GCN model, which can be implemented straightforwardly and can replace conventional QSAR prediction models for various types and properties of molecules.
Affiliation(s)
- Myeonghun Lee
- School of Systems Biomedical Science, Soongsil University, 369 Sangdo-ro, Dongjak-gu, Seoul 06978, Republic of Korea
| | - Kyoungmin Min
- School of Mechanical Engineering, Soongsil University, 369 Sangdo-ro, Dongjak-gu, Seoul 06978, Republic of Korea
| |
|
50
|
Jiang P, Chi Y, Li XS, Liu X, Hua XS, Xia K. Molecular persistent spectral image (Mol-PSI) representation for machine learning models in drug design. Brief Bioinform 2022; 23:6485012. [PMID: 34958660 DOI: 10.1093/bib/bbab527] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 09/19/2021] [Revised: 11/01/2021] [Accepted: 11/14/2021] [Indexed: 01/05/2023] Open
Abstract
Artificial intelligence (AI)-based drug design has great promise to fundamentally change the landscape of the pharmaceutical industry. Even though there has been great progress in handcrafted feature-based machine learning models, 3D convolutional neural networks (CNNs) and graph neural networks, effective and efficient representations that characterize the structural, physical, chemical and biological properties of molecular structures and interactions remain a great challenge. Here, we propose an equal-sized molecular 2D image representation, known as the molecular persistent spectral image (Mol-PSI), and combine it with a CNN model for AI-based drug design. Mol-PSI provides a unique one-to-one image representation for molecular structures and interactions. In general, deep models are empowered to achieve better performance with systematically organized representations in image format. A well-designed parallel CNN architecture for adapting Mol-PSIs is developed for protein-ligand binding affinity prediction. Our results, for the three most commonly used databases, including PDBbind-v2007, PDBbind-v2013 and PDBbind-v2016, are better than all traditional machine learning models, as far as we know. Our Mol-PSI model provides a powerful molecular representation that can be widely used in AI-based drug design and molecular data analysis.
Affiliation(s)
- Peiran Jiang
- Drug Discovery Intelligence, AI Center, Alibaba Group DAMO Academy, Wen Yi Xi Road, Yuhang District, Hangzhou City, 310000, Zhejiang, China
| | - Ying Chi
- Drug Discovery Intelligence, AI Center, Alibaba Group DAMO Academy, Wen Yi Xi Road, Yuhang District, Hangzhou City, 310000, Zhejiang, China
| | - Xiao-Shuang Li
- Drug Discovery Intelligence, AI Center, Alibaba Group DAMO Academy, Wen Yi Xi Road, Yuhang District, Hangzhou City, 310000, Zhejiang, China
| | - Xiang Liu
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, 637371, Singapore
- Chern Institute of Mathematics and LPMC, Nankai University, 300071, Tianjin, China
| | - Xian-Sheng Hua
- Drug Discovery Intelligence, AI Center, Alibaba Group DAMO Academy, Wen Yi Xi Road, Yuhang District, Hangzhou City, 310000, Zhejiang, China
| | - Kelin Xia
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, 637371, Singapore
| |
|