1
Amorim AM, Piochi LF, Gaspar AT, Preto A, Rosário-Ferreira N, Moreira IS. Advancing Drug Safety in Drug Development: Bridging Computational Predictions for Enhanced Toxicity Prediction. Chem Res Toxicol 2024; 37:827-849. [PMID: 38758610 PMCID: PMC11187637 DOI: 10.1021/acs.chemrestox.3c00352] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Revised: 04/29/2024] [Accepted: 05/07/2024] [Indexed: 05/19/2024]
Abstract
The attrition rate of drugs in clinical trials is generally quite high, with estimates suggesting that approximately 90% of drugs fail to make it through the process. The identification of unexpected toxicity issues during preclinical stages is a significant factor contributing to this high rate of failure. These issues can have a major impact on the success of a drug and must be carefully considered throughout the development process. These late-stage rejections or withdrawals of drug candidates significantly increase the costs associated with drug development, particularly when toxicity is detected during clinical trials or after market release. Understanding drug-biological target interactions is essential for evaluating compound toxicity and safety, as well as predicting therapeutic effects and potential off-target effects that could lead to toxicity. This will enable scientists to predict and assess the safety profiles of drug candidates more accurately. Evaluation of toxicity and safety is a critical aspect of drug development, and biomolecules, particularly proteins, play vital roles in complex biological networks and often serve as targets for various chemicals. Therefore, a better understanding of these interactions is crucial for the advancement of drug development. The development of computational methods for evaluating protein-ligand interactions and predicting toxicity is emerging as a promising approach that adheres to the 3Rs principles (replace, reduce, and refine) and has garnered significant attention in recent years. In this review, we present a thorough examination of the latest breakthroughs in drug toxicity prediction, highlighting the significance of drug-target binding affinity in anticipating and mitigating possible adverse effects. In doing so, we aim to contribute to the development of more effective and secure drugs.
Affiliation(s)
- Ana M. B. Amorim
  - Department of Life Sciences, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - CNC-UC - Center for Neuroscience and Cell Biology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - CIBB - Centre for Innovative Biomedicine and Biotechnology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - PhD Programme in Biosciences, Department of Life Sciences, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - PURR.AI, Rua Pedro Nunes, IPN Incubadora, Ed C, 3030-199 Coimbra, Portugal
- Luiz F. Piochi
  - Department of Life Sciences, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - CNC-UC - Center for Neuroscience and Cell Biology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - CIBB - Centre for Innovative Biomedicine and Biotechnology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
- Ana T. Gaspar
  - Department of Life Sciences, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - CNC-UC - Center for Neuroscience and Cell Biology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - CIBB - Centre for Innovative Biomedicine and Biotechnology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
- António J. Preto
  - CNC-UC - Center for Neuroscience and Cell Biology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - CIBB - Centre for Innovative Biomedicine and Biotechnology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - PhD Programme in Experimental Biology and Biomedicine, Institute for Interdisciplinary Research (IIIUC), University of Coimbra, Casa Costa Alemão, 3030-789 Coimbra, Portugal
- Nícia Rosário-Ferreira
  - CNC-UC - Center for Neuroscience and Cell Biology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - CIBB - Centre for Innovative Biomedicine and Biotechnology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
- Irina S. Moreira
  - Department of Life Sciences, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - CNC-UC - Center for Neuroscience and Cell Biology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - CIBB - Centre for Innovative Biomedicine and Biotechnology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
2
Daghighi A, Casanola-Martin GM, Iduoku K, Kusic H, González-Díaz H, Rasulev B. Multi-Endpoint Acute Toxicity Assessment of Organic Compounds Using Large-Scale Machine Learning Modeling. Environmental Science & Technology 2024; 58:10116-10127. [PMID: 38797941 DOI: 10.1021/acs.est.4c01017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/29/2024]
Abstract
In recent years, alternatives to animal testing, such as computational and machine learning approaches, have become increasingly crucial for toxicity testing. However, the complexity and scarcity of available biomedical data challenge the development of predictive models. Combining nonlinear machine learning with multicondition descriptors offers a solution for using data from various assays to create a robust model. This work applies multicondition descriptors (MCDs) to develop a QSTR (Quantitative Structure-Toxicity Relationship) model based on a large toxicity data set comprising more than 80,000 compounds and 59 different end points (122,572 data points). The prediction capabilities of the developed single-task multi-end point machine learning models, as well as a novel data analysis approach using Convolutional Neural Networks (CNN), are discussed. The results show that using MCDs significantly improves the model and that using them with CNN-1D yields the best result (R²(train) = 0.93, R²(ext) = 0.70). Several structural features showed a high level of contribution to the toxicity, including van der Waals surface area (VSA), number of nitrogen-containing fragments (nN+), presence of S-P fragments, ionization potential, and presence of C-N fragments. The developed models can be very useful tools to predict the toxicity of various compounds under different conditions, enabling quick toxicity assessment of new compounds.
Affiliation(s)
- Amirreza Daghighi
  - Department of Coatings and Polymeric Materials, North Dakota State University, Fargo, North Dakota 58102, United States
  - Biomedical Engineering Program, North Dakota State University, Fargo, North Dakota 58102, United States
- Gerardo M Casanola-Martin
  - Department of Coatings and Polymeric Materials, North Dakota State University, Fargo, North Dakota 58102, United States
- Kweeni Iduoku
  - Department of Coatings and Polymeric Materials, North Dakota State University, Fargo, North Dakota 58102, United States
  - Biomedical Engineering Program, North Dakota State University, Fargo, North Dakota 58102, United States
- Hrvoje Kusic
  - Faculty of Chemical Engineering and Technology, University of Zagreb, Marulicev Trg 19, Zagreb 10000, Croatia
- Humberto González-Díaz
  - Department of Organic and Inorganic Chemistry, University of Basque Country UPV/EHU, Leioa 48940, Spain
  - BIOFISIKA, Basque Center for Biophysics CSIC-UPVEH, Leioa 48940, Spain
  - IKERBASQUE, Basque Foundation for Science, Bilbao, Biscay 48011, Spain
- Bakhtiyor Rasulev
  - Department of Coatings and Polymeric Materials, North Dakota State University, Fargo, North Dakota 58102, United States
  - Biomedical Engineering Program, North Dakota State University, Fargo, North Dakota 58102, United States
3
Walter M, Webb SJ, Gillet VJ. Interpreting Neural Network Models for Toxicity Prediction by Extracting Learned Chemical Features. J Chem Inf Model 2024; 64:3670-3688. [PMID: 38686880 PMCID: PMC11094726 DOI: 10.1021/acs.jcim.4c00127] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2024] [Revised: 04/15/2024] [Accepted: 04/15/2024] [Indexed: 05/02/2024]
Abstract
Neural network models have become a popular machine-learning technique for the toxicity prediction of chemicals. However, due to their complex structure, it is difficult to understand the predictions made by these models, which limits confidence in them. Current techniques to tackle this problem, such as SHAP or integrated gradients, provide insights by attributing importance to the input features of individual compounds. While these methods have produced promising results in some cases, they do not shed light on how representations of compounds are transformed in hidden layers, which is where neural networks learn. We present a novel technique to interpret neural networks which identifies chemical substructures in training data found to be responsible for the activation of hidden neurons. For individual test compounds, the importance of hidden neurons is determined, and the associated substructures are leveraged to explain the model prediction. Using structural alerts for mutagenicity from the Derek Nexus expert system as ground truth, we demonstrate the validity of the approach and show that model explanations are competitive with and complementary to explanations obtained from an established feature attribution method.
Affiliation(s)
- Moritz Walter
  - Information School, University of Sheffield, The Wave, 2 Whitham Road, Sheffield S10 2AH, U.K.
- Samuel J. Webb
  - Lhasa Limited, Granary Wharf House, 2 Canal Wharf, Leeds LS11 5PY, U.K.
- Valerie J. Gillet
  - Information School, University of Sheffield, The Wave, 2 Whitham Road, Sheffield S10 2AH, U.K.
4
Shkil DO, Muhamedzhanova AA, Petrov PI, Skorb EV, Aliev TA, Steshin IS, Tumanov AV, Kislinskiy AS, Fedorov MV. Expanding Predictive Capacities in Toxicology: Insights from Hackathon-Enhanced Data and Model Aggregation. Molecules 2024; 29:1826. [PMID: 38675645 PMCID: PMC11055041 DOI: 10.3390/molecules29081826] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2024] [Revised: 04/11/2024] [Accepted: 04/15/2024] [Indexed: 04/28/2024] Open
Abstract
In the realm of predictive toxicology for small molecules, the applicability domain of QSAR models is often limited by the coverage of the chemical space in the training set. Consequently, classical models fail to provide reliable predictions for wide classes of molecules. However, the emergence of innovative data collection methods such as intensive hackathons holds promise for quickly expanding the available chemical space for model construction. Combined with algorithmic refinement methods, these tools can address the challenges of toxicity prediction, enhancing both the robustness and applicability of the corresponding models. This study aimed to investigate the roles of gradient boosting and strategic data aggregation in enhancing the predictive ability of models for the toxicity of small organic molecules. We focused on evaluating the impact of incorporating fragment features and expanding the chemical space, facilitated by a comprehensive dataset procured in an open hackathon. We used gradient boosting techniques, accounting for critical features such as the structural fragments or functional groups often associated with manifestations of toxicity.
Affiliation(s)
- Dmitrii O. Shkil
  - Syntelly LLC, Moscow 121205, Russia; (A.A.M.); (I.S.S.); (A.V.T.); (A.S.K.)
  - Moscow Institute of Physics and Technology, Moscow 141700, Russia
- Ekaterina V. Skorb
  - Infochemistry Scientific Center, ITMO University, Saint-Petersburg 191002, Russia; (E.V.S.); (T.A.A.)
- Timur A. Aliev
  - Infochemistry Scientific Center, ITMO University, Saint-Petersburg 191002, Russia; (E.V.S.); (T.A.A.)
- Ilya S. Steshin
  - Syntelly LLC, Moscow 121205, Russia; (A.A.M.); (I.S.S.); (A.V.T.); (A.S.K.)
- Maxim V. Fedorov
  - Kharkevich Institute for Information Transmission Problems of Russian Academy of Sciences, Moscow 127994, Russia
5
Svensson E, Hoedt PJ, Hochreiter S, Klambauer G. HyperPCM: Robust Task-Conditioned Modeling of Drug-Target Interactions. J Chem Inf Model 2024; 64:2539-2553. [PMID: 38185877 PMCID: PMC11005051 DOI: 10.1021/acs.jcim.3c01417] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2023] [Revised: 11/27/2023] [Accepted: 11/27/2023] [Indexed: 01/09/2024]
Abstract
A central problem in drug discovery is to identify the interactions between drug-like compounds and protein targets. Over the past few decades, various quantitative structure-activity relationship (QSAR) and proteo-chemometric (PCM) approaches have been developed to model and predict these interactions. While QSAR approaches solely utilize representations of the drug compound, PCM methods incorporate both representations of the protein target and the drug compound, enabling them to achieve above-chance predictive accuracy on previously unseen protein targets. Both QSAR and PCM approaches have recently been improved by machine learning and deep neural networks, which allow the development of drug-target interaction prediction models from measurement data. However, deep neural networks typically require large amounts of training data and cannot robustly adapt to new tasks, such as predicting interactions for unseen protein targets at inference time. In this work, we propose to use HyperNetworks to efficiently transfer information between tasks during inference and thus to accurately predict drug-target interactions on unseen protein targets. Our HyperPCM method reaches state-of-the-art performance compared to previous methods on multiple well-known benchmarks, including Davis, DUD-E, and a ChEMBL-derived data set, and particularly excels at zero-shot inference involving unseen protein targets. Our method, as well as reproducible data preparation, is available at https://github.com/ml-jku/hyper-dti.
Affiliation(s)
- Emma Svensson
  - ELLIS Unit Linz & Institute for Machine Learning, Johannes Kepler University, Linz 4040, Austria
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, 431 83, Sweden
- Pieter-Jan Hoedt
  - ELLIS Unit Linz & Institute for Machine Learning, Johannes Kepler University, Linz 4040, Austria
- Sepp Hochreiter
  - ELLIS Unit Linz & Institute for Machine Learning, Johannes Kepler University, Linz 4040, Austria
  - Institute of Advanced Research in Artificial Intelligence (IARAI), Vienna 1030, Austria
- Günter Klambauer
  - ELLIS Unit Linz & Institute for Machine Learning, Johannes Kepler University, Linz 4040, Austria
6
Gustavsson M, Käll S, Svedberg P, Inda-Diaz JS, Molander S, Coria J, Backhaus T, Kristiansson E. Transformers enable accurate prediction of acute and chronic chemical toxicity in aquatic organisms. Science Advances 2024; 10:eadk6669. [PMID: 38446886 PMCID: PMC10917336 DOI: 10.1126/sciadv.adk6669] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2023] [Accepted: 01/30/2024] [Indexed: 03/08/2024]
Abstract
Environmental hazard assessments are reliant on toxicity data that cover multiple organism groups. Generating experimental toxicity data is, however, resource-intensive and time-consuming. Computational methods are fast and cost-efficient alternatives, but their low accuracy and narrow applicability domains have slowed their adoption. Here, we present an AI-based model for predicting chemical toxicity. The model uses transformers to capture toxicity-specific features directly from the chemical structures and deep neural networks to predict effect concentrations. The model showed high predictive performance for all tested organism groups (algae, aquatic invertebrates, and fish) and has, in comparison to commonly used QSAR methods, a larger applicability domain and a considerably lower error. When the model was trained on data with multiple effect concentrations (EC50/EC10), the performance was further improved. We conclude that deep learning and transformers have the potential to markedly advance computational prediction of chemical toxicity.
Affiliation(s)
- Mikael Gustavsson
  - Department of Economics, University of Gothenburg, Gothenburg, Sweden
- Styrbjörn Käll
  - Department of Mathematical Sciences, Chalmers University of Technology/University of Gothenburg, Gothenburg, Sweden
- Patrik Svedberg
  - Department of Biological and Environmental Sciences, University of Gothenburg, Gothenburg, Sweden
- Juan S. Inda-Diaz
  - Department of Mathematical Sciences, Chalmers University of Technology/University of Gothenburg, Gothenburg, Sweden
- Sverker Molander
  - Division of Environmental Systems Analysis, Department of Technology Management and Economics, Chalmers University of Technology, Gothenburg, Sweden
- Jessica Coria
  - Department of Economics, University of Gothenburg, Gothenburg, Sweden
- Thomas Backhaus
  - Department of Biological and Environmental Sciences, University of Gothenburg, Gothenburg, Sweden
- Erik Kristiansson
  - Department of Mathematical Sciences, Chalmers University of Technology/University of Gothenburg, Gothenburg, Sweden
7
Hunklinger A, Hartog P, Šícho M, Godin G, Tetko IV. The openOCHEM consensus model is the best-performing open-source predictive model in the First EUOS/SLAS joint compound solubility challenge. SLAS Discovery: Advancing Life Sciences R&D 2024; 29:100144. [PMID: 38316342 DOI: 10.1016/j.slasd.2024.01.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2023] [Revised: 01/06/2024] [Accepted: 01/22/2024] [Indexed: 02/07/2024]
Abstract
The EUOS/SLAS challenge aimed to facilitate the development of reliable algorithms to predict the aqueous solubility of small molecules using experimental data from 100 K compounds. In total, one hundred teams took part in the challenge to predict low, medium and highly soluble compounds as measured by the nephelometry assay. This article describes the winning model, which was developed using the publicly available Online CHEmical database and Modeling environment (OCHEM) available on the website https://ochem.eu/article/27. We describe in detail the assumptions and steps used to select the methods, descriptors and strategy that contributed to the winning solution. In particular, we show that a consensus based on 28 models calculated using descriptor-based and representation learning methods allowed us to obtain the best score, which was higher than those based on individual approaches or consensus models developed using each individual approach. A combination of diverse models allowed us to decrease both the bias and the variance of individual models and to calculate the highest score. The model based on Transformer CNN contributed the best individual score, thus highlighting the power of Natural Language Processing (NLP) methods. Including information about aleatoric uncertainty would help contestants better understand and use the challenge data.
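The consensus idea in this abstract can be illustrated with a minimal sketch. The abstract does not say how the 28 models were combined, so the unweighted averaging of class probabilities below is an assumption, and the function names, class labels, and toy probabilities are hypothetical.

```python
# Hypothetical sketch: unweighted consensus over per-model class probabilities.
# The combination rule used in the actual challenge entry is not given in the
# abstract; simple averaging is assumed here for illustration only.

CLASSES = ("low", "medium", "high")  # solubility classes named in the abstract


def average_probabilities(per_model_probs):
    """Average the probability vectors produced by several models."""
    n_models = len(per_model_probs)
    if n_models == 0:
        raise ValueError("at least one model is required")
    n_classes = len(per_model_probs[0])
    return [
        sum(probs[k] for probs in per_model_probs) / n_models
        for k in range(n_classes)
    ]


def consensus_class(per_model_probs, classes=CLASSES):
    """Return the class with the highest averaged probability."""
    avg = average_probabilities(per_model_probs)
    best = max(range(len(avg)), key=lambda k: avg[k])
    return classes[best]


# Toy predictions from three made-up models for a single compound.
toy_probs = [
    [0.6, 0.3, 0.1],  # model 1 favours "low"
    [0.2, 0.5, 0.3],  # model 2 favours "medium"
    [0.1, 0.6, 0.3],  # model 3 favours "medium"
]
print(consensus_class(toy_probs))
```

Averaging lets the two models that agree outvote the one confident outlier, which is one simple way a combination of diverse models can dampen the errors of any single model.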
Affiliation(s)
- Andrea Hunklinger
  - Institute of Structural Biology, Molecular Targets and Therapeutics Center, Helmholtz Munich-Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH), DE-85764 Neuherberg, Germany
- Peter Hartog
  - Institute of Structural Biology, Molecular Targets and Therapeutics Center, Helmholtz Munich-Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH), DE-85764 Neuherberg, Germany
- Martin Šícho
  - Leiden Academic Centre for Drug Research, Leiden University, 55 Einsteinweg, 2333 CC Leiden, the Netherlands; CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Department of Informatics and Chemistry, Faculty of Chemical Technology, University of Chemistry and Technology Prague, Technická 5, 166 28, Prague, Czech Republic
- Guillaume Godin
  - dsm-firmenich SA, Rue de la Bergère 7, CH-1242 Satigny, Switzerland
- Igor V Tetko
  - Institute of Structural Biology, Molecular Targets and Therapeutics Center, Helmholtz Munich-Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH), DE-85764 Neuherberg, Germany; BIGCHEM GmbH, Valerystr. 49, DE-85716 Unterschleißheim, Germany.
8
Banerjee A, Roy K. Read-across-based intelligent learning: development of a global q-RASAR model for the efficient quantitative predictions of skin sensitization potential of diverse organic chemicals. Environmental Science: Processes & Impacts 2023; 25:1626-1644. [PMID: 37682520 DOI: 10.1039/d3em00322a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/09/2023]
Abstract
Environmental chemicals and contaminants cause a wide array of harmful effects on terrestrial and aquatic life, ranging from skin sensitization to acute oral toxicity. The current study aims to assess the quantitative skin sensitization potential of a large set of industrial and environmental chemicals acting through different mechanisms using the novel quantitative Read-Across Structure-Activity Relationship (q-RASAR) approach. Based on the identified important set of structural and physicochemical features, Read-Across-based hyperparameters were optimized using the training set compounds, followed by the calculation of similarity and error-based RASAR descriptors. Data fusion, further feature selection, and removal of prediction confidence outliers were performed to generate a partial least squares (PLS) q-RASAR model, followed by the application of various Machine Learning (ML) tools to check the quality of predictions. The PLS model was found to be the best among the different models. A simple user-friendly Java-based software tool was developed based on the PLS model, which efficiently predicts the toxicity value(s) of query compound(s) along with their Applicability Domain (AD) status in terms of leverage values. This model has been developed using structurally diverse compounds and is expected to predict efficiently and quantitatively the skin sensitization potential of environmental chemicals to estimate their occupational and health hazards.
Affiliation(s)
- Arkaprava Banerjee
  - Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India.
- Kunal Roy
  - Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India.
9
Viljanen M, Minnema J, Wassenaar PNH, Rorije E, Peijnenburg W. What is the ecotoxicity of a given chemical for a given aquatic species? Predicting interactions between species and chemicals using recommender system techniques. SAR and QSAR in Environmental Research 2023; 34:765-788. [PMID: 37670728 DOI: 10.1080/1062936x.2023.2254225] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2023] [Accepted: 08/27/2023] [Indexed: 09/07/2023]
Abstract
Ecotoxicological safety assessment of chemicals requires toxicity data on multiple species, despite the general desire to minimize animal testing. Predictive models, specifically machine learning (ML) methods, are one of the tools capable of solving this apparent contradiction, as they allow toxicity patterns to be generalized across chemicals and species. However, despite the availability of large public toxicity datasets, the data is highly sparse, complicating model development. The aim of this study is to provide insights into how ML can predict toxicity using a large but sparse dataset. We developed models to predict LC50-values, based on experimental LC50-data covering 2431 organic chemicals and 1506 aquatic species from the ECOTOX-database. Several well-known ML techniques were evaluated and a new ML model was developed, inspired by recommender systems. This new model involves a simple linear model that learns low-rank interactions between species and chemicals using factorization machines. We evaluated the predictive performances of the developed models based on two validation settings: 1) predicting unseen chemical-species pairs, and 2) predicting unseen chemicals. The results of this study show that ML models can accurately predict LC50-values in both validation settings. Moreover, we show that the novel factorization machine approach can match well-tuned, complex ML approaches.
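The low-rank interaction idea in this abstract can be sketched as follows. When each (chemical, species) pair is encoded by two one-hot indicators, the standard second-order factorization machine reduces to a global offset, one bias per chemical, one bias per species, and a dot product of two latent vectors. The abstract does not give the authors' features or fitted parameters, so every name and number below is a made-up placeholder, and the output is only a toy score standing in for a predicted toxicity value.

```python
# Hypothetical sketch of a factorization-machine prediction for one
# (chemical, species) pair with one-hot inputs. All parameter values are
# invented for illustration; they are not taken from the cited study.

def fm_predict(w0, chem_bias, species_bias, chem_factors, species_factors,
               chem, species):
    """Score = offset + chemical bias + species bias + latent dot product."""
    interaction = sum(
        a * b for a, b in zip(chem_factors[chem], species_factors[species])
    )
    return w0 + chem_bias[chem] + species_bias[species] + interaction


# Toy parameters with rank-2 latent factors (placeholders, not fitted values).
w0 = 1.0
chem_bias = {"chem_A": 0.5, "chem_B": -0.2}
species_bias = {"species_X": 0.1, "species_Y": -0.3}
chem_factors = {"chem_A": [1.0, 0.0], "chem_B": [0.0, 2.0]}
species_factors = {"species_X": [0.5, 0.5], "species_Y": [-1.0, 0.25]}

score_ax = fm_predict(w0, chem_bias, species_bias,
                      chem_factors, species_factors, "chem_A", "species_X")
score_by = fm_predict(w0, chem_bias, species_bias,
                      chem_factors, species_factors, "chem_B", "species_Y")
print(score_ax, score_by)
```

Because the interaction term is a product of per-chemical and per-species vectors, a pair that was never measured together still gets a prediction as long as the chemical and the species each appear somewhere in the training data, which is what makes the approach suited to sparse chemical-species matrices.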
Affiliation(s)
- M Viljanen
  - Department of Statistics, Data Science and Modelling, National Institute of Public Health and the Environment, Bilthoven, The Netherlands
- J Minnema
  - Center for Safety of Substances and Products, National Institute of Public Health and the Environment, Bilthoven, The Netherlands
- P N H Wassenaar
  - Center for Safety of Substances and Products, National Institute of Public Health and the Environment, Bilthoven, The Netherlands
- E Rorije
  - Center for Safety of Substances and Products, National Institute of Public Health and the Environment, Bilthoven, The Netherlands
- W Peijnenburg
  - Center for Safety of Substances and Products, National Institute of Public Health and the Environment, Bilthoven, The Netherlands
  - Institute of Environmental Sciences (CML), Leiden University, Leiden, The Netherlands
10
Sinha K, Ghosh N, Sil PC. A Review on the Recent Applications of Deep Learning in Predictive Drug Toxicological Studies. Chem Res Toxicol 2023; 36:1174-1205. [PMID: 37561655 DOI: 10.1021/acs.chemrestox.2c00375] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/12/2023]
Abstract
Drug toxicity prediction is an important step in ensuring patient safety during drug design studies. While traditional preclinical studies have historically relied on animal models to evaluate toxicity, recent advances in deep-learning approaches have shown great promise in advancing drug safety science and reducing animal use in preclinical studies. However, deep-learning-based approaches also face challenges in handling large biological data sets, model interpretability, and regulatory acceptance. In this review, we provide an overview of recent developments in deep-learning-based approaches for predicting drug toxicity, highlighting their potential advantages over traditional methods and the need to address their limitations. Deep-learning models have demonstrated excellent performance in predicting toxicity outcomes from various data sources such as chemical structures, genomic data, and high-throughput screening assays. The potential of deep learning for automated feature engineering is also discussed. This review emphasizes the need to address ethical concerns related to the use of deep learning in drug toxicity studies, including the reduction of animal use and ensuring regulatory acceptance. Furthermore, emerging applications of deep learning in drug toxicity prediction, such as predicting drug-drug interactions and toxicity in rare subpopulations, are highlighted. The integration of deep-learning-based approaches with traditional methods is discussed as a way to develop more reliable and efficient predictive models for drug safety assessment, paving the way for safer and more effective drug discovery and development. Overall, this review highlights the critical role of deep learning in predictive toxicology and drug safety evaluation, emphasizing the need for continued research and development in this rapidly evolving field. 
By addressing the limitations of traditional methods, leveraging the potential of deep learning for automated feature engineering, and addressing ethical concerns, deep-learning-based approaches have the potential to revolutionize drug toxicity prediction and improve patient safety in drug discovery and development.
Affiliation(s)
- Krishnendu Sinha
  - Department of Zoology, Jhargram Raj College, Jhargram 721507, West Bengal, India
- Nabanita Ghosh
  - Department of Zoology, Maulana Azad College, Kolkata 700013, West Bengal, India
- Parames C Sil
  - Division of Molecular Medicine, Bose Institute, Kolkata 700054, West Bengal, India
11
Patlewicz G, Paul-Friedman K, Houck K, Zhang L, Huang R, Xia M, Brown J, Simmons SO. Evaluating the utility of a high throughput thiol-containing fluorescent probe to screen for reactivity: A case study with the Tox21 library. Computational Toxicology (Amsterdam, Netherlands) 2023; 26:10.1016/j.comtox.2023.100271. [PMID: 37388277 PMCID: PMC10304587 DOI: 10.1016/j.comtox.2023.100271] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/01/2023]
Abstract
High-throughput screening (HTS) assays for bioactivity in the Tox21 program aim to evaluate an array of different biological targets and pathways, but a significant barrier to interpretation of these data is the lack of high-throughput screening (HTS) assays intended to identify non-specific reactive chemicals. This is an important aspect for prioritising chemicals to test in specific assays, identifying promiscuous chemicals based on their reactivity, as well as addressing hazards such as skin sensitisation which are not necessarily initiated by a receptor-mediated effect but act through a non-specific mechanism. Herein, a fluorescence-based HTS assay that allows the identification of thiol-reactive compounds was used to screen 7,872 unique chemicals in the Tox21 10K chemical library. Active chemicals were compared with profiling outcomes using structural alerts encoding electrophilic information. Random Forest classification models based on chemical fingerprints were developed to predict assay outcomes and evaluated through 10-fold stratified cross validation (CV). The mean CV Balanced Accuracy of the validation set was 0.648. The model developed shows promise as a tool to screen untested chemicals for their potential electrophilic reactivity based solely on chemical structural features.
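The evaluation scheme named in this abstract, stratified cross validation scored by balanced accuracy, can be made concrete with a short sketch. Balanced accuracy is the mean of sensitivity and specificity, which keeps a model from looking good merely by predicting the majority class. The helper names and the toy labels below are hypothetical, the fold assignment is a simple deterministic round-robin rather than the authors' procedure, and no Random Forest is trained here.

```python
# Hypothetical sketch: stratified fold assignment and balanced accuracy for a
# binary assay outcome (1 = active, 0 = inactive). Toy data only.

def stratified_folds(labels, k):
    """Assign sample indices to k folds, class by class, in round-robin order."""
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        for j, idx in enumerate(members):
            folds[j % k].append(idx)
    return folds


def balanced_accuracy(y_true, y_pred):
    """Mean of recall on actives (sensitivity) and on inactives (specificity)."""
    pos = sum(1 for t in y_true if t == 1)
    neg = sum(1 for t in y_true if t == 0)
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return 0.5 * (tp / pos + tn / neg)


# Toy imbalanced set: 2 actives, 8 inactives, with made-up predictions.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

folds = stratified_folds(y_true, k=2)
plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(folds)
print(plain_accuracy, balanced_accuracy(y_true, y_pred))
```

On this toy set, plain accuracy is 0.8 while balanced accuracy is 0.6875, because one of the two actives is missed. That gap is why balanced accuracy is the more informative figure for screening data in which active compounds are rare.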
Affiliation(s)
- Grace Patlewicz
  - Center for Computational Toxicology & Exposure (CCTE), U.S. Environmental Protection Agency, Research Triangle Park, Durham, NC, 27709, USA
- Katie Paul-Friedman
  - Center for Computational Toxicology & Exposure (CCTE), U.S. Environmental Protection Agency, Research Triangle Park, Durham, NC, 27709, USA
- Keith Houck
  - Center for Computational Toxicology & Exposure (CCTE), U.S. Environmental Protection Agency, Research Triangle Park, Durham, NC, 27709, USA
- Li Zhang
  - National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, Bethesda, MD 20892, USA
- Ruili Huang
  - National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, Bethesda, MD 20892, USA
- Menghang Xia
  - National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, Bethesda, MD 20892, USA
- Jason Brown
  - Center for Computational Toxicology & Exposure (CCTE), U.S. Environmental Protection Agency, Research Triangle Park, Durham, NC, 27709, USA
- Steven O. Simmons
  - Center for Computational Toxicology & Exposure (CCTE), U.S. Environmental Protection Agency, Research Triangle Park, Durham, NC, 27709, USA
12
|
Sharma B, Chenthamarakshan V, Dhurandhar A, Pereira S, Hendler JA, Dordick JS, Das P. Accurate clinical toxicity prediction using multi-task deep neural nets and contrastive molecular explanations. Sci Rep 2023; 13:4908. [PMID: 36966203 PMCID: PMC10039880 DOI: 10.1038/s41598-023-31169-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/28/2022] [Accepted: 03/07/2023] [Indexed: 03/27/2023] Open
Abstract
Explainable machine learning for molecular toxicity prediction is a promising approach for efficient drug development and chemical safety. A predictive ML model of toxicity can reduce experimental cost and time while mitigating ethical concerns by significantly reducing animal and clinical testing. Herein, we use a deep learning framework for simultaneously modeling in vitro, in vivo, and clinical toxicity data. Two different molecular input representations are used: Morgan fingerprints and pre-trained SMILES embeddings. A multi-task deep learning model accurately predicts toxicity for all endpoints, including clinical, as indicated by the area under the Receiver Operator Characteristic curve and balanced accuracy. In particular, pre-trained molecular SMILES embeddings as input to the multi-task model improved clinical toxicity predictions compared to existing models in the MoleculeNet benchmark. Additionally, our multitask approach is comprehensive in the sense that it is comparable to state-of-the-art approaches for specific endpoints in in vitro, in vivo and clinical platforms. Through both the multi-task model and transfer learning, we were able to indicate the minimal need for in vivo data for clinical toxicity predictions. To provide confidence and explain the model's predictions, we adapt a post-hoc contrastive explanation method that returns pertinent positive and negative features, which correspond well to known mutagenic and reactive toxicophores, such as unsubstituted bonded heteroatoms, aromatic amines, and Michael acceptors. Furthermore, toxicophore recovery by pertinent feature analysis captures more of the in vitro (53%) and in vivo (56%), rather than of the clinical (8%), endpoints, and indeed uncovers a preference in known toxicophore data towards in vitro and in vivo experimental data. To our knowledge, this is the first contrastive explanation, using both present and absent substructures, for predictions of clinical and in vivo molecular toxicity.
Affiliation(s)
- Shiranee Pereira
  - ICARE, International Center for Alternatives in Research and Education, Chennai, India
- Payel Das
  - IBM Research, Yorktown Heights, NY, USA
|
13
|
Sosnina EA, Sosnin S, Fedorov MV. Improvement of multi-task learning by data enrichment: application for drug discovery. J Comput Aided Mol Des 2023; 37:183-200. [PMID: 36943645 DOI: 10.1007/s10822-023-00500-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/22/2022] [Accepted: 02/21/2023] [Indexed: 03/23/2023]
Abstract
Multi-task learning in deep neural networks has become a topic of growing importance in many research fields, including drug discovery. However, applying multi-task learning poses new challenges in improving prediction performance. This study investigated the potential of training data enrichment to enhance multi-task model prediction quality in drug discovery. The study evaluated four scenarios with varying degrees of information capacity of the training data and applied two types of test data to evaluate prediction performance. We used three datasets: ViralChEMBL, which consisted of binary activities of compounds against viral species, was applied for the classification task; pQSAR(159) and pQSAR(4267), which consisted of bio-activities of compounds and assays from the research of the profile-QSAR method, were applied for regression tasks. We built multi-task models based on the feed-forward DNNs using the PyTorch framework. Our findings showed that training data enrichment could be an effective means of enhancing prediction performance in multi-task learning, but the degree of improvement depends on the quality of the training data. The more unique compounds and targets the training data included, the more new compound-target interactions are required for prediction improvement. Also, we found out that even using multi-task learning, one could not predict the interactions of compounds that are highly dissimilar from those used for model training. The study provides some recommendations for effectively employing multi-task learning in drug discovery to improve prediction accuracy and facilitate the discovery of novel drug candidates.
Affiliation(s)
- Ekaterina A Sosnina
  - Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Bolshoy Boulevard 30/1, Moscow, Russia, 143026
- Sergey Sosnin
  - Department of Pharmaceutical Sciences, Faculty of Life Sciences, University of Vienna, Josef-Holaubek-Platz 2, 1190, Vienna, Austria
- Maxim V Fedorov
  - Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Bolshoy Boulevard 30/1, Moscow, Russia, 143026
  - Sirius University of Science and Technology, Olympiisky Prospect 1, Sochi, Russia, 354340
|
14
|
Ksenofontov AA, Isaev YI, Lukanov MM, Makarov DM, Eventova VA, Khodov IA, Berezin MB. Accurate prediction of 11B NMR chemical shift of BODIPYs via machine learning. Phys Chem Chem Phys 2023; 25:9472-9481. [PMID: 36935644 DOI: 10.1039/d3cp00253e] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/08/2023]
Abstract
In this article, we present a model based on the RFR machine learning method and ISIDA fragment descriptors for predicting the 11B NMR chemical shift of BODIPYs. The model is freely available at https://ochem.eu/article/146458. The model predicts the 11B NMR chemical shift with high quality (RMSE, 5CV (FINALE training set) = 0.40 ppm, RMSE (TEST set) = 0.14 ppm). In addition, we compared the "cost" and user-friendliness of the model with those of quantum-chemical calculations using the DFT/GIAO approach. The model's prediction error (RMSE) for the 11B NMR chemical shift is more than three times lower than that of the DFT/GIAO calculations, and the model is far faster. As a result, we provide all researchers with a convenient tool, together with the database we collected, for predicting the 11B NMR chemical shift of boron-containing dyes. We believe that the new model will make it easier for researchers to correctly interpret experimentally determined 11B NMR chemical shifts and to select better conditions for performing an NMR experiment.
Affiliation(s)
- Alexander A Ksenofontov
  - G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, Akademicheskaya Street, 153045 Ivanovo, Russia
- Yaroslav I Isaev
  - G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, Akademicheskaya Street, 153045 Ivanovo, Russia
  - Ivanovo State University of Chemistry and Technology, 7, Sheremetevskiy Avenue, Ivanovo 153000, Russia
- Michail M Lukanov
  - G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, Akademicheskaya Street, 153045 Ivanovo, Russia
- Dmitry M Makarov
  - G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, Akademicheskaya Street, 153045 Ivanovo, Russia
- Varvara A Eventova
  - G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, Akademicheskaya Street, 153045 Ivanovo, Russia
  - Ivanovo State University of Chemistry and Technology, 7, Sheremetevskiy Avenue, Ivanovo 153000, Russia
- Ilya A Khodov
  - G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, Akademicheskaya Street, 153045 Ivanovo, Russia
- Mechail B Berezin
  - G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, Akademicheskaya Street, 153045 Ivanovo, Russia
|
15
|
pH-dependent solubility prediction for optimized drug absorption and compound uptake by plants. J Comput Aided Mol Des 2023; 37:129-145. [PMID: 36797399 DOI: 10.1007/s10822-023-00496-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/17/2022] [Accepted: 01/31/2023] [Indexed: 02/18/2023]
Abstract
Aqueous solubility is the most important physicochemical property for agrochemical and drug candidates and a prerequisite for uptake, distribution, transport, and finally the bioavailability in living species. We here present the first-ever direct machine learning models for pH-dependent solubility in water. For this, we combined almost 300,000 data points from 11 solubility assays performed over 24 years and over one million data points from lipophilicity and melting point experiments. Data were split into three pH classes (acidic, neutral, and basic), representing the conditions of the stomach and intestinal tract for animals and humans, and of the phloem and xylem for plants. We find that multi-task neural networks using ECFP-6 fingerprints outperform baseline random forests and single-task neural networks on the individual tasks. Our final model, with three solubility tasks using the pH-class combined data from different assays and five helper tasks, results in root mean square errors of 0.56 log units overall (acidic 0.61; neutral 0.52; basic 0.54) and Spearman rank correlations of 0.83 (acidic 0.78; neutral 0.86; basic 0.86), making it a valuable tool for profiling of compounds in pharmaceutical and agrochemical research. The model allows for the prediction of compound pH profiles with mean and median RMSE per molecule of 0.62 and 0.56 log units.
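The two figures of merit reported in this abstract, root mean square error and Spearman rank correlation, can be written out as a short sketch. This is a plain-Python illustration under the standard textbook definitions; the function names are hypothetical, and nothing here reflects the authors' own code.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between two equally long sequences."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

RMSE measures the size of the errors in log units, whereas the Spearman correlation only asks whether compounds are ranked in the right order, which is why the two are usually reported together.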
|
16
|
Wu L, Yan B, Han J, Li R, Xiao J, He S, Bo X. TOXRIC: a comprehensive database of toxicological data and benchmarks. Nucleic Acids Res 2022; 51:D1432-D1445. [PMID: 36400569 PMCID: PMC9825425 DOI: 10.1093/nar/gkac1074] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/14/2022] [Revised: 10/10/2022] [Accepted: 10/26/2022] [Indexed: 11/20/2022] Open
Abstract
The toxic effects of compounds on environment, humans, and other organisms have been a major focus of many research areas, including drug discovery and ecological research. Identifying the potential toxicity in the early stage of compound/drug discovery is critical. The rapid development of computational methods for evaluating various toxicity categories has increased the need for comprehensive and system-level collection of toxicological data, associated attributes, and benchmarks. To contribute toward this goal, we proposed TOXRIC (https://toxric.bioinforai.tech/), a database with comprehensive toxicological data, standardized attribute data, practical benchmarks, informative visualization of molecular representations, and an intuitive function interface. The data stored in TOXRIC contains 113 372 compounds, 13 toxicity categories, 1474 toxicity endpoints covering in vivo/in vitro endpoints and 39 feature types, covering structural, target, transcriptome, metabolic data, and other descriptors. All the curated datasets of endpoints and features can be retrieved, downloaded and directly used as output or input to Machine Learning (ML)-based prediction models. In addition to serving as a data repository, TOXRIC also provides visualization of benchmarks and molecular representations for all endpoint datasets. Based on these results, researchers can better understand and select optimal feature types, molecular representations, and baseline algorithms for each endpoint prediction task. We believe that the rich information on compound toxicology, ML-ready datasets, benchmarks and molecular representation distribution can greatly facilitate toxicological investigations, interpretation of toxicological mechanisms, compound/drug discovery and the development of computational methods.
Affiliation(s)
- Junshan Han
  - Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Ruijiang Li
  - Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Jian Xiao
  - Department of Pharmacy, Xiangya Hospital, Central South University, Changsha 410008, Hunan, China
  - Institute for Rational and Safe Medication Practices, National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha 410008, Hunan, China
- Song He
  - Correspondence may also be addressed to Song He. Tel: +86 01066931450
- Xiaochen Bo
  - To whom correspondence should be addressed. Tel: +86 01066931207
|
17
|
|
18
|
Umemori Y, Handa K, Sakamoto S, Kageyama M, Iijima T. QSAR model to predict Kp,uu,brain with a small dataset, incorporating predicted values of related parameter. SAR AND QSAR IN ENVIRONMENTAL RESEARCH 2022; 33:885-897. [PMID: 36420623 DOI: 10.1080/1062936x.2022.2149619] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/02/2022] [Accepted: 11/14/2022] [Indexed: 06/16/2023]
Abstract
The unbound brain-to-plasma concentration ratio (Kp,uu,brain) is a parameter that indicates the extent of central nervous system penetration. Pharmaceutical companies build prediction models because many experiments are required to obtain Kp,uu,brain. However, the lack of data hinders the design of an accurate prediction model. To construct a quantitative structure-activity relationship (QSAR) model with a small dataset of Kp,uu,brain, we investigated whether the prediction accuracy could be improved by incorporating software-predicted brain penetration-related parameters (BPrPs) as explanatory variables for pharmacokinetic parameter prediction. We collected 88 compounds with experimental Kp,uu,brain from various official publications. Random forest was used as the machine learning model. First, we developed prediction models using only structural descriptors. Second, we verified the predictive accuracy of each model with the predicted values of BPrPs incorporated in various combinations. Third, the Kp,uu,brain of the in-house compounds was predicted and compared with the experimental values. The prediction accuracy was improved using five-fold cross-validation (RMSE = 0.455, r² = 0.726) by incorporating BPrPs. Additionally, this model was verified using an external in-house dataset. The result suggested that using BPrPs as explanatory variables improves the prediction accuracy of the Kp,uu,brain QSAR model when the available number of datasets is small.
Affiliation(s)
- Y Umemori
  - Toxicology & DMPK Research Department, Teijin Institute for Bio-medical Research, Teijin Pharma Limited, Hino-shi, Japan
- K Handa
  - Toxicology & DMPK Research Department, Teijin Institute for Bio-medical Research, Teijin Pharma Limited, Hino-shi, Japan
- S Sakamoto
  - Pharmaceutical Development Coordination Department, Teijin Pharma Limited, Chiyoda-ku, Japan
- M Kageyama
  - Toxicology & DMPK Research Department, Teijin Institute for Bio-medical Research, Teijin Pharma Limited, Hino-shi, Japan
- T Iijima
  - Toxicology & DMPK Research Department, Teijin Institute for Bio-medical Research, Teijin Pharma Limited, Hino-shi, Japan
|
19
|
Ksenofontov AA, Lukanov MM, Bocharov PS. Can machine learning methods accurately predict the molar absorption coefficient of different classes of dyes? SPECTROCHIMICA ACTA. PART A, MOLECULAR AND BIOMOLECULAR SPECTROSCOPY 2022; 279:121442. [PMID: 35660154 DOI: 10.1016/j.saa.2022.121442] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/07/2022] [Revised: 05/25/2022] [Accepted: 05/26/2022] [Indexed: 06/15/2023]
Abstract
In this article, we provide a convenient tool for all researchers to predict the value of the molar absorption coefficient for a wide range of dyes without any computational cost. The new model is based on the RFR method (ALogPS, OEstate + Fragmentor + QNPR) and is able to predict the molar absorption coefficient with an accuracy (5-fold cross-validation RMSE) of 0.26 log units. This accuracy was achieved because the model was trained on data for more than 20,000 unique dye molecules. To our knowledge, this is the first model for predicting the molar absorption coefficient trained on such a large and diverse set of dyes. The model is available at https://ochem.eu/article/145413. We hope that the new model will allow researchers to predict dyes with practically significant spectral characteristics and verify existing experimental data.
Affiliation(s)
- Alexander A Ksenofontov
  - G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, Akademicheskaya Street, 153045 Ivanovo, Russia
- Michail M Lukanov
  - G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, Akademicheskaya Street, 153045 Ivanovo, Russia
  - Ivanovo State University of Chemistry and Technology, 7, Sheremetevskiy Avenue, Ivanovo 153000, Russia
- Pavel S Bocharov
  - G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, Akademicheskaya Street, 153045 Ivanovo, Russia
|
20
|
Identification of Potential Insect Growth Inhibitor against Aedes aegypti: A Bioinformatics Approach. Int J Mol Sci 2022; 23:ijms23158218. [PMID: 35897792 PMCID: PMC9332482 DOI: 10.3390/ijms23158218] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/19/2022] [Revised: 06/30/2022] [Accepted: 07/11/2022] [Indexed: 02/04/2023] Open
Abstract
Aedes aegypti is the main vector that transmits viral diseases such as dengue, hemorrhagic dengue, urban yellow fever, zika, and chikungunya. Worldwide, many cases of dengue have been reported in recent years, showing significant growth. The best way to manage diseases transmitted by Aedes aegypti is to control the vector with insecticides, which have already been shown to be toxic to humans; moreover, insects have developed resistance. Thus, the development of new insecticides is considered an emergency. One way to achieve this goal is to apply computational methods based on ligands and target information. In this study, sixteen compounds with acceptable insecticidal activities, with 100% larvicidal activity at low concentrations (2.0 to 0.001 mg·L−1), were selected from the literature. These compounds were used to build up and validate pharmacophore models. Pharmacophore model 6 (AUC = 0.78; BEDROC = 0.6) was used to filter 4793 compounds from the subset of lead-like compounds from the ZINC database; 4142 compounds (dG < 0 kcal/mol) were then aligned to the active site of the juvenile hormone receptor of Aedes aegypti (PDB: 5V13), and 2240 compounds (LE < −0.40 kcal/mol) were prioritized for molecular docking against a chitin deacetylase model of Aedes aegypti constructed by homology modeling from the Bombyx mori species (PDB: 5ZNT), which aligned 1959 compounds (dG < 0 kcal/mol); 20 compounds (LE < −0.4 kcal/mol) were then selected for in silico pharmacokinetic and toxicological prediction (Preadmet, SwissADMET, and eMolTox programs). Finally, the theoretical routes of compounds M01, M02, M03, M04, and M05 were proposed. Compounds M01−M05 were selected, showing significant differences in pharmacokinetic and toxicological parameters in relation to positive controls and interaction with catalytic residues among key protein sites reported in the literature. For this reason, the molecules investigated here are dual inhibitors of the enzymes chitin synthase and juvenile hormonal protein from insects and humans, characterizing them as potential insecticides against the Aedes aegypti mosquito.
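The filtering cascade in this abstract relies on two thresholds, a negative binding free energy (dG < 0 kcal/mol) and a ligand efficiency cutoff (LE < −0.40 kcal/mol). The abstract does not define LE, so the sketch below assumes the common convention of binding free energy divided by the number of heavy (non-hydrogen) atoms; the function names and the tuple layout are hypothetical.

```python
def ligand_efficiency(delta_g, heavy_atoms):
    """Binding free energy per heavy atom, in kcal/mol per atom (assumed definition)."""
    return delta_g / heavy_atoms


def prioritize(compounds, le_cutoff=-0.40):
    """Keep the names of compounds that pass both thresholds.

    compounds: iterable of (name, delta_g in kcal/mol, heavy atom count).
    """
    kept = []
    for name, delta_g, heavy_atoms in compounds:
        if delta_g >= 0:
            # Favorable predicted binding requires a negative free energy.
            continue
        if ligand_efficiency(delta_g, heavy_atoms) < le_cutoff:
            kept.append(name)
    return kept
```

Normalizing by heavy-atom count keeps large molecules from being favored simply because they make more contacts with the binding site.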
|
21
|
Nkulikiyinka P, Wagland ST, Manovic V, Clough PT. Prediction of Combined Sorbent and Catalyst Materials for SE-SMR, Using QSPR and Multitask Learning. Ind Eng Chem Res 2022; 61:9218-9233. [PMID: 35818477 PMCID: PMC9264356 DOI: 10.1021/acs.iecr.2c00971] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Abstract
The process of sorption enhanced steam methane reforming (SE-SMR) is an emerging technology for the production of low carbon hydrogen. The development of a suitable catalytic material, as well as a CO2 adsorbent with high capture capacity, has slowed the upscaling of this process to date. In this study, to aid the development of a combined sorbent catalyst material (CSCM) for SE-SMR, a novel approach involving quantitative structure–property relationship analysis (QSPR) has been proposed. Through data-mining, two databases have been developed for the prediction of the last cycle capacity (g CO2/g sorbent) and methane conversion (%). Multitask learning (MTL) was applied for the prediction of CSCM properties. Patterns in the data of this study have also yielded further insights; colored scatter plots were able to show certain patterns in the input data, as well as suggestions on how to develop an optimal material. With the results from the actual vs predicted plots collated, raw materials and synthesis conditions were proposed that could lead to the development of a CSCM that has good performance with respect to both the last cycle capacity and the methane conversion.
Affiliation(s)
- Paula Nkulikiyinka
  - Energy and Power Theme, School of Water, Energy and Environment, Cranfield University, Cranfield, Bedfordshire MK43 0AL, U.K.
- Stuart T. Wagland
  - Energy and Power Theme, School of Water, Energy and Environment, Cranfield University, Cranfield, Bedfordshire MK43 0AL, U.K.
- Vasilije Manovic
  - Energy and Power Theme, School of Water, Energy and Environment, Cranfield University, Cranfield, Bedfordshire MK43 0AL, U.K.
- Peter T. Clough
  - Energy and Power Theme, School of Water, Energy and Environment, Cranfield University, Cranfield, Bedfordshire MK43 0AL, U.K.
|
22
|
Jeong J, Choi J. Artificial Intelligence-Based Toxicity Prediction of Environmental Chemicals: Future Directions for Chemical Management Applications. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2022; 56:7532-7543. [PMID: 35666838 DOI: 10.1021/acs.est.1c07413] [Citation(s) in RCA: 26] [Impact Index Per Article: 13.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/15/2023]
Abstract
Recently, research on the development of artificial intelligence (AI)-based computational toxicology models that predict toxicity without the use of animal testing has emerged because of the rapid development of computer technology. Various computational toxicology techniques that predict toxicity based on the structure of chemical substances are gaining attention, including the quantitative structure-activity relationship. To understand the recent development of these models, we analyzed the databases, molecular descriptors, fingerprints, and algorithms considered in recent studies. Based on a selection of 96 papers published since 2014, we found that AI models have been developed to predict approximately 30 different toxicity end points using more than 20 toxicity databases. For model development, molecular access system and extended-connectivity fingerprints are the most commonly used molecular descriptors. The most used algorithm among the machine learning techniques is the random forest, while the most used algorithm among the deep learning techniques is a deep neural network. The use of AI technology in the development of toxicity prediction models is a new concept that will aid in achieving a scientific accord and meet regulatory applications. The comprehensive overview provided in this study will provide a useful guide for the further development and application of toxicity prediction models.
Affiliation(s)
- Jaeseong Jeong
  - School of Environmental Engineering, University of Seoul, 163 Seoulsiripdae-ro, Dongdaemun-gu, Seoul 02504, South Korea
- Jinhee Choi
  - School of Environmental Engineering, University of Seoul, 163 Seoulsiripdae-ro, Dongdaemun-gu, Seoul 02504, South Korea
|
23
|
Walter M, Allen LN, de la Vega de León A, Webb SJ, Gillet VJ. Analysis of the benefits of imputation models over traditional QSAR models for toxicity prediction. J Cheminform 2022; 14:32. [PMID: 35672779 PMCID: PMC9172131 DOI: 10.1186/s13321-022-00611-w] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2022] [Accepted: 05/12/2022] [Indexed: 11/21/2022] Open
Abstract
Recently, imputation techniques have been adapted to predict activity values among sparse bioactivity matrices, showing improvements in predictive performance over traditional QSAR models. These models are able to use experimental activity values for auxiliary assays when predicting the activity of a test compound on a specific assay. In this study, we tested three different multi-task imputation techniques on three classification-based toxicity datasets: two of small scale (12 assays each) and one large scale with 417 assays. Moreover, we analyzed in detail the improvements shown by the imputation models. We found that test compounds that were dissimilar to training compounds, as well as test compounds with a large number of experimental values for other assays, showed the largest improvements. We also investigated the impact of sparsity on the improvements seen as well as the relatedness of the assays being considered. Our results show that even a small amount of additional information can provide imputation methods with a strong boost in predictive performance over traditional single task and multi-task predictive models.
|
24
|
Baskin I, Epshtein A, Ein-Eli Y. Benchmarking machine learning methods for modeling physical properties of ionic liquids. J Mol Liq 2022. [DOI: 10.1016/j.molliq.2022.118616] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/21/2023]
|
25
|
Ksenofontov AA, Lukanov MM, Bocharov PS, Berezin MB, Tetko IV. Deep neural network model for highly accurate prediction of BODIPYs absorption. SPECTROCHIMICA ACTA. PART A, MOLECULAR AND BIOMOLECULAR SPECTROSCOPY 2022; 267:120577. [PMID: 34776377 DOI: 10.1016/j.saa.2021.120577] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/11/2021] [Revised: 10/12/2021] [Accepted: 10/31/2021] [Indexed: 06/13/2023]
Abstract
The possibility of accurately predicting the absorption maximum wavelength of BODIPYs was investigated. We found that previously reported models had a low accuracy (40-57 nm) for BODIPYs due to the limited dataset sizes and/or number of BODIPYs (a few hundred). New models developed in this study were based on data for 6000-plus fluorescent dyes (including 4000-plus BODIPYs) and the deep neural network architecture. The high prediction accuracy (five-fold cross-validation root mean squared error (RMSE) of 18.4 nm) was obtained using a consensus model, which was more accurate than individual models. This model provided excellent accuracy (RMSE of 8 nm) for molecules previously synthesized in our laboratory as well as for prospective validation of three new BODIPYs. We found that solvent properties did not significantly influence the model accuracy since only a few BODIPYs exhibited solvatochromism. The analysis of large prediction errors suggested that compounds able to have intermolecular interactions with solvent or salts were likely to be incorrectly predicted. The consensus model is freely available at https://ochem.eu/article/134921 and can help other researchers accelerate the design of new dyes with desired properties.
Affiliation(s)
- Alexander A Ksenofontov
  - G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, Akademicheskaya Street, 153045 Ivanovo, Russia
- Michail M Lukanov
  - G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, Akademicheskaya Street, 153045 Ivanovo, Russia
  - Ivanovo State University of Chemistry and Technology, 7, Sheremetevskiy Avenue, Ivanovo 153000, Russia
- Pavel S Bocharov
  - G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, Akademicheskaya Street, 153045 Ivanovo, Russia
  - Ivanovo State University of Chemistry and Technology, 7, Sheremetevskiy Avenue, Ivanovo 153000, Russia
- Michail B Berezin
  - G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, Akademicheskaya Street, 153045 Ivanovo, Russia
- Igor V Tetko
  - G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, Akademicheskaya Street, 153045 Ivanovo, Russia
  - Helmholtz Zentrum München‑German Research Center for Environmental Health (GmbH), Institute of Structural Biology, Ingolstädter Landstraße 1, 85764 Neuherberg, Germany
  - BIGCHEM GmbH, Valerystr. 49, 85716 Unterschleißheim, Germany
|
26
|
Abstract
Quantitative structure-activity relationship (QSAR) models are routinely applied computational tools in the drug discovery process. QSAR models are regression or classification models that predict the biological activities of molecules based on the features derived from their molecular structures. These models are usually used to prioritize a list of candidate molecules for future laboratory experiments and to help chemists gain better insights into how structural changes affect a molecule's biological activities. Developing accurate and interpretable QSAR models is therefore of the utmost importance in the drug discovery process. Deep neural networks, which are powerful supervised learning algorithms, have shown great promise for addressing regression and classification problems in various research fields, including the pharmaceutical industry. In this chapter, we briefly review the applications of deep neural networks in QSAR modeling and describe commonly used techniques to improve model performance.
|
27
|
Feinstein J, Sivaraman G, Picel K, Peters B, Vázquez-Mayagoitia Á, Ramanathan A, MacDonell M, Foster I, Yan E. Uncertainty-Informed Deep Transfer Learning of Perfluoroalkyl and Polyfluoroalkyl Substance Toxicity. J Chem Inf Model 2021; 61:5793-5803. [PMID: 34905348 DOI: 10.1021/acs.jcim.1c01204] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 11/28/2022]
Abstract
Perfluoroalkyl and polyfluoroalkyl substances (PFAS) pose a significant hazard because of their widespread industrial uses, environmental persistence, and bioaccumulation. A growing, increasingly diverse inventory of PFAS, including 8163 chemicals, has recently been updated by the U.S. Environmental Protection Agency. However, with the exception of a handful of well-studied examples, little is known about their human toxicity potential because of the substantial resources required for in vivo toxicity experiments. We tackle the problem of expensive in vivo experiments by evaluating multiple machine learning (ML) methods, including random forests, deep neural networks (DNN), graph convolutional networks, and Gaussian processes, for predicting acute toxicity (e.g., median lethal dose, or LD50) of PFAS compounds. To address the scarcity of toxicity information for PFAS, publicly available datasets of oral rat LD50 for all organic compounds are aggregated and used to develop state-of-the-art ML source models for transfer learning. A total of 519 fluorinated compounds containing two or more C-F bonds with known toxicity are used for knowledge transfer to ensembles of the best-performing source model, DNN, to generate the target models for the PFAS domain with access to uncertainty. This study predicts toxicity for PFAS with a defined chemical structure. To further inform prediction confidence, the transfer-learned model is embedded within a SelectiveNet architecture, where the model is allowed to identify regions of prediction with greater confidence and abstain from those with high uncertainty using a calibrated cutoff rate.
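The abstain-when-uncertain idea in this abstract can be illustrated with a much simpler stand-in than SelectiveNet itself. The sketch below is an assumption for illustration, not the authors' architecture: it takes predictions from an ensemble of models, uses the ensemble standard deviation as the uncertainty signal, and abstains on compounds whose uncertainty exceeds a cutoff chosen to keep a target coverage. The function name `selective_predict` and the `coverage` parameter are hypothetical.

```python
import numpy as np

def selective_predict(ensemble_preds, coverage=0.8):
    """Ensemble-based selective prediction (illustrative stand-in, not SelectiveNet).

    ensemble_preds: array-like of shape (n_models, n_compounds), e.g. predicted LD50 values.
    coverage: approximate fraction of compounds for which a prediction is kept.
    """
    preds = np.asarray(ensemble_preds, dtype=float)
    mean = preds.mean(axis=0)   # ensemble prediction per compound
    std = preds.std(axis=0)     # disagreement between models, used as uncertainty
    # Cutoff is the `coverage` quantile of the uncertainties on this batch.
    cutoff = np.quantile(std, coverage)
    accept = std <= cutoff      # False means the model abstains
    return mean, std, accept

# Two hypothetical models, four compounds; the models disagree strongly on the last one.
mean, std, accept = selective_predict(
    [[1.0, 2.0, 3.0, 4.0],
     [1.0, 2.1, 3.0, 8.0]],
    coverage=0.75,
)
print(accept)
```

In practice the cutoff would be calibrated on held-out data rather than on the batch being predicted; the batch quantile is used here only to keep the sketch short.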
Affiliation(s)
- Jeremy Feinstein
- Environmental Science Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
| | - Ganesh Sivaraman
- Data Science and Learning Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
| | - Kurt Picel
- Environmental Science Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
| | - Brian Peters
- Environmental Science Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
| | | | - Arvind Ramanathan
- Data Science and Learning Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
| | - Margaret MacDonell
- Environmental Science Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
| | - Ian Foster
- Data Science and Learning Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
| | - Eugene Yan
- Environmental Science Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
| |
|
28
|
Muller C, Rabal O, Diaz Gonzalez C. Artificial Intelligence, Machine Learning, and Deep Learning in Real-Life Drug Design Cases. Methods Mol Biol 2021; 2390:383-407. [PMID: 34731478 DOI: 10.1007/978-1-0716-1787-8_16] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 02/06/2023]
Abstract
The discovery and development of drugs is a long and expensive process with a high attrition rate. Computational drug discovery contributes to ligand discovery and optimization, by using models that describe the properties of ligands and their interactions with biological targets. In recent years, artificial intelligence (AI) has made remarkable modeling progress, driven by new algorithms and by the increase in computing power and storage capacities, which allow the processing of large amounts of data in a short time. This review provides the current state of the art of AI methods applied to drug discovery, with a focus on structure- and ligand-based virtual screening, library design and high-throughput analysis, drug repurposing and drug sensitivity, de novo design, chemical reactions and synthetic accessibility, ADMET, and quantum mechanics.
Affiliation(s)
- Christophe Muller
- Evotec (France) SAS, Computational Drug Discovery, Integrated Drug Discovery, Toulouse, France
| | - Obdulia Rabal
- Evotec (France) SAS, Computational Drug Discovery, Integrated Drug Discovery, Toulouse, France
| | | |
|
29
|
Multi-Target In Silico Prediction of Inhibitors for Mitogen-Activated Protein Kinase-Interacting Kinases. Biomolecules 2021; 11:biom11111670. [PMID: 34827668 PMCID: PMC8615736 DOI: 10.3390/biom11111670] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 10/10/2021] [Revised: 11/05/2021] [Accepted: 11/08/2021] [Indexed: 11/26/2022]
Abstract
The inhibitors of two isoforms of mitogen-activated protein kinase-interacting kinases (i.e., MNK-1 and MNK-2) are implicated in the treatment of a number of diseases including cancer. This work reports, for the first time, a multi-target (or multi-tasking) in silico modeling approach (mt-QSAR) for probing the inhibitory potential of these isoforms against MNKs. Linear and non-linear mt-QSAR classification models were set up from a large dataset of 1892 chemicals tested under a variety of assay conditions, based on the Box–Jenkins moving average approach, along with a range of feature selection algorithms and machine learning tools, out of which the most predictive one (>90% overall accuracy) was used for mechanistic interpretation of the likely inhibition of MNK-1 and MNK-2. Considering that the latter model is suitable for virtual screening of chemical libraries—i.e., commercial, non-commercial and in-house sets, it was made publicly accessible as a ready-to-use FLASK-based application. Additionally, this work employed a focused kinase library for virtual screening using an mt-QSAR model. The virtual hits identified in this process were further filtered by using a similarity search, in silico prediction of drug-likeness, and ADME profiles as well as synthetic accessibility tools. Finally, molecular dynamic simulations were carried out to identify and select the most promising virtual hits. The information gathered from this work can supply important guidelines for the discovery of novel MNK-1/2 inhibitors as potential therapeutic agents.
|
30
|
Ghosh D, Koch U, Hadian K, Sattler M, Tetko IV. Highly Accurate Filters to Flag Frequent Hitters in AlphaScreen Assays by Suggesting their Mechanism. Mol Inform 2021; 41:e2100151. [PMID: 34676998 DOI: 10.1002/minf.202100151] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/05/2021] [Accepted: 09/29/2021] [Indexed: 11/06/2022]
Abstract
AlphaScreen is one of the most widely used assay technologies in drug discovery due to its versatility, dynamic range and sensitivity. However, the presence of false positives and frequent hitters complicates the interpretation of measured HTS data. Although filters do exist to identify frequent hitters for AlphaScreen, they are frequently based on privileged scaffolds. The development of such filters is time consuming and requires deep domain knowledge. Recently, machine learning and artificial intelligence methods have emerged as important tools to advance drug discovery and chemoinformatics, including their application to the identification of frequent hitters in screening assays. However, the relative performance and complementarity of machine learning and scaffold-based techniques have not yet been comprehensively compared. In this study, we compared filters based on privileged scaffolds with filters built using machine learning. Our results demonstrate that machine-learning methods provide more accurate filters for identification of frequent hitters in AlphaScreen assays than scaffold-based methods and can be easily redeveloped once new data are measured. We present highly accurate models to identify frequent hitters in AlphaScreen assays.
Affiliation(s)
- Dipan Ghosh
- Lead Discovery Center GmbH, Otto-Hahn-Straße 15, 44227, Dortmund, Germany
| | - Uwe Koch
- Lead Discovery Center GmbH, Otto-Hahn-Straße 15, 44227, Dortmund, Germany
| | - Kamyar Hadian
- Assay Development and Screening Platform, Helmholtz Zentrum München - German Research Center for Environmental Health (GmbH), Ingolstädter Landstraße 1, D-85764, Neuherberg, Germany
| | - Michael Sattler
- Bavarian NMR Center, Department Chemie, Technische Universität München, Ernst-Otto-Fischerstraße 2, D-85747, Garching, Germany.,Institute of Structural Biology, Helmholtz Zentrum München - German Research Center for Environmental Health (GmbH), Ingolstädter Landstraße 1, D-85764, Neuherberg, Germany
| | - Igor V Tetko
- Institute of Structural Biology, Helmholtz Zentrum München - German Research Center for Environmental Health (GmbH), Ingolstädter Landstraße 1, D-85764, Neuherberg, Germany.,G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, Akademicheskaya Street 1, 153045, Ivanovo, Russia.,BIGCHEM GmbH, Valerystr. 49, D-85716, Unterschleißheim, Germany
| |
|
31
|
Wang Y, Wang B, Jiang J, Guo J, Lai J, Lian XY, Wu J. Multitask CapsNet: An Imbalanced Data Deep Learning Method for Predicting Toxicants. ACS Omega 2021; 6:26545-26555. [PMID: 34661009 PMCID: PMC8515573 DOI: 10.1021/acsomega.1c03842] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/20/2021] [Accepted: 09/14/2021] [Indexed: 05/17/2023]
Abstract
Drug development has a high failure rate, with safety properties constituting a considerable challenge. To reduce risk, in silico tools, including various machine learning methods, have been applied for toxicity prediction. However, these approaches often confront a serious problem: the training data sets are usually biased (imbalanced positive and negative samples), which would result in model training difficulty and unsatisfactory prediction accuracy. Multitask networks obtained significantly better predictive accuracies than single-task methods, and capsule neural networks showed excellent performance in sparse data sets in previous studies. In this study, we developed a new multitask framework based on a capsule neural network (multitask CapsNet) to measure 12 different toxic effects simultaneously. We found that multitask CapsNet excelled in toxicity prediction and outperformed many other computational approaches using the multitask strategy. Only after training on biased data sets did multitask CapsNet achieve significantly improved prediction accuracy on the Tox21 Data Challenge, which gave the largest ratio of highest accuracy (8/12) among compared models. Our model gave a prediction accuracy of 96.6% for the target NR.PPAR.gamma, whose ratio of negative to positive samples was up to 36:1. These results suggested that multitask CapsNet could overcome the bias problems and would provide a novel, accurate, and efficient approach for predicting the toxicities of compounds.
Affiliation(s)
- Yiwei Wang
- School of Preclinical Medicine, Southwest Medical University, Luzhou 646000, China
| | - Binyou Wang
- School of Pharmacy, Southwest Medical University, Luzhou 646000, China
| | - Jie Jiang
- School of Preclinical Medicine, Southwest Medical University, Luzhou 646000, China
| | - Jianmin Guo
- School of Preclinical Medicine, Southwest Medical University, Luzhou 646000, China
| | - Jia Lai
- School of Pharmacy, Southwest Medical University, Luzhou 646000, China
| | - Xiao-Yuan Lian
- School of Pharmacy, Zhejiang University, Hangzhou 310011, China
| | - Jianming Wu
- Key Laboratory of Medical Electrophysiology, Ministry of Education of China, Medical Key Laboratory for Drug Discovery and Druggability Evaluation of Sichuan Province, Luzhou Key Laboratory of Activity Screening and Druggability Evaluation for Chinese Materia Medica, Luzhou 646000, China
| |
|
32
|
Kuz’min V, Artemenko A, Ognichenko L, Hromov A, Kosinskaya A, Stelmakh S, Sessions ZL, Muratov EN. Simplex representation of molecular structure as universal QSAR/QSPR tool. Struct Chem 2021; 32:1365-1392. [PMID: 34177203 PMCID: PMC8218296 DOI: 10.1007/s11224-021-01793-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 01/15/2021] [Accepted: 05/07/2021] [Indexed: 10/24/2022]
Abstract
We review the development and application of the Simplex approach for the solution of various QSAR/QSPR problems. The general concept of the simplex method and its varieties are described. The advantages of utilizing this methodology, especially for the interpretation of QSAR/QSPR models, are presented in comparison to other fragmentary methods of molecular structure representation. The utility of SiRMS is demonstrated not only in the standard QSAR/QSPR applications, but also for mixtures, polymers, materials, and other complex systems. In addition to many different types of biological activity (antiviral, antimicrobial, antitumor, psychotropic, analgesic, etc.), toxicity and bioavailability, the review examines the simulation of important properties, such as water solubility, lipophilicity, as well as luminescence, and thermodynamic properties (melting and boiling temperatures, critical parameters, etc.). This review focuses on the stereochemical description of molecules within the simplex approach and details the possibilities of universal molecular stereo-analysis and stereochemical configuration description, along with stereo-isomerization mechanism and molecular fragment "topography" identification.
Affiliation(s)
- Victor Kuz’min
- Department of Molecular Structures and Chemoinformatics, A.V. Bogatsky Physical-Chemical Institute NAS of Ukraine, Odessa, 65080 Ukraine
| | - Anatoly Artemenko
- Department of Molecular Structures and Chemoinformatics, A.V. Bogatsky Physical-Chemical Institute NAS of Ukraine, Odessa, 65080 Ukraine
| | - Luidmyla Ognichenko
- Department of Molecular Structures and Chemoinformatics, A.V. Bogatsky Physical-Chemical Institute NAS of Ukraine, Odessa, 65080 Ukraine
| | - Alexander Hromov
- Department of Molecular Structures and Chemoinformatics, A.V. Bogatsky Physical-Chemical Institute NAS of Ukraine, Odessa, 65080 Ukraine
| | - Anna Kosinskaya
- Department of Molecular Structures and Chemoinformatics, A.V. Bogatsky Physical-Chemical Institute NAS of Ukraine, Odessa, 65080 Ukraine
- Department of Medical Chemistry, Odessa National Medical University, Odessa, 65082 Ukraine
| | - Sergij Stelmakh
- Department of Molecular Structures and Chemoinformatics, A.V. Bogatsky Physical-Chemical Institute NAS of Ukraine, Odessa, 65080 Ukraine
| | - Zoe L. Sessions
- UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC 27599 USA
| | - Eugene N. Muratov
- UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC 27599 USA
- Department of Pharmaceutical Sciences, Federal University of Paraiba, Joao Pessoa, PB 58059 Brazil
| |
|
33
|
Jiménez-Luna J, Grisoni F, Weskamp N, Schneider G. Artificial intelligence in drug discovery: recent advances and future perspectives. Expert Opin Drug Discov 2021; 16:949-959. [PMID: 33779453 DOI: 10.1080/17460441.2021.1909567] [Citation(s) in RCA: 83] [Impact Index Per Article: 27.7] [Indexed: 02/06/2023]
Abstract
Introduction: Artificial intelligence (AI) has inspired computer-aided drug discovery. The widespread adoption of machine learning, in particular deep learning, in multiple scientific disciplines, and the advances in computing hardware and software, among other factors, continue to fuel this development. Much of the initial skepticism regarding applications of AI in pharmaceutical discovery has started to vanish, consequently benefitting medicinal chemistry. Areas covered: The current status of AI in chemoinformatics is reviewed. The topics discussed herein include quantitative structure-activity/property relationship and structure-based modeling, de novo molecular design, and chemical synthesis prediction. Advantages and limitations of current deep learning applications are highlighted, together with a perspective on next-generation AI for drug discovery. Expert opinion: Deep learning-based approaches have only begun to address some fundamental problems in drug discovery. Certain methodological advances, such as message-passing models, spatial-symmetry-preserving networks, hybrid de novo design, and other innovative machine learning paradigms, will likely become commonplace and help address some of the most challenging questions. Open data sharing and model development will play a central role in the advancement of drug discovery with AI.
Affiliation(s)
- José Jiménez-Luna
- Department of Chemistry and Applied Biosciences, ETH Zurich, Zurich, Switzerland
| | - Francesca Grisoni
- Department of Chemistry and Applied Biosciences, ETH Zurich, Zurich, Switzerland
| | - Nils Weskamp
- Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an Der Riss, Germany
| | - Gisbert Schneider
- Department of Chemistry and Applied Biosciences, ETH Zurich, Zurich, Switzerland
| |
|
34
|
Jiang D, Wu Z, Hsieh CY, Chen G, Liao B, Wang Z, Shen C, Cao D, Wu J, Hou T. Could graph neural networks learn better molecular representation for drug discovery? A comparison study of descriptor-based and graph-based models. J Cheminform 2021; 13:12. [PMID: 33597034 PMCID: PMC7888189 DOI: 10.1186/s13321-020-00479-8] [Citation(s) in RCA: 162] [Impact Index Per Article: 54.0] [Received: 09/21/2020] [Accepted: 11/26/2020] [Indexed: 12/31/2022]
Abstract
Graph neural networks (GNNs) have been considered an attractive modelling method for molecular property prediction, and numerous studies have shown that GNNs could yield more promising results than traditional descriptor-based methods. In this study, based on 11 public datasets covering various property endpoints, the predictive capacity and computational efficiency of the prediction models developed by eight machine learning (ML) algorithms, including four descriptor-based models (SVM, XGBoost, RF and DNN) and four graph-based models (GCN, GAT, MPNN and Attentive FP), were extensively tested and compared. The results demonstrate that on average the descriptor-based models outperform the graph-based models in terms of prediction accuracy and computational efficiency. SVM generally achieves the best predictions for the regression tasks. Both RF and XGBoost can achieve reliable predictions for the classification tasks, and some of the graph-based models, such as Attentive FP and GCN, can yield outstanding performance for a fraction of larger or multi-task datasets. In terms of computational cost, XGBoost and RF are the two most efficient algorithms and only need a few seconds to train a model even for a large dataset. The model interpretations by the SHAP method can effectively explore the established domain knowledge for the descriptor-based models. Finally, we explored the use of these models for virtual screening (VS) towards HIV and demonstrated that different ML algorithms offer diverse VS profiles. All in all, we believe that the off-the-shelf descriptor-based models can still be directly employed to accurately predict various chemical endpoints with excellent computability and interpretability.
Affiliation(s)
- Dejun Jiang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China.,State Key Lab of CAD & CG, Zhejiang University, Hangzhou, 310058, Zhejiang, China.,College of Computer Science and Technology, Zhejiang University, Hangzhou, China
| | - Zhenxing Wu
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
| | - Chang-Yu Hsieh
- Tencent Quantum Laboratory Tencent, Shenzhen, 518057, Guangdong, China
| | - Guangyong Chen
- Shenzhen Institutes of Advanced Technology, Shenzhen, 518055, Guangdong, China
| | - Ben Liao
- Tencent Quantum Laboratory Tencent, Shenzhen, 518057, Guangdong, China
| | - Zhe Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
| | - Chao Shen
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
| | - Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410004, Hunan, China.
| | - Jian Wu
- College of Computer Science and Technology, Zhejiang University, Hangzhou, China.
| | - Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China.,State Key Lab of CAD & CG, Zhejiang University, Hangzhou, 310058, Zhejiang, China.
| |
|
35
|
Jain S, Siramshetty VB, Alves VM, Muratov EN, Kleinstreuer N, Tropsha A, Nicklaus MC, Simeonov A, Zakharov AV. Large-Scale Modeling of Multispecies Acute Toxicity End Points Using Consensus of Multitask Deep Learning Methods. J Chem Inf Model 2021; 61:653-663. [PMID: 33533614 DOI: 10.1021/acs.jcim.0c01164] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Indexed: 01/01/2023]
Abstract
Computational methods to predict molecular properties regarding safety and toxicology represent alternative approaches to expedite drug development, screen environmental chemicals, and thus significantly reduce associated time and costs. There is a strong need and interest in the development of computational methods that yield reliable predictions of toxicity, and many approaches, including the recently introduced deep neural networks, have been leveraged towards this goal. Herein, we report on the collection, curation, and integration of data from the public data sets that were the source of the ChemIDplus database for systemic acute toxicity. These efforts generated the largest publicly available such data set comprising > 80,000 compounds measured against a total of 59 acute systemic toxicity end points. This data was used for developing multiple single- and multitask models utilizing random forest, deep neural networks, convolutional, and graph convolutional neural network approaches. For the first time, we also reported the consensus models based on different multitask approaches. To the best of our knowledge, prediction models for 36 of the 59 end points have never been published before. Furthermore, our results demonstrated a significantly better performance of the consensus model obtained from three multitask learning approaches that particularly predicted the 29 smaller tasks (less than 300 compounds) better than other models developed in the study. The curated data set and the developed models have been made publicly available at https://github.com/ncats/ld50-multitask, https://predictor.ncats.io/, and https://cactus.nci.nih.gov/download/acute-toxicity-db (data set only) to support regulatory and research applications.
Affiliation(s)
- Sankalp Jain
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
| | - Vishal B Siramshetty
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
| | - Vinicius M Alves
- UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
| | - Eugene N Muratov
- UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
| | - Nicole Kleinstreuer
- Division of Intramural Research, Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, 111 T.W. Alexander Drive, Durham, North Carolina 27709, United States.,National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods, National Institute of Environmental Health Sciences, 111 T.W. Alexander Drive, Durham, North Carolina 27709, United States
| | - Alexander Tropsha
- UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
| | - Marc C Nicklaus
- Computer-Aided Drug Design (CADD) Group, Chemical Biology Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, DHHS, NCI-Frederick, 376 Boyles Street, Frederick, Maryland 21702, United States
| | - Anton Simeonov
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
| | - Alexey V Zakharov
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
| |
|
36
|
Wu L, Huang R, Tetko IV, Xia Z, Xu J, Tong W. Trade-off Predictivity and Explainability for Machine-Learning Powered Predictive Toxicology: An in-Depth Investigation with Tox21 Data Sets. Chem Res Toxicol 2021; 34:541-549. [PMID: 33513003 DOI: 10.1021/acs.chemrestox.0c00373] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Indexed: 11/30/2022]
Abstract
Selecting a model in predictive toxicology often involves a trade-off between prediction performance and explainability: should we sacrifice model performance to gain explainability, or vice versa? Here we present a comprehensive study to assess algorithm and feature influences on model performance in chemical toxicity research. We built over 5000 models for a Tox21 bioassay data set of 65 assays and ∼7600 compounds. Seven molecular representations as features and 12 modeling approaches varying in complexity and explainability were employed to systematically investigate the impact of various factors on model performance and explainability. We demonstrated that end points dictated a model's performance, regardless of the chosen modeling approach including deep learning and chemical features. Overall, more complex models such as (LS-)SVM and Random Forest performed marginally better than simpler models such as linear regression and KNN in the presented Tox21 data analysis. Since a simpler model with acceptable performance is often also easy to interpret for the Tox21 data set, it clearly was the preferred choice due to its better explainability. Given that each data set had its own error structure both for dependent and independent variables, we strongly recommend conducting a systematic study with a broad range of model complexity and feature explainability to identify a model that balances predictivity and explainability.
Affiliation(s)
- Leihong Wu
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, FDA, 3900 NCTR Road, Jefferson, Arkansas 72079, United States
| | - Ruili Huang
- Division of Preclinical Innovation, National Center for Advancing Translational Sciences, National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
| | - Igor V Tetko
- Institute of Structural Biology, Helmholtz Zentrum München-Research Center for Environmental Health (GmbH), Ingolstädter Landstraße 1, 85764 Neuherberg, Germany.,BIGCHEM GmbH, Valerystraße 49, DE-85716 Unterschleißheim, Germany
| | - Zhonghua Xia
- Institute of Structural Biology, Helmholtz Zentrum München-Research Center for Environmental Health (GmbH), Ingolstädter Landstraße 1, 85764 Neuherberg, Germany
| | - Joshua Xu
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, FDA, 3900 NCTR Road, Jefferson, Arkansas 72079, United States
| | - Weida Tong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, FDA, 3900 NCTR Road, Jefferson, Arkansas 72079, United States
| |
|
37
|
Semenyuta IV, Trush MM, Kovalishyn VV, Rogalsky SP, Hodyna DM, Karpov P, Xia Z, Tetko IV, Metelytsia LO. Structure-Activity Relationship Modeling and Experimental Validation of the Imidazolium and Pyridinium Based Ionic Liquids as Potential Antibacterials of MDR Acinetobacter Baumannii and Staphylococcus Aureus. Int J Mol Sci 2021; 22:ijms22020563. [PMID: 33429999 PMCID: PMC7827895 DOI: 10.3390/ijms22020563] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 12/07/2020] [Revised: 12/29/2020] [Accepted: 01/05/2021] [Indexed: 12/31/2022]
Abstract
The Online Chemical Modeling Environment (OCHEM) was used for QSAR analysis of a set of ionic liquids (ILs) tested against multi-drug resistant (MDR) clinical isolates of Acinetobacter baumannii and Staphylococcus aureus. The regression models achieved a coefficient of determination q² = 0.66-0.79 under cross-validation and on independent test sets. The models were used to screen a virtual chemical library of ILs designed for targeted activity against MDR Acinetobacter baumannii and Staphylococcus aureus strains. The seven most promising ILs were selected, synthesized, and tested. Three ILs showed high activity against both MDR clinical isolates.
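For readers unfamiliar with the statistic quoted in this abstract, the following minimal sketch shows one common way to compute a coefficient of determination (q squared) from held-out predictions. It assumes the predictions come from cross-validation folds or an independent test set and uses the mean of the evaluated values as the reference; OCHEM's exact definition may differ, and the function name `q2` is hypothetical.

```python
import numpy as np

def q2(y_true, y_pred):
    """Coefficient of determination for held-out predictions.

    q2 = 1 - PRESS / SS_tot, where PRESS is the sum of squared prediction
    errors and SS_tot is the sum of squared deviations from the mean of y_true.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    press = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - press / ss_tot

# Perfect predictions give 1.0; always predicting the mean gives 0.0.
print(q2([1, 2, 3, 4], [1, 2, 3, 4]))
print(q2([1, 2, 3, 4], [2.5, 2.5, 2.5, 2.5]))
```

Under this definition, values of 0.66 to 0.79 mean the models account for roughly two thirds to four fifths of the variance in the held-out activity values.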
Affiliation(s)
- Ivan V. Semenyuta
- V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry, National Academy of Science of Ukraine, 1 Murmanska Street, 02660 Kyiv, Ukraine; (I.V.S.); (M.M.T.); (V.V.K.); (S.P.R.); (D.M.H.); (L.O.M.)
| | - Maria M. Trush
- V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry, National Academy of Science of Ukraine, 1 Murmanska Street, 02660 Kyiv, Ukraine; (I.V.S.); (M.M.T.); (V.V.K.); (S.P.R.); (D.M.H.); (L.O.M.)
| | - Vasyl V. Kovalishyn
- V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry, National Academy of Science of Ukraine, 1 Murmanska Street, 02660 Kyiv, Ukraine; (I.V.S.); (M.M.T.); (V.V.K.); (S.P.R.); (D.M.H.); (L.O.M.)
| | - Sergiy P. Rogalsky
- V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry, National Academy of Science of Ukraine, 1 Murmanska Street, 02660 Kyiv, Ukraine; (I.V.S.); (M.M.T.); (V.V.K.); (S.P.R.); (D.M.H.); (L.O.M.)
| | - Diana M. Hodyna
- V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry, National Academy of Science of Ukraine, 1 Murmanska Street, 02660 Kyiv, Ukraine; (I.V.S.); (M.M.T.); (V.V.K.); (S.P.R.); (D.M.H.); (L.O.M.)
| | - Pavel Karpov
- Institute of Structural Biology, Helmholtz Zentrum München—German Research Center for Environmental Health (GmbH), Ingolstädter Landstraße 1, D-85764 Neuherberg, Germany; (P.K.); (Z.X.)
| | - Zhonghua Xia
- Institute of Structural Biology, Helmholtz Zentrum München—German Research Center for Environmental Health (GmbH), Ingolstädter Landstraße 1, D-85764 Neuherberg, Germany; (P.K.); (Z.X.)
| | - Igor V. Tetko
- Institute of Structural Biology, Helmholtz Zentrum München—German Research Center for Environmental Health (GmbH), Ingolstädter Landstraße 1, D-85764 Neuherberg, Germany; (P.K.); (Z.X.)
- BIGCHEM GmbH, Unterschleißheim, Valerystr. 49, D-85716 Neuherberg, Germany
- Correspondence: ; Tel.: +49-89-3187-3575
| | - Larisa O. Metelytsia
- V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry, National Academy of Science of Ukraine, 1 Murmanska Street, 02660 Kyiv, Ukraine; (I.V.S.); (M.M.T.); (V.V.K.); (S.P.R.); (D.M.H.); (L.O.M.)
| |
Collapse
|
38
|
Antelo-Collado A, Carrasco-Velar R, García-Pedrajas N, Cerruela-García G. Effective Feature Selection Method for Class-Imbalance Datasets Applied to Chemical Toxicity Prediction. J Chem Inf Model 2020; 61:76-94. [PMID: 33350301 DOI: 10.1021/acs.jcim.0c00908] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/29/2022]
Abstract
During the drug development process, it is common to carry out toxicity tests and adverse effect studies, which are essential to guarantee patient safety and the success of the research. The use of in silico quantitative structure-activity relationship (QSAR) approaches for this task involves processing a huge amount of data that, in many cases, have an imbalanced distribution of active and inactive samples. This is usually termed the class-imbalance problem and may have a significant negative effect on the performance of the learned models. The performance of feature selection (FS) for QSAR models is usually damaged by the class-imbalance nature of the involved datasets. This paper proposes the use of an FS method focused on dealing with the class-imbalance problems. The method is based on the use of FS ensembles constructed by boosting and using two well-known FS methods, fast clustering-based FS and the fast correlation-based filter. The experimental results demonstrate the efficiency of the proposal in terms of the classification performance compared to standard methods. The proposal can be extended to other FS methods and applied to other problems in cheminformatics.
Affiliation(s)
- Nicolás García-Pedrajas
- Department of Computing and Numerical Analysis, University of Córdoba, Campus de Rabanales, Albert Einstein Building, E-14071 Córdoba, Spain
- Gonzalo Cerruela-García
- Department of Computing and Numerical Analysis, University of Córdoba, Campus de Rabanales, Albert Einstein Building, E-14071 Córdoba, Spain
|
39
|
Semenova E, Williams DP, Afzal AM, Lazic SE. A Bayesian neural network for toxicity prediction. Comput Toxicol 2020. [DOI: 10.1016/j.comtox.2020.100133] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 12/14/2022]
|
40
|
Drug efficacy and toxicity prediction: an innovative application of transcriptomic data. Cell Biol Toxicol 2020; 36:591-602. [PMID: 32780246 PMCID: PMC7661398 DOI: 10.1007/s10565-020-09552-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/28/2019] [Accepted: 08/03/2020] [Indexed: 02/07/2023]
Abstract
Drug toxicity and efficacy are difficult to predict partly because they are both poorly defined, which I aim to remedy here from a transcriptomic perspective. There are two major categories of drugs: (1) restorative drugs aiming to restore an abnormal cell, tissue, or organ to normal function (e.g., restoring normal membrane function of epithelial cells in cystic fibrosis), and (2) disruptive drugs aiming to kill pathogens or malignant cells. These two types of drugs require different definitions of efficacy and toxicity. I outline rationales for defining transcriptomic efficacy and toxicity and illustrate their application numerically with two sets of transcriptomic data, one for restorative drugs (treating cystic fibrosis with lumacaftor/ivacaftor, aiming to restore the cellular function of epithelial cells) and the other for disruptive drugs (treating acute myeloid leukemia with prexasertib). The conceptual framework presented will help and sensitize researchers to collect the data required for determining drug toxicity.
|
41
|
Sosnina EA, Sosnin S, Nikitina AA, Nazarov I, Osolodkin DI, Fedorov MV. Recommender Systems in Antiviral Drug Discovery. ACS Omega 2020; 5:15039-15051. [PMID: 32632398 PMCID: PMC7315437 DOI: 10.1021/acsomega.0c00857] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 02/26/2020] [Accepted: 06/03/2020] [Indexed: 06/11/2023]
Abstract
Recommender systems (RSs), which underwent rapid development and had an enormous impact on e-commerce, have the potential to become useful tools for drug discovery. In this paper, we applied RS methods for the prediction of the antiviral activity class (active/inactive) for compounds extracted from ChEMBL. Two main RS approaches were applied: collaborative filtering (Surprise implementation) and content-based filtering (sparse-group inductive matrix completion (SGIMC) method). The effectiveness of RS approaches was investigated for prediction of antiviral activity classes ("interactions") for compounds and viruses, for which some of their interactions with other viruses or compounds are known, and for prediction of interaction profiles for new compounds. Both approaches achieved relatively good prediction quality for binary classification of individual interactions and compound profiles, as quantified by cross-validation and external validation receiver operating characteristic (ROC) score >0.9. Thus, even simple recommender systems may serve as an effective tool in antiviral drug discovery.
Affiliation(s)
- Ekaterina A. Sosnina
- Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Bolshoy Boulevard 30/1, Moscow 143026, Russia
- Institute of Physiologically Active Compounds, RAS, Severniy pr. 1, Chernogolovka 142432, Russia
- Sergey Sosnin
- Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Bolshoy Boulevard 30/1, Moscow 143026, Russia
- Syntelly LLC, Skolkovo Innovation Center, Bolshoy Boulevard 30, Moscow 121205, Russia
- Anastasia A. Nikitina
- Department of Chemistry, Lomonosov Moscow State University, Leninskie Gory 1 bd. 3, Moscow 119991, Russia
- FSBSI “Chumakov FSC R&D IBP RAS”, Poselok Instituta Poliomielita 8 bd. 1, Poselenie Moskovsky, Moscow 108819, Russia
- Ivan Nazarov
- Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Bolshoy Boulevard 30/1, Moscow 143026, Russia
- Dmitry I. Osolodkin
- FSBSI “Chumakov FSC R&D IBP RAS”, Poselok Instituta Poliomielita 8 bd. 1, Poselenie Moskovsky, Moscow 108819, Russia
- Institute of Translational Medicine and Biotechnology, Sechenov First Moscow State Medical University, Trubetskaya Ulitsa 8, Moscow 119991, Russia
- Maxim V. Fedorov
- Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Bolshoy Boulevard 30/1, Moscow 143026, Russia
- Syntelly LLC, Skolkovo Innovation Center, Bolshoy Boulevard 30, Moscow 121205, Russia
- Physics, John Anderson Building, University of Strathclyde, 107 Rottenrow East, Glasgow G4 0NG, U.K.
|
42
|
Muratov EN, Bajorath J, Sheridan RP, Tetko IV, Filimonov D, Poroikov V, Oprea TI, Baskin II, Varnek A, Roitberg A, Isayev O, Curtarolo S, Fourches D, Cohen Y, Aspuru-Guzik A, Winkler DA, Agrafiotis D, Cherkasov A, Tropsha A. QSAR without borders. Chem Soc Rev 2020; 49:3525-3564. [PMID: 32356548 PMCID: PMC8008490 DOI: 10.1039/d0cs00098a] [Citation(s) in RCA: 312] [Impact Index Per Article: 78.0] [Indexed: 12/15/2022]
Abstract
Prediction of chemical bioactivity and physical properties has been one of the most important applications of statistical and more recently, machine learning and artificial intelligence methods in chemical sciences. This field of research, broadly known as quantitative structure-activity relationships (QSAR) modeling, has developed many important algorithms and has found a broad range of applications in physical organic and medicinal chemistry in the past 55+ years. This Perspective summarizes recent technological advances in QSAR modeling but it also highlights the applicability of algorithms, modeling methods, and validation practices developed in QSAR to a wide range of research areas outside of traditional QSAR boundaries including synthesis planning, nanotechnology, materials science, biomaterials, and clinical informatics. As modern research methods generate rapidly increasing amounts of data, the knowledge of robust data-driven modelling methods professed within the QSAR field can become essential for scientists working both within and outside of chemical research. We hope that this contribution highlighting the generalizable components of QSAR modeling will serve to address this challenge.
Affiliation(s)
- Eugene N Muratov
- UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, USA.
|
43
|
Affiliation(s)
- Günter Klambauer
- Johannes Kepler University, LIT AI Lab & Institute for Machine Learning, 4040 Linz, Austria
- Sepp Hochreiter
- Johannes Kepler University, LIT AI Lab & Institute for Machine Learning, 4040 Linz, Austria
- Matthias Rarey
- Universität Hamburg, ZBH-Center for Bioinformatics, 20146 Hamburg, Germany
|
44
|
Li X, Fourches D. Inductive transfer learning for molecular activity prediction: Next-Gen QSAR Models with MolPMoFiT. J Cheminform 2020; 12:27. [PMID: 33430978 PMCID: PMC7178569 DOI: 10.1186/s13321-020-00430-x] [Citation(s) in RCA: 50] [Impact Index Per Article: 12.5] [Received: 01/29/2020] [Accepted: 04/15/2020] [Indexed: 12/25/2022]
Abstract
Deep neural networks can directly learn from chemical structures without extensive, user-driven selection of descriptors in order to predict molecular properties/activities with high reliability. But these approaches typically require large training sets to learn the endpoint-specific structural features and ensure reasonable prediction accuracy. Even though large datasets are becoming the new normal in drug discovery, especially when it comes to high-throughput screening or metabolomics datasets, one should also consider smaller datasets with challenging endpoints to model and forecast. Thus, it would be highly relevant to better utilize the tremendous compendium of unlabeled compounds from publicly-available datasets for improving the model performances for the user’s particular series of compounds. In this study, we propose the Molecular Prediction Model Fine-Tuning (MolPMoFiT) approach, an effective transfer learning method based on self-supervised pre-training + task-specific fine-tuning for QSPR/QSAR modeling. A large-scale molecular structure prediction model is pre-trained using one million unlabeled molecules from ChEMBL in a self-supervised learning manner, and can then be fine-tuned on various QSPR/QSAR tasks for smaller chemical datasets with specific endpoints. Herein, the method is evaluated on four benchmark datasets (lipophilicity, FreeSolv, HIV, and blood–brain barrier penetration). The results showed the method can achieve strong performances for all four datasets compared to other state-of-the-art machine learning modeling techniques reported in the literature so far.
Affiliation(s)
- Xinhao Li
- Department of Chemistry, Bioinformatics Research Center, North Carolina State University, Raleigh, NC, 27695, USA
- Denis Fourches
- Department of Chemistry, Bioinformatics Research Center, North Carolina State University, Raleigh, NC, 27695, USA
|
45
|
Hemmerich J, Ecker GF. In silico toxicology: From structure–activity relationships towards deep learning and adverse outcome pathways. Wiley Interdisciplinary Reviews: Computational Molecular Science 2020; 10:e1475. [PMID: 35866138 PMCID: PMC9286356 DOI: 10.1002/wcms.1475] [Citation(s) in RCA: 51] [Impact Index Per Article: 12.8] [Received: 01/07/2020] [Revised: 03/09/2020] [Accepted: 03/10/2020] [Indexed: 12/18/2022]
Abstract
In silico toxicology is an emerging field. It is gaining importance as research aims to decrease the use of animal experiments, as suggested in the 3R principles by Russell and Burch. In silico toxicology is a means to identify hazards of compounds before synthesis, and thus in very early stages of drug development. For chemical industries, as well as regulatory agencies, it can aid in gap-filling and guide risk minimization strategies. Techniques such as structural alerts, read-across, quantitative structure–activity relationships, machine learning, and deep learning allow the use of in silico toxicology in many cases, some even when data are scarce. In particular, the concept of adverse outcome pathways puts all techniques into a broader context and can elucidate predictions through mechanistic insights. This article is categorized under: Structure and Mechanism > Computational Biochemistry and Biophysics; Data Science > Chemoinformatics
Affiliation(s)
- Jennifer Hemmerich
- Department of Pharmaceutical Chemistry, University of Vienna, Vienna, Austria
- Gerhard F. Ecker
- Department of Pharmaceutical Chemistry, University of Vienna, Vienna, Austria
|
46
|
Karpov P, Godin G, Tetko IV. Transformer-CNN: Swiss knife for QSAR modeling and interpretation. J Cheminform 2020; 12:17. [PMID: 33431004 PMCID: PMC7079452 DOI: 10.1186/s13321-020-00423-w] [Citation(s) in RCA: 107] [Impact Index Per Article: 26.8] [Received: 10/12/2019] [Accepted: 03/09/2020] [Indexed: 01/03/2023]
Abstract
We present SMILES-embeddings derived from the internal encoder state of a Transformer [1] model trained to canonize SMILES as a Seq2Seq problem. Using a CharNN [2] architecture upon the embeddings results in higher quality interpretable QSAR/QSPR models on diverse benchmark datasets including regression and classification tasks. The proposed Transformer-CNN method uses SMILES augmentation for training and inference, and thus the prognosis is based on an internal consensus. That both the augmentation and transfer learning are based on embeddings allows the method to provide good results for small datasets. We discuss the reasons for such effectiveness and draft future directions for the development of the method. The source code and the embeddings needed to train a QSAR model are available on https://github.com/bigchem/transformer-cnn. The repository also has a standalone program for QSAR prognosis which calculates individual atoms contributions, thus interpreting the model’s result. OCHEM [3] environment (https://ochem.eu) hosts the on-line implementation of the method proposed.
Affiliation(s)
- Pavel Karpov
- Institute of Structural Biology, Helmholtz Zentrum München-Research Center for Environmental Health (GmbH), Ingolstädter Landstraße 1, 85764, Neuherberg, Germany; BIGCHEM GmbH, Ingolstädter Landstraße 1, 85764, Neuherberg, Germany
- Guillaume Godin
- Firmenich International SA, Digital Lab, Geneva, Lausanne, Switzerland
- Igor V Tetko
- Institute of Structural Biology, Helmholtz Zentrum München-Research Center for Environmental Health (GmbH), Ingolstädter Landstraße 1, 85764, Neuherberg, Germany; BIGCHEM GmbH, Ingolstädter Landstraße 1, 85764, Neuherberg, Germany
|
47
|
Karlov D, Sosnin S, Fedorov MV, Popov P. graphDelta: MPNN Scoring Function for the Affinity Prediction of Protein-Ligand Complexes. ACS Omega 2020; 5:5150-5159. [PMID: 32201802 PMCID: PMC7081425 DOI: 10.1021/acsomega.9b04162] [Citation(s) in RCA: 47] [Impact Index Per Article: 11.8] [Received: 12/06/2019] [Accepted: 02/21/2020] [Indexed: 06/04/2023]
Abstract
In this work, we present graph-convolutional neural networks for the prediction of binding constants of protein-ligand complexes. We derived the model using multitask learning, where the target variables are the dissociation constant (Kd), inhibition constant (Ki), and half maximal inhibitory concentration (IC50). Being rigorously trained on the PDBbind dataset, the model achieves a Pearson correlation coefficient of 0.87 and an RMSE of 1.05 in pK units, outperforming the recently developed 3D convolutional neural network model KDEEP.
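The two figures of merit quoted in this abstract can be illustrated with a minimal sketch. It assumes binding constants are given in molar units and converted to pK = -log10(K); the function names and the toy values are hypothetical and are not taken from the graphDelta paper or its code.

```python
import math

def to_pk(constant_molar):
    """Convert a binding constant in mol/L (Kd, Ki or IC50) to pK units."""
    return -math.log10(constant_molar)

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def rmse(x, y):
    """Root-mean-square error between predictions and reference values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

# Hypothetical example: three experimental constants and three predicted pK values.
experimental_pk = [to_pk(k) for k in (1e-9, 1e-7, 1e-5)]
predicted_pk = [8.5, 7.2, 5.4]
print(pearson_r(experimental_pk, predicted_pk), rmse(experimental_pk, predicted_pk))
```

Because both metrics are computed on the logarithmic pK scale, an RMSE of about 1 corresponds to roughly a tenfold error in the underlying constant.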
Affiliation(s)
- Dmitry S. Karlov
- Skolkovo Institute of Science and Technology, Moscow 143026, Russia
- Sergey Sosnin
- Skolkovo Institute of Science and Technology, Moscow 143026, Russia
- Skolkovo Innovation Center, Syntelly LLC, 42 Bolshoy Boulevard, Moscow 143026, Russia
- Maxim V. Fedorov
- Skolkovo Institute of Science and Technology, Moscow 143026, Russia
- Skolkovo Innovation Center, Syntelly LLC, 42 Bolshoy Boulevard, Moscow 143026, Russia
- University of Strathclyde, Physics, John Anderson Building, 107 Rottenrow East, Glasgow G4 0NG, U.K.
- Petr Popov
- Skolkovo Institute of Science and Technology, Moscow 143026, Russia
- Moscow Institute of Physics and Technology, Dolgoprudny 141701, Russia
|
48
|
Xu T, Ngan DK, Ye L, Xia M, Xie HQ, Zhao B, Simeonov A, Huang R. Predictive Models for Human Organ Toxicity Based on In Vitro Bioactivity Data and Chemical Structure. Chem Res Toxicol 2020; 33:731-741. [PMID: 32077278 PMCID: PMC10926239 DOI: 10.1021/acs.chemrestox.9b00305] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Indexed: 11/30/2022]
Abstract
Traditional toxicity testing reliant on animal models is costly and low throughput, posing a significant challenge with the increasing numbers of chemicals that humans are exposed to in the environment. The purpose of this investigation was to build optimal prediction models for various human in vivo/organ-level toxicity end points (extracted from ChemIDPlus) using chemical structure and Tox21 in vitro quantitative high-throughput screening (qHTS) bioactivity assay data. Several supervised machine learning algorithms were applied to model 14 human toxicity end points pertaining to vascular, kidney, ureter and bladder, and liver organ systems. Three metrics were used to evaluate model performance: area under the receiver operating characteristic curve (AUC-ROC), balanced accuracy (BA), and Matthews correlation coefficient (MCC). The top four models, with AUC-ROC values >0.8, were derived for endocrine (0.90 ± 0.00), musculoskeletal (0.88 ± 0.02), peripheral nerve and sensation (0.85 ± 0.01), and brain and coverings (0.83 ± 0.02) toxicities, whereas the best model AUC-ROC values were >0.7 for the remaining 10 toxicities. Model performance was found to be dependent on the specific data set, model type, and feature selection method used. In addition, chemical structure and assay data showed different levels of contribution to the prediction of different toxicity end points. Although in vitro assay data, when combined with chemical structure, slightly improved the predictive accuracy for most end points (11 out of 14), a noteworthy finding was the near equal success of the structure-only models, which do not require Tox21 qHTS screening data, and the relatively poor performance of assay-only models. Thus, the top-performing structure-only models from this study could be applied for hazard screening of large sets of chemicals for potential human toxicity, whereas the largest assay contributions to models (i.e., cellular targets) could be used, along with the top-contributing structural features, to provide insight into toxicity mechanisms.
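The three evaluation metrics named in this abstract can be made concrete with a small sketch. It uses the standard textbook definitions (AUC-ROC as the fraction of correctly ranked positive/negative pairs, BA as the mean of sensitivity and specificity, MCC from the confusion matrix); the labels, scores, and the 0.5 decision threshold are hypothetical and do not come from the study.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 1 (toxic) and 0 (non-toxic)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity; insensitive to class imbalance."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp / (tp + fn) + tn / (tn + fp)) / 2

def matthews_corrcoef(y_true, y_pred):
    """MCC in [-1, 1]; returns 0.0 when any marginal count is zero."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def auc_roc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical imbalanced example: 3 toxic and 5 non-toxic compounds.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1, 0.4, 0.05]
calls = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5
print(auc_roc(labels, scores), balanced_accuracy(labels, calls), matthews_corrcoef(labels, calls))
```

AUC-ROC is computed from the raw scores, whereas BA and MCC depend on the chosen threshold, which is one reason a study may report all three side by side.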
Affiliation(s)
- Tuan Xu
- Division of Pre-clinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Rockville, MD 20850, USA
- Deborah K. Ngan
- Division of Pre-clinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Rockville, MD 20850, USA
- Lin Ye
- Division of Pre-clinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Rockville, MD 20850, USA
- Menghang Xia
- Division of Pre-clinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Rockville, MD 20850, USA
- Heidi Q. Xie
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center of Eco-Environment Sciences, Chinese Academy of Sciences, Beijing 100085, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Bin Zhao
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center of Eco-Environment Sciences, Chinese Academy of Sciences, Beijing 100085, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Anton Simeonov
- Division of Pre-clinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Rockville, MD 20850, USA
- Ruili Huang
- Division of Pre-clinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Rockville, MD 20850, USA
|
49
|
Tetko IV, Tropsha A. Joint Virtual Special Issue on Computational Toxicology. J Chem Inf Model 2020; 60:1069-1071. [PMID: 32101004 DOI: 10.1021/acs.jcim.0c00140] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 01/18/2023]
Affiliation(s)
- Igor V Tetko
- Institute of Structural Biology, Helmholtz Zentrum Munchen Deutsches Forschungszentrum fur Umwelt und Gesundheit, Munich 27599, Germany
- Alexander Tropsha
- UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
|
50
|
Cheminformatics Explorations of Natural Products. Progress in the Chemistry of Organic Natural Products 2019; 110:1-35. [PMID: 31621009 DOI: 10.1007/978-3-030-14632-0_1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 12/19/2022]
Abstract
The chemistry of natural products is fascinating and has continuously attracted the attention of the scientific community for many reasons including, but not limited to, biosynthesis pathways, chemical diversity, the source of bioactive compounds and their marked impact on drug discovery. There is a broad range of experimental and computational techniques (molecular modeling and cheminformatics) that have evolved over the years and have assisted the investigation of natural products. Herein, we discuss cheminformatics strategies to explore the chemistry and applications of natural products. Since the potential synergisms between cheminformatics and natural products are vast, we will focus on three major aspects: (1) exploration of the chemical space of natural products to identify bioactive compounds, with emphasis on drug discovery; (2) assessment of the toxicity profile of natural products; and (3) diversity analysis of natural product collections and the design of chemical collections inspired by natural sources.
|