1. Hamakawa Y, Miyao T. Understanding Conformation Importance in Data-Driven Property Prediction Models. J Chem Inf Model 2025; 65:3388-3404. [PMID: 40099781 PMCID: PMC12004525 DOI: 10.1021/acs.jcim.5c00018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2025] [Revised: 02/26/2025] [Accepted: 02/28/2025] [Indexed: 03/20/2025]
Abstract
The prediction of molecular properties is essential in chemoinformatics and has many applications in drug discovery and materials design. Molecular representations play a key role in the prediction models to achieve high prediction accuracy. Nevertheless, appropriate molecular descriptors, including the utilization of conformational information, have been unclear due to a lack of systematic analysis of property prediction models and control. This study investigates the influence of using multiple conformers in machine learning-based property prediction, comparing two- and three-dimensional descriptors using three independent data sets: a large-scale quantum mechanical property, a medium-scale melting point, and small-scale enantioselective chemical reaction data sets. One unique aspect of this study is creating these carefully controlled data sets for models' performance evaluation in conformational diversity and the target property's dependence on conformation. Our findings show that using all available conformers as simple data augmentation consistently achieves high prediction accuracy among aggregation approaches, followed by mean aggregation. Furthermore, Uni-Mol, an end-to-end prediction model utilizing atomic coordinates and elements, combined with the ground-truth conformation, significantly outperformed traditional 2D and 3D descriptors and predicted conformational-sensitive properties with high accuracy. Although the prediction accuracy of the Uni-Mol model significantly decreased using the wrong conformers, it still outperformed two-dimensional extended connectivity fingerprints, which showed higher prediction accuracy than most of the tested 3D descriptors.
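The two strategies this abstract ranks highest, treating every conformer as its own training row (data augmentation) and averaging per-conformer descriptors (mean aggregation), can be sketched roughly as follows. The molecule IDs, descriptor vectors, property values, and function names are illustrative assumptions, not taken from the paper.

```python
from statistics import fmean

# Hypothetical toy data: molecule id -> list of per-conformer descriptor vectors.
conformer_descriptors = {
    "mol_a": [[1.0, 2.0], [3.0, 4.0]],  # two conformers
    "mol_b": [[0.5, 0.5]],              # one conformer
}
# Hypothetical property value per molecule.
targets = {"mol_a": 10.0, "mol_b": 20.0}


def augment(descriptors, y):
    """Data augmentation: each conformer becomes its own training row
    and inherits the label of its parent molecule."""
    rows = []
    for mol_id, conformers in descriptors.items():
        for vec in conformers:
            rows.append((vec, y[mol_id]))
    return rows


def mean_aggregate(descriptors, y):
    """Mean aggregation: one row per molecule, averaging each descriptor
    dimension over all conformers of that molecule."""
    rows = []
    for mol_id, conformers in descriptors.items():
        mean_vec = [fmean(dim) for dim in zip(*conformers)]
        rows.append((mean_vec, y[mol_id]))
    return rows


augmented_rows = augment(conformer_descriptors, targets)
aggregated_rows = mean_aggregate(conformer_descriptors, targets)
```

Under augmentation the training set grows with the number of conformers, while mean aggregation keeps one row per molecule. How per-conformer predictions are combined at test time is not specified in the abstract and is left out of this sketch.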
Affiliation(s)
- Yu Hamakawa
  - Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan
- Tomoyuki Miyao
  - Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan
  - Data Science Center, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan
2. Rosa LS, Argolo CO, Nascimento CM, Pimentel AS. Identifying Substructures That Facilitate Compounds to Penetrate the Blood-Brain Barrier via Passive Transport Using Machine Learning Explainer Models. ACS Chem Neurosci 2024; 15:2144-2159. [PMID: 38723285 PMCID: PMC11157485 DOI: 10.1021/acschemneuro.3c00840] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/26/2023] [Revised: 04/15/2024] [Accepted: 04/16/2024] [Indexed: 06/06/2024]
Abstract
The local interpretable model-agnostic explanation (LIME) method was used to interpret two machine learning models of compounds penetrating the blood-brain barrier. The classification models, Random Forest, ExtraTrees, and Deep Residual Network, were trained and validated using the blood-brain barrier penetration dataset, which shows the penetrability of compounds in the blood-brain barrier. LIME was able to create explanations for such penetrability, highlighting the most important substructures of molecules that affect drug penetration in the barrier. The simple and intuitive outputs prove the applicability of this explainable model to interpreting the permeability of compounds across the blood-brain barrier in terms of molecular features. LIME explanations were filtered with a weight equal to or greater than 0.1 to obtain only the most relevant explanations. The results showed several structures that are important for blood-brain barrier penetration. In general, it was found that some compounds with nitrogenous substructures are more likely to permeate the blood-brain barrier. The application of these structural explanations may help the pharmaceutical industry and potential drug synthesis research groups to synthesize active molecules more rationally.
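The filtering step mentioned in this abstract (keeping only explanations with a weight of at least 0.1) amounts to a simple threshold over (feature, weight) pairs. The sketch below assumes the threshold applies to the signed weight, which the abstract does not state explicitly; the substructure names and weights are made up for illustration.

```python
WEIGHT_THRESHOLD = 0.1  # cutoff stated in the abstract


def filter_explanations(explanations, threshold=WEIGHT_THRESHOLD):
    """Keep (feature, weight) pairs whose weight is >= threshold.

    Assumption: the signed weight is compared, so negative
    contributions are dropped along with small positive ones.
    """
    return [(feature, weight) for feature, weight in explanations
            if weight >= threshold]


# Hypothetical explanation for one compound (not data from the paper).
example_explanation = [
    ("tertiary amine", 0.23),
    ("carboxylic acid", -0.31),
    ("aromatic ring", 0.04),
    ("piperidine", 0.10),
]

kept = filter_explanations(example_explanation)
```

If the authors instead filtered on absolute weight, the comparison would be `abs(weight) >= threshold`, which would also retain strongly negative substructures.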
Affiliation(s)
- Lucca Caiaffa Santos Rosa
  - Departamento de Química, Pontifícia Universidade Católica do Rio de Janeiro, Rio de Janeiro, RJ 22453-900, Brazil
- Caio Oliveira Argolo
  - Departamento de Química, Pontifícia Universidade Católica do Rio de Janeiro, Rio de Janeiro, RJ 22453-900, Brazil
- Andre Silva Pimentel
  - Departamento de Química, Pontifícia Universidade Católica do Rio de Janeiro, Rio de Janeiro, RJ 22453-900, Brazil
3. Agoni C, Stavropoulos I, Kirwan A, Mysior MM, Holton T, Kranjc T, Simpson JC, Roche HM, Shields DC. Cell-Penetrating Milk-Derived Peptides with a Non-Inflammatory Profile. Molecules 2023; 28:6999. [PMID: 37836842 PMCID: PMC10574647 DOI: 10.3390/molecules28196999] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/24/2023] [Revised: 09/24/2023] [Accepted: 09/25/2023] [Indexed: 10/15/2023]
Abstract
Milk-derived peptides are known to confer anti-inflammatory effects. We hypothesised that milk-derived cell-penetrating peptides might modulate inflammation in useful ways. Using computational techniques, we identified and synthesised peptides from the milk protein Alpha-S1-casein that were predicted to be cell-penetrating using a machine learning predictor. We modified the interpretation of the prediction results to consider the effects of histidine. Peptides were then selected for testing to determine their cell penetrability and anti-inflammatory effects using HeLa cells and J774.2 mouse macrophage cell lines. The selected peptides all showed cell penetrating behaviour, as judged using confocal microscopy of fluorescently labelled peptides. None of the peptides had an effect on either the NF-κB transcription factor or TNFα and IL-1β secretion. Thus, the identified milk-derived sequences have the ability to be internalised into the cell without affecting cell homeostatic mechanisms such as NF-κB activation. These peptides are worthy of further investigation for other potential bioactivities or as a naturally derived carrier to promote the cellular internalisation of other active peptides.
Affiliation(s)
- Clement Agoni
  - Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Belfield, D04 V1W8 Dublin 4, Ireland
  - School of Medicine, University College Dublin, Belfield, D04 W6F6 Dublin 4, Ireland
  - Discipline of Pharmaceutical Sciences, University of KwaZulu Natal, Durban 4041, South Africa
- Ilias Stavropoulos
  - Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Belfield, D04 V1W8 Dublin 4, Ireland
  - School of Medicine, University College Dublin, Belfield, D04 W6F6 Dublin 4, Ireland
- Anna Kirwan
  - Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Belfield, D04 V1W8 Dublin 4, Ireland
  - School of Biology and Environmental Science, University College Dublin, Belfield, D04 N2E5 Dublin 4, Ireland
- Margharitha M. Mysior
  - Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Belfield, D04 V1W8 Dublin 4, Ireland
  - Institute of Food and Health, University College Dublin, Belfield, D04 V1W8 Dublin 4, Ireland
- Therese Holton
  - Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Belfield, D04 V1W8 Dublin 4, Ireland
  - Institute of Food and Health, University College Dublin, Belfield, D04 V1W8 Dublin 4, Ireland
- Tilen Kranjc
  - Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Belfield, D04 V1W8 Dublin 4, Ireland
  - Institute of Food and Health, University College Dublin, Belfield, D04 V1W8 Dublin 4, Ireland
- Jeremy C. Simpson
  - Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Belfield, D04 V1W8 Dublin 4, Ireland
  - School of Biology and Environmental Science, University College Dublin, Belfield, D04 N2E5 Dublin 4, Ireland
- Helen M. Roche
  - Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Belfield, D04 V1W8 Dublin 4, Ireland
  - Institute for Global Food Security, Queens University Belfast, Belfast BT9 5DL, UK
- Denis C. Shields
  - Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Belfield, D04 V1W8 Dublin 4, Ireland
  - School of Medicine, University College Dublin, Belfield, D04 W6F6 Dublin 4, Ireland
4. Nittinger E, Clark A, Gaulton A, Zdrazil B. Biomedical data analyses facilitated by open cheminformatics workflows. J Cheminform 2023; 15:46. [PMID: 37069670 PMCID: PMC10108476 DOI: 10.1186/s13321-023-00718-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 04/19/2023]
Affiliation(s)
- Eva Nittinger
  - Medicinal Chemistry, Research and Early Development, Respiratory and Immunology (R&I), BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden
- Alex Clark
  - Research Informatics, Collaborative Drug Discovery, Inc., Ottawa, Canada
- Barbara Zdrazil
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
5. López-López E, Fernández-de Gortari E, Medina-Franco JL. Yes SIR! On the structure-inactivity relationships in drug discovery. Drug Discov Today 2022; 27:2353-2362. [PMID: 35561964 DOI: 10.1016/j.drudis.2022.05.005] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 02/02/2022] [Revised: 04/09/2022] [Accepted: 05/05/2022] [Indexed: 12/12/2022]
Abstract
In analogy with structure-activity relationships (SARs), which are at the core of medicinal chemistry, studying structure-inactivity relationships (SIRs) is essential to understanding and predicting biological activity. Current computational methods should predict or distinguish 'activity' and 'inactivity' with the same confidence because both concepts are complementary. However, the lack of inactivity data, in particular in the public domain, limits the development of predictive models and its broad application. In this review, we encourage the scientific community to disclose and analyze high-confidence activity data considering both the labeled 'active' and 'inactive' compounds.
Affiliation(s)
- Edgar López-López
  - DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico
  - Department of Chemistry and Graduate Program in Pharmacology, Center for Research and Advanced Studies of the National Polytechnic Institute, Mexico City 07000, Mexico
- Eli Fernández-de Gortari
  - Department of Nanosafety, International Iberian Nanotechnology Laboratory, Braga 4715-330, Portugal
- José L Medina-Franco
  - DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico
6. Rodríguez-Pérez R, Miljković F, Bajorath J. Machine Learning in Chemoinformatics and Medicinal Chemistry. Annu Rev Biomed Data Sci 2022; 5:43-65. [PMID: 35440144 DOI: 10.1146/annurev-biodatasci-122120-124216] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 11/09/2022]
Abstract
In chemoinformatics and medicinal chemistry, machine learning has evolved into an important approach. In recent years, increasing computational resources and new deep learning algorithms have put machine learning onto a new level, addressing previously unmet challenges in pharmaceutical research. In silico approaches for compound activity predictions, de novo design, and reaction modeling have been further advanced by new algorithmic developments and the emergence of big data in the field. Herein, novel applications of machine learning and deep learning in chemoinformatics and medicinal chemistry are reviewed. Opportunities and challenges for new methods and applications are discussed, placing emphasis on proper baseline comparisons, robust validation methodologies, and new applicability domains.
Affiliation(s)
- Raquel Rodríguez-Pérez
  - Department of Life Science Informatics, B-IT (Bonn-Aachen International Center for Information Technology), Chemical Biology and Medicinal Chemistry Program Unit, LIMES (Life and Medical Sciences Institute), Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany
  - Current affiliation: Novartis Institutes for Biomedical Research, Novartis Campus, Basel, Switzerland
- Filip Miljković
  - Department of Life Science Informatics, B-IT (Bonn-Aachen International Center for Information Technology), Chemical Biology and Medicinal Chemistry Program Unit, LIMES (Life and Medical Sciences Institute), Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany
  - Current affiliation: Data Science and AI, Imaging and Data Analytics, Clinical Pharmacology and Safety Sciences, R&D AstraZeneca, Gothenburg, Sweden
- Jürgen Bajorath
  - Department of Life Science Informatics, B-IT (Bonn-Aachen International Center for Information Technology), Chemical Biology and Medicinal Chemistry Program Unit, LIMES (Life and Medical Sciences Institute), Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany
7. Yan J, Yan X, Hu S, Zhu H, Yan B. Comprehensive Interrogation on Acetylcholinesterase Inhibition by Ionic Liquids Using Machine Learning and Molecular Modeling. Environ Sci Technol 2021; 55:14720-14731. [PMID: 34636548 DOI: 10.1021/acs.est.1c02960] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Indexed: 06/13/2023]
Abstract
Quantitative structure-activity relationship (QSAR) modeling can be used to predict the toxicity of ionic liquids (ILs), but most QSAR models have been constructed by arbitrarily selecting one machine learning method and ignored the overall interactions between ILs and biological systems, such as proteins. In order to obtain more reliable and interpretable QSAR models and reveal the related molecular mechanism, we performed a systematic analysis of acetylcholinesterase (AChE) inhibition by 153 ILs using machine learning and molecular modeling. Our results showed that more reliable and stable QSAR models (R2 > 0.85 for both cross-validation and external validation) were obtained by combining the results from multiple machine learning approaches. In addition, molecular docking results revealed that the cations and organic anions of ILs bound to specific amino acid residues of AChE through noncovalent interactions such as π interactions and hydrogen bonds. The calculation results of binding free energy showed that an electrostatic interaction (ΔEele < -285 kJ/mol) was the main driving force for the binding of ILs to AChE. The overall findings from this investigation demonstrate that a systematic approach is much more convincing. Future research in this direction will help design the next generation of biosafe ILs.
Affiliation(s)
- Jiachen Yan
  - Institute of Environmental Research at Greater Bay Area, Key Laboratory for Water Quality and Conservation of the Pearl River Delta, Ministry of Education, Guangzhou University, Guangzhou 510006, People's Republic of China
- Xiliang Yan
  - Institute of Environmental Research at Greater Bay Area, Key Laboratory for Water Quality and Conservation of the Pearl River Delta, Ministry of Education, Guangzhou University, Guangzhou 510006, People's Republic of China
- Song Hu
  - School of Environmental Science and Engineering, Shandong University, Qingdao 266237, People's Republic of China
- Hao Zhu
  - The Rutgers Center for Computational and Integrative Biology, Camden, New Jersey 08102, United States
- Bing Yan
  - Institute of Environmental Research at Greater Bay Area, Key Laboratory for Water Quality and Conservation of the Pearl River Delta, Ministry of Education, Guangzhou University, Guangzhou 510006, People's Republic of China
  - School of Environmental Science and Engineering, Shandong University, Qingdao 266237, People's Republic of China
8. Predicting cell-penetrating peptides using machine learning algorithms and navigating in their chemical space. Sci Rep 2021; 11:7628. [PMID: 33828175 PMCID: PMC8027643 DOI: 10.1038/s41598-021-87134-w] [Citation(s) in RCA: 42] [Impact Index Per Article: 10.5] [Received: 09/28/2020] [Accepted: 03/24/2021] [Indexed: 02/01/2023]
Abstract
Cell-penetrating peptides (CPPs) are naturally able to cross the lipid bilayer membrane that protects cells. These peptides share common structural and physicochemical properties and show different pharmaceutical applications, among which drug delivery is the most important. Due to their ability to cross the membranes by pulling high-molecular-weight polar molecules, they are termed Trojan horses. In this study, we proposed a machine learning (ML)-based framework named BChemRF-CPPred (beyond chemical rules-based framework for CPP prediction) that uses an artificial neural network, a support vector machine, and a Gaussian process classifier to differentiate CPPs from non-CPPs, using structure- and sequence-based descriptors extracted from PDB and FASTA formats. The performance of our algorithm was evaluated by tenfold cross-validation and compared with those of previously reported prediction tools using an independent dataset. The BChemRF-CPPred satisfactorily identified CPP-like structures using natural and synthetic modified peptide libraries and also obtained better performance than those of previously reported ML-based algorithms, reaching the independent test accuracy of 90.66% (AUC = 0.9365) for PDB, and an accuracy of 86.5% (AUC = 0.9216) for FASTA input. Moreover, our analyses of the CPP chemical space demonstrated that these peptides break some molecular rules related to the prediction of permeability of therapeutic molecules in cell membranes. This is the first comprehensive analysis to predict synthetic and natural CPP structures and to evaluate their chemical space using an ML-based framework. Our algorithm is freely available for academic use at http://comptools.linc.ufpa.br/BChemRF-CPPred .
9. Evaluation of multi-target deep neural network models for compound potency prediction under increasingly challenging test conditions. J Comput Aided Mol Des 2021; 35:285-295. [PMID: 33598870 PMCID: PMC7982389 DOI: 10.1007/s10822-021-00376-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2020] [Accepted: 02/03/2021] [Indexed: 11/25/2022]
Abstract
Machine learning (ML) enables modeling of quantitative structure–activity relationships (QSAR) and compound potency predictions. Recently, multi-target QSAR models have been gaining increasing attention. Simultaneous compound potency predictions for multiple targets can be carried out using ensembles of independently derived target-based QSAR models or in a more integrated and advanced manner using multi-target deep neural networks (MT-DNNs). Herein, single-target and multi-target ML models were systematically compared on a large scale in compound potency value predictions for 270 human targets. By design, this large-magnitude evaluation has been a special feature of our study. To these ends, MT-DNN, single-target DNN (ST-DNN), support vector regression (SVR), and random forest regression (RFR) models were implemented. Different test systems were defined to benchmark these ML methods under conditions of varying complexity. Source compounds were divided into training and test sets in a compound- or analog series-based manner taking target information into account. Data partitioning approaches used for model training and evaluation were shown to influence the relative performance of ML methods, especially for the most challenging compound data sets. For example, the performance of MT-DNNs with per-target models yielded superior performance compared to single-target models. For a test compound or its analogs, the availability of potency measurements for multiple targets affected model performance, revealing the influence of ML synergies.
10. Sakai M, Nagayasu K, Shibui N, Andoh C, Takayama K, Shirakawa H, Kaneko S. Prediction of pharmacological activities from chemical structures with graph convolutional neural networks. Sci Rep 2021; 11:525. [PMID: 33436854 PMCID: PMC7803991 DOI: 10.1038/s41598-020-80113-7] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Received: 07/14/2020] [Accepted: 12/17/2020] [Indexed: 01/29/2023]
Abstract
Many therapeutic drugs are compounds that can be represented by simple chemical structures, which contain important determinants of affinity at the site of action. Recently, graph convolutional neural network (GCN) models have exhibited excellent results in classifying the activity of such compounds. For models that make quantitative predictions of activity, more complex information has been utilized, such as the three-dimensional structures of compounds and the amino acid sequences of their respective target proteins. As another approach, we hypothesized that if sufficient experimental data were available and there were enough nodes in hidden layers, a simple compound representation would quantitatively predict activity with satisfactory accuracy. In this study, we report that GCN models constructed solely from the two-dimensional structural information of compounds demonstrated a high degree of activity predictability against 127 diverse targets from the ChEMBL database. Using the information entropy as a metric, we also show that the structural diversity had less effect on the prediction performance. Finally, we report that virtual screening using the constructed model identified a new serotonin transporter inhibitor with activity comparable to that of a marketed drug in vitro and exhibited antidepressant effects in behavioural studies.
Affiliation(s)
- Miyuki Sakai
  - Department of Molecular Pharmacology, Graduate School of Pharmaceutical Sciences, Kyoto University, 46-29 Yoshida-Shimoadachi-cho, Sakyo-ku, Kyoto, 606-8501 Japan
  - Medical Database Ltd., 2-5-5 Sumitomoshibadaimon building, Shibadaimon, Minato-ku, Tokyo, 105-0012 Japan
- Kazuki Nagayasu
  - Department of Molecular Pharmacology, Graduate School of Pharmaceutical Sciences, Kyoto University, 46-29 Yoshida-Shimoadachi-cho, Sakyo-ku, Kyoto, 606-8501 Japan
- Norihiro Shibui
  - Department of Molecular Pharmacology, Graduate School of Pharmaceutical Sciences, Kyoto University, 46-29 Yoshida-Shimoadachi-cho, Sakyo-ku, Kyoto, 606-8501 Japan
- Chihiro Andoh
  - Department of Molecular Pharmacology, Graduate School of Pharmaceutical Sciences, Kyoto University, 46-29 Yoshida-Shimoadachi-cho, Sakyo-ku, Kyoto, 606-8501 Japan
- Kaito Takayama
  - Department of Molecular Pharmacology, Graduate School of Pharmaceutical Sciences, Kyoto University, 46-29 Yoshida-Shimoadachi-cho, Sakyo-ku, Kyoto, 606-8501 Japan
- Hisashi Shirakawa
  - Department of Molecular Pharmacology, Graduate School of Pharmaceutical Sciences, Kyoto University, 46-29 Yoshida-Shimoadachi-cho, Sakyo-ku, Kyoto, 606-8501 Japan
- Shuji Kaneko
  - Department of Molecular Pharmacology, Graduate School of Pharmaceutical Sciences, Kyoto University, 46-29 Yoshida-Shimoadachi-cho, Sakyo-ku, Kyoto, 606-8501 Japan
11. Sato A, Miyao T, Jasial S, Funatsu K. Comparing predictive ability of QSAR/QSPR models using 2D and 3D molecular representations. J Comput Aided Mol Des 2021; 35:179-193. [PMID: 33392949 DOI: 10.1007/s10822-020-00361-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 07/05/2020] [Accepted: 11/12/2020] [Indexed: 11/27/2022]
Abstract
Quantitative structure-activity relationship (QSAR) and quantitative structure-property relationship (QSPR) models predict biological activity and molecular property based on the numerical relationship between chemical structures and activity (property) values. Molecular representations are of importance in QSAR/QSPR analysis. Topological information of molecular structures is usually utilized (2D representations) for this purpose. However, conformational information seems important because molecules are in the three-dimensional space. As a three-dimensional molecular representation applicable to diverse compounds, similarity between a test molecule and a set of reference molecules has been previously proposed. This 3D representation was found to be effective on virtual screening for early enrichment of active compounds. In this study, we introduced the 3D representation into QSAR/QSPR modeling (regression tasks). Furthermore, we investigated relative merits of 3D representations over 2D in terms of the diversity of training data sets. For the prediction task of quantum mechanics-based properties, the 3D representations were superior to 2D. For predicting activity of small molecules against specific biological targets, no consistent trend was observed in the difference of performance using the two types of representations, irrespective of the diversity of training data sets.
Affiliation(s)
- Akinori Sato
  - Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara, 630-0192, Japan
- Tomoyuki Miyao
  - Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara, 630-0192, Japan
  - Data Science Center, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara, 630-0192, Japan
- Swarit Jasial
  - Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara, 630-0192, Japan
  - Data Science Center, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara, 630-0192, Japan
- Kimito Funatsu
  - Data Science Center, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara, 630-0192, Japan
  - Department of Chemical System Engineering, School of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656, Japan
12. Rognan D. Modeling Protein-Ligand Interactions: Are We Ready for Deep Learning? Systems Medicine 2021. [DOI: 10.1016/b978-0-12-801238-3.11521-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
13. Brown J. Practical Chemogenomic Modeling and Molecule Discovery Strategies Unveiled by Active Learning. Systems Medicine 2021. [DOI: 10.1016/b978-0-12-801238-3.11533-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
14. Jethava KP, Fine J, Chen Y, Hossain A, Chopra G. Accelerated Reactivity Mechanism and Interpretable Machine Learning Model of N-Sulfonylimines toward Fast Multicomponent Reactions. Org Lett 2020; 22:8480-8486. [PMID: 33074678 DOI: 10.1021/acs.orglett.0c03083] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Indexed: 11/28/2022]
Abstract
We introduce chemical reactivity flowcharts to help chemists interpret reaction outcomes using statistically robust machine learning models trained on a small number of reactions. We developed fast N-sulfonylimine multicomponent reactions for understanding reactivity and to generate training data. Accelerated reactivity mechanisms were investigated using density functional theory. Intuitive chemical features learned by the model accurately predicted heterogeneous reactivity of N-sulfonylimine with different carboxylic acids. Validation of the predictions shows that reaction outcome interpretation is useful for human chemists.
15. Identifying representative kinases for inhibitor evaluation via systematic analysis of compound-based target relationships. Eur J Med Chem 2020; 204:112641. [DOI: 10.1016/j.ejmech.2020.112641] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 05/06/2020] [Revised: 07/01/2020] [Accepted: 07/02/2020] [Indexed: 02/07/2023]
16. Yan X, Sedykh A, Wang W, Yan B, Zhu H. Construction of a web-based nanomaterial database by big data curation and modeling friendly nanostructure annotations. Nat Commun 2020; 11:2519. [PMID: 32433469 PMCID: PMC7239871 DOI: 10.1038/s41467-020-16413-3] [Citation(s) in RCA: 59] [Impact Index Per Article: 11.8] [Received: 01/09/2020] [Accepted: 04/22/2020] [Indexed: 12/27/2022]
Abstract
Modern nanotechnology research has generated numerous experimental data for various nanomaterials. However, the few nanomaterial databases available are not suitable for modeling studies due to the way they are curated. Here, we report the construction of a large nanomaterial database containing annotated nanostructures suited for modeling research. The database, which is publicly available through http://www.pubvinas.com/, contains 705 unique nanomaterials covering 11 material types. Each nanomaterial has up to six physicochemical properties and/or bioactivities, resulting in more than ten endpoints in the database. All the nanostructures are annotated and transformed into protein data bank files, which are downloadable by researchers worldwide. Furthermore, the nanostructure annotation procedure generates 2142 nanodescriptors for all nanomaterials for machine learning purposes, which are also available through the portal. This database provides a public resource for data-driven nanoinformatics modeling research aimed at rational nanomaterial design and other areas of modern computational nanotechnology.
Affiliation(s)
- Xiliang Yan
- Institute of Environmental Research at Greater Bay, Key Laboratory for Water Quality and Conservation of the Pearl River Delta, Ministry of Education, Guangzhou University, Guangzhou, 510006, China.,The Rutgers Center for Computational and Integrative Biology, Camden, NJ, 08102, USA
| | - Alexander Sedykh
- The Rutgers Center for Computational and Integrative Biology, Camden, NJ, 08102, USA.,Sciome, Research Triangle Park, North Carolina, 27709, USA
| | - Wenyi Wang
- The Rutgers Center for Computational and Integrative Biology, Camden, NJ, 08102, USA
| | - Bing Yan
- Institute of Environmental Research at Greater Bay, Key Laboratory for Water Quality and Conservation of the Pearl River Delta, Ministry of Education, Guangzhou University, Guangzhou, 510006, China.,School of Environmental Science and Engineering, Shandong University, Jinan, 250100, China.
| | - Hao Zhu
- The Rutgers Center for Computational and Integrative Biology, Camden, NJ, 08102, USA.,Department of Chemistry, Rutgers University, Camden, NJ, 08102, USA.
| |
|
17
|
Rodríguez-Pérez R, Bajorath J. Interpretation of Compound Activity Predictions from Complex Machine Learning Models Using Local Approximations and Shapley Values. J Med Chem 2019; 63:8761-8777. [PMID: 31512867 DOI: 10.1021/acs.jmedchem.9b01101] [Citation(s) in RCA: 177] [Impact Index Per Article: 29.5] [Indexed: 01/27/2023]
Abstract
In qualitative or quantitative studies of structure-activity relationships (SARs), machine learning (ML) models are trained to recognize structural patterns that differentiate between active and inactive compounds. Understanding model decisions is challenging but of critical importance to guide compound design. Moreover, the interpretation of ML results provides an additional level of model validation based on expert knowledge. A number of complex ML approaches, especially deep learning (DL) architectures, have distinctive black-box character. Herein, a locally interpretable explanatory method termed Shapley additive explanations (SHAP) is introduced for rationalizing activity predictions of any ML algorithm, regardless of its complexity. Models resulting from random forest (RF), nonlinear support vector machine (SVM), and deep neural network (DNN) learning are interpreted, and structural patterns determining the predicted probability of activity are identified and mapped onto test compounds. The results indicate that SHAP has high potential for rationalizing predictions of complex ML models.
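The Shapley-value idea behind SHAP can be illustrated with a minimal sketch. This is not the SHAP library and not the authors' workflow: it is an exact brute-force computation for a tiny model, and it assumes that an "absent" feature is represented by swapping in a baseline value. The names `shapley_values`, `predict`, and `baseline` are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of each feature of x for the model `predict`.

    Assumption: a feature outside the coalition takes its value from
    `baseline`. The cost grows as 2**n, so this only suits tiny inputs.
    """
    n = len(x)

    def value(coalition):
        # Model output with coalition features taken from x, the rest from baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Toy model with one additive term and one interaction term.
toy_model = lambda z: 2 * z[0] + z[1] * z[2]
contributions = shapley_values(toy_model, [1, 1, 1], [0, 0, 0])
# contributions is approximately [2.0, 0.5, 0.5]
```

The contributions sum to the difference between the prediction for `x` and the prediction for the baseline, which is the additivity property that makes such values mappable onto individual structural features.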
Affiliation(s)
- Raquel Rodríguez-Pérez
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Endenicher Allee 19c, D-53115 Bonn, Germany.,Department of Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Straße 65, 88397 Biberach an der Riß, Germany
| | - Jürgen Bajorath
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Endenicher Allee 19c, D-53115 Bonn, Germany
| |
|
18
|
Baltatu OC, Senar S, Campos LA, Cipolla-Neto J. Cardioprotective Melatonin: Translating from Proof-of-Concept Studies to Therapeutic Use. Int J Mol Sci 2019; 20:ijms20184342. [PMID: 31491852 PMCID: PMC6770816 DOI: 10.3390/ijms20184342] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Received: 08/07/2019] [Revised: 08/29/2019] [Accepted: 09/04/2019] [Indexed: 12/30/2022]
Abstract
In this review we summarized the actual clinical data for a cardioprotective therapeutic role of melatonin, listed melatonin and its agonists in different stages of development, and evaluated the melatonin cardiovascular target tractability and prediction using machine learning on ChEMBL. To date, most clinical trials investigating a cardioprotective therapeutic role of melatonin are in phase 2a. Selective melatonin receptor agonists Tasimelteon, Ramelteon, and combined melatonergic-serotonin Agomelatine, and other agonists with registered structures in CHEMBL were not yet investigated as cardioprotective or cardiovascular drugs. As drug-able for these therapeutic targets, melatonin receptor agonists have the benefit over melatonin of well-characterized pharmacologic profiles and extensive safety data. Recent reports of the X-ray crystal structures of MT1 and MT2 receptors shall lead to the development of highly selective melatonin receptor agonists. Predictive models using machine learning could help to identify cardiovascular targets for melatonin. Selecting ChEMBL scores > 4.5 in cardiovascular assays, and melatonin scores > 4, we obtained 284 records from 162 cardiovascular assays carried out with 80 molecules with predicted or measured melatonin activity. Melatonin activities (agonistic or antagonistic) found in these experimental cardiovascular assays and models include arrhythmias, coronary and large vessel contractility, and hypertension. Preclinical proof-of-concept and early clinical studies (phase 2a) suggest a cardioprotective benefit from melatonin in various heart diseases. However, larger phase 3 randomized interventional studies are necessary to establish melatonin and its agonists’ actions as cardioprotective therapeutic agents.
Affiliation(s)
- Ovidiu Constantin Baltatu
- Center of Innovation, Technology and Education (CITE), School of Health Sciences at Anhembi Morumbi University, Laureate International Universities, Sao Jose dos Campos 12247-016, Brazil.
| | | | - Luciana Aparecida Campos
- Center of Innovation, Technology and Education (CITE), School of Health Sciences at Anhembi Morumbi University, Laureate International Universities, Sao Jose dos Campos 12247-016, Brazil.
| | - José Cipolla-Neto
- Department of Physiology and Biophysics, Institute of Biomedical Sciences, University of São Paulo, São Paulo 05508-900, Brazil.
| |
|
19
|
Liu Z, Singh SB, Zheng Y, Lindblom P, Tice C, Dong C, Zhuang L, Zhao Y, Kruk BA, Lala D, Claremon DA, McGeehan GM, Gregg RD, Cain R. Discovery of Potent Inhibitors of 11β-Hydroxysteroid Dehydrogenase Type 1 Using a Novel Growth-Based Protocol of in Silico Screening and Optimization in CONTOUR. J Chem Inf Model 2019; 59:3422-3436. [PMID: 31355641 DOI: 10.1021/acs.jcim.9b00198] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 02/06/2023]
Affiliation(s)
- Zhijie Liu
- Allergan Plc, 2525 Dupont Drive, Irvine, California 92612, United States
- Vitae Pharmaceuticals, Inc., 502 West Office Center Drive, Fort Washington, Pennsylvania 19034, United States
| | - Suresh B. Singh
- Vitae Pharmaceuticals, Inc., 502 West Office Center Drive, Fort Washington, Pennsylvania 19034, United States
| | - Yajun Zheng
- Allergan Plc, 2525 Dupont Drive, Irvine, California 92612, United States
- Vitae Pharmaceuticals, Inc., 502 West Office Center Drive, Fort Washington, Pennsylvania 19034, United States
| | - Peter Lindblom
- Vitae Pharmaceuticals, Inc., 502 West Office Center Drive, Fort Washington, Pennsylvania 19034, United States
| | - Colin Tice
- Vitae Pharmaceuticals, Inc., 502 West Office Center Drive, Fort Washington, Pennsylvania 19034, United States
| | - Chengguo Dong
- Allergan Plc, 2525 Dupont Drive, Irvine, California 92612, United States
- Vitae Pharmaceuticals, Inc., 502 West Office Center Drive, Fort Washington, Pennsylvania 19034, United States
| | - Linghang Zhuang
- Vitae Pharmaceuticals, Inc., 502 West Office Center Drive, Fort Washington, Pennsylvania 19034, United States
| | - Yi Zhao
- Allergan Plc, 2525 Dupont Drive, Irvine, California 92612, United States
- Vitae Pharmaceuticals, Inc., 502 West Office Center Drive, Fort Washington, Pennsylvania 19034, United States
| | - Barbara A. Kruk
- Vitae Pharmaceuticals, Inc., 502 West Office Center Drive, Fort Washington, Pennsylvania 19034, United States
| | - Deepak Lala
- Vitae Pharmaceuticals, Inc., 502 West Office Center Drive, Fort Washington, Pennsylvania 19034, United States
| | - David A. Claremon
- Vitae Pharmaceuticals, Inc., 502 West Office Center Drive, Fort Washington, Pennsylvania 19034, United States
| | - Gerard M. McGeehan
- Vitae Pharmaceuticals, Inc., 502 West Office Center Drive, Fort Washington, Pennsylvania 19034, United States
| | - Richard D. Gregg
- Vitae Pharmaceuticals, Inc., 502 West Office Center Drive, Fort Washington, Pennsylvania 19034, United States
| | - Robert Cain
- Allergan Plc, 2525 Dupont Drive, Irvine, California 92612, United States
| |
|
20
|
Algar WR, Jeen T, Massey M, Peveler WJ, Asselin J. Small Surface, Big Effects, and Big Challenges: Toward Understanding Enzymatic Activity at the Inorganic Nanoparticle-Substrate Interface. Langmuir 2019; 35:7067-7091. [PMID: 30415548 DOI: 10.1021/acs.langmuir.8b02733] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Indexed: 05/20/2023]
Abstract
Enzymes are important biomarkers for molecular diagnostics and targets for the action of drugs. In turn, inorganic nanoparticles (NPs) are of interest as materials for biological assays, biosensors, cellular and in vivo imaging probes, and vectors for drug delivery and theranostics. So how does an enzyme interact with a NP, and what are the outcomes of multivalent conjugation of its substrate to a NP? This invited feature article addresses the current state of the art in answering this question. Using gold nanoparticles (Au NPs) and semiconductor quantum dots (QDs) as illustrative materials, we discuss aspects of enzyme structure-function and the properties of NP interfaces and surface chemistry that determine enzyme-NP interactions. These aspects render the substrate-on-NP configurations far more complex and heterogeneous than the conventional turnover of discrete substrate molecules in bulk solution. Special attention is also given to the limitations of a standard kinetic analysis of the enzymatic turnover of these configurations, the need for a well-defined model of turnover, and whether a "hopping" model can account for behaviors such as the apparent acceleration of enzyme activity. A detailed and predictive understanding of how enzymes turn over multivalent NP-substrate conjugates will require a convergence of many concepts and tools from biochemistry, materials, and interface science. In turn, this understanding will help to enable rational, optimized, and value-added designs of NP bioconjugates for biomedical and clinical applications.
Affiliation(s)
- W Russ Algar
- Department of Chemistry, University of British Columbia, 2036 Main Mall, Vancouver, British Columbia V6T 1Z1, Canada
| | - Tiffany Jeen
- Department of Chemistry, University of British Columbia, 2036 Main Mall, Vancouver, British Columbia V6T 1Z1, Canada
| | - Melissa Massey
- Department of Chemistry, University of British Columbia, 2036 Main Mall, Vancouver, British Columbia V6T 1Z1, Canada
| | - William J Peveler
- Department of Chemistry, University of British Columbia, 2036 Main Mall, Vancouver, British Columbia V6T 1Z1, Canada
- Division of Biomedical Engineering, School of Engineering, University of Glasgow, Glasgow G12 8LT, United Kingdom
| | - Jérémie Asselin
- Department of Chemistry, University of British Columbia, 2036 Main Mall, Vancouver, British Columbia V6T 1Z1, Canada
| |
|
21
|
Janosch A, Kaffka C, Bickle M. Unbiased Phenotype Detection Using Negative Controls. SLAS Discov 2019; 24:234-241. [PMID: 30616488 PMCID: PMC6484531 DOI: 10.1177/2472555218818053] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 09/20/2018] [Revised: 11/13/2018] [Accepted: 11/19/2018] [Indexed: 01/22/2023]
Abstract
Phenotypic screens using automated microscopy allow comprehensive measurement of the effects of compounds on cells due to the number of markers that can be scored and the richness of the parameters that can be extracted. The high dimensionality of the data is both a rich source of information and a source of noise that might hide information. Many methods have been proposed to deal with this complex data in order to reduce the complexity and identify interesting phenotypes. Nevertheless, the majority of laboratories still only use one or two parameters in their analysis, likely due to the computational challenges of carrying out a more sophisticated analysis. Here, we present a novel method that allows discovering new, previously unknown phenotypes based on negative controls only. The method is compared with L1-norm regularization, a standard method to obtain a sparse matrix. The analytical pipeline is implemented in the open-source software KNIME, allowing the implementation of the method in many laboratories, even ones without advanced computing knowledge.
Affiliation(s)
- Antje Janosch
- Technology Development Studio, Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
| | - Carolin Kaffka
- Fraunhofer-Institut für Verkehrs- und Infrastruktursysteme, Dresden, Germany
| | - Marc Bickle
- Technology Development Studio, Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
| |
|
22
|
Réau M, Lagarde N, Zagury JF, Montes M. Nuclear Receptors Database Including Negative Data (NR-DBIND): A Database Dedicated to Nuclear Receptors Binding Data Including Negative Data and Pharmacological Profile. J Med Chem 2018; 62:2894-2904. [PMID: 30354114 DOI: 10.1021/acs.jmedchem.8b01105] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 02/06/2023]
Abstract
Nuclear receptors (NRs) are transcription factors that regulate gene expression in various physiological processes through their interactions with small hydrophobic molecules. They constitute an important class of targets for drugs and endocrine disruptors and are widely studied for both health and environment concerns. Since the integration of negative data can be critical for accurate modeling of ligand activity profiles, we manually collected and annotated NRs interaction data (positive and negative) through a sharp review of the corresponding literature. 15 116 positive and negative interactions data are provided for 28 NRs together with 593 PDB structures in the freely available Nuclear Receptors Database Including Negative Data ( http://nr-dbind.drugdesign.fr ). The NR-DBIND contains the most extensive information about interaction data on NRs, which should bring valuable information to chemists, biologists, pharmacologists and toxicologists.
Affiliation(s)
- Manon Réau
- Laboratoire GBA, EA4627, Conservatoire National des Arts et Métiers, 2 Rue Conté, 75003 Paris, France
| | - Nathalie Lagarde
- Laboratoire GBA, EA4627, Conservatoire National des Arts et Métiers, 2 Rue Conté, 75003 Paris, France.,Université Paris Diderot, Sorbonne Paris Cité, Molécules Thérapeutiques in Silico, INSERM UMR-S 973, 75205 Paris, France
| | - Jean-François Zagury
- Laboratoire GBA, EA4627, Conservatoire National des Arts et Métiers, 2 Rue Conté, 75003 Paris, France
| | - Matthieu Montes
- Laboratoire GBA, EA4627, Conservatoire National des Arts et Métiers, 2 Rue Conté, 75003 Paris, France
| |
|
23
|
Rodríguez-Pérez R, Bajorath J. Prediction of Compound Profiling Matrices, Part II: Relative Performance of Multitask Deep Learning and Random Forest Classification on the Basis of Varying Amounts of Training Data. ACS Omega 2018; 3:12033-12040. [PMID: 30320286 PMCID: PMC6175492 DOI: 10.1021/acsomega.8b01682] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 07/17/2018] [Accepted: 09/12/2018] [Indexed: 05/28/2023]
Abstract
Currently, there is a high level of interest in deep learning and multitask learning in many scientific fields including the life sciences and chemistry. Herein, we investigate the performance of multitask deep neural networks (MT-DNNs) compared to random forest (RF) classification, a standard method in machine learning, in predicting compound profiling experiments. Predictions were carried out on a large profiling matrix extracted from biological screening data. For model building, submatrices with varying data density of 5-100% were generated to investigate the influence of data sparseness on prediction performance. MT-DNN models were directly compared to RF models, and control calculations were also carried out using single-task DNNs (ST-DNNs). On the basis of compound recall, the performance of ST-DNN was consistently lower than that of the other methods. Compared to RF, MT-DNN models only yielded better prediction performance for individual assays in the profiling matrix when training data were very sparse. However, when the matrix density increased to at least 25-45%, per-assay RF models met or partly exceeded the prediction performance of MT-DNN models. When the average performances of RF and MT-DNN over the grid of all targets were compared, MT-DNN was slightly superior to RF, which was a likely consequence of multitask learning. Overall, there was no consistent advantage of MT-DNN over standard RF classification in predicting the results of compound profiling assays under varying conditions. In the presence of very sparse training data, prediction performance was limited. Under these challenging conditions, MT-DNN was the preferred approach. When more training data became available and prediction performance increased, RF performance was not inferior to MT-DNN.
Affiliation(s)
- Raquel Rodríguez-Pérez
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Endenicher Allee 19c, D-53115 Bonn, Germany
- Department of Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Str. 65, 88397 Biberach/Riß, Germany
| | - Jürgen Bajorath
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Endenicher Allee 19c, D-53115 Bonn, Germany
| |
|
24
|
Computationally derived compound profiling matrices. Future Sci OA 2018; 4:FSO327. [PMID: 30271615 PMCID: PMC6153460 DOI: 10.4155/fsoa-2018-0050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2018] [Accepted: 06/11/2018] [Indexed: 11/17/2022]
Abstract
Aim: Screening of compounds against panels of targets yields profiling matrices. Such matrices are excellent test cases for the analysis and prediction of ligand–target interactions. We made three matrices freely available that were extracted from public screening data. Methodology: A new algorithm was used to derive complete profiling matrices from assay data. Data: Two profiling matrices were derived from confirmatory assays containing 53 different targets and 109,925 and 143,310 distinct compounds, respectively. A third matrix was extracted from primary screening assays covering 171 different targets and 224,251 compounds. Next steps: Profiling matrices can be used to test computational chemogenomics methods for their ability to predict ligand–target pairs. Additional matrices will be generated for individual target families. Screening of a given number of small molecules in different assays produces a so-called profiling matrix. This matrix reports for each compound inactivity or activity in all assays. Such profiling matrices are frequently produced in the pharmaceutical industry but rarely disclosed. We have recently reported a computational methodology to derive such matrices from independently conducted biological assays. Herein, we describe three large profiling matrices we have extracted from many experimental screens and made publicly available. These matrices should be helpful to investigators studying the interactions of small molecules with different biological targets.
Shown is a small compound profiling matrix resulting from assaying four compounds (rows) against four target proteins (columns). ‘+’ and ‘−’ signs denote compound activity and inactivity, respectively.
|
25
|
Vogt M, Jasial S, Bajorath J. Extracting Compound Profiling Matrices from Screening Data. ACS Omega 2018; 3:4706-4712. [PMID: 30023898 PMCID: PMC6044819 DOI: 10.1021/acsomega.8b00461] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 03/12/2018] [Accepted: 04/20/2018] [Indexed: 05/11/2023]
Abstract
Compound profiling matrices record assay results for compound libraries tested against panels of targets. In addition to their relevance for exploring structure-activity relationships, such matrices are of considerable interest for chemoinformatic and chemogenomic applications. For example, profiling matrices provide a valuable data resource for the development and evaluation of machine learning approaches for multitask activity prediction. However, experimental compound profiling matrices are rare in the public domain. Although they are generated in pharmaceutical settings, they are typically not disclosed. Herein, we present an algorithm for the generation of large profiling matrices, for example, containing more than 100 000 compounds exhaustively tested against 50 to 100 targets. The new methodology is a variant of biclustering algorithms originally introduced for large-scale analysis of genomics data. Our approach is applied here to assays from the PubChem BioAssay database and generates profiling matrices of increasing assay or compound coverage by iterative removal of entities that limit coverage. Weight settings control final matrix size by preferentially retaining assays or compounds. In addition, the methodology can also be applied to generate matrices enriched with active entries representing above-average assay hit rates.
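The iterative-removal idea described in this abstract can be sketched as follows. This is a simplified greedy illustration, not the published biclustering variant: it assumes the screening data are given as a mapping from each compound to the set of assays in which it was tested, and it drops the single least-covered compound or assay per step until every remaining compound has been tested in every remaining assay. The function `densify` and the parameters `row_weight` and `col_weight` are hypothetical names standing in for the weight settings the abstract mentions.

```python
def densify(tested, row_weight=1.0, col_weight=1.0):
    """Greedily shrink a sparse compound x assay matrix until it is complete.

    tested: dict mapping compound id -> set of assay ids it was tested in.
    A larger row_weight favors retaining compounds; a larger col_weight
    favors retaining assays (assumed weighting scheme, for illustration).
    Returns (set of retained compounds, set of retained assays).
    """
    rows = set(tested)
    cols = set().union(*tested.values()) if tested else set()

    while rows and cols:
        # Coverage of each compound and assay within the current submatrix.
        row_cov = {r: len(tested[r] & cols) / len(cols) for r in rows}
        col_cov = {c: sum(c in tested[r] for r in rows) / len(rows) for c in cols}

        # Sorting first makes tie-breaking deterministic.
        worst_row = min(sorted(rows), key=row_cov.get)
        worst_col = min(sorted(cols), key=col_cov.get)

        if row_cov[worst_row] == 1.0:
            break  # every remaining compound is tested in every remaining assay

        # Remove whichever entity has the lower weighted coverage.
        if row_cov[worst_row] * row_weight <= col_cov[worst_col] * col_weight:
            rows.remove(worst_row)
        else:
            cols.remove(worst_col)

    return rows, cols


screen = {
    "c1": {"a1", "a2", "a3"},
    "c2": {"a1", "a2"},
    "c3": {"a1", "a2", "a3"},
    "c4": {"a3"},
}
compounds, assays = densify(screen)
# compounds == {"c1", "c3"}, assays == {"a1", "a2", "a3"}
```

Calling `densify(screen, row_weight=2.0)` on the same toy data keeps three compounds and drops assay `a3` instead, which shows how a weight setting can trade assay coverage for compound coverage.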
|