1. Seal S, Mahale M, García-Ortegón M, Joshi CK, Hosseini-Gerami L, Beatson A, Greenig M, Shekhar M, Patra A, Weis C, Mehrjou A, Badré A, Paisley B, Lowe R, Singh S, Shah F, Johannesson B, Williams D, Rouquie D, Clevert DA, Schwab P, Richmond N, Nicolaou CA, Gonzalez RJ, Naven R, Schramm C, Vidler LR, Mansouri K, Walters WP, Wilk DD, Spjuth O, Carpenter AE, Bender A. Machine Learning for Toxicity Prediction Using Chemical Structures: Pillars for Success in the Real World. Chem Res Toxicol 2025. [PMID: 40314361 DOI: 10.1021/acs.chemrestox.5c00033]
Abstract
Machine learning (ML) is increasingly valuable for predicting molecular properties and toxicity in drug discovery. However, toxicity-related end points have always been challenging to evaluate experimentally with respect to in vivo translation due to the required resources for human and animal studies; this has impacted data availability in the field. ML can augment or even potentially replace traditional experimental processes depending on the project phase and specific goals of the prediction. For instance, models can be used to select promising compounds for on-target effects or to deselect those with undesirable characteristics (e.g., off-target or ineffective due to unfavorable pharmacokinetics). However, reliance on ML is not without risks, due to biases stemming from nonrepresentative training data, incompatible choice of algorithm to represent the underlying data, or poor model building and validation approaches. This might lead to inaccurate predictions, misinterpretation of the confidence in ML predictions, and ultimately suboptimal decision-making. Hence, understanding the predictive validity of ML models is of utmost importance to enable faster drug development timelines while improving the quality of decisions. This perspective emphasizes the need to enhance the understanding and application of machine learning models in drug discovery, focusing on well-defined data sets for toxicity prediction based on small molecule structures. We focus on five crucial pillars for success with ML-driven molecular property and toxicity prediction: (1) data set selection, (2) structural representations, (3) model algorithm, (4) model validation, and (5) translation of predictions to decision-making. Understanding these key pillars will foster collaboration and coordination between ML researchers and toxicologists, which will help to advance drug discovery and development.
Affiliation(s)
- Srijit Seal
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, United States
- Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, U.K
- Manas Mahale
- Department of Pharmaceutical Chemistry, Bombay College of Pharmacy, Mumbai 400098, India
- Chaitanya K Joshi
- Department of Computer Science and Technology, University of Cambridge, Cambridge CB3 0FD, U.K
- Alex Beatson
- Axiom Bio, San Francisco, California 94107, United States
- Matthew Greenig
- Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, U.K
- Mrinal Shekhar
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, United States
- Adrien Badré
- Novartis Biomedical Research, Cambridge, Massachusetts 02139, United States
- Brianna Paisley
- Eli Lilly & Company, Indianapolis, Indiana 46285, United States
- Shantanu Singh
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, United States
- Falgun Shah
- Non Clinical Drug Safety, Merck Inc., West Point, Pennsylvania 19486, United States
- David Rouquie
- Toxicology Data Science, Bayer SAS Crop Science Division, Valbonne Sophia-Antipolis 06560, France
- Djork-Arné Clevert
- Pfizer, Worldwide Research, Development and Medical, Machine Learning & Computational Sciences, Berlin 10922, Germany
- Christos A Nicolaou
- Computational Drug Design, Digital Science & Innovation, Novo Nordisk US R&D, Lexington, Massachusetts 02421, United States
- Raymond J Gonzalez
- Non Clinical Drug Safety, Merck Inc., West Point, Pennsylvania 19486, United States
- Russell Naven
- Novartis Biomedical Research, Cambridge, Massachusetts 02139, United States
- Kamel Mansouri
- NIH/NIEHS/DTT/NICEATM, Research Triangle Park, North Carolina 27709, United States
- Ola Spjuth
- Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Uppsala 751 24, Sweden
- Phenaros Pharmaceuticals AB, Uppsala 75239, Sweden
- Anne E Carpenter
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, United States
- Andreas Bender
- Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, U.K
- College of Medicine and Health Sciences, Khalifa University of Science and Technology, Abu Dhabi 127788, United Arab Emirates
2. Burgoon LD, Kluxen FM, Hüser A, Frericks M. The database makes the poison: How the selection of datasets in QSAR models impacts toxicant prediction of higher tier endpoints. Regul Toxicol Pharmacol 2024; 151:105663. [PMID: 38871173 DOI: 10.1016/j.yrtph.2024.105663] [Received: 02/13/2024] [Revised: 05/10/2024] [Accepted: 06/10/2024]
Abstract
As the United States and the European Union continue their steady march towards the acceptance of new approach methodologies (NAMs), we need to ensure that the available tools are fit for purpose. Critics will be well-positioned to caution against the acceptance and adoption of NAMs if the tools turn out to be inadequate. In this paper, we focus on Quantitative Structure-Activity Relationship (QSAR) models and highlight how the training database affects the quality and performance of these models. Our analysis goes as far as asking, "are the endpoints extracted from the experimental studies in the database trustworthy, or are they false negatives/positives themselves?" We also discuss the impacts of chemistry on QSAR models, including issues with 2-D structure analyses when dealing with isomers, metabolism, and toxicokinetics. We close our analysis with a discussion of challenges associated with translational toxicology, specifically the lack of adverse outcome pathways/adverse outcome pathway networks (AOPs/AOPNs) for many higher tier endpoints. We recognize that it takes a collaborative effort to build better, higher-quality QSAR models, especially for higher tier toxicological endpoints. Hence, it is critical to bring toxicologists, statisticians, and machine learning specialists together to discuss and solve these challenges and obtain relevant predictions.
3. Dutschmann TM, Schlenker V, Baumann K. Chemoinformatic regression methods and their applicability domain. Mol Inform 2024; 43:e202400018. [PMID: 38803302 DOI: 10.1002/minf.202400018] [Received: 01/16/2024] [Revised: 03/24/2024] [Accepted: 03/25/2024]
Abstract
The growing interest in chemoinformatic model uncertainty calls for a summary of the most widely used regression techniques and how to estimate their reliability. Regression models learn a mapping from the space of explanatory variables to the space of continuous output values. Among other limitations, the predictive performance of the model is restricted by the training data used for model fitting. Identification of unusual objects by outlier detection methods can improve model performance. Additionally, proper model evaluation necessitates defining the limitations of the model, often called the applicability domain. Comparable to certain classifiers, some regression techniques come with built-in methods or augmentations to quantify their (un)certainty, while others rely on generic procedures. The theoretical background of their working principles and how to deduce specific and general definitions for their domain of applicability shall be explained.
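The abstract contrasts built-in uncertainty estimates with generic procedures for defining an applicability domain. As an illustration of one generic, distance-based procedure (a common choice, not necessarily one the authors recommend), the Python sketch below treats a query as inside the domain when its mean distance to its k nearest training points does not exceed a threshold derived from the training set itself. The function names, the Euclidean metric, and the 95th-percentile nearest-rank threshold are assumptions for illustration.

```python
import math

def mean_knn_distance(x, reference, k, exclude_self=False):
    """Mean Euclidean distance from x to its k nearest points in `reference`."""
    dists = sorted(math.dist(x, r) for r in reference)
    if exclude_self:
        dists = dists[1:]  # x is itself in `reference`: drop its zero self-distance
    return sum(dists[:k]) / k

def fit_knn_ad(train, k=3, percentile=95):
    """AD threshold: nearest-rank percentile of leave-self-out kNN distances in the training set."""
    scores = sorted(mean_knn_distance(x, train, k, exclude_self=True) for x in train)
    rank = max(1, math.ceil(percentile / 100 * len(scores)))
    return scores[rank - 1]

def in_domain(x, train, threshold, k=3):
    """True if query x is, on average, no farther from its k nearest training points than the threshold."""
    return mean_knn_distance(x, train, k) <= threshold


# Toy usage with 2-D descriptors (illustrative values only)
train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
threshold = fit_knn_ad(train, k=2)
near = in_domain((0.5, 0.4), train, threshold, k=2)  # close to the training cloud
far = in_domain((5.0, 5.0), train, threshold, k=2)   # well outside it
```

The threshold is taken from the training data's own neighbour distances, so the domain adapts to how densely the descriptor space is sampled; k and the percentile trade coverage against reliability.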
Affiliation(s)
- Thomas-Martin Dutschmann
- Institute of Medicinal and Pharmaceutical Chemistry, University of Technology Braunschweig, 38106, Braunschweig, Germany
- Valerie Schlenker
- Institute of Medicinal and Pharmaceutical Chemistry, University of Technology Braunschweig, 38106, Braunschweig, Germany
- Knut Baumann
- Institute of Medicinal and Pharmaceutical Chemistry, University of Technology Braunschweig, 38106, Braunschweig, Germany
4. Kumar N, Acharya V. Advances in machine intelligence-driven virtual screening approaches for big-data. Med Res Rev 2024; 44:939-974. [PMID: 38129992 DOI: 10.1002/med.21995] [Received: 09/12/2022] [Revised: 07/15/2023] [Accepted: 10/29/2023]
Abstract
Virtual screening (VS) is an integral and ever-evolving part of the drug discovery framework. VS is traditionally classified into ligand-based (LB) and structure-based (SB) approaches. Machine intelligence, or artificial intelligence, has wide applications in drug discovery, where it reduces time and resource consumption. Combined with machine intelligence algorithms, VS has become a rapidly advancing technology that learns robust decision rules for data curation and can screen hit molecules from large VS libraries in minutes or hours. The exponential growth of chemical and biological data into public-domain "big data" demands modern, advanced machine intelligence-driven VS approaches to screen hit molecules from ultra-large VS libraries. VS has evolved from individual approaches (LB or SB) to integrated LB and SB techniques that explore both ligand and target protein aspects to improve the rate of correct hit prediction. Current trends demand advanced, intelligent solutions that can handle the enormous data volumes in drug discovery and screen and optimize hits or leads with few or no false positives. Motivated by the big-data trend and the tremendous growth in computational architecture, we present this review. The article categorizes and highlights individual VS techniques, presents the literature on machine learning implementations and modern machine intelligence approaches, discusses their limitations, and considers future prospects.
Affiliation(s)
- Neeraj Kumar
- Artificial Intelligence for Computational Biology Lab (AICoB), Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology, Palampur, Himachal Pradesh, India
- Academy of Scientific and Innovative Research, Ghaziabad, India
- Vishal Acharya
- Artificial Intelligence for Computational Biology Lab (AICoB), Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology, Palampur, Himachal Pradesh, India
- Academy of Scientific and Innovative Research, Ghaziabad, India
5. Lewis-Atwell T, Beechey D, Şimşek Ö, Grayson MN. Reformulating Reactivity Design for Data-Efficient Machine Learning. ACS Catal 2023; 13:13506-13515. [PMID: 37881791 PMCID: PMC10594582 DOI: 10.1021/acscatal.3c02513] [Received: 06/02/2023] [Revised: 08/24/2023]
Abstract
Machine learning (ML) can deliver rapid and accurate reaction barrier predictions for use in rational reactivity design. However, model training requires large data sets of typically thousands or tens of thousands of barriers that are very expensive to obtain computationally or experimentally. Furthermore, bespoke data sets are required for each region of interest in reaction space as models typically struggle to generalize. We have therefore reformulated the ML barrier prediction problem toward a much more data-efficient process: finding a reaction from a prespecified set with a desired target value. Our reformulation enables the rapid selection of reactions with purpose-specific activation barriers, for example, in the design of reactivity and selectivity in synthesis, catalyst design, toxicology, and covalent drug discovery, requiring just tens of accurately measured barriers. Importantly, our reformulation does not require generalization beyond the domain of the data set at hand, and we show excellent results for the highly toxicologically and synthetically relevant data sets of aza-Michael addition and transition-metal-catalyzed dihydrogen activation, typically requiring fewer than 20 accurately measured density functional theory (DFT) barriers. Even for incomplete data sets of E2 and SN2 reactions, with high numbers of missing barriers (74% and 56% respectively), our chosen ML search method still requires significantly fewer data points than the hundreds or thousands needed for more conventional uses of ML to predict activation barriers. Finally, we include a case study in which we use our process to guide the optimization of the dihydrogen activation catalyst. Our approach was able to identify a reaction within 1 kcal mol⁻¹ of the target barrier by only having to run 12 DFT reaction barrier calculations, which illustrates the usage and real-world applicability of this reformulation for systems of high synthetic importance.
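The reformulation described above turns barrier prediction into a search over a fixed candidate set. The abstract does not spell out the authors' ML search method, so the sketch below substitutes a deliberately simple one, labeled as an assumption: fit a linear least-squares surrogate to the barriers measured so far, evaluate the unmeasured candidate whose prediction lies closest to the target, and stop once a measured value falls within a tolerance (1 kcal mol⁻¹ here, echoing the case study). The `evaluate` callback is a hypothetical stand-in for a DFT calculation, and all names are illustrative.

```python
import numpy as np

def search_for_target(X, evaluate, target, tolerance=1.0, initial=(0, -1), budget=20):
    """Greedy search of a fixed candidate set for an item whose value is close to `target`.

    X        : (n_candidates, n_features) descriptors of the prespecified reactions
    evaluate : expensive oracle, index -> value (hypothetical stand-in for a DFT barrier)
    Returns (index, value, number_of_oracle_calls).
    """
    n = X.shape[0]
    measured = {}                                   # index -> evaluated value
    queue = [i % n for i in initial] or [0]         # seed evaluations
    design = np.hstack([X, np.ones((n, 1))])        # descriptors plus intercept column
    while len(measured) < min(budget, n):
        if queue:
            i = queue.pop(0)
            if i in measured:
                continue
        else:
            idx = sorted(measured)
            # Linear least-squares surrogate fitted to the values measured so far
            coef, *_ = np.linalg.lstsq(design[idx], np.array([measured[j] for j in idx]), rcond=None)
            preds = design @ coef
            unmeasured = [j for j in range(n) if j not in measured]
            i = min(unmeasured, key=lambda j: abs(preds[j] - target))
        measured[i] = evaluate(i)
        if abs(measured[i] - target) <= tolerance:
            return i, measured[i], len(measured)
    best = min(measured, key=lambda j: abs(measured[j] - target))
    return best, measured[best], len(measured)


# Toy usage: a 1-D descriptor with a linear "barrier" in kcal/mol, purely illustrative
X = np.arange(50, dtype=float).reshape(-1, 1) / 10
true_barrier = 10.0 + 5.0 * X[:, 0]
idx, value, n_calls = search_for_target(X, lambda i: float(true_barrier[i]), target=22.0)
```

In this toy data the barrier is exactly linear in the descriptor, so the surrogate is exact after the two seed evaluations and the third oracle call lands on the target; real reaction sets would need a more flexible surrogate and a smarter acquisition rule.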
Affiliation(s)
- Toby Lewis-Atwell
- Department of Chemistry, University of Bath, Claverton Down, Bath BA2 7AY, U.K.
- Department of Computer Science, University of Bath, Claverton Down, Bath BA2 7AY, U.K.
- Daniel Beechey
- Department of Computer Science, University of Bath, Claverton Down, Bath BA2 7AY, U.K.
- Özgür Şimşek
- Department of Computer Science, University of Bath, Claverton Down, Bath BA2 7AY, U.K.
- Matthew N. Grayson
- Department of Chemistry, University of Bath, Claverton Down, Bath BA2 7AY, U.K.
6. Bassani D, Brigo A, Andrews-Morger A. Federated Learning in Computational Toxicology: An Industrial Perspective on the Effiris Hackathon. Chem Res Toxicol 2023; 36:1503-1517. [PMID: 37584277 PMCID: PMC10523574 DOI: 10.1021/acs.chemrestox.3c00137] [Received: 05/11/2023]
Abstract
In silico approaches have acquired a prominent role in pharmaceutical research and development, allowing laboratories all around the world to design, create, and optimize novel molecular entities with unprecedented efficiency. From a toxicological perspective, computational methods have guided the choices of medicinal chemists toward compounds displaying improved safety profiles. Although recent advances in the field are significant, many challenges remain open in on-target and off-target prediction. Machine learning methods have shown their ability to identify molecules with safety concerns. However, they strongly depend on the abundance and diversity of the data used for their training. Sharing such information among pharmaceutical companies remains extremely limited for confidentiality reasons; in this scenario, a recent concept named "federated learning" can help overcome such concerns. Within this framework, companies can contribute to the training of common machine learning algorithms using, but not sharing, their proprietary data. Very recently, Lhasa Limited organized a hackathon involving several industrial partners in order to assess the performance of its federated learning platform, called "Effiris". In this paper, we share our experience as Roche in participating in this event, evaluating the performance of the federated algorithms and comparing them with our in-house-only machine learning models. Our aim is to highlight the advantages of federated learning and its intrinsic limitations, and also to suggest some potential improvements to the method.
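The abstract does not describe how Effiris combines partner contributions. To make the phrase "using, but not sharing, their proprietary data" concrete, the sketch below implements generic federated averaging (FedAvg) for a linear model: each partner runs a few gradient steps on its private data, and only the resulting weight vectors are averaged by a server, weighted by data-set size. The linear model, learning rate, round counts, and toy data are assumptions for illustration, not details of the Effiris platform.

```python
import numpy as np

def local_update(w, X, y, lr=0.1, epochs=5):
    """A few full-batch gradient steps on mean squared error, run privately by one partner."""
    w = w.copy()
    for _ in range(epochs):
        grad = 2.0 / len(y) * X.T @ (X @ w - y)
        w -= lr * grad
    return w

def federated_averaging(partners, n_features, rounds=50, lr=0.1, epochs=5):
    """Server loop: only weight vectors leave each partner; raw (X, y) data never do."""
    w = np.zeros(n_features)
    sizes = np.array([len(y) for _, y in partners], dtype=float)
    for _ in range(rounds):
        local_ws = [local_update(w, X, y, lr, epochs) for X, y in partners]
        w = np.average(local_ws, axis=0, weights=sizes)  # data-size-weighted average
    return w


# Toy usage: three hypothetical partners whose private data share one linear relationship
rng = np.random.default_rng(0)
true_w = np.array([1.5, -2.0, 0.5])
partners = []
for n in (40, 60, 100):
    X = rng.normal(size=(n, 3))
    partners.append((X, X @ true_w + 0.01 * rng.normal(size=n)))
w_fed = federated_averaging(partners, n_features=3)
```

Weighting by data-set size lets larger contributors pull the shared model proportionally; with strongly heterogeneous (non-identically distributed) partner data, the averaged model can drift from each partner's own optimum, which is one of the intrinsic limitations such evaluations probe.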
Affiliation(s)
- Davide Bassani
- Pharmaceutical Research & Early Development, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., 4070 Basel, Switzerland
- Alessandro Brigo
- Pharmaceutical Research & Early Development, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., 4070 Basel, Switzerland
- Andrea Andrews-Morger
- Pharmaceutical Research & Early Development, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., 4070 Basel, Switzerland
7. Redshaw J, Ting DSJ, Brown A, Hirst JD, Gärtner T. Krein support vector machine classification of antimicrobial peptides. Digital Discovery 2023; 2:502-511. [PMID: 37065679 PMCID: PMC10087059 DOI: 10.1039/d3dd00004d] [Received: 01/19/2023] [Accepted: 02/22/2023]
Abstract
Antimicrobial peptides (AMPs) represent a potential solution to the growing problem of antimicrobial resistance, yet their identification through wet-lab experiments is a costly and time-consuming process. Accurate computational predictions would allow rapid in silico screening of candidate AMPs, thereby accelerating the discovery process. Kernel methods are a class of machine learning algorithms that utilise a kernel function to transform input data into a new representation. When appropriately normalised, the kernel function can be regarded as a notion of similarity between instances. However, many expressive notions of similarity are not valid kernel functions, meaning they cannot be used with standard kernel methods such as the support-vector machine (SVM). The Kreĭn-SVM represents a generalisation of the standard SVM that admits a much larger class of similarity functions. In this study, we propose and develop Kreĭn-SVM models for AMP classification and prediction by employing the Levenshtein distance and local alignment score as sequence similarity functions. Utilising two datasets from the literature, each containing more than 3000 peptides, we train models to predict general antimicrobial activity. Our best models achieve an AUC of 0.967 and 0.863 on the test sets of each respective dataset, outperforming the in-house and literature baselines in both cases. We also curate a dataset of experimentally validated peptides, measured against Staphylococcus aureus and Pseudomonas aeruginosa, in order to evaluate the applicability of our methodology in predicting microbe-specific activity. In this case, our best models achieve an AUC of 0.982 and 0.891, respectively. Models to predict both general and microbe-specific activities are made available as web applications.
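The point that an expressive similarity need not be a valid kernel can be checked numerically: a valid kernel must yield a positive semidefinite Gram matrix on any set of inputs. The sketch below computes the Levenshtein (edit) distance by dynamic programming and uses its negative as a similarity, an assumed transformation that may differ from the one used in the study. Because that matrix has a zero diagonal, its eigenvalues sum to zero, so any nonzero off-diagonal entry forces at least one negative eigenvalue. The sketch does not implement the Kreĭn-SVM solver itself, and the sequences are toy strings.

```python
import numpy as np

def levenshtein(a: str, b: str) -> int:
    """Edit distance by dynamic programming, keeping only the previous row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]

def similarity_matrix(seqs):
    """S_ij = -levenshtein(s_i, s_j): larger (less negative) means more similar."""
    n = len(seqs)
    S = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            S[i, j] = S[j, i] = -levenshtein(seqs[i], seqs[j])
    return S


# Toy peptide-like strings (illustrative only, not taken from the study's datasets)
peptides = ["KWKLFKK", "KWKLLKK", "GLFDIIK", "FLPLIGR"]
S = similarity_matrix(peptides)
eigenvalues = np.linalg.eigvalsh(S)
is_valid_kernel = bool(eigenvalues.min() >= -1e-9)  # a valid kernel needs a PSD Gram matrix
```

A standard SVM solver assumes this matrix is positive semidefinite; the Kreĭn-SVM formulation is what allows such indefinite similarity matrices to be used directly.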
Affiliation(s)
- Joseph Redshaw
- School of Chemistry, University of Nottingham, University Park, Nottingham NG7 2RD, UK
- Darren S J Ting
- Academic Ophthalmology, School of Medicine, University of Nottingham, Nottingham NG7 2UH, UK
- Academic Unit of Ophthalmology, Institute of Inflammation and Ageing, University of Birmingham, Birmingham, UK
- Birmingham and Midland Eye Centre, Birmingham, UK
- Alex Brown
- Artificial Intelligence and Machine Learning, GSK Medicines Research Centre, Gunnels Wood Road, Stevenage SG1 2NY, UK
- Jonathan D Hirst
- School of Chemistry, University of Nottingham, University Park, Nottingham NG7 2RD, UK
- Thomas Gärtner
- Machine Learning Group, TU Wien Informatics, Vienna, Austria
8. Karade D, Karade V. AIDrugApp: artificial intelligence-based Web-App for virtual screening of inhibitors against SARS-COV-2. J EXP THEOR ARTIF IN 2022. [DOI: 10.1080/0952813x.2022.2058619]
Affiliation(s)
- Divya Karade
- Chemical Engineering and Process Development (Cepd) Division, CSIR-National Chemical Laboratory, Pune India
- Vikas Karade
- Department of Research and Development, Algosurg Products Pvt. Ltd, Mumbai, India
9. Computational Prediction of Compound-Protein Interactions for Orphan Targets Using CGBVS. Molecules 2021; 26:molecules26175131. [PMID: 34500569 PMCID: PMC8434178 DOI: 10.3390/molecules26175131] [Received: 07/20/2021] [Revised: 08/16/2021] [Accepted: 08/23/2021]
Abstract
A variety of Artificial Intelligence (AI)-based (Machine Learning) techniques have been developed for the in silico prediction of compound-protein interactions (CPI), one of which is a technique we refer to as chemical genomics-based virtual screening (CGBVS). The main feature of CGBVS is that predictions are calculated with a pairwise kernel-based support vector machine (SVM), which gives high prediction accuracy with simple implementation and easy handling. We studied whether the CGBVS technique can identify ligands for targets without ligand information (orphan targets) using data from G protein-coupled receptor (GPCR) families. As the validation method, we tested whether the ligand prediction was correct for a virtual orphan GPCR, in which all ligand information for one selected target was omitted from the training data. We expressed the results of this study as an applicability index and developed a method to determine whether CGBVS can be used to predict GPCR ligands. Validation results showed that prediction accuracy differed greatly between GPCRs, but models using Multiple Sequence Alignment (MSA) as the protein descriptor performed well in terms of overall prediction accuracy. We also discovered that the type of compound descriptor affected prediction accuracy less than the type of protein descriptor used. Furthermore, we found that the accuracy of ligand prediction depends on the amount of ligand information available for GPCRs related to the target: prediction accuracy tends to be high when a large amount of ligand information for related proteins is used in training.
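A pairwise kernel scores compound-protein pairs rather than single molecules. The sketch below assumes the common tensor-product form, K((c, p), (c', p')) = Kc(c, c') · Kp(p, p'), with Tanimoto similarity on fingerprint bit sets as Kc and a hypothetical lookup table standing in for an MSA-derived protein similarity as Kp; the descriptors and kernels actually used in CGBVS may differ. It also shows the "virtual orphan" split described above, in which every pair involving the held-out receptor is removed from training. All fingerprints, receptor names, and similarity values are made up for illustration.

```python
import numpy as np

def tanimoto(a: set, b: set) -> float:
    """Tanimoto similarity between fingerprints stored as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0

def pairwise_kernel(pairs_a, pairs_b, compound_kernel, protein_kernel):
    """Tensor-product kernel on (compound, protein) pairs: Kc(c, c') * Kp(p, p')."""
    K = np.empty((len(pairs_a), len(pairs_b)))
    for i, (c1, p1) in enumerate(pairs_a):
        for j, (c2, p2) in enumerate(pairs_b):
            K[i, j] = compound_kernel(c1, c2) * protein_kernel(p1, p2)
    return K

def virtual_orphan_split(pairs, target):
    """Hold out every pair involving `target`, mimicking a receptor with no known ligands."""
    train = [pr for pr in pairs if pr[1] != target]
    test = [pr for pr in pairs if pr[1] == target]
    return train, test


# Toy usage: hypothetical fingerprints and a hypothetical protein-similarity table
fps = {"c1": {1, 2, 3}, "c2": {2, 3, 4}, "c3": {7, 8}}
protein_sim = {("R1", "R1"): 1.0, ("R2", "R2"): 1.0, ("R1", "R2"): 0.6, ("R2", "R1"): 0.6}

def kc(a, b):
    return tanimoto(fps[a], fps[b])

def kp(a, b):
    return protein_sim[(a, b)]

pairs = [("c1", "R1"), ("c2", "R1"), ("c3", "R2"), ("c1", "R2")]
train, test = virtual_orphan_split(pairs, target="R2")
K_train = pairwise_kernel(train, train, kc, kp)
K_test = pairwise_kernel(test, train, kc, kp)   # rows: held-out pairs, columns: training pairs
```

The resulting matrices could then be passed to an SVM that accepts precomputed kernels, for example scikit-learn's SVC(kernel="precomputed"), fitting on K_train and predicting on K_test. Because the protein factor links the orphan receptor to related training receptors, this form also illustrates why prediction accuracy depends on how much ligand data exists for related proteins.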
10. Rajput A, Kumar M. Anti-Ebola: an initiative to predict Ebola virus inhibitors through machine learning. Mol Divers 2021; 26:1635-1644. [PMID: 34357513 PMCID: PMC8343361 DOI: 10.1007/s11030-021-10291-7] [Received: 03/27/2021] [Accepted: 07/28/2021]
Abstract
Ebola virus is a deadly pathogen responsible for a series of frequent outbreaks since 1976. Despite various efforts from researchers worldwide, its mortality remains high. Computational efforts are considered highly useful for antiviral drug discovery. Therefore, we have developed an 'anti-Ebola' web server using quantitative structure-activity relationship information on available molecules with experimentally measured anti-Ebola activities. Three hundred and five unique anti-Ebola compounds with their respective IC50 values were extracted from the 'DrugRepV' database. Molecular descriptors were then calculated for these compounds and used for regression-based model development. Machine learning techniques, namely support vector machine, random forest and artificial neural network, were employed with tenfold cross-validation. After a randomization approach, the best predictive model showed Pearson's correlation coefficients ranging from 0.83 to 0.98 on the training/testing (T274) dataset. The robustness of the developed models was cross-evaluated using a Williams plot. These highly robust computational models are integrated into the web server. The 'anti-Ebola' web server is freely available at https://bioinfo.imtech.res.in/manojk/antiebola . We anticipate that it will serve the scientific community in developing effective inhibitors against the Ebola virus.
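A Williams plot places each compound's leverage on the x-axis against its standardized residual on the y-axis. The sketch below computes both for an ordinary least-squares fit, using the conventional warning leverage h* = 3(p + 1)/n for p descriptors and n training compounds; compounds beyond h* are extrapolations in descriptor space, and |standardized residual| > 3 is a common outlier cut-off. The OLS residuals are an assumption made to keep the example self-contained; for the SVM, random forest, or neural network models named above, one would substitute that model's residuals. The data are random toy values.

```python
import numpy as np

def williams_plot_data(X_train, y_train, X_query=None):
    """Leverages and standardized residuals of an OLS model, plus the warning leverage h*."""
    n, p = X_train.shape
    A = np.hstack([np.ones((n, 1)), X_train])          # intercept column + p descriptors
    xtx_inv = np.linalg.pinv(A.T @ A)
    coef = xtx_inv @ A.T @ y_train
    residuals = y_train - A @ coef
    std_resid = residuals / residuals.std(ddof=p + 1)  # residual / sqrt(SSE / (n - p - 1))
    # Leverage h_i = a_i^T (A^T A)^-1 a_i: the diagonal of the hat matrix
    h_train = np.einsum("ij,jk,ik->i", A, xtx_inv, A)
    h_star = 3.0 * (p + 1) / n
    h_query = None
    if X_query is not None:
        Q = np.hstack([np.ones((len(X_query), 1)), X_query])
        h_query = np.einsum("ij,jk,ik->i", Q, xtx_inv, Q)
    return h_train, std_resid, h_star, h_query


# Toy usage: 60 compounds, 3 descriptors, and one query far outside the training range
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 3))
y = X @ np.array([0.8, -0.3, 1.2]) + 0.1 * rng.normal(size=60)
h, r, h_star, h_new = williams_plot_data(X, y, X_query=np.array([[6.0, -6.0, 6.0]]))
outside_ad = h_new > h_star
```

The training leverages always sum to p + 1 (the trace of the hat matrix), which makes a quick sanity check on the computation.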
Affiliation(s)
- Akanksha Rajput
- Virology Unit and Bioinformatics Centre, Institute of Microbial Technology, Council of Scientific and Industrial Research (CSIR), Sector 39A, Chandigarh, 160036, India
- Manoj Kumar
- Virology Unit and Bioinformatics Centre, Institute of Microbial Technology, Council of Scientific and Industrial Research (CSIR), Sector 39A, Chandigarh, 160036, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, India
11. Berenger F, Yamanishi Y. Ranking Molecules with Vanishing Kernels and a Single Parameter: Active Applicability Domain Included. J Chem Inf Model 2020; 60:4376-4387. [PMID: 32281797 DOI: 10.1021/acs.jcim.9b01075]
Abstract
In ligand-based virtual screening, high-throughput screening (HTS) data sets can be exploited to train classification models. Such models can be used to prioritize yet untested molecules, from the most likely active (against a protein target of interest) to the least likely active. In this study, a single-parameter ranking method with an Applicability Domain (AD) is proposed. In effect, Kernel Density Estimates (KDE) are revisited to improve their computational efficiency and incorporate an AD. Two modifications are proposed: (i) using vanishing kernels (i.e., kernel functions with a finite support) and (ii) using the Tanimoto distance between molecular fingerprints as a radial basis function. This construction is termed "Vanishing Ranking Kernels" (VRK). Using VRK on 21 HTS assays, it is shown that VRK can compete in performance with a graph convolutional deep neural network. VRK are conceptually simple and fast to train. During training, they require optimizing a single parameter. A trained VRK model usually defines an active AD. Exploiting this AD can significantly increase the screening frequency of a VRK model. Software: https://github.com/UnixJunkie/rankers. Data sets: https://zenodo.org/record/1320776 and https://zenodo.org/record/3540423.
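The two modifications described above can be illustrated with a minimal Python sketch: Tanimoto distance between fingerprints (here sets of on-bits) as the radial argument, and a kernel that vanishes beyond a single radius parameter r, so that queries with no training active within r receive zero density and fall outside the active AD. The triangular kernel shape and the use of an active-only density are assumptions and may not match the published VRK formulation; the authors' software is in the repository linked above. Tuning r, the single parameter, on validation data is left out for brevity, and the fingerprints are hand-made toy values.

```python
def tanimoto_distance(a: frozenset, b: frozenset) -> float:
    """1 - Tanimoto similarity for fingerprints stored as sets of on-bit indices."""
    union = len(a | b)
    return 1.0 - (len(a & b) / union if union else 1.0)

def vanishing_kernel(d: float, r: float) -> float:
    """Triangular kernel with finite support: weight falls linearly to 0 at distance r."""
    return 1.0 - d / r if d < r else 0.0

def rank_with_ad(queries, actives, r):
    """Rank queries by mean kernel weight to training actives; zero-density queries are outside the AD."""
    ranked, outside = [], []
    for name, fp in queries.items():
        density = sum(vanishing_kernel(tanimoto_distance(fp, a), r) for a in actives) / len(actives)
        if density > 0.0:
            ranked.append((name, density))
        else:
            outside.append((name, density))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked, outside


# Toy usage with hand-made fingerprints
actives = [frozenset({1, 2, 3, 4}), frozenset({1, 2, 3, 5})]
queries = {"q1": frozenset({1, 2, 3, 4}), "q2": frozenset({1, 2, 6, 7}), "q3": frozenset({10, 11})}
ranked, outside = rank_with_ad(queries, actives, r=0.8)
```

Because the kernel has finite support, only actives within distance r contribute, which is what makes both the speed-up and the built-in domain boundary possible: q3 shares no bits with any active and is reported as outside rather than ranked last.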
Affiliation(s)
- Francois Berenger
- Department of Bioscience and Bioinformatics, Faculty of Computer Science and Systems Engineering, Kyushu Institute of Technology, Kawazu, 680-4 Iizuka, Japan
- Yoshihiro Yamanishi
- Department of Bioscience and Bioinformatics, Faculty of Computer Science and Systems Engineering, Kyushu Institute of Technology, Kawazu, 680-4 Iizuka, Japan
12. Rakhimbekova A, Madzhidov TI, Nugmanov RI, Gimadiev TR, Baskin II, Varnek A. Comprehensive Analysis of Applicability Domains of QSPR Models for Chemical Reactions. Int J Mol Sci 2020; 21:E5542. [PMID: 32756326 PMCID: PMC7432167 DOI: 10.3390/ijms21155542] [Received: 07/13/2020] [Revised: 07/27/2020] [Accepted: 07/30/2020]
Abstract
Nowadays, defining a model's applicability domain (AD) is an active research topic in chemoinformatics. Although many AD definitions for models predicting the properties of molecules (Quantitative Structure-Activity/Property Relationship (QSAR/QSPR) models) have been described in the literature, none for chemical reactions (Quantitative Reaction-Property Relationships (QRPR)) had been reported to date. The difficulty is that a chemical reaction is a much more complex object than an individual molecule: its yield and its thermodynamic and kinetic characteristics depend not only on the structures of the reactants and products but also on the experimental conditions. The performance of QRPR models largely depends on the way the chemical transformation is encoded. In this study, various AD definition methods widely used in QSAR/QSPR studies of individual molecules, as well as several novel approaches suggested in this work for reactions, were benchmarked on several reaction datasets. The methods were tested for their ability to exclude wrong reaction types, increase coverage, improve model performance and detect Y-outliers. As a result, several "best" AD definitions for QRPR models predicting reaction characteristics were identified and tested on a previously published external dataset with a clear AD definition problem.
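One of the simplest AD definitions from QSAR/QSPR practice is the descriptor bounding box, and the sketch below uses it to show how coverage and in-domain performance, two of the criteria listed above, are typically computed. Whether the bounding box is among the methods benchmarked in this study is not stated in the abstract. The sketch assumes reactions have already been encoded as fixed-length numeric descriptor vectors; the encoding step, which the abstract identifies as decisive for QRPR models, is not shown, and the function names and data are hypothetical.

```python
def fit_bounding_box(train):
    """Per-descriptor minimum and maximum over the training set."""
    cols = list(zip(*train))
    return [min(c) for c in cols], [max(c) for c in cols]

def in_box(x, box):
    """True if every descriptor of x lies within the training range."""
    lo, hi = box
    return all(l <= v <= h for v, l, h in zip(x, lo, hi))

def coverage_and_rmse(test_X, y_true, y_pred, box):
    """Coverage = fraction of test objects inside the AD; RMSE is computed on those objects only."""
    inside = [i for i, x in enumerate(test_X) if in_box(x, box)]
    coverage = len(inside) / len(test_X)
    if not inside:
        return coverage, float("nan")
    mse = sum((y_true[i] - y_pred[i]) ** 2 for i in inside) / len(inside)
    return coverage, mse ** 0.5


# Toy usage with 2-D reaction descriptors (illustrative values only)
train = [(0.0, 1.0), (2.0, 3.0), (1.0, 2.0)]
box = fit_bounding_box(train)
test_X = [(1.0, 1.5), (3.0, 2.0), (0.5, 2.5), (1.5, 4.0)]
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 0.0, 2.5, 9.0]
coverage, rmse = coverage_and_rmse(test_X, y_true, y_pred, box)
```

The two test points outside the box carry the largest errors here, so restricting evaluation to the AD halves coverage but improves RMSE; comparing AD definitions amounts to comparing where each one strikes that trade-off.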
Affiliation(s)
- Assima Rakhimbekova
- A.M. Butlerov Institute of Chemistry, Kazan Federal University, 420008 Kazan, Russia
- Timur I. Madzhidov
- A.M. Butlerov Institute of Chemistry, Kazan Federal University, 420008 Kazan, Russia
- Ramil I. Nugmanov
- A.M. Butlerov Institute of Chemistry, Kazan Federal University, 420008 Kazan, Russia
- Timur R. Gimadiev
- Institute for Chemical Reaction Design and Discovery, Hokkaido University, Sapporo 001-0021, Japan
- Igor I. Baskin
- A.M. Butlerov Institute of Chemistry, Kazan Federal University, 420008 Kazan, Russia
- Faculty of Physics, Moscow State University, 119234 Moscow, Russia
- Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 67000 Strasbourg, France
- Alexandre Varnek
- Institute for Chemical Reaction Design and Discovery, Hokkaido University, Sapporo 001-0021, Japan
- Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 67000 Strasbourg, France
13. Brown N, Ertl P, Lewis R, Luksch T, Reker D, Schneider N. Artificial intelligence in chemistry and drug design. J Comput Aided Mol Des 2020; 34:709-715. [PMID: 32468207 DOI: 10.1007/s10822-020-00317-x]
Affiliation(s)
- Nathan Brown
- BenevolentAI, 4-8 Maple Street, London, W1T 5HD, UK
- Peter Ertl
- Novartis Institutes for BioMedical Research, 4056, Basel, Switzerland
- Richard Lewis
- Novartis Institutes for BioMedical Research, 4056, Basel, Switzerland
- Daniel Reker
- Koch Institute for Integrative Cancer Research and MIT-IBM Watson AI Lab, Massachusetts Institute of Technology, Cambridge, MA, 02142, USA
- Division of Gastroenterology, Hepatology and Endoscopy, Department of Medicine, Harvard Medical School, Brigham and Women's Hospital, Boston, MA, 02115, USA
- Nadine Schneider
- Novartis Institutes for BioMedical Research, 4056, Basel, Switzerland
14. Pérez-Sianes J, Pérez-Sánchez H, Díaz F. Virtual Screening Meets Deep Learning. Curr Comput Aided Drug Des 2019; 15:6-28. [PMID: 30338743 DOI: 10.2174/1573409914666181018141602] [Received: 10/11/2017] [Revised: 10/08/2018] [Accepted: 10/11/2018]
Abstract
BACKGROUND Automated compound testing is currently the de facto standard method for drug screening, but it has not brought the large increase in the number of new drugs that was expected. Computer-aided compound search, known as virtual screening, has shown its benefits to this field as a complement or even an alternative to robotic drug discovery. There are different methods and approaches to address this problem, and most of them fall into one of the main screening strategies. Machine learning, however, has established itself as a virtual screening methodology in its own right, and it may grow in popularity with new trends in artificial intelligence. OBJECTIVE This paper attempts to provide a comprehensive and structured review of the most important proposals made so far in this area of research. Particular attention is given to recent developments in machine learning, namely the deep learning approach, which is singled out as a future key player in the virtual screening landscape.
Affiliation(s)
- Horacio Pérez-Sánchez
- Bioinformatics and High Performance Computing Research Group (BIO-HPC), Computer Engineering Department, Universidad Católica San Antonio de Murcia (UCAM), Murcia, Spain
- Fernando Díaz
- Departamento de Informática, Escuela de Ingeniería Informática, University of Valladolid, Segovia, Spain
15
Ensemble Machine Learning and Applicability Domain Estimation for Fluorescence Properties and its Application to Structural Design. Journal of Computer Aided Chemistry 2019. [DOI: 10.2751/jcac.20.7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/08/2022]
16
Miyao T, Funatsu K, Bajorath J. Exploring Alternative Strategies for the Identification of Potent Compounds Using Support Vector Machine and Regression Modeling. J Chem Inf Model 2018; 59:983-992. [DOI: 10.1021/acs.jcim.8b00584] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 11/28/2022]
Affiliation(s)
- Tomoyuki Miyao
- Data Science Center and Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan
- Kimito Funatsu
- Data Science Center and Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan
- Department of Chemical System Engineering, School of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
- Jürgen Bajorath
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Endenicher Allee 19c, D-53115 Bonn, Germany
17
Maltarollo VG, Kronenberger T, Espinoza GZ, Oliveira PR, Honorio KM. Advances with support vector machines for novel drug discovery. Expert Opin Drug Discov 2018; 14:23-33. [PMID: 30488731 DOI: 10.1080/17460441.2019.1549033] [Citation(s) in RCA: 50] [Impact Index Per Article: 7.1] [Indexed: 12/31/2022]
Abstract
INTRODUCTION Novel drug discovery remains an enormous challenge, and various computer-aided drug design (CADD) approaches have been widely employed for this purpose. CADD can employ machine learning techniques, notably the commonly used support vector machines (SVMs). SVMs and their variations offer numerous drug discovery applications, ranging from the classification of substances (as active or inactive) to the construction of regression models and the ranking/virtual screening of database compounds. Areas covered: Herein, the authors consider some of the applications of SVMs in medicinal chemistry, illustrating their main advantages and disadvantages, as well as trends in their use, based on the available published literature. The aim is to provide an up-to-date review of recent applications of SVMs in drug discovery, highlighting their strengths, weaknesses, and future challenges. Expert opinion: Techniques based on SVMs are considered powerful approaches in early drug discovery. The ability of SVMs to classify compounds as active or inactive has enabled the prioritization of substances for virtual screening. Indeed, one of the main advantages of SVMs is their ability to handle nonlinear problems. However, despite successes in employing SVMs, challenges in improving accuracy remain.
Affiliation(s)
- Vinicius Gonçalves Maltarollo
- Departamento de Produtos Farmacêuticos, Faculdade de Farmácia, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- Thales Kronenberger
- Department of Internal Medicine VIII, University Hospital of Tübingen, Tübingen, Germany
- Gabriel Zarzana Espinoza
- Escola de Artes, Ciências e Humanidades, Universidade de São Paulo (USP), São Paulo, Brazil
- Patricia Rufino Oliveira
- Escola de Artes, Ciências e Humanidades, Universidade de São Paulo (USP), São Paulo, Brazil
- Kathia Maria Honorio
- Escola de Artes, Ciências e Humanidades, Universidade de São Paulo (USP), São Paulo, Brazil; Centro de Ciências Naturais e Humanas, Universidade Federal do ABC, Santo André, Brazil
18
Kleandrova VV, Luan F, Speck-Planche A, Cordeiro MNDS. QSAR-Based Studies of Nanomaterials in the Environment. Pharmaceutical Sciences 2017. [DOI: 10.4018/978-1-5225-1762-7.ch051] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/02/2023]
Abstract
Nanotechnology is a newly emerging field with substantial impacts on society, the economy, and the environment. In recent years, the development of nanotechnology has led to the design and large-scale production of many new materials and devices with a vast range of applications. However, along with the benefits, the use of nanomaterials raises many questions and concerns about possible health risks and environmental impacts. This chapter provides an overview of the Quantitative Structure-Activity Relationship (QSAR) studies performed so far toward predicting the environmental toxicity of nanoparticles. Recent progress in the application of these modeling studies is also highlighted. Special emphasis is given to the setup of a perturbation-based QSAR model for assessing the ecotoxic effects of nanoparticles under diverse conditions. Finally, ongoing challenges that may lead to new and exciting directions for QSAR modeling are discussed.
Affiliation(s)
- Feng Luan
- Yantai University, China & University of Porto, Portugal
19
Norinder U, Rybacka A, Andersson PL. Conformal prediction to define applicability domain - A case study on predicting ER and AR binding. SAR and QSAR in Environmental Research 2016; 27:303-316. [PMID: 27088868 DOI: 10.1080/1062936x.2016.1172665] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Indexed: 06/05/2023]
Abstract
A fundamental element when deriving a robust and predictive in silico model is not only the statistical quality of the model in question but, equally important, the estimate of its predictive boundaries. This work presents a new method, conformal prediction, for applicability domain estimation in the field of endocrine disruptors. The method is applied to binders and non-binders of the oestrogen and androgen receptors. Ensembles of decision trees are used as the statistical method, and three different descriptor sets (Dragon, RDKit and signature fingerprints) are investigated. The conformal prediction method yields valid models with an excellent balance in quality between the internally validated training set and the corresponding external test set, both in terms of validity and with respect to sensitivity and specificity. With this method, the user can readily alter the level of confidence and immediately inspect the consequences. Furthermore, the predictive boundaries of the derived models are rigorously defined by the conformal prediction framework, so there is no ambiguity as to the level of similarity a new compound needs in order to fall inside or outside the predictive boundaries where reliable predictions can be expected.
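The conformal approach described above can be illustrated with a minimal sketch of class-conditional (Mondrian) inductive conformal classification, one common variant of the framework. It assumes a nonconformity score of one minus the model's predicted probability for a label and uses invented calibration scores; the function names and numbers are hypothetical and do not reproduce the study's tree ensembles or descriptors.

```python
from typing import Dict, List, Set


def conformal_p_value(calibration_scores: List[float], test_score: float) -> float:
    """Fraction of calibration nonconformity scores at least as large as the
    test score, counting the test example itself (the +1 terms)."""
    n_at_least = sum(1 for s in calibration_scores if s >= test_score)
    return (n_at_least + 1) / (len(calibration_scores) + 1)


def prediction_set(calibration: Dict[str, List[float]],
                   test_probabilities: Dict[str, float],
                   significance: float) -> Set[str]:
    """Class-conditional conformal prediction set for one compound.

    calibration: per label, nonconformity scores of calibration compounds
                 whose true label is that label (hypothetical values).
    test_probabilities: model-predicted probability of each label.
    significance: tolerated error rate; a label is kept if its p-value exceeds it.
    """
    kept = set()
    for label, cal_scores in calibration.items():
        # Assumed nonconformity score: 1 - predicted probability of this label.
        score = 1.0 - test_probabilities[label]
        if conformal_p_value(cal_scores, score) > significance:
            kept.add(label)
    return kept


if __name__ == "__main__":
    calibration = {
        "binder": [0.1, 0.2, 0.3, 0.4],
        "non-binder": [0.05, 0.15, 0.25, 0.35],
    }
    compound = {"binder": 0.9, "non-binder": 0.1}
    # A single-label set is a confident prediction; an empty or two-label set
    # signals that no reliable single label can be given at this significance.
    print(prediction_set(calibration, compound, significance=0.25))
    print(prediction_set(calibration, compound, significance=0.10))
```

Lowering the significance level widens the prediction sets, which corresponds to the abstract's point that the user can alter the confidence level and inspect the consequences immediately.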
Affiliation(s)
- U Norinder
- Swedish Toxicology Sciences Research Center, Södertälje, Sweden
- Department of Computer and Systems Sciences, Stockholm University, Kista, Sweden
- A Rybacka
- Department of Chemistry, Umeå University, Umeå, Sweden
- P L Andersson
- Department of Chemistry, Umeå University, Umeå, Sweden
20
Mathea M, Klingspohn W, Baumann K. Chemoinformatic Classification Methods and their Applicability Domain. Mol Inform 2016; 35:160-80. [PMID: 27492083 DOI: 10.1002/minf.201501019] [Citation(s) in RCA: 87] [Impact Index Per Article: 9.7] [Received: 11/12/2015] [Accepted: 01/20/2016] [Indexed: 11/08/2022]
Abstract
Classification rules are often used in chemoinformatics to predict categorical properties of drug candidates related to bioactivity from explanatory variables, which encode the respective molecular structures (i.e., molecular descriptors). To avoid predictions with an unduly large error probability, the domain the classifier is applied to should be restricted to the domain covered by the training set objects. This latter domain is commonly referred to as the applicability domain in chemoinformatics. Conceptually, the applicability domain defines the region in space where the "normal" objects are located. Defining the border of the applicability domain may then be viewed as detecting anomalous or novel objects, or as detecting outliers. Currently, two different types of measures are in use. The first defines the applicability domain solely in terms of the molecular descriptor space, which is referred to as novelty detection. The second defines the applicability domain in terms of the expected reliability of the predictions, which is referred to as confidence estimation. Both types are systematically differentiated here, and the most popular measures are reviewed. It will be shown that all common chemoinformatic classifiers have built-in confidence scores. Since confidence estimation uses the class labels when computing the confidence scores, it is expected to be more efficient in reducing the error rate than novelty detection, which uses only the explanatory variables.
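The two types of applicability-domain measure distinguished above can be contrasted in a short sketch. The specific measures are assumptions chosen for illustration, not ones prescribed by the review: novelty detection is represented by the mean Euclidean distance to the k nearest training compounds in descriptor space, and confidence estimation by the margin of a classifier's predicted probability. The thresholds and function names are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def mean_knn_distance(x: Vector, training: Sequence[Vector], k: int) -> float:
    """Mean Euclidean distance from x to its k nearest training compounds."""
    distances = sorted(math.dist(x, t) for t in training)
    return sum(distances[:k]) / k


def in_domain_by_novelty(x: Vector, training: Sequence[Vector],
                         k: int, max_distance: float) -> bool:
    """Novelty detection: uses only the descriptor space, no class labels."""
    return mean_knn_distance(x, training, k) <= max_distance


def in_domain_by_confidence(p_positive: float, min_margin: float) -> bool:
    """Confidence estimation: uses the classifier output, which was learned
    from the class labels. Margin 0 = maximally uncertain, 1 = fully certain."""
    margin = abs(p_positive - 0.5) * 2.0
    return margin >= min_margin


if __name__ == "__main__":
    training = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    print(in_domain_by_novelty((0.5, 0.5), training, k=2, max_distance=1.0))  # True
    print(in_domain_by_novelty((5.0, 5.0), training, k=2, max_distance=1.0))  # False
    print(in_domain_by_confidence(0.95, min_margin=0.5))  # True
    print(in_domain_by_confidence(0.55, min_margin=0.5))  # False
```

Only the second check depends on what the model learned from the class labels, so the two checks can disagree about the same compound.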
Affiliation(s)
- Miriam Mathea
- Institute of Medicinal and Pharmaceutical Chemistry, University of Technology Braunschweig, 38106 Braunschweig, Germany
- Waldemar Klingspohn
- Institute of Medicinal and Pharmaceutical Chemistry, University of Technology Braunschweig, 38106 Braunschweig, Germany
- Knut Baumann
- Institute of Medicinal and Pharmaceutical Chemistry, University of Technology Braunschweig, 38106 Braunschweig, Germany
21
Miyao T, Kaneko H, Funatsu K. Inverse QSPR/QSAR Analysis for Chemical Structure Generation (from y to x). J Chem Inf Model 2016; 56:286-99. [PMID: 26818135 DOI: 10.1021/acs.jcim.5b00628] [Citation(s) in RCA: 66] [Impact Index Per Article: 7.3] [Indexed: 11/28/2022]
Abstract
Retrieving descriptor information (x information) from a value of an objective variable (y) is a fundamental problem in inverse quantitative structure-property relationship (inverse-QSPR) analysis, but it is challenging because of the complexity of the preimage function. Here, we propose using a cluster-wise multiple linear regression (cMLR) model as a QSPR model for inverse-QSPR analysis. x information is acquired as a probability density function by combining cMLR with the prior distribution modeled as a Gaussian mixture model (GMM). Three case studies were conducted to demonstrate various aspects of the potential of cMLR. The predictive power of cMLR was found to be superior to that of MLR, especially for nonlinear data. Moreover, the applicability domain could be taken into account, since the posterior distribution inherits the features of the prior distribution (i.e., of the training data) and represents the possibility of having the desired property. Finally, a series of inverse analyses with GMMs/cMLR was demonstrated with the aim of generating de novo structures with a specific aqueous solubility.
Affiliation(s)
- Tomoyuki Miyao
- Department of Chemical System Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
- Hiromasa Kaneko
- Department of Chemical System Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
- Kimito Funatsu
- Department of Chemical System Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
22
Ain QU, Méndez-Lucio O, Ciriano IC, Malliavin T, van Westen GJP, Bender A. Modelling ligand selectivity of serine proteases using integrative proteochemometric approaches improves model performance and allows the multi-target dependent interpretation of features. Integr Biol (Camb) 2015; 6:1023-33. [PMID: 25255469 DOI: 10.1039/c4ib00175c] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Indexed: 12/12/2022]
Abstract
Serine proteases, implicated in important physiological functions, have high intra-family similarity, which leads to unwanted off-target effects of inhibitors with insufficient selectivity. However, the availability of sequence and structure data has now made it possible to develop approaches to design pharmacological agents that can discriminate successfully between their related binding sites. In this study, we quantified the relationship between 12,625 distinct protease inhibitors and their bioactivity against 67 targets of the serine protease family (20,213 data points) in an integrative manner, using proteochemometric modelling (PCM). Benchmarking 21 different target descriptors motivated the use of specific binding-pocket amino acid descriptors, which helped identify active-site residues and selective compound chemotypes affecting compound affinity and selectivity. PCM models performed better than the alternative approach used for comparison (QSAR models trained exclusively on compound descriptors using all available data), with R²/RMSE values of 0.64 ± 0.23/0.66 ± 0.20 vs. 0.35 ± 0.27/1.05 ± 0.27 log units, respectively. Moreover, interpretation of the PCM model singled out various chemical substructures responsible for bioactivity and selectivity towards particular proteases (thrombin, trypsin and coagulation factor 10), in agreement with the literature. For instance, the absence of a tertiary sulphonamide was identified as responsible for decreased selective activity on FA10 (by 0.27 ± 0.65 pChEMBL units on average). Among the binding pocket residues, the amino acids (arginine, leucine and tyrosine) at positions 35, 39, 60, 93, 140 and 207 were observed to be key contributors to selective affinity for these three targets.
Affiliation(s)
- Qurrat U Ain
- Centre for Molecular Informatics, Department of Chemistry, Lensfield Road, CB2 1EW, University of Cambridge, UK.
23
Liu R, Jiang W, Walkey CD, Chan WCW, Cohen Y. Prediction of nanoparticles-cell association based on corona proteins and physicochemical properties. Nanoscale 2015; 7:9664-75. [PMID: 25959034 DOI: 10.1039/c5nr01537e] [Citation(s) in RCA: 98] [Impact Index Per Article: 9.8] [Indexed: 05/02/2023]
Abstract
Cellular association of nanoparticles (NPs) in biological fluids is affected by proteins adsorbed onto the NP surface, which form a "protein corona" and thereby impact cellular bioactivity. Here, based on an extensive gold NP protein corona dataset, we investigate the relationships between NP-cell association and protein corona fingerprints (PCFs) as well as NP physicochemical properties. Accordingly, quantitative structure-activity relationships (QSARs) were developed based on both linear and non-linear support vector regression (SVR) models, using sequential forward floating selection of descriptors. The SVR model with only 6 serum proteins and zeta potential had higher accuracy (R² = 0.895) than the linear model (R² = 0.850) with 11 PCFs. From the initial pool of 148 descriptors, the APOB, A1AT, ANT3, and PLMN serum proteins, along with NP zeta potential, were identified as most significant for correlating NP-cell association. The present study suggests that QSAR exploration of NP-cell association data, considering the role of both the NP protein corona and physicochemical properties, can support the planning and interpretation of toxicity studies and guide the design of NPs for biomedical applications.
Affiliation(s)
- Rong Liu
- Center for Environmental Implications of Nanotechnology, University of California, Los Angeles, CA90095, USA.
24
Yan J, Zhu WW, Kong B, Lu HB, Yun YH, Huang JH, Liang YZ. A Combinational Strategy of Model Disturbance and Outlier Comparison to Define Applicability Domain in Quantitative Structural Activity Relationship. Mol Inform 2014; 33:503-13. [PMID: 27486037 DOI: 10.1002/minf.201300161] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 10/31/2013] [Accepted: 04/16/2014] [Indexed: 01/21/2023]
Abstract
To define an applicability domain for quantitative structure-activity relationship modeling, a combinational strategy of model disturbance and outlier comparison is developed. An indicator named the model disturbance index was defined to estimate the prediction error. Moreover, information on the outliers in the training set was used to filter unreliable samples in the test set based on "structural similarity". Chromatographic retention index data were used to investigate this approach. A relationship between the model disturbance index and the prediction error was found. In addition, comparing the outlier set with the test set could provide additional information about which unknown samples deserve more attention. A novel technique based on model population analysis was used to evaluate the validity of the applicability domain. Finally, three commonly used methods, i.e., leverage, descriptor range-based and model perturbation methods, were compared with the proposed approach.
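Leverage, the first of the three comparison methods named above, can be sketched in a few lines of standard-library Python. The sketch assumes a linear model whose descriptor matrix X has one row per training compound and includes an intercept column; it computes h = x^T (X^T X)^-1 x by solving a linear system, and flags compounds above the commonly used warning leverage h* = 3p/n, where p is the number of columns of X and n the number of training compounds. The data and helper names are hypothetical, not taken from the paper.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a @ z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


def leverage(x_train: Matrix, x: Vector) -> float:
    """h = x^T (X^T X)^-1 x for one compound's descriptor vector x."""
    p = len(x)
    xtx = [[sum(row[i] * row[j] for row in x_train) for j in range(p)]
           for i in range(p)]
    z = solve(xtx, x)  # z = (X^T X)^-1 x without forming the inverse
    return sum(xi * zi for xi, zi in zip(x, z))


def warning_leverage(x_train: Matrix) -> float:
    """Commonly used threshold h* = 3p/n."""
    return 3.0 * len(x_train[0]) / len(x_train)


if __name__ == "__main__":
    # Intercept column plus one hypothetical descriptor.
    x_train = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
    h_star = warning_leverage(x_train)  # 1.5
    for query in ([1.0, 1.5], [1.0, 10.0]):
        h = leverage(x_train, query)
        print(query, round(h, 3), "inside" if h <= h_star else "outside")
```

Because the leverages of the training compounds sum to p, the 3p/n cut-off corresponds to three times the average training leverage.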
Affiliation(s)
- Jun Yan
- Research Center of Modernization of Traditional Chinese Medicine, Central South University, Changsha 410083, P. R. China; tel: +86 731 8830831; fax: +86 731 8830831
- Wei-Wei Zhu
- Department of Chemical and Bioscience, HeChi University, YiZhou 546300, P. R. China
- Bo Kong
- Technology Center of China Tobacco Hunan Industrial Co., LTD, Changsha 410014, P. R. China
- Hong-Bing Lu
- Technology Center of China Tobacco Hunan Industrial Co., LTD, Changsha 410014, P. R. China
- Yong-Huan Yun
- Research Center of Modernization of Traditional Chinese Medicine, Central South University, Changsha 410083, P. R. China; tel: +86 731 8830831; fax: +86 731 8830831
- Jian-Hua Huang
- Research Center of Modernization of Traditional Chinese Medicine, Central South University, Changsha 410083, P. R. China; tel: +86 731 8830831; fax: +86 731 8830831
- Yi-Zeng Liang
- Research Center of Modernization of Traditional Chinese Medicine, Central South University, Changsha 410083, P. R. China; tel: +86 731 8830831; fax: +86 731 8830831
25
26
He Y, Chong FHT, Lim J, Lee RJT, Yap CW. Determination of the Potential of Drug Candidates to Cause Severe Skin Disorders Using Computational Modeling. Mol Inform 2013; 32:303-12. [PMID: 27481525 DOI: 10.1002/minf.201200086] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/01/2012] [Accepted: 02/20/2013] [Indexed: 11/11/2022]
Abstract
Efficient and accurate prediction of drugs' potential to cause rare and severe adverse drug reactions (ADRs) is needed to facilitate evaluation of the risk-benefit ratio of drug candidates during drug development. Severe skin disorders such as Stevens-Johnson syndrome (SJS) and toxic epidermal necrolysis (TEN), which are life-threatening dermatological conditions, are ADRs that have so far not received sufficient attention. In this study, a total of 1127 marketed drugs were screened for their potential to cause SJS/TEN, of which 255 were found to cause SJS/TEN and 239 were unlikely to cause SJS/TEN. A one-class classification method was used to develop multiple prediction models. An applicability domain was determined to define the applicability of the model. An ensemble method was used to develop ensemble models to improve prediction ability. The final ensemble model achieved a sensitivity and specificity of 81% and 67.4%, respectively, when estimated using the external 5-fold cross-validation method, and a sensitivity of 66.7% when assessed using an external positive set. The results suggest that the methods used in this study are potentially useful for facilitating the prediction of rare and severe ADRs.
Affiliation(s)
- Yuye He
- Pharmaceutical Data Exploration Laboratory, Department of Pharmacy, National University of Singapore, Singapore; tel: 065-65165971, fax: 065-67791554
- Chun Wei Yap
- Pharmaceutical Data Exploration Laboratory, Department of Pharmacy, National University of Singapore, Singapore; tel: 065-65165971, fax: 065-67791554
27
Seal A, Yogeeswari P, Sriram D, Wild DJ. Enhanced ranking of PknB Inhibitors using data fusion methods. J Cheminform 2013; 5:2. [PMID: 23317154 PMCID: PMC3600029 DOI: 10.1186/1758-2946-5-2] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Received: 08/27/2012] [Accepted: 12/11/2012] [Indexed: 11/18/2022]
Abstract
Background Mycobacterium tuberculosis encodes 11 putative serine-threonine protein kinases (STPKs), which regulate transcription, cell development and interaction with host cells. Of the 11 STPKs, three kinases, namely PknA, PknB and PknG, have been related to mycobacterial growth. Previous studies have shown that PknB is essential for mycobacterial growth, is expressed during the log phase of growth, and phosphorylates substrates involved in peptidoglycan biosynthesis. In recent years, many high-affinity inhibitors of PknB have been reported. Data fusion has previously been shown to enrich active compounds effectively in both structure- and ligand-based approaches. In this study, we applied three types of data fusion ranking algorithms to the PknB dataset, namely sum rank, sum score and reciprocal rank. We found that the reciprocal rank algorithm is able to select compounds earlier in a virtual screening process. We also screened the Asinex database with the reciprocal rank algorithm to identify possible PknB inhibitors. Results In our work, we used both structure-based and ligand-based approaches for virtual screening and combined their results using a variety of data fusion methods. We found that data fusion increases the chance of actives being ranked highly. Specifically, fusing the rankings of pharmacophore search, ROCS and Glide XP with a reciprocal ranking algorithm not only outperforms the individual structure- and ligand-based approaches but also ranks actives better than the other two data fusion methods, as measured by the BEDROC, robust initial enhancement (RIE) and AUC metrics. These fused results were used to identify 45 candidate compounds for further experimental validation. Conclusion We show that very different structure- and ligand-based methods for predicting drug-target interactions can be combined effectively using data fusion, outperforming any single method in ranking actives. Such fused results show promise for a coherent selection of candidates for biological screening.
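The three fusion rules compared in the abstract can be sketched as follows. The details are assumptions rather than the paper's exact definitions: every method scores every compound, higher scores are better, ranks start at 1, and sum score adds raw scores that are assumed to be on a comparable scale (real scores would usually need normalizing first). Compound IDs and scores are invented.

```python
from typing import Dict, List

Scores = Dict[str, float]  # compound ID -> score from one screening method


def ranks(scores: Scores) -> Dict[str, int]:
    """Rank compounds by descending score; the best compound gets rank 1."""
    ordered = sorted(scores, key=lambda cid: scores[cid], reverse=True)
    return {cid: position + 1 for position, cid in enumerate(ordered)}


def sum_score(runs: List[Scores]) -> List[str]:
    """Fuse by adding scores (assumes comparable, pre-normalized scales)."""
    total: Dict[str, float] = {}
    for run in runs:
        for cid, score in run.items():
            total[cid] = total.get(cid, 0.0) + score
    return sorted(total, key=lambda cid: total[cid], reverse=True)


def sum_rank(runs: List[Scores]) -> List[str]:
    """Fuse by adding ranks; a lower total is better."""
    total: Dict[str, int] = {}
    for run in runs:
        for cid, rank in ranks(run).items():
            total[cid] = total.get(cid, 0) + rank
    return sorted(total, key=lambda cid: total[cid])


def reciprocal_rank(runs: List[Scores]) -> List[str]:
    """Fuse by adding 1/rank; a higher total is better, so top ranks dominate."""
    total: Dict[str, float] = {}
    for run in runs:
        for cid, rank in ranks(run).items():
            total[cid] = total.get(cid, 0.0) + 1.0 / rank
    return sorted(total, key=lambda cid: total[cid], reverse=True)


if __name__ == "__main__":
    runs = [
        {"A": 0.9, "B": 0.8, "C": 0.3, "D": 0.2},   # method 1 (hypothetical scores)
        {"A": 0.3, "B": 0.7, "C": 0.8, "D": 0.1},   # method 2 (hypothetical scores)
        {"A": 0.95, "B": 0.4, "C": 0.5, "D": 0.3},  # method 3 (hypothetical scores)
    ]
    print(sum_score(runs))        # ['A', 'B', 'C', 'D']
    print(sum_rank(runs))         # ['A', 'C', 'B', 'D']
    print(reciprocal_rank(runs))  # ['A', 'C', 'B', 'D']
```

In this toy example, sum score places B above C because raw score magnitudes carry over into the result, whereas both rank-based rules favor C, which one method ranked first.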
Affiliation(s)
- Abhik Seal
- Computer-Aided Drug Design Laboratory, Department of Pharmacy, Birla Institute of Technology, Hyderabad Campus, Shameerpet, Hyderabad, 500078, India
28
Karpov PV, Baskin II, Zhokhova NI, Nawrozkij MB, Zefirov AN, Yablokov AS, Novakov IA, Zefirov NS. One-class approach: models for virtual screening of non-nucleoside HIV-1 reverse transcriptase inhibitors based on the concept of continuous molecular fields. Russ Chem Bull 2012. [DOI: 10.1007/s11172-011-0372-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 10/28/2022]
29
Varnek A, Baskin I. Machine learning methods for property prediction in chemoinformatics: Quo Vadis? J Chem Inf Model 2012; 52:1413-37. [PMID: 22582859 DOI: 10.1021/ci200409x] [Citation(s) in RCA: 152] [Impact Index Per Article: 11.7] [Indexed: 01/19/2023]
Abstract
This paper focuses on modern approaches to machine learning, most of which are as yet used infrequently or not at all in chemoinformatics. Machine learning methods are characterized in terms of the "modes of statistical inference" and "modeling levels" nomenclature and by considering different facets of modeling with respect to input/output matching, data types, model duality, and model inference. Particular attention is paid to new approaches and concepts that may provide efficient solutions to common problems in chemoinformatics: improving the predictive performance of structure-property (activity) models, generating structures with desirable properties, model applicability domain, modeling properties with functional endpoints (e.g., phase diagrams and dose-response curves), and accounting for multiple molecular species (e.g., conformers or tautomers).
Affiliation(s)
- Alexandre Varnek
- Laboratoire d'Infochimie, UMR 7177 CNRS, Université de Strasbourg, 4, rue B. Pascal, Strasbourg 67000, France
30
Karpov PV, Baskin II, Zhokhova NI, Zefirov NS. Method of continuous molecular fields in the one-class classification task. Doklady Chemistry 2011. [DOI: 10.1134/s0012500811100016] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/23/2022]
31
Soto AJ, Vazquez GE, Strickert M, Ponzoni I. Target-Driven Subspace Mapping Methods and Their Applicability Domain Estimation. Mol Inform 2011; 30:779-89. [DOI: 10.1002/minf.201100053] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 04/05/2011] [Accepted: 05/26/2011] [Indexed: 11/06/2022]
32
Varnek A, Baskin II. Chemoinformatics as a Theoretical Chemistry Discipline. Mol Inform 2011; 30:20-32. [PMID: 27467875 DOI: 10.1002/minf.201000100] [Citation(s) in RCA: 63] [Impact Index Per Article: 4.5] [Received: 08/31/2010] [Accepted: 01/14/2011] [Indexed: 01/29/2023]
Abstract
Here, chemoinformatics is considered as a theoretical chemistry discipline complementary to quantum chemistry and force-field molecular modeling. These three fields are compared with respect to molecular representation, inference mechanisms, basic concepts and application areas. Chemical space, a fundamental concept of chemoinformatics, is considered with respect to complex relations between chemical objects (graphs or descriptor vectors). Statistical Learning Theory, one of the main mathematical approaches in structure-property modeling, is briefly reviewed. Links between chemoinformatics and its "sister" fields (machine learning, chemometrics and bioinformatics) are discussed.
Affiliation(s)
- Alexandre Varnek
- Laboratoire d'Infochimie, UMR 7177 CNRS, Université de Strasbourg, 4, rue B. Pascal, Strasbourg 67000, France
- Igor I Baskin
- Department of Chemistry, Moscow State University, Moscow 119991, Russia
33
Hinselmann G, Rosenbaum L, Jahn A, Fechner N, Zell A. jCompoundMapper: An open source Java library and command-line tool for chemical fingerprints. J Cheminform 2011; 3:3. [PMID: 21219648 PMCID: PMC3033338 DOI: 10.1186/1758-2946-3-3] [Citation(s) in RCA: 57] [Impact Index Per Article: 4.1] [Received: 10/07/2010] [Accepted: 01/10/2011] [Indexed: 11/10/2022]
Abstract
BACKGROUND The decomposition of a chemical graph is a convenient approach to encode information of the corresponding organic compound. While several commercial toolkits exist to encode molecules as so-called fingerprints, only a few open source implementations are available. The aim of this work is to introduce a library for exactly defined molecular decompositions, with a strong focus on the application of these features in machine learning and data mining. It provides several options such as search depth, distance cut-offs, atom- and pharmacophore typing. Furthermore, it provides the functionality to combine, to compare, or to export the fingerprints into several formats. RESULTS We provide a Java 1.6 library for the decomposition of chemical graphs based on the open source Chemistry Development Kit toolkit. We reimplemented popular fingerprinting algorithms such as depth-first search fingerprints, extended connectivity fingerprints, autocorrelation fingerprints (e.g. CATS2D), radial fingerprints (e.g. Molprint2D), geometrical Molprint, atom pairs, and pharmacophore fingerprints. We also implemented custom fingerprints such as the all-shortest path fingerprint that only includes the subset of shortest paths from the full set of paths of the depth-first search fingerprint. As an application of jCompoundMapper, we provide a command-line executable binary. We measured the conversion speed and number of features for each encoding and described the composition of the features in detail. The quality of the encodings was tested using the default parametrizations in combination with a support vector machine on the Sutherland QSAR data sets. Additionally, we benchmarked the fingerprint encodings on the large-scale Ames toxicity benchmark using a large-scale linear support vector machine. The results were promising and could often compete with literature results. 
On the large Ames benchmark, for example, we obtained an AUC ROC performance of 0.87 with a reimplementation of the extended connectivity fingerprint. This result is comparable to the performance achieved by a non-linear support vector machine using state-of-the-art descriptors. On the Sutherland QSAR data set, the best fingerprint encodings showed a comparable or better performance on 5 of the 8 benchmarks when compared against the results of the best descriptors published in the paper of Sutherland et al. CONCLUSIONS jCompoundMapper is a library for chemical graph fingerprints with several tweaking possibilities and exporting options for open source data mining toolkits. The quality of the data mining results, the conversion speed, the LGPL software license, the command-line interface, and the exporters should be useful for many applications in cheminformatics like benchmarks against literature methods, comparison of data mining algorithms, similarity searching, and similarity-based data mining.
Affiliation(s)
- Georg Hinselmann
- University of Tübingen, Center for Bioinformatics Tübingen (ZBIT), Sand 1, 72076 Tübingen, Germany.
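The all-shortest-path fingerprint mentioned in the jCompoundMapper abstract keeps, from the paths a depth-first search would enumerate, only those that are shortest paths between their two end atoms. The Python sketch below illustrates that idea under stated assumptions; it is not jCompoundMapper's Java implementation. It assumes a toy molecular graph given as a list of element symbols plus an adjacency dict, ignores bond orders and pharmacophore typing, and folds features with CRC32, a hashing choice made here only for run-to-run determinism. The helper names (`bfs_distances`, `shortest_paths`, `all_shortest_path_features`, `fold`) and the `max_depth` parameter are hypothetical.

```python
import zlib
from collections import deque


def bfs_distances(adj, source):
    """Topological (bond-count) distance from `source` to every reachable atom."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def shortest_paths(adj, dist, source, target):
    """Every shortest path from `source` to `target`, as lists of atom indices.

    `dist` holds BFS distances from `source`; a predecessor of a node on a
    shortest path is any neighbour whose distance is exactly one smaller.
    """
    if target == source:
        return [[source]]
    paths = []
    for prev in adj[target]:
        if dist.get(prev) == dist[target] - 1:
            for p in shortest_paths(adj, dist, source, prev):
                paths.append(p + [target])
    return paths


def all_shortest_path_features(labels, adj, max_depth=3):
    """Canonical label strings for all shortest paths of at most `max_depth` bonds."""
    features = set()
    for s in range(len(labels)):
        dist = bfs_distances(adj, s)
        for t, d in dist.items():
            if t < s or d > max_depth:
                continue  # visit each unordered atom pair once; respect search depth
            for path in shortest_paths(adj, dist, s, t):
                forward = "-".join(labels[i] for i in path)
                backward = "-".join(labels[i] for i in reversed(path))
                features.add(min(forward, backward))  # direction-independent key
    return features


def fold(features, n_bits=64):
    """Hash each feature string into a fixed-length set of bit positions."""
    return {zlib.crc32(f.encode()) % n_bits for f in features}
```

For ethanol (heavy atoms C-C-O) with `max_depth=3`, this yields the five features `C`, `O`, `C-C`, `C-O` and `C-C-O`. In a four-membered ring, opposite atoms are joined by two distinct shortest paths and both are enumerated, which is what separates an all-shortest-path encoding from one that keeps a single path per atom pair.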
34
Sushko I, Novotarskyi S, Körner R, Pandey AK, Cherkasov A, Li J, Gramatica P, Hansen K, Schroeter T, Müller KR, Xi L, Liu H, Yao X, Öberg T, Hormozdiari F, Dao P, Sahinalp C, Todeschini R, Polishchuk P, Artemenko A, Kuz’min V, Martin TM, Young DM, Fourches D, Muratov E, Tropsha A, Baskin I, Horvath D, Marcou G, Muller C, Varnek A, Prokopenko VV, Tetko IV. Applicability Domains for Classification Problems: Benchmarking of Distance to Models for Ames Mutagenicity Set. J Chem Inf Model 2010; 50:2094-111. [DOI: 10.1021/ci100253r] [Citation(s) in RCA: 172] [Impact Index Per Article: 11.5] [Indexed: 02/02/2023]
Affiliation(s)
- Iurii Sushko, Sergii Novotarskyi, Robert Körner, Anil Kumar Pandey, Artem Cherkasov, Jiazhong Li, Paola Gramatica, Katja Hansen, Timon Schroeter, Klaus-Robert Müller, Lili Xi, Huanxiang Liu, Xiaojun Yao, Tomas Öberg, Farhad Hormozdiari, Phuong Dao, Cenk Sahinalp, Roberto Todeschini, Pavel Polishchuk, Anatoliy Artemenko, Victor Kuz’min, Todd M. Martin, Douglas M. Young, Denis Fourches, Eugene Muratov, Alexander Tropsha, Igor Baskin, Dragos Horvath, Gilles Marcou, Christophe Muller, Alexander Varnek, Volodymyr V. Prokopenko, Igor V. Tetko
- Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum Muenchen—German Research Center for Environmental Health (GmbH), Ingolstaedter Landstrasse 1, D-85764 Neuherberg, Germany, University of British Columbia, Vancouver Prostate Centre, 2660 Oak str., Vancouver, BC, V6H 3Z6, Canada, QSAR Research Unit in Environmental Chemistry and Ecotoxicology, Department of Structural and Functional Biology, University of Insubria, Via Dunant 3, Varese 21100, Italy, Machine Learning Department, Technical
35
Baskin II, Kireeva N, Varnek A. The One-Class Classification Approach to Data Description and to Models Applicability Domain. Mol Inform 2010; 29:581-7. [DOI: 10.1002/minf.201000063] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.9] [Received: 06/07/2010] [Accepted: 07/11/2010] [Indexed: 11/06/2022]